The Un-Official Proxomitron Forum
Path Blocking with Metacharacters - Printable Version


Path Blocking with Metacharacters - Faxopita - Aug. 10, 2015 10:19 PM

You can insert the following content either in the user.action file or in a separate file. In the former case, make sure it comes before the rest of the file's content. If you use a separate .action file, add its file name to Privoxy's config (or config.txt) file, and make sure the added entry comes before the one that loads user.action. Note: exceptions should always come after generic rules, because Privoxy reads action files from top to bottom.

Code:
{{alias}}
  +enhanced-block = +block{Restrained Access} +limit-connect{80}

{ +enhanced-block }
#
# Domains provided by "Artisan Numérique"
# Source: http://artisan.karma-lab.net/premunir-spywebs-privoxy
#
   www.google-analytics.com/*
   *.xiti.com/*
   *.hit-parade.com/*
   *.toutlemondeenblogue.com/*
   visit.geocities.com/*
   *.yimg.com/*
   *.cybermonitor.com/*
   *.overture.com
   *.mybloglog.com
   *.webtrendslive.com
   adnext.fr
   *.quantserve.com/*
   stats.wordpress.com/*
   *.ixnp.com/*
   *.statcounter.com/*
   *.extreme-dm.com/*
   *.googlesyndication.com/*
   www.typepad.com/t/stats*
   *.sitemeter.com/*
   myustats.com/*
   *.reinvigorate.net/*
   *.clicktale.net/*
   *.hittail.com/*
   */xiti.js
   cetrk.com/*

# Ad Agencies/Ad Platforms
#
   .netavenir.com
   .turn.com
   .bluestreak.com
   .criteo.com
   .blogbang.com/demo/js/blogbang_ad.php\?id=
   /.*\/microsoft_adcenterconversion\.js
   *.*.marketingsolutions.yahoo.com/*
   www.googleadservices.com/*
   .fmpub.net
   pubsrv.allopass.com/*
   .comclick.com
   .regieci.com
   .allo-audience.fr
   .audientia.net
   .clickintext.com
   .clickintext.net
   .intellitxt.com
   payperpost.com

# Recorders/Replayers
#
   .clicktale.*
   cetrk.com/*
   *.robotreplay.com/*
   /.*/clickheat.js

# Audience Measurement/Tracking
#
   .estat.com
   .sitemeter.com
   .w3counter.com
   .reinvigorate.net
   /.*\/webanalytics
   .opentracker.net
   .weborama.*
   .quantserve.com
   .performancing.com
  
   cnbc.com.ToutLeMondeEnBlogue.com
   stats.wordpress.com
   *.technorati.com/*
   embed.technorati.com/linkcount
   /.*xiti.js
   *.getclicky.com/*
   *.iminr.com/*
   .netprofitblueprint.com/*
   .converdge.com
   .cybermonitor.com
   my.blogitexpress.com/.*\.js
   www.atoomic.com/js/*
   .clustrmaps.com/counter/*
   .trackalyzer.com
   log.tf1.fr

# Page Ranking
#
   www.free-pagerank.com/fcgi-bin/alive_js.fcgi.*    
   external.wikio.fr/blogs/top/getrank
   www.pagerank.fr/pagerank-actuel.gif

# Loggers that track a bit too much
#
   .mybloglog.com

# Feed Tracking
#
   feedjit.com/*

# Special Google
#
   /.*utm.js
   /.*stat.*\.js
   /.*\/urchin.js
   /.*s_code.js
   /.*google-analyticator.*

# Nuisances
#
   .snap.com/*
   .ixnp.com/*
   .twitter.com/*
   .webreseau.com
   *.devfr.net/*
   badge.facebook.com/badge/*
   .blogbar.org

It's a good start. More path blocking coming soon…

If you like my contribution, please offer me a cup of tea via this Bitcoin address…

Code:
1HxxviDA5MybpewcyAmJ4JhfmYF9AE53xv

May your combined fighting spirit bring down the tracking industry and the super-greedy ad tech.


RE: Path Blocking Using Wildcard Characters - whenever - Aug. 11, 2015 02:15 AM

(Aug. 10, 2015 10:19 PM)Faxopita Wrote:  
Code:
www.google-analytics.com/*
   *.xiti.com/*
   *.hit-parade.com/*
   *.toutlemondeenblogue.com/*
   visit.geocities.com/*

Privoxy uses "Regular Expressions" for matching the path portion. I don't think the single "*" after the slash is valid RE.

If you meant ".*", it can simply be omitted.
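
A quick way to check this point is sketched below. It uses Python's `re` module as a stand-in for Privoxy's path matcher, and it assumes (per the Privoxy manual) that a path pattern is anchored at the leading slash but left open at the end. `path_matches` is a hypothetical helper for illustration, not part of Privoxy, and Python's regex dialect may differ from the engine Privoxy is built with.

```python
import re

def path_matches(pattern: str, path: str) -> bool:
    """Model a left-anchored, right-open path match (an assumption, see above)."""
    return re.match(pattern, path) is not None

paths = ["/", "/ga.js", "/collect?v=1&tid=UA-1"]

# Read as a regex, "/*" means "zero or more slashes", not "slash, then anything".
# "/.*" and a bare "/" give the same verdicts because the end is not anchored.
for pattern in ("/*", "/.*", "/"):
    print(pattern, [path_matches(pattern, p) for p in paths])
```

Under these assumptions all three patterns accept every sample path, which is consistent with the advice that the host pattern alone is enough.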


RE: Path Blocking Using Wildcard Characters - cattleyavns - Aug. 11, 2015 04:09 AM

Just:

Code:
www.google-analytics.com
   *.xiti.com
   *.hit-parade.com
   *.toutlemondeenblogue.com
   visit.geocities.com

is enough, unless we want:
Code:
www.google-analytics.com/.*?ga\.js

Plus:
We can match web-bug-like URLs with this rule:
Code:
/.{300}

Example: scorecardresearch.com/.....
Any URL whose path has at least 300 characters after the leading slash will get blocked.
This rule might cause false positives. Web bugs are the most dangerous tracking method, and I don't think we can block them completely.
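
To see where the length threshold of this rule sits, here is a minimal sketch in Python. It assumes the path pattern is anchored at the leading slash only, and `is_long_path` is a hypothetical helper, not something Privoxy provides.

```python
import re

LONG_PATH = re.compile(r"/.{300}")  # the rule from the post above

def is_long_path(path: str) -> bool:
    # re.match anchors at the start only, mirroring the assumed Privoxy behaviour.
    return LONG_PATH.match(path) is not None

short = "/" + "a" * 299   # 299 characters after the slash
long_ = "/" + "a" * 300   # 300 characters after the slash

print(is_long_path(short), is_long_path(long_))
```

The rule only looks at length, so any legitimate URL with a long query string (a search, a map link) would be caught as well.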


RE: Path Blocking Using Wildcard Characters - Faxopita - Aug. 11, 2015 10:20 AM

Thanks to both of you. Indeed, there were some crude syntax errors that I hadn't revised for two years. Below is the revised version based on your suggestions.

Code:
{ +block }
#
# Provided by "Artisan Numérique"
# Source: http://artisan.karma-lab.net/premunir-spywebs-privoxy
#
   .google-analytics.com
   .xiti.com
   .hit-parade.com
   .toutlemondeenblogue.com
    visit.geocities.com
   .yimg.com
   .cybermonitor.com
   .overture.com
   .mybloglog.com
   .webtrendslive.com
   .adnext.fr
   .quantserve.com
   .stats.wordpress.com
   .ixnp.com
   .statcounter.com
   .extreme-dm.com
   .googlesyndication.com
   .typepad.com/t/stats.*
   .sitemeter.com
   .myustats.com
   .reinvigorate.net
   .clicktale.net
   .hittail.com
   /xiti.js
    cetrk.com

# Ad Agencies/Networks
#
   .netavenir.com
   .turn.com
   .bluestreak.com
   .criteo.com
   .blogbang.com/demo/js/blogbang_ad.php\?id=
   /(.*/)?microsoft_adcenterconversion\.js
   .marketingsolutions.yahoo.com
   .googleadservices.com
   .fmpub.net
   .pubsrv.allopass.com
   .comclick.com
   .regieci.com
   .allo-audience.fr
   .audientia.net
   .clickintext.com
   .clickintext.net
   .intellitxt.com
   .payperpost.com

# Recorders
#
   .clicktale.*
   .cetrk.com
   .robotreplay.com
   /(.*/)?clickheat.js

# Audience Measurement/Tracking
#
   .estat.com
   .sitemeter.com
   .w3counter.com
   .reinvigorate.net
   /(.*/)?webanalytics
   .opentracker.net
   .weborama.*
   .quantserve.com
   .performancing.com
  
    stats.wordpress.com
   .technorati.com
   /.*xiti.js
   .getclicky.com
   .iminr.com
   .netprofitblueprint.com
   .converdge.com
   .cybermonitor.com
   .blogitexpress.com/.*\.js
   .atoomic.com/js
   .clustrmaps.com/counter
   .trackalyzer.com
   .log.tf1.fr

# Page Ranking
#
   .free-pagerank.com/fcgi-bin/alive_js.fcgi.*    
    external.wikio.fr/blogs/top/getrank
   .pagerank.fr/pagerank-actuel.gif

# Tracker Logger
#
   .mybloglog.com

# Feed Tracking
#
    feedjit.com

# Special Google
#
   /.*utm.js
   /.*stat.*\.js
   /.*/urchin.js
   /.*s_code.js
   /.*google-analyticator.*

# Nuisances
#
   .snap.com
   .ixnp.com
   .twitter.com
   .webreseau.com
   .devfr.net
   .facebook.com/badge
   .blogbar.org

# Webbugs
#
   /.{300}

Note: sometimes I feel more comfortable writing /(.*/)? instead of /.*/
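
The practical difference between the two spellings can be sketched in Python as follows. The file name is taken from the list above, with the dot escaped for clarity; Python's `re` is only a stand-in for Privoxy's matcher and the left-anchoring is an assumption.

```python
import re

strict = re.compile(r"/.*/clickheat\.js")      # needs at least one directory
lenient = re.compile(r"/(.*/)?clickheat\.js")  # directory part is optional

for path in ("/clickheat.js", "/js/clickheat.js", "/a/b/clickheat.js"):
    print(path, bool(strict.match(path)), bool(lenient.match(path)))
```

In this sketch only the `/(.*/)?` form also catches the script when it is served from the site root.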


RE: Path Blocking Using Wildcard Characters - cattleyavns - Aug. 11, 2015 10:56 AM

My latest experiment in web bug blocking: this filter checks whether a URL contains something like:

Code:
http://12.123/2?=tttttttttttttkkkk&1?=0C92C3423CA7811A61745F7ED2F6A013
Demo Regex101: https://regex101.com/r/vX3cS9/1

Some websites use JavaScript to generate an MD5 (32 chars) or SHA1 (40 chars) hash from our information (user-agent, plugins, date and time, timezone...), then send it to their server and log it. This rule is a very simple way to block that kind of tracking, which is a variant of web bugs.

Code:
/.*?=(?:.{32}|.{40})(?:$|&)

Like my /.{300} above, use this filter carefully.
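
As a rough check of what this rule accepts, the sketch below runs it through Python's `re` module, which is an assumption: Privoxy's own regex engine may treat `(?:...)` and `.*?` differently. `looks_like_hash_beacon` is a hypothetical helper name.

```python
import re

HASH_PARAM = re.compile(r"/.*?=(?:.{32}|.{40})(?:$|&)")

def looks_like_hash_beacon(path_and_query: str) -> bool:
    # Left-anchored match on the path plus query string.
    return HASH_PARAM.match(path_and_query) is not None

demo = "/2?=tttttttttttttkkkk&1?=0C92C3423CA7811A61745F7ED2F6A013"
plain = "/search?q=privoxy"
not_a_hash = "/page?title=" + "x" * 32  # 32 arbitrary characters, no hash involved

for candidate in (demo, plain, not_a_hash):
    print(looks_like_hash_beacon(candidate), candidate)
```

Because `.` accepts any character, the rule fires on every parameter value that happens to be exactly 32 or 40 characters long, which is one reason to use it carefully.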


RE: Path Blocking Using Wildcard Characters - Faxopita - Aug. 11, 2015 12:11 PM

(Aug. 11, 2015 10:56 AM)cattleyavns Wrote:  
Code:
/.*?=(?:.{32}|.{40})(?:$|&)

Like my /.{300} above, use this filter carefully.

For example, I had to protect wikipedia.org through…
Code:
{ +block{Web Beacon} }
   /.{300}
   /.*?=(?:.{32}|.{40})(?:$|&)

{ -block{Web Beacon} }
   .wikipedia.org

Otherwise, the result page returned after using the Wikipedia search field is blocked.
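
Why the exception has to come after the generic rules can be illustrated with a toy model in Python. This is only a sketch under the assumption that sections are applied top to bottom and a later match overrides an earlier one; `Rule` and `is_blocked` are hypothetical names, `tracker.example` is a made-up host, and real Privoxy pattern matching is richer than this.

```python
import re
from typing import NamedTuple, Optional, Sequence

class Rule(NamedTuple):
    block: bool                 # True for +block, False for -block
    host_suffix: Optional[str]  # e.g. "wikipedia.org"; None means any host
    path_regex: Optional[str]   # None means any path

def is_blocked(rules: Sequence[Rule], host: str, path: str) -> bool:
    blocked = False
    for rule in rules:  # top to bottom; a later match overrides an earlier one
        host_ok = (
            rule.host_suffix is None
            or host == rule.host_suffix
            or host.endswith("." + rule.host_suffix)
        )
        path_ok = rule.path_regex is None or re.match(rule.path_regex, path) is not None
        if host_ok and path_ok:
            blocked = rule.block
    return blocked

rules = [
    Rule(True, None, r"/.{300}"),
    Rule(True, None, r"/.*?=(?:.{32}|.{40})(?:$|&)"),
    Rule(False, "wikipedia.org", None),
]
long_query = "/w/index.php?search=" + "x" * 300
print(is_blocked(rules, "en.wikipedia.org", long_query))  # exception applies
print(is_blocked(rules, "tracker.example", long_query))   # generic rule applies
```

Swapping the order of the rules in this model would leave the Wikipedia request blocked, since the generic rule would then have the last word.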


RE: Path Blocking Using Wildcard Characters - Faxopita - Aug. 11, 2015 04:14 PM

(Aug. 11, 2015 10:56 AM)cattleyavns Wrote:  Some websites generate MD5 (32 chars) or SHA1 (40 chars) based on our information (user-agent, plugins, date and time, timezone...) using Javascript and then send to their server and log our information, so this is a very simple method to block their tracking method, this is a variant of webbugs.

Good Lord! They even use hashing to spy on us!


RE: Path Blocking Using Wildcard Characters - Faxopita - Aug. 12, 2015 10:46 AM

(Aug. 11, 2015 10:56 AM)cattleyavns Wrote:  
Code:
/.*?=(?:.{32}|.{40})(?:$|&)

Like my /.{300} above, use this filter carefully.

Dear Cattleyavns,

this request has been blocked according to the above rule:
Code:
http://ixquick.com/js/retina_mainpage.js?v=b6be3321f0250cbebf37ebb98b546e3c

Is it that kind of hash you were talking about that may act as a fingerprint?

Good day to all readers!


RE: Path Blocking Using Wildcard Characters - cattleyavns - Aug. 12, 2015 12:23 PM

(Aug. 12, 2015 10:46 AM)Faxopita Wrote:  
(Aug. 11, 2015 10:56 AM)cattleyavns Wrote:  
Code:
/.*?=(?:.{32}|.{40})(?:$|&)

Like my /.{300} above, use this filter carefully.

Dear Cattleyavns,

this request has been blocked according to the above rule:
Code:
http://ixquick.com/js/retina_mainpage.js?v=b6be3321f0250cbebf37ebb98b546e3c

Is it that kind of hash you were talking about that may act as a fingerprint?

Good day to all readers!

I don't think so. As far as I know, ixquick isn't an evil site, and this is a js file, so I think it is safe.


RE: Path Blocking Using Wildcard Characters - Faxopita - Aug. 12, 2015 12:51 PM

On the other hand, this one…
Code:
http://plus.lefigaro.fr/fpservice/user_graph?appid=81325031242245596367369127435013&remote_id=261707&jsonp_callback=window.fpAuth.linksCheckIfUserExistsCallback
looks very suspicious… It was blocked as well, but that did not prevent me from reading the related article, and the web page is not broken.


RE: Path Blocking Using Wildcard Characters - cattleyavns - Aug. 12, 2015 03:26 PM

(Aug. 12, 2015 12:51 PM)Faxopita Wrote:  On the other hand, this one…
Code:
http://plus.lefigaro.fr/fpservice/user_graph?appid=81325031242245596367369127435013&remote_id=261707&jsonp_callback=window.fpAuth.linksCheckIfUserExistsCallback
looks very suspicious… It was blocked as well, but that did not prevent me from reading the related article, and the web page is not broken.

That one is okay too. I think we should drop my second rule and only use /.{300}.
The second rule is not really helpful, to be honest.


RE: Path Blocking Using Wildcard Characters - whenever - Aug. 13, 2015 10:07 AM

(Aug. 12, 2015 10:46 AM)Faxopita Wrote:  
Code:
http://ixquick.com/js/retina_mainpage.js?v=b6be3321f0250cbebf37ebb98b546e3c

Is it that kind of hash you were talking about that may act as a fingerprint?

That's to prevent your browser from using an outdated cached version of the js file. It's not for tracking and it's safe to let it go.
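
Both URLs reported in this thread can be run through the rule to confirm that it cannot tell a cache-busting version string from a fingerprint. The sketch below uses Python's `re` as a stand-in for Privoxy's matcher, which is an assumption; the paths are copied from the posts above.

```python
import re

HASH_PARAM = re.compile(r"/.*?=(?:.{32}|.{40})(?:$|&)")

reported = {
    "cache-buster (ixquick)": "/js/retina_mainpage.js?v=b6be3321f0250cbebf37ebb98b546e3c",
    "appid (lefigaro)": (
        "/fpservice/user_graph?appid=81325031242245596367369127435013"
        "&remote_id=261707"
        "&jsonp_callback=window.fpAuth.linksCheckIfUserExistsCallback"
    ),
}

for label, path in reported.items():
    print(label, HASH_PARAM.match(path) is not None)
```

The rule only sees a 32-character value followed by the end of the URL or an `&`, so both requests match regardless of what the value is used for.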


RE: Path Blocking Using Wildcard Characters - Faxopita - Aug. 13, 2015 02:10 PM

Hello Privoxy users,

I created this path blocking file. So far it has been very successful, for me at least, in blocking suspicious paths that are neither recognised by the converted hosts file nor filtered properly by my .filter files. I have often felt lucky to have these path patterns loaded to block some nasty trackers. Anyone is warmly welcome to make this path blocking file far better than it is today. For your information, I rarely need to touch this file because of something that shouldn't be blocked. When I have a problem, it is mainly a .filter file-related issue, hence the need to create an exception. Of course, if you visit a news article about, for example, a social network, it will be blocked, but you can force Privoxy to let you through to the website!

Code:
{ +block{Restrained Access: Declined Paths} }
#
# Paths
#
  /(.*/)?bons?-?plans?

  /(.*/)?core/metrics?/
  /(.*/)?core(/ux/|-)

  /(.*/)?.*(campaign|comm?ercial|marketing|parte?n(er|aire?)|promo|social).*
  /(.*/)?.*(anti-?spam|bug.?snag|detect[^/]*browser|market.?place|zoneid=).*
  /(.*/)?.*(ads?.?loader|browser[^/]*detect|deal(_|-|s)|le.?guide|metrics).*
  /(.*/)?.*(ip=(\.?[0-9]+){4}|retarget|((sm|u)id|referr?er|server.?time)=).*
  /(.*/)?.*(iframe|use?r.?(agent|g?u?id)|lat.?lo?ng|(pub.?id|time.?zone)=).*
  /(.*/)?.*((ever|flash|super|mbie).?cookie|(language|resolution|screen)=).*
  /(.*/)?.*(ad.?module|aff?ill?iate|(browser|country)=|polls?|reff?err?al).*
  /(.*/)?.*(analytics?|(c|p)id=|finger.?print(s|e(d|r)|ing)?|interstitial).*
  /(.*/)?.*((brand|charset|cid|isp|MAC(.?add?r(esse?)?)|model|signature)=).*
  /(.*/)?.*(logge(r|d)|mailchimp|pixel|product.?ads?|track(er|ing)?).*
  /(.*/)?.*(-ads-?|ad.?manager|live.?chat|splash.?page|subscribe).*
  /(.*/)?.*((caid|vpid)(-|_|=|\.)).*

  /(.*/)?.*(chartbeat|cross.?sell|facebook|forester|mobiquo|sessioncam|yahoo).*
  /(.*/)?.*(brightcove|googleads|obelusmedia|tag(commander|man)|xiti|zendesk).*
  /(.*/)?.*(acymailing|bazaarvoice|boomr|cooladata|olark|omniture|trustpilot).*
  /(.*/)?.*(blueconic|bluekai|breadcrumb|freshdesk|dmptag|usabilla|nugg\.?ad).*
  /(.*/)?.*(adchemix|cedexis|segmentify|optincrusher|smartad|visual.?revenue).*
  /(.*/)?.*(adrum|gigya|hapyak|konverto|krux|linkedin|openx|parsely|proximic).*
  /(.*/)?.*(clickfunnel|disqus|google?.?plus|marocrank|optimizely|socket\.io).*
  /(.*/)?.*(captify|geo.?(ip|loc(at(e|ion|or))?|(profile?|service)s?|=)).*
  /(.*/)?.*(runcpa).*

  /(.*/)?sponsor(e?(d|s))?/
  /(.*/)?widgets?/social.*counts?/

{ +block{Restrained Access: Declined Javascript} +handle-as-empty-document }
#
# .JS Files
#
  /(.*/)?(java)?scripts?/xtcore.*\.js

  /(.*/)?(counts?|rokmedia(quer(y|ies))?|silverlight|tapestry.messages?|xtcore)\.js

  /(.*/)?.*(audience|boomerang|conversion|nagad|recomm?end(ation)?|rtb|zepto).*\.js
  /(.*/)?.*(ad.?bloc?k?|advert).*\.js

  /(.*/)?.*(analy(s|z)er?|chat.?box|counter|mouse|profile?|survey|sso|tag)[^/]*\.js
  /(.*/)?.*(click?|compteur|crm|monitor|radar)[^/]*\.js
  /(.*/)?.*(streamsense)[^/]*\.js

  /(.*/)?.*([^a-z]*ads|hitometer|injection|plusone|pub|social.*(pop-?up|tag)s?)\.js

  /(.*/)?([a-zA-Z0-9]+(-|_|\.))?i?stats?[^/]*\.(js|php)

  /(.*/)?bug\.(gif|jpe?g|png)
  /(.*/)?.*cookie.*\.js

{ -block }
  .thetrainline.com/Scripts/src/stationlist.js

The patterns above were all matched during actual browsing; they were not invented for the sake of playing with regex. However, I must admit I haven't seen any string matching this pattern: MAC(.?add?r(esse?)?); it is there just in case.
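
Broad patterns like these are worth testing against known-good URLs. As an illustration, the sketch below suggests one plausible reason the thetrainline.com exception is needed: the stats pattern appears to match `stationlist.js`. This is a guess based on running the pattern in Python's `re`, not something stated in the post, and the `re.IGNORECASE` flag reflects the assumption that Privoxy matches paths case-insensitively by default. The other two paths are made up for comparison.

```python
import re

# Pattern copied from the ".JS Files" section above.
STATS = re.compile(
    r"/(.*/)?([a-zA-Z0-9]+(-|_|\.))?i?stats?[^/]*\.(js|php)",
    re.IGNORECASE,  # assumption: Privoxy path matching ignores case by default
)

for path in ("/Scripts/src/stationlist.js", "/js/site-stats.min.js", "/js/main.js"):
    print(path, STATS.match(path) is not None)
```

Here `stat` followed by `[^/]*` swallows `ionlist`, so an unrelated script name ends up treated like a statistics script.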

Cattleyavns & Whenever, the baby is yours; tweak it the way you think it should be.

New additions and updates to come soon!


RE: Path Blocking Using Wildcard Characters - whenever - Aug. 17, 2015 03:08 AM

I think (.*/)?.* is just equivalent to .*, and the ending .* could be omitted.
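
A spot check of this equivalence, using a shortened version of one of the patterns above, might look like the sketch below. It is not a proof, only a comparison on a few made-up paths, and it relies on Python's `re` with left-anchored, right-open matching as a stand-in for Privoxy.

```python
import re

verbose = re.compile(r"/(.*/)?.*(pixel|track(er|ing)?).*")
compact = re.compile(r"/.*(pixel|track(er|ing)?)")

samples = ["/pixel.gif", "/a/b/tracking/x.js", "/img/logo.png", "/a/tracker", "/"]

for path in samples:
    # Both spellings should give the same block/allow verdict.
    assert bool(verbose.match(path)) == bool(compact.match(path))

print("same verdict on all samples")
```

Since `.*` can already cross slashes, the optional `(.*/)?` in front of it adds nothing, and the trailing `.*` is redundant when the end of the pattern is not anchored.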


RE: Path Blocking Using Wildcard Characters - cattleyavns - Aug. 17, 2015 03:28 AM

As far as I know, this is the Privoxy authors' standard form.
/(.*/)?ads/ is equivalent to:

/ads/
and
/.*?ads/
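
The three spellings can be compared side by side with a small Python sketch. The sample paths are made up, and Python's `re` with left-anchored matching is assumed as a stand-in for Privoxy's matcher.

```python
import re

patterns = {
    "/(.*/)?ads/": re.compile(r"/(.*/)?ads/"),
    "/ads/": re.compile(r"/ads/"),
    "/.*?ads/": re.compile(r"/.*?ads/"),
}

for path in ("/ads/banner.gif", "/static/ads/banner.gif", "/uploads/photo.jpg"):
    print(path, {name: bool(rx.match(path)) for name, rx in patterns.items()})
```

In this sketch `/(.*/)?ads/` covers `ads/` at the root or below any directory, while `/.*?ads/` is somewhat broader because it also accepts a directory name that merely ends in "ads", such as `uploads/`.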