Path Blocking with Metacharacters
Aug. 17, 2015, 08:14 AM
Post: #16
RE: Path Blocking Using Wildcard Characters
/(.*/)?ads/ forces a / before ads.
/.*?ads/ doesn't.

Also I'm talking about (.*/)?.* , which is different from (.*/)? .
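A quick way to see the difference is to try both patterns against a few paths. The sketch below uses Python's re module as a stand-in for Privoxy's matcher, which is an assumption: re.match only mimics matching from the start of the path, and the sample paths are made up for illustration.

```python
import re

# Stand-in for Privoxy's path matching (assumption: re.match, which anchors
# at the start of the string, behaves closely enough for these patterns).
WITH_SLASH = re.compile(r"/(.*/)?ads/")
WITHOUT_SLASH = re.compile(r"/.*?ads/")

def matches(pattern, path):
    """Return True when the pattern matches from the start of the path."""
    return pattern.match(path) is not None

# Hypothetical sample paths.
samples = [
    "/ads/banner.gif",
    "/img/ads/banner.gif",
    "/uploads/photo.jpg",
    "/downloads/file.zip",
]

for path in samples:
    print(path, matches(WITH_SLASH, path), matches(WITHOUT_SLASH, path))
```

Both patterns catch the two real ads/ folders, but only the second one also catches /uploads/ and /downloads/, because nothing forces a / right before ads.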
Aug. 17, 2015, 08:33 AM (This post was last modified: Aug. 17, 2015 10:27 AM by Faxopita.)
Post: #17
RE: Path Blocking Using Wildcard Characters
(Aug. 17, 2015 03:08 AM)whenever Wrote:  I think (.*/)?.* is just equivalent to .*, and the ending .* could be omitted.

Simplification-wise:
  1. /(.*/)?.* to be replaced by /.*
  2. Indeed, a trailing .* can be omitted; I wasn't aware of this.
Thanks for your support.
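The simplification discussed in this post can be checked on a handful of paths. This is only a sketch using Python's re module in place of Privoxy's matcher (an assumption), with hypothetical sample paths; it is a spot check, not a proof.

```python
import re

LONG = re.compile(r"/(.*/)?.*")
SHORT = re.compile(r"/.*")

# Hypothetical sample paths, including a few edge cases.
samples = ["/", "/ads", "/a/b/c.js", "/a//b?x=1", "no-leading-slash"]

def same_result(path):
    """Compare whole-string matching of the long and short patterns."""
    return bool(LONG.fullmatch(path)) == bool(SHORT.fullmatch(path))

def trailing_star_is_redundant(prefix, path):
    """With start-anchored matching, appending .* never changes the outcome."""
    return bool(re.match(prefix, path)) == bool(re.match(prefix + ".*", path))

print(all(same_result(p) for p in samples))
print(all(trailing_star_is_redundant(r"/(.*/)?ads/", p) for p in samples))
```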
Oct. 28, 2015, 10:25 AM (This post was last modified: Dec. 08, 2017 10:19 PM by Faxopita.)
Post: #18
RE: Path Blocking Using Wildcard Characters
Interesting block list from Denis Szalkowski; originally built for Silent Block add-on:
  1. Source list: contentblock-regex.txt
  2. Related article to source list: Silent Block : un Internet sans publicité (an Internet without ads)

See the Privoxy version in the .ZIP attachment below. The last section of the source list, #Domaines, has been left out of this version. If you'd like those banned domains as well, visit the source list and convert them to Privoxy syntax. Example:
Code:
(\.|/|%2F)coremetrics\.com(/|$|%2F|:) # Privoxy: .coremetrics.com
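If you have many such lines to convert, the step can be scripted. The helper below is a hypothetical sketch: it assumes every #Domaines entry has exactly the same prefix and suffix as the single example above, which I have not verified against the whole source list. The names PREFIX, SUFFIX and to_privoxy_domain are mine.

```python
from typing import Optional

# Assumed shape of a "#Domaines" entry, inferred from the one example above.
PREFIX = r"(\.|/|%2F)"
SUFFIX = r"(/|$|%2F|:)"

def to_privoxy_domain(line: str) -> Optional[str]:
    """Turn one Silent Block domain regex into a Privoxy domain pattern.

    Returns None when the line does not have the assumed shape, so that
    unusual entries can be reviewed by hand instead of being mangled.
    """
    line = line.strip()
    if not (line.startswith(PREFIX) and line.endswith(SUFFIX)):
        return None
    core = line[len(PREFIX):-len(SUFFIX)]
    # Unescape the dots and add the leading dot used by Privoxy domain patterns.
    return "." + core.replace(r"\.", ".")

print(to_privoxy_domain(r"(\.|/|%2F)coremetrics\.com(/|$|%2F|:)"))
```

Lines that come back as None do not fit the assumed shape and are best converted manually.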

—–-

Unzip the attachment and reference the DSzalkowski.action file in Privoxy's config file, before the user.action file.

Enable the Privoxy logfile and review it to fix browsing issues. Clear it daily (for example, with a cron job) so it doesn't grow indefinitely. Otherwise, you would need to stop and re-enable Privoxy logging every time you encounter a browsing issue. You can grep for lines containing Blocked: actions.

Make sure debug 1024 is not commented out in config file. If you want to create (and share) your own rules, logging and monitoring Privoxy requests is useful…
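As an alternative to grep, the review step can be sketched in a few lines of Python. The sample log lines below are hypothetical; the exact layout of your logfile may differ, and the only thing the sketch relies on is the Blocked: marker mentioned above.

```python
def blocked_requests(log_lines):
    """Return the text following 'Blocked:' for every line that contains it."""
    marker = "Blocked:"
    results = []
    for line in log_lines:
        _, found, rest = line.partition(marker)
        if found:
            results.append(rest.strip())
    return results

# Hypothetical log lines, for illustration only.
sample_log = [
    "2015-10-28 10:25:01 Crunch: Blocked: ads.example.com/banner.js",
    "2015-10-28 10:25:02 Request: www.example.org/index.html",
    "2015-10-28 10:25:03 Crunch: Blocked: www.example.org/js/comscore.js",
]

for request in blocked_requests(sample_log):
    print(request)
```

To use it on a real logfile, pass the open file object instead of sample_log; iterating over a file yields its lines.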

Add exceptions under { -block }. Often, you do not need to free the whole domain but the path leading to or hosting the legitimate item.

-–—

Last update: November 2016

-–—

Minuscule donations are always appreciated…
Code:
BTC --> 34WKogWorDoReJ2MSxw8rTsrGD87VMAPJY
BCH --> 1AXwyMdtMFZktZPvXScC58ESUZXptmjvge
DASH -> XusJsETR6PwDnG4Gde7cvGeRhXzUJFSxtD
ETH --> 0xb829FA99AA9AB31C32590dbc88B837bC5D91453e
ETC --> 0x059F128357331c346Ad2E23F95a4639beC3f0b3a
LTC --> MK7vxk93A1M6HHAYT38W8NPJSb8zANqCia
ZEC --> t1JNCuxdZEWUPBQiAzxZPUMqb4BM87sxs9H
DOGE -> DBPAUuCaez4JYGobAn4RHNNhFXwa9u1W6N
STRAT > SgG6jAHuxQfzW1QBaWyQRVdCdSq514BcyM


Attached File(s)
.zip  DSzalkowski.zip (Size: 7.16 KB / Downloads: 626)
Nov. 02, 2015, 10:29 AM (This post was last modified: Dec. 08, 2017 10:18 PM by Faxopita.)
Post: #19
RE: Path Blocking Using Wildcard Characters
Built by ShelbyGT500 from Copfilter forum and based on the work of Neil Van Dyke (Many Thanks)

Exhaustive rule set available from shelby.eurowireless.de/Privoxy_mod/

Get the latest archive and look for file copfilter.action in folder…
Code:
3rd_rules_privoxy/files/

In this file, you will also see…
Code:
# OTHER CONFIGURATION:
#
#     Consider adding something like the following to your "user.action" file:
#
#         { -add-header                                 \
#           +hide-forwarded-for-headers                 \
#           +hide-from-header{block}                    \
#           +hide-referrer{forge}                       \
#           -send-vanilla-wafer                         \
#           -send-wafer                                 \
#           +set-image-blocker{blank}                   }
#         /

In Privoxy 3.0.23, actions -send-vanilla-wafer and -send-wafer are no longer available. Action +hide-forwarded-for-headers must be replaced with +change-x-forwarded-for{block}. Thus, the above configuration should be rewritten as
Code:
{ -add-header                                 \
  +change-x-forwarded-for{block}              \
  +hide-from-header{block}                    \
  +hide-referrer{forge}                       \
  +set-image-blocker{blank}                   \
}
/

The other change I humbly suggest is to replace the slash (/) that begins each path line with /(.*/)? so that the pattern also gets a chance to match further down the path. This is needed because Privoxy matches a path pattern from the start of the path. For example:
Code:
/comscore.js
becomes…
Code:
/(.*/)?comscore.js

The original path will block good.domain.com/comscore.js, but not good.domain.com/sites/all/themes/js/comscore.js; the modified path version will block both.
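The comscore.js case above can be reproduced with a small sketch. It uses Python's re.match as a stand-in for Privoxy's start-anchored path matching, which is an assumption; only the path part of each URL is tested.

```python
import re

ORIGINAL = re.compile(r"/comscore.js")
MODIFIED = re.compile(r"/(.*/)?comscore.js")

# Path parts of the two URLs discussed above.
paths = ["/comscore.js", "/sites/all/themes/js/comscore.js"]

# For each path: (blocked by original rule, blocked by modified rule).
results = {p: (bool(ORIGINAL.match(p)), bool(MODIFIED.match(p))) for p in paths}

for path, (original_hit, modified_hit) in results.items():
    print(path, "original:", original_hit, "modified:", modified_hit)
```

Note that the unescaped dot in comscore.js matches any character; writing comscore\.js would be stricter.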

This is, I believe, the smallest change that can be made. There are a number of other things that could be changed, but I'll leave those to you…

Nov. 02, 2015, 10:40 AM (This post was last modified: Dec. 11, 2015 12:22 PM by Faxopita.)
Post: #20
RE: Path Blocking Using Wildcard Characters
EasyList to Privoxy: for Linux, Windows (more info via cattleyavns) and OS X…
Dec. 09, 2015, 10:37 AM (This post was last modified: Dec. 08, 2017 10:18 PM by Faxopita.)
Post: #21
RE: Path Blocking Using Wildcard Characters
(Nov. 02, 2015 10:29 AM)Faxopita Wrote:  Built by ShelbyGT500 from Copfilter forum and based on the work of Neil Van Dyke (Many Thanks)

Exhaustive rule set available from shelby.eurowireless.de/Privoxy_mod/

Get the latest archive and look for file copfilter.action in folder…
Code:
3rd_rules_privoxy/files/

Xmas Gift! Rule set completely rewritten with updated syntax for better use with Privoxy.



Attached File(s)
.zip  Copfilter.zip (Size: 12.9 KB / Downloads: 631)
Apr. 01, 2016, 10:18 PM (This post was last modified: Oct. 16, 2018 09:12 PM by Faxopita.)
Post: #22
RE: Path Blocking Using Wildcard Characters
The Non-political Correctness Block Rules

The ruleset mainly aims to block “illegitimate” requests to the third-party domains that sit behind, for example, news sites, based on typical path patterns used by the tracking/ad industry. It is useful when your blacklist does not yet contain the tracking domain.

The ruleset is in active use on my own configuration. For “consuming” the web, it's near-ideal. For buying stuff online, be prepared to completely whitelist some shops, or to unblock only the problematic requests by reviewing the latest crunched requests in your logfile.

The ruleset is especially useful if you are really angry, frustrated and furious with the ad/tracking industry. Though not 100% impervious to tracking, it is a real option to consider if you want to protect, for example, your family against insidious tracking. Why? Because the ruleset is “quite” touchy, as you will quickly notice… It's also a good alternative to blocking JavaScript everywhere, because the requests initiated by running JavaScript code will be blocked: OS and browser specs, screen size and resolution, sites you visit, hashes, id, IP, geolocation, web-based cryptomining, etc.

Notes #1:
  • If you use ProxHTTPSProxyMII, open its config file and add payment processors to the SSL Pass-Thru section so that the path blocker does not apply to them. Or, if you prefer, keep Privoxy from viewing HTTPS connections while you pay by temporarily setting the HTTPS proxy port to 8118 instead of 8079.
    Examples:
    Code:
    [SSL Pass-Thru]
    *.arcot.com
    *.sagepay.com
    paymentportal*.exact3ex.co.uk
  • My ruleset is fine for browsing any news site (with, sometimes, broken webpages), but is a nightmare when visiting some commercial sites… You'll need to activate Privoxy log, choose the relevant debug options and then unblock some requests that are, in fact, legitimate for the site to run properly.

—–-
Keep Privoxy's logfile permanently enabled. Clear it daily (for example, with a cron job) so it doesn't grow indefinitely. Make sure debug 1024 is not commented out in Privoxy's config file!

Review the latest blocked requests to fix browsing issues. You can grep for lines containing Blocked: instances. Create an alias to speed up the process…

If you just want to clear Privoxy's logfile content at 8 p.m. daily:
Code:
0 20 * * * echo $(grep "toggle?\(mini=y&\)\?set=\(enable\|disable\)" /private/var/log/privoxy/logfile.log | tail -1) > /private/var/log/privoxy/logfile.log
(if every hour, replace `20` by `*`)

Same job but at reboot time only:
Code:
@reboot    echo $(grep "toggle?\(mini=y&\)\?set=\(enable\|disable\)" /private/var/log/privoxy/logfile.log | tail -1) > /private/var/log/privoxy/logfile.log
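For readers who find the one-liner above hard to parse: it keeps the last logged toggle request (enable or disable) and throws everything else away. The same idea can be sketched in Python. The function name trim_log is mine, and the sketch assumes the logfile is small enough to read into memory.

```python
import re
from pathlib import Path

# Same pattern as in the grep call above, rewritten for Python's re syntax.
TOGGLE = re.compile(r"toggle\?(mini=y&)?set=(enable|disable)")

def trim_log(logfile):
    """Replace the logfile content with its last toggle line.

    If no toggle line is found, the file ends up holding a single empty
    line, which is also what the echo in the cron job produces.
    """
    path = Path(logfile)
    lines = path.read_text(errors="replace").splitlines()
    toggle_lines = [line for line in lines if TOGGLE.search(line)]
    last = toggle_lines[-1] if toggle_lines else ""
    path.write_text(last + "\n")
    return last

# Example call, using the path from the cron job above:
# trim_log("/private/var/log/privoxy/logfile.log")
```

Keeping that one line presumably preserves a record of whether Privoxy was last toggled on or off.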

To list your cron jobs: `crontab -l`.
-–—

Humble opinion about the rule set RefusedPath.action: given how touchy it is, I find it useful to protect against malvertising (have a look at this website and that one, BTW) and to kill questionable request attempts based on their path patterns, including those coming from spam emails. I personally can no longer navigate the web without it. The rule set is the result of a thorough daily analysis of my Privoxy log file, started in Nov. 2014.

Malvertising: malvertising involves triggering a chain of requests; at some point, this ruleset will likely block at least one of them, thus preventing the malware from being downloaded.

Think about it. With my personal configuration, 30 to 35 out of every 100 requests are routinely blocked. It also blocks at least 20,000 request attempts per week, or roughly one million “useless” (analytics, avatars, fonts, stats, widgets) and “illegitimate” (ads and tracking) request attempts annually. Now, what is the total size of the resources that did not get downloaded over that period? Despite these big numbers, I still enjoy a far more than acceptable web browsing experience.

When I read this or that, I definitely want to use my unforgiving set of rules! Oh, by the way, this one is a very good report on today's tracking practices…


Test any visited site with whotracks.me for fun…
Don't forget to read the other posts #18 and #21 as well…


Notes #2:
  • Last update: December 2017.
  • Included: my own exception list, so you won't scratch your head trying to whitelist the sites we both happen to visit: Amazon, YouTube, GitHub, etc.

—–-

Download counter for the July release: 37


Attached File(s)
.zip  VerySuspicious-December-2017.zip (Size: 436.71 KB / Downloads: 520)
Sep. 23, 2016, 04:30 PM (This post was last modified: Mar. 20, 2017 11:04 AM by Faxopita.)
Post: #23
RE: Path Blocking Using Wildcard Characters
Highlighting the Importance of Path Filtering Beyond Pure Domain Blocking

Resorting to filtering tools such as a hosts file, Privoxy, DNS blocking, pfSense, etc. is a good strategy for building domain blacklists. In my case, I combined Privoxy with Unbound, but it isn't enough.

Example: real-world request attempt…

Code:
https://platform2.cloud-iq.com/cartrecovery/?mode=store&session_id=&app_id=1234&basket_timeout=1500&base_campaign_id=14236&email_campaign_id=0008&baseAppId=4620&fingerprint=5346534625&page_title=Which%3F%20Magazine%20Subscription%20%3A%3A%20iSUBSCRiBE.co.uk&page_url=https%3A%2F%2Fwww.isubscribe.co.uk%2FWhich-Magazine-Subscription.cfm&cloudiqReferringURL=false&cloudiq_page_load=true&cloudiq_product_viewed=4357943579405&cloudiq_cart_started=0

Domain .cloud-iq.com was not initially blacklisted in my config, so my domain blacklist did not react. In such a situation, path-side filtering is necessary; thanks to ProxHTTPSProxyMII, parts of my rule set were triggered even though the request was encrypted:
  1. In file: Borrowed/DSzalkowski.action
    Code:
    {+block{Denis Szalkowski} }
    /(.*[^a-z])?campaign(_|/)
    /(.*[^a-z])?fingerprint(=|\.)
    -–—
  2. In file: Borrowed/YoranBrault.action
    Code:
    {+block{Web Beacon} } # Shared by Cattleyavns
    /.{300}
    -–—
  3. In file: Personal/RefusedPath.action
    Code:
    {+block{Declined Paths} }
    /.*(campaign|comm?ercial|marketing|parte?nn?(er|air)|promo|social)
    /.*((resolution|ram.{0,3}MB|screen)=|subscribe|splash.?page|track)
    /.*([^o]ads?.?(bloc?k?|loade?r|manage?r|modul)|fingerprint|=false)
    /.*(advertize|invisible|(e.?mail|slot.?name|url)=|widget|win.?bid)
    /.*((client|request|survey).?id|https?(:|%3A)(\/\/|%2F%2F)|lytics)
    /(.*[^a-z])?(F?www\.)
    -–—
What has been captured: fingerprint, subscribe, campaign, url=, =false, https%3A%2F%2F
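The rules listed above can be replayed against the example request with a short script. This is a sketch, not Privoxy itself: it assumes that Python's re.match with case-insensitive matching is close enough to how Privoxy applies path patterns, and the labels attached to the rules are only there for display.

```python
import re
from urllib.parse import urlsplit

URL = (
    "https://platform2.cloud-iq.com/cartrecovery/?mode=store&session_id=&app_id=1234"
    "&basket_timeout=1500&base_campaign_id=14236&email_campaign_id=0008&baseAppId=4620"
    "&fingerprint=5346534625"
    "&page_title=Which%3F%20Magazine%20Subscription%20%3A%3A%20iSUBSCRiBE.co.uk"
    "&page_url=https%3A%2F%2Fwww.isubscribe.co.uk%2FWhich-Magazine-Subscription.cfm"
    "&cloudiqReferringURL=false&cloudiq_page_load=true"
    "&cloudiq_product_viewed=4357943579405&cloudiq_cart_started=0"
)

# (label, pattern) pairs copied from the three action files quoted above.
RULES = [
    ("Denis Szalkowski", r"/(.*[^a-z])?campaign(_|/)"),
    ("Denis Szalkowski", r"/(.*[^a-z])?fingerprint(=|\.)"),
    ("Web Beacon", r"/.{300}"),
    ("Declined Paths", r"/.*(campaign|comm?ercial|marketing|parte?nn?(er|air)|promo|social)"),
    ("Declined Paths", r"/.*((resolution|ram.{0,3}MB|screen)=|subscribe|splash.?page|track)"),
    ("Declined Paths", r"/.*([^o]ads?.?(bloc?k?|loade?r|manage?r|modul)|fingerprint|=false)"),
    ("Declined Paths", r"/.*(advertize|invisible|(e.?mail|slot.?name|url)=|widget|win.?bid)"),
    ("Declined Paths", r"/.*((client|request|survey).?id|https?(:|%3A)(\/\/|%2F%2F)|lytics)"),
    ("Declined Paths", r"/(.*[^a-z])?(F?www\.)"),
]

def triggered(path):
    """Return the (label, pattern) pairs that match from the start of the path."""
    # Assumption: matching is case-insensitive, hence re.IGNORECASE.
    return [(label, pat) for label, pat in RULES if re.match(pat, path, re.IGNORECASE)]

parts = urlsplit(URL)
request_path = parts.path + "?" + parts.query  # what a path rule gets to see

for label, pattern in triggered(request_path):
    print(label, "->", pattern)
```

Under these assumptions every rule in the list fires on this request, while a plain path such as /index.html triggers none of them.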

Without ProxHTTPSProxyMII, Privoxy would have let the connection to platform2.cloud-iq.com through, simply because it cannot see the path side of an HTTPS URL without the ProxHTTPSProxyMII add-on. Frustrating!

Another example, from a “customer engagement” company:

Code:
https://mxm.mxmfb.com/rsps/m/27kYkShl7ccUi-xNnW2r8s-tlXSPuQesk3J_yE3fldV

Fortunately, I had this domain blacklisted, but another cushion came in on the path side as well:
Code:
{+block{Declined Paths} }
/(.*((/|%2F)|(\?|%3F)))?(([a-z]|\d|_)+[-+]\w*(\d[a-z]|[a-z]\d)[a-z0-9-_+]*|\w*(\d[a-z]|[a-z]\d)\w*[-+][a-z0-9-_+]+)[~=?]*$

You know what you have to do for truly efficient filtering: rely neither on EasyList sources alone nor on domain blacklists alone. Above, none of my EasyList sources were triggered. Go a step further by allowing Privoxy to view secured connections, while keeping a suspicious eye on overly complicated websites and creating your own rule set as a fallback…

Tip: if you wish to bookmark a new site, you can assess it first via this tool.
Apr. 24, 2017, 01:04 PM (This post was last modified: Apr. 24, 2017 09:09 PM by Faxopita.)
Post: #24
RE: Path Blocking with Metacharacters
Website caught trying to track my battery status: makeuseof.com (technology website)

Link:
Code:
jizvehd.makeuseof.com/p?battery%5Bcharging%5D=true&battery%5Bcharging_time%5D=1350&battery%5Blevel%5D=0.81&demensions%5Bwidth%5D=1040&demensions%5Bheight%5D=738&&languages%5B0%5D=en-ca&languages%5B1%5D=de&languages%5B2%5D=nl-UK&languages%5B3%5D=cn&&do_not_track=false&r=121jfOadGJeiouG&page_url=http%3A%2F%2Fmakeuseof.com%2Ftag%2Fant-video-downloader-dead-easy-tool-for-downloading-online-video-firefox-ie%2F&protected=true&

Fortunately, rule set RefusedPath.action came into play to block it with numerous red alerts!
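To make the request above easier to read, its query string can be decoded with Python's standard library. This is only a convenience sketch; the request is copied as posted, including the altered values and the spelling of the parameter names.

```python
from urllib.parse import parse_qs

REQUEST = (
    "jizvehd.makeuseof.com/p?battery%5Bcharging%5D=true"
    "&battery%5Bcharging_time%5D=1350&battery%5Blevel%5D=0.81"
    "&demensions%5Bwidth%5D=1040&demensions%5Bheight%5D=738"
    "&&languages%5B0%5D=en-ca&languages%5B1%5D=de&languages%5B2%5D=nl-UK"
    "&languages%5B3%5D=cn&&do_not_track=false&r=121jfOadGJeiouG"
    "&page_url=http%3A%2F%2Fmakeuseof.com%2Ftag%2Fant-video-downloader-dead-easy-"
    "tool-for-downloading-online-video-firefox-ie%2F&protected=true&"
)

def decode_query(request):
    """Split off the query string and decode it into a dict of value lists."""
    _, _, query = request.partition("?")
    # parse_qs percent-decodes names and values and skips the empty fields
    # left by the doubled and trailing ampersands.
    return parse_qs(query)

params = decode_query(REQUEST)
for name, values in params.items():
    print(name, "=", ", ".join(values))
```

Decoded, the names read battery[charging], battery[charging_time], battery[level] and so on, which makes it plain what the page was trying to report.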

If you have that same rule set loaded, please view them by visiting this link.

Note: I changed the content of the variables…

Info on this tracking technique here.