Other Filters

I've put together a list of filters that, while not as practical as my main filters, can be useful. Some are just too unreliable to be enabled full-time yet, but they may inspire someone to create something better. Others are just ways to mess with the heads of clueless webmasters by sending them all sorts of bogus information. If this teaches even one of them that nothing about the client is guaranteed, and thus prevents another incident like this ComputerHQ fiasco, it will be worthwhile.


Randomize referrer info

Rather than just block the invasive Referer header, why not have some fun with it instead? There are absolutely no restrictions on what you can send here, so webmasters who depend on its value as some sort of security measure are idiots. This filter rotates your referrer to one of several values. You will need to create a blocklist named "Referrers" and associate it with a file like my ReferrerList.txt. It should be pretty obvious from looking at the file how to add or change referrers. I didn't want to overdo it, so the URL match restricts this filter to a site's homepage, i.e. http://host.domain/, which is where it's most likely to be noticed. Thus you can still use the normal Referrer filter within a site to get past silly scripts that depend on its value.

Among the values I chose were the full text of the Gettysburg Address, [email protected], the FBI web site, and a Google search for the phrase "goat pr0n".

In = FALSE
Out = TRUE
Key = "Referer: Randomize referrer info (Out)"
URL = "[^/]++/(^?)"
Match = "$LST(Referrers)"
Replace = "\1"
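
For anyone who doesn't read Proxomitron's matching language, here is roughly the same logic as a Python sketch. Everything named here is an assumption for illustration: the REFERRERS entries are stand-ins for whatever is in your list file, the function names are made up, and picking with random.choice is just one way to "rotate" values. It is not how Proxomitron does it internally.

```python
import random
from urllib.parse import urlsplit

# Hypothetical stand-ins for the entries in a "Referrers" blocklist file.
REFERRERS = [
    "http://www.fbi.gov/",
    "Four score and seven years ago",
]

def is_homepage(url):
    """True only for http://host.domain/ with nothing after the slash."""
    parts = urlsplit(url)
    return bool(parts.netloc) and parts.path in ("", "/") and not parts.query

def outgoing_referer(url, original, rng=random):
    """Send a bogus Referer to homepages; leave every other request alone."""
    if is_homepage(url):
        return rng.choice(REFERRERS)
    return original
```

Requests deeper inside a site keep whatever Referer they already had, which mirrors the point of the URL match above.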

Get rid of Click here!

Another shining beacon of webmaster cluelessness, the phrase "Click here" has found its way into countless pages. It's annoying, it's meaningless to people who don't use a mouse, and it makes the author sound like a bouncing, hyperactive 12-year-old. Imagine picking up a book or magazine that had "Read here!" in bold letters all over the place. With Proxomitron, there's no need to put up with it; this simple filter replaces the phrase with the interface-independent and much less grating "Use this link".

(In case you're wondering why the filter doesn't work on the preceding paragraph, it's because I encoded the h as an HTML character reference.)

Name = "Get rid of Click here!"
Active = TRUE
Limit = 32
Match = "click\shere!+{0,*}"
Replace = "Use this link"
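
The same substitution can be sketched with an ordinary regular expression. This is an approximation, assuming the Proxomitron pattern means "click", some whitespace, "here", and any number of exclamation marks, matched without regard to case; the function name is made up for the example.

```python
import re

# "click", whitespace, "here", then zero or more "!" (case-insensitive).
CLICK_HERE = re.compile(r"click\s+here!*", re.IGNORECASE)

def replace_click_here(html):
    """Swap every "Click here" variant for the interface-independent phrase."""
    return CLICK_HERE.sub("Use this link", html)
```

Note that the pattern only sees literal text, so a phrase with a character encoded as a reference (for example `Click &#104;ere`) passes through untouched.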

Show when Proxomitron is not active

There are a number of filters out there that indicate when Proxomitron has filtered a page. They usually add something obvious to the title or the top of the page. While it's nice to know when Proxomitron is active, I don't really want to be reminded of it on every single page. I've come up with something that is more practical. It shows when Proxomitron is not active instead. The tricky part of this is of course how to insert something into every page without using a filter. Hm, did I mention that user CSS was cool? First add this to your user CSS file:

/* Indicate when Proxomitron is not filtering */
@media screen
{
  body
  {
    border: 10px double black;
  }
}

This puts a black border around every page. Now the following web filter inserts an override into every HTML page to turn the border off again:

Name = "Hide "Not Filtered" indicator"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 32
Match = "(<(head|/title|noframes|body)(\s*|)>)\1"
Replace = "\1\r\n"
          "<style type="text/css" media="screen"><!--\r\n"
          "body { border: none !important; }\r\n"
          "--></style>$STOP()"

So what happens? Every page gets a black border around it from the user CSS statement, except when the filter above triggers and overrides it (this is why we don't use !important in the user CSS file: we want the "author" CSS from the filter to override it). So every page that Proxomitron filters will appear as normal, but anything it misses will still have a visible border. Presto! Instant "Not filtered" indicator!
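
To make the insertion step concrete, here is a rough Python equivalent of what the web filter does to a page. It is a sketch under a few assumptions: the regex is my approximation of the filter's match, the names are invented, and count=1 stands in for $STOP() so the override goes in only once.

```python
import re

# The override block the filter inserts into the page.
OVERRIDE = (
    '<style type="text/css" media="screen"><!--\r\n'
    "body { border: none !important; }\r\n"
    "--></style>"
)

# First <head>, </title>, <noframes> or <body> tag, with or without attributes.
ANCHOR = re.compile(r"<(?:head|/title|noframes|body)(?:\s[^>]*)?>", re.IGNORECASE)

def hide_indicator(html):
    """Insert the border override once, right after the first anchor tag."""
    return ANCHOR.sub(lambda m: m.group(0) + "\r\n" + OVERRIDE, html, count=1)
```

A page that never passes through this step keeps the border from the user CSS, which is the whole trick.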

Kill comment-pair delimited ads

Sometimes it's just too easy. Many pages put their advertising code in easily recognized comment-pair blocks. This filter looks for start and end comments like <!-- begin banner --> and <!-- end banner --> and deletes anything in between. This works amazingly well--provided you're careful about which keywords you target. If you're not, it can just as easily kill huge chunks of the page content, which is why I hesitate to include it among the main filters. It turns out that the sublists used in my version of AdList also make a perfect list of comment keywords to filter on. Get them from the blocklists page and install them in Proxomitron's Config menu. For best results, put this filter earlier in the list than any other ad-killing filters you may have. I have it as my very first active web filter and it wipes out huge blocks of code that slower filters never have to look at.

Name = "Kill comment-pair delimited ads"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "<!--*-->*<!-- end*-->"
Limit = 2048
Match = "<!-- (start|begin) (\w )++{0,3}"
        "(\w&&(($LST(AdPaths)|$LST(AdDomains)|ad))\1*)*"
Replace = "<span class=prox kill=Comment detail=\1></span>"
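
Here is the comment-pair idea as a Python sketch, for illustration only. The keyword list is a hypothetical stand-in for the AdPaths and AdDomains sublists, the regex is a loose approximation of the filter's match (start or begin, up to three other words, then a word starting with a keyword), and the 2048-character cap imitates the Limit setting.

```python
import re

# Hypothetical keywords standing in for the AdPaths/AdDomains sublists.
AD_KEYWORDS = ["banner", "sponsor", "ad"]

def make_ad_block_pattern(keywords, limit=2048):
    """Build a regex spanning an ad 'start' comment through the next 'end' comment."""
    words = "|".join(re.escape(k) for k in keywords)
    return re.compile(
        r"<!--\s*(?:start|begin)\s+(?:\w+\s+){0,3}?(?:" + words + r")[^>]*-->"
        r".{0,%d}?" % limit +
        r"<!--\s*end\b[^>]*-->",
        re.IGNORECASE | re.DOTALL,
    )

def kill_comment_pair_ads(html, keywords=AD_KEYWORDS):
    """Delete every comment-delimited block whose start comment names an ad keyword."""
    return make_ad_block_pattern(keywords).sub("", html)
```

The short "ad" keyword shows the risk mentioned above: it also matches any comment word that merely begins with those two letters, so choose keywords carefully.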

Kill clueless ALT attributes

Even on a cable modem, many websites just load too slowly for my tastes, so I often find myself turning images off. But not surprisingly, many authors simply have no clue how to properly use the alt attribute of images. You end up with gems like top_logo.jpg (48320 bytes). Yeah, that's useful information. I bet visually impaired users are real interested in the number of bytes in an image they can't see. If I really wanted to know the filename of the image I would just look at the source, sheesh. This filter reduces the clutter by removing an alt attribute that looks like an image filename and replacing it with a blank one.

Name = "Kill clueless ALT attributes"
Active = TRUE
Bounds = "<im(g|age)\s*>"
Limit = 256
Match = "\1alt=$AV(*.(jp(e|)g|gif|png)*)\2"
Replace = "\1alt=""\2"
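
A rough Python rendering of the same cleanup follows. It is a sketch with stated limits: the names are invented, and unlike Proxomitron's $AV() it only handles quoted attribute values.

```python
import re

# A quoted alt value that contains something like name.jpg/.jpeg/.gif/.png.
FILENAME_ALT = re.compile(
    r"""\balt\s*=\s*(["'])(?:(?!\1).)*?\.(?:jpe?g|gif|png)\b(?:(?!\1).)*\1""",
    re.IGNORECASE,
)

# Only look inside <img ...> or <image ...> tags, like the filter's Bounds.
IMG_TAG = re.compile(r"<im(?:g|age)\s[^>]*>", re.IGNORECASE)

def blank_filename_alts(html):
    """Replace filename-style alt text with an empty alt attribute."""
    def fix_tag(match):
        return FILENAME_ALT.sub('alt=""', match.group(0))
    return IMG_TAG.sub(fix_tag, html)
```

Alt text that actually describes the image is left alone, since it contains nothing that looks like an image filename.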

Compare AdList and Hosts file

These two filters provide a workaround for an undesirable side effect that occurs if you use Proxomitron along with an ad-blocking Hosts file. If you notice huge slowdowns when loading pages, even while Proxomitron is bypassed, you should give this a try. Proxomitron retries failed connections a number of times before giving up, even if they are to 127.0.0.1. Depending on the browser type (Opera is particularly bad about this) and the layout of the page (e.g., missing height and width attributes on images), the browser may not be able to display anything until Proxomitron gives up, which can take 30 seconds or so.

The workaround is to make sure that every entry in your Hosts file is also blocked within Proxomitron itself, so that it never attempts to make these "hopeless" connections. The standard URL-Killer filter calls a blockfile named AdList, so the idea is to make sure that every Hosts entry is accounted for in AdList. This is difficult to do manually since both files are huge lists with different syntax. The filter set below provides an automated way to find these "leftover" Hosts file entries.

Enable these two and point your browser to http://file..c|/windows/hosts in order to make sure that everything in your Hosts file is blocked by Proxomitron. The displayed Hosts file entries are the ones that are not accounted for in AdList. Add them to the blockfile one by one, or if a group of entries are similar, use a wildcard expression that covers them all. This is a very specialized filter. Don't enable it except for this purpose.

Name = "Compare AdList and Hosts file (part 1)"
Active = FALSE
Bounds = "127.0.0.1[^\n]+\n"
Limit = 100
Match = "127.0.0.1 $AV($LST(AdList)*)*"

Name = "Compare AdList and Hosts file (part 2)"
Active = FALSE
Limit = 256
Match = "\n"
Replace = "<br>"

Hosts file entries for ad blocking usually have the form "127.0.0.1 someadsite.com" (if you use the 0.0.0.0 form, adjust the first filter accordingly). The bounds match carefully reads one line at a time and the matching expression checks it for anything that matches AdList. If it does, the entire line is removed. The second filter merely converts all the newline characters into HTML linebreaks so the file displays neatly as HTML. By the way, you can use the http://file.. syntax to filter any local file through Proxomitron. If you're running Windows NT or 2000, the path to the Hosts file will be different, but everything else should work.
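
If you would rather do the comparison outside the browser, the same idea can be sketched in Python. This is only an illustration: the function name is made up, and shell-style wildcards via fnmatchcase are an assumed stand-in for AdList entries, whose real syntax is Proxomitron's own. It also skips localhost, which the filter pair above would simply display.

```python
from fnmatch import fnmatchcase

def unblocked_hosts(hosts_text, adlist_patterns, sink="127.0.0.1"):
    """Return Hosts file entries that no AdList-style pattern covers."""
    leftovers = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into address and host names.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2 or fields[0] != sink:
            continue
        for host in fields[1:]:
            host = host.lower()
            if host == "localhost":
                continue
            if not any(fnmatchcase(host, pat) for pat in adlist_patterns):
                leftovers.append(host)
    return leftovers
```

Pass sink="0.0.0.0" if your Hosts file uses that form instead.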

Click every ad banner!

If you ever feel guilty about blocking ads, this filter may boost your karma. (I don't. For one thing, it's not really "blocking" anything; your user agent just doesn't go out of its way to fetch linked content you don't want, which is how the web was supposed to work all along. If companies go under trying to make money on something they don't even understand, I consider it evolution in action.) What it does is pretend you had clicked on every banner ad on the page. This should really build an interesting "profile" for the advertisers to mull over. You appear to be eager to buy... well, just about everything!

The filter works by converting the link into an img tag with the original link destination as its source. This causes the browser to fetch that document as if you had clicked on it, only the document never gets displayed since the browser is trying to use the HTML data as an inline image. No scripts or other active content will run. However, it still counts as a hit in the logs. By setting the image's size to 0x0 and giving it a blank alt attribute, you'll never see it. So you can support your favorite site without any inconvenience to yourself. The drawback is that you must disable any other mechanisms that might prevent the request from going out, such as the "URL-Killer" or Hosts file. For this reason, it's not a good idea to enable this filter full-time, but only occasionally for amusement.
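
The link-to-image rewrite described above looks roughly like this in Python. Treat it as a sketch: the AD_HOSTS patterns are hypothetical shell-style wildcards standing in for AdList, the names are invented, and it only handles quoted http:// href values.

```python
import re
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Hypothetical host patterns standing in for AdList entries.
AD_HOSTS = ["ads.*", "*.adserver.example"]

# A whole <a ...>...</a> element with a quoted http:// href.
LINK = re.compile(
    r'<a\s[^>]*?href\s*=\s*(["\'])(http://[^"\'>\s]+)\1[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)

def click_banners(html, ad_hosts=AD_HOSTS):
    """Turn each ad link into an invisible image that fetches the link target."""
    def rewrite(match):
        url = match.group(2)
        host = (urlsplit(url).hostname or "").lower()
        if any(fnmatchcase(host, pat) for pat in ad_hosts):
            return '<img src="%s" alt="" width="0" height="0">' % url
        return match.group(0)
    return LINK.sub(rewrite, html)
```

Links whose host is not on the list come through unchanged.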

Name = "Click every ad banner!"
Active = FALSE
Bounds = "<a\s*</a>"
Limit = 512
Match = "*href=$AV(http://($LST(AdList))\2\3)*"
Replace = "<span class=prox kill=Link detail=\2>"
          "<img src="http://\2\3" alt="" width=0 height=0>"
          "</span>"

This is a cached copy of http://www.geocities.com/u82011729/prox/other.html