Code Cleaners
sidki3003:
These filters do several things, but the main function is to remove certain code outside the body tags.
I'll try to explain some aspects at the bottom of this post.
[Blocklists]
List.Bypass_Ads = "..\Lists\Bypass Ads.txt"
[Patterns]
Name = "Remove: Various Pre-HTML Code"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)(^$LST(Bypass_Ads)|([^/]++.|)google.|216.239.)"
Bounds = "*<(html|body$SET(6=\7))\1*>"
Limit = 12000
Match = "$SET(7="
        "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\r\n"
        "<html lang="en-cockney">\r\n"
        "<span class=prox id=proxalert style=display:inline;>[HTML Tag fixed]</span>\r\n"
        ")"
        "(\#("
        "$NEST(<(script)\2,</script*>)"
        "|"
        "$NEST(<(center)\2>,</center>)"
        "|"
        "$NEST(<(table)\2,</table*>)"
        "|"
        "<(a)\2\s*</a>"
        ")"
        "&$SET(5=\r\n<span class=prox style=display:inline;>[Pre-\1 \2 killed]</span>)"
        ")+\#"
Replace = "\6\@\5$STOP()"

Name = "Remove: Various Pre-Body Code"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)(^$LST(Bypass_Ads))"
Bounds = "<html(^\slang="en-cockney")*>*<body*>"
Limit = 12000
Match = "(\#("
        "$NEST(<(center)\2>,</center>)"
        "|"
        "$NEST(<(table)\2,</table*>)"
        "|"
        "<(a)\2\s*</a>"
        ")"
        "&$SET(5=\r\n<span class=prox style=display:inline;>[Pre-Body \2 killed]</span>)"
        ")+\#"
Replace = "\@\5$STOP()"

Name = "Remove: Various Post-Body Code"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)(^$LST(Bypass_Ads))"
Bounds = "(</(body)\1>(^*<(/|)body)*)|(</(html)\1>(^*<(/|)html)*)"
Limit = 12000
Match = "(\#("
        "$NEST(<(script)\2,</script*>)"
        "|"
        "$NEST(<(center)\2>,</center>)"
        "|"
        "$NEST(<(table)\2,</table*>)"
        ")"
        "&$SET(5=\r\n<span class=prox style=display:inline;>[Post-\1 \2 killed]</span>)"
        ")+\#"
Replace = "\6\@\5$STOP()"

Name = "<end> Mark start"
Active = TRUE
URL = "(^$LST(Bypass_Start-End))"
Limit = 1
Match = "<end>"
Replace = "\r\n</xmp></pre><a name="xdown"></a></html>\r\n"
          "<!-- Start injected Proxomitron filters section -->\r\n"
------------------------ Bypass Ads.txt ------------------------
#dummy
----------------------------------------------------------------
As you can see, "Bypass Ads.txt" isn't needed for the filters to work, but I strongly recommend maintaining such a list and using it with the other ad filters as well.
"Remove: Various Pre-HTML Code":
If there is an <html> tag:
Remove everything within specified tags above <html>.
Don't do this with pages from Google's cache.
If there is no <html> tag:
Remove everything within specified tags above <body>.
Place a proper <!doctype> and <html> tag on top of the code.
The doctype tag makes sense if, like me, you don't use any <start> filters.
I place most such filters just below the <head> tag instead. My approach is here.
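For readers who don't speak Proxomitron's matching language, here is a rough Python sketch of the behaviour described above. It follows the prose description only, not the exact matching rules of the filter: the regular expressions do not handle nested tags the way $NEST does, the Google cache bypass is left out, and the names clean_pre_html, KILL and FIRST_TAG are made up for illustration.

```python
import re

LIMIT = 12000  # mirrors the filter's "Limit = 12000"

# Text placed on top of the page when no <html> tag is found.
DOCTYPE = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">\n'
    '<html lang="en-cockney">\n'
)

# Non-nesting approximation of the tag blocks the filter removes.
KILL = re.compile(
    r"<(script|center|table)\b.*?</\1\s*>|<a\s.*?</a>",
    re.IGNORECASE | re.DOTALL,
)

# The first <html ...> or <body ...> tag decides which case applies.
FIRST_TAG = re.compile(r"<(html|body)\b[^>]*>", re.IGNORECASE)


def clean_pre_html(page: str) -> str:
    """Strip the listed tag blocks that appear above <html> (or <body>)."""
    m = FIRST_TAG.search(page, 0, LIMIT)
    if m is None:
        return page  # neither tag within the limit: leave the page alone
    head = KILL.sub("", page[:m.start()])
    # <body> found before any <html>: add a doctype and <html> tag on top.
    prefix = DOCTYPE if m.group(1).lower() == "body" else ""
    return prefix + head + page[m.start():]
```

Anything inside the document itself is untouched; only the text above the first <html> or <body> tag is cleaned.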
"Remove: Various Pre-Body Code":
If there is (was) an <html> tag:
Remove everything within specified tags between <html> and <body>.
"Remove: Various Post-Body Code":
This one is a bit difficult to explain.
It removes certain code below the body, but first has to check for pages that contain more than one HTML document, so that it only acts after the last closing tag.
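The check may be easier to follow in ordinary code. The Python sketch below is an approximation built from the description and the Bounds line, not a translation of the filter: it looks for a closing </body> (or, failing that, </html>) that is not followed by another tag of the same name within the 12000-byte window, and only then strips script, center and table blocks from what follows. The function names are hypothetical and the regular expressions do not handle nesting.

```python
import re

LIMIT = 12000  # mirrors the filter's "Limit = 12000"

# Non-nesting approximation of the tag blocks the filter removes.
KILL = re.compile(
    r"<(script|center|table)\b.*?</\1\s*>",
    re.IGNORECASE | re.DOTALL,
)


def find_tail_start(page: str, tag: str):
    """Index just past a closing </tag> with no further <tag or </tag
    inside the next LIMIT characters, or None if there is none."""
    close = re.compile(r"</%s\s*>" % tag, re.IGNORECASE)
    again = re.compile(r"</?%s\b" % tag, re.IGNORECASE)
    for m in close.finditer(page):
        window = page[m.end():m.end() + LIMIT]
        if not again.search(window):
            return m.end()
    return None


def clean_post_body(page: str) -> str:
    """Strip the listed tag blocks that follow the last closing tag."""
    start = find_tail_start(page, "body")
    if start is None:
        start = find_tail_start(page, "html")
    if start is None:
        return page
    tail = page[start:start + LIMIT]
    return page[:start] + KILL.sub("", tail) + page[start + LIMIT:]
```

Because the lookahead only sees LIMIT characters, a second document that starts further down than that goes unnoticed, and the first closing tag is wrongly treated as the last one.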
"<end> Mark start":
Close notoriously open tags.
Insert an anchor that allows for bookmarking the end of a page like foo.com/foo.html#xdown .
Mark the beginning of the injected/added <end> filters.
Some notes:
The whole thing is based on the assumption that there is no legitimate use of certain tags outside the body.
That is, *I* haven't seen any such use (the test drive was 10 days), but I'm not certain it doesn't exist.
The filters are designed to remove big chunks of code; in other words, they are aggressive.
"Remove: Various Post-Body Code" can fail if there is an </html> tag and another <html> more than 12000 bytes below.
This set replaces "Remove: Pre-HTML JavaScripts" and "Kill add-on JavaScripts".
Have fun, sidki
Edited by - sidki3003 on 19 Aug 2002 16:55:28
JD5000:
Wow! You've been busy! I'm trying all of the filters you've been posting.
--------
Infopros Joint :: Computer Related Links And Discussion