
Columbus' egg surrounded by comments

afterwards, it seemed so simple.
the reason i wrote a new filter some time ago to kill comment-surrounded ads was to avoid scanning a huge list every time <!-- was found in a web page.
as all of you who followed the previous three threads have read (jor/jd's, mine and sidki's, all posted in the spam blockers section), there are different approaches to killing this kind of nosey stuff. but i have not abandoned my initial idea, so the solution i propose now is really simple: use two different filters, mine and sidki's, working in conjunction.
you will understand the idea better by reading the filters:
Name = "Kill Comments-surrounded Ads [vm]"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 12000
Match = "<!-- (auto|begin|start) $LST(AdComments)"
Replace = "<span style=display:none;>[Killed Comments-surrounded Ads]</span>"
Name = "Remove Comment-Block Ads [sidki]"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 12000
Match = "<!--[^>]++{0,30}$LST(AdCommentPairs)"
Replace = "<span style=display:none;>[Killed Comments-surrounded Ads]</span>"
when an opening <!-- is found, my filter is tried first, but it consults the AdComments.txt list ONLY if auto, begin or start matches; otherwise the list is skipped. this means checking only 3 keywords instead of the huge AdComments.txt.
when a comment not starting with auto, begin or start is found, the AdCommentPairs.txt list is scanned, but the second list contains no comments duplicated from the first, so splitting the entries across two lists dramatically reduces the number of lines to scan.
the only case where both lists are scanned is when a comment starting with auto, begin or start is found and it is not contained in the first list. if you find such a new comment, simply add it to the right list. on the other hand, these two lists, if merged, are equal to the one now used by sidki's filter, so this apparent problem is caused ONLY by the missing entry.
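the dispatch described above can be sketched in python. this is only a rough model, not proxomitron code: the two short lists are hypothetical stand-ins for AdComments.txt and AdCommentPairs.txt, only the opening part of each entry is checked, and the "within 30 characters" test is a crude approximation of the [^>]++{0,30} prefix.

```python
import re

# hypothetical, heavily shortened stand-ins for the two list files
AD_COMMENTS = ["burst", "ban man pro", "zedo"]              # entries that follow auto/begin/start
AD_COMMENT_PAIRS = ["adspace", "fastclick.com", "httpads"]  # all the other entries

# the cheap three-keyword test made before AdComments is consulted
KEYWORD = re.compile(r"\s*(auto|begin|start)\s+")


def lists_scanned(comment_text):
    """Take the text that follows '<!--'; return (killed, lists that were scanned)."""
    text = comment_text.lower()
    scanned = []
    m = KEYWORD.match(text)
    if m:
        scanned.append("AdComments")
        rest = text[m.end():]
        if any(rest.startswith(entry) for entry in AD_COMMENTS):
            return True, scanned
    # reached only when the first filter did not match
    scanned.append("AdCommentPairs")
    # crude stand-in for [^>]++{0,30}: the entry must start within 30 characters
    killed = any(0 <= text.find(entry) <= 30 for entry in AD_COMMENT_PAIRS)
    return killed, scanned
```

in this model, lists_scanned(" BEGIN BURST ") touches only AdComments, lists_scanned(" Adspace ") touches only AdCommentPairs, and lists_scanned(" begin content ") is the one case that walks through both lists.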
also, i use a trick to skip both lists on false matches. to use this trick too, add this line to your ManagedTags.txt list:
<!-- (begin|open|start) (left|right|head|footer|main|menu|javascript) (^ad)$SET(1= )
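as a rough illustration of what the line above matches, here is an approximate python regex. it is a sketch under assumptions, not proxomitron syntax: a proxomitron space is modelled as optional whitespace, (^ad) as a negative lookahead, and matching is assumed to be case-insensitive. the function name is made up for the example.

```python
import re

# approximate translation of:
# <!-- (begin|open|start) (left|right|head|footer|main|menu|javascript) (^ad)
HARMLESS = re.compile(
    r"<!--\s*(begin|open|start)\s*"
    r"(left|right|head|footer|main|menu|javascript)\s*"
    r"(?!\s|ad)",  # next word must not start with "ad"; \s blocks backtracking into the gap
    re.IGNORECASE,
)


def is_layout_comment(html):
    """True when the comment opener looks like page layout rather than an ad block."""
    return HARMLESS.match(html) is not None
```

with this sketch, "<!-- begin left column -->" and "<!-- start menu -->" are recognised as layout comments, while "<!-- begin left ad -->" and "<!-- BEGIN BURST -->" are not and would still reach the ad filters.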
and now here are the lists as they stand at this time (they are more up to date than sidki's one posted in the other thread):
----------------AdComments.txt begin----------------------
# Proxomitron4 URL killfile: $LST(AdComments)
# Created by altosax on July 08, 2002
# Updated on August 03, 2002
#
# List for "Kill Comments-surrounded Ads [vm]" filter.
# It removes ad-blocks surrounded by listed comments.
# To make it safer, add here longer possible comments.
# Also, you need to add both starting and ending
# comments, separated by *.
# AUTO
Banner Insertion Begin * (Auto Banner Insertion)\1 Complete *-->
# BEGIN
468 Ad area * End (468 Ad area)\1 *-->
ADVERT POWER * END (ADVERT POWER)\1 *-->
BAD ASS Advertising * END OF (BAD ASS)\1 RANDOM ADVERTISEMENTS *-->
Ban Man Pro * End (Ban Man Pro)\1 *-->
BURST * END (BURST)\1 *-->
CLICK2NET CODE * END (CLICK2NET CODE)\1 *-->
Crucial advertisement * end (Crucial advertisement)\1 *-->
EXIT CODE * END (EXIT CODE)\1 *-->
Flycast Ad Copyright * End (Flycast Ad)\1 Copyright *-->
ITALIA HYPERBANNER * END (ITALIA HYPERBANNER)\1 *-->
LINKEXCHANGE CODE * END (LINKEXCHANGE CODE)\1 *-->
linswap Code * End (linswap Code)\1 *-->
Linux Waves Banner Exchange * End (Linux Waves)\1 Banner Exchange *-->
MPU * END (MPU)\1 *-->
Nedstat Basic code * End (Nedstat Basic code)\1 *-->
of MAFIA * end of (MAFIA)\1 *-->
of SpyLOG * end of (SpyLOG)\1 *-->
of technojobs ad * end of technojobs ad *-->
of Top100 * end of (Top100)\1 *-->
of TopList * end of (TopList)\1 *-->
PayCounter * End (PayCounter)\1 *-->
PayPal Logo * End (PayPal Logo)\1 *-->
RealHomepageTools * End (RealHomepageTools)\1 *-->
RICH-MEDIA BURST * END (BURST)\1 *-->
SEXCOUNTER ADVANCED CODE * END (SEXCOUNTER)\1 ADVANCED CODE *-->
SexList Counter Code * End (SexList Counter)\1 Code *-->
SEXLIST REFERRER-STATS CODE * END (SEXLIST REFERRER-STATS)\1 CODE *-->
SEXTRACKER CLIT CODE * DONE WITH (SEXTRACKER CLIT CODE)\1 *-->
SEXTRACKER CODE * END (SEXTRACKER)\1 CODE *-->
SITEWISE * END (SITEWISE)\1 *-->
Tracker * End (Tracker)\1 *-->
TT Side CODE * END (TT Side)\1 CODE *-->
TXTAD ROTATE * END (TXTAD ROTATE)\1 *-->
WEBSIDESTORY CODE * END (WEBSIDESTORY)\1 CODE *-->
Web-Stat code * End (Web-Stat)\1 code *-->
ZEDO * end (ZEDO)\1 *-->
: Pop-Up Window * END: (Pop-Up Window)\1 *-->
n Cash 2002 HTML Code * Ende (Cash 2002)\1 HTML Code *-->
ning Advertising nAdvert * End (Advertising nAdvert)\1 *-->
# START
ADCYCLE STANDARD * END (ADCYCLE)\1 CODE *-->
EROTISM HEADER CODE * END (EROTISM HEADER)\1 CODE *-->
EROTISM FOOTER CODE * END (EROTISM FOOTER)\1 CODE *-->
Gamma Entertainment * End (Gamma Entertainment)\1 *-->
of ExtremeDM Code * End of (ExtremeDM Code)\1 *-->
OF GENERIC SITEWISE * END OF (GENERIC SITEWISE)\1 *-->
of NedStat * end of (NedStat)\1 *-->
of Recommend-it Code * End of (Recommend-it)\1 Code *-->
of ReferStat * End of (ReferStat)\1 *-->
of Sex Trail Safe-Code * End of (Sex Trail)\1 Safe-Code *-->
OF SITEWISE * END OF (SITEWISE)\1 *-->
of TheCounter.com * End of (TheCounter.com)\1 *-->
OF WEBTRENDS LIVE * END OF (WEBTRENDS LIVE)\1 *-->
Product-Specific Links * End (Product-Specific)\1 Links *-->
RedMeasure * END (RedMeasure)\1 *-->
# GENERIC AD COMMENTS FOR ANY DOMAIN
(of|[^a-z]|) ad(s|)[^a-z] * end (of|[^a-z]|) (ad(s|)[^a-z])\1 *-->
(of|[^a-z]|) advertis(ing|ements) * end (of|[^a-z]|) (advertis(ing|ements))\1 *-->
(of|[^a-z]|) banner(s|)[^a-z] * end (of|[^a-z]|) (banner(s|)[^a-z])\1 *-->
-------------------AdComments.txt end--------------------------
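to show how one entry of the list above pairs an opening comment with its closing comment, here is the Ban Man Pro entry translated by hand into a python regex. it is a minimal sketch, assuming that proxomitron matches case-insensitively, that a space in the list stands for optional whitespace and that * matches any run of characters; the 12000 byte Limit of the filter is not modelled, and the function name is made up for the example.

```python
import re

# hand translation of the filter prefix plus the "Ban Man Pro" list entry
BLOCK = re.compile(
    r"<!--\s*(?:auto|begin|start)\s*Ban\s*Man\s*Pro"  # opening comment
    r".*?"                                             # the * between the two halves
    r"End\s*Ban\s*Man\s*Pro.*?-->",                    # closing comment
    re.IGNORECASE | re.DOTALL,
)

REPLACEMENT = "<span style=display:none;>[Killed Comments-surrounded Ads]</span>"


def kill_block(html):
    """Replace everything from the opening comment to the closing one."""
    return BLOCK.sub(REPLACEMENT, html)


page = '<p>a</p><!-- Begin Ban Man Pro --><img src="ad.gif"><!-- End Ban Man Pro --><p>b</p>'
cleaned = kill_block(page)
```

here cleaned keeps the two paragraphs and has the hidden span in place of the whole ad block. this is also why both the starting and the ending comment are needed in each entry: without the ending half there is nothing to tell the filter where the block stops.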
and here is the second list:
-----------------AdCommentPairs.txt begin----------------------
# NoAddURL
# Proxomitron4 URL killfile: $LST(AdCommentPairs)
# List for "Remove Comment-Block Ads [sidki]" filter.
# It removes Ad-blocks surrounded by listed comments.
# Keywords by sidki, Jor, JD, altosax
# Created by sidki on July 12, 2002
# This version by altosax. Updated August 03, 2002
###############################################
# Checked And Ordered, No Leading Wildcard Here
# ---------------------------------------------
ACTIVEADV BEGIN BANNER * END (BANNER)\1 *-->
AD BOX BEGINS * (BOX/BAR ENDS)\1 *-->
ADDFREESTATS.COM * END (ADDFREESTATS.COM)\1 *-->
ads begin * (ads)\1 end *-->
Adspace * / (Adspace)\1 *-->
Adv Ins Banner * End (Adv Ins Banner)\1 *-->
AD POSITION * End (AD POSITION)\1 *-->
Banner Ad Cell * / Banner (Ad Cell)\1 *-->
Banner code begin * (Banner code)\1 end *-->
Click.it Mondadori * Fine (Click.it)\1 Mondadori *-->
DoubleClick Bottom Ad BEGIN * (DoubleClick Bottom Ad)\1 END *-->
DoubleClick Javascript BEGIN * (DoubleClick Javascript)\1 END *-->
DoubleClick Top Ad BEGIN * (DoubleClick Top Ad)\1 END *-->
FASTCLICK.COM * (FASTCLICK.COM)\1 *-->
HotLogs * (HotLog)\1s *-->
HTML BANNER AD * / (HTML BANNER)\1 AD *-->
HTTPADS * / (HTTPADS)\1 *-->
Inizio Codice Shinystat * Fine (Codice Shinystat)\1 *-->
KMiNDEXs * (KMiNDEX)\1s *-->
new ad code * end (new ad code)\1 *-->
OSDN Navbar * End (OSDN Navbar)\1 *-->
Pair Promotion Begin * End (Pair Promotion)\1 *-->
PayPopup.com Advertising * (PayPopup.com)\1 Advertising *-->
Rating@Mail.ru COUNTER * <!-- / (COUNTER)\1 -->
Russian LinkExchange code * (Russian LinkExchange)\1 code *-->
SexKey Original code * (SexKey)\1 Original code *-->
SpyLOG * <!-- SpyLOG -->
STATS4ALL_START * (STATS4ALL_END)\1 *-->
TOPCTO begin * (TOPCTO)\1 end *-->
TopList COUNTER * (TopList COUNTER)\1 *-->
TOPLIST * (TOPLIST)\1 END *-->
VC active * (VC active)\1 *-->
WebMeasure start * (WebMeasure)\1 slutt *-->
##############################################################################
# [^>]++{0,30} ==> To Move In The Above Category Or In The AdComments.txt List
# ----------------------------------------------------------------------------
(1st|2nd|3rd|4th|5th|6th) Ad*((1st|2nd|3rd|4th|5th|6th) Ad)\1 *-->
1000stars*(1000stars)\1 (?)++{0,90}-->
123Advertising*(123Advertising)\1 (?)++{0,90}-->
4-F-R-E-E*(4-F-R-E-E)\1 *-->
468X60 AD*(468X60 AD)\1 *-->
AD BANNER*(AD BANNER)\1 *-->
Ad code*(Ad code)\1 *-->
AD TABLE*(AD TABLE)\1 *-->
ad(vertisement|) 468x60*end (ad(vertisement|[^a-z]))\1 *-->
ADCALL*(ADCALL)\1 (?)++{0,90}-->
ADCYCLE.COM*(ADCYCLE.COM)\1 (?)++{0,90}-->
ADDFREESTATS (EASY|NORMAL) CODE*END (ADDFREESTATS)\1 *-->
ADnetz.net Code*(ADnetz.net Code)\1 *-->
AdSolution-Tag*(AdSolution-Tag)\1 *-->
AdultPlex.Com*(AdultPlex.Com)\1 (?)++{0,90}-->
Advert Block*(Advert Block)\1 *-->
advertisement code*(advertisement code)\1 *-->
Advertising.com Banner Code*(Advertising.com)\1 (?)++{0,90}-->
Advertizment Flash*(Advertizment Flash)\1 *-->
Affiliate Code*(Affiliate Code)\1 *-->
affiliate links*(affiliate links)\1 *-->
Amateur Pages Code*(Amateur Pages Code)\1(?)++{0,90}-->
Anonymizer*(Anonymizer)\1 (?)++{0,90}-->
Bananer Ad*(Bananer Ad)\1 *-->
Banner Ad*(Banner Ad)\1 *-->
Banner Exchange Code*(Banner Exchange Code)\1 *-->
# Banner*/ (Banner)\1 *-->
BannerAlto*(BannerAlto)\1 *-->
BarelyLegal Banner*(BarelyLegal Banner)\1(?)++{0,90}-->
BEGIN ADs*END (AD)\1s *-->
begin clickXchange*end (clickXchange)\1(?)++{0,90}-->
Begin HBtrack*End (HBtrack)\1(?)++{0,90}-->
BelStat.be Counter*(BelStat.be Counter)\1(?)++{0,90}-->
BOT AD*(BOT AD)\1 *-->
btpromo*(btpromo)\1 (?)++{0,90}-->
BUTTON ADS*(BUTTON ADS)\1 *-->
CASH COUNT BANNER*(CASH COUNT BANNER)\1 *-->
CibleClick*(CibleClick)\1 (?)++{0,90}-->
Click-Counter*(Click-Counter)\1 *-->
cobranding*(cobranding)\1 (?)++{0,90}-->
Coolerguys advertisement*(advertisement)\1 *-->
Counter Code*(Counter Code)\1 *-->
Counters*END (Counters)\1 *-->
DarkCounter*(DarkCounter)\1 (?)++{0,90}-->
dialerfactory*(dialerfactory)\1(?)++{0,90}-->
DoubleClick ADJ*(DoubleClick ADJ)\1 *-->
dynad*(dynad)\1 (?)++{0,90}-->
eMerite code*(eMerite code)\1 *-->
Extract.Ru banner*(Extract.Ru banner)\1 *-->
focusIN code*(focusIN code)\1 *-->
frameJammer_hp*(frameJammer_hp)\1 *-->
freecom*(freecom)\1 (?)++{0,90}-->
FreepageScript1*(FreepageScript1)\1 *-->
friendplay.com*(friendplay.com)\1 (?)++{0,90}-->
Gallery Host*(Gallery Host)\1 *-->
GeoGuide*(GeoGuide)\1 *-->
HitBox Ads*(HitBox Ads)\1 *-->
Hittrack Tracker*(Hittrack Tracker)\1 *-->
Home Free*(Home Free)\1 *-->
Honor System*(Honor System)\1 *-->
HumanTag Monitor*(HumanTag Monitor)\1 *-->
Hustler Banner*(Hustler Banner)\1 *-->
Impression code*(Impression code)\1 *-->
Impressions-Counter*(Impressions-Counter)\1 *-->
InDepthInfoAd*(InDepthInfoAd)\1 (?)++{0,90}-->
INLIVE CODE*(INLIVE CODE)\1 *-->
INVOEGCODE*(INVOEGCODE)\1 (?)++{0,90}-->
IVWs*(IVWs)\1 (?)++{0,90}-->
Land Banner*(Land Banner)\1 *-->
LIVE WIRE MEDIA CODE*(LIVE WIRE MEDIA CODE)\1(?)++{0,90}-->
MAJOR SPONSORS*(MAJOR SPONSORS)\1 *-->
MILLTO_BAR*(MILLTO_BAR)\1 *-->
Money 4u HTML Code*(Money 4u HTML Code)\1 *-->
NetworXXX*(NetworXXX)\1 (?)++{0,90}-->
NEWS TICKER*(NEWS TICKER)\1 *-->
Newsensations Banner*(Newsensations Banner)\1(?)++{0,90}-->
OAS (AD|TAG|SETUP|function)*(OAS (AD|TAG|SETUP|function))\1 *-->
PIGPORN*(PIGPORN)\1 (?)++{0,90}-->
POPUNDER.COM CODE*(POPUNDER.COM CODE)\1 *-->
popup code*(popup code)\1 *-->
PornTrack JavaScript Code*(PornTrack JavaScript Code)\1 *-->
PROBE CODE*(PROBE CODE)\1 *-->
p?ginas de galeon*(p?ginas de galeon)\1 *-->
rail ad*(rail ad)\1 *-->
RBC counter*(RBC counter)\1 *-->
RealTracker*(RealTracker)\1 (?)++{0,90}-->
RICH MEDIA CODE*(RICH MEDIA CODE)\1 *-->
RmbClick Advertisng*(RmbClick Advertisng)\1 *-->
roadmap code*(roadmap code)\1 *-->
rsct-click-info*(rsct-click-info)\1 *-->
SE Toolbar*(SE Toolbar)\1 *-->
Sex Swap Code*(Sex Swap Code)\1 *-->
Sexlist#*(Sexlist)\1# *-->
SEXSEARCH.COM COUNTER*(SEXSEARCH.COM COUNTER)\1 *-->
SexyAVS.com Code*(SexyAVS.com Code)\1 *-->
side ads*(side ads)\1 *-->
Sitestat4 code*(Sitestat4 code)\1 *-->
skyscraper ad*(skyscraper ad)\1(?)++{0,90}-->
sponcode*(sponcode)\1 (?)++{0,90}-->
sponsor ad*(sponsor ad)\1 *-->
#SPONSOR TABLE*(SPONSOR TABLE)\1(?)++{0,90}-->
sponsors code*(sponsors code)\1 *-->
sponsorship*(sponsorship)\1 (?)++{0,90}-->
technojobs ad*(technojobs ad)\1(?)++{0,90}-->
TELLERCODE*(TELLERCODE)\1 (?)++{0,90}-->
text ad*end (text ad)\1 *-->
THEBANNER.DE Code*(THEBANNER.DE Code)\1 *-->
Topsites BANNER*(Topsites BANNER)\1(?)++{0,90}-->
Totally Pornstars Banner*(Totally Pornstars Banner)\1(?)++{0,90}-->
TOWER Ad code*(TOWER Ad code)\1 *-->
Tracking Code*(Tracking Code)\1 *-->
TRAFFIC IMPRESSION*(TRAFFIC IMPRESSION)\1 *-->
Traffic-Network*(Traffic-Network)\1 *-->
TRAFFICHOME*(TRAFFICHOME)\1 (?)++{0,90}-->
TrafficMarketPlace*(TrafficMarketPlace)\1 (?)++{0,90}-->
web audit counter*(web audit counter)\1(?)++{0,90}-->
webbot bot="HitCounter"*webbot bot="(HitCounter)\1" *-->
WHD Code*(WHD Code)\1 *-->
www.HyperCount.com*(www.HyperCount.com)\1 (?)++{0,90}-->
www.paidbanner.de*(www.paidbanner.de)\1 (?)++{0,90}-->
X-IT CODE*(X-IT CODE)\1 *-->
XXX COUNTER*(XXX COUNTER)\1 *-->
[^a-z]Ad Start*([^a-z]Ad)\1 End *-->
# GENERIC AD COMMENTS FOR ANY DOMAIN
ad(s|)[^a-z] * end (ad(s|)[^a-z])\1 *-->
advertis(ing|ements) * end (advertis(ing|ements))\1 *-->
banner(s|)[^a-z] * end (of|[^a-z]|) (banner(s|)[^a-z])\1 *-->
# USER ADDED COMMENTS
---------------------AdCommentPairs.txt end--------------------
and also this to complete the post (as arne pointed out, updated with the recent tegghead suggestion):
----------------------ManagedTags.txt begin--------------------
# Proxomitron4 URL killfile: $LST(ManagedTags)
# Created by altosax on May 23, 2002
# Updated on July 31, 2002
#
# List for "Tag Manager [vm]" filter.
# Use $SET(1= ) to preserve the content of the tag.
# Use $SET(1=what_do_you_like) to replace the tag.
# Remove $SET(1= ) to kill the tag.
<font\s*>$SET(1= )
<meta\s(\w=$AV(*) )+>
<!doctype*>$SET(1= )
$NEST(<noscript,</noscript>)
$NEST(<h[#1:6],</h?>)$SET(1= )
$NEST(<style,</style>)$SET(1= )
$NEST(<select,</select>)$SET(1= )
$NEST(<textarea,</textarea>)$SET(1= )
<!-- (begin|open|start) (left|right|head|footer|main|menu|javascript) (^ad)$SET(1= )
----------------------ManagedTags.txt end-----------------------
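the three $SET rules in the list header can be pictured with a small python table. this is only an interpretation of those header comments, not the "Tag Manager [vm]" filter itself (which is not shown in this post): the regexes are simplified stand-ins for the list entries, and the blink rule is a made-up example of a replacement.

```python
import re

# (pattern, action): " " keeps the matched tag, None kills it,
# any other string replaces it
RULES = [
    (re.compile(r"<font\b[^>]*>", re.I), " "),     # like <font ...>$SET(1= )
    (re.compile(r"<meta\b[^>]*>", re.I), None),    # like the <meta ...> entry, no $SET
    (re.compile(r"<!doctype[^>]*>", re.I), " "),   # like <!doctype*>$SET(1= )
    (re.compile(r"<blink\b[^>]*>", re.I), "<b>"),  # hypothetical $SET(1=<b>) entry
]


def manage_tag(tag):
    """Return what a single tag becomes under the rules above."""
    for pattern, action in RULES:
        if pattern.fullmatch(tag):
            if action is None:
                return ""      # no $SET: the tag is killed
            if action == " ":
                return tag     # $SET(1= ): the tag is preserved
            return action      # $SET(1=something): the tag is replaced
    return tag                 # tags not listed are left alone
```

so manage_tag('<font size=2>') gives the tag back unchanged, manage_tag('<meta name="x" content="y">') gives an empty string, and manage_tag('<blink>') gives '<b>'.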
sorry for such a long post, but it was necessary to explain my ideas in detail.
<edit>:
comment lists updated to August 03
<edit>:
thanks to sidki for his new comments
regards,
altosax.
Edited by - altosax on 03 Aug 2002 17:22:44
Edited by - altosax on 03 Aug 2002 19:13:18