[Req] Ad filter for AltaVista text-only search results
Mar. 22, 2010, 04:39 AM
Post: #1
Good grief! I've been a forum member since 2004, and I'm just now making my first post! I've just been merrily using Proxomitron daily in its "out-of-the-box" configuration, and only this weekend tried writing some of my own filters. I'm very proud to say I wrote one from scratch that worked perfectly, only to find later (while prowling the forum) somebody had already written and posted one for the same purpose. Not wishing to be publicly humiliated for "not searching the forum first," I have dug all through the forum before posting this request to make sure somebody hasn't already tackled this one.

I want to remove the top "Sponsored Matches" from pages like these:
http://www.altavista.com/web/res_text?it...q=buy+dvds

I've done it for Google, Yahoo, and Hotbot, but AltaVista has got me stumped because the ads aren't within any normal "container." They're just floating in the <BODY> of the page like the desired results are.

These forum pages looked promising:
http://prxbx.com/forums/showthread.php?tid=646
http://prxbx.com/forums/showthread.php?tid=475
but I haven't been able to pull anything together that will do any more than suppress the <DIV></DIV> section that contains the trigger "Sponsored Matches".

God knows I don't really need another ad-free search engine. Now I'm just anxious to learn more about Scott's wonderful language.

Best regards,

Jerry
Mar. 22, 2010, 03:34 PM (This post was last modified: Mar. 22, 2010 03:36 PM by JJoe.)
Post: #2
RE: [Req] Ad filter for AltaVista text-only search results
Try this:

Code:
[Patterns]
Name = "www.altavista.com/ clean (2010.03.22) test"
Active = TRUE
URL = "$TYPE(htm)www.altavista.com/web/"
Limit = 9000
Match = "<div"
        "&"
        "(<(^/)*> )+ sponsored matches"
        "*"
        "?(^(^<(div|table)))"

HTH
Mar. 22, 2010, 08:20 PM
Post: #3
It works perfectly!
Thanks, JJoe, that works perfectly -- although I will admit that, at first glance, I don't know why. I'm going to have to sit down with the documentation and pick my way through it. But I wanted you to know right away that I saw your post and that it worked fine. Thanks.

(I reserve the right to come back later and ask questions about it!)

Jerry
Mar. 22, 2010, 09:53 PM
Post: #4
. . . and I understand it!
Well, it's brilliant! I can see each piece of the code working. Especially the "include everything until you find a <div, but don't include it in the match" section.
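
For anyone who thinks in regular expressions, the "stop just before the next <div" idea can be approximated in Python roughly as follows. This is only a sketch, not a translation of Proxomitron's matcher: the name AD_BLOCK and the sample HTML are made up for illustration (the real AltaVista markup may differ), `[^>]*` is a tighter stand-in for Proxomitron's `*` inside the tag, and nothing here reproduces the Limit = 9000 byte bound.

```python
import re

# Rough Python analogue of the filter's Match pattern (an approximation).
AD_BLOCK = re.compile(
    r"(?=<div)"               # "<div" & ... : the match must begin at a <div tag
    r"(?:<(?!/)[^>]*>\s*)+"   # (<(^/)*> )+  : one or more opening tags, optional whitespace
    r"sponsored\s*matches"    # the trigger text (case-insensitive, like Proxomitron)
    r".*?"                    # *            : swallow whatever follows, as little as possible
    r".(?=<(?:div|table))",   # ?(^(^<(div|table))) : one char, then look ahead for the
                              # next <div or <table without consuming it
    re.IGNORECASE | re.DOTALL,
)

# Hypothetical page fragment: ads float in the body after a "Sponsored Matches" div.
html = (
    "<body>"
    "<div><b>Sponsored Matches</b></div>"
    "<a href='http://ads.example/'>Buy DVDs</a><br>"
    "<div>Real result</div>"
    "</body>"
)

cleaned = AD_BLOCK.sub("", html)
print(cleaned)
```

The final lookahead is what keeps the next `<div` (the first real result) out of the match, so only the ad block is replaced with nothing.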

Now I know everything!

Thanks again.

Jerry