Base: Speeding up ad-list
Feb. 19, 2009, 11:57 PM
Post: #1
Base: Speeding up ad-list
A short explanation of how lists work:
A list file works much like writing (word1|word2|word3|...), but lists tend to grow larger and larger. This post describes how I created an adlist for Proxomitron starting from the well-known EasyList from Adblock Plus. Looking at its keywords, most of them start with "http://", "/" or ".".
Let's try to use this list (after some adaptation to Proxomitron, of course). Suppose the URL to parse is in the variable \1 and is http://www.host.com/sub1/sub2.adbureau.example. If we use $TST(\1=$LST(adlist)*), the list is parsed only once, and it matches only if some keyword matches the beginning of our URL.
One possible alternative would be $TST(\1=*$LST(adlist)*), but it would be really slow. It would look up all the words in the list for:
Code: http://www.host.com/sub1/sub2.adbureau.example
So after some days of research, with a pencil, paper and the log window, I arrived at something usable. Copy this filter to the clipboard and import it, then go to the test window and test it with this code: href=http://www.host.com/sub1/sub2.adbureau.ext
Code: [Patterns]
In gray you will see which parts of the URL are parsed by the adlist. Feel free to post suggestions or comments.
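To make the cost difference concrete, here is a rough Python analogy (not Proxomitron code). It assumes the slow pattern tries the list at every character offset, and that the optimized filter tries it only at the start of the URL and wherever a "/" or "." begins. The keywords and function names are made up for illustration.

```python
# Hypothetical keywords in the style of the adlist (not taken from the real list).
KEYWORDS = [".adbureau.", "/banners/"]

def anchored_match(url, keywords):
    # Analogue of $TST(\1=$LST(adlist)*): the list is tried once, at offset 0.
    return any(url.startswith(k) for k in keywords)

def match_everywhere(url, keywords):
    # Analogue of $TST(\1=*$LST(adlist)*): the list is tried at every offset.
    probes = 0
    for i in range(len(url)):
        probes += 1
        if any(url.startswith(k, i) for k in keywords):
            return True, probes
    return False, probes

def match_at_boundaries(url, keywords):
    # Try the list only at offset 0 and at offsets where a '/' or '.' begins,
    # since most keywords start with "http://", "/" or ".".
    probes = 0
    for i, ch in enumerate(url):
        if i == 0 or ch in "/.":
            probes += 1
            if any(url.startswith(k, i) for k in keywords):
                return True, probes
    return False, probes

if __name__ == "__main__":
    url = "http://www.host.com/sub1/sub2.adbureau.example"
    print(anchored_match(url, KEYWORDS))
    print(match_everywhere(url, KEYWORDS))
    print(match_at_boundaries(url, KEYWORDS))
```

Each function returns whether a keyword matched; the last two also return how many times the list had to be consulted, which is the number the filter is trying to keep small.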
Feb. 20, 2009, 06:34 AM
Post: #2
RE: Base: Speeding up ad-list
I played around for a while and tried to use the + operator to do this:
Code: [Patterns]
The log window output:
Code: http://www.
The output shows the characters consumed so far, while the remaining characters are tested against prxfail or the $LST(adlist). It's just a demonstration, and I know it still needs a lot of improvement.
Feb. 20, 2009, 06:53 AM
Post: #3
RE: Base: Speeding up ad-list
Regarding speed: I don't know if this is of any relevance to the Base Config approach, or even applicable, but I thought I'd share.
When switching to Paul Rupe's 3-list approach (AdHosts, AdDomains, AdPaths), I got a significant speed boost, mainly because the list-invoking expression could be tailored much better to the actual list content. There is a fourth list, "AdList", which acts as a hub for the other lists. I tinkered with it a lot to get it right. Now an (off-domain host testing) entry looks like:
Code: http(s|):\\+/\\+/
Same test with "ftp(s|):\\+/\\+/" and "//". I've never seen an FTP - let alone secure FTP - ad server. That test is only still in there to keep the list hashable (and, to a lesser extent, for completeness).
Most of the above code isn't exactly interesting, but the tailored list invocation expressions work pretty well for me:
AdHosts: No wildcarding needed.
AdDomains: ([^/]++.|)
AdPaths: [^/?]+*[/._?&;=-]
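As a rough illustration of why splitting one list into three can help, here is a Python analogy (not Proxomitron code, and not Paul Rupe's actual lists). Each kind of entry gets the cheapest test that fits it: exact lookup for hosts, suffix lookup for domains, and a scan only for path fragments. The list contents and the function name are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical list contents, for illustration only.
AD_HOSTS = {"ads.example.com"}         # exact host names: one hash lookup
AD_DOMAINS = {"adbureau.example"}      # a domain plus all of its subdomains
AD_PATHS = ("/banners/", "/adframe.")  # fragments that may appear in the path

def is_ad(url):
    """Return the name of the list that matched, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # AdHosts: no wildcarding needed, the whole host is the key.
    if host in AD_HOSTS:
        return "AdHosts"

    # AdDomains: strip leading labels one at a time and look up each suffix.
    labels = host.split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in AD_DOMAINS:
            return "AdDomains"

    # AdPaths: only the path and query need to be scanned.
    path = parts.path + ("?" + parts.query if parts.query else "")
    if any(fragment in path for fragment in AD_PATHS):
        return "AdPaths"
    return None
```

For example, under these made-up lists `is_ad("http://sub2.adbureau.example/")` is decided by the domain lookup without ever scanning the path list.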
Feb. 20, 2009, 05:16 PM
Post: #4
RE: Base: Speeding up ad-list
Thanks guys
Whenever it doesn't work as expected, take a look at the gray words in the log window.
Code: [Patterns]
Sidki, it seems very optimized, but those lists would need our own maintenance, while the EasyList is updated frequently, and I think we wouldn't notice a big difference in speed. I would like to take a look; I searched for the Paul Rupe config set or info about the 3-list approach, but I didn't find anything. If someone has the Paul Rupe config set, it would be nice to share it in the download section of our forum.
Feb. 20, 2009, 05:41 PM
Post: #5
RE: Base: Speeding up ad-list
He never published a complete config set.
There is a copy of the original pages (now gone) already on-site: http://prxbx.com/other/paulrupe/
Relevant section: Blocklists
Feb. 20, 2009, 06:17 PM
Post: #6
RE: Base: Speeding up ad-list
Just a warning: the "WillemList.txt" link in the Blocklists section is NSFW (not safe for work). Seems like some squatters got the domain after it expired.
EDIT: It is now SFW (safe for work)
Feb. 20, 2009, 06:30 PM
Post: #7
RE: Base: Speeding up ad-list
Here it is! http://accs-net.com/smallfish/WillemList.txt
Feb. 20, 2009, 06:32 PM
Post: #8
RE: Base: Speeding up ad-list
Ah, thanks! I'll update that one link on that page. (going to save a local copy)
Feb. 20, 2009, 07:30 PM
Post: #9
RE: Base: Speeding up ad-list
(Feb. 20, 2009 06:17 PM) Kye-U Wrote: Seems like some squatters got the domain after it expired
OT: The "<frameset>: Jump out of invisible Frames" filter in *the other* config set was missing that hijacked page. I've posted an update here.
Feb. 20, 2009, 07:37 PM
Post: #10
RE: Base: Speeding up ad-list
Please update these too:
AdDomainList.txt
AdPathList.txt
AdKeywordList.txt
CommentList.txt
Luckily, I found them here: http://homepage.usask.ca/cgi-bin/cgiwrap...klist.html
Feb. 21, 2009, 04:54 AM
Post: #11
RE: Base: Speeding up ad-list
Updated, thanks
Feb. 21, 2009, 06:31 AM
Post: #12
RE: Base: Speeding up ad-list
This is probably irrelevant to what is being discussed here, but I improved the ad filtering speed of the Banner Blaster filter from Scott's default.cfg file by putting the keywords back into the filter and removing the reference to any external list. Since anchors are still the most common type of ads, and there can be many of them on a page, I found that the filter jumping first to the keyword list and from there into the ad host name list took too long, and slowed down page loading.
Feb. 21, 2009, 03:39 PM
Post: #13
RE: Base: Speeding up ad-list
Siamesecat, I did some tests and it works like the first code posted in this thread. The optimization we are looking for is precisely to parse only certain parts of the full URL. If you go to http://local.ptron/.pinfo/lists/AdKeys you will see that the list is not hashed. His code starts with \w, so it will behave like the first example posted here. Thanks for posting.
Mar. 15, 2009, 07:13 PM
Post: #14
RE: Base: Speeding up ad-list
Since most of the ads on a page usually come from only 4 or 5 sites, I had the idea of creating a list in memory which would hold the last matched keywords from the adlist. So in theory Proxomitron would be faster, because it would search the last matched keywords first instead of the full list. I know Proxomitron does a very good job of hashing the list, but I'm testing this "caching" concept... comments are welcome as always.
Code: $TST(
Edit: The filter is the same as the one I posted in the first place, just the working version, not the forum version. I added a test for Mem-Adlist before the full adlist. When one of the lists matches, it tests whether the matched keyword is in the Mem-Adlist; if it isn't, it adds it to the Mem-Adlist.
Edit: added the $WESC. Thanks, Sidki.
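The caching idea described above can be sketched in Python as follows. This is only an analogy under stated assumptions, not the filter itself: `recent` stands in for Mem-Adlist, `full` for the complete adlist, and the class and method names are invented for the example.

```python
class CachedAdList:
    """Check a small list of recently matched keywords before the full list."""

    def __init__(self, keywords):
        self.full = list(keywords)  # stands in for the full adlist
        self.recent = []            # stands in for Mem-Adlist

    def match(self, text):
        # First pass: keywords that already matched earlier.
        for keyword in self.recent:
            if text.startswith(keyword):
                return keyword, "recent"
        # Second pass: the full list; remember any new hit for next time.
        for keyword in self.full:
            if text.startswith(keyword):
                if keyword not in self.recent:
                    self.recent.append(keyword)
                return keyword, "full"
        return None, None
```

The first time a keyword matches it is found in the full list and remembered; later requests for the same ad site are answered from the short list. Whether this actually beats Proxomitron's own list hashing is exactly the open question in this post, and the sketch does not answer it.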
Mar. 15, 2009, 08:35 PM
Post: #15
RE: Base: Speeding up ad-list
Very inventive idea! I'm curious about your results.
I don't know if that applies to your case, but in the situations where I'm using this $TST / $ADDLST routine, I had to do a $WESC when adding, as well as an end-of-string test, like:
Code: (^$TST((\0\5)=$LST(Mem-ScriptSrc)))$ADDLST(Mem-ScriptSrc,$WESC(\0\5)(^?))
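A loose Python analogy of the two precautions mentioned here, assuming $WESC escapes wildcard characters and (^?) acts as an end-of-string test: `re.escape` plays the role of the escaping and `\Z` the role of the end test. The list and function names are hypothetical.

```python
import re

mem_list = []  # hypothetical stand-in for a Mem-* list; entries are regex strings

def add_entry(literal):
    # Escape metacharacters so "." and "?" in a URL are taken literally,
    # and anchor the end so a longer string does not match by prefix.
    pattern = re.escape(literal) + r"\Z"
    if pattern not in mem_list:  # only add when not already present
        mem_list.append(pattern)

def in_list(text):
    return any(re.match(pattern, text) for pattern in mem_list)
```

Without the escaping, an entry such as "a.js?v=1" would also match "axjs?v=1"; without the end anchor, it would also match "a.js?v=12".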