Feb. 19, 2009, 11:57 PM
A short explanation of how lists work:
A list file works much like writing (word1|word2|word3|...), except that it tends to get much larger. This post describes how I created an adlist for Proxomitron, starting from the well-known EasyList for Adblock Plus.
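As a rough analogy, here is a Python sketch using the `re` module rather than Proxomitron's own matcher. Trying the entries of a list one by one gives the same result as matching the single alternation built from them. The entries are hypothetical plain-text placeholders. Real Proxomitron list entries may also contain matching expressions, which this sketch does not model.

```python
import re

# Hypothetical adlist entries (placeholders, not real EasyList lines).
adlist = ["/ads/", ".adbureau.", "http://ads."]

# The list behaves like the alternation (word1|word2|word3|...).
# re.escape keeps "." literal, since these entries are plain text.
alternation = re.compile("|".join(re.escape(word) for word in adlist))

def list_match(text: str) -> bool:
    """Try each entry in turn at the start of text, like a list lookup."""
    return any(text.startswith(word) for word in adlist)

def alternation_match(text: str) -> bool:
    """The same test written as one alternation anchored at the start."""
    return alternation.match(text) is not None

for sample in ["/ads/banner.gif", ".adbureau.example", "/images/logo.png"]:
    print(sample, list_match(sample), alternation_match(sample))
```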
Looking at its keywords, most of them start with "http://", "/" or ".". Let's try to use this list (after adapting it to Proxomitron, of course):
Suppose the URL to parse is stored in the variable \1 and is http://www.host.com/sub1/sub2.adbureau.example. If we use $TST(\1=$LST(adlist)*), the list is scanned only once, and it matches only if some keyword matches the beginning of our URL.
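Here is a minimal Python sketch of this anchored lookup, using hypothetical list entries. Only an entry that is a prefix of the URL matches, so `.adbureau.` is missed even though it appears later in the URL. `anchored_matches` is an illustrative name and is not part of Proxomitron.

```python
from typing import List

# Hypothetical adlist entries (not real EasyList lines).
adlist = ["http://www.host.com/", ".adbureau.", "/sub2."]
url = "http://www.host.com/sub1/sub2.adbureau.example"

def anchored_matches(url: str, entries: List[str]) -> List[str]:
    """Entries that match at the start of url (the anchored list test)."""
    # One pass over the list; each entry is compared only at position 0.
    return [word for word in entries if url.startswith(word)]

print(anchored_matches(url, adlist))  # ['http://www.host.com/']
```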
One possible alternative would be $TST(\1=*$LST(adlist)*), but it would be extremely slow, because it would try every word in the list against each of these strings:
Code:
http://www.host.com/sub1/sub2.adbureau.example
ttp://www.host.com/sub1/sub2.adbureau.example
tp://www.host.com/sub1/sub2.adbureau.example
p://www.host.com/sub1/sub2.adbureau.example
://www.host.com/sub1/sub2.adbureau.example
//www.host.com/sub1/sub2.adbureau.example
...
ub1/sub2.adbureau.example
...
ample
mple
ple
le
e
So after some days of research with a pencil, some paper and the log window, I arrived at something usable.
Copy this filter to the clipboard, import it, then open the test window and test it with this input:
href=http://www.host.com/sub1/sub2.adbureau.ext
Code:
[Patterns]
Name = "<example> Parsing Adlist Release Candidate {ln}090220"
Active = FALSE
Limit = 256
Match = "href=$AV(\1)"
""
"$LOG(!C\1)$TST(\1=("
"(\w)\3|"
"*((^(http|ftp)://|//)(http|.|/)\w)\3"
")$LOG(W\3)$TST(\3="
"(.|/|)(\w)\9$LOG(w\9)prxfail"
"))"In gray you will see which parts of the url will be parsed by the adlist.
Feel free to post suggestions or comments.


