Html zones
Sep. 28, 2008, 11:23 PM (This post was last modified: Sep. 28, 2008 11:30 PM by lnminente.)
Post: #1
I noticed that with only $TYPE(htm) as the URL condition, Proxomitron was very slow. The reason was that Proxomitron was also searching files with an extension other than htm, such as php files or JavaScript files served without a .js extension.

Looking for a speed improvement, I decided to divide my filters into groups depending on whether they match before <html>, between <html> and </html>, or after </html>. This is a work in progress, so I'm posting them here for suggestions. Here are the filters that define the zones:

Code:
[Patterns]
Name = "############## DEFINE PARTS OF HTML ###########################################"
Active = FALSE
URL = "^?"
Limit = 256
Match = "<never>"
Replace = "Be careful here. When you comment a line, don't use strange characters like - inside."
          "The <start> filters break some websites, so we will use </head>"

Name = "/¯¯¯¯¯Begin HTML file¯¯¯¯¯ [1:V_PreHtml] {ln}080929"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 50
Match = "<start>"
Replace = "$SET(V_PreHtml=1)$SET(V_Html=)"
          "$STOP()"

Name = "|   /¯¯¯<html>¯¯¯¯¯SaveFrom[1:V_Html; ?&0:V_PreHtml] {ln}081010"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<,>)"
Limit = 300
Match = "(<(!DOCTYPE |)html*>)\1"
        "$TST(V_PreHtml=1)"
Replace = "$SET(V_PreHtml=)$SET(V_Html=1)"
          "<! Saved the $DTM(d) from \u > \r\n"
          "\1$STOP()"

Name = "|      /¯¯¯head [1:V_Head] {ln}081209"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<,>)"
Limit = 150
Match = "((<head*>)\1|($NEST(<link,*rel="stylesheet"*,/>))\2|(</head>)\2)"
        "(^$TST(Inside_Script=1))"
        "($TST(V_Html=1)|$SET(V_PreHtml=1))"
        "$SET(V_Head=1)"
Replace = "\1"
          "<!-- PROXOMITRON_STARTFILTERS_WILL_BE_PLACED_HERE -->"
          "\2$STOP()"

Name = "|      /¯¯¯<body> [0:V_PreHtml,V_Head; 1:V_Html,V_Postbody] {ln}081216"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<body,>)"
Limit = 1000
Match = "(<body*>)\1"
        "(^$TST(Inside_Script=1))"
        "($TST(V_PreHtml=1)|$TST(V_Html=1))"
        "$SET(V_Head=)"
Replace = "$SET(V_PreHtml=)$SET(V_Html=1)$SET(V_PostBody=1)"
          "\1<!-- PROXOMITRON_BODY -->\n"
          "$STOP(Comments=V_PreHtml instead of V_Html because <html> is optional)"

Name = "|   \___</html>_____ [1:V_PostHtml; 0:V_Html] {ln}080930b"
Active = TRUE
Multi = TRUE
Limit = 100
Match = "(</html>)\1"
        "(^$TST(Inside_Script=1))"
        "$TST(V_Html=1)"
Replace = "$SET(V_Html=)$SET(V_PostHtml=1)"
          "\1\n"
          "<!-- PROXOMITRON_ENDFILTERS_WILL_BE_PLACED_HERE -->\n"
          "$STOP()"

Name = "\_____End HTML file_______________ {ln}081004"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 50
Match = "<end>"
Replace = "$SET(V_PostHtml=)$SET(V_Html=)$SET(V_PostBody=)$SET(Comments=)"
          "$STOP()"

Name = "."
Active = FALSE
URL = "^?"
Limit = 256
Match = "<never>"

Name = "> --- < START filters {ln}081216"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<!--,-->)"
Limit = 60
Match = "<!-- PROXOMITRON_STARTFILTERS_WILL_BE_PLACED_HERE -->"
Replace = "\1"
          "<!-- ____________________________________ -->\r\n"
          ""
          "<!-- Inserting Style Sheet -->\r\n"
          "<link rel="stylesheet" type="text/css" href="http://Local.ptron/Proxomitron.css" />\r\n"
          ""
          "<!-- Inserting javascript -->\r\n"
          "<script type="text/javascript" src="http://local.ptron/base_start.js"></script>\r\n"
          ""
          "<!-- ____________________________________ -->\r\n"
          "\2"
          "$STOP(Using strange symbols here could break some sites, so be careful. Do not use multi if possible)"

Name = "> --- < END filters {ln}081022"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<!--,-->)"
Limit = 55
Match = "<!-- PROXOMITRON_ENDFILTERS_WILL_BE_PLACED_HERE -->"
Replace = "<!-- _____________ Con: $DTM(c) _____________ -->\r\n"
          "<!-- Inserting javascript -->\r\n"
          "<script type="text/javascript" src="http://local.ptron/base_end.js"></script>\r\n"
          "<!-- _____________ _____________ -->\r\n"
          "$STOP()"

Name = "Defining [Inside_Script/Link] {ln}081124 (Read!)"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 256
Match = "< ("
        "script(\s|>)$SET(Inside_Script=1)|"
        "/script >$SET(Inside_Script=)|"
        "a\s$SET(Inside_Link=1)|"
        "/a >$SET(Inside_Link=)"
        ")PrxFail$TST()"
Replace = "Place this filter always at the end of the list."

What each filter does is now explained in its title. In the brackets, 0: lists the variables the filter clears and 1: lists the variables it sets.
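To see how these flags move through a page, it can help to simulate them. The Python below is a hypothetical model written only for illustration, not Proxomitron code: it walks the tags in order and mirrors the $SET/$TST/$STOP logic of the zone filters in simplified form. Only <html>, <head>, <body>, </html> and <script> are modeled; the <!DOCTYPE and stylesheet-link variants of the real filters are left out, and the names zones, TAG_RE and page are made up.

```python
import re
from typing import List, Tuple

# Hypothetical model of the zone filters, not Proxomitron code.
# Simplifications: only <html>, <head>, <body>, </html> and <script> are
# modeled; the <!DOCTYPE and stylesheet-link variants are ignored.
TAG_RE = re.compile(r"<(/?)([a-zA-Z]+)[^>]*>")


def zones(html: str) -> List[Tuple[str, List[str]]]:
    """Return (tag, variables that are set when the tag is reached)."""
    # "Begin HTML file": $SET(V_PreHtml=1)$SET(V_Html=)
    state = {"V_PreHtml": True, "V_Html": False, "V_Head": False,
             "V_PostBody": False, "V_PostHtml": False,
             "Inside_Script": False}
    fired = set()  # every zone filter ends in $STOP(), so it fires once
    result = []
    for m in TAG_RE.finditer(html):
        tag = m.group(1) + m.group(2).lower()  # e.g. "body" or "/html"
        result.append((tag, sorted(k for k, on in state.items() if on)))

        if tag in ("script", "/script"):
            # "Defining [Inside_Script/Link]"
            state["Inside_Script"] = tag == "script"
            continue
        if state["Inside_Script"] or tag in fired:
            continue  # (^$TST(Inside_Script=1)), or filter already stopped
        if tag == "html" and state["V_PreHtml"]:
            state.update(V_PreHtml=False, V_Html=True)
        elif tag == "head":
            state["V_Head"] = True
        elif tag == "body" and (state["V_PreHtml"] or state["V_Html"]):
            state.update(V_PreHtml=False, V_Html=True, V_Head=False,
                         V_PostBody=True)
        elif tag == "/html" and state["V_Html"]:
            state.update(V_Html=False, V_PostHtml=True)
        else:
            continue
        fired.add(tag)
    return result


page = ("<!DOCTYPE html><html><head><script>var s = '</html>';</script>"
        "</head><body><p>hi</p></body></html><script>x()</script>")
for tag, flags in zones(page):
    print(f"{tag:8} {flags}")
```

In this sample, the </html> inside the script string is reached while Inside_Script is set, so it does not close the V_Html zone; only the real </html> switches to V_PostHtml, and the trailing script lands in the post-html zone.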

_____________________________________________

After that, I made two big groups of filters:
-filters that match before <html> or after </html>: each filter added to this group needs $TST(V_PreHtml=1) or $TST(V_PostHtml=1) in its match

-filters that match between <html> and </html>: these get $TST(V_Html=1)

To avoid making your filters slower, it's better if a filter can fail before it reaches the $TST, because $TST() is slow. So don't put the $TST() at the beginning: put it later, and not right after an asterisk.
If your filter is not very complex, you can put the test at the very end. But if the filter is more complex and searches in lists, it may be better to put the $TST before the $LST.
You can read about it in the help:
http://www.proxomitron.info/45/help/Matc...s.html#TST
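To make the ordering advice more concrete, here is a toy cost model in Python. It is an assumption for illustration, not how Proxomitron works internally: a filter is treated as a chain of checks that stops at the first failure, and the sketch counts how often each check runs. The names literal, tst and lst are hypothetical stand-ins for the literal part of a match (here src=), $TST(V_PostBody=1) and $LST(ad-sites).

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Toy cost model (hypothetical): a filter is an ordered chain of checks that
# stops at the first failure, and every check records that it ran.


def make_checks(calls: Counter, zone_on: bool,
                ad_sites: List[str]) -> Dict[str, Callable[[str], bool]]:
    def literal(s: str) -> bool:   # cheap literal part, e.g. src=
        calls["literal"] += 1
        return s.startswith("src=")

    def tst(s: str) -> bool:       # stand-in for $TST(V_PostBody=1)
        calls["tst"] += 1
        return zone_on

    def lst(s: str) -> bool:       # stand-in for $LST(ad-sites)
        calls["lst"] += 1
        return any(site in s for site in ad_sites)

    return {"literal": literal, "tst": tst, "lst": lst}


def count_calls(positions: List[str], order: List[str], zone_on: bool,
                ad_sites: List[str]) -> Tuple[int, Counter]:
    """Try the filter at every position and return (matches, call counts).

    all() short-circuits, so later checks only run if earlier ones passed.
    """
    calls: Counter = Counter()
    checks = make_checks(calls, zone_on, ad_sites)
    matches = sum(all(checks[name](s) for name in order) for s in positions)
    return matches, calls


POSITIONS = ["<p>", "src=http://ads.example/x.js", "text",
             "src=/img.png", "class=a"]

for order in (["tst", "literal", "lst"], ["literal", "tst", "lst"]):
    matches, calls = count_calls(POSITIONS, order, zone_on=False,
                                 ad_sites=["ads.example"])
    print(order, matches, dict(calls))
```

With the zone off, putting tst first makes it run at all 5 positions, while putting it after the literal makes it run only at the 2 positions where src= already matched. In both orders the list is never scanned, because the test sits before it.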

I've been testing this for a while and it gives me very good results. If you use these filters, remember to remove the zone test when you use the test window, or the filter will never match there :/
I don't know whether sidki already uses this in his filters. If so, please tell me.

Hope you like them ;)
For testing purposes, here are my filters for offsite scripts and noscripts placed in the pre-html and post-html zones.

Code:
[Patterns]
Name = "############## (PRE|POST)HTML FILTERS (read)###################################"
Active = FALSE
URL = "^?"
Limit = 256
Match = "The filters in this section need to test true for at least one of the following global variables:"
        "V_PreHtml"
        "V_PostHtml "

Name = "<Pre|Post Script1> Prefer Noscripts than scripts <multi>081015"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<script,</script>) <noscript*>"
Limit = 9000
Match = "<script"
        "(($TST(V_PreHtml=1)$SET(5=Pre-))|($TST(V_PostHtml=1)$SET(5=Post-)))"
        "*<noscript*>"
        "$SET(DELETE_NEXT_END=noscript)"
Replace = "<center><span class=prox style=display:none; title="\5Script changed by its NoScript">[$]</span></center>"

Name = "<Pre|Post Script2 offsite> Kill {ln}081009"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "$NEST(<script,</script>)"
Limit = 4000
Match = "("
        "($NEST(<(script)\3,*src=$AVQ("(*.js)\2")*,</script>)$SET(4=<a href=\2>[\5.js]</a>)"
        "$TST(\2=($AV((http|ftp)(s|)://(^\h)*))))"
        ")"
        "(($TST(V_PreHtml=1)$SET(5=pre-))|($TST(V_PostHtml=1)$SET(5=post-)))"
Replace = "<span class=prox2 style=display:none; title="\5html \3 killed \2">\4</span>"

You can test them on GeoCities, for example on sidki's site:
http://www.geocities.com/sidki3003/prox-news.html

Note: I need to improve the offsite test, because it fails on some other sites...
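The offsite test in the second filter compares the script's src against \h, the host of the current page. A plain host comparison can be tripped up by scripts served from a www-less host, from a subdomain, or through a relative or protocol-relative URL. The Python below only sketches that comparison logic under those assumptions; it is not a Proxomitron filter, the names is_offsite and _site are hypothetical, and treating subdomains as onsite is a design choice of this sketch.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical sketch of an offsite test in Python, not a Proxomitron
# filter. Assumptions: relative and protocol-relative src values are
# resolved against the page URL first, a leading "www." is ignored, and
# subdomains of the page's host count as onsite.


def _site(host: str) -> str:
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def is_offsite(src: str, page_url: str) -> bool:
    """True if `src` points to a host other than the page's own host."""
    target = urlsplit(urljoin(page_url, src))
    if target.scheme not in ("http", "https", "ftp"):
        return False  # e.g. javascript: or data: URLs
    page = _site(urlsplit(page_url).hostname or "")
    host = _site(target.hostname or "")
    return not (host == page or host.endswith("." + page))


PAGE = "http://www.geocities.com/sidki3003/prox-news.html"
for src in ("/js/menu.js", "//geocities.com/x.js",
            "http://us.geocities.com/x.js",
            "http://www.google-analytics.com/urchin.js"):
    print(src, is_offsite(src, PAGE))
```

Only the last (made-up) src is reported as offsite; the relative, protocol-relative and subdomain URLs are all treated as belonging to the page.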
Sep. 29, 2008, 11:19 PM
Post: #2
RE: Html zones
very interesting...

love the "zone" approach, sounds very promising...
Sep. 30, 2008, 09:53 AM
Post: #3
RE: Html zones
At the last minute before publishing these filters, I introduced a bug by adding bounds to the third filter, which broke the check that prevents matching </html> inside scripts. I've now cleaned up the code and fixed the bug.

To ProxRocks: Many thanks. It really makes my config faster, especially for filters that use big lists to look for ads ;)
Sep. 30, 2008, 10:24 AM
Post: #4
RE: Html zones
kind of curious, any interest in "going public"?
Sep. 30, 2008, 11:20 AM
Post: #5
RE: Html zones
I've been building my own config set from scratch for the last two months, so it may be too soon. It works very well for me, but it still needs more work. And yes, I would like to share my config set so it can be improved, but first there are a few complex things I want to make better:

-More speed in lists (I have a few tips to make them faster)
-Better matching code for offsite.
-Add toggle and experiment with it.
-Experiment more with global variables; I only learned about them in Proxomitron 10 days ago :/

My goal is to keep it simple but effective, and easy to understand and maintain.
To do that, I try not to modify scripts more than needed, and for the moment I use NoScript to block or temporarily allow third-party scripts.

Anyway, going public would take a lot of work, and to be honest, summer is over, so now I will improve my config slowly in my little free time :/
Sep. 30, 2008, 01:03 PM
Post: #6
RE: Html zones
cool!
Sep. 30, 2008, 03:04 PM
Post: #7
RE: Html zones
I edited the first post to explain how to add the $TST to your own filters.

To ProxRocks: Many, many thanks for your interest, but I must say again that my config set is SIMPLE, in capital letters and underlined ;)
Sep. 30, 2008, 04:22 PM
Post: #8
RE: Html zones
(Sep. 30, 2008 03:04 PM) lnminente Wrote: To proxrocks: Many many thanks for your interest, but i must say again my config set is SIMPLE, with capital letters and underlined ;)

that would make it HIGHLY ADVANTAGEOUS...
many "fly-bys" take one look at Proxo, find it too "complex" with too steep a learning curve, and then off to the Recycle Bin it goes...
Sep. 30, 2008, 07:44 PM
Post: #9
RE: Html zones
That's true! :)
Oct. 04, 2008, 10:12 AM
Post: #10
RE: Html zones
(Sep. 28, 2008 11:23 PM)lnminente Wrote:  I noticed that using only $TYPE(htm) i had proxomitron very slow, the reason was proxomitron was searching in other files with a extension different to htm, maybe php or javascript files without .js extension.
I am a little confused.

Do you mean you want to exclude some pages that are included by $TYPE(htm)? Could you please give some examples?
Oct. 04, 2008, 01:47 PM (This post was last modified: Oct. 04, 2008 02:05 PM by lnminente.)
Post: #11
RE: Html zones
Yes: AJAX applications, and social networks like Facebook, which use many .js and .php files. Depending on the filters you use and the size of your lists, you may or may not notice a big slowdown.

This gives your complex filters more speed, and makes filters that you know can only match in one "zone" safer, so they are less likely to break pages.

Here is an example of zones: http://es.yahoo.com/
You can see there is a lot of code before <body>. Say we want to check all the web addresses found after src or target.
The filter would look like this:
(src|target)=$AV(*($LST(ad-sites))\9*)
This simple filter could be really slow if ad-sites is a big list. But if we don't want to touch text inside scripts, and we are pretty sure it must match after the <body> tag, then instead of writing another filter to preserve scripts, or a more complex filter, we could use this:
(src|target)=(^$TST(Inside_Script=1))$TST(V_PostBody=1)$AV(*($LST(ad-sites))\9*)

It is still a simple filter, but now it is safer and faster. I measured the CPU usage with the CPU history graph in Process Explorer from Sysinternals.
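The effect of the two guards can be sketched in Python. This is a hypothetical model, not the actual filter: find_ads, SAMPLE_HTML and ads.example are made-up names, ad_sites is a plain list checked with a substring test instead of a Proxomitron $LST list, and tags are found with a simple regex. It counts how many list lookups run with and without the Inside_Script/V_PostBody guard.

```python
import re
from typing import List, Tuple

# Hypothetical model of the guarded ad filter, not Proxomitron code.
TAG_RE = re.compile(r"<(/?)(\w+)([^>]*)>")
ATTR_RE = re.compile(r"""\b(?:src|target)\s*=\s*["']?([^"'\s>]+)""", re.I)


def find_ads(html: str, ad_sites: List[str],
             guarded: bool = True) -> Tuple[List[str], int]:
    """Return (matching URLs, number of list lookups performed)."""
    inside_script = post_body = False
    hits: List[str] = []
    lookups = 0
    for m in TAG_RE.finditer(html):
        closing, name, attrs = m.group(1) == "/", m.group(2).lower(), m.group(3)
        # Update the state first, assuming the zone variables are already
        # set when the src= inside the same tag is scanned.
        if name == "script":
            inside_script = not closing
        elif name == "body" and not closing and not inside_script:
            post_body = True
        # Stand-in for (^$TST(Inside_Script=1))$TST(V_PostBody=1)
        if guarded and (inside_script or not post_body):
            continue
        for url in ATTR_RE.findall(attrs):  # (src|target)=$AV(...)
            lookups += 1                    # one scan of the ad list
            if any(site in url for site in ad_sites):
                hits.append(url)
    return hits, lookups


SAMPLE_HTML = (
    '<html><head><script src="http://ads.example/h.js"></script></head>'
    '<body><script>document.write(\'<img src="http://ads.example/w.gif">\')'
    '</script><img src="http://ads.example/banner.gif">'
    '<a href="/x" target="_blank">x</a></body></html>'
)

print(find_ads(SAMPLE_HTML, ["ads.example"], guarded=False))
print(find_ads(SAMPLE_HTML, ["ads.example"], guarded=True))
```

On this sample, the unguarded version does 4 lookups and also flags the script in the head and the URL inside document.write; the guarded version does 2 lookups and only reports the banner image after <body>.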
I updated the first post with new filters that do the following:
-Define a new zone, with the variable V_PostBody, from <body> to the end
-Define variables to test whether you are inside a link or inside a script. I introduced these to make the rest of my filters safer and simpler.
Oct. 05, 2008, 02:43 AM
Post: #12
RE: Html zones
Got it! Sidki's config set is using such techniques too.
Oct. 06, 2008, 12:59 AM
Post: #13
RE: Html zones
Really?! Wow, that's good to know.
But could you please post the similarities you have found here? That way I could make my filters compatible with sidki's. Thanks in advance.
Oct. 06, 2008, 01:40 AM
Post: #14
RE: Html zones
Please check sidki-etc\Global_Vars.txt for details.
Oct. 06, 2008, 08:41 PM
Post: #15
RE: Html zones
Mmm!! That's really interesting!
Many thanks, Whenever. I saw he uses three variables to do something similar: mHtml, mHead and mBody. Sidki is incredible! A big hello if you're reading this, my old friend ;)

I don't know how he uses these vars, but I saw a filter which will help me a lot: "Bottom Add: Display Variables 7.08.28 (!nn) [sd] (d.1 l.5)"

Two vars caught my attention, uHost and uDom; I will look into them for my own blocking lists.

Thanks again, Whenever. Some time ago, before I knew about global vars, I took a look at sidki's filters and understood nothing, so after trying them and breaking some of my sites I gave up, and that's why I never saw this txt :/