The Un-Official Proxomitron Forum
“Remove: Ad Scripts - Noscript” behaving



“Remove: Ad Scripts - Noscript” behaving - whenever - Oct. 27, 2010 04:09 PM

This concerns the filter "<script> Remove: Ad Scripts - Noscript 10.10.16 (multi) [sd] (d.2 l.3)".

Test page: http://www.mobileread.com/forums/showthread.php?t=49481

The following code was matched:

Code:
<script type="text/javascript">
<!--
    // Main vBulletin Javascript Initialization
    vBulletin_init();
//-->
</script>

<script type="text/javascript" src="http://edge.quantserve.com/quant.js"></script>
<script type="text/javascript">_qacct="p-13lWtv_C89CCg";quantserve();</script>
<noscript>
<a href="http://www.quantcast.com/p-13lWtv_C89CCg" target="_blank"><img src="http://pixel.quantserve.com/pixel/p-13lWtv_C89CCg.gif" style="display: none;" border="0" height="1" width="1" alt="Quantcast"/></a>
</noscript>

The keyword that triggered the match, "quantserve", is actually inside the script block above the noscript block:

Code:
<!-- PROX: Script removed - Noscript Ad: AdD quantserve -->

Is this the intended behaviour of the filter: using a keyword inside a script block to remove a noscript block and all the script blocks above it?

vBulletin_init(); is needed for the "Thread Tools" drop-down menu to work, so I have to add the entry below, but that leaves the noscript code unblocked:

Code:
www.mobileread.com/forums/            $SET(0=i_noscr:2.)



RE: “Remove: Ad Scripts - Noscript” behaving - JJoe - Oct. 27, 2010 06:21 PM

(Oct. 27, 2010 04:09 PM)whenever Wrote:  Is this the intended behaviour of the filter: using a keyword inside a script block to remove a noscript block and all the script blocks above it?

No, I don't think so. The filter uses a match in the noscript block to remove the noscript block and the chain of script blocks that precede it.
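
A rough Python sketch of that behaviour may make it easier to follow. It is not Proxomitron syntax and not sidki's actual logic: `AD_KEYWORDS` is a made-up stand-in for the set's lists, and the regular expressions are simplified assumptions.

```python
import re

# Hypothetical stand-in for the config set's ad keyword lists.
AD_KEYWORDS = ("quantserve", "quantcast")

# A single block; the lookahead keeps one match from running past its own closing tag.
SCRIPT = r"<script\b(?:(?!</script).)*</script\s*>"
NOSCRIPT = r"<noscript\b(?:(?!</noscript).)*</noscript\s*>"

# A chain of script blocks separated only by whitespace, then a noscript block.
CHAIN = re.compile(
    r"(?P<scripts>(?:" + SCRIPT + r"\s*)+)(?P<noscript>" + NOSCRIPT + r")",
    re.IGNORECASE | re.DOTALL,
)


def remove_chain(html: str) -> str:
    """Drop the whole script chain plus the noscript block when the
    noscript block (and only the noscript block) contains an ad keyword."""
    def decide(match: re.Match) -> str:
        body = match.group("noscript").lower()
        if any(keyword in body for keyword in AD_KEYWORDS):
            return "<!-- removed: script chain + noscript -->"
        return match.group(0)

    return CHAIN.sub(decide, html)
```

Run against the code from the first post, this sketch also removes the vBulletin_init() block, because that block is part of the whitespace-separated chain in front of the noscript block.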

(Oct. 27, 2010 04:09 PM)whenever Wrote:  vBulletin_init(); is needed for the "Thread Tools" drop-down menu to work, so I have to add the entry below, but that leaves the noscript code unblocked:

Code:
www.mobileread.com/forums/            $SET(0=i_noscr:2.)

Same problem as in http://prxbx.com/forums/showthread.php?tid=1688#pid14978 , iirc.
An exception entry would leave the noscript web bug in place.

I've been pondering the possibilities. Ideas?

As proof, the following filter prevents the list match for "<script> Remove: Ad Scripts - Noscript" just by changing "quant" to "quan" in the noscript block.

Code:
[Patterns]
Name = "<script> Remove: Ad Scripts - Noscript  show behaviour"
Active = TRUE
Multi = TRUE
Limit = 20000
Match = "((<Script*</script> )+)\1<noscript*/noscript>$STOP()"
Replace = "\1<noscript>"
          "<a href="http://www.quancast.com/p-13lWtv_C89CCg" target="_blank"><img src="http://pixel.quanserve.com/pixel/p-13lWtv_C89CCg.gif" style="display: none;" border="0" height="1" width="1" alt="Quancast"/></a>"
          "</noscript>"



RE: “Remove: Ad Scripts - Noscript” behaving - JJoe - Oct. 28, 2010 01:32 AM

(Oct. 27, 2010 06:21 PM)JJoe Wrote:  I've been pondering the possibilities.

The noscript block shouldn't be a problem when the browser has javascript enabled. The code preceding the noscript is javascript. The noscript should not execute.

People with javascript disabled in the browser would not need the exception.

Code:
<script type="text/javascript" src="data:text/javascript,void%200%3Bfunction%20quantserve%28%29%7B%7D" charset="http://edge.quantserve.com/quant.js">
</script>
<script type="text/javascript">_qacct="p-13lWtv_C89CCg";quantserve();</script>
<noscript>
<a href="http://www.quantcast.com/p-13lWtv_C89CCg" target="_blank"><img src="http://pixel.quantserve.com/pixel/p-13lWtv_C89CCg.gif" style="display: none;" border="0" height="1" width="1" alt="Quantcast"/>
</a>
</noscript>



RE: “Remove: Ad Scripts - Noscript” behaving - whenever - Oct. 28, 2010 10:04 AM

(Oct. 28, 2010 01:32 AM)JJoe Wrote:  The noscript block shouldn't be a problem when the browser has javascript enabled. The code preceding the noscript is javascript. The noscript should not execute.

People with javascript disabled in the browser would not need the exception.

The problem is that Proxomitron doesn't know whether javascript is enabled or disabled in the browser.

(Oct. 27, 2010 06:21 PM)JJoe Wrote:  The filter uses a match in the noscript block to remove the noscript block and the chain of script blocks that precede it.

Why not just remove the noscript block?

When I added the noscript exception, I saw that the quant script block preceding the noscript block was caught by the "Block: Scripts by URL" filter.

It seems better for the noscript filter to remove only ad noscript blocks and the script filter to block only ad script blocks. Let them do their jobs separately.

Maybe sidki has some other considerations, but ... Is he really leaving? :(


RE: “Remove: Ad Scripts - Noscript” behaving - JJoe - Oct. 28, 2010 04:00 PM

Is it fair to consider this problem to be user error? Many features of this set require javascript.

(Oct. 28, 2010 10:04 AM)whenever Wrote:  Why not just remove the noscript block?

When I added the noscript exception, I saw that the quant script block preceding the noscript block was caught by the "Block: Scripts by URL" filter.

It seems better for the noscript filter to remove only ad noscript blocks and the script filter to block only ad script blocks. Let them do their jobs separately.

I think unwanted noscript blocks are easier to detect than unwanted script blocks and are usually preceded by at least one unwanted script. I have had something like this in my sets for a long time.

In no particular order, and not exhaustive:
A filter to remove unwanted noscript blocks could be added.
A change in the web bug filter could 'fix' some of these noscript blocks.
A filter to remove one script and noscript pair could be added. This filter must be activated by an exception that also deactivates "Remove: Ad Scripts - Noscript".

However, the web bug and script-noscript filters are less aggressive than they could be. I'm assuming that there are problems otherwise. I'm still examining the set and files.

(Oct. 28, 2010 10:04 AM)whenever Wrote:  Maybe sidki has some other considerations but .... Is he really leaving?

Sidki has posted about this more than once. I don't know much more.


RE: “Remove: Ad Scripts - Noscript” behaving - whenever - Oct. 29, 2010 07:58 AM

Let's suppose most users have javascript enabled in their browsers, so noscript blocks that are left in place shouldn't be a problem.

Unexpected matches mean users have to add exception entries. I counted 17 "i_noscr:2" entries in the out-of-box Exceptions.ptxt. Does this mean the current policy is too aggressive?
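
For anyone who wants to repeat that count, here is a small Python sketch. It assumes that list lines starting with "#" are comments and should be skipped, and the file path in the usage note is hypothetical.

```python
from pathlib import Path


def count_entries(text: str, token: str = "i_noscr:2") -> int:
    """Count non-empty, non-comment lines that contain the token."""
    count = 0
    for line in text.splitlines():
        stripped = line.strip()
        # Assumption: lines starting with '#' are comments in the list file.
        if not stripped or stripped.startswith("#"):
            continue
        if token in stripped:
            count += 1
    return count
```

Usage would look something like `count_entries(Path("Exceptions.ptxt").read_text(errors="replace"))`, with the path adjusted to wherever the list lives in your installation.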


RE: “Remove: Ad Scripts - Noscript” behaving - JJoe - Oct. 29, 2010 07:11 PM

(Oct. 29, 2010 07:58 AM)whenever Wrote:  Unexpected matches mean users have to add exception entries. I counted 17 "i_noscr:2" entries in the out-of-box Exceptions.ptxt. Does this mean the current policy is too aggressive?

Unfortunately, that depends on the user. ;)

Less aggressive brings ads and/or unwanted behaviour. More aggressive may break more pages. I'm sure it can never be perfect.

Light or minimal modes are quickly available from the menu and the Proxomitron can add a permanent exception for the user.

Sidki has looked at a lot of sites. 17 seems not too bad. We have found 2 more.

There needs to be a base to start from.
For now, this seems ok to me but I think I see fewer sites than most.

Any ideas to make things better?


RE: “Remove: Ad Scripts - Noscript” behaving - whenever - Oct. 30, 2010 07:32 AM

(Oct. 29, 2010 07:11 PM)JJoe Wrote:  Less aggressive brings ads and/or unwanted behaviour. More aggressive may break more pages. I'm sure it can never be perfect.

If 90% of browsers have javascript enabled, less aggressive means we might miss some ad noscript blocks, which in fact won't be executed by the browser, so it won't bring ads; more aggressive means we have more opportunities to kill good scripts and break pages.

(Oct. 29, 2010 07:11 PM)JJoe Wrote:  Any ideas to make things better?

I don't know. My initial thought was:

(Oct. 28, 2010 10:04 AM)whenever Wrote:  It seems better for the noscript filter to remove only ad noscript blocks and the script filter to block only ad script blocks. Let them do their jobs separately.

but if 90% of browsers have javascript enabled, AND if web bugs are the most common annoyance in noscript blocks, I prefer a less aggressive policy because it won't break pages and won't bring ads. Maybe we don't need a dedicated filter for noscript blocks and can just change the web bug filter to handle them, as you suggested.


RE: “Remove: Ad Scripts - Noscript” behaving - JJoe - Oct. 30, 2010 11:30 AM

This filter does catch unwanted script blocks that the other filters do not. So, less aggressive will bring ads and/or unwanted behaviour.

Not allowing the filter to match empty lines would fix the two examples that we have found. I suppose the matching of empty lines could be controlled by a switch.
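
As a rough illustration of the "no empty lines" idea, here is a Python sketch (not Proxomitron syntax). It assumes the noscript block has already been judged to be an ad, and the regular expressions are simplified stand-ins for the real matching code.

```python
import re

SCRIPT = r"<script\b(?:(?!</script).)*</script\s*>"
NOSCRIPT = r"<noscript\b(?:(?!</noscript).)*</noscript\s*>"

# Separator allowing spaces, tabs and at most one line break, so an
# empty line between two blocks ends the chain.
NO_BLANK = r"[ \t\r]*(?:\n[ \t\r]*)?"

CHAIN_NO_BLANK = re.compile(
    "(?:" + SCRIPT + NO_BLANK + ")+" + NOSCRIPT,
    re.IGNORECASE | re.DOTALL,
)


def remove_tight_chain(html: str) -> str:
    """Remove only the scripts that reach the noscript block without an
    empty line in between, together with the noscript block itself."""
    return CHAIN_NO_BLANK.sub("<!-- removed -->", html)
```

With the mobileread code from the first post, the vBulletin_init() block is separated from the Quantcast scripts by an empty line, so this sketch leaves it alone.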


RE: “Remove: Ad Scripts - Noscript” behaving - whenever - Oct. 30, 2010 02:34 PM

Another idea is to limit the filter to matching at most 2 script blocks preceding a noscript block, since tracking code usually has its script and noscript blocks appear as a pair, close to each other. This policy seems more reasonable and is less aggressive than sidki's.

Changing the code below:

Code:
([ \t\r]+(\n[ \t\r]+)+{0,2}$NEST(<script,>)( $NEST(<script,>))+$INEST(<script,</script)</script*>)+

to:

Code:
([ \t\r]+(\n[ \t\r]+)+{0,2}$NEST(<script,>)( $NEST(<script,>))+$INEST(<script,</script)</script*>)+{0,1}

fixes the two examples we found. I haven't had time to check the "i_noscr:2" entries in Exceptions.ptxt yet.
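
The effect of capping the chain can be sketched in Python (again, not Proxomitron syntax). `max_scripts` is an illustrative parameter, and the sketch assumes the noscript block has already been identified as an ad.

```python
import re

SCRIPT = r"<script\b(?:(?!</script).)*</script\s*>"
NOSCRIPT = r"<noscript\b(?:(?!</noscript).)*</noscript\s*>"


def remove_nearest(html: str, max_scripts: int = 2) -> str:
    """Remove the noscript block plus at most `max_scripts` script blocks
    directly in front of it; earlier scripts in the chain are kept."""
    pattern = re.compile(
        "(?:" + SCRIPT + r"\s*){1," + str(max_scripts) + "}" + NOSCRIPT,
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("<!-- removed -->", html)
```

Because the search runs left to right and the match has to end at the noscript block, a chain of three scripts fails at the first script and succeeds from the second one, which is how the vBulletin_init() block survives here.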


RE: “Remove: Ad Scripts - Noscript” behaving - JJoe - Oct. 31, 2010 04:34 AM

I know that sidki's code is based on existing unwanted script-noscript blocks, so I'm more than a little hesitant to give it up.

How about:
Look for code that matches the existing filter and log matches.
If user preference is "get all ads" then filter matches.
If user preference is "don't break pages" then additional conditions must be met. If conditions are met then filter matches and match is logged. If conditions are not met then filter fails, failure is logged, and the routine begins again with the next script tag and the knowledge that the noscript block is matched.


RE: “Remove: Ad Scripts - Noscript” behaving - whenever - Oct. 31, 2010 10:13 AM

(Oct. 31, 2010 04:34 AM)JJoe Wrote:  If user preference is "get all ads" then filter matches.

Doesn't the filter remove all script and noscript blocks if it matches? Then what does "get all ads" mean?

Do you mean you want to add a default preference that the user can then override with an exception entry?

(Oct. 31, 2010 04:34 AM)JJoe Wrote:  the routine begins again with the next script tag and the knowledge that the noscript block is matched.

This sounds good.

What are the additional conditions?


RE: “Remove: Ad Scripts - Noscript” behaving - JJoe - Oct. 31, 2010 03:30 PM

(Oct. 31, 2010 10:13 AM)whenever Wrote:  Do you mean you want to add a default preference that the user can then override with an exception entry?

"Want"? I mean it could be added. ;)
But, yes. I'm thinking about allowing the user to change the filter's behaviour with an exception entry, another filter, or both.

"get all ads" and "don't break pages" were intended to be simple descriptions of the choices and users. Ads will still be missed and pages still broken regardless of the choice.

What should the default be?

(Oct. 31, 2010 10:13 AM)whenever Wrote:  What are the additional conditions?

Whatever follows the "&" that would be added to the filter's match.
Current possibilities include "no empty lines" and "max. 2 script blocks".
Which is better? Or should both be used? Or is there something better?

Adding this would require updating several documents.


RE: “Remove: Ad Scripts - Noscript” behaving - JJoe - Nov. 05, 2010 12:47 AM

"<script> Remove: Ad Scripts - Noscript Test 10.11.03" isn't intended to be a replacement.
Consider it to be a diagnostic aid for now. I hope it works as I expect.

Default behaviour:
1. Matches the same code as sidki's filter but may not remove the same code.
2. The one or two scripts that are matched by (<script*</script > )+{1,2} and are closest to the noscript block may be removed.
The remaining scripts are returned to the buffer for other filters to evaluate.
When the script-noscript block contains more scripts than those matched by (<script*</script > )+{1,2}, the
event is logged to Log-Rare as "WEB JS_Ad_HTM noscript2+".
3. All matched scripts may be removed when the script or scripts closest to the noscript block are not matched by (<script*</script > )+{1,2}.
The event is logged to Log-Rare as "WeirdScript".

Optional behaviour:
Setting a variable named "ScriptNoscriptFull" to "Full" will cause the filter to remove all scripts possible.
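
The default and optional behaviour can be pictured with a small Python sketch. It is a simplified model, not a translation of the filter: the function name and the `max_removed` parameter are made up for illustration, and the "WeirdScript" case is not modelled.

```python
from typing import List, Tuple


def split_chain(
    scripts: List[str], full: bool = False, max_removed: int = 2
) -> Tuple[List[str], List[str], List[str]]:
    """Return (returned_to_buffer, removed, log_entries) for a chain of
    script blocks that precedes an ad noscript block.

    `scripts` is ordered as in the page, so the last items are the ones
    closest to the noscript block.
    """
    log: List[str] = []
    # "Full" mode, or a chain short enough to be removed whole.
    if full or len(scripts) <= max_removed:
        return [], list(scripts), log
    # Default mode: only the scripts nearest the noscript block go;
    # the rest are handed back for other filters to evaluate.
    cut = len(scripts) - max_removed
    log.append("WEB JS_Ad_HTM noscript2+")
    return scripts[:cut], scripts[cut:], log
```

For a chain of three scripts, the default call hands the first one back and removes the last two, while `full=True` removes all three.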


Code:
[Patterns]
Name = "<script> Remove: Ad Scripts - Noscript Set Full flag"
Active = FALSE
URL = "$TST(hCT=*html)$TST(flag=*.adurl:1.*)(^$TST(keyword=*.(a_js|i_noscr:[12]|i_level:[12]).*))"
Limit = 2
Match = "$SET(ScriptNoscriptFull=Full)nevermatch(^)"

Name = "<script> Remove: Ad Scripts - Noscript Test 10.11.03 (multi) [sd] (d.2 l.3)"
Active = TRUE
Multi = TRUE
URL = "$TST(hCT=*html)$TST(flag=*.adurl:1.*)(^$TST(keyword=*.(a_js|i_noscr:[12]|i_level:[12]).*))"
Limit = 20000
Match = "$SET(SBlock=)"
        "($NEST(<script(^$TST(tNoscript=1+void)|$TST(comment=2))$TST(script=1*),>)( $NEST(<script,>))+"
        "$INEST(<script,</script)</script(*>)+{1}"
        "([ \t\r]+(\n[ \t\r]+)+{0,2}$NEST(<script,>)( $NEST(<script,>))+$INEST(<script,</script)</script*>)+"
        "&&"
        "(\1(<script*</script > )+{1,2}(^?)$SET(SBlock=\1))+"
        "|?+$ADDLST(Log-Rare,WeirdScript \t\u)"
        ")"
        " ($NEST(<noembed,</noembed >) )+($NEST(<!--,-- >) )+"
        "("
        "<noscript(*>)+{1}$SET(tNoscript=$GET(tNoscript)void)"
        " (^<(/noscript|html|body|frameset?+{1024}))(\4)</noscript >"
        ")\5"
        "$SET(scriptt=$GET(script))"
        "&$TST(\4="
        "(^$TST(keyword=*.(a_track|i_noscr:3).*))"
        "(<(div|p(^[a-z]))\8(*>)+{1} |($NEST(<!-(-)\8,-- >) )+)"
        "(<img(*>&&("
        "(*width(=\\+"+| :) ([#*:4])\6*&&*height(=\\+"+| :) ([#*:4])\7$SET(9=Webbug \8 \6x\7)*)"
        "|(^*(width [=:]|src=$AV(*.jpe+g))|(*src=$AV(*.gif)&*(alt=$AV(?*)|usemap=)))$SET(9=Webbug \8 nodim)*"
        ")) )+{1,*}(^?+{100})*"
        "|<iframe($TST(flag=*.iframe_b:\2.*)|)("
        "$TST(\2=[12])|(^$TST(\2=0))[^>]++src=$AV("*|[^/.]+//(^"
        "api.recaptcha.|www.google.com/recaptcha/|"
        "([^/]++.|)$TST(uDom)(^.))*|*.swf*))$SET(9=iFrame)*"
        "|(*\s(src|href|action|data)\3=)++{1,2}$AV( $LST(AdList)*)*"
        "&"
        "($TST(script=1*1*)|*<(frameset|iframe)\6$SET(script=void))"
        "$SET(2=<script type="text/javascript">/*\r\n\tPROX: Empty script"
        " left in place to keep the noscript \6.\r\n*/</script>\r\n\5)"
        "|($TST(tNoscript=(1*)\2void)$SET(tNoscript=\2)|$SET(tNoscript=))$SET(script=)$SET(2=)"
        ")"
        "&"
        "$TST(ScriptNoscriptFull=Full)$SET(SBlock=)$SET(scriptt=)"
        "|$TST(SBlock=?*)$SET(script=$GET(scriptt))$SET(scriptt=)"
        "|$SET(SBlock=)$SET(scriptt=)"
        "&"
        "$SET(eAdJS=$GET(eAdJS)"
        "%3Cspan class=%22Pr0xFly-Span%22%3E$GET(mHead) Noscript:%3C/span%3E"
        "  $ESC(\9)%3Cbr class=%22Pr0xFly-Br%22 /%3E"
        ")"
        "$SET(1=$TST(keyword=(^$TST(tFrameset=*))*.i_level:5.*)"
        "<span class="Pr0x Pr0xAdScript" style="display:$GET(displayD)">"
        "&#8226;&#160;JS Ad Noscript: \9</span>"
        ")"
        "($TST(SBlock=?*)$ADDLST(Log-Rare,WEB JS_Ad_HTM noscript2+ \3 \t\9 \t\u)|)"
        "($TST(volat=*.log:2*)(^$TST(SBlock=?*))$ADDLST(Log-Main,[$DTM(d T)]\tWEB JS_Ad_HTM noscript \3 \t\9 \t\u)|)"
Replace = "$GET(SBlock)$SET(SBlock=)\r\n"
          "\1<!-- PROX: Script removed - Noscript Ad: \9 -->\r\n\2"



RE: “Remove: Ad Scripts - Noscript” behaving - whenever - Nov. 06, 2010 03:40 PM

Well done! I added it to my copy of sidki's config set.

(Nov. 05, 2010 12:47 AM)JJoe Wrote:  event is logged to Log-Rare as "WEB JS_Ad_HTM noscript2+".
Event is logged to Log-Rare as "WeirdScript".

Do you need those log entries for analysis in case I get some hits?