sidki's config set: 2005-06-09
May. 15, 2005, 01:14 PM
Post: #211
 
sidki3003 Wrote:
Quote:All the requests [552-556] were identical and generated by clicking on a link to the article one time.
By clicking on what link exactly? I don't see them here.
Ahh, i know what you mean! I was suspicious b/c of "exit_polls", but that was just the name of that article. :)

Those requests are for the little HTML docs that appear if you hover over an article link for a few seconds. HTML -> 1s cache -- hover -> new request. Since the server isn't sending any caching headers (nor accepting any, for that matter), they get re-fetched each time.
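
A minimal sketch of that reasoning, assuming a simple lifetime-based cache. The function name `count_requests` and the hover times are made up for illustration; this is not how Proxomitron or the browser implements caching:

```python
def count_requests(hover_times, ttl_seconds=1.0):
    """Count how many hovers trigger a real request when a cached copy
    is only reused for ttl_seconds after it was fetched."""
    requests = 0
    fetched_at = None
    for t in sorted(hover_times):
        # No cached copy yet, or the copy is older than its lifetime.
        if fetched_at is None or t - fetched_at > ttl_seconds:
            requests += 1
            fetched_at = t
    return requests


# Hypothetical hovers at 0s, 3s, 7s and 12s over the same link:
# with a 1s lifetime every hover arrives after the copy has expired.
print(count_requests([0, 3, 7, 12]))
```

With hovers that are several seconds apart, each one lands outside the 1s lifetime, so each one produces a new request.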

sidki
May. 15, 2005, 01:19 PM
Post: #212
 
I see what's going on with the Yahoo News link. There appears to be some JavaScript running when you hover over the link. I guess that's part of their redesign, but it doesn't seem very efficient.

Here's the link:

Democrats Consider Revamping Primaries
http://news.yahoo.com/s/ap/20050515/ap_on_...rimary_scramble

But I see all of their article links are doing this.

Mike

edit: lol, you beat me to it :)
May. 15, 2005, 01:46 PM
Post: #213
 
I was wondering why those little docs aren't cached, even if i set them to "cache 1 day".
But it's the way the script works:
http://us.i1.yimg.com/news.yahoo.com/v10/u...js?v=1116017632

They are doing an XMLHttpRequest, and Firefox 1.0 has a bug that makes it always re-fetch documents for such requests. :(
It's fixed in the 1.1 nightlies, from what i've heard.

sidki
May. 15, 2005, 02:22 PM
Post: #214
 
ah..I see.

I've been looking at a nightly to run for a while, but I'm holding back till the trunk gets a bit more stable.

Mike
May. 16, 2005, 04:23 PM
Post: #215
 
At news.yahoo.com I'm getting a lot of matches like the following:

Code:
<Match: <a><body>: Block sel. JS Properties     4.01.22 (multi) [sd] (d.1) >
<a onMouseOut="cancelPreview()" onMouseOver="showPreview(event, 'hl_2', '/ap/20050516/ap_on_re_as/koreas_nuclear')" href="/s/ap/20050516/ap_on_re_as/koreas_nuclear">
</Match>
<a onMouseOut="cancelPreview()" onMouseOver="showPreview(event, 'hl_2', '/ap/20050516/ap_on_re_as/koreas_nuclear')" href="/s/ap/20050516/ap_on_re_as/koreas_nuclear">

These are matching even though there's no matching property in the list to block.

Looks like there are a couple of ways to "fix" it, but I'm not even sure it's broken. :)

Mike
May. 16, 2005, 07:07 PM
Post: #216
 
That's because the initial quick test "\son[a-z]+=" matches there. What follows is a zero-to-infinite loop, which always matches. Zero iterations -> the buffer is returned unchanged. If you append a "{1,*}" to that loop, or let the second test fail some other way, you'll notice a ~50% slow-down. (I'm quoting from an older email, don't laugh, JJoe.)
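
A rough Python analogue of that "quick test, then zero-or-more loop" shape, only to show why the filter reports a match without changing anything. The names `QUICK_TEST`, `BLOCKED` and `filter_tag`, the two property names, and the `blocked_` replacement are all assumptions made for this sketch; the real filter uses Proxomitron's matching language and the JSProperties list:

```python
import re

# Same idea as the quick test "\son[a-z]+=" (case-insensitive here).
QUICK_TEST = re.compile(r"\son[a-z]+=", re.IGNORECASE)

# Hypothetical stand-in for the JSProperties block list.
BLOCKED = re.compile(r"\.(referrer|cookie)\b")


def filter_tag(tag):
    """Return (matched, output) for one tag.

    Once the quick test passes, the filter counts as matched, even if
    the zero-or-more replacement loop finds nothing to rewrite.
    """
    if not QUICK_TEST.search(tag):
        return False, tag  # fails fast; the tag is left alone
    # Zero or more replacements: zero hits returns the tag unchanged.
    return True, BLOCKED.sub(r".blocked_\1", tag)


plain = '<a onMouseOver="showPreview(event)" href="/s/ap/x">'
matched, output = filter_tag(plain)
print(matched, output == plain)  # matched, but nothing was changed
```

Making the loop require at least one hit would turn the unchanged case into a late failure, which is the slow path described above.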

I used to append "always" to the names of that type of filter, but was running out of space in the name field. *lol*

sidki
May. 17, 2005, 09:23 AM
Post: #217
 
It still has a problem with catching the right attribute name in \2

Test Code
Code:
<a onMouseOut="cancelPreview()" onMouseOver="show.referrer.Preview(event, 'hl_2', '/ap/20050516/ap_on_re_as/koreas_nuclear')" href="/s/ap/20050516/ap_on_re_as/koreas_nuclear">

Replacement code
Code:
\@ \2

Mike
May. 17, 2005, 01:37 PM
Post: #218
 
It doesn't try to. It just catches the first one (which is the right one in most cases). The content of \2 is only used for informational purposes and doesn't affect the replacement string.

To make my point regarding speed a bit clearer:
Whether a filter needs 0.02ms or 0.03ms to parse a link doesn't matter at all... from the single-link point of view.
Now look at pages with hundreds of links.
And now consider that this config has hundreds of filters, so that these micro-micro delays *will* become perceptible.
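
To put rough numbers on that, here is a back-of-the-envelope sketch. The 300 links and 200 filters are made-up stand-ins for "hundreds", and it assumes the worst case where every filter inspects every link, which a real config would not do:

```python
def page_overhead_ms(per_link_ms, links, filters):
    """Total parse time if every filter looks at every link."""
    return per_link_ms * links * filters


LINKS = 300    # assumed: "hundreds of links"
FILTERS = 200  # assumed: "hundreds of filters"

fast = page_overhead_ms(0.02, LINKS, FILTERS)
slow = page_overhead_ms(0.03, LINKS, FILTERS)

print(f"0.02ms per link: {fast:.0f} ms per page")
print(f"0.03ms per link: {slow:.0f} ms per page")
print(f"difference:      {slow - fast:.0f} ms per page")
```

Under these assumptions the extra 0.01ms per link adds up to more than half a second per page, which is why a filter that fails late is worth avoiding.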

Here is that filter in its accurate, very slow incarnation:
Code:
[Patterns]
Name = "<a><body>: Block sel. JS Properties     4.01.22 (multi) [sd] (d.1) TEST"
Active = FALSE
Multi = TRUE
URL = "$TYPE(htm)(^$TST(keyword=*.a_code.*))"
Bounds = "<(a|body)\s*>"
Limit = 512
Match = "("
"(*\s)\#"
""
"("
"((on[a-z]+)\2=$AV(*)|href=$AV( (javascript)\2:*))"
"&&"
"("
"\#(.$LST(JSProperties))\3([^a-z.]|.[a-z])\#"
"($TST(volat=*.log:2.*)$ADDLST(Log-Main,[$DTM(d T)]\tWEB JS_Prop_\2 \t\3 \t\u)|)"
")+{1,*}\#"
")"
""
")+{1,*}\#"
Replace = "\@"
I didn't play with it a lot, maybe you can get it faster. It would be okay if it were slower than the old one for true matches, but not for failing ones (or buffer dumping).


I've added "dmp" to those filters with always-match routines. And added this to Abbreviations.txt:
Quote:dmp: This filter may match even though it doesn't change anything,
either to prevent slow-downs caused by late failing, or to
protect certain code from being matched by other filters.

Affected filters:
<*>: Tag Manager
Protect Textareas II - Apply
JS CSS Protect: Comments II - Apply
JS CSS Protect: Comments III - Other Types
<a><body>: Block sel. JS Properties

There were two or three others like the last one, but i don't remember which. If you come across them, please drop a note.

sidki
May. 17, 2005, 02:15 PM
Post: #219
 
I played around with it some, but couldn't get one that failed as fast. I understand your speed concern, as for most links, this filter "fails" (but quickly :) ).

Just trying to help,
Mike
May. 17, 2005, 02:25 PM
Post: #220
 
I know. :) I've edited the test filter above and at least got rid of the global var. Maybe a good quick test before the longish one would do.

sidki
May. 17, 2005, 05:28 PM
Post: #221
 
Got it! :D (At least i hope so.)

It's a rather wild construct, but hey, it works: on a true hit, look back and grab the attribute. Note that \2 and \3 only return the right value at the spot where the log line is.
Code:
[Patterns]
Name = "<a><body>: Block sel. JS Properties     5.05.17 (dmp multi) [sd] (d.1) WIP7"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)(^$TST(keyword=*.a_code.*))"
Bounds = "<(a|body)\s*>"
Limit = 512
Match = "(*\s((on[a-z]+)\2=|href="+ (javascript)\2:))\#"
"("
"\#(.$LST(JSProperties))\3([^a-z.]|.[a-z])\#"
"&&"
"(*\s((on[a-z]+)\2=|href="+ (javascript)\2:))+"
"($TST(volat=*.log:2.*)$ADDLST(Log-Main,[$DTM(d T)]\tWEB JS_Prop_\2 \t\3 \t\u)|)*"
")+\#"
"&(^$TST(script=*)|$TST(comment=1))"
Replace = "\@"

Gotta run,
sidki
May. 18, 2005, 10:15 AM
Post: #222
 
One feature your original filter had that I overlooked is that it can fix multiple attributes in a single match.

Code:
<a onMouseOut="cancel.referrer.Preview()" onMouseOver="show.referrer.Preview(event, 'hl_2', '/ap/20050516/ap_on_re_as/koreas_nuclear')" href="/s/ap/20050516/ap_on_re_as/koreas_nuclear">

Given this, capturing the attribute name is not important as you pointed out. Also, capturing the original property name doesn't seem like it matters either, since it is actually part of the replacement text. Perhaps all you need to log is the fact that the filter matched.

It seems that I sent you on a wild goose chase, as there is nothing "wrong" with the original filter. I just didn't fully understand it. Sorry about that.

Mike
May. 18, 2005, 11:18 AM
Post: #223
 
You didn't - all fine. :)

I did test the above filter with both filter-worthy properties in multiple attributes and multiple filter-worthy properties in one attribute. Your test string works for me; are you sure you've picked the right version (WIP7)?

You can't test for the right \2 and \3 in the replacement match, because they are reassigned on the fly while the loop continues. The log line, which grabs them right after they got their new values, should work correctly, no?
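
The same effect can be seen with Python's `re` module, used here only as an analogy since Proxomitron's matcher is a different engine. A capture group inside a repeated group keeps just the value from the last pass, while grabbing the value on every pass (like the log line does) sees each one. The patterns `LOOP` and `ATTR` and the shortened test tag are assumptions for this sketch:

```python
import re

# One capture group inside a loop over on* attributes.
LOOP = re.compile(r'<a(?:\s(on[a-z]+)="[^"]*")+', re.IGNORECASE)

# The same attribute pattern, applied one attribute at a time.
ATTR = re.compile(r'\s(on[a-z]+)="[^"]*"', re.IGNORECASE)

tag = ('<a onMouseOut="cancel.referrer.Preview()" '
       'onMouseOver="show.referrer.Preview(event)">')

# After the loop has finished, the group holds only the last value.
after_loop = LOOP.match(tag).group(1)

# Reading the group on every pass sees each value in turn.
per_pass = [m.group(1) for m in ATTR.finditer(tag)]

print(after_loop)  # onMouseOver
print(per_pass)    # ['onMouseOut', 'onMouseOver']
```

So a check made after the loop only ever sees the final assignment, whereas anything placed inside the loop body sees the value that was current at that moment.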

sidki
May. 18, 2005, 12:36 PM
Post: #224
 
I just tried the new filter again: no problems, it works well.

I guess I need more coffee. :)

Mike
May. 18, 2005, 12:50 PM
Post: #225
 
Phew - glad to hear that! Was quite an effort yesterday to get that darn goose into the pot. But it's always fun as well. *lol*

sidki