Author Topic: Banner Killers: Another Two  (Read 22657 times)

lnminente

  • Jr. Member
  • **
  • Posts: 73
Banner Killers: Another Two
« Reply #45 on: August 23, 2002, 03:25:23 PM »
Another tip for speed:

Looking at AdDimensions.txt, I see that the smallest width is 41 (I'm not sure it's exactly 41).

It would be faster to first test that the width is 41 or bigger and, if it isn't, to skip checking AdDimensions.txt.

Hope you like it



Edited by - lnminente on 23 Aug 2002  16:27:49
 

sidki3003

  • Sr. Member
  • ****
  • Posts: 476
« Reply #46 on: August 23, 2002, 03:32:54 PM »
Hi lnminente,

Sounds interesting
I'll check that out.

 
 

lnminente

  • Jr. Member
  • **
  • Posts: 73
« Reply #47 on: August 23, 2002, 03:41:09 PM »
   

 
 

sidki3003

  • Sr. Member
  • ****
  • Posts: 476
« Reply #48 on: August 23, 2002, 05:37:10 PM »
Hi altosax and lnminente,

altosax,

3.
<iframe*>(*</iframe>|)

I'm happy about this one.
It does indeed check the entire byte range for "</iframe>" before falling back to "nothing".
That is, it does the opposite of:
<iframe*>(|*</iframe>)

I also changed <object> and <embed> accordingly.
Not sure about layer/ilayer.



lnminente,

*Big* speed improvement, especially for the "Kill: Banners (not linked)" filter.
Thanks a lot for this idea!


I changed so much, i hope i didn't break anything else.
So i'll have to test it for a while.

regards, sidki


 
 

Jor

  • Sr. Member
  • ****
  • Posts: 421
« Reply #49 on: August 23, 2002, 06:23:10 PM »
Hi,

A few notes/questions:
Kill: Banners (linked) uses <a\s*</a>. Isn't $NEST(<a,</a>) faster?

Kill: Banners (not linked) uses a lot of strings which could be optimized.
Mine looks like this: <i(mg|nput)\s*>|<frame\s*>|$NEST(<iframe,</iframe>)|$NEST(<layer,</layer>)|$NEST(<ilayer,</ilayer>)|$NEST(<object,</object>)|<embed\s*>|$NEST(<applet,</applet>)
(As you can see I also added applet and frame. The latter because in MSIE's backwards compatibility mode, it works the same as iframe when used inline, with the exception that it has no closing tag).

Question: is there a difference in functionality between <iframe*>(*</iframe>|) or <iframe*>(|*</iframe>) and $NEST(<iframe,</iframe>)?
Iframes without a closing tag don't work anyway.

Also, embed has no closing tag: it was never part of the HTML standard, and thus did not transfer to XHTML, which added closing tags for all elements. Correct HTML uses <object> instead, and this tag does require a closing tag.

Typical <embed> usage (only in HTML 4.01 Transitional documents and earlier) is still:
<EMBED SRC="/path/file.cmx" WIDTH="100" HEIGHT="200">
<NOEMBED>
  <P>Sorry, but you do not have a Corel CMX plugin for
   displaying Corel CMX image files. Here is an alternate
   version, as a regular GIF.</P>
<IMG SRC="/path/file.gif" HEIGHT="200" WIDTH="100"
 ALT="stupid example image">
</NOEMBED>


Lastly, is the first post edited to include all changes, or do I need to download the zip file again to get the latest version of AdDims as well?

Edited by - Jor on 23 Aug 2002  19:26:25
 

lnminente

  • Jr. Member
  • **
  • Posts: 73
« Reply #50 on: August 23, 2002, 06:38:17 PM »
quote:

lnminente,

*Big* speed improvement, especially for the "Kill: Banners (not linked)" filter.
Thanks a lot for this idea!



It's good to hear it. Thanks to you, my friend.

 
 

sidki3003

  • Sr. Member
  • ****
  • Posts: 476
« Reply #51 on: August 23, 2002, 06:46:51 PM »
Hi Jor,

Just the answers to 2 questions. I have to look into the others more carefully.

Nested/not nested <a> bounds:
Well, at least not by a statistically significant margin. Here is what i typically get:

not nested:
Sample Text: 4031 bytes
Successful Matches: 6
Avg time: 6.762277 (milliseconds)

nested:
Sample Text: 4031 bytes
Successful Matches: 6
Avg time: 6.739955 (milliseconds)

Plus: see Scott's comments that using $NEST() in such cases is not a good idea.


Update: The first post is always edited to contain the latest changes to filters and list.
Today's changes are not yet included though.

/sidki


 
 

sidki3003

  • Sr. Member
  • ****
  • Posts: 476
« Reply #52 on: August 23, 2002, 08:21:22 PM »
quote:

Kill: Banners (not linked) uses a lot of strings which could be optimized.
Mine looks like this: <i(mg|nput)\s*>|<frame\s*>|$NEST(<iframe,</iframe>)|$NEST(<layer,</layer>)|$NEST(<ilayer,</ilayer>)|$NEST(<object,</object>)|<embed\s*>|$NEST(<applet,</applet>)


i(mg|nput)\s:
Do you see a speed difference or is it just to shorten the line?

$NEST():
After reading *this* and *that*, i did some benchmarks to compare nested versus not nested.
The differences were really minor (talking about my box of course).
Since then i don't use $NEST() with structures that aren't nested.
quote:

(As you can see I also added applet and frame. The latter because in MSIE's backwards compatibility mode, it works the same as iframe when used inline, with the exception that it has no closing tag).


I'll include those in the filter.
quote:

Question: is there a difference in functionality between <iframe*>(*</iframe>|) or <iframe*>(|*</iframe>) and $NEST(<iframe,</iframe>)?
Iframes without a closing tag don't work anyway.


I didn't know that, thanks for sharing. As to using $NEST(), see above.
quote:

Also, embed has no closing tag: it was never part of the HTML standard, and thus did not transfer to XHTML, which added closing tags for all elements. Correct HTML uses <object> instead, and this tag does require a closing tag.


embed:
I see both, <embed> with and without closing tag in practice, and the source i mostly use mentions both, too.
http://developer.netscape.com/docs/manuals/htmlguid/tags14.htm#1286379
So maybe it's just a matter of taste if you want to leave </embed> alone or not.

object:
Not sure about that one, i think i've seen it working without closing tag (IE6).



/sidki

Edited by - sidki3003 on 23 Aug 2002  22:12:19
 

altosax

  • Sr. Member
  • ****
  • Posts: 328
« Reply #53 on: August 23, 2002, 08:40:09 PM »
jor wrote:

quote:

Question: is there a difference in functionality between <iframe*>(*</iframe>|) or <iframe*>(|*</iframe>) and $NEST(<iframe,</iframe>)?
Iframes without a closing tag don't work anyway.



you are right. i don't know why i changed my bounds, probably after reading something. now i've changed them back to <iframe*</iframe>.

<edit>: i've found where i read that. here:
http://asp.flaaten.dk/pforum/topic.asp?whichpage=1&ARCHIVEVIEW=&TOPIC_ID=790#3428

thanks,
altosax.

Edited by - altosax on 24 Aug 2002  19:24:41
 

sidki3003

  • Sr. Member
  • ****
  • Posts: 476
« Reply #54 on: August 23, 2002, 08:52:31 PM »
But let's not forget the difference between "<tag*>(*</tag>|)" and "<tag*>(|*</tag>)".
At least to me that wasn't obvious at all.

 
 

altosax

  • Sr. Member
  • ****
  • Posts: 328
« Reply #55 on: August 23, 2002, 09:11:44 PM »
sidki wrote:

quote:

But let's not forget the difference between "<tag*>(*</tag>|)" and "<tag*>(|*</tag>)".
At least to me that wasn't obvious at all.



because of how the OR function works, alternatives are tried from left to right, and if the first expression matches, the second will never be evaluated. this means that if you write:

<tag*>(|*</tag>)

the *</tag> will never be evaluated, because the first expression, _nothing_, always matches: once <tag*> has matched, _nothing_ can always follow it.
instead, if you write:

<tag*>(*</tag>|)

the *</tag> will always be evaluated first, so if the closing tag exists it will be matched.
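
The same left-to-right behaviour can be observed with Python's `re` module, which also tries alternatives in order. This is only an analogy, not Proxomitron syntax: the wildcard `*` is approximated here with `[^>]*` and `.*?`, and the sample HTML is made up.

```python
import re

# Made-up sample: an iframe with a closing tag, followed by page content.
html = '<iframe src="ad.html">fallback text</iframe><p>page</p>'

# Closing tag tried first; the empty alternative is only a fallback.
closing_first = re.compile(r"<iframe[^>]*>(?:.*?</iframe>|)", re.S)

# Empty alternative tried first; it always succeeds, so the second
# alternative is never needed and the closing tag is left behind.
empty_first = re.compile(r"<iframe[^>]*>(?:|.*?</iframe>)", re.S)

matched_closing_first = closing_first.match(html).group(0)
matched_empty_first = empty_first.match(html).group(0)

print(matched_closing_first)  # the whole iframe element
print(matched_empty_first)    # only the opening tag
```

With the closing tag listed first, the pattern still matches an iframe that has no closing tag at all, because the empty alternative takes over when `</iframe>` cannot be found.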

hth,
altosax.

 
 

sidki3003

  • Sr. Member
  • ****
  • Posts: 476
« Reply #56 on: August 23, 2002, 09:24:27 PM »
I've got that by now

 
 

sidki3003

  • Sr. Member
  • ****
  • Posts: 476
« Reply #57 on: August 24, 2002, 12:38:38 AM »
Update:
Big speed-up in the AdDims call
Added two more tags to check
Quite a few other things

The filter has turned into sort of teamwork.
So big thanks to all who contributed.

Changes in the first post.

Edited by - sidki3003 on 24 Aug 2002  04:21:00
 

altosax

  • Sr. Member
  • ****
  • Posts: 328
« Reply #58 on: August 24, 2002, 12:30:44 PM »
hi sidki,
in bruce eckel's latest book, the recent beta release of "thinking in proxomitron language", i've found these 3 contributions that you can apply to your banner filters.

1.
In the first one, you can replace the line:

Match = "<a[^>]++\shref=$AV(\1)*> (([^('][^<>]++)\3 <*/a>|)&*\ssrc=$AV(\4)*&(*alt=$AV(\2)|)&"

with these (alt text snipped after first 18 chars):

Match = "<a[^>]++\shref=$AV(\1)*> (([^('][^<>]++)\3 <*/a>|)&*\ssrc=$AV(\4)*"
        "&((*alt="")$SET(2=Ad)|*alt=$AV((?+{18})\2*|\2)|$SET(2=Ad))&"

or these if you prefer (alt text not snipped):

Match = "<a[^>]++\shref=$AV(\1)*> (([^('][^<>]++)\3 <*/a>|)&*\ssrc=$AV(\4)*"
        "&((*alt="")$SET(2=Ad)|*alt=$AV(\2)|$SET(2=Ad))&"

This comes from the modifications he made to the default "Banner Blaster" and always returns a value for the \2 variable, so you will always have a title in the replacement expression.
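
The intended result of that alt handling can be sketched in Python. This is a plain restatement of the logic described above, not a translation of the filter: `banner_title` and its `limit` parameter are hypothetical names, and `None` stands for a tag with no alt attribute.

```python
def banner_title(alt, limit=18):
    """Return a title for the replacement link: trimmed alt text, or "Ad"."""
    if not alt:
        # alt attribute missing (None) or present but empty (""): use "Ad".
        return "Ad"
    # Otherwise keep at most `limit` characters of the alt text.
    return alt[:limit]
```

Passing a very large `limit` corresponds to the "not snipped" variant, where the whole alt text is kept.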

2.
Because you already have matched the bounds, in the first filter you can replace this:

"<[^>]+>"
"&&*\ssrc=$AV(\4)*"
"&&*width=[#41:*]*"
"&&$LST(AdDims)*"

with this:

"<[^>]+>"
"&*\ssrc=$AV(\4)"
"&*width=[#41:*]"
"&$LST(AdDims)"

and in the second one you can replace this:

"[^>]+>"
"&&*width=[#41:*]*"
"&&$LST(AdDims)*"

with this:

"[^>]+>"
"&*width=[#41:*]"
"&$LST(AdDims)"

this way you don't have to re-match the whole bounds, which gives a small speed improvement.

3.
according to what he wrote, and tegghead also, you could rewrite the addims list by hand, writing the replacement text literally instead of using variables wherever possible. this should improve the speed a little. so the line:

(*width=([#120]|[#173]|[#230:240]|[#400:500])\6 & *height=([#60])\7)$SET(9=a.common.2 \6x\7)

could be re-written as:

(*width=([#120]|[#173]|[#230:240]|[#400:500])\6 & *height=[#60])$SET(9=a.common.2 \6x60)

the same for all other lines.


search for all free bruce eckel's books at http://www.mindview.net/

regards,
altosax.

 
 

sidki3003

  • Sr. Member
  • ****
  • Posts: 476
« Reply #59 on: August 24, 2002, 02:58:27 PM »
Hi altosax,

1.
((*alt="")$SET(2=Ad)|*alt=$AV(\2)|$SET(2=Ad))

"Ad" is not really extra info, is it? We know that through the replacement link anyway.

I prefer to see the entire alt text, since it's shown only in the flyover.

2.
Not so.

"<[^>]+>"
"&&*\ssrc=$AV(\4)*"
"&&*width=[#41:*]*"
"&&$LST(AdDims)*"

makes sure that these tests all match within the *same* "<[^>]+>" range.

Example:
http://www.extremetech.com/
In AdDims a.button.2 must be uncommented for this.

<a href="http://www.extremetech.com/category2/0,3971,236,00.asp" class="bgcolor2">
<img src="/images/spacer.gif" width="26" height="1" border="0">
<img src="http://common.ziffdavisinternet.com/util_get_image/1/0,3363,i=12372,00.jpg" width="100" height="30" alt="" border="0">
<img src="/images/spacer.gif" width="26" height="1" border="0">
</a>

With

"<[^>]+>"
"&*\ssrc=$AV(\4)*"
"&*width=[#41:*]*"
"&$LST(AdDims)*"

the AdDims call matches the 2nd <img*>, while the src test matches the first.
Avoiding that is why i made this routine in the first place.
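
The mismatch is easy to reproduce outside Proxomitron. The Python sketch below is an analogy only: `loose_check`, `strict_check` and `AD_SIZES` are hypothetical names, the markup is shortened from the example above, `AD_SIZES` is an invented stand-in for the AdDims list, and the size pattern assumes `width` comes directly before `height`, as it does in this example.

```python
import re

# Shortened version of the linked block from the example above.
block = (
    '<a href="http://www.extremetech.com/category2/0,3971,236,00.asp" class="bgcolor2">'
    '<img src="/images/spacer.gif" width="26" height="1" border="0">'
    '<img src="http://common.ziffdavisinternet.com/util_get_image/1/0,3363,i=12372,00.jpg"'
    ' width="100" height="30" alt="" border="0">'
    '</a>'
)

AD_SIZES = {(100, 30)}  # invented stand-in for the AdDims list

TAG = re.compile(r"<img[^>]+>")
SRC = re.compile(r'\ssrc="([^"]*)"')
SIZE = re.compile(r'\swidth="(\d+)"\s+height="(\d+)"')


def find_ad_size(text):
    """Return the first (width, height) in text that is listed in AD_SIZES."""
    for m in SIZE.finditer(text):
        size = (int(m.group(1)), int(m.group(2)))
        if size in AD_SIZES:
            return size
    return None


def loose_check(html):
    """Run each test over the whole block: results may come from different tags."""
    src = SRC.search(html)
    size = find_ad_size(html)
    return (src.group(1), size) if src and size else None


def strict_check(html):
    """Run all tests inside one tag at a time, so src and size belong together."""
    for tag in TAG.findall(html):
        src = SRC.search(tag)
        size = find_ad_size(tag)
        if src and size:
            return src.group(1), size
    return None
```

Here `loose_check(block)` pairs the spacer's `src` with the banner's dimensions, while `strict_check(block)` reports the banner image itself, which is the behaviour the `&&` version of the filter is meant to guarantee.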

Same goes for filter 2.

3.
Stuffing a (small) string into a variable and recalling it later happens at the speed of light (IMO).
JarC's point was to give me more variables for the filters, if i could get rid of \6 (and \7).

I searched for Bruce Eckel's Proxomitron book on the link you posted and on Google, too.
Nothing. Can you give me a direct link?



Jor,

Do you have example links for frame and applet tags with adish dimensions?
I want to see if the check should take place for all links or for off-site links only.


Also, did anyone find any entries in AdPaths being too restrictive for the check on the current host?
If so, they can be moved from the list to the filters. It's this line:
"(ad|promo(s|)|ban|banner(s|)"



regards, sidki

Edited by - sidki3003 on 24 Aug 2002  17:01:25