$NEST command behavior
Jul. 13, 2004, 06:34 PM
Post: #1
Hi all,
I've noticed some odd behavior with the $NEST command. When using $NEST in the bounds match, what is the difference in entering:
Code: $NEST(<table,</table>)
When I "test" a web filter, I don't see any apparent difference. Yet when loading pages, the filter matches differently depending on which bounds I use. Here is the filter in question:
Code: [Patterns]
Here's a link to a page that does this: http://dir.yahoo.com/Computers_and_Internet/Internet/
If anybody can shed some light on this, I'd appreciate it.
Thanks
Mike
|||
Jul. 13, 2004, 09:04 PM
Post: #2
I've once heard that a rule of thumb is to NOT have an * in the bounds parameter...
But I'm with you - waiting until someone has a better answer... All I can say at this point is that the method without the * is the most common scheme...
|||
Jul. 13, 2004, 11:53 PM
Post: #3
z12:
I must be doing something wrong - I can't get either filter to make a difference when I surf to your test site.
As for using an asterisk in the bounds field, it only makes sense that if you allow the boundary checker to continue, then it will match anything and everything until it finds the ending boundary. That will take up CPU cycles, if nothing else. Plus, what does it do to the position of the matching cursor when it comes time to start the match process? I have a feeling that the cursor won't be where you expect it to be, and essentially, your match will always fail. At least, those are the results I got.
When I disabled both of these filters, the page displayed in exactly the same way, leading me to believe that one might not need these filters after all. But my mileage may vary from that of other forum members. Anyone else wanna chime in here?
Oddysey
I'm no longer in the rat race - the rats won't have me!
|||
Jul. 14, 2004, 02:42 AM
Post: #4
Hi all,
Ok, AdList is a custom list of mine; that may be why you saw no difference. To eliminate that difference for this test, replace the matching expression like so:
Code: Match = "<(\w)\0*=$AV(*.overture.*)*"
For me, this is very repeatable. The table I'm having issues with is the sponsored link for Juno. Filter_1 always fails, Filter_2 always matches. The thing is, I prefer Filter_1. I seem to have layout issues on other sites when I use the Filter_2 format for boundary matching.
It's a mystery to me.
Mike
|||
Jul. 14, 2004, 07:43 AM
Post: #5
Mike;
OK, now I feel dumb. After adding your modification of *.overture.*, both filters work for me - they both remove the BS table on the far right. Not coincidentally, they also remove another much smaller table at the bottom.
To gain an understanding of why this was so, I checked the source code for that page. I think what happens is this: Your "asterisk" version of the filter effectively doesn't need a limit on the boundary - it's gonna match until the cows come home. However, the "asterisk-less" version is limited to only 1500 bytes. The code for that table is.... 6970 bytes! I increased the limit to 8000, and everything started working correctly.
Now, what happens when Yahoo decides to no longer use overture?
Oddysey
I'm no longer in the rat race - the rats won't have me!
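As a rough illustration of the byte-limit effect described in this post, here is a minimal Python sketch. It is not Proxomitron's code: the function name `match_within_limit` and its parameters are made up for the example, and it simply assumes that a bounded match fails whenever the closing tag does not fit inside the limit.

```python
def match_within_limit(text, start, close_tag, limit):
    """Toy model of a byte limit (hypothetical helper, not Proxomitron's API).

    Looks for close_tag only inside the first `limit` characters after
    `start`. Returns the matched span, or None if the closing tag does
    not fit completely inside that window.
    """
    window = text[start:start + limit]
    end = window.find(close_tag)
    if end == -1:
        return None
    return text[start:start + end + len(close_tag)]


# A stand-in for the 6970-byte table mentioned in the post.
big_table = "<table>" + "x" * (6970 - len("<table>") - len("</table>")) + "</table>"

print(match_within_limit(big_table, 0, "</table>", 1500))          # None: limit too small
print(len(match_within_limit(big_table, 0, "</table>", 8000)))     # the whole table
```

Under that assumption, raising the limit above the table's size is what lets the match succeed, which matches the 1500 versus 8000 observation above.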
|||
Jul. 14, 2004, 01:03 PM
Post: #6
Hi Oddysey,
Hmmm, that's interesting. For some reason, the bounds limit of 1500 doesn't seem to be working for you. When I check that page, I get 7 matches for my "small table killer" filter, and all the tables are less than 1500 bytes.
As for overture, since Yahoo bought them, I think we'll be (not) seeing them for a while.
Attached is a pic of the filtered page:
Mike
Edit: I attached a jpg but that didn't seem to work.
|||
Jul. 14, 2004, 01:42 PM
Post: #7
$NEST() is a specialized command. It's very fast, but it won't always match when standard bounds do.
And sometimes it fails. You can boil down the problem that filter 1 doesn't catch all ad tables on that Yahoo page to this: Why doesn't $NEST(<table,</table >) see the first closing tag in this string?
<table><tr><td>Juno's</td></tr></table><table><tr><td>Juno's</td></tr></table>
$NEST() does some sort of quote checking for other reasons (i think it had something to do with document.write strings, not sure) and ignores the </table> within the single quotes.
That filter 2 matches is a false hit, so to speak: the ">" that it matches is not the one that belongs to that table tag but to the next one. $NEST(<table[^>]+>,</table >) doesn't match.
$NEST() is not really intended to be used like filter 2; there were a couple of discussions with Scott about that a while back. If you need to match some content within the opening tag you can use $INEST(), which is a powerful command and far too rarely used imo.
<table[^>]++this=that$INEST()(<table,</table)</table >
A very nice thing about it is that if you omit the closing tag, it will still match everything except the latter, thereby allowing other filters to match it, without setting the first filter to "multi".
sidki
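To make the quote-checking effect easier to see, here is a toy Python model. It is only a sketch under assumptions: the real $NEST() implementation is not shown in this thread, and the function name `find_nest_end` and the `skip_quoted` switch are invented for the illustration. The model counts opening and closing tags and, when `skip_quoted` is on, ignores everything between a pair of quote characters.

```python
def find_nest_end(text, open_tag, close_tag, skip_quoted=True):
    """Toy nested-tag matcher (hypothetical, not Proxomitron's algorithm).

    Returns the index just past the close_tag that balances the first
    open_tag, or -1 if no balanced close is found.
    """
    lower = text.lower()
    i = lower.find(open_tag)
    if i == -1:
        return -1
    depth = 0
    quote = None  # the quote character we are currently inside, if any
    while i < len(text):
        ch = text[i]
        if skip_quoted:
            if quote is not None:
                # Inside a quoted run: ignore tags until the quote closes.
                if ch == quote:
                    quote = None
                i += 1
                continue
            if ch in "'\"":
                quote = ch
                i += 1
                continue
        if lower.startswith(open_tag, i):
            depth += 1
            i += len(open_tag)
        elif lower.startswith(close_tag, i):
            depth -= 1
            i += len(close_tag)
            if depth == 0:
                return i
        else:
            i += 1
    return -1


sample = ("<table><tr><td>Juno's</td></tr></table>"
          "<table><tr><td>Juno's</td></tr></table>")

# Without quote checking the first closing tag ends the match.
print(sample[:find_nest_end(sample, "<table", "</table", skip_quoted=False)])
# With quote checking, the text between the two apostrophes is skipped,
# so the match runs on to the second table's closing tag.
print(sample[:find_nest_end(sample, "<table", "</table", skip_quoted=True)])
```

In this model the apostrophe in the first "Juno's" opens a quoted run that only ends at the apostrophe in the second "Juno's", so the first </table> and the second <table are never seen.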
|||
Jul. 15, 2004, 02:00 AM
Post: #8
Hi sidki3003,
Sorry about the slow reply, but I had to think about what you said. What you said about the $NEST command not always matching the closing tag seems to be the case here. It's somewhat of a relief to know that there are known issues with it... I thought that maybe I was losing my mind.
Since I'm just trying to remove un-nested tables with this filter, I've modified my filter to ensure no other table tag is included in the bounds. Normally I don't have a problem with $NEST, but with the table tag, I know I've had problems before. Oddly enough, I've never noticed the issue with td or div tags, which are other commonly nested tags.
Have you ever heard of a problem with using $NEST with tags that shouldn't be nested, such as STYLE or A tags? Normally for tags that have opening & closing tags I use $NEST in my bounds check as shown in filter 1.
Mike
|||
Jul. 15, 2004, 04:06 AM
Post: #9
Hi Mike,
Hmm... at some point at Arne's board everyone started to use $NEST(<a\s,</a>) instead of <a\s*</a> because of the speed gain. Scott said several times that that isn't a good idea. We didn't really understand why and didn't know about that quote problem either, but silently went back to the old bounds.
So i try to avoid using $NEST() for non-nested tags and hence don't know how problematic it is, but i could imagine that things like <a href="foo">Juno's</a><a href="foo">Juno's</a> aren't that rare.
sidki
|||
Jul. 15, 2004, 11:03 AM
Post: #10
Hi sidki,
Ok, it looks like I have some filters to modify.
Thanks
Mike
|||
Aug. 01, 2004, 08:33 PM
Post: #11
Just came across this, which may clarify things (the Yahoo ad tables are all in one huge line):
Quote:--- In prox-list@y..., Mona <...> wrote:
|||
Aug. 02, 2004, 02:28 PM
Post: #12
Hi sidki,
Thanks for that info... that explains much. Usually, when I had problems with $NEST, some js was involved in the match.
I had been fooling around with $INEST a bit, as I wasn't sure if that had the same problem, to see if I could get that to do what I wanted. Here is the last version of a sponsored link table killer I was fooling with:
Code: [Patterns]
I think it was working ok, but I think the last issue I had was that the byte limit was too high. Besides fooling with $INEST, the idea behind the filter was to "sneak up" on the table tag that was closest to ">sponsored link<", as I noticed that the $NEST command would sometimes match several table tags ahead of where I wanted it to.
I haven't been trying to improve this filter lately as I have been playing with using the "dom container killer" javascript to see if I can get it to do the same thing, and then some. So far, that is looking very promising.
Mike
|||
Aug. 02, 2004, 02:57 PM
Post: #13
Hi Mike,
That's funny - i was playing with the same idea! It works pretty well here, too. I make sure that the ad'ish string isn't more than six tags away from the opening table tag.
Code: [Patterns]
$INEST does the same quote checking as $NEST tho, but i run into problems with it very rarely.
I saw your DOM container post, pretty interesting stuff! Unfortunately i can't use it because i rely on making the kills visible if needed (toggling display none/inline). And on my slow machine the empty space is hidden with a considerable delay (apparently time to upgrade). But it's good to know that a JS/DOM expert is around. I get lost there often enough. *lol*
sidki
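The "no more than six tags away" idea can be sketched outside Proxomitron with an ordinary regular expression. The Python below is only an illustration under assumptions, not the filter from this post: the function name `find_ad_table_start`, the default marker text, and the regex itself are made up for the example.

```python
import re


def find_ad_table_start(html, marker="sponsored link", max_tags=6):
    """Return the offset of the nearest opening <table> tag that is followed
    by `marker` within at most `max_tags` tags, or -1 if there is none.

    Hypothetical helper for illustration; it uses Python regex syntax,
    not Proxomitron matching syntax.
    """
    # Up to max_tags tags (each with optional leading text), none of which
    # may be another opening <table>, so the match hugs the closest table.
    tags = r"(?:[^<]*<(?!table\b)[^>]*>){0,%d}?" % max_tags
    pattern = re.compile(
        r"<table\b[^>]*>" + tags + r"[^<]*" + re.escape(marker),
        re.IGNORECASE,
    )
    m = pattern.search(html)
    return m.start() if m else -1


near = "<table><tr><td><table><tr><td>Sponsored Link</td></tr></table></td></tr></table>"
far = "<table>" + "<p></p>" * 5 + "sponsored link</table>"

print(find_ad_table_start(near))  # offset of the inner table
print(find_ad_table_start(far))   # -1: the marker is ten tags away
```

The negative lookahead is what does the "sneaking up": an outer table cannot match across an inner opening table tag, so the search settles on the table closest to the marker.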
|||
Aug. 02, 2004, 04:57 PM
Post: #14
Hi sidki
I like the way that filter makes sure it's close to the "adish" tag. I've seen before where my filter matched to the closest table tag, but it wasn't close enough. Very nice.
Hmmm, that's an interesting idea about toggling visibility. I think I'll look into modifying the dom container killer to see if I can do that. It sure would make it easier to tweak it, as right now it's hard to see what is being removed. I'll have to think about that for a bit.
Oh, and I wouldn't consider myself a js/dom expert. I mostly just struggle through it.
Mike
|||