Need help with Sidki filter causing problem
Dec. 18, 2008, 11:31 AM
Post: #1
Need help with Sidki filter causing problem
This filter (Sidki 2008-01-02) is causing problems at dslreports forums:

[Patterns]
Name = "
... Remove/Hide: Ad Containers - Headers 7.10.30 [sd] (d.3 l.3)"
Active = TRUE
URL = "$TYPE(htm)(^$TST(keyword=*.(a_ads|a_adcont|a_adcont_h|i_level:[12]).*))"
Limit = 4500
Match = "]+("
"(> ]+)+>"
"( (^]+(>)\4)++{0,1} (^"
"|( )\#"
"))+*,$TST(\0=td$SET(#=)$SET(3=)|*)"
"|"
"(^$TST(\7=*)&$TST(\4=>)|$TST(\6=span)|$TST(\7=*.i_adtag:[12].*))"
"$SET(1=-hide)$SET(3=
"• \0-head\1: \4\2\3"

Here is a link to the thread started by another Proxo user:

http://www.dslreports.com/forum/r2159834...e#21600467

When I disable the above filter, the odd-looking post by lilhurricane in that thread displays normally. The main Security forum page at dslr was also displaying very strangely; when I disabled web filters in Proxo, it displayed properly. That was 12 hours ago. Since then the page has sporadically displayed correctly, even after I re-enabled web filters and before I knew which filter was the culprit. I have now found the culprit for the thread, but I don't know whether it also caused the weird display on the Security forum main page. I am certain, though, that the weird display in the bugs forum thread is caused by this filter.

I believe this is the same filter that caused the same problem at another site a year or more ago. I posted about it in Sidki's forum, and he referred me to another thread and said it was fixed in the latest filters, which I didn't have. So I wonder whether this is a somewhat frequent problem where this filter has to be tweaked?
Dec. 22, 2008, 11:02 AM
Post: #2
RE: Need help with filter causing problem
Bump...
Dec. 25, 2008, 02:52 PM
Post: #3
RE: Need help with Sidki filter causing problem
I've added a character limit (23, can be adjusted down to ~16).
That should fix your issue and similar ones.
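
To illustrate what such a character limit does: in the decomposed filter later in this thread the bound appears to be the `{0,23}` after the keyword alternatives. Below is a rough Python analogue using the standard `re` module rather than Proxomitron's matcher. The function name, the reduced keyword list, and the sample strings are assumptions made for illustration and are not taken from the filter set.

```python
import re

def looks_like_ad_header(text: str, limit: int = 23) -> bool:
    """Hypothetical analogue of a keyword followed by a bounded header run."""
    pattern = re.compile(
        r"(?:ads\sby\s|advertisem|sponsor(?!ed[a-z]))"  # reduced keyword list (assumption)
        rf"[a-z0-9 ]{{0,{limit}}}"                      # at most `limit` further header characters
        r"(?![a-z0-9 ])",                               # the run has to end within that bound
        re.IGNORECASE,
    )
    # Anchor at the start of the text node, as the filter does after skipping tags.
    return pattern.match(text.strip()) is not None


print(looks_like_ad_header("Sponsored links"))
print(looks_like_ad_header("Sponsors of the annual security conference met yesterday"))
print(looks_like_ad_header("Sponsored by our partners", limit=16))
```

Without the bound, an ordinary forum post that merely starts with one of the keywords could be treated as an ad header. With it, only short header-like strings qualify, and lowering the limit makes the check stricter.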


Attached File(s)
.txt  ad-headers_12-25.txt (Size: 1.47 KB / Downloads: 954)
Dec. 26, 2008, 10:20 AM
Post: #4
RE: Need help with Sidki filter causing problem
I have to admit I looked at that filter for a long time.
I just couldn't wrap my head around the matching expression.

My only consolation was that the HTML for this type of ad can vary a lot.
I've tried to write a similar filter for quite some time without success.
Perhaps one day I'll figure out how your filter works. :)

z12
Dec. 26, 2008, 12:30 PM
Post: #5
RE: Need help with Sidki filter causing problem
I don't think the code itself is difficult to understand. It's the subroutines that make the core difficult to spot. When I don't understand my own code, I decompose the respective filter.

Code:
// Comment: Only check these three containers
<(div|td|center)\0

// Comment: Don't match within scripts, comments, noscript blocks
(^$TST(script=*)|$TST(comment=1)|$TST(tNoscript=1))

// Comment: Subroutine 1 start -- Where to apply the Core RegExp
[^>]+(

// Comment: Skip certain tags
(> <(font|br+|img|h[1-6]|p|s(mall|pan|trong)|!--[^\n]++--)\6[^>]+)

+>

// Comment: Now test the code after the first tag
// Comment: And the second tag, unless it's a comment or our container is closing
( (^<(!-|/+(div|td|center)))[^>]+(>)\4)++{0,1}

// Comment: Fail on opening tags -- Skip HTML entities and non-characters
(^<)(\&[a-z]+; |[^a-z])+

)\8
// Comment: Subroutine 1 end

// Comment: Core RegExp
(
(a(d(vert(isers|s|)|s|)(^-) |n(nunci|zeigen+ ))|marketplace )
(^[a-z0-9ä_+])
|
(
ad(s\sby\s|vert(enti|isem))
|pubb+lici(dad|t[? eé&])
|(\w |)sponsor(^ed[a-z])
|(from|visit) our (advertiser|partner|sponsor)
)
[a-z0-9 ]+{0,23} (^[a-z0-9])
)\2

// Comment: Subroutine 2 start -- Either write back all open tags, or just hide the matched tag
(
// Comment: Hide-or-remove switch
$TST(keyword=(^*.i_adtag:[#*:0].)\7)

// Comment: Remove
$INEST(<$TST(\0),(*(

// Comment: Push unclosed tags into stack
<(t(able|body|foot|d|r|h)|div)\5$INEST(<$TST(\5),</$TST(\5))</$TST(\5) >
|(<(/|)(t(able|body|foot|d|r|h)|div)*> )\#

))+*,</$TST(\0))</$TST(\0) >$TST(\0=td$SET(#=<td style="height:0;padding:0">)$SET(3=</td>)|*)
|

// Comment: Hide
(^$TST(\7=*)&$TST(\4=>)|$TST(\6=span)|$TST(\7=*.i_adtag:[12].*))
$SET(1=-hide)$SET(3=<\0 style="display:none!important"\8\2)
)
// Comment: Subroutine 2 end

// Comment: Log line
($TST(volat=*.log:2*)$ADDLST(Log-Main,[$DTM(d T)]\tWEB Ad-Head\1 \0 \t\6 \4\2 \t\u)|)
Dec. 26, 2008, 05:09 PM
Post: #6
RE: Need help with Sidki filter causing problem
Thanks for the explanation.

Indeed, I wasn't sure what the intent of the $INEST subroutine was.
Mentally, my biggest problem was determining the scope of the match.
The variable nature of it confused me, and frankly, still does.
I desperately wanted to see an & or &&. It's just the way my brain works.

Armed with your explanation, I plan on doing some testing till I get it.
Thanks again.

z12
Dec. 27, 2008, 11:11 AM
Post: #7
RE: Need help with Sidki filter causing problem
The "write back unmatched tags" subroutine works quite well and is part of several filters.
It's JD's idea (mentally pushing the "thanks JD" button).
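
A sketch of the write-back idea in Python, based only on one reading of the "Push unclosed tags into stack" part of the decomposed filter: when a block is removed, structural tags that have no partner inside the removed span are collected and emitted again so the surrounding table or div layout stays balanced. The function name and the simplified tag handling are assumptions, not the filter's actual logic.

```python
import re

# Tag names taken from the decomposed filter: t(able|body|foot|d|r|h) and div.
STRUCTURAL = {"table", "tbody", "tfoot", "td", "tr", "th", "div"}
TAG_RE = re.compile(r"<(/?)([a-z][a-z0-9]*)\b[^>]*>", re.IGNORECASE)

def unmatched_tags(fragment):
    """Return structural tags in `fragment` that have no partner inside it."""
    open_stack = []  # (position, name, tag text) for opens still waiting for a close
    stray = []       # (position, tag text) for closes without a matching open
    for m in TAG_RE.finditer(fragment):
        closing, name = m.group(1) == "/", m.group(2).lower()
        if name not in STRUCTURAL:
            continue
        if not closing:
            open_stack.append((m.start(), name, m.group(0)))
        elif open_stack and open_stack[-1][1] == name:
            open_stack.pop()  # balanced pair inside the fragment, nothing to write back
        else:
            stray.append((m.start(), m.group(0)))
    stray.extend((pos, text) for pos, _name, text in open_stack)
    return [text for _pos, text in sorted(stray)]  # document order


removed = "<table><tr><td>ad</td></tr></table></td><td>"
print("".join(unmatched_tags(removed)))
```

In this example the nested table is balanced and disappears with the ad, while the stray `</td><td>` pair is written back so the outer row keeps its cells.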

(Dec. 26, 2008 05:09 PM)z12 Wrote:  Mentally, my biggest problem was determining the scope of the match.
The variable nature of it confused me, and frankly, still does.

Ahh! The filter has two scopes.
Code:
<mytag> TEXT_NODE ( /* block scope */ $INEST(<mytag>,</mytag)</mytag > | /* tag scope */ )

You can also use it as fall-back if the entire block exceeds the filter's byte limit.
IIRC hpguru came up with it (mentally pushing the "thanks hpguru" button).
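
The two scopes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the filter's implementation: the function names, the two sample keywords, and the way `limit` is measured (from the end of the opening tag rather than over the whole match) are all simplifications.

```python
import re
from typing import Optional

def find_block_end(html: str, tag: str, start: int, limit: int) -> Optional[int]:
    """Index just past the close tag matching an already-open <tag>, or None
    if it does not occur within `limit` characters after `start`."""
    depth = 1
    window = html[start:start + limit]
    for m in re.finditer(rf"<(/?){tag}\b[^>]*>", window, re.IGNORECASE):
        depth += -1 if m.group(1) else 1
        if depth == 0:
            return start + m.end()
    return None

def filter_ad_container(html: str, limit: int = 4500) -> str:
    m = re.search(
        r"<(div|td|center)\b([^>]*)>\s*(?:advertisement|sponsored links)\b",
        html,
        re.IGNORECASE,
    )
    if m is None:
        return html
    tag, attrs = m.group(1), m.group(2)
    open_end = m.end(2) + 1  # index just past the ">" of the opening tag
    end = find_block_end(html, tag, open_end, limit)
    if end is not None:
        # Block scope: drop everything from the opening tag through its close tag.
        return html[:m.start()] + html[end:]
    # Tag scope (fall-back): rewrite only the opening tag so the container is hidden.
    hidden = f'<{tag} style="display:none!important"{attrs}>'
    return html[:m.start()] + hidden + html[open_end:]


page = '<div class="x">Advertisement<div>inner</div> buy</div><p>post</p>'
print(filter_ad_container(page))            # block scope
print(filter_ad_container(page, limit=10))  # tag scope, close tag is out of reach
```

The scope is chosen only after the text has matched: if the matching close tag is reachable, the whole block goes; otherwise the match shrinks to the opening tag alone.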

Quote:Thanks again.

You're welcome!

OT: So, it seems this forum is now rating its members by their social compatibility, aka thanks given/received stats. I guess it will take me a while to get used to it. ;)

Anyway, let me use this opportunity to thank you for your inventive JavaScript ideas. Quite a few of them are implemented in "proxjs-full.js". :)
Dec. 27, 2008, 08:01 PM
Post: #8
RE: Need help with Sidki filter causing problem
(Dec. 27, 2008 11:11 AM)sidki3003 Wrote:  OT: So, it seems this forum is now rating its members by their social compatibility, aka thanks given/received stats. I guess it will take me a while to get used to it. ;)

OT: Don't worry; I get the feeling you'll rack up "Thanks Received" in no time. It's not really a rating; it's more a way for people to give thanks for help without having to post. Perhaps we can think about hiding the stats.
Dec. 28, 2008, 01:05 PM
Post: #9
RE: Need help with Sidki filter causing problem
sidki3003 Wrote:Ahh! The filter has two scopes.

That's what confused me, thanks for the clarification.
I think I can visualize the matching expression now.
As I understand it, the scope of this filter is determined after the "text match".

It's a method I've never really considered before.
It seems a rather clever way to avoid the "byte limit" issue with the outer matching tag.
I need to grok this. :)

z12
Jan. 08, 2009, 03:11 AM
Post: #10
RE: Need help with Sidki filter causing problem
(Dec. 25, 2008 02:52 PM)sidki3003 Wrote:  I've added a character limit (23, can be adjusted down to ~16).
That should fix your issue and similar ones.

Thank you so much! :) Sorry for the tardy reply. After a week went by with only a bump, and then Christmas came along, I sort of forgot about it. I'll download and install the fix in a few minutes and let the poster at dslr who first brought this up know. They might already know, because I gave the link to this thread when I posted about it there. Maybe they have kept up with it.