Some text matching questions
Sep. 17, 2008, 04:18 PM
Post: #16
RE: Some text matching questions
Line 1 checks for the beginning of a tag and sets a variable to "1" when the start of a tag is matched and empties the variable when an end of tag is matched.
It does not "instructs it not to match anything while the variable is set to '1'".
(<$SET(open=1)|>$SET(open=))PrxNeverMatch
or
(<$SET(open=1)PrxNeverMatch|>$SET(open=)PrxNeverMatch)
Line 5 is the parenthesis that starts the part that places things following "pink" into variable "1".
Line 7 requires pink(s|) to be followed by: a space, a <, an 's, or a character that is not a letter or digit.
Line 9 does not match when called while inside a tag. (^$TST(open=1)) "NOT if open is 1".
For visual clarity. Seems to have worked.
|||
|
Sep. 19, 2008, 04:19 AM
Post: #17
RE: Some text matching questions
Yes, the separate lines help.
But there is still some confusion:
After re-reading Kye-U's explanation of the purpose of "PrxNeverMatch" I sort of see what it's used for, but I don't understand how it functions specifically. PrxNeverMatch is not mentioned in the Prox help file, and a Google search only nets 7 results, none of them explanations.
Does it simply make sure that the characters "<" and ">" are excluded when the non-digits/letters are matched later? Does it always follow whatever is not to be matched? And since it follows the entire parenthetical expression, why aren't the $SETs negated?
Why is (s|) necessary? Why not match (pink|pinks) ?
Actually, I don't understand why the "|" after PrxNeverMatch is necessary either. An "or" would seem to match one but not the other.
|||
|
Sep. 19, 2008, 11:44 AM
Post: #18
RE: Some text matching questions
(Sep. 19, 2008 04:19 AM)Guest Wrote: Why is (s|) necessary? Why not match (pink|pinks) ?
technically, you can do it EITHER way... (pink|pinks) might be "easier" to SEE what is happening when 'reading' a filter... but i "suspect" (no proof) that pink(s|) makes the filter FASTER... i tend to use the (s|) tidbit more often than not under that "assumption" anyway, lol...
in the web filter editor page, there is a "TEST" option that will show you how "fast" your filter 'parses' your input. if you wish to "test" the speeds of the two methods, we'd be delighted to hear the results
|
|||
|
Sep. 20, 2008, 02:30 AM
Post: #19
RE: Some text matching questions
I tried it (not sure I did it right), with these results:
pink((s|)
Sample Text: 30000 bytes
Successful Matches: 27
Avg time: 2.969000 (milliseconds)
((pink|pinks)
Sample Text: 30000 bytes
Successful Matches: 27
Avg time: 3.140000 (milliseconds)
So yes, a bit faster. It's a little comforting that (pink|pinks) will work though, because the other method still doesn't make sense to me. It seems like it should be pink(|s) if anything.
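The question of (s|) versus (|s) comes down to which alternative is tried first. The sketch below uses Python's `re` module purely as an analogy: Python's syntax and engine are not Proxomitron's, and the helper name `first_match` is made up for illustration. It only shows the general idea that alternatives are tried left to right, and that an empty alternative always succeeds when it is reached.

```python
import re

def first_match(pattern: str, text: str) -> str:
    """Return what the pattern matches at the start of text, or '' if nothing."""
    m = re.match(pattern, text)
    return m.group(0) if m else ""

# Alternatives are tried left to right; the first one that works is kept.
print(first_match(r"(pink|pinks)", "pinks"))
print(first_match(r"(pinks|pink)", "pinks"))
print(first_match(r"pink(s|)", "pinks"))
print(first_match(r"pink(|s)", "pinks"))

# When something is required after the word (here a word boundary),
# Python's engine goes back and tries the next alternative.
print(first_match(r"(pink|pinks)\b", "pinks"))
print(first_match(r"pink(|s)\b", "pinks"))
```

In Python, the first four calls show that the order of the alternatives decides whether the trailing "s" is consumed. The last two show that a following condition can force the engine to retry with the other alternative, which may be why both forms reported the same number of matches in the test above. Whether Proxomitron retries in exactly the same way is an assumption here, not something the thread confirms.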
|||
|
Sep. 20, 2008, 03:36 AM
Post: #20
RE: Some text matching questions
|
Sep. 20, 2008, 04:23 AM
Post: #21
RE: Some text matching questions
We are trying to not match the text inside tags.
Code:
<a href="http://www.yourpsp.com/pink/" class="external text" title="http://www.yourpsp.com/pink/" rel="nofollow">Official
If we give a variable a value of 1 when a < is found and no value when a > is found, then whenever the variable is 1 Proxo is probably inside a tag.
Looking at the code:
Proxo finds a < and sets the variable.
Eventually it finds pink/ but we don't want that one changed. We need to break that match. So we have Proxo check its memory. Variable is 1, match is probably inside a tag. No match.
Another pink/. Variable is 1, don't match.
Next Proxo finds a > and clears the variable.
Later Proxo finds Pink. Proxo checks, finds an empty variable, and matches.
"PrxNeverMatch"? We need to know about < and > but we don't need to replace them. There'd be log spam and debug spam from the unnecessary matches. Match and replace would work, but it's not very neat and might be slower.
Quote: why aren't the $SETs negated?
Cause Mr. Lemmon didn't want them negated. This is a very useful trick.
Quote: Why not match (pink|pinks)
Cause pinks would never match... Think about it... Proxo will always check for what first? pink would always match. You could use (pinks|pink). I was following the crowd, (s|). There is always more than one way.
Quote: should be pink(|s) if anything
It may depend on the rest of the expression but you will probably always want (s|). Nothing always matches, and in (|s) it will always be checked first.
Quote: Actually, I don't understand why the "|" after PrxNeverMatch is necessary either. An "or" would seem to match one but not the other.
The filter is looking for a <, a >, or some pink(s|). If you remove the OR, all matches must start with < or > and would be doomed due to "PrxNeverMatch".
You could use two filters. One to set the variable and another to look for some pink(s|). You could forget variables, match from > to <, and check for pink(s|). You could use the stack or recursion or ???. There is always more than one way.
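The walk-through above can be sketched as a small scanner. This is a Python analogy under stated assumptions, not the actual filter: the function name `replace_outside_tags`, the replacement word "red" (borrowed from the thread's pink/red example), and the sample HTML are all made up for illustration. The `open_flag` variable plays the role of the filter's "open" variable.

```python
import re

def replace_outside_tags(html: str, word: str = "pink", new: str = "red") -> str:
    """Replace `word` with `new` unless the scanner is currently inside a tag."""
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    out = []
    open_flag = ""  # "1" after a '<' was seen, empty again after a '>'
    i = 0
    while i < len(html):
        ch = html[i]
        if ch == "<":
            open_flag = "1"      # like $SET(open=1); the '<' itself is kept as-is
        elif ch == ">":
            open_flag = ""       # like $SET(open=)
        else:
            m = pattern.match(html, i)
            if m and open_flag != "1":   # like (^$TST(open=1)): only outside tags
                out.append(new)
                i = m.end()
                continue
        out.append(ch)
        i += 1
    return "".join(out)

sample = '<a href="http://example.com/pink/" title="pink">Official Pink site</a>'
print(replace_outside_tags(sample))
```

The two occurrences of "pink" inside the tag are left alone because the flag is "1" when they are reached, while "Pink" in the link text is replaced. The sketch ignores letter case in the replacement and does not handle a ">" inside an attribute value, which is the same "probably inside a tag" caveat the post mentions.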
|||
|
Sep. 22, 2008, 03:06 AM
Post: #22
RE: Some text matching questions
Thanks to everyone for the detailed explanations. Much appreciated. Having it explained in actual language helps, and I'm slowly getting it.
Something finally dawned on me. The filter completes its check (meaning it goes through the entire filter) 7 characters after a match, then begins again with the next character it encounters. I understood it when I read "...we have Proxo check its memory." That can't occur until the latter part of the filter expression. For some reason I was thinking of it checking the entire page while still within the initial iteration of the filter. Yes, stupid I know.
I'll just accept that PrxNeverMatch does what it does. Doubtful I'd know how to apply it in a non-similar instance though.
|||
|
Sep. 22, 2008, 03:33 PM
Post: #23
RE: Some text matching questions
(Sep. 22, 2008 03:06 AM)Vendettta Wrote: Yes, stupid I know.
These things are new to you. "Stupid" is the wrong word.
(Sep. 22, 2008 03:06 AM)Vendettta Wrote: I'll just accept that PrxNeverMatch does what it does. Doubtful I'd know how to apply it in a non-similar instance though.
"PrxNeverMatch" simply breaks the match after the variable has been set. You could use any string or phrase that wouldn't match, (^), $TST(). The variable's value will persist and can be used later.
Filters test data as it arrives, "on the fly". You can buffer 32766 bytes.
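The idea of "set the variable, then break the match" can be modelled in a few lines. The sketch below is a Python analogy only: `set_var`, `never_match`, and `tag_branch` are hypothetical stand-ins, and it assumes that $SET counts as a successful step, which matches how the thread describes it.

```python
variables = {"open": ""}

def set_var(name: str, value: str) -> bool:
    """Stand-in for $SET: store the value and report success."""
    variables[name] = value
    return True

def never_match() -> bool:
    """Stand-in for PrxNeverMatch: a step that always fails."""
    return False

def tag_branch(ch: str) -> bool:
    """Rough analogue of (<$SET(open=1)|>$SET(open=))PrxNeverMatch."""
    if ch == "<":
        hit = set_var("open", "1")
    elif ch == ">":
        hit = set_var("open", "")
    else:
        hit = False
    # The branch as a whole fails, but the variable has already been changed.
    return hit and never_match()

for ch in "<b>":
    print(repr(ch), tag_branch(ch), repr(variables["open"]))
```

Every call reports "no match", so nothing is replaced and nothing is logged, yet the value left in `variables["open"]` is still there for a later test.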
|||
|
Sep. 23, 2008, 08:23 PM
Post: #24
RE: Some text matching questions
Maybe it would be better to use "^?" instead of "PrxNeverMatch", to be sure it would never, ever match?
|
|||
|
Sep. 24, 2008, 05:07 AM
Post: #25
RE: Some text matching questions
Using what I've learned from the pink/red example, I've been trying to find a solution to my other original question (replacing exclamation points with periods), and I still can't get anything to work that doesn't destroy javascript and comments. The closest to success has been this one:
Code:
Match = (<$SET(open=1)|>$SET(open=))PrxNeverMatch
It's clearly modeled on the red/pink filter, but the ones I created from scratch were disasters. The above example does seem to preserve javascript and comments, and it replaces many "!"s, but some are left untouched, including ones at the end of paragraphs, in header text, and consecutive exclamation points.
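For comparison, here is one way to state the goal outside of Proxomitron: replace exclamation points only in visible text, leaving comments, scripts, style blocks, and tags untouched. This is a Python sketch under assumptions, not a translation of any filter in this thread. The names `PROTECTED` and `calm_exclamations` are made up, and collapsing a run such as "!!!" into a single "." is a choice made for the example.

```python
import re

# Spans that must be passed through unchanged: comments, script and style
# blocks, and any other tag. The capturing group makes re.split keep them.
PROTECTED = re.compile(
    r"(<!--.*?-->|<script\b.*?</script\s*>|<style\b.*?</style\s*>|<[^>]*>)",
    re.DOTALL | re.IGNORECASE,
)

def calm_exclamations(html: str) -> str:
    """Turn each run of '!' into '.' in text outside the protected spans."""
    parts = PROTECTED.split(html)
    # With one capturing group, even indexes are plain text and odd indexes
    # are the protected spans themselves.
    for i in range(0, len(parts), 2):
        parts[i] = re.sub(r"!+", ".", parts[i])
    return "".join(parts)

page = ('<!-- note! --><h1>Sale!</h1><p>Wow!!! "Great" ! </p>'
        '<script>if (a != b) { alert("hi!"); }</script>')
print(calm_exclamations(page))
```

The sample covers the cases reported as troublesome: an exclamation point in header text, a run of several, and ones following a space or a quotation mark. The `!=` and the string inside the script block stay as they were.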
|||
|
Sep. 24, 2008, 02:46 PM
Post: #26
RE: Some text matching questions
(Sep. 23, 2008 08:23 PM)lnminente Wrote: Maybe would be better to use "^?" instead of "PrxNeverMatch", to be sure it would never never match?
Why not both? You could use PrxNeverMatch(^?). PrxNeverMatch would explain things better than (^?).
Sidki's set appears to use PrxFail$TST(). PrxFail probably fails quicker than $TST(). $TST() probably fails quicker than (^?). PrxNeverMatch probably fails quicker than (^?).
(\s|?) says "space or any single byte". Seems unnecessary?
Code:
[Patterns]
(Sep. 24, 2008 05:07 AM)Vendetta Wrote: but some are left untouched, including ones at the end of paragraphs, in header text,
Some URLs would help.
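The difference between a literal that merely should not appear and a construct that cannot succeed is easy to show in Python. This is an analogy only: `(?!)` is Python's empty negative lookahead, used here as a stand-in for a guaranteed failure, and the names `SENTINEL`, `IMPOSSIBLE`, and `can_match` are made up for the example.

```python
import re

# A literal sentinel only "never matches" while the data never contains it.
SENTINEL = re.compile(r"PrxNeverMatch")

# An empty negative lookahead cannot succeed on any input.
IMPOSSIBLE = re.compile(r"(?!)")

def can_match(pattern, text: str) -> bool:
    """Report whether the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None

print(can_match(SENTINEL, "an ordinary page"))
print(can_match(SENTINEL, "a forum post that quotes PrxNeverMatch"))
print(can_match(IMPOSSIBLE, "a forum post that quotes PrxNeverMatch"))
```

A page like this thread, which spells out the sentinel word, is exactly the kind of input where the literal alone could match. That is the case for pairing the readable word with a construct that is guaranteed to fail.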
|||
|
Sep. 25, 2008, 01:52 AM
Post: #27
RE: Some text matching questions
That one is significantly better. It gets most of them while avoiding javascript. I'm testing it on this page.
Out of 16 exclamation points, it left 3. They were after a space and after quotation marks.
|||
|
Sep. 25, 2008, 02:26 AM
Post: #28
RE: Some text matching questions
(Sep. 25, 2008 01:52 AM)Vendettta Wrote: Out of 16 exclamation points, it left 3. They were after a space and after quotation marks.
So then,
Code:
[Patterns]
|
|||
|
Sep. 25, 2008, 03:56 AM
Post: #29
RE: Some text matching questions
Perfect
Now if I only understood the fourth line. The first part seems clear: When "!" is matched, a replacement stack variable is set to a space and "." Next looks like something about a run of repeating characters from one to infinity in number, but it's not apparent how the whole line should be expressed in a logical sequence.
|||
|
Sep. 25, 2008, 09:24 AM
Post: #30
RE: Some text matching questions
Yes!! You said probably, and now I have tested it and you were right: both is better.
As you said, PrxNeverMatch or PrxFail gives the filter more speed, and adding (^?) or $TST() afterwards gives you absolute certainty that it will never match. Thanks!