Addings/modifications to help files
Jul. 20, 2009, 03:35 AM
Post: #16
Quote: 10a Loops -- Limiting expression scopes.

No, "+" doesn't look ahead; it just repeats the preceding expression blindly. The help file says:

Quote: An important point to make about + is that it's a "blind" run. This means it repeats at long as the condition it's testing is true regardless of anything the follows it!

"Regardless of anything the follows it" means it doesn't look ahead. BTW, the wording should probably be "regardless of anything that follows it".

Quote: Example:

I think "*" is equal to "?++" in Prox language, and the help file says:

Quote: A double-plus acts much like the single "+" plus except it also pays attention to what comes afterwards (it can "see" so to speak).

So the "*" itself looks ahead. To match a string up to the next ">", "[^>]+>" is faster than "*>", because "+" just blindly repeats the "[^>]" while "*" has to check after each matched character whether it is followed by a ">".

Quote: 10b Avoiding superfluous tests in OR conditions.

I don't understand what the examples are trying to show. A more detailed example might help.
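To make the "+" versus "++" distinction discussed above concrete, here is a minimal Python sketch of the two behaviours as the help file describes them. It only models the semantics and says nothing about speed inside Proxomitron. The function names `blind_run` and `seeing_run` and the character-predicate interface are hypothetical, invented for this illustration.

```python
def blind_run(text, pos, matches):
    """Model a blind "+" run: consume while `matches` holds,
    never checking what comes after the run."""
    while pos < len(text) and matches(text[pos]):
        pos += 1
    return pos


def seeing_run(text, pos, matches, follower):
    """Model a "++" run: before each step, check whether `follower`
    starts here, and stop at the first position where it does.
    Returns the end of the run, or None if the follower is never found."""
    while pos <= len(text):
        if text.startswith(follower, pos):
            return pos
        if pos == len(text) or not matches(text[pos]):
            return None
        pos += 1
    return None


tag = '<a href="x">link</a>'
not_gt = lambda ch: ch != ">"     # stands in for [^>]
any_char = lambda ch: True        # stands in for ?

blind_end = blind_run(tag, 1, not_gt)            # like [^>]+
seeing_end = seeing_run(tag, 1, any_char, ">")   # like ?++ followed by >
runaway_end = blind_run(tag, 1, any_char)        # like ?+ : runs to the end
```

In this model both `blind_end` and `seeing_end` stop just before the first ">", while a blind run over "any character" swallows the rest of the text, which is why a blind "?+" cannot be followed by anything.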
Jul. 20, 2009, 06:11 AM
(This post was last modified: Jul. 20, 2009 06:59 AM by sidki3003.)
Post: #17
(Jul. 20, 2009 03:35 AM) whenever Wrote: No, "+" doesn't look ahead, it just repeats the preceding expression blindly. The help file has below words:

Sure it doesn't. It *removes* look-ahead capability from subexpressions like "*>". Have a look at the examples below that statement.

Quote: I think "*" is equal to "?++" in prox language while the help file has:

Effectively yes. Speedwise the difference is around an order of magnitude.

Quote: So, The "*" itself looks ahead.

Right. That's what we need to get rid of in the example situations mentioned.

Quote: To match a string until meet a ">", "[^>]+>" is faster than "*>" because "+" just blindly repeat the "[^>]" while "*" has to check after each character match if it is followed by a ">".

Now that we have made "*" blind, it's much faster than "[^>]+>". A more accurate expression for "making blind" is: limiting the subexpression's scope, so that, after the initial match, there is nothing left to look ahead to.

Quote: I don't understand what the examples are trying to show.

\*\*\*+{98} instead of \*+{100} makes the expression start with two unique chars, which is what we want.

Quote: A more detailed example might help.

I couldn't think of any. I should also note that techniques.txt is addressing advanced filter writers, who know the help files (i know them too). (No one else would be interested in such things anyway.)

Thanks for looking at that draft. I assume that its content is logically correct (all statements have been tested, of course), but maybe some wordings and/or examples could be improved to make it easier to understand.
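Regarding the \*\*\*+{98} versus \*+{100} remark above: the two forms should accept exactly the same input, and only the leading literal characters differ. A rough check using Python's `re` module as a stand-in follows. The mapping from Proxomitron's `+{n}` to the regex `{n}` is an assumption (that `+{n}` means "repeat n times"), and the check covers only what the patterns match, not how Proxomitron selects or speeds up expressions.

```python
import re

# Assumed Python analogues of the two Proxomitron expressions:
#   \*+{100}     ->  \*{100}
#   \*\*\*+{98}  ->  \*\*\*{98}   (two literal stars, then 98 more)
single = re.compile(r"\*{100}")
prefixed = re.compile(r"\*\*\*{98}")


def same_verdict(s):
    """True when both patterns agree on whether `s` matches in full."""
    return bool(single.fullmatch(s)) == bool(prefixed.fullmatch(s))


samples = ["*" * n for n in (0, 1, 2, 99, 100, 101)]
all_agree = all(same_verdict(s) for s in samples)
```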
Jul. 20, 2009, 07:54 AM
Post: #18
(Jul. 20, 2009 06:11 AM) sidki3003 Wrote: (Jul. 20, 2009 03:35 AM) whenever Wrote: No, "+" doesn't look ahead, it just repeats the preceding expression blindly. The help file has below words:

Yes, the subexpression "*>" doesn't look ahead when you suffix it with a "+". That is similar to atomic grouping in general regex flavors, but I think the "*" *within* the subexpression still looks ahead until it finds the first ">". That's where I think it is slower than "[^>]+>".

Quote: \*\*\*+{98} instead of \*+{100} makes the expression start with two unique chars, which is what we want.

That's interesting, although I still don't understand why. Better change the doc to:

Quote: Example:
Jul. 20, 2009, 08:18 AM
Post: #19
(Jul. 20, 2009 07:54 AM) whenever Wrote: Yes, the subexpressions "*>" doesn't look ahead when you suffix it with a "+", it is similar to Atomic Grouping in general regex flavors, but I think the "*" *within* the subexpressions still looks ahead until it finds the first ">". That's where I think slower than "[^>]+>".

Significantly faster. Which is the point of entire chapter 10. Test it.

Quote: Better change the doc to:

Done (although basic Prox).
Jul. 20, 2009, 03:44 PM
(This post was last modified: Jul. 21, 2009 01:40 AM by whenever.)
Post: #20
(Jul. 20, 2009 08:18 AM) sidki3003 Wrote: Significantly faster. Which is the point of entire chapter 10. Test it.

A test proved I was wrong. "*>" is much faster than "[^>]+>"; even "?++>" is faster than "[^>]+>". This is totally different from what I know about how common regex flavors behave with greedy vs. lazy quantifiers. It seems "*" in Prox is not simply the ".*?" of common regex flavors, and that Scott made a special optimization for it.

"Look around" in common regex flavors doesn't consume characters. I don't think that is what you mean by "look ahead" in your docs, so my suggestion is as below:

Quote: 10a "+" -- Suppressing expression match attempts
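For readers coming from common regex flavors, the greedy vs. lazy comparison mentioned in this post looks roughly like this in Python's `re` module. This is only the conventional-regex side of the comparison, not Proxomitron syntax, and the sample string is made up for illustration.

```python
import re

html = '<img src="a.gif" alt="x"> text <b>bold</b>'

# Greedy negated class: runs over every non-">" character in one go.
negated = re.compile(r"<[^>]+>")
# Lazy dot: extends one character at a time, testing for ">" at each step.
lazy = re.compile(r"<.*?>")

negated_tags = negated.findall(html)
lazy_tags = lazy.findall(html)
```

On this input both patterns find the same three tags. Which one is faster depends on the engine, so a measurement with `timeit` applies only to Python, not to Proxomitron, where the test reported above came out the other way.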
Jul. 21, 2009, 06:02 PM
Post: #21
Good points!
Rephrasing 10b was easy:

Code:
prefix(-possible_suffix|)\1*some_string

I do have problems integrating the changes into 10a. Although your suggestion is more exact, i perceive it as harder to understand than before. Also, i'd like to keep the "scope limiting" part, because it describes the actual process nicely (and because Scott was using it frequently too). I'll revisit it later.