Author Topic: Super Opener - Text Links Only, work in progress  (Read 4533 times)

altosax

« on: May 25, 2002, 03:11:50 PM »
hi all members,
i've started working on superopener to create a text-links-only version. i'll post all the changes i make in this thread so we can discuss them. this filter is really hard to study, but i think there are some filter gurus here who can help me.

first of all, excuse me for the long (and probably hard to read) message. i know jd5000's solution, but understanding his filter is more or less as hard as understanding the bpm original, so i started from the bpm version, trying not to introduce errors.

this is the second of the three filters in the original set, in its unmodified form. all my remarks refer to this filter.



Name = "Links >^ SUPER-OPENER BETA 37 (aB) (bC)"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Bounds = "<as*(<(\|)/a>|(<as))(^<!-- BPM_(W|A) -->)"
Limit = 450
Match = "<a"
        "("
        "([^>]++(shref=$AV(*))1[^>]+>)"
        "&&"
        "((shref=$AV(*)|starget=$AV(_blank|_new)"
        "|(sclass=$AV(*))2|(sstyle=$AV(*))3|((s[^ ]+)|>)#))+"
        ")"
        ""
        "( $NEST(<,(^(\|)/a(^?))*,> )+)4"
        ""
        "("
        "<(/a>|as)"
        "$SET(6= class="BPM-supero-d")"
        "$SET(7=&loz;)"
        "|"
        "(&[^; ]++; |[^<] )5"
        "($NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)$SET(7=<font size=-2>&loz;</font>)"
        "|"
        "(( (&[^; ]++;|[^<])"
        "$NEST( <,(^(\|)/a(^?))*,>)+)++{1,2})5"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)"
        "|"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+"
        "(&[^; ]++;|[^<])"
        " (&[^; ]++;|[^<]|$NEST(<,(^(\|)/a(^?))*,> )+)++ )5"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+"
        "(&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)"
        ")"
Replace = "<a123@45</a>"
          "<a id=BPM-supero6"
          " title=Open?in?new?window"
          " target=_blank123>"
          "78</a>"
          "<!-- BPM_W -->"



changelog and explanations:

1. removed the filter "Comments >^ Remove temporary proxomitron comment tags (C)"

it's the third part of the superopener filter set. you don't really need it, because it only removes the short comment "<!-- BPM_W -->" that "Links >^ SUPER-OPENER BETA 37 (aB) (bC)" adds to each link it matches. but consider the trade-off: with "Comments" active, that filter is checked once more at every character of the web page that the previous filters don't match (so the number of extra checks equals the number of non-matching characters in the page); with "Comments" disabled, every active filter is checked at each character of "<!-- BPM_W -->", for each link matched by superopener. the number of extra checks in this case is (14 x N x M), where 14 is the length of "<!-- BPM_W -->", N is the number of active filters and M is the number of links matched in the page. so disabling "Comments" only pays off if you also stop the "Links" filter from adding "<!-- BPM_W -->".
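to make the trade-off concrete, here is a minimal python sketch of the count above. the function name and the sample numbers (40 active filters, 120 matched links) are made up for illustration; only the 14 x N x M formula comes from the reasoning above.

```python
MARKER = "<!-- BPM_W -->"  # the temporary comment added by the "Links" filter


def extra_checks_without_comments_filter(active_filters: int, matched_links: int) -> int:
    """Extra filter checks caused by leaving the marker in the page.

    Each character of the marker is tested against every active filter,
    once per link that superopener matched: len(MARKER) x N x M.
    """
    return len(MARKER) * active_filters * matched_links


if __name__ == "__main__":
    # hypothetical page: 40 active filters, 120 links matched by superopener
    print(len(MARKER))
    print(extra_checks_without_comments_filter(40, 120))
```

the cost grows with both the size of the filter set and the number of links, which is why the marker is worth removing at the source.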

2. removed (^<!-- BPM_(W|A) -->) from Bounds
3. removed <!-- BPM_W --> from Replace
4. removed Multi = TRUE

these changes are closely related: superopener adds <!-- BPM_W --> to avoid matching again the code it has just written. it needs Multi = TRUE so that other filters can still match the <a> tags, but then it also needs protection for the <a> tag that it appends after the link, otherwise it would loop forever (this is the reason for the ^ negation in the Bounds). so you can safely remove Multi = TRUE only if superopener is placed at the very end of your filter set. from this point on i'll assume that you have placed superopener at the end of your filter set.

5. simplified Bounds to "<as*(/a>|(<as))"

because of the wildcard * there is no need for <(\|) in the first alternative. i've left (<as) as the second alternative because it lets superopener also match erroneously nested links of the form <a..<a.., correcting them in the replacement code. this is the first point where i don't agree with jd5000's solution; anyway, the discussion is open and suggestions are welcome.

- note about Bounds: it is possible to add an exclusion for images to the Bounds, something like (^<img*>), but my goal is to remove from superopener all matching code related to image links, so i can't add this exception yet: first i have to find and remove the unnecessary code. i'll add the exception only at the end.

ok, this is where i stopped my first session. here is the current result:



Name = "Super Opener - Text Links Only 2"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "<as*(/a>|(<as))"
Limit = 450
Match = "<a"
        "("
        "([^>]++(shref=$AV(*))1[^>]+>)"
        "&&"
        "((shref=$AV(*)|starget=$AV(_blank|_new)"
        "|(sclass=$AV(*))2|(sstyle=$AV(*))3|((s[^ ]+)|>)#))+"
        ")"
        ""
        "( $NEST(<,(^(\|)/a(^?))*,> )+)4"
        ""
        "("
        "<(/a>|as)"
        "$SET(6= class="BPM-supero-d")"
        "$SET(7=&loz;)"
        "|"
        "(&[^; ]++; |[^<] )5"
        "($NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)$SET(7=<font size=-2>&loz;</font>)"
        "|"
        "(( (&[^; ]++;|[^<])"
        "$NEST( <,(^(\|)/a(^?))*,>)+)++{1,2})5"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)"
        "|"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+"
        "(&[^; ]++;|[^<])"
        " (&[^; ]++;|[^<]|$NEST(<,(^(\|)/a(^?))*,> )+)++ )5"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+"
        "(&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)"
        ")"
Replace = "<a123@45</a>"
          "<a id=BPM-supero6"
          " title=Open?in?new?window"
          " target=_blank123>"
          "78</a>"



now some comments to the code:


"([^>]++(shref=$AV(*))1[^>]+>)"

it matches <a this_content>, no problem.
---------------

"&&"
"((shref=$AV(*)|starget=$AV(_blank|_new)"
"|(sclass=$AV(*))2|(sstyle=$AV(*))3|((s[^ ]+)|>)#))+"

it matches <a this_content>, no problem. at this point the <a..> tag is entirely stored in the 1, 2, 3 variables and the stack
---------------

"( $NEST(<,(^(\|)/a(^?))*,> )+)4"

here it is!! this code matches image links, because it matches <a..><this_image><even_doubled></a>. we need neither this line of code nor the 4 variable in the replacement. but if this line is removed, we need something to match <a..>this_text</a>.
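
a rough python analogue may help to see which links that line catches. it is not proxomitron syntax: the regular expression is only an approximation of "one or more tags that are not <a> or </a>, then the closing </a>", and the helper name is hypothetical.

```python
import re

# <a ...> followed only by tags (none of them <a> or </a>) and then </a>,
# i.e. a link with no visible text, typically an image link.
TAGS_ONLY_LINK = re.compile(
    r'<a\s[^>]*>(?:\s*<(?!/?a[\s>])[^>]*>)+\s*</a>',
    re.IGNORECASE,
)


def is_tags_only_link(html: str) -> bool:
    """Return True when the link body consists of tags only."""
    return TAGS_ONLY_LINK.fullmatch(html) is not None


if __name__ == "__main__":
    print(is_tags_only_link('<a href="u"><img src="x.gif"></a>'))
    print(is_tags_only_link('<a href="u">this_text</a>'))
```

a text link fails this test at its first character, so a text-only version needs a different branch to match <a..>this_text</a>.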
---------------

"<(/a>|as)"
"$SET(6= class="BPM-supero-d")"
"$SET(7=&loz;)"

if the link is an image link, this code matches the closing </a> tag or a possibly erroneous nested <a> tag, and sets 6 for the style and 7 to add the lozenge. in a text-only version we need only the first line, and can safely remove the class "BPM-supero-d" from the first of the 3 superopener filters. note also that the closing </a> tag, even when matched, is not stored: it is written explicitly in the replacement.
----------------

"(&[^; ]++; |[^<] )5"
"($NEST(<,(^(\|)/a(^?))*,> )+ )8"
"<((\|)/a>|as)$SET(7=<font size=-2>&loz;</font>)"

this seems to be an error. i think the second line of this code is unnecessary, but i still don't fully understand this block of code, so i won't draw any conclusions yet. btw, this could be the bug that sometimes adds the lozenge to text links.
----------------

"(( (&[^; ]++;|[^<])"
"$NEST( <,(^(\|)/a(^?))*,>)+)++{1,2})5"
"( (&[^; ]++;|[^<])"
" $NEST(<,(^(\|)/a(^?))*,> )+ )8"
"<((\|)/a>|as)"
"|"
"( (&[^; ]++;|[^<])"
" $NEST(<,(^(\|)/a(^?))*,> )+"
"(&[^; ]++;|[^<])"
" (&[^; ]++;|[^<]|$NEST(<,(^(\|)/a(^?))*,> )+)++ )5"
"( (&[^; ]++;|[^<])"
" $NEST(<,(^(\|)/a(^?))*,> )+"
"(&[^; ]++;|[^<])"
" $NEST(<,(^(\|)/a(^?))*,> )+ )8"
"<((\|)/a>|as)"

here is where i need your help. this code applies to text links, but i'm not sure we need such a complicated thing. do you agree? for example, i think we can remove (\|) from all $NEST commands.


that's all. thank you for your patience in reading all this. i'm looking forward to your comments and suggestions. i'm still testing the first result, mainly its compatibility with the rest of the filter set. it's still a text+images version but, as you all understand, this work has to be done step by step.

see you later in this thread,
regards to all,
altosax.



Edited by - altosax on 25 May 2002  16:16:22

Edited by - altosax on 25 May 2002  21:36:51
 

JD5000

    • http://home.satx.rr.com/jd5000/
« Reply #1 on: May 25, 2002, 07:02:01 PM »
Great idea Altosax. Hope we can figure this bugger out.

On the bounds, I've since changed it to "<as*/a>", without any known problems. I will try your solution out tho.

As I'm sure you know, but others might not: it is possible to remove the following code from the filter to disable the placement of the lozenges on images (but that is a workaround, not a fix).

"$SET(6= class="BPM-supero-d")"
"$SET(7=&loz;)"

Changing "( $NEST(<,(^(\|)/a(^?))*,> )+)4" to "( $NEST(<,*,> )+)4" also disables the lozenges on images.


quote:
"(&[^; ]++; |[^<] )5"
"($NEST(<,(^(\|)/a(^?))*,> )+ )8"
"<((\|)/a>|as)$SET(7=<font size=-2>&loz;</font>)"

this seems to be an error. i think the second line of this code is unnecessary, but i still haven't understood at all this block of code so i'll write nothing about my conclusions. btw, this could be the bug that sometimes add the lozenge to text links.


"bug that sometimes add the lozenge to text links", huh? I don't think I've come across that one. The only time it places a lozenge on a link is when the link only has one letter.



--------

"Imagination is more important than knowledge" - Einstein

altosax

« Reply #2 on: May 25, 2002, 08:29:16 PM »
hi jd5000,
thanks for reading my long post and for your reply. i wrote what follows offline, but i reached the same conclusion about that block of code. it is buggy anyway, because sometimes i've seen the lozenge added to text links, and cj told me the same.

btw, this is the post i wrote offline. excuse me for this, but this filter takes time to analyze and i can't write about it online.
------------------------------------------------------------------

hi all, friends.
i've tested my first modified version of superopener and everything works fine. it doesn't need long and hard testing, because the modifications i've made in this first phase are 100% safe. if you use superopener for both image and text links, here are my modified filters: replace the 3 filters of the original filter set with these 2 filters. note: you have to place them at the very end of your filter set, just before the filters that match <end>. they have exactly the same features as the 3 original ones but are much faster, because the main filter, the "Links" one, no longer has the option "Allow for multiple match" enabled.



Name = "Style - Super Opener 1"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Limit = 16
Match = "</head>|(<body)1$SET(2=<head>)"
Replace = "2"
          "<style type="text/css">"
          "#BPM-supero:hover { border: thin dotted #dd0000 }"
          "A.BPM-supero-d { font: 9pt verdana;"
          " color: #b22222; text-decoration: none;"
          " position: relative; top: -6px; left: -11px; }"
          "</style>"
          "</head>"
          "1"
          "$STOP()"



Name = "Links - Super Opener 2"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "<as*(/a>|(<as))"
Limit = 450
Match = "<a"
        "("
        "([^>]++(shref=$AV(*))1[^>]+>)"
        "&&"
        "((shref=$AV(*)|starget=$AV(_blank|_new)"
        "|(sclass=$AV(*))2|(sstyle=$AV(*))3|((s[^ ]+)|>)#))+"
        ")"
        ""
        "( $NEST(<,(^(\|)/a(^?))*,> )+)4"
        ""
        "("
        "<(/a>|as)"
        "$SET(6= class="BPM-supero-d")"
        "$SET(7=&loz;)"
        "|"
        "(&[^; ]++; |[^<] )5"
        "($NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)$SET(7=<font size=-2>&loz;</font>)"
        "|"
        "(( (&[^; ]++;|[^<])"
        "$NEST( <,(^(\|)/a(^?))*,>)+)++{1,2})5"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)"
        "|"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+"
        "(&[^; ]++;|[^<])"
        " (&[^; ]++;|[^<]|$NEST(<,(^(\|)/a(^?))*,> )+)++ )5"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+"
        "(&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)"
        ")"
Replace = "<a123@45</a>"
          "<a id=BPM-supero6"
          " title=Open in new window"
          " target=_blank123>"
          "78</a>"



i worked a little on this block of code:

"(&[^; ]++; |[^<] )5"
"($NEST(<,(^(\|)/a(^?))*,> )+ )8"
"<((\|)/a>|as)$SET(7=<font size=-2>&loz;</font>)"

and i've discovered that it matches text links when the text is only one char long, i.e. something like <a href="url_here">m</a>. as i wrote, i think that this block of code is a little buggy, but now i'm sure that it doesn't match images (i simply removed it and ran some tests).
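
a rough python analogue of that branch shows which links it catches. this is not proxomitron syntax: the regular expression is only an approximation of "one character or one entity, then the closing tag", and the helper name is hypothetical.

```python
import re

# A link whose visible text is exactly one character, or one entity such as &gt;
SINGLE_CHAR_LINK = re.compile(
    r'<a\s[^>]*>(?:&[^; <]+;|[^<])</a>',
    re.IGNORECASE,
)


def is_single_char_link(html: str) -> bool:
    """Return True when the link text is a single char or a single entity."""
    return SINGLE_CHAR_LINK.fullmatch(html) is not None


if __name__ == "__main__":
    print(is_single_char_link('<a href="url_here">m</a>'))
    print(is_single_char_link('<a href="url_here">more</a>'))
```

links such as one-letter index entries ("a", "b", "c"...) or a lone arrow entity are the typical cases where this branch fires and the small lozenge is used as the text of the second link.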

and here is my second attempt, this time a version that applies to text links only.
warning: it is completely untested, i need your help with this.
btw, i'm surfing with this version now, so i've started to test it myself.



Name = "Style - Super Opener Text Links Only 1"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Limit = 16
Match = "</head>|(<body)1$SET(2=<head>)"
Replace = "2"
          "<style type="text/css">"
          "#BPM-supero:hover { border: thin dotted #dd0000 }"
          "</style>"
          "</head>"
          "1"
          "$STOP()"




Name = "Links - Super Opener Text Links Only 2"
Active = TRUE
URL = "$TYPE(htm)"
Bounds = "<as*(/a>|(<as))"
Limit = 450
Match = "<a"
        "("
        "([^>]++(shref=$AV(*))1[^>]+>)"
        "&&"
        "((shref=$AV(*)|starget=$AV(_blank|_new)"
        "|(sclass=$AV(*))2|(sstyle=$AV(*))3|((s[^ ]+)|>)#))+"
        ")"
        ""
        "( $NEST(<,(^(\|)/a(^?))*,> )+)4"
        ""
        "("
        "<(/a>|as)"
        "|"
        "(&[^; ]++; |[^<] )5"
        "($NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)$SET(7=<font size=-2>&loz;</font>)"
        "|"
        "(( (&[^; ]++;|[^<])"
        "$NEST( <,(^(\|)/a(^?))*,>)+)++{1,2})5"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)"
        "|"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+"
        "(&[^; ]++;|[^<])"
        " (&[^; ]++;|[^<]|$NEST(<,(^(\|)/a(^?))*,> )+)++ )5"
        "( (&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+"
        "(&[^; ]++;|[^<])"
        " $NEST(<,(^(\|)/a(^?))*,> )+ )8"
        "<((\|)/a>|as)"
        ")"
Replace = "<a123@45</a>"
          "<a id=BPM-supero"
          " title=Open in new window"
          " target=_blank123>"
          "78</a>"



and here is the changelog for this version (i'm continuing the previous numbering):

6. removed the line of code: "$SET(6= class="BPM-supero-d")"
7. removed the line of code: "$SET(7=&loz;)"

both changes are in the "Links" filter: they remove the ability to set the class attribute on image links and to add the lozenge to them.

8. removed 6 from the replacement code

in a text-links-only version we no longer need to add the class attribute for image links.

9. at this point we can safely remove these 3 lines of code from "Style" filter:

"A.BPM-supero-d { font: 9pt verdana;"
" color: #b22222; text-decoration: none;"
" position: relative; top: -6px; left: -11px; }"

because no link has the class BPM-supero-d any more.

note1: with this second version, image links are still processed;
note2: we still need to replace 7 for single-char links;
note3: there is still a block of code with a bug;
note4: as i wrote, this second version is still UNTESTED, please help me test it.

phew, this filter is really a nasty beast,
let me know your suggestions.

regards,
altosax.

 
 

JD5000

« Reply #3 on: May 25, 2002, 09:08:16 PM »
On the text only filter you posted, I can confidently say it should be safe for daily use. It's almost the same as the modified filter I've been using for the last couple of months. The only diff is the bounds & ( $NEST(<,(^(\|)/a(^?))*,> )+)4 which I just changed today.


Do you know of a test site for that bug? Maybe I have seen it, but don't remember.

*thinks*

I have noticed a bug at deviantart.com where it adds ">" to the end of some images on the front page. I just lowered the byte limit & that fixed it. (I only changed it on the text only filter)


JD5000

« Reply #4 on: May 25, 2002, 09:11:49 PM »
The big bug for me is, when it doesn't add the "style" to the second part of the link.

EXAMPLE:http://aintitcool.com/


altosax

« Reply #5 on: May 26, 2002, 01:08:37 AM »
hi jd5000,
i've visited aintitcool and read the html code.

what you wrote is reversed: it is not superopener that fails to apply the style, it is the site that changes the style of the links!!

their code is <a..><font..>text_of_the_link</font></a>. when superopener matches the link, its replacement is:
<a..><font..>text_of_the_li</font></a><a id=BPM..>nk</a>
superopener correctly applies its style to "nk", but the site's <font> style now covers only "text_of_the_li".
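
the effect can be reproduced with a small python sketch. it is only a model of the output shown above: it assumes, as in that example, that the last two characters of the link text move into the second <a>, and it handles just one wrapping tag. the function name and the sample markup are hypothetical.

```python
def split_link(open_a: str, wrap_open: str, text: str, wrap_close: str, tail: int = 2) -> str:
    """Model of the replacement: the tail of the text becomes a second link.

    The wrapping tag (e.g. <font ...>) is closed before the second <a>,
    so the tail is no longer inside it.
    """
    head, last = text[:-tail], text[-tail:]
    return (
        f"{open_a}{wrap_open}{head}{wrap_close}</a>"
        f"<a id=BPM-supero target=_blank>{last}</a>"
    )


if __name__ == "__main__":
    print(split_link('<a href="u">', '<font color="red">', "text_of_the_link", "</font>"))
```

in the printed result "nk" sits outside the <font> element, which is why the two halves of the link look different on that site.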

i've also visited deviantart, and again the output is not a bug in superopener. this is the code matched by superopener:

<Match: Links >^ SUPER-OPENER BETA 37 (aB) (bC) >
<a href="http://www.deviantart.com/deviation.php?id=371703"><img src="http://thumbs.deviantart.com/thumb?type=100&file=large/indyart/anime/Ashril.jpg&radius=5.1&opacity=0.6&xoff=2&yoff=3&color=96A096" width="115" height="115" border="0" title="Ashril by ~mandy-chan
Submitted: 5/25/2002
Cat/Sec: IndyArt->Anime
Res: 300x300
Filesize: 0kb
Comments: 0
Views: 3
Downloads: 0
Score: 350"></a>
</Match>

as you can see, there is a line containing a closing angle bracket (>) in

Cat/Sec: IndyArt->Anime

if you read the code in debug mode you can see this bracket in the same color (navy) as the opening <a> tag. the code from <a> to this bracket is green, the rest of the code is yellow. this means that this bracket confuses superopener, which stores the matched code in the wrong variables and so produces wrong output.
so this is due to the poorly written code of the site: it should have been &gt; instead of >.
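
the same confusion is easy to show with a python sketch. the pattern `[^>]*>` here only stands in for the "everything up to the next >" style of tag matching; the sample tag is a shortened, hypothetical version of the deviantart code above.

```python
import re

# Naive tag matcher: the tag ends at the first ">" it meets.
NAIVE_TAG = re.compile(r'<img\s[^>]*>')

RAW = '<img src="t.jpg" title="Cat/Sec: IndyArt->Anime Res: 300x300">'
ESCAPED = RAW.replace("->", "-&gt;")  # what the site should have written


def matched_tag(html: str):
    """Return the text the naive matcher takes to be the whole tag."""
    m = NAIVE_TAG.match(html)
    return m.group(0) if m else None


if __name__ == "__main__":
    print(matched_tag(RAW))      # stops inside the title attribute
    print(matched_tag(ESCAPED))  # covers the whole tag
```

with the raw ">" the tag is cut in the middle of the title attribute, and everything after it is treated as page text; with &gt; the whole tag is matched.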

anyway, this doesn't mean that superopener can't be improved ;)

warm regards,
altosax.

 
 

JD5000

« Reply #6 on: May 26, 2002, 02:08:36 AM »
Ahh, thanks for the info. Now it won't bug me.


altosax

« Reply #7 on: June 05, 2002, 12:24:11 PM »
hi all.

because no one has continued posting in this thread, i've stopped my work and posted Super Opener - Text Links Only in the download sub-forum, so everyone who needs it can easily find it.

best regards,
altosax.