<BLOCKQUOTE id=quote><font size=1 face="Verdana, Arial, Helvetica" id=quote>quote:<hr height=1 noshade id=quote>Xartica's filter does not seem to work (?)<hr height=1 noshade id=quote></BLOCKQUOTE id=quote></font id=quote><font face="Verdana, Arial, Helvetica" size=2 id=quote>
Try this one. After Xartica's comment I've been playing with the initial version. It isn't perfect yet... but maybe it works for you.
[Patterns]
Name = "URL: Use Google Service to Convert ... files To HTML (on right)"
Active = TRUE
Multi = TRUE
URL = "(^*(google.com|216.239.3.100)*)"
Bounds = "<a\s*(<(\\|)/a>|(<a\s)\9)"
Limit = 256
Match = "(^*CLASS=PRX*)\0"
"(HREF=$AV(([a-z]+://|$SET(\#=\h))\8"
"\4.(DOC|PDF|PPT|PUB|RTF|XLS)\5))\1"
"(^*<IMG*)\2"
Replace = "\0\1\2\r\n<A CLASS=PRXdirect HREF="http://216.239.39.100"
"/search?q=cache:$ESC(\@\4.\5)&hl=en">\5& #187;HTML</A>\9"
corrected 2009. Remember to change IP
About how \8 is filled: the match needs to swallow the http:// part, as it must be removed from the URL that is fed to Google.
Then there are the cases where the links are relative and a hostname is absent. In that case there won't be an http:// either, so the right side of the OR makes sure the host part is present (luckily \h does not come with an http:// in front, so we won't need to remove it again).
\8 is not used thereafter; the only reason for putting it there was to prevent \4 from taking up the http:// (if any), as <u>its</u> (\4) purpose is to match the entire URL without protocol prefix up to the extension dot.
Since the extension is also needed in the link text, it ends up in \5.
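For those who read regular expressions more easily than Proxomitron patterns, here is a rough Python sketch of the capture logic described above. It is only an approximation: the group names `v8`, `v4`, `v5` and the helper `split_href` are made up for illustration, and Python's `re` does not behave exactly like Proxomitron's matcher.

```python
import re

# Rough Python analogue of the href part of the Match expression.
# Group names mirror the Proxomitron variables; this is an illustration,
# not a faithful translation of Proxomitron's matching rules.
HREF_RE = re.compile(
    r'''href=["']?                          # attribute start, optional quote
        (?P<v8>[a-z]+://)?                  # like \8: protocol prefix, if any
        (?P<v4>[^"'\s>]+)                   # like \4: URL up to the extension dot
        \.(?P<v5>doc|pdf|ppt|pub|rtf|xls)   # like \5: the extension
        (?=["'\s>])                         # the extension must end the URL
    ''',
    re.IGNORECASE | re.VERBOSE,
)


def split_href(tag: str, page_host: str):
    """Return (host, path, ext) for a document link, or None if no match."""
    m = HREF_RE.search(tag)
    if m is None:
        return None
    # A relative link has no protocol, so the page's own host is remembered
    # instead (the role played by $SET(\#=\h) in the filter).
    host = "" if m.group("v8") else page_host
    return host, m.group("v4"), m.group("v5")
```

Calling `split_href('<a href="/files/sheet.xls">', "intranet.example")` gives the host back as the first element, whereas an absolute link leaves it empty because the host is already part of the path.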
Now, the first part of the original link is stored in \0, the full href in \1 and the remainder in \2; these are then used to reconstruct the original link.
\@ will have the value of \h if the host wasn't present in the link, and \4 and \5 recreate the path to the document without protocol prefix. The result is $ESCaped so spaces in document names don't throw off Google.
Finally, \5 is reused to create a text link. <b>The space between & #</b> is there so the entity doesn't get converted during posting (it had changed when I came back for the edits) and <b>needs to be removed before use</b>.
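To make the Replace part concrete, here is a small Python sketch of how the extra link is put together from the host, path and extension. The function name `cache_link` is hypothetical, and `urllib.parse.quote` merely stands in for $ESC; the exact set of characters $ESC escapes may well differ.

```python
from urllib.parse import quote

# Prefix taken from the filter's Replace line.
GOOGLE_CACHE = "http://216.239.39.100/search?q=cache:"


def cache_link(host: str, path: str, ext: str) -> str:
    """Build the extra 'view as HTML' anchor appended after the original link."""
    # host is empty for absolute links, because the path already contains it.
    target = f"{host}{path}.{ext}"
    # Escape spaces and the like so the document name survives in the query.
    href = GOOGLE_CACHE + quote(target, safe="/") + "&hl=en"
    # The extension doubles as the link text, as in the filter.
    return f'<A CLASS=PRXdirect HREF="{href}">{ext}&#187;HTML</A>'
```

For example, `cache_link("", "example.com/my docs/report", "pdf")` yields a link whose query contains `example.com/my%20docs/report.pdf`.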
One last bit, the \9: I got the bounds from one of the Super Opener versions. It used \9 to catch the remainder of the buffer in case the <a tags were overlapping, and it is returned last in the replace. Sadly, constructs like
<a href..> text/image link <br><a href..> image/text link </a> </a>
seem to occur more often than I thought, so I thought it better to go with something that has already proven its worth...
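If it helps to see what the bounds do with such overlapping tags, here is a loose Python imitation. The names `BOUND_RE`, `v9` and `bounded_span` are invented for the sketch, and it assumes a lazy regex is a fair stand-in for Proxomitron's bounds matching, which it only roughly is.

```python
import re

# Stop at the first closing </a> or at the next opening <a, whichever
# comes first; the next opening tag is kept aside, like \9 in the filter.
BOUND_RE = re.compile(
    r"<a\s.*?(?:</a>|(?P<v9><a\s))",
    re.IGNORECASE | re.DOTALL,
)


def bounded_span(html: str):
    """Return (matched span, leftover start of an overlapping <a) or None."""
    m = BOUND_RE.search(html)
    if m is None:
        return None
    return m.group(0), m.group("v9") or ""
```

With a normal link the second element comes back empty; with the overlapping construct shown above it holds the start of the second tag, which is why the replace has to put it back at the end.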
HTH
JarC
Edited by - TEggHead on 06 Jun 2002 00:12:33