The Un-Official Proxomitron Forum

Help with a URL cleanup
I thought I had this one working, but nope, it doesn't.

Details:

The CID= value can be any number from 5 to 9 digits, e.g. CID=38744477 or CID=29846.

Matching expression:

<a\1href=$AV(http://www.zzz.zn/redir.php?y=[0-9]\&xp=[0-9]\&CID=*\&url=\2)\3>

Replacement text:

\1href="$UESC(\2)"\3

Test (which fails to [Match]):

http://www.zzz.zn/redir.php?y=1&xp=4&CID...F2812.html
Code:
[Patterns]
Name = "URL Cleanup"
Active = TRUE
URL = "$TYPE(htm)"
Limit = 256
Match = "<a\1href=$AV(http://www.zzz.zn/redir.php\?y=[0-9]\&xp=[0-9]\&CID=[0-9]+\&url=\2)\3>"
Replace = "<a\1href="$UESC(\2)"\3>"
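For anyone who wants to check the logic outside Proxomitron, here is a rough Python analogue of the filter above. It is only a sketch: Python's `re` syntax is not Proxomitron's matching language. It assumes the href value is double-quoted, that y and xp are single digits (as in `[0-9]`), and that CID is 5 to 9 digits as described in the question. `urllib.parse.unquote` stands in for `$UESC`.

```python
import re
from urllib.parse import unquote

# Rough Python analogue of the "URL Cleanup" filter (not Proxomitron itself).
# Assumptions: href value is double-quoted, y and xp are single digits,
# and CID is 5 to 9 digits, as described in the question.
REDIRECT_LINK = re.compile(
    r'<a(?P<pre>\s[^>]*?)href="'          # "<a", other attributes (like \1), then href="
    r'http://www\.zzz\.zn/redir\.php\?y=[0-9]&xp=[0-9]&CID=[0-9]{5,9}&url='
    r'(?P<target>[^"]*)"'                  # the %-encoded real URL (like \2)
    r'(?P<post>[^>]*)>'                    # anything after the href (like \3)
)


def clean_redirect_links(html: str) -> str:
    def _replace(m: re.Match) -> str:
        target = unquote(m.group("target"))  # plays the role of $UESC(\2)
        return f'<a{m.group("pre")}href="{target}"{m.group("post")}>'
    return REDIRECT_LINK.sub(_replace, html)


sample = ('<a href="http://www.zzz.zn/redir.php?y=1&xp=4&CID=72940'
          '&url=http%3A%2F%2Fwww.cnn.com%2Fnews%2F2812.html">')
print(clean_redirect_links(sample))
# prints <a href="http://www.cnn.com/news/2812.html">
```

Links that do not match (for example a CID shorter than 5 digits) are returned unchanged, which mirrors a filter that simply does not fire.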

A few notes:

- You must escape the ? character with a backslash (\?), because an unescaped ? is a wildcard that matches any single character.
- When testing filters, test them as if they were on the actual page; in other words, copy and paste the relevant HTML from the site's source, such as:

Code:
<a href="http://www.zzz.zn/redir.php?y=1&xp=4&CID=72940&url=http%3A%2F%2Fwww.cnn.com%2Fnews%2F2812.html">
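Both notes can be illustrated with Python's `re` module. This is an analogy, not Proxomitron: in Python `?` is a quantifier rather than a single-character wildcard, so there the unescaped pattern cannot match the literal `?` at all. The tag-based pattern below is a simplified, hypothetical stand-in for the filter's `<a\1href=` part; it shows why testing against a bare URL instead of the HTML tag gives no match.

```python
import re

# Python's re (not Proxomitron): '?' makes the preceding 'p' optional,
# so this pattern can never match the literal "redir.php?y=".
unescaped = re.compile(r"redir.php?y=")
escaped = re.compile(r"redir\.php\?y=")   # literal "redir.php?y="

# Simplified stand-in for a filter that expects an <a ... href=...> tag.
tag_pattern = re.compile(r'<a\s[^>]*href="http://www\.zzz\.zn/redir\.php\?')

url = "http://www.zzz.zn/redir.php?y=1&xp=4&CID=72940&url=http%3A%2F%2Fwww.cnn.com%2Fnews%2F2812.html"
html = f'<a href="{url}">'

print(bool(unescaped.search(url)))     # False
print(bool(escaped.search(url)))       # True
print(bool(tag_pattern.search(url)))   # False: a bare URL has no <a ... href=
print(bool(tag_pattern.search(html)))  # True: the copied page HTML does
```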

Hope this helps.

Guest

I used your code exactly as provided above (copy and paste) and then pasted the bottom code into the Test window.

The result I received was:
[No Match]

????
The sample link is cleaned up fine by the standard Un-Prefix URLs filter.

I've created a blocklist of all the sites that use redirected URLs and modified the filter to use it; there aren't many links it won't handle. However, I've come across several sites that redirect to the same host, and I've created a modified filter to catch these, since the standard filter excludes them. The (^\h) term excludes the same host, so I've removed it:
Code:
Standard Un-Prefix URLs:

href=($AV(?????*[^a-z0-9]((http|ftp)(%3A|:)(%2F|/)(%2F|/)(^\h)[^&]+)\1*)&("|)\0)

Modified:

href=($AV(?????*[^a-z0-9]((http|ftp)(%3A|:)(%2F|/)(%2F|/)[^&]+)\1*)&("|)\0)
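The difference between the two Un-Prefix variants can be sketched in Python. This is an approximation, not Proxomitron's engine: it looks for an embedded http/ftp URL (raw or %-encoded) after a non-alphanumeric character at position 5 or later, takes it up to the next `&`, and decodes it. The `exclude_same_host` flag is a hypothetical parameter that approximates the `(^\h)` test by comparing the decoded target's host with the page host. The `out.php` link in the example is made up for illustration.

```python
import re
from urllib.parse import unquote, urlsplit

# Rough Python analogue of the Un-Prefix URLs idea (not Proxomitron's engine).
# Finds an embedded http/ftp URL, raw or %-encoded, that follows a
# non-alphanumeric character and runs up to the next '&' or end of the value.
EMBEDDED_URL = re.compile(
    r"[^a-z0-9]((?:http|ftp)(?:%3A|:)(?:%2F|/)(?:%2F|/)[^&]+)", re.IGNORECASE
)


def unprefix(href: str, page_host: str, exclude_same_host: bool = True) -> str:
    # exclude_same_host=True approximates the standard filter's (^\h) test;
    # False approximates the modified filter that also unwraps same-host links.
    m = EMBEDDED_URL.search(href, 5)  # like ?????*: start at position 5 or later
    if m is None:
        return href
    target = unquote(m.group(1))
    if exclude_same_host and urlsplit(target).hostname == page_host.lower():
        return href  # same host: standard filter leaves the link alone
    return target


redir = "http://www.zzz.zn/redir.php?y=1&xp=4&CID=72940&url=http%3A%2F%2Fwww.cnn.com%2Fnews%2F2812.html"
same_host = "http://www.zzz.zn/out.php?url=http%3A%2F%2Fwww.zzz.zn%2Fpage.html"

print(unprefix(redir, "www.zzz.zn"))                               # http://www.cnn.com/news/2812.html
print(unprefix(same_host, "www.zzz.zn"))                           # unchanged
print(unprefix(same_host, "www.zzz.zn", exclude_same_host=False))  # http://www.zzz.zn/page.html
```

Note that the real `(^\h)` is a text check made before decoding, so this host comparison is only a close approximation of it.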