Filter to add hostname to links
Jun. 16, 2006, 10:40 AM
Post: #1
Filter to add hostname to links
I am trying to write a filter to add a hostname to a link that doesn't have one.
I think that is the root of the problem that I have with Media Player Classic accessing Shockwave Flash links. If the hostname is missing from a link, it is
omitted from the request, which results in an unworkable request for a new page.
I tried matching a link like this:
Quote:<a href="/comics/comics_todays.html">Today's Comic</a>
I was unable to get any matches with these 3 attempts. What did I do wrong this time?
Code:
[Patterns]
Name = "Add hostname to links 1"
Active = TRUE
Bounds = "<a\s*>"
Limit = 170
Match = "href=$AV(/\4)"
Replace = "href="http://\h/\4""

Name = "Add hostname to links 1a"
Active = TRUE
Bounds = "<a\s*>"
Limit = 170
Match = "$SET(host=\h) href=$AV(/\4)"
Replace = "href="http://$GET(host)/\4""

Name = "Add hostname to links 2"
Active = TRUE
Bounds = "<a\s*>"
Limit = 170
Match = "$SET(host=\h) $SET(2=https+://)"
            "href=$AV(^$TST(\2host) \4)"
Replace = "href="\2$GET(host)\4""
Jun. 16, 2006, 07:00 PM
Post: #2
RE: Filter to add hostname to links
Siamesecat Wrote:What did I do wrong this time?
The Bounds and Match must match the same text.
Your Bounds matches everything from < to >, but your Match covers only the href attribute.

Try
Code:
[Patterns]
Name = "Add hostname to links 1"
Active = TRUE
Bounds = "<a\s*>"
Limit = 170
Match = "\1href=$AV(/\4)\2"
Replace = "\1href="http://\h/\4"\2"

HTH
Jun. 18, 2006, 07:25 AM
Post: #3
RE: Filter to add hostname to links
JJoe,
Thanks for the tip. Silly me! I forgot about that principle.
I found instances where there was no slash at the start of the page info, so had
to make allowance for that. I finally got a working filter. What I am wondering
now is if there would be any problem with a secure page having a link to another
secure page on the same host? Is it possible to omit the protocol and hostname in
a link to a secure page? If not, then the filter should be fine. If so, how would
I adjust the protocol to match? What do you think about this filter? So far, I
need it only on the Garfield site, but that is bound to change.

Code:
[Patterns]
Name = "Add hostname to links"
Active = TRUE
URL = "[^.]+.(garfield)"
Bounds = "<a\s*>"
Limit = 170
Match = "\0href=$AV((^http(s|)://)(^\h)(/|) \4)\5"
Replace = "\0href="http://\h/\4"\5"
Fixing the links on a page did not solve my problem with Shockwave leaving off
the protocol and the hostname in its links, but it did solve the problem of
some linked images not being displayed in the Shockwave window.
Jun. 18, 2006, 01:52 PM
Post: #4
RE: Filter to add hostname to links
You can get the protocol from the $URL() command.
So maybe:
Code:
[Patterns]
Name = "Add hostname to links"
Active = TRUE
URL = "[^.]+.(garfield)"
Bounds = "<a\s*>"
Limit = 170
Match = "$URL((http(s|)://)\1*) "
        "\0href=$AV((^http(s|)://)(^\h)(/|) \4)\5"
Replace = "\0href="\1\h/\4"\5"
There are still holes (the BASE tag, JavaScript constructs, etc.), though.
I'd probably worry about them later.
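
For readers who don't speak Proxomitron's matching language, here is a rough Python sketch of what the filter above is trying to do: take the protocol and host from the page's URL and prepend them to any href that doesn't already start with http:// or https://. The function name `add_hostname` and the regular expression are my own illustration, not anything Proxomitron provides, and the sketch shares the same holes (BASE tags, script-built links, and it treats every link as relative to the site root).

```python
import re
from urllib.parse import urlsplit

# href values that do NOT already start with http:// or https://.
# An optional leading slash is dropped, like the (/|) in the filter.
HREF = re.compile(r'href="(?!https?://)/?([^"]*)"')

def add_hostname(html: str, page_url: str) -> str:
    """Prefix protocol and host of page_url onto host-less hrefs (illustrative only)."""
    parts = urlsplit(page_url)
    prefix = f"{parts.scheme}://{parts.netloc}/"
    # A function replacement avoids backslash-escape surprises in the prefix.
    return HREF.sub(lambda m: f'href="{prefix}{m.group(1)}"', html)

if __name__ == "__main__":
    tag = '<a href="/comics/comics_todays.html">Today\'s Comic</a>'
    print(add_hostname(tag, "https://www.garfield.com/comics/"))
```

Because the prefix comes from the page URL, a link on a secure page keeps https:// rather than being forced to http://.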

I don't know why relative links would be a problem...
Has your set removed a <BASE*> tag?

HTH
Jun. 19, 2006, 05:55 AM
Post: #5
RE: Filter to add hostname to links
Quote:Match = "$URL((http(s|)://)\1*) "
Since the link being fixed lacks a protocol and a hostname, why would $URL
find any match? What <BASE*> tag?
The page for which I made the filter has no <BASE> tag on it.
None of my filters does anything to <BASE> tags.
Jun. 19, 2006, 12:19 PM
Post: #6
RE: Filter to add hostname to links
siamesecat Wrote:Since the link being fixed lacks a protocol and a hostname, why would $URL
find any match?
$URL checks the Page's URL. It's like a Filter's "URL Match".
$URL((http(s|)://)\1*) should always match. \1 will contain the protocol.
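
As an analogy only (Python regular expressions are not Proxomitron's matching language), the same idea looks like this: the test runs against the address of the page being filtered, not against the link, so it succeeds for any http or https page and leaves the protocol in a capture. The helper name `page_protocol` is made up for this sketch.

```python
import re
from typing import Optional

def page_protocol(page_url: str) -> Optional[str]:
    """Return 'http://' or 'https://' from the page's own URL, or None."""
    # Roughly what (http(s|)://)\1 does: match the protocol and remember it.
    m = re.match(r"(https?://)", page_url)
    return m.group(1) if m else None

if __name__ == "__main__":
    print(page_protocol("https://www.garfield.com/comics/"))
```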

siamesecat Wrote:What <BASE*> tag?
The page for which I made the filter has no <BASE> tag on it.
None of my filters does anything to <BASE> tags.
How could I know without asking? ;)

HTH
Jun. 20, 2006, 05:45 AM
Post: #7
RE: Filter to add hostname to links
JJoe,
Thanks very much for your help.
Do you know of any pages which have <BASE> tags? I do not remember seeing any such tags in source code.
Jun. 20, 2006, 12:21 PM (This post was last modified: Jun. 20, 2006 12:22 PM by JJoe.)
Post: #8
RE: Filter to add hostname to links
Sure.
Pages in Google's cache have BASE tags.
http://64.233.187.104/search?q=cache:www...roxomitron
http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html
http://evolt.org/
and others.

You are welcome.
Have fun
Jun. 21, 2006, 10:01 AM
Post: #9
RE: Filter to add hostname to links
Hi Siamesecat

Relative paths can be written many ways and (sometimes) resolve differently.
Code:
For example, on http://www.garfield.com/comics/ :

/swfs/navbar.swf <---resolves to---> www.garfield.com/swfs/navbar.swf
comics_todays_bot.swf <---resolves to---> www.garfield.com/comics/comics_todays_bot.swf

If these same relative paths were at www.garfield.com they would resolve to :

www.garfield.com/swfs/navbar.swf
www.garfield.com/comics_todays_bot.swf
Keep in mind that there are other ways of writing relative paths.
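
The resolutions listed above can be checked with Python's standard `urllib.parse.urljoin`, which follows the usual URL resolution rules. This is only a quick sketch, and it assumes the comics page address ends with a trailing slash (http://www.garfield.com/comics/). Without that slash the last path segment counts as a file name and is replaced, so the second link would land at the site root instead.

```python
from urllib.parse import urljoin

def resolve(page_url: str, link: str) -> str:
    """Resolve a link found on page_url to an absolute URL."""
    return urljoin(page_url, link)

if __name__ == "__main__":
    comics = "http://www.garfield.com/comics/"   # trailing slash assumed
    root = "http://www.garfield.com/"
    for page in (comics, root):
        for link in ("/swfs/navbar.swf", "comics_todays_bot.swf"):
            print(page, "+", link, "->", resolve(page, link))
```

A link that starts with a slash resolves the same way from both pages. Only the link without a leading slash depends on the directory of the page it sits on.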

Check out Grypen's forum over at CastleCops; there's a thread about "Relative Links". You'll find lots of links to sites where you can see different kinds of relative links.

As JJoe mentioned, there can be other factors that determine the absolute path, such as the base tag. The base tag can be a problem with IE, as it doesn't follow the W3C standard and allows multiple base tags within the page. Supposedly this will be "fixed" in IE7, but I've read that there are still problems with it.
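
To show why the base tag matters, here is a small Python sketch of resolution with a single base href, which is the standard single-base behaviour and not IE's multiple-base handling. The function name `resolve_with_base` and the link `images/logo.gif` are hypothetical, chosen only for illustration.

```python
from typing import Optional
from urllib.parse import urljoin

def resolve_with_base(page_url: str, link: str,
                      base_href: Optional[str] = None) -> str:
    """Resolve link against the base href if the page declares one, else the page URL."""
    # The base href may itself be relative, so resolve it against the page first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, link)

if __name__ == "__main__":
    cached_page = "http://64.233.187.104/search"
    print(resolve_with_base(cached_page, "images/logo.gif"))
    print(resolve_with_base(cached_page, "images/logo.gif", "http://evolt.org/"))
```

A filter that only prepends the page's own host would point the second case at the cache server instead of the original site, which is the hole JJoe mentioned.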

I wrote some filters a while back to convert relative paths to absolute. If you're interested, I could post them.

Mike
Jun. 21, 2006, 02:05 PM
Post: #10
RE: Filter to add hostname to links
http://www.webreference.com/html/tutorial2/3.html

castlecop's thread

HTH
Jun. 21, 2006, 03:32 PM
Post: #11
RE: Filter to add hostname to links
Now that Prox-List is available...

ConvertRelativePath.cfg

HTH