[Req] Washingtonpost.com redirect address bar URLs and links to the print page
Mar. 11, 2009, 11:51 PM
Post: #1
1. Where I want the filter to match:
It should match when a URL is typed into the address bar, and also on links to Washingtonpost.com pages.

2. What I want the end-result to be:
Any occurrence of ".html" should be replaced with "_pf.html"

3. Examples:
change: http://www.washingtonpost.com/wp-dyn/con...id=topnews
to: http://www.washingtonpost.com/wp-dyn/con...id=topnews

AND

change: http://www.washingtonpost.com/wp-dyn/con...01499.html
to: http://www.washingtonpost.com/wp-dyn/con...99_pf.html

4. Other Info
I'm very familiar with Proxo, but I've only ever modified filters, never written one. I did get a header filter to match and replace correctly in the test window, but it would not work when the address was typed into the address bar.
Here is that filter:

[HTTP headers]
In = FALSE
Out = TRUE
Key = "URL: Print Wpost"
URL = ""[^.]+.(washingtonpost)""
Match = "(*)\1.html"
Replace = "$JUMP(*)\1_pf.html"

I guess I also need a Web Page Filter that targets the href attribute of links.

Thanks very much.
Mar. 12, 2009, 01:22 AM
Post: #2
The replace code is buggy; try $JUMP(\1_pf.html) and tell us how it goes.
The URL line also has too many " characters.

A trick I use in the test window is to write JUMP instead of $JUMP, so the test window just shows the resulting text instead of executing the command.
Mar. 12, 2009, 10:55 AM
Post: #3
Thanks, lnminente. I have a lot to learn. I think I took your suggestion, and I did have extra quote marks in there. I came up with something that works, although I also had to change JUMP to RDIR. Here it is:

[HTTP headers]
In = FALSE
Out = TRUE
Key = "URL: Print Wpost"
URL = "[^.]+.(washingtonpost)"
Match = "(*\1.html)"
Replace = "$RDIR(\1_pf.html)"

Any criticisms or suggestions are welcome. I was also wondering whether the address bar URL could be changed to the actual page URL, as happens when I type in an alias.
Mar. 12, 2009, 11:44 AM
Post: #4
Hi Dave, nice to see people wanting to learn about Proxo :)

Some tips for a good start (tell me if I'm boring you) ;)
1- Use code tags when posting filters. It was fine this time, but otherwise we could have trouble importing your filters, because the forum can modify some strings.

2- Header filters are very tricky.
I'd like to explain clearly what I've found, but I never documented it, so it isn't clear in my head right now. Every time I write a header filter, I just recall some typical mistakes and check for them in the test window.

The typical mistakes usually involve the URL match, which behaves very strangely. One tip is to leave this part until the end of the filter when possible, and if you have problems there, try using $URL() instead.
While writing a filter, now and then use (parts of your filter here)\0$LOG(w\0), where \0 is the matched string, just to be sure it matched what you expected.
Not all commands work in both the match and the replacement, so try moving them; some also work in the URL match.
When the kill command \k doesn't work, use $RDIR or $JUMP instead.

If anyone has something documented, please post it here, and we could expand on it and write a guide ;)
Mar. 12, 2009, 01:20 PM
Post: #5
OK, I found the code formatting in the help:

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "URL: Print Wpost (out)"
URL = "[^.]+.(washingtonpost)"
Match = "(*\1.html)"
Replace = "$RDIR(\1_pf.html)"

I think I really need to get into the Proxo help file. I can see the text matching language can get complex and can only be mastered by use.

I was surprised the filter works on links. I didn't think it would. Shows how much I don't know.
Mar. 12, 2009, 04:59 PM
Post: #6
Yep! The good thing about header filters is that they are independent of obfuscated code or encoded scripts. The browser requests an object from some address, and the header filter sees that address clearly.

You can see these addresses at http://local.ptron/.pinfo/urls/

Other times, the URLs are sent encoded, and the remote server decodes them and sends us to the right address via the Location header...
Mar. 13, 2009, 04:00 AM
Post: #7
(Mar. 12, 2009 01:20 PM) dave wrote: I was surprised the filter works on links.
I don't think it would work on your first example URL. Also, we need to prevent the redirected _pf.html URL from being redirected over and over.
Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "!-|||||||||||| URL: Print Wpost (out)"
URL = "www.washingtonpost.com/(^*_pf.html)\1.html\2&$JUMP(http://www.washingtonpost.com/\1_pf.html\2)"
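For anyone who reads regular expressions more easily than Proxomitron's matching language, here is a rough Python sketch of what the URL match above is meant to do. It is not Proxomitron syntax and won't run inside Proxomitron. The name print_url and the example article path are made up for illustration, and treating (^*_pf.html) as a negative lookahead is my reading of the filter, not something taken from the Proxomitron docs.

```python
import re
from typing import Optional

# Sketch of the intent of the URL match above (NOT Proxomitron syntax):
#   www.washingtonpost.com/<path>.html<rest>  ->  .../<path>_pf.html<rest>
# The negative lookahead (?!.*_pf\.html) stands in for (^*_pf.html):
# a URL that already points at a _pf.html page never matches, so the
# print page is not redirected again (no redirect loop).
_WAPO = re.compile(
    r"^(?:https?://)?www\.washingtonpost\.com/"
    r"(?!.*_pf\.html)"       # skip URLs that are already print pages
    r"(?P<path>.*?)\.html"   # path up to the first ".html" (plays the role of \1)
    r"(?P<rest>.*)$"         # query string or anything after it (plays the role of \2)
)


def print_url(url: str) -> Optional[str]:
    """Return the print-page URL, or None when no redirect should happen."""
    m = _WAPO.match(url)
    if m is None:
        return None
    return f"http://www.washingtonpost.com/{m.group('path')}_pf.html{m.group('rest')}"
```

Feeding it a hypothetical ".../example.html?nav=rss" URL gives ".../example_pf.html?nav=rss", and feeding that result back in returns None. That second call is the "redirected over and over" case the filter guards against.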
Mar. 13, 2009, 10:13 PM
Post: #8
whenever, thank you so much. I was looking through the Proxo help, trying to refine my filter, when I came across Scott saying that it's best to use the URL match part of the filter to test the URL. Then I decided to come back here, and lo and behold, there was your post, and of course your filter works fine. It fixes the problems I had recognized in mine.

In the meantime, I had written another filter for Yahoo News that does the same thing. I'm sure it is rough and could be refined, but it seems to me that because Yahoo uses a different URL format, my filter will work without problems. I could be wrong, of course. I assume I should still be using only the URL match portion of the filter for this, and since I'm not, that makes it unrefined.

Anyways, FWIW, here it is:

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "URL: Print Ynews (out)"
URL = "(www.|)news.yahoo.com"
Match = "(*\1;)"
Replace = "$JUMP(\1/print)"

Go to news.yahoo.com to test my filter.
Mar. 14, 2009, 03:17 AM
Post: #9
It seems all the news stories on that page have s/ap in their path, so I added it to the filter to avoid false matches.
Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "!-|||||||||||| URL: Print Ynews (out)"
URL = "news.yahoo.com/(s/ap(^*/print(^?))*)\1&$JUMP(http://news.yahoo.com/\1/print)"
btw, please refer to this post for why we add !-|||||||||||| before URL.
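Here is the same kind of rough Python sketch for the Yahoo filter. It is not Proxomitron syntax. The name yahoo_print_url and the story path in the example are hypothetical, and reading (^*/print(^?)) as "the rest of the URL does not end in /print" is my interpretation of the filter.

```python
import re
from typing import Optional

# Sketch of the intent of the Yahoo URL match above (NOT Proxomitron syntax).
# (?!.*/print$) stands in for (^*/print(^?)): story URLs that already end
# in /print are left alone, so the redirect does not repeat.
_YNEWS = re.compile(
    r"^(?:https?://)?news\.yahoo\.com/"
    r"(?P<path>s/ap(?!.*/print$).*)$"  # only s/ap stories not already ending in /print
)


def yahoo_print_url(url: str) -> Optional[str]:
    """Return the /print version of a Yahoo News story URL, or None."""
    m = _YNEWS.match(url)
    if m is None:
        return None
    return f"http://news.yahoo.com/{m.group('path')}/print"
```

A hypothetical story URL such as "http://news.yahoo.com/s/ap/20090314/ap_on_re_us/example" maps to the same URL with "/print" appended. The printed version, the front page, and non-s/ap paths all return None.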
Mar. 14, 2009, 08:26 PM
Post: #10
whenever wrote:

Quote: It seems all news on that page has a s/ap in their path so I added it to the filter to avoid mismatch.

TYVM for this filter. I have added it and it seems to work fine. Again I see that it does all its work without using the match and replace section of the filter.

About me, FWIW: the only coding background I have is some college courses years ago. I did well in them and expect that if I had gone in that direction, I would have become a competent programmer. Today I understand general coding concepts and have found that, when I want to, I can come up with something to suit my needs, whether it's a batch program, a web page or something else. My basic approach is to find examples of what I want and then adapt them to my needs. This usually involves nothing more than copy and paste and trial and error.

As to Proxo, I understand the program but this is my first attempt at writing a filter. As a guess, I would say it would take me a minimum of 100 hours studying and working with filters before I began to really understand them.

Quote: btw, please refer to this post why we add !-|||||||||||| before URL.
I can't say I completely understand the concepts here, but some of it makes sense. :)
Mar. 15, 2009, 02:29 PM
Post: #11
(Mar. 14, 2009 08:26 PM) dave wrote: Again I see that it does all its work without using the match and replace section of the filter.
That's because I am using !-|||||||||||| URL instead of URL. That is not a real HTTP header, so the match and replace won't work. URL isn't a real HTTP header either, but Proxomitron handles it specially. You can find this and more tricks at http://mizzmona.proxfilter.net/proxomitron/notes/. :)