The Un-Official Proxomitron Forum

Full Version: A Future Proxy
A new proxy may be coming, but not for several months at least. Hopefully you won't see this post as spamming - The proxy is not currently available, there's no web site I can send you to, and there isn't even a formal name for it yet. The full version probably won't be freeware, though I hope to be able to produce a powerful subset that anybody can have.

I tried to attach a couple of images showing interesting statistics produced by the proxy, but perhaps only certain forum members can do that.

The main objective of the new proxy is probably security / privacy. I think most people may want to use it for ad blocking, but if doing so then I truly hope they'll 'accidentally' wind up being a bit more secure too.

There are at least two things that Proxo does that I will be shying away from:

1. I'd really rather not get into decrypting SSL content. We don't want this proxy to be able to see or manipulate things like people's bank accounts. However, we also can't ignore the fact that bad things can happen from crud that hides behind an SSL mask. So we did put in some special consideration for SSL into some of the proxy's blocking methods. For now that's about as far as I'm willing to go for SSL.

2. While it can do several kinds of forwarding to other proxies and servers, there won't be an option to specify multiple proxies to try. Different URL patterns can be sent to different proxies and/or gateways, but once a determination is made then the chosen routing is final. It won't try more than one proxy to see if they're operational, and it won't choose from a list of 2 or more equal choices.
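To make the "routing is final" idea concrete, here is a minimal sketch of first-match routing. It is not the proxy's actual configuration format; the patterns, the upstream strings, and the names `ROUTES` and `choose_route` are all made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: (URL pattern, upstream). Order matters.
ROUTES = [
    ("http://*.onion/*", "socks5://127.0.0.1:9050"),
    ("http://intranet.example/*", "direct"),
    ("*", "http://127.0.0.1:8080"),
]

def choose_route(url, routes=ROUTES):
    """Return the upstream of the first matching pattern.

    The first match wins and is final: there is no fallback list and
    no probing of alternatives if the chosen upstream is down.
    """
    for pattern, upstream in routes:
        if fnmatchcase(url, pattern):
            return upstream
    return "direct"
```

A caller would ask once per request, for example `choose_route("http://www.example.com/")`, and then stick with whatever comes back.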

Another difference is how users interact with the proxy. In Windows (the only version being built at this time) there's a tray icon that enables viewing a scrolling log window and menus that enable / disable various options. However, compared to Proxo, most user interaction is via the web interface of the proxy itself. It uses HTML and forms where Proxo has windows and dialog boxes.

We're now trying to throw together some documentation for it, and that's much more difficult than anticipated. The proxy has too many options, many of which either require in-depth knowledge of HTTP or are unlikely to ever be truly useful to anybody. Early on it was simple to just throw in some quirky option, but in order to release it we're forced to choose between documenting them or removing them.

This is definitely not a Proxo clone by any stretch of the imagination. Honestly I've never been able to quite grasp the language of the Proxo specifications. I get lost trying to study some of the large Proxo configurations; sometimes I can't even figure out exactly what they were trying to accomplish. Chances are that people may hate our specification formats, but they make sense to me (and maybe only to me). It's likely that Proxo does some things much more easily and/or better than ours.

As a general rule, I prefer blocking requests based on domain and URL patterns rather than mucking around in server response content. When I spot something undesirable at some site, I'm much more prone to add that site to a block list than to neuter its particular HTML / script issues. I don't ever want to scan through every HTML page looking for links to doubleclick (whom I dislike for tracking). I think it's easier and faster to leave a site's page content alone as much as possible - and then block crud when the browsers ask to fetch from places I don't want.

Currently the proxy reflects my personal preference for blocking vs. manipulation. It has quite a few more methods to identify requests to prohibit than it has for content management.

I'm not totally against content filtering though. When Google turned on their 'Suggest' keylogger kind of crap, I quickly went for the kill without much contemplation. Google then lost some of their 'Sponsored Links' out of anger. If Google turns OFF their 'Suggest' or restores it to opt-in, then I'll be happy to see all their 'Sponsored Links' again. And "No", I'm not going to permanently keep their cookies nor forge some for the purpose of muting their 'Suggest'.

Here are a few other tidbits and opinions that may be popular for discussion:

In its default configuration, the proxy validates the data content of images for most popular types (GIF, JPG, PNG, etc). We found that it doesn't require many resources to do that, and it adds a beneficial layer to help defend the browsers. It's surprising how often something like a 'gif' file is actually not 'gif' content at all, even when it comes from 'respectable' places. The proxy also does validation of image MIME, occasionally discarding an image based on the Content-Type header presented by the server, or fixing a bogus MIME to reflect the actual image content. I wish browsers would pay more attention to what they should get vs. trying to interpret crap content.
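As a rough illustration of the idea (not the proxy's real validator, which presumably checks far more than this), the sketch below looks only at the leading signature bytes of a body and compares the result with the declared Content-Type. The function names and the three action strings are assumptions.

```python
# Leading "magic" bytes for a few common image formats.
SIGNATURES = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]

def sniff_image(data):
    """Return the MIME type implied by the leading bytes, or None."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    return None

def check_image(declared_mime, data):
    """Return (action, mime) where action is 'pass', 'fix' or 'discard'.

    'fix' means the body is a recognizable image but the server's
    Content-Type was wrong; 'discard' means it is no known image at all.
    """
    actual = sniff_image(data)
    if actual is None:
        return ("discard", None)
    if actual != declared_mime:
        return ("fix", actual)
    return ("pass", actual)
```

A signature check like this only catches the grossest mismatches, such as an HTML page served as a 'gif'. Validating the rest of the image structure would take a real decoder.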

Regular expressions can be used in quite a few places, including URL patterns and data content modification. RegEx knowledge isn't always required because there's also plenty of Non-RegEx methods. Currently it's using a DLL of PCRE ( http://www.pcre.org/ ). I've used other RegEx libraries with the proxy, but PCRE seems efficient enough and is well maintained.

You can configure the proxy to read HOSTS files directly. Too bad the Windows operating systems (and others) don't handle large HOSTS efficiently, but this proxy certainly does. It hashes the entries and builds fast-scan buckets designed just for the purpose. Though compatible with any HOSTS file, the domain blocking goes beyond that. For example, a specification to block 'example.com' could also block 'www.example.com', 'other.stuff.example.com', etc. Some repetitive HOSTS file zone entries can be absorbed into other parent specifications having fewer zones.
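The description above can be sketched with an ordinary hash set standing in for the purpose-built buckets. This is only an approximation of the behaviour described; `parse_hosts`, `is_blocked` and `absorb` are hypothetical names, and skipping 'localhost' is an assumption of the sketch.

```python
def parse_hosts(text):
    """Collect host names from HOSTS-format text into a set (a hash table)."""
    blocked = set()
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split on spaces
        for name in fields[1:]:                  # fields[0] is the IP address
            name = name.lower().rstrip(".")
            if name and name != "localhost":
                blocked.add(name)
    return blocked

def is_blocked(host, blocked):
    """True if the host itself or any parent domain is in the blocked set."""
    labels = host.lower().rstrip(".").split(".")
    # Try 'a.b.example.com', then 'b.example.com', 'example.com', 'com'.
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))

def absorb(blocked):
    """Drop entries that a shorter parent entry already covers."""
    kept = set()
    for host in blocked:
        parent = host.partition(".")[2]          # '' for a single-label name
        if not (parent and is_blocked(parent, blocked)):
            kept.add(host)
    return kept
```

Each lookup costs one set probe per label of the host name, no matter how many entries the HOSTS file has.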

Another optional feature caches IP addresses. I've been surprised how many sites have a low DNS TTL, so the cache feature helps boost the speed of browsing activities. When some goofy network engineer casts a low TTL for purposes of load balancing or fail-over, then we're better off choosing not to participate in the (usually) useless DNS re-queries. But if the proxy can't make a connection using an IP that it cached, then it will re-query DNS and retry the connection if appropriate. An in-process cache also seems to help the Windows threading by avoiding the OS call to (re)get the host server IP. It makes sense for common browsing where there is an HTML fetch followed by a burst of many images, CSS, scripts, etc. competing for resources.
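A minimal sketch of that cache-then-retry behaviour follows. It assumes the cache simply ignores TTL and that "if appropriate" means "DNS now returns a different address"; both are guesses, and the class and parameter names are invented for the example.

```python
import socket

class DnsCache:
    """In-process host -> IP cache that ignores DNS TTL (sketch assumption)."""

    def __init__(self, resolve=socket.gethostbyname):
        self._resolve = resolve     # injectable so the sketch can run offline
        self._cache = {}

    def lookup(self, host, refresh=False):
        """Return the cached IP, asking the resolver only when needed."""
        if refresh or host not in self._cache:
            self._cache[host] = self._resolve(host)
        return self._cache[host]

    def connect(self, host, port, dial=socket.create_connection):
        """Try the cached IP first; on failure re-query DNS and retry once."""
        ip = self.lookup(host)
        try:
            return dial((ip, port))
        except OSError:
            fresh = self.lookup(host, refresh=True)
            if fresh == ip:
                raise               # same address again, nothing new to try
            return dial((fresh, port))
```

The burst of image, CSS and script requests that follows a page fetch then costs one resolver call per host instead of one per request.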

Logging needs to be plentiful. There's the scrolling Log Window showing URL usage and things like blocking status, a flexible in-memory tracking feature that shows usage grouped by host servers, and there's a feature to store hourly or daily activity into files. You really need to know exactly where you've been so that you can decide whether you ever want to go there again. Sorting the in-memory tracking by most-used host servers is helpful to spot usage that may need attention - they're usually the ones Not accessed frequently (like trackers and malware sites) because their domains or IP's are not an integral part of the sites you were viewing.
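For illustration only, grouping requests by host server and listing the least-used hosts first might look like the sketch below. The function name and sample URLs are made up; the real tracking feature is surely richer.

```python
from collections import Counter
from urllib.parse import urlsplit

def hosts_by_usage(urls):
    """Count requests per host and return (host, count) pairs, rarest first."""
    counts = Counter(urlsplit(u).hostname or "?" for u in urls)
    # Sort by count, then by name so that ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
```

Hosts that show up only once or twice next to a heavily used site are the ones worth a second look.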

Once the proxy blocks something, the next step (which is often overlooked) is to determine how to respond to the browser. With ours you can usually specify exactly what you want the browser to get, including things like substituting a file of content or redirecting to another URL. By default the proxy tries to give back content determined by what was asked for: A request for a blocked image returns a valid image, one for a script will return a valid script (consisting of one blank byte), SSL and a few others yield special HTTP response codes, etc. I think a proxy should try to act as a beneficial front-end to assist the client browsers, and it definitely shouldn't throw any garbage back at them.
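A hedged sketch of "answer with what was asked for" follows. Guessing the type from the file extension or Accept header, substituting a 1x1 GIF for every image type, and using 403 as the fallback are all assumptions of the example, not the proxy's documented behaviour.

```python
import base64

# A 1x1 transparent GIF, a commonly used placeholder image.
TINY_GIF = base64.b64decode(
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

def blocked_response(path, accept=""):
    """Pick a harmless (status, content_type, body) for a blocked request."""
    lower = path.lower().split("?", 1)[0]        # ignore the query string
    if lower.endswith((".gif", ".jpg", ".jpeg", ".png")) or accept.startswith("image/"):
        return (200, "image/gif", TINY_GIF)
    if lower.endswith(".js"):
        # A single blank byte is still a syntactically valid script.
        return (200, "application/javascript", b" ")
    return (403, "text/plain", b"Blocked by proxy\n")
```

The point is that the browser gets something it can parse without complaint, so the page around the blocked item still renders normally.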

Pipelining is really tough on a proxy, especially one that wants to interdict and filter on the user's behalf. It would have been SO much easier if the proxy could ask "may I have some more please" vs. having to detect and deal with request floods jamming in. Different browsers seem to have different rules about how they pipeline, but so far we've been able to keep dancing without having to detect who's doing it. In general, Firefox seems to have gotten better at it, and Opera can get downright aggressive at times. We could probably improve the proxy's methods for pipelining support, but in this arena I'm currently more concerned about maintaining future compatibility than I am about being tuned to today's environment.

Thanks for reading. I'm interested in hearing other opinions about proxy functionality in general.
Graycode;
Quote:Thanks for reading. I'm interested in hearing other opinions about proxy functionality in general.
Well, that's ambitious! Wink

But my only comment is that The Proxomitron was built by Scott Lemmon with one guiding credo: "Viewing the web your way", implying 'not the way the page-generating idiots want you to view it'. That's why he scans HTML code, etc. and modifies it on the fly. Blocking can be handled by other products equally as well, but in the end, for me as well as for him, it's not the sneaky-pete under-the-radar spy stuff that gets me, it's the brain-blasting, mind-numbingly stupid, ultimately offensive way the content is presented. Ask Siamesecat about background colors, or Shea about themes, and you'll get an earful. Sinister Ask ProxRocks about eliminating images while gathering important financial data from a site that simply can't be s-canned. (Oh wait, be prepared for more than an earful!)

And most importantly, all of that devolves to the fact that SSL is only a method of encapsulating content, it's no more secure than the TP found in the Men's Room on the second floor. Go ahead and use the standard libraries, just as you did for RegExp's. I myself visit a site every day that thinks it has to have SSL (https://), but it's so full of distasteful crap that if I didn't filter it down to the essentials, then I'd probably be broke about now. Albeit my eyesight and my mindset would be in good shape. Whistling

Oh, and maybe a second comment..... Errr, I'm not sure that you and I would agree on what TimeToLive means.

Tell me, don't most servers maintain a fixed IP address? And if that's the case, then DNS caching problems (if they exist) wouldn't be so much concerned about time as they would about MRU algorithms, no? Besides, load balancing is on the server end, the user will never realize in any practical sense which particular machine within a server farm actually serviced his query. Moreover, the HTTP 1.1 spec says that we can keep a connection open for generating and fulfilling several/many queries in a streaming fashion, hence I'm of a mind that TTL isn't worth too much fuss.

But do carry on!

HTH




Oddysey
Nearly all modern browsers have URL blocking ability with or without using an extension. So why would people still use a local proxy? I think one important need is content filtering. I know some IE shells even have regex filtering built in.

Does it support a SOCKS parent proxy? I can't wait to try it if a development version is available. Smile!
put me in the Odd-camp...

*viewing the web YOUR way* is of UTMOST priority for ALL of us Proxo-users (else we'd all be simply relying on ad-blocking HOSTS files or modern-browser "content filters")...

SSL-filtering is also a MUST...
i get more flashy-girating-imagery promo-crap eyesore BS on some of my "secure" sites than i do a DOZEN non-secure sites COMBINED...

and several of my "secure" sites will do POP-UP "survey" CRAP if i were not filtering out that eyesore BS...


i'm all for a "new" local proxy hitting the scenes, but i suspect MANY of us Proxo users will be VERY picky before "jumping ship" Big Teeth


(how'd i do, Odd? that was 'minimalist earful', was it not?)
Thanks for the replies, and I'm glad you people are willing to express your opinions.

I'm not Scott Lemmon. The primary credo of this new proxy is not the same as Proxo.

If SSL content modification is really essential then eventually it will have to do that. For any non-SSL content modifications or protocol manipulation, we've already got the ability. We'll even do it on the fly with content that arrives chunked and compressed. If you're running the proxy on your own PC it would be a waste to re-compress filtered content, but if you're accessing it from elsewhere then you might turn on the recompression option.
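To show what "on the fly" can mean for compressed content, here is a small sketch that inflates a gzip body piece by piece as it arrives, using Python's zlib. It leaves out chunked transfer decoding and recompression entirely, and `gunzip_stream` is a made-up name.

```python
import zlib

def gunzip_stream(pieces):
    """Yield decompressed data as each compressed piece arrives.

    wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    """
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)
    for piece in pieces:
        out = inflater.decompress(piece)
        if out:
            yield out
    tail = inflater.flush()
    if tail:
        yield tail
```

A content filter can then work on each decompressed piece without waiting for the whole response to arrive.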

ProxRocks - I hope you won't even consider "jumping ship". If a new proxy becomes appealing, just chain it before your Proxo. You might initially send everything through Proxo. Later, for example, you might let the new proxy handle things like image requests independently and send anything else through Proxo.

Whenever - People may want to use a local proxy in addition to what a browser can do because the proxy's capabilities would be inherited by all browsers, or things that can act as browsers such as many email clients or various media players. I'm streaming WinAmp through it as I write this. It brings capabilities that may only be available as Firefox extensions to IE and Chrome. It also enables other things that browsers either don't or can't do. Whatever your browser currently does, it can always keep doing that.

A few versions of Socks are supported, and by the way Tor works fine.

Regarding Oddysey's question on TTL: Like I said, I was surprised at the results of using an in-process cache. Maybe a lot of that speed gain was due to not having to always make the OS call before connecting, but still there's low TTL out there. Several chunks of Yahoo use 60 seconds of TTL, the www of cnbc.com has 20 seconds, addons.mozilla.org lowers the bar with a whopping 5 seconds.

I hope the images are attached this time, and sorry I had to blot a couple of things. FYI, in the 'Server Response Items' some have an asterisk, an HTML title remark appears on mouse hover. The one for 'Resp Code' indicates that those counters exclude SSL tunnels. The one for MIME says that they exclude SSL and also responses having no real content. The one for 'Resp Size' says they include HTTP headers and SSL.
Graycode;

Granted, this is your baby, not Scott's, and you did give us fair warning at the outset, so the onus is on us, not you. So long as a user can chain proxies together, then there really should be no problem, eh? Whistling

But one thing you mention catches my eye..... Running WinAmp through your proxy tells me either that WinAmp is using Port 80, or else you're filtering on more ports than just 80 and 443 - which is it? Proxomitron filters only those two ports, per Scott's original intent, and some few of us have been lusting after the ability to filter any and all ports. Your thoughts on that, please.

I've replied to your question about attachments, in the other thread, but let me use this one as an excuse to further "pound the pulpit", if you will. Liar The essence of making a point is to clarify your intentions for the reader. If I have to stop what I'm doing (reading your prose) and go to the bottom of the post, click a link, wait for something to happen, then peruse that while it's "conveniently" covering part or all of my reading material, then futz around arranging the two (or more) windows, then the drive of your original posting is lost on me. Others may have the attention span necessary for this kind of shenanigan, but I, unfortunately, no longer have such - it's a common malady amongst us older farts, we seem to want things to be ever-more-easier. Drool

Adding the image directly after your point makes it all crystal-clear, without having to scroll, jump, finger-twitch, screw around, whatever. And your narrative continues in a cohesive fashion, at least for someone like me (your choice, am I senile, or am I in the throes of ADD Sad), which most folks would account as a good thing. Hail

Please, take my remarks with a large dose of salt, and do carry on! Big Teeth

~!~!~!~!~!~

ProxRocks;

That had to be the most terse reply I've seen from you in ages! You drop a 'lude this morning?


[Image: rolley.gif]



Oddysey
(Oct. 31, 2008 07:01 PM)Oddysey Wrote: [ -> ]But one thing you mention catches my eye..... Running WinAmp through your proxy tells me either that WinAmp is using Port 80, or else you're filtering on more ports than just 80 and 443 - which is it?

This proxy accommodates the HTTP protocol. The particular port number is often insignificant. Port number does come into play in a few scenarios that involve security. This proxy will always refuse to connect to any port less than about 20, I forget if we included FTP, Telnet, and SMTP. The purpose of that is to prohibit some site from having your own browser or the proxy probing your own LAN or that of a host server. Then there's a firewall section of the proxy where I normally prohibit various port ranges below 1024 (again for security because I don't want to probe host servers either).

Finally there's SSL tunneling that has been used by creeps to get into places that they wouldn't normally be allowed if they weren't having the proxy do that. We allow specification of special rules for ports where tunneling is and isn't allowed, default is only 443. Just because a request uses SSL CONNECT doesn't ensure actual SSL, and unfortunately too many servers may have fallen prey to that.
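The port rules from the last two paragraphs could be sketched roughly like this. The cutoff of 20 comes from the "less than about 20" remark and may not be the real value, and the function name and the `firewall_blocked` parameter are hypothetical.

```python
MIN_PORT = 20            # "less than about 20" in the post; exact cutoff is a guess
TUNNEL_PORTS = {443}     # default allow-list for CONNECT tunnels, per the post

def may_connect(port, method="GET", firewall_blocked=()):
    """Decide whether the proxy should open a connection to this port."""
    if not 0 < port <= 65535:
        return False
    if port < MIN_PORT:
        return False                     # always refused, not configurable
    if port in firewall_blocked:
        return False                     # user-configured firewall ranges
    if method == "CONNECT":
        return port in TUNNEL_PORTS      # tunnels only where explicitly allowed
    return True
```

For example, `may_connect(110, firewall_blocked=range(81, 1024))` refuses a POP3 probe, while a plain request to port 8000 still goes through.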

As for the WinAmp, the ShoutCast ICY protocol is significantly close to HTTP. Below are the headers involved in what I was listening to - look familiar? That of course is followed by audio data content pumping through. I'm currently on my Win98 machine, so it's Winamp 5.32.

Code:
GET / HTTP/1.0
Host: 87.98.219.173
User-Agent: WinampMPEG/5.32
Accept: */*
Icy-MetaData: 1
Connection: close

ICY 200 OK
icy-notice1:<BR>This stream requires <a href="http://www.winamp.com/">Winamp</a><BR>
icy-notice2:SHOUTcast Distributed Network Audio Server/Linux v1.9.8<BR>
icy-name:[XRM] - Alternative
icy-genre:Rock Alternative
icy-url:http://www.xrmradio.com/?g=a
content-type:audio/mpeg
icy-pub:1
icy-metaint:32768
icy-br:96

What's noteworthy about that in terms of a proxy is that it can't insist on having full content before feeding back to the browser. It "streams" the content through, even when filtering the data. Ok, some content filters need to specify how much they want it to collect before each round of filtering (hard to find anything if seeing only a few bytes at a time). Hopefully you get the idea.
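One way to picture filtering a stream without waiting for the full content is a search-and-replace that holds back only as many bytes as a split match could need. This is a generic sketch, not the proxy's filter engine; `stream_replace` is a made-up name.

```python
def stream_replace(chunks, needle, replacement):
    """Replace needle in a byte stream, emitting output as chunks arrive.

    Only the last len(needle) - 1 bytes are held back each round, since
    they might be the start of a match that finishes in the next chunk.
    """
    if not needle:
        raise ValueError("needle must not be empty")
    keep = len(needle) - 1
    buf = b""
    for chunk in chunks:
        buf += chunk
        out = []
        i = buf.find(needle)
        while i >= 0:                    # emit every complete match so far
            out.append(buf[:i] + replacement)
            buf = buf[i + len(needle):]
            i = buf.find(needle)
        cut = len(buf) - keep
        if cut > 0:                      # the rest is safe except the tail
            out.append(buf[:cut])
            buf = buf[cut:]
        if out:
            yield b"".join(out)
    if buf:
        yield buf                        # stream ended, flush the held tail
```

A regex filter would need a larger, configurable window instead of `len(needle) - 1`, which matches the remark about filters specifying how much to collect per round.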

(Oct. 31, 2008 07:01 PM)Oddysey Wrote: [ -> ]I've replied to your question about attachments, in the other thread, but let me use this one as an excuse to further "pound the pulpit", if you will.

Well I'm the newbie here, so feel free to pound away. I do see and appreciate your point about images being attachments instead of being seen in line with text at this forum.

(Edited) This is one of the images using the attachment as its source.
[Image: attachment.php?aid=185]