The base config
Dec. 22, 2008, 11:19 AM
Post: #16
RE: The base config
ProxRocks Wrote:perhaps it might be a better start if we "define" exactly what a "base config" is going to be...

Is the "base config" the framework?
Perhaps we should use two different names for clarity.
The phrase "base config" sounds like content filtering is happening.

ProxRocks Wrote:so this is what i mean by saying a "framework" needs established - the skeleton needs to be in place so the muscles and cartilage have a place to attach themselves to... once the skeleton, muscles, and cartilage are in working order, we can start applying our skin...

So simplifying the real steps involved...
Step 1: Create a framework.
Step 2: Build a config.

Here are some filters I think would be useful in the framework:
(Not actual filter names)

Headers:
Parse url (out): handy for checking url to see if it's offsite and whatnot.
Capture Real User-Agent (out): Could be handy, if it's spoofed.
Capture Real Referer (out): Could be useful for various filters if the header gets modified.
Flag Offsite urls (out): Real Referer exists but is not the same domain as the url.
---
Content-Type: Capture (strip?) charset (in): useful info when injecting scripts and setting default if missing.
Content-Type: Filter xhtml as text/html: By default proxo doesn't filter xhtml. Probably the easiest way to do this.
X-Filter: Don't Filter local.ptron (in): However, I do fix the mime-type for local.ptron scripts.

Web:
Strip leading white space: Why chew up the byte limit needed for "Start" filters on white space?
UTF page converters: Filters won't match on UTF-16 or UTF-32 pages. It's uncommon, but why not handle it?
Content Sniffer: At a minimum, checks if page is html or not.
Check Start Html: Arrange head tags so code can be safely injected. Marks where to insert head code.
Inject Head Code: Inject default js, css. Filter specific js & css could be injected afterwards.
Mark End Html: Useful for identifying the real end of page.
Identify "InScript": Probably a two part filter, one at top and bottom of web filters.

Of course, most of these filters would be setting some variables.
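
As an illustration of the "Flag Offsite urls" idea, here is a minimal sketch in Python rather than Proxomitron filter syntax. The function name is made up for this example, and it assumes "same domain" means "same host name", which is a simplification (www.example.com and img.example.com would count as different sites).

```python
from typing import Optional
from urllib.parse import urlsplit


def is_offsite(request_url: str, referer: Optional[str]) -> bool:
    """Return True when a Referer exists but its host differs from the request host.

    Assumption: hosts are compared as whole strings, case-insensitively.
    """
    if not referer:
        # No Referer header: nothing to compare against, so do not flag.
        return False
    request_host = (urlsplit(request_url).hostname or "").lower()
    referer_host = (urlsplit(referer).hostname or "").lower()
    return bool(referer_host) and referer_host != request_host
```

A framework filter doing this once, early, would let later filters test a single flag instead of repeating the comparison.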

z12
Dec. 22, 2008, 05:22 PM
Post: #17
RE: The base config
(Dec. 22, 2008 11:19 AM)z12 Wrote:  Perhaps we should use two different names for clarity.
Many people don't know what a framework is. Me, for example: I've heard the word many times and only have a rough idea of what it means. Changing the name now could be more confusing, but if framework is the correct name for this, let's call it that.

(Dec. 22, 2008 11:19 AM)z12 Wrote:  Content-Type: Filter xhtml as text/html: By default proxo doesn't filter xhtml. Probably the easiest way to do this.

Maybe this will help. It doesn't match where it isn't needed, i.e. text/(css|html|javascript|plain), but it will match any other text/* type :)
Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "Content-Type: Enable filtering by Content-Type {ln}081222 (in)"
URL = "(^local.ptron/*)"
Match = "(text/(^css|html|javascript|plain)*)\1"
Replace = "\1$FILTER(true)$LOG(w$DTM(c): Enable filtering by Content-Type (\1) in \u)"
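
For readers who don't read Proxomitron match syntax, the rule described above can be sketched in Python. This is only my reading of the pattern, not a translation verified against Proxomitron itself: the function name and the tuple of skipped subtypes are illustrative, and the check is a simple prefix test on the header value.

```python
# Subtypes the filter above deliberately leaves alone.
SKIPPED_SUBTYPES = ("css", "html", "javascript", "plain")


def should_force_filtering(content_type: str) -> bool:
    """Return True for text/* types other than css, html, javascript and plain."""
    value = content_type.strip().lower()
    if not value.startswith("text/"):
        # Only text/* values are considered at all.
        return False
    subtype = value[len("text/"):]
    # Prefix test, so "html; charset=utf-8" is still treated as html.
    return not subtype.startswith(SKIPPED_SUBTYPES)
```

Note that under this reading a type such as application/xhtml+xml is not covered, because it does not begin with text/.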

(Dec. 22, 2008 11:19 AM)z12 Wrote:  Content Sniffer: At a minimum, checks if page is html or not.
Check Start Html: Arrange head tags so code can be safely injected. Marks where to insert head code.
Inject Head Code: Inject default js, css. Filter specific js & css could be injected afterwards.
Mark End Html: Useful for identifying the real end of page.
Identify "InScript": Probably a two part filter, one at top and bottom of web filters.
Take a look at the "html zones".
Dec. 22, 2008, 08:21 PM
Post: #18
RE: The base config
(Dec. 22, 2008 11:19 AM)z12 Wrote:  Capture Real Referer (out): Could be useful for various filters if the header gets modified.
Flag Offsite urls (out): Real Referer exist's but is not same domain as url.

This filter fakes the referrer if it isn't the current host, changing it to http://current_host/
When it matches, it sets the variable RealReferer to the original value.
Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Referer: Fake referer if not current host {ln}081222 (out) WIP"
URL = "(^local.ptron/*)"
Match = "( http://(^\h)*)\1"
Replace = "http://\h/$SET(RealReferer=\1)$LOG(C$DTM(c): Fake referer if not current host. Changed from \1 to http\:\/\/\h in \u)"


Edit: Oops, I forgot that making modifications isn't a job for the framework... Anyway, I'll leave it posted here for future use.
Dec. 23, 2008, 04:06 PM
Post: #19
RE: The base config
lnminente Wrote:Many people don't know what a framework is. Me, for example: I've heard the word many times and only have a rough idea of what it means. Changing the name now could be more confusing, but if framework is the correct name for this, let's call it that.

The framework would be a stripped down config.
I see it as a starting point, geared to handling basic tasks.

I don't see anybody downloading the framework per se.
Unless they want to build their own config, which is what I would like to encourage.
Hopefully, the framework would make that easier.

Once the framework is in place, I propose we use it to make a basic config.
I think it would be a wasted effort to try and start with a "do all" config.
Just toss in a few filters that demonstrate how the framework works and what it can do.

Once the basic config is "out there", hopefully people would try it.
At that point, filters could be added based on user feedback.

In the end, no one filter set will be right for everybody.
Proxo is all about re-writing the web "your way".

But hopefully, people would write and submit filters based around the framework/config.
It certainly would make sharing filters easier than it is now.

Just perhaps, a community-based config will quell the "proxo is dead" mantra.

z12
Dec. 24, 2008, 01:01 PM
Post: #20
RE: The base config
(Dec. 23, 2008 04:06 PM)z12 Wrote:  Just perhaps, a community based config will quell the "proxo is dead" mantra.

For sure!! We will do it. Cheers!
Dec. 25, 2008, 07:33 AM
Post: #21
RE: The base config
Quote:Headers:
Parse url (out): handy for checking url to see if it's offsite and whatnot.
Capture Real User-Agent (out): Could be handy, if it's spoofed.
Capture Real Referer (out): Could be useful for various filters if the header gets modified.
Why would you want to do that? To where would you capture the data?
If you had spoofed the user-agent, presumably you know about it, so what would be the point of looking up the original one? Same thing about the referer: if you are faking it, why record the real one somewhere?
Dec. 25, 2008, 11:52 AM
Post: #22
RE: The base config
- Almost everything I block tends to be offsite.
- The data could be captured to a log file, to a variable, or to the URL itself.
- Knowing the original user-agent could, I guess, be good for injecting browser-specific javascripts.
- We could record the real referer for logging purposes. To make debugging our filters easier, it could be good to log all or most of the changes we make.
Dec. 26, 2008, 12:10 PM
Post: #23
RE: The base config
Siamesecat Wrote:Why would you want to do that? To where would you capture the data?

My thought was to capture these values into variables when the request was made, before any header filters kick in.

Siamesecat Wrote:If you had spoofed the user-agent, presumably you know about it, so what would be the point of looking up the original one? Same thing about the referer: if you are faking it, why record the real one somewhere?

You really can't presume too much.
With respect to the "framework", there's really no way to know what filters will be present in a config.

In regards to the user-agent, there's no way to know ahead of time what browser is currently being used.
I suppose you could have a filter that sets a variable that says I'm using browser X.
But if you switch browsers, you'd have to uncheck that filter and enable a different one for browser Y.

If the user-agent is spoofed and there's a web filter that was "browser dependent", examining the user-agent field would be useless.
All the web filter would see would be the spoofed user-agent.

I suppose a header filter that actually spoofed the user-agent could set a variable.
But then you're back to setting a variable that could be named anything depending on who wrote the filter.
This would make it harder to share filters.

Pretty much the same logic applies to the Referer header.
Once it's spoofed, you run into the same issues that the user-agent header has.

Also, I can see where other header filters may need referer info before the referer header is processed.
You could end up with several header filters trying to determine if a request is off-site.
To avoid code duplication and redundancy it would be easier to do it just once, right off the bat.

Of course the problem with these variables is that they depend on the browser sending the correct values.
But I don't see a way around that other than documenting the fact that these headers shouldn't be spoofed by the browser.
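
The "capture once, before any header filter kicks in" idea can be sketched like this. It is a Python illustration only, not Proxomitron code; the function name, the sample header values and the agreed-upon key names are all assumptions made up for the example.

```python
from typing import Dict


def capture_originals(headers: Dict[str, str]) -> Dict[str, str]:
    """Snapshot the headers of interest before any filter rewrites them.

    Header names are matched case-insensitively; a missing header yields "".
    """
    wanted = ("User-Agent", "Referer")
    lowered = {name.lower(): value for name, value in headers.items()}
    return {name: lowered.get(name.lower(), "") for name in wanted}


# Hypothetical outgoing request as the browser sent it.
request = {"User-Agent": "RealBrowser/1.0", "Referer": "http://www.example.com/"}

# Step 1: the framework records the real values under fixed, shared names.
originals = capture_originals(request)

# Step 2: some later header filter spoofs the header.
request["User-Agent"] = "Spoofed/9.9"

# Any later filter can still consult originals["User-Agent"].
```

Because the snapshot uses fixed names, shared filters would not need to know which spoofing filter, if any, happens to be installed.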

z12
Jan. 10, 2009, 01:58 AM
Post: #24
RE: The base config
Bypass the web filters for trusted sites defined in the list:

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "URL: Bypass web filters in Trusted webs {ln}081222 (out)"
Match = "$LST(Trusted-Web)"
Replace = "$FILTER(False)$LOG(w$DTM(c): Bypass web filters in Trusted webs: \u)"
Jan. 10, 2009, 01:36 PM
Post: #25
RE: The base config
lnminente, once you start adding "URL:" style header filters to your config, you can't predict the order in which the header filters fire anymore. If that doesn't matter, okay.

Otherwise you can easily rewrite each such filter. For instance, my counterpart to the above filter looks like:
Code:
[HTTP headers]
In = TRUE
Out = FALSE
Key = "| * Bypass Web Filters on sel. Sites - Clear Flags     7.01.07 [mona] (d.r) (In)"
URL = "$SET(hPrefix=)$SET(hRealCT=)$TST(keyword=*.a_web.*)$FILTER(0)&(local.ptron|$LOG(RRESP $DTM(c) : Filters off!))"
Jan. 10, 2009, 02:24 PM
Post: #26
RE: The base config
Noted. Whenever and ProxRocks also told me that when I wrote the filter for faster access to debugging. The reason I still don't use this method is that I haven't needed to change the position of my header filters so far.

I'm posting my filters as I use them, with slight modifications. If this thread leads to a real framework, and if we can really trust the method without "URL:" in the title, then that would be better for the framework. Thanks for the reminder, Sidki ;)
Jan. 17, 2009, 01:46 AM
Post: #27
RE: The base config
Sidki, it seems my needs for HTTP header filters have gone up a step: I introduced some variables, like extension, and now I need to order the filters, so I'm recoding some of them, as the German Nostradamus predicted ;)
Code:
[HTTP headers]
In = TRUE
Out = TRUE
Key = "!:1. Set var extension {ln}090116"
URL = "$SET(path=\p)$TST(path=([^/]+/)+([^.]+.([^.]+)\1)+)$SET(path=)$SET(Extension=\1)"

Replaced by the URL Parser filter posted further down.
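
For comparison, the intent of the "Set var extension" filter (take whatever follows the last dot in the last path segment) can be sketched in Python. This is not a translation of the Proxomitron expression; the function name is made up, and treating the query string as not part of the path is an assumption of this sketch.

```python
import posixpath
from urllib.parse import urlsplit


def url_extension(url: str) -> str:
    """Return the text after the last dot of the final path segment, or ""."""
    path = urlsplit(url).path           # drops scheme, host, query and fragment
    filename = posixpath.basename(path)  # last path segment; "" for ".../dir/"
    _, dot, ext = filename.rpartition(".")
    # rpartition returns an empty separator when there is no dot at all.
    return ext if dot else ""
```

With this reading, a file named app.min.js yields js, while a directory URL or a file without a dot yields an empty string.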
Jan. 17, 2009, 05:58 PM
Post: #28
RE: The base config
:)

Just thinking... In case you want to set more URL-dependent global variables, "Domain" for instance, you could do that with a single filter/list.

Here is what i use, maybe some parts are of use for you, too:

edit: Defunct inline code replaced with intact attachment.


Attached File(s)
.txt  URL-Parser.txt (Size: 2.06 KB / Downloads: 738)
Jan. 17, 2009, 07:46 PM
Post: #29
RE: The base config
I tested it in your config and it works very nicely. Superb, Sidki!

The following filter is extracted from the latest sidki config; it needs the list attached above.
Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "! : URL Parser {sd,th}040526 (Out)"
URL = "$LST(URL-Parser)"
Jan. 17, 2009, 07:56 PM
Post: #30
RE: The base config
If you really want to use it for your project, feel free to change whatever you want, of course! :)