Author Topic: Word Blocking  (Read 4286 times)

paulu

  • Newbie
  • *
  • Posts: 2
Word Blocking
« on: August 22, 2002, 12:25:20 AM »
Hi Guys,
        Looking for some serious help here. I am an absolute novice
with Prox; I think the term "wet behind the ears" would best describe my
current situation.

Here's my problem: I work in a school for emotionally disturbed teenagers,
and my current web filter is very slow and, to be honest, very naff.

I have been able to get Prox to run without a hitch and can block specific web
sites no problem!!

It's just configuring it to block specific words that's giving me a headache.
I need to be able to filter out URLs that contain words such as porn, hardcore,
etc. I guess you all get the picture.

The problem, of course, is that whilst I can block specific sites, our, errr,
young chaps just go and look for new sites which are not yet in the blocklist.

I did find the following filter, which I have been testing, but no joy:

             
+++ Filter Starts Here +++
Name = "Keyword content blocker"
Active = TRUE
Multi = TRUE
Limit = 200
Match = "(<html>|<head>)*(sex|porn|hardcore|adult|teen|xxx)*"
Replace = "<html><head><title>Forbidden</title></head>"
          "<body>"
          "<h3>This page is just to good for us to let you see it!</h3>"
          "</body></html>\r\n\k"

+++ Filter Ends Here +++

I know you guys probably think I am a plonker, but whilst given time I
could hack my way through the problem, I am working against the clock,
so please, any help here would be most welcome.

Cheers All Paul..


 
 

Jor

  • Sr. Member
  • ****
  • Posts: 421
    • http://members.outpost10f.com/~jor/
« Reply #1 on: August 22, 2002, 12:55:04 AM »
quote:
It's just configuring it to block specific words that's giving me a headache.
I need to be able to filter out URLs that contain words such as porn, hardcore,
etc. I guess you all get the picture.


Yup, sure do.

A very basic filter that does this is the following:

[Blocklists]
List.KillList = "..\Lists\KillList.txt"

[Patterns]
In = FALSE
Out = TRUE
Key = "URL-Killer: not allowed (out)"
URL = "$LST(KillList)"
Replace = "URL from KillList killed\k"


It uses a list called KillList, which contains the words you are looking for. It will prevent any access to matching URLs from browsers which are behind the Proxomitron.
The advantage of using a list is that it is far easier to edit and maintain.

It will display the page killed.html from Proxomitron's HTML folder instead, so edit that file to get the replacement page you want. Since that can be any HTML page, it is even possible to send an administrative warning by using scripting.
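
For anyone who wants to see the idea outside Proxomitron, here is a minimal Python sketch of list-based URL checking. It only illustrates the logic and is not how Proxomitron evaluates $LST(); the word list and the function name are made up for the example.

```python
# Hypothetical stand-in for the words kept in KillList.txt (assumed content).
KILL_WORDS = ["badword", "blocked"]


def url_is_killed(url, words=KILL_WORDS):
    """Return True if any listed word occurs anywhere in the URL.

    The comparison is case-insensitive. This is a plain substring test,
    which is an assumption; real list entries can use wildcards.
    """
    lowered = url.lower()
    return any(word.lower() in lowered for word in words)


if __name__ == "__main__":
    for candidate in ("http://example.com/BadWord/index.html",
                      "http://example.com/"):
        print(candidate, "->", url_is_killed(candidate))
```

Keeping the words in a separate list is what makes this easy to maintain: the check itself never changes, only the list does.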


Let me know if this is what you are looking for, and I can upload a zipfile with config file and example blocklist you can directly import into Proxomitron.

Oh, and make sure you are using Proxomitron 4.x

quote:
I know you guys probably think i am a plonker


Not at all -- we were all n00bs once

 
 

hpguru

  • Sr. Member
  • ****
  • Posts: 257
    • http://lightning.prohosting.com/~hpguru/
« Reply #2 on: August 22, 2002, 01:21:56 AM »
I have a set of filters which I wrote just for that purpose and which I have attached. Just unzip the files to your proxo folder and merge the Oops.cfg with yours. Then add whatever keywords you want to block to the Oops list. If you find a site is blocked in error, add its hostname to the OopsExceptions list. If you find a site that should have matched but didn't, and you are sure it contains one of the keywords you are filtering, then increase the byte limit in the "Oops! II - Body Check" filter until it does match.

The Oops list I am providing is empty because I don't feel right publishing a list of nasty words, but if you like I will send it to you via email. My addy is hpguru --AT-- hotmail -DOT- com.


List.Oops = "..\Lists\Oops.txt"
List.OopsExceptions = "..\Lists\OopsExceptions.txt"


[HTTP headers]
In = FALSE
Out = TRUE
Key = "URL-Killer: Oops! (out)"
URL = "((*$LST(Oops))&(^$LST(OopsExceptions)))&$RDIR(http://Local.ptron/Oops.html)"


[Patterns]
Name = "Add </head> tag when missing - altosax v2"
Active = TRUE
Multi = TRUE
URL = "$TYPE(htm)"
Limit = 7
Match = "(</head>)\1$STOP()|(<body)\2$SET(1=</head>)"
Replace = "\1\2$STOP()"


Name = "Oops! I - Header Check"
Active = TRUE
URL = "(^$LST(OopsExceptions))"
Bounds = "$NEST(<head>,</head>)"
Limit = 2500
Match = "*($LST(Oops))*"
Replace = "<head><title>Oops!</title>\r\n"
          "<SCRIPT LANGUAGE="JavaScript">\r\n"
          "document.location.href = "http://Local.ptron/Oops.html";\r\n"
          "</SCRIPT>\r\n"
          "<NOSCRIPT>\r\n"
          "<meta http-equiv="refresh" content="0;url=http://Local.ptron/Oops.html">\r\n"
          "</NOSCRIPT>\r\n"
          "$STOP()"


Name = "Oops! II - Body Check"
Active = TRUE
URL = "(^$LST(OopsExceptions))"
Limit = 2048
Match = "</head>*($LST(Oops))*"
Replace = "<SCRIPT LANGUAGE="JavaScript">"
          "document.location.href = "http://Local.ptron/Oops.html";"
          "</SCRIPT>"
          "<NOSCRIPT>"
          "<meta http-equiv="refresh" content="0;url=http://Local.ptron/Oops.html">"
          "</NOSCRIPT></head><body></body></html>"
          "$STOP()"
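
To make the roles of the exceptions list and the byte limit a bit more concrete, here is a rough Python sketch of the same decision. It is an approximation under stated assumptions: the limit is modelled as "only look at the first N characters of the page", which is simpler than how Proxomitron applies Limit, and the function name and sample data are invented for the example.

```python
def should_block(host, body, words, exceptions, limit=2048):
    """Decide whether a page should be replaced by the Oops page.

    host       -- hostname the page came from
    body       -- page text
    words      -- keywords to look for (stand-in for the Oops list)
    exceptions -- hostnames that are never blocked (OopsExceptions)
    limit      -- how many characters of the page to scan (assumed model)
    """
    if host.lower() in {name.lower() for name in exceptions}:
        return False  # whitelisted host: never block
    window = body[:limit].lower()
    return any(word.lower() in window for word in words)


if __name__ == "__main__":
    page = "<html><head></head>" + "x" * 3000 + "badword"
    print(should_block("site.example", page, ["badword"], []))              # keyword lies past the limit
    print(should_block("site.example", page, ["badword"], [], limit=4096))  # larger limit reaches it
```

This is why raising the limit helps when a keyword sits far down the page, at the cost of scanning more of every page.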




Attachment: oops.zip 55,84 KB

Edited by - hpguru on 22 Aug 2002  02:39:44
Facing each other,
a thousand miles apart.

altosax

  • Sr. Member
  • ****
  • Posts: 328
« Reply #3 on: August 22, 2002, 01:50:12 AM »
in addition to what others here have written, i suggest you go to steve martin's hosts pages:

http://www.smartin-designs.com

and download his hosts file for adult sites. it is named adult.zip and can easily be found on his site. download it and use it to completely block all known porn sites (it has more than 42000 entries).

regards,
altosax.

 
 

paulu

  • Newbie
  • *
  • Posts: 2
« Reply #4 on: August 27, 2002, 10:40:40 PM »


Hi Guys
       Thanks for the assistance. I have had proxo running on a temp machine at work for the last few days, with a clean install of 98 and proxo 4.3, and it's as stable as a rock "at present!!"

But I have noticed it's just a tad slow, i.e. there's a delay compared to the original porn filter. It's currently running on an old 300 MHz Cyrix with 32 MB RAM (identical machine to the "other filter"). However, I have another spare machine to try it on (500 MHz Celeron, 128 MB), but I was wondering what helps proxo most: more RAM or more cycles?

Cheers for now, and thanks again for the help, makes a nice change to find
a forum that actually does help out newbies...

Paulu.....

 
 

JD5000

  • Full Member
  • ***
  • Posts: 241
    • http://home.satx.rr.com/jd5000/
« Reply #5 on: August 27, 2002, 11:15:05 PM »
Hi paulu,

I would think CPU would help more than RAM.

Also, have you thought of blocking search engines? Unless they know the URL of the site, how else would they get there? Say you only allow Google as the search engine and have a filter that blocks any query containing a "blocked" word. You could then redirect them to killed.htm. You could also log the time and date, if you wanted to...
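
Just to show the shape of that idea, here is a small Python sketch that pulls the search terms out of a query URL, checks them against a word list, and builds a log line with the time and date. It assumes the search engine puts the terms in a `q` parameter; the word list, function names, and log format are all invented for the example and have nothing to do with Proxomitron's own syntax.

```python
from datetime import datetime
from urllib.parse import parse_qs, urlsplit


def blocked_query_words(url, words, param="q"):
    """Return the listed words that appear in the search terms of the URL.

    param is the query-string key holding the search terms ("q" is an
    assumption about the search engine being filtered).
    """
    query = parse_qs(urlsplit(url).query)
    terms = " ".join(query.get(param, [])).lower()
    return [word for word in words if word.lower() in terms]


def log_line(url, hits, now=None):
    """Build one log entry with a timestamp, the matched words and the URL."""
    now = now or datetime.now()
    return f"{now:%Y-%m-%d %H:%M:%S} blocked {hits} in {url}"


if __name__ == "__main__":
    search = "http://www.google.com/search?q=some+BadWord+here"
    hits = blocked_query_words(search, ["badword"])
    if hits:
        print(log_line(search, hits))
```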

Just a random thought.

~JD

--------
Infopros Joint :: Computer Related Links And Discussion

Edited by - JD5000 on 28 Aug 2002  00:24:02

TEggHead

  • Jr. Member
  • **
  • Posts: 93
« Reply #6 on: August 27, 2002, 11:45:23 PM »
quote:

but i was wondering what helps proxo most, more ram or more cycles..??



If the main purpose is to block access, then I'd rather opt for the hosts file method. This way all the listed porn hosts end up being resolved to localhost (i.e. the URL request does not even leave the machine).

Also, I would copy the porn hosts file to my blocklist folder, add the file as a blocklist (e.g. named PornHosts), and strip out the IP numbers to leave only the hostname on each line.
Then I'd create just a header filter (out):

In = FALSE
Out = TRUE
Key = "URL: Block Pornsites"
Match = "*&$URL(*$LST(PornHosts)*)"
Replace = "PornHost Blocked<---\k"
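
Stripping the IP numbers by hand is tedious for a big hosts file, so here is a minimal Python sketch of that step. It assumes the usual hosts layout (an address followed by one or more hostnames, with `#` starting a comment); the function name and the sample text are made up for the example.

```python
def hosts_to_blocklist(hosts_text):
    """Turn hosts-file text into a list of bare hostnames, one per entry.

    Comments and blank lines are dropped, the leading IP address is
    discarded, and the "localhost" entry itself is skipped.
    """
    names = []
    for raw_line in hosts_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # cut off any comment
        if not line:
            continue
        parts = line.split()
        for name in parts[1:]:                    # parts[0] is the IP address
            if name.lower() != "localhost":
                names.append(name)
    return names


if __name__ == "__main__":
    sample = (
        "# sample hosts file\n"
        "127.0.0.1 localhost\n"
        "127.0.0.1 bad.example.com   # some comment\n"
        "\n"
        "127.0.0.1 other.example.net\n"
    )
    print("\n".join(hosts_to_blocklist(sample)))
```

The printed lines can be saved as the text file that the PornHosts blocklist points to.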

JMO




Edited by - TEggHead on 28 Aug 2002  00:46:57
 

altosax

  • Sr. Member
  • ****
  • Posts: 328
« Reply #7 on: August 28, 2002, 12:23:16 AM »
tegghead wrote:

quote:

In = FALSE
Out = TRUE
Key = "URL: Block Pornsites"
Match = "*&$URL(*$LST(PornHosts)*)"
Replace = "PornHost Blocked<---\k"



i agree with tegghead's suggestion to use a hosts file.
as an alternative you can use his filter, but if you download steve martin's hosts file for adult sites and strip out the localhost ip, you don't need the match field he suggests, because the hosts file contains the complete address of the blocked sites and hosts.
more simply, you can use:

URL = "$LST(PornHosts)"

or, if you prefer, one of these:

URL = "$LST(PornHosts)/"
URL = "$LST(PornHosts)*"

<edit>: just replaced the wrong Match field with the URL field. sorry for the mistake.

regards,
altosax.

Edited by - altosax on 28 Aug 2002  13:50:28
 

TEggHead

  • Jr. Member
  • **
  • Posts: 93
« Reply #8 on: August 28, 2002, 09:43:27 AM »
quote:

you don't need the match field he suggests because the hosts file contains the complete address of the sites and hosts blocked.
more simply, you can use:



True, the hosts file contains the full domain, but you need to look at it the other way around: the URL field will contain more than just the full hostname. It will also have the protocol and path, which is why the leading and trailing wildcards are there...

I mean, the last time I tried this, expressions had to match against the full URL... unless you filter on the Host header, which does indeed contain exactly what is listed in the hosts file promoted to a blocklist.
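
To illustrate the difference between matching the full URL and matching just the host, here is a short Python sketch. It says nothing about what Proxomitron actually passes to a given filter type; it only shows why an exact hostname comparison fails against a full URL unless the host part is extracted first (or wildcards are used). The function names and sample data are invented for the example.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return just the hostname part of a full URL (empty string if none)."""
    return urlsplit(url).hostname or ""


def host_in_list(url, hosts):
    """True if the URL's hostname exactly equals one of the listed hosts."""
    return host_of(url).lower() in {name.lower() for name in hosts}


if __name__ == "__main__":
    blocklist = ["bad.example.com"]
    full_url = "http://bad.example.com/path/page.html"
    print(full_url in blocklist)              # exact match on the full URL
    print(host_in_list(full_url, blocklist))  # match on the host part only
```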



 
 

altosax

  • Sr. Member
  • ****
  • Posts: 328
« Reply #9 on: August 28, 2002, 12:11:13 PM »
you are right, tegghead, about needing a leading wildcard, but that applies to web filters, not header filters: proxomitron removes the protocol from the url in header filters, so you don't need to match it (look at the bypass filter or the allowcookie filter).

also, i specified that i'm referring to steve martin's adult hosts file, which i suggested in my first post in this thread.

new comments are welcome,
altosax.