The "BadWord" filter referred to above is slow because of its leading wildcard.
When writing filters, the goal should be to make them fail (NOT match) as soon as possible.
Below is a (fast) webfilter I wrote for a slightly different purpose.
It will ALMOST do what you want ~~ match against visible text and flag "keywords".
I say "almost" because it BRACKETS matched keywords with underscore characters, rather than actually inserting html tags in an attempt to highlight the text.
It does this because I haven't been able to figure out a BOUNDS or $NEST
argument that will prevent the filter from matching/replacing within TAGS, scripts, etc. (which would be a baaaaad thing).
Name = "word flagged when Keyword found"
Active = TRUE
Multi = TRUE
Match = "(([^=])(\s| |[-\\=:>'[(;?/.",]))\1($LST(Naughty)([a-z-]+|))\2"
Replace = "\1______\2______"
=============
Long-winded explanation of the filter logic:
The core idea here is to match starting at beginning-of-words only.
(Selectively enumerating valid "leading" characters
makes the filter quite a bit faster.)
In the match string, valid starting points (characters) are enumerated as:
\s| |[-\\=:>'[(;?/.",]
In other words, a "word" can ONLY begin after a SPACE or TAB character,
or a dash, a backslash, a forward-slash, an equal sign, a colon...
(The extra backslashes are there because some of these characters need to be "escaped".)
So, the filter won't perform a lookup in your blocklist until it reaches one
of the "valid" characters. YOU MAY WANT TO ADD/DELETE "VALID"
CHARACTERS, INSTEAD OF USING EXACTLY THOSE I'VE LISTED.
Yep, the filter is only SUPPOSED TO match from the beginning of a word;
this special handling is due to how we (I) have defined a "word".
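For anyone more at home in a general-purpose language, the "gate" idea can be modelled in Python. This is a minimal sketch, not Proxomitron code: LEADING simply mirrors the characters listed above, and lookup_positions/find_hits are hypothetical helpers. It models only the gate (not the filter's [^=] test or the word tail), to show that most positions fail after one cheap membership test and the blocklist comparison runs only where a word can start.

```python
# Hypothetical Python model of the filter's "valid leading character" gate.
# LEADING mirrors the \s| |[-\\=:>'[(;?/.",] alternatives (whitespace plus punctuation).
LEADING = frozenset(" \t-\\=:>'[(;?/.\",")


def lookup_positions(text: str) -> list[int]:
    """Indexes where a blocklist lookup would be attempted.

    A position qualifies only if the character just before it is a
    LEADING character; every other position fails after one set test.
    """
    return [i for i in range(1, len(text)) if text[i - 1] in LEADING]


def find_hits(text: str, blocklist: set[str]) -> list[tuple[int, str]]:
    """Compare blocklist words (case-insensitively) only at gated positions."""
    lowered = text.lower()
    hits = []
    for i in lookup_positions(text):
        for word in sorted(blocklist):
            if lowered.startswith(word.lower(), i):
                hits.append((i, text[i:i + len(word)]))
    return hits


sample = "eat the curd, thecurd"
print(lookup_positions(sample))     # [4, 8, 13, 14]
print(find_hits(sample, {"curd"}))  # [(8, 'curd')]
```

Only 4 of the 20 candidate positions ever reach the blocklist, and "thecurd" is never tested at its "c" because "e" is not a leading character.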
The tail-end argument
([a-z-]+|)
is there to accommodate the replacement... and to enable you to use
word STEMS as blocklist items. (I wrote it this way for use in
a porn filter ~~ so that one blocklist item like "fat(-|s|ass|)"
can cover a lot of ground.) YOUR BLOCKLIST CAN ALSO CONTAIN
HYPHENATED WORDS, AND MULTIPLE WORDS (PHRASES).
=========Example:==========
That ending argument serves the purpose of including any extra end-of-word
characters in the match... but STILL allowing the match to return TRUE if
there aren't any extra ~~ if the word found in-page EXACTLY matches a
blocklist word. Example: Put "curd" on a blocklist line, and the filter will
match 'curd', 'curdhead', 'curd-head', and 'CurdsRUs'.
It will not (by design) match "thecurd".
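The same behaviour can be checked outside Proxomitron with a rough Python regex translation of the Match/Replace lines. Assumptions: blocklist entries are plain words (passed through re.escape, so a stem pattern like fat(-|s|ass|) would need translating by hand), matching is case-insensitive as the 'CurdsRUs' example implies, and build_pattern/flag are hypothetical names. One deliberate difference: the two leading characters are checked with a lookbehind instead of being captured as \1.

```python
import re

# "Valid leading" characters, as enumerated in the filter.
LEADING_CLASS = r"[\s\-\\=:>'\[(;?/.\",]"


def build_pattern(blocklist):
    """Regex roughly equivalent to the filter's Match line (hypothetical helper).

    Blocklist entries are treated as literal words (an assumption); longer
    entries are tried first so overlapping items prefer the longer one.
    """
    words = "|".join(re.escape(w) for w in sorted(blocklist, key=len, reverse=True))
    return re.compile(
        r"(?<=[^=]" + LEADING_CLASS + r")"  # not '=', then a leading char (not consumed)
        + r"((?:" + words + r")[a-z-]*)",  # keyword plus optional end-of-word tail
        re.IGNORECASE,
    )


def flag(text, blocklist):
    # Mirrors the filter's Replace line: six underscores on each side of the word.
    return build_pattern(blocklist).sub(r"______\g<1>______", text)


print(flag("x curd, x curdhead, x curd-head, x CurdsRUs, x thecurd.", {"curd"}))
# x ______curd______, x ______curdhead______, x ______curd-head______, x ______CurdsRUs______, x thecurd.
```

Why the lookbehind: if the two leading characters were consumed, re.sub would resume scanning right after the first keyword, so in "x curd curd" the second "curd" would no longer have its two leading characters available and would be skipped. With the lookbehind both are flagged, while something like a="curd" is still left alone because of the [^=] test.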
---------------------------
=============FOR FURTHER DEVELOPMENT===========
Here's my attempt at a "nested" version of the matchstring.
It didn't work as expected.
Match = "$INEST(>,([^=])(\s| |[-\\=:>'[(;?/.",])\1($LST(Naughty)([a-z-]+))\2,<)"
I'VE NEVER SEEN A PROX FILTER THAT IS CONSTRAINED TO MATCHING
ONLY *VISIBLE* TEXT WITHIN A PAGE, AND WOULD GREATLY APPRECIATE
ANY HELP TOWARD ACCOMPLISHING THIS.
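On the visible-text question, one general approach (outside Proxomitron, and not a BOUNDS/$NEST answer) is to split the page into markup and text first, then run the keyword test only on the text runs. Below is a minimal Python sketch of that idea, assuming a crude regex tokenizer rather than a real HTML parser (a ">" inside an attribute value will fool it, for instance); MARKUP, make_flagger and flag_visible are hypothetical names. It also drops the filter's [^=] test, on the assumption that its job was to keep away from attribute values, which this approach never scans.

```python
import re

# Markup that must never be rewritten: script/style blocks, comments, and tags.
# A regex tokenizer is only a rough approximation of real HTML parsing.
MARKUP = re.compile(
    r"<script\b.*?</script\s*>|<style\b.*?</style\s*>|<!--.*?-->|<[^>]*>",
    re.IGNORECASE | re.DOTALL,
)


def make_flagger(blocklist):
    """Return a function that brackets blocklist words in a plain-text string.

    The negative lookbehind accepts a word at the very start of a text run
    (right after a tag's '>') or after one of the leading delimiter characters.
    """
    words = "|".join(re.escape(w) for w in sorted(blocklist, key=len, reverse=True))
    pattern = re.compile(
        r"(?<![^\s\-\\=:>'\[(;?/.\",])((?:" + words + r")[a-z-]*)",
        re.IGNORECASE,
    )
    return lambda text: pattern.sub(r"______\g<1>______", text)


def flag_visible(html, blocklist):
    """Apply the flagger only to text between markup; copy markup verbatim."""
    flag_text = make_flagger(blocklist)
    out, pos = [], 0
    for m in MARKUP.finditer(html):
        out.append(flag_text(html[pos:m.start()]))  # visible text run
        out.append(m.group(0))                      # tag/script/comment, untouched
        pos = m.end()
    out.append(flag_text(html[pos:]))               # trailing text
    return "".join(out)


page = ('<p class="curd">curd and x-curds</p>'
        '<script>var curd = 1;</script><!-- curd --><a title="curd">thecurd</a>')
print(flag_visible(page, {"curd"}))
```

In this sample only "curd" and "curds" in the paragraph text get bracketed; the class and title attributes, the script, the comment, and "thecurd" come through unchanged.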
===============================================