Author Topic: HTML Code oversized  (Read 2626 times)

lnminente

HTML Code oversized
« on: July 03, 2002, 01:46:50 AM »
Sorry, I don't know HTML, but please read this.

I have seen websites that bypass the filters by oversizing the code.
They put a lot of tabs between > and <. When the size gets bigger than 512 or 1024 bytes, the filter doesn't work.

It would be good to have a filter that reduces the size of HTML pages by converting this (tabs are shown as (tab)):

<center>(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)<div align="center">(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)<center>(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)<center>(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)(tab)<div align="center">


Into this (with line breaks, and with the tabs removed):

<center>
<div align="center">
<center>
<center>
<div align="center">
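For illustration only, the conversion described above can be sketched outside the proxy in Python. This is not a Proxomitron filter; the function name `collapse_between_tags` is made up for this example, and it assumes that any run of whitespace sitting directly between a closing `>` and the next `<` can safely be replaced by a single line break.

```python
import re

def collapse_between_tags(html: str) -> str:
    """Replace each whitespace run between '>' and '<' with one newline."""
    # \s+ covers tabs, spaces, and line breaks; text inside tags is untouched.
    return re.sub(r">\s+<", ">\n<", html)

# Rebuild a short version of the padded example from the post.
padded = "<center>" + "\t" * 12 + '<div align="center">' + "\t" * 15 + "<center>"
cleaned = collapse_between_tags(padded)
print(len(padded), "->", len(cleaned))
print(cleaned)
```

Whitespace between a tag and visible text is left alone in this sketch, so only pure tag-to-tag padding shrinks.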


I'm working on a filter, but I don't get the result I would like:

[Patterns]
Name = "Reduce size of HTML code"
Active = FALSE
Bounds = ">*<"
Limit = 2048
Match = ">\1\s\2<"
Replace = ">\1 \2<"

Edited by - lnminente on 06 Jul 2002  03:54:03
 

sidki3003

« Reply #1 on: July 25, 2002, 08:26:44 PM »
Hi lnminente,

I don't get the point.
If you don't use bounds there is no hardcoded byte limit.
If you use bounds the "limit" part of the filter allows you to choose a byte limit up to 32768.

sidki


 
 

lnminente

« Reply #2 on: July 25, 2002, 11:21:17 PM »
Many thanks, sidki3003, and sorry for my poor English.

I am confused, because sometimes the "Flash animation killer" works well and other times it doesn't work with the default config of version 4.3.

I modified some filters and now the flash killer works when the code is "oversized".

 
 

sidki3003

« Reply #3 on: July 26, 2002, 12:26:57 AM »
Good to hear it worked for you.
I have to increase the byte limits of certain filters all the time, LOL.

sidki

 
 

TEggHead

« Reply #4 on: July 29, 2002, 10:39:27 AM »
I see what you mean: sometimes the added whitespace makes the total character count for a block of HTML just a bit too large to fit in the bounds specified by a filter, resulting in stuff slipping through.

But I totally disagree with sidki about there not being a hardcoded limit. Whether you use bounds or not, a filter will never work if the specified limit is not large enough to fit the smallest required block of HTML bounded by the tags the filter targets.

Actually, using no bounds basically never happens: even if bounds are not specified separately, they are always part of the matching expression (even a * on either side acts as bounds, albeit very inefficiently). The byte limit, however, always matters. Without separate bounds it serves to specify how much of the HTML code to cycle through in one pass of the filter (meaning that the filter will take longer to finish, while at the same time preventing other filters from having a go at that code until it has finished).

So in the worst case, a large buffer (8000+) may slow down page loading (especially in determining whether a filter fails, which is where speed is of the utmost importance). Yet if the matching expression targets a relatively small piece of HTML, that same filter (byte limit: 8000+) will only cycle through that piece of bounded HTML if it finds it, not through the amount specified by the byte limit. So a filter with an 8000+ byte limit may actually never cycle through more than ~128 bytes at a time, given strict enough bounds.

To accommodate situations where a lot of whitespace is added, a filter for a simple tag, which normally needs a byte limit of about 128, may have to be raised to 256 bytes just for this.
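A toy model in Python may make the effect of the byte limit easier to see. It is only a simplification for illustration, not how Proxomitron is actually implemented: the function `bounded_match` and the sample `<embed>` tag are made up, and the model simply rejects any bounded block that is longer than the limit.

```python
from typing import Optional

def bounded_match(html: str, start: str, end: str, limit: int) -> Optional[str]:
    """Return the first block from `start` through `end`, or None when no
    block is found or the block is longer than `limit` characters."""
    begin = html.find(start)
    if begin == -1:
        return None
    stop = html.find(end, begin + len(start))
    if stop == -1:
        return None
    block = html[begin:stop + len(end)]
    # The filter only gets to see blocks that fit inside its byte limit.
    return block if len(block) <= limit else None

normal = '<embed src="ad.swf">'
padded = "<embed" + "\t" * 200 + 'src="ad.swf">'  # same tag, stuffed with tabs

print(bounded_match(normal, "<embed", ">", 128) is not None)
print(bounded_match(padded, "<embed", ">", 128) is not None)
print(bounded_match(padded, "<embed", ">", 256) is not None)
```

In this model the padded tag slips past a limit of 128 but is caught again once the limit is raised to 256, which is the situation described above.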

The problem with trying to reduce the code by stripping whitespace is that you need bounds that preferably are not among the generally targeted ones. Also, it is almost impossible to distinguish between whitespace that may be removed and whitespace used for formatting. So any filter targeting whitespace will trigger a lot of times (taking up time too), making its purpose and effect questionable; sometimes it is simpler and faster to just up the byte limit for that particular filter. Having said this, here's what I use to strip whitespace and blank lines...

Name = "General: Strip Whitespace and Blank Lines from HTML code"
Active = FALSE
URL = "$TYPE(htm)"
Limit = 64
Match = "(    |   | ||
|
|
|
|\s)+{3,*}"
Replace = "
"


The spaces are intentional. It could also have been written as:
Match = "(( |)(   ||
|
|\s))+{3,*}"

but I think the first form is more readable...

The {3,*} is there so that it will only match if at least three whitespace characters in a row are found (they need not be identical; any combination of at least three is valid).
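As a rough analogue in Python (not Proxomitron syntax), the "at least three in a row, in any combination" rule corresponds to a character class with a `{3,}` quantifier. The names `WHITESPACE_RUN` and `strip_runs` are made up for this sketch, and replacing each run with a single line break is an assumption.

```python
import re

# Three or more whitespace characters in a row, in any mix of
# spaces, tabs, carriage returns, and line feeds.
WHITESPACE_RUN = re.compile(r"[ \t\r\n]{3,}")

def strip_runs(html: str) -> str:
    """Replace every run of three or more whitespace characters with one newline."""
    return WHITESPACE_RUN.sub("\n", html)

print(repr(strip_runs("<td> \t\n</td>")))     # mixed run of three is collapsed
print(repr(strip_runs("<b>two  spaces</b>")))  # run of two is left alone
```

Runs shorter than three characters are left alone, so ordinary spacing between words is not affected.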

Hope this gives you something to play with...

JarC


Edited by - TEggHead on 29 Jul 2002  12:05:01
 

altosax

« Reply #5 on: July 29, 2002, 04:27:16 PM »
When stripping whitespace, you have to take care of the <pre> tag, otherwise this filter could badly break the layout of the page.
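One possible way to respect <pre>, sketched in Python rather than as a Proxomitron filter: split the page into <pre> blocks and everything else, and only strip whitespace in the latter. The function name `strip_outside_pre` is made up, and the sketch assumes <pre> blocks are not nested and are properly closed.

```python
import re

# Capturing group so that re.split keeps the <pre> blocks in the result.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.IGNORECASE | re.DOTALL)
WHITESPACE_RUN = re.compile(r"[ \t\r\n]{3,}")

def strip_outside_pre(html: str) -> str:
    """Collapse whitespace runs of three or more, except inside <pre>...</pre>."""
    parts = PRE_BLOCK.split(html)
    # With one capturing group, odd indexes hold the <pre> blocks.
    return "".join(
        part if index % 2 else WHITESPACE_RUN.sub("\n", part)
        for index, part in enumerate(parts)
    )

page = "<p>a</p>\n\n\n<pre>x   y\n\n\nz</pre>\t\t\t<p>b</p>"
print(repr(strip_outside_pre(page)))
```

The contents of the <pre> block come through byte for byte, while the padding around it is collapsed.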

regards,
altosax.

 
 

lnminente

« Reply #6 on: July 30, 2002, 02:17:30 AM »
Hi TEggHead, and many thanks for your attention.

I think you've done it.

And as you rightly say, "sometimes it is simpler and faster to just up the byte limit for that particular filter"; maybe this filter will work perfectly for specific sites with ugly code.

I will play with this filter. Many thanks.

Edited by - lnminente on 31 Jul 2002  01:56:24