"Content-Encoding: deflate" Issue?
Dec. 22, 2010, 08:25 AM
Post: #1
"Content-Encoding: deflate" Issue?
Test URL: http://www-31.ibm.com/storage/cn/disk/ds...pecs.shtml

With sidki's 2010-10-23 config, the page displays fine in IE and Firefox, but nothing displays in Opera v11.

I noticed the difference is that Opera gets deflate-encoded content while IE and Firefox get a gzip-encoded page.

Forcing "Accept-Encoding: gzip" or bypassing Proxomitron makes the page display in Opera.

Forcing "Accept-Encoding: deflate" makes Firefox display nothing.

It seems Proxomitron has a problem handling "Content-Encoding: deflate" content?
Dec. 22, 2010, 10:28 AM (This post was last modified: Dec. 22, 2010 02:02 PM by whenever.)
Post: #2
RE: "Content-Encoding: deflate" Issue?
Changes.txt Wrote:* Added support for gzip and deflate content-encoded pages using
zlib.dll (http://www.gzip.org/zlib/)

I just think it might be a zlib.dll issue.

Have to go for dinner now.

Update:
The latest zlib v1.2.5 doesn't fix this issue.
Dec. 22, 2010, 09:55 PM
Post: #3
RE: "Content-Encoding: deflate" Issue?
It's the server's fault, not Proxo's. The server is producing invalid DEFLATE compression, but its GZIP compression is OK.

The reason it's happening in Opera is that browser's 'Accept-Encoding' specifies deflate first. Other browsers specify gzip before deflate. That server is compressing with the first method it sees from that header that it can accommodate.

PS - Generally GZIP should be preferred over DEFLATE. GZIP adds some prefix & suffix data around the compression, and the suffix carries the CRC and length of the uncompressed payload. So with GZIP an application can validate that the content was delivered properly. With DEFLATE it can be a crapshoot unless the decompressor happens to detect an invalid stream condition (as it does in this case).
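
For anyone who wants to see the GZIP suffix concretely, here is a minimal Python sketch. It has nothing to do with Proxomitron's own code, and the helper name `gzip_trailer` is made up for illustration. Per RFC 1952, the last eight bytes of a gzip member hold the CRC-32 and the length (modulo 2^32) of the uncompressed data, so a receiver can check what it inflated:

```python
import gzip
import struct
import zlib


def gzip_trailer(blob: bytes) -> tuple:
    """Return (crc32, isize) from the last 8 bytes of a single-member gzip blob."""
    crc32, isize = struct.unpack("<II", blob[-8:])
    return crc32, isize


payload = b"Proxomitron " * 100
blob = gzip.compress(payload)

crc32, isize = gzip_trailer(blob)
# Both trailer fields can be checked against the inflated data.
assert crc32 == zlib.crc32(payload)
assert isize == len(payload) % 2**32
```

A raw deflate stream has no such trailer at all, and the zlib wrapper carries only an Adler-32 checksum with no length field.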
Dec. 22, 2010, 11:12 PM (This post was last modified: Dec. 23, 2010 01:03 AM by JJoe.)
Post: #4
RE: "Content-Encoding: deflate" Issue?
I think this is an old problem.

From Wed Sep 25, 2002 3:35 pm at
http://tech.groups.yahoo.com/group/prox-...sage/13269

SRL Wrote:I think this is it! It looks like Yahoo's servers and the ones used by Anandtech send deflate slightly differently. Yahoo serves a raw stream while Anandtech adds a two byte deflate header (different from the more elaborate gzip header). Proxomitron assumes there's no header since that's how all the servers I tested at the time worked, but it makes the stream supplied by Anandtech look corrupt to the zlib routines.

I guess I'll just need to check for the deflate header bytes first. Hopefully it won't lead to a conflict (like a raw stream that just happens to begin with the header byte value). It's hard to say which way's "correct" - deflate was never a very official format (IE started using it and a few other browsers followed suit). I'd think the header should be there though - I remember originally being a bit surprised that it wasn't. In fact I wonder if it causes some browsers problems too - maybe that's why servers won't send deflate to certain UA's?

and then later in Docs

changes.txt Wrote:* Created a work-around for servers that send "deflate" content
encoding with/without the normal header bytes. Seems not all
servers format it the same.
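
How Scott actually implemented that work-around isn't shown anywhere in this thread, so the following is only a guess at the general idea, sketched in Python with made-up function names. It checks for the two RFC 1950 header bytes (CMF and FLG) and picks the zlib-wrapped or raw inflate mode accordingly:

```python
import zlib


def looks_like_zlib_header(data: bytes) -> bool:
    """Heuristic check for the 2-byte RFC 1950 header (CMF, FLG)."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    return (
        (cmf & 0x0F) == 8                 # compression method 8 = deflate
        and (cmf >> 4) <= 7               # window size of at most 32 KiB
        and ((cmf << 8) | flg) % 31 == 0  # FCHECK makes the pair a multiple of 31
    )


def inflate_http_deflate(data: bytes) -> bytes:
    """Inflate a "Content-Encoding: deflate" body, with or without the header."""
    wbits = zlib.MAX_WBITS if looks_like_zlib_header(data) else -zlib.MAX_WBITS
    return zlib.decompress(data, wbits)
```

The multiple-of-31 check is what keeps the test from being a bare comparison against one byte value, which was the conflict SRL was worried about.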

and in header filters

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Accept-encoding: Allow webpage encoding (out)"
Match = "*"
Replace = "gzip,deflate"


http://www.gzip.org/zlib/zlib_faq.html#faq38

Quote:# What's the difference between the "gzip" and "deflate" HTTP 1.1 encodings?

"gzip" is the gzip format, and "deflate" is the zlib format. They should probably have called the second one "zlib" instead to avoid confusion with the raw deflate compressed data format. While the HTTP 1.1 RFC 2616 correctly points to the zlib specification in RFC 1950 for the "deflate" transfer encoding, there have been reports of servers and browsers that incorrectly produce or expect raw deflate data per the deflate specification in RFC 1951, most notably Microsoft. So even though the "deflate" transfer encoding using the zlib format would be the more efficient approach (and in fact exactly what the zlib format was designed for), using the "gzip" transfer encoding is probably more reliable due to an unfortunate choice of name on the part of the HTTP 1.1 authors.

Bottom line: use the gzip format for HTTP 1.1 encoding.
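
To make the three formats in that FAQ entry tangible, here is a small Python sketch that wraps the same payload three ways using the `wbits` parameter of the standard `zlib` module. The payload and the helper name `compress_as` are arbitrary choices for the demo:

```python
import zlib

payload = b"<html>" + b"storage specs " * 50 + b"</html>"


def compress_as(wbits: int) -> bytes:
    """Compress payload with the container selected by zlib's wbits value."""
    co = zlib.compressobj(wbits=wbits)
    return co.compress(payload) + co.flush()


raw = compress_as(-15)      # RFC 1951: bare deflate data, no header or trailer
zlib_fmt = compress_as(15)  # RFC 1950: 2-byte header + deflate + Adler-32
gzip_fmt = compress_as(31)  # RFC 1952: 10-byte header + deflate + CRC-32 + length

# Compare total sizes and the leading bytes of the two wrapped formats.
print(len(raw), len(zlib_fmt), len(gzip_fmt))
print(zlib_fmt[:2].hex(), gzip_fmt[:2].hex())
```

The deflate data in the middle is the same in all three; only the wrapper differs, which is why the zlib format is the leaner of the two wrapped options.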

So there should probably be something like

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Accept-Encoding: 2 GZip only     10.12.22 [srl] (d.1) (Out) MOD"
URL = "^$TST(volat=*.encoded:1.*)|$TST(keyword=*.a_web.*)"
Match = "?&*(gzip|x-gzip|deflate)\1(*(, (gzip|x-gzip|deflate))\#)+$SET(o=\1\@)($TST(o=deflate, \2)$SET(o=\2, deflate)|)|"
Replace = "$GET(o)$SET(o=)"

in sidki based sets to make 'deflate' the last resort.
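
Setting Proxomitron's matching language aside, the effect intended by that filter, as I read it, is to push 'deflate' to the back of the list without adding or dropping any method. A rough Python equivalent follows. It is not a translation of the Match expression above, and `deflate_last` is a made-up name:

```python
def deflate_last(accept_encoding: str) -> str:
    """Move any 'deflate' token to the end; add nothing, keep the other order."""
    tokens = [t.strip() for t in accept_encoding.split(",") if t.strip()]

    def is_deflate(token: str) -> bool:
        # Ignore an optional ";q=..." parameter when comparing the coding name.
        return token.split(";")[0].strip().lower() == "deflate"

    others = [t for t in tokens if not is_deflate(t)]
    deflate = [t for t in tokens if is_deflate(t)]
    return ", ".join(others + deflate)


print(deflate_last("deflate, gzip, x-gzip"))
```

A header that never offered deflate comes back without it, so nothing is requested that the User-Agent didn't ask for. Any q-values are carried along untouched rather than interpreted.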

BTW, I think my Wireshark does inflate the stream correctly.

Edit: changed deflate to inflate.
Dec. 23, 2010, 02:03 AM
Post: #5
RE: "Content-Encoding: deflate" Issue?
Glad to know it is not Proxomitron's own problem, at least not fully.

(Dec. 22, 2010 11:12 PM)JJoe Wrote:  BTW, I think my Wireshark does inflate the stream correctly.

So does Opera.

(Dec. 22, 2010 11:12 PM)JJoe Wrote:  in sidki based sets to make 'deflate' the last resort.

How about just switching back to the version from the 2007-09-09 config:

Code:
[HTTP headers]
In = FALSE
Out = TRUE
Key = "Accept-Encoding: 2 gzip     4.11.22 [srl] (d.1) (Out)"
URL = "^$TST(volat=*.encoded:1.*)|$TST(keyword=*.a_web.*)"
Replace = "gzip, x-gzip, deflate"

It works and is much simpler.
Dec. 23, 2010, 03:40 AM
Post: #6
RE: "Content-Encoding: deflate" Issue?
(Dec. 22, 2010 11:12 PM)JJoe Wrote:  I think this is an old problem.

That's exactly right, JJoe. That server's DEFLATE is encoded to the RFC 1950 specification.

But we don't know exactly how Scott accommodated the detection and decompression. In terms of coding for ZLIB, a different value must be passed to an initialization function for RFC1950 decompression vs. a raw decompression.
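
In zlib's C API the value in question is the `windowBits` argument of `inflateInit2`; Python's `zlib` module exposes the same thing as `wbits`. A minimal sketch of what happens when the wrong mode is chosen, which is presumably what bit Proxomitron here (the helper `try_inflate` is just for the demo):

```python
import zlib

body = b"hello hello hello"
zlib_stream = zlib.compress(body)  # RFC 1950 framing, as this server sends


def try_inflate(data: bytes, wbits: int):
    """Return the inflated bytes, or None if zlib rejects the stream."""
    try:
        return zlib.decompressobj(wbits=wbits).decompress(data)
    except zlib.error:
        return None


assert try_inflate(zlib_stream, 15) == body    # zlib-wrapped mode
assert try_inflate(zlib_stream, -15) is None   # raw mode chokes on the header bytes
assert try_inflate(zlib_stream, 47) == body    # 32 + 15: auto-detect zlib or gzip
```

Note that the auto-detect value only distinguishes zlib from gzip framing. It does not fall back to raw deflate, so a proxy that wants to accept both "deflate" variants still has to decide between the positive and negative value itself.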

In my proxy I'm still spitting out a warning about RFC1950 decompression when it's encountered. I've encountered it on very rare occasions and (unlike this case) they're usually servers whose data should be avoided anyway. I mean ... since common IE was never able to handle that particular method, it raises my suspicion about why some web server would have chosen to use it.

(Dec. 23, 2010 02:03 AM)whenever Wrote:  
Code:
Replace = "gzip, x-gzip, deflate"

I hate to pick on Opera, but there are a few sites that give it both "gzip" and also a nested round of compression for the additionally specified "x-gzip". It might be better to stick with whatever Firefox and IE are using, just "gzip, deflate" ought to do. Opera further specifies its compression options in its "TE:" header, but fortunately most web servers seem to be ignoring that.
Dec. 23, 2010, 08:01 AM
Post: #7
RE: "Content-Encoding: deflate" Issue?
(Dec. 23, 2010 03:40 AM)Graycode Wrote:  I hate to pick on Opera, but there are a few sites that give it both "gzip" and also a nested round of compression for the additionally specified "x-gzip".

Does Opera handle the nested compression well?

Do you have any links for testing? I am wondering if Proxomitron could handle it.
Dec. 23, 2010, 03:23 PM (This post was last modified: Dec. 23, 2010 03:25 PM by JJoe.)
Post: #8
RE: "Content-Encoding: deflate" Issue?
(Dec. 23, 2010 02:03 AM)whenever Wrote:  
Code:
Replace = "gzip, x-gzip, deflate"

(Dec. 23, 2010 03:40 AM)Graycode Wrote:  It might be better to stick with whatever Firefox and IE are using, just "gzip, deflate" ought to do.

Sidki's 2007-09-09 version was simple. Scott's and Graycode's are simpler.

However, sidki's current allows x-gzip, does not add methods to the request, and preserves order. Why?

Not adding methods and requesting all that the Proxomitron can (might be able to) handle seems like the thing to do. I suspect that preserving order was unintentional.
Dec. 23, 2010, 04:41 PM
Post: #9
RE: "Content-Encoding: deflate" Issue?
(Dec. 23, 2010 08:01 AM)whenever Wrote:  Does Opera handle the nested compression well?

Do you have any links for testing? I am wondering if Proxomitron could handle it.

No, the browser does not handle nested compression well at all. For whatever reason they chose to use some different header values than other browsers, and there's a price to pay in global compatibility.

Opera has exception lists that it downloads from home, and these decoding issues are among the things they seek out and try to adjust for by modifying the headers they use for some sites.

Some of these servers may have changed by now ...

Cars.com and pickuptrucks.com provided DEFLATE and then picked up on the GZIP specified in Opera's TE: header to GZIP that again.

A CGI routine at share.com was producing both DEFLATE and also GZIP. It was mentioned in the Opera forums: http://my.opera.com/community/forums/top...?id=263669


(Dec. 23, 2010 03:23 PM)JJoe Wrote:  However, sidki's current allows x-gzip, does not add methods to the request, and preserves order. Why?

Generally a browser specifies them in the order of its preference, assuming it will be doing the decoding. When using a proxy that's going to decode, it seems reasonable that the proxy could specify its preference. There isn't a way to specify which of the two DEFLATE variants is wanted, so putting GZIP first may help to avoid a few DEFLATE issues.

PS - I'd also marked Userstyles.org as providing RFC1950 version of DEFLATE to Opera, not sure if it still does.
Dec. 24, 2010, 03:04 AM
Post: #10
RE: "Content-Encoding: deflate" Issue?
(Dec. 23, 2010 03:23 PM)JJoe Wrote:  However, sidki's current allows x-gzip, does not add methods to the request, and preserves order. Why?

Not adding methods and requesting all that the Proxomitron can (might be able to) handle seems like the thing to do. I suspect that preserving order was unintentional.

I don't know what sidki's intention is.

However, from what I've observed, Proxomitron doesn't re-compress the data after it decompresses and filters the filterable content; browsers always get chunked, decompressed data. So "Accept-Encoding" is only a matter between Proxomitron and the server. Then why not make Proxomitron happy by adding/removing methods and forcing the order?

As Graycode advised, the safest way might be to stick with whatever Firefox and IE are using:

Code:
Replace = "gzip, deflate"
Dec. 26, 2010, 04:37 AM (This post was last modified: Dec. 26, 2010 04:38 AM by JJoe.)
Post: #11
RE: "Content-Encoding: deflate" Issue?
Browsers aren't the only User-Agents that can use the Proxomitron. Some of these are not HTTP/1.1 compliant.

The Proxomitron can change the Accept-Encoding header and not decompress the data. The User-Agent might not be able to decompress the improperly compressed data. For example, from sidki's "User-Agents.ptxt"

Code:
## ||||||||||||||||||||||||||||| Bypass webfilters ||||||||||||||||||||||||||||

## Flash (POST, application/x-fcs)
## ----------------------------------------------------------------------------
Shockwave Flash            $SET(1=\0$FILTER(0))

## XP Search Assistant (text/xml)
## ----------------------------------------------------------------------------
SCAgent                $SET(1=\0$FILTER(0))

## WinBatch (text/plain)
## ----------------------------------------------------------------------------
WinBatch Internet Extender Ver:    $SET(1=\0$FILTER(0))

"$FILTER(0)" bypasses the webfilters and decompression.

So, when possible:
The Accept-Encoding header should only be sent per standards.
When sent, the header should be acceptable to the User-Agent.
I think.

"Accept-Encoding: 2 gzip 4.11.22 [srl] (d.1) (Out)" always sends "gzip, x-gzip, deflate". Simple but too often incorrect for my tastes.

"Accept-Encoding: 2 GZip only 07.11.16 [srl] (d.1) (Out)" allows a server at IBM to send a chunked deflate stream that the Proxomitron can't handle and may send an empty Accept-Encoding header.

For now, moving gzip to the front and not adding any methods seems best for the published sidki set. I'm still studying the empty header issue.
Again, I think.
Dec. 26, 2010, 09:46 PM (This post was last modified: Dec. 26, 2010 09:52 PM by Graycode.)
Post: #12
RE: "Content-Encoding: deflate" Issue?
(Dec. 26, 2010 04:37 AM)JJoe Wrote:  For now, moving gzip to the front and not adding any methods seems best for the published sidki set.

That sounds like a good idea.

As a follow up to Whenever's question about double-compression, a better example would be romanianadventure.com. It picks up on Opera's 'x-gzip' and then also on its 'gzip'. That server (or a web appliance connected to it) responds with TWO 'Content-Encoding' headers. The attached image shows a debug method of my proxy with the headers and resultant data. The proxy handled chunked and the first of the two gzip layers, yielding the still-compressed 2nd layer.

When viewed without going through a proxy, Opera displays gibberish. Browsers don't handle multiple layered compression.

When going through a proxy the result depends on how the proxy is coded. Mainly it depends on what that proxy does with the extra 'Content-Encoding' header ... passing it on to the browser or removing it since having more than one compression is invalid for browsers. Yet in either case the proxy won't be able to safely modify the content because of the nested compression.

It's not the kind of thing that would normally be encountered. I've chosen not to attempt to address it, other than logging a warning about detection of multiple compression. I think trying to handle invalid situations is one of the things that made some browser versions so insecure. It's a judgement call between limiting accommodation to normal standards-based usage vs. allowing an assumption or interpretation of something that later on turns out to be an available malware vector.
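
As a rough illustration of the "decode one layer, then warn" approach described above, here is a Python sketch. It is not the code from my proxy or from Proxomitron; the function name and the log message are invented, and the double compression is simulated locally rather than fetched from any of the sites mentioned:

```python
import gzip
import logging

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip member


def decode_one_layer(body: bytes) -> bytes:
    """Remove a single gzip layer and warn if another one is still present."""
    inner = gzip.decompress(body)
    if inner[:2] == GZIP_MAGIC:
        logging.warning("body still looks gzip-compressed after one layer")
    return inner


page = b"<html>hello</html>"
double = gzip.compress(gzip.compress(page))  # simulate the doubled encoding
once = decode_one_layer(double)              # still compressed, warning logged
```

The magic-byte test is only a heuristic, since arbitrary binary content could begin with the same two bytes. That is one more reason to log the condition rather than silently inflating a second time.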

A couple more Opera-specific site encoding examples are mentioned in:
http://my.opera.com/Lex1/blog/patch-for-...pped-pages
Note that the proposed solution of patching the Opera.dll involves putting 'gzip' before 'deflate' and eliminating the 'x-gzip' portion. Those people need to discover Proxo vs. patching a DLL that way :)


Attached File(s)
.gif  RomanianAdventure.gif (Size: 23.07 KB / Downloads: 754)