
bugmenot · (This post was last modified: Jun. 16, 2009 11:31 PM by bugmenot.)

There's something really weird about a specific folder (link removed, as it's proper UTF-8 now; does anyone have another example?): all of the non-Latin characters (in this case Hebrew, Arabic and Russian) show up as gibberish in Proxomitron, but they display correctly when I bypass Proxomitron.
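One common way that correct text turns into gibberish is when its bytes are decoded with the wrong encoding. The short Python sketch below shows this with illustrative Hebrew, Arabic and Russian sample text. It makes no claim that this is exactly what Proxomitron does internally; it only demonstrates the general effect.

```python
# Illustrative sample text (Hebrew, Arabic, Russian); not taken from the real page.
text = "שלום مرحبا Привет"

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Decoded with the codec it was written in, the text survives intact.
assert utf8_bytes.decode("utf-8") == text
assert utf16_bytes.decode("utf-16-le") == text

# Decoded as a single-byte code page, the same bytes become gibberish.
# Latin-1 maps every byte to some character, so no error is raised.
garbled_utf8 = utf8_bytes.decode("latin-1")
garbled_utf16 = utf16_bytes.decode("latin-1")

print(garbled_utf8)
print(garbled_utf16)  # also contains NUL characters from the UTF-16 high bytes
```

The damage is reversible as long as nothing re-encodes the garbled string: encoding it back with Latin-1 returns the original bytes.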

Running the "file" command over SSH reports this about the HTML file:

Quote: Little-endian UTF-16 Unicode character data, with very long lines, with CRLF, CR line terminators

For comparison, it reports this about normal UTF-8 files:

Quote: UTF-8 Unicode HTML document text, with CR, LF line terminators
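To illustrate how the two kinds of file can be told apart, here is a simplified Python heuristic. It checks for a byte order mark first. If there is none, it looks for the NUL bytes that ASCII-heavy UTF-16LE text contains in odd positions. As a last step, it tries UTF-8. This is a rough sketch under those assumptions, not how "file" itself works (it uses its own magic rules). The name guess_encoding and the NUL threshold are hypothetical, and UTF-32 is not handled.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Rough guess at a text file's encoding (hypothetical helper, simplified)."""
    # 1. A byte order mark, if present, is the strongest signal.
    #    (UTF-32 BOMs are not handled; UTF-32LE would be misreported as UTF-16LE.)
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"

    # 2. Without a BOM, ASCII-heavy UTF-16LE has a 0x00 byte in most odd positions.
    sample = data[:4096]
    if len(sample) >= 2:
        odd_nuls = sample[1::2].count(0)
        if odd_nuls > len(sample) // 4:  # arbitrary threshold: over half the odd bytes
            return "utf-16-le"

    # 3. Otherwise, accept UTF-8 if the bytes decode cleanly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"
```

For example, guess_encoding(open("page.html", "rb").read()) would be expected to return "utf-16-le" for a file like the one described above ("page.html" is a placeholder name).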

Why does Proxomitron break this folder? I can't even convince the site's admins that it's a real problem, because they don't have Proxomitron (or know anyone who uses it except me). Can you at least tell me what you think the admins did to create such weird files?
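Here is one plausible guess about how such a file gets created, which is not confirmed for this site. Some Windows editors label UTF-16LE simply as "Unicode" in their save dialog, and Windows Notepad's "Unicode" option, for instance, saves UTF-16LE with a BOM. The Python sketch below reproduces the traits that "file" reported: UTF-16LE, a BOM, and CRLF line endings. It does not reproduce the bare CR terminators, which suggest that the line endings were mixed as well. The file name and contents are made up for illustration.

```python
import os
import tempfile

# Made-up page content with Hebrew, Arabic and Russian text.
html = "<html>\n<body>שלום مرحبا Привет</body>\n</html>\n"

path = os.path.join(tempfile.mkdtemp(), "sample.html")  # hypothetical file name
with open(path, "w", encoding="utf-16-le", newline="\r\n") as f:
    f.write("\ufeff")  # BOM; the "utf-16-le" codec does not write one by itself
    f.write(html)      # newline="\r\n" turns each "\n" into CRLF on write

with open(path, "rb") as f:
    raw = f.read()

# BOM (ff fe) followed by "<html" with a 0x00 after every ASCII byte.
print(raw[:12].hex(" "))  # ff fe 3c 00 68 00 74 00 6d 00 6c 00
```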

Anyway, I've tried running these filters, but they don't match anything:

Top Remove: Unicode BOM: HTML 6.12.09 (multi) [sj sd] (d.1)
Top Remove: Unicode BOM: Other 7.10.28 (multi) [sj sd] (d.1)
UTF-16 to UTF-8 Page Converter 7.01.06 (multi) [sj sd mona] (d.1)
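For reference, the conversion that the "UTF-16 to UTF-8 Page Converter" and BOM filters are named after can be sketched outside Proxomitron. The minimal Python version below assumes the page is UTF-16LE with an optional BOM, as "file" reported. It is not how those filters are implemented, and the function name utf16le_to_utf8 is hypothetical.

```python
import codecs


def utf16le_to_utf8(data: bytes) -> bytes:
    """Convert UTF-16LE bytes (with or without a BOM) to UTF-8 without a BOM."""
    if data.startswith(codecs.BOM_UTF16_LE):
        data = data[len(codecs.BOM_UTF16_LE):]  # drop the FF FE byte order mark
    text = data.decode("utf-16-le")             # raises if the data is not valid UTF-16LE
    return text.encode("utf-8")
```

If the page also declares charset=utf-16 in a <meta> tag or in the Content-Type header, that declaration would presumably need to change to utf-8 too. Otherwise the browser may decode the converted bytes incorrectly.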

Neither did these (along with ASCII-Table.ptxt):

<iframe>: Unicode to ASCII 7.01.06 (multi) [sd] (d.1 l.2)
<iframe>: BASE16 to ASCII 8.11.21 (multi) [gz sd] (d.2 l.2)
<a>: Unicode to ASCII 7.11.14 (multi) [sd] (d.1 l.3)
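One possible reason that none of these filters match, assuming (unverified) that Proxomitron's matching runs on the raw bytes it receives: in UTF-16LE, every ASCII character is followed by a 0x00 byte. A pattern written for 8-bit text, such as <body>, therefore never appears as contiguous bytes. The Python sketch below demonstrates this with a made-up page and a hypothetical helper, nul_tolerant, that allows an optional NUL after each character.

```python
import re

# Made-up page body, encoded the way "file" says the real page is.
page = "<html><body>Привет</body></html>".encode("utf-16-le")

# An 8-bit pattern looks for the contiguous bytes 3C 62 6F 64 79 3E...
print(b"<body>" in page)  # False

# ...but in UTF-16LE each ASCII character is followed by a 0x00 byte.
wide = "<body>".encode("utf-16-le")
print(wide in page)  # True


def nul_tolerant(tag: str):
    """Build a bytes regex that allows an optional NUL after each ASCII character."""
    parts = [re.escape(ch.encode("ascii")) for ch in tag]
    return re.compile(b"\x00?".join(parts))


pattern = nul_tolerant("<body>")
print(bool(pattern.search(page)))       # True (UTF-16LE)
print(bool(pattern.search(b"<body>")))  # True (plain 8-bit)
```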