Converting non-Latin characters to UTF-8, as Proxomitron can't read UTF-16
Jun. 11, 2009, 07:57 AM (This post was last modified: Jun. 16, 2009 11:31 PM by bugmenot.)
There's something really weird about one specific folder (link removed, as it is served as proper UTF-8 now; does anyone have another example?): all of the non-Latin characters (in this case Hebrew, Arabic and Russian) show up as gibberish in Proxomitron, but there's no gibberish when bypassing Proxomitron.

Running the "file" command over SSH reports this about the HTML file:
Quote:Little-endian UTF-16 Unicode character data, with very long lines, with CRLF, CR line terminators
While it reports this about the normal (UTF-8) files:
Quote:UTF-8 Unicode HTML document text, with CR, LF line terminators
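
For reference, the conversion I'm after can be sketched outside Proxomitron. This is only a minimal Python sketch, not what any of the filters below actually do: the function name to_utf8 is made up for illustration, and it assumes the file starts with a UTF-16 byte order mark (BOM). A UTF-16 file without a BOM would pass through unchanged.

```python
import codecs


def to_utf8(raw: bytes) -> bytes:
    """Transcode BOM-marked UTF-16 bytes to UTF-8; return other input unchanged.

    Hypothetical helper for illustration only. Limitation: a UTF-32 LE BOM
    also begins with the bytes FF FE, so UTF-32 input would be misdetected.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order,
        # then drops the BOM from the decoded text.
        text = raw.decode("utf-16")
        return text.encode("utf-8")
    # No UTF-16 BOM found: assume the data is already UTF-8 (or plain ASCII).
    return raw
```

Even after converting the bytes, the page's Content-Type header or meta charset may still announce UTF-16, so that would probably need changing too.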

Why does Proxomitron break this folder? I can't even convince the site admins it's a real problem, because they don't use Proxomitron (or know anyone who does, except me)... Can you at least tell me what you think the admins did to create such weird files?

Anyway, I've tried running these filters, but they don't match anything:
  1. Top Remove: Unicode BOM: HTML 6.12.09 (multi) [sj sd] (d.1)
  2. Top Remove: Unicode BOM: Other 7.10.28 (multi) [sj sd] (d.1)
  3. UTF-16 to UTF-8 Page Converter 7.01.06 (multi) [sj sd mona] (d.1)

Neither did these (used together with ASCII-Table.ptxt):
  1. <iframe>: Unicode to ASCII 7.01.06 (multi) [sd] (d.1 l.2)
  2. <iframe>: BASE16 to ASCII 8.11.21 (multi) [gz sd] (d.2 l.2)
  3. <a>: Unicode to ASCII 7.11.14 (multi) [sd] (d.1 l.3)
