[Webkit-unassigned] [Bug 66056] The XML parser doesn't ignore user's encoding choice for XML files

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Sat Aug 13 03:16:58 PDT 2011


https://bugs.webkit.org/show_bug.cgi?id=66056





--- Comment #11 from Leif Halvard Silli <xn--mlform-iua at xn--mlform-iua.no>  2011-08-13 03:16:58 PST ---
Alexey, in bug 66084 and bug 66085 you said:

]]
Comment #1 From Alexey Proskuryakov 2011-08-12 21:44:00 PST (-) [reply]
A BOM is most authoritative indication of encoding, because there are few ways to get it wrong. It's much easier to get an encoding declaration or an HTTP header wrong.

There are some synthetic examples of strings in other encodings that can be mistaken for a BOM, but it hasn't been a practical issue.
[[

Don't you see that the same argument holds for XML files when it comes to the user's manual text encoding choice?

The user's encoding choice is much more likely to be incorrect than the encoding specified by the file itself. Thus you are helping the user if you ignore his or her choice.

This is so because of XML's strict encoding rules, including the FATAL ERROR rules, which for the most part are well understood and supported by the tools and editors that produce XML files.

Even when the HTTP Content-Type: header specifies an encoding for an application/xhtml+xml file, the HTTP header is more likely to be correct than the user's encoding choice. This is so, once again, because it is a FATAL ERROR if the HTTP header specifies something which is incompatible with the file's real encoding. [*]

[*] EXCEPTION: Unfortunately, or fortunately, it is mostly only when a (UTF-8 encoded) file includes a BOM that it is possible to detect that the HTTP header specifies an incorrect encoding.
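
To illustrate the exception above, here is a rough Python sketch of the kind of check I have in mind. It is not WebKit code: the function name http_charset_conflicts_with_utf8_bom is made up for this example, and it only looks at the UTF-8 BOM, since that is the case discussed here.

```python
import codecs


def http_charset_conflicts_with_utf8_bom(data: bytes, http_charset: str) -> bool:
    """Return True when the document starts with a UTF-8 BOM but the HTTP
    charset names some other encoding.

    Hypothetical helper for illustration only; a real parser would also
    have to consider the UTF-16 BOMs and the XML encoding declaration.
    """
    if not data.startswith(codecs.BOM_UTF8):
        # No UTF-8 BOM: this check has nothing to compare against.
        return False
    try:
        # codecs.lookup() normalises aliases such as "UTF8" and "utf_8".
        normalised = codecs.lookup(http_charset).name
    except LookupError:
        # An unknown charset label cannot match the BOM (assumption).
        return True
    return normalised != "utf-8"


# A UTF-8 file with a BOM, served with a wrong HTTP charset: detectable.
with_bom = codecs.BOM_UTF8 + '<a>\u00e9</a>'.encode("utf-8")
print(http_charset_conflicts_with_utf8_bom(with_bom, "iso-8859-1"))

# The same bytes without a BOM decode "successfully" as ISO-8859-1,
# so the wrong HTTP charset goes unnoticed and only yields mojibake.
without_bom = '<a>\u00e9</a>'.encode("utf-8")
print(without_bom.decode("iso-8859-1"))
```

The second half shows why the BOM matters: every byte sequence is valid ISO-8859-1, so without a BOM the decoder has no grounds for reporting an error.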



