[Webkit-unassigned] [Bug 66185] Sniff UTF-8 instead of defaulting to WINDOWS-1252 (or other locale defaults)

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Aug 15 02:35:51 PDT 2011


https://bugs.webkit.org/show_bug.cgi?id=66185


Leif Halvard Silli <xn--mlform-iua at xn--mlform-iua.no> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|WONTFIX                     |




--- Comment #7 from Leif Halvard Silli <xn--mlform-iua at xn--mlform-iua.no>  2011-08-15 02:35:51 PST ---
(In reply to comment #6)
> Any change should benefit someone. That someone could be users, Web page developers, or browser developers (in order of decreasing importance).

+1

> Implementing UTF-8 sniffing in WebKit will not benefit users, because there are no known pages that we display incorrectly because of this.

- Do you assume that authors always test their pages in WebKit?

- Do you question one of the most important principles in the design of HTML5 (namely, that UAs should behave the same way, since authors often test in a single browser)?

- Is it only a matter of "display"? What if the UTF-8 page does not display any non-ASCII _letters_ but, for instance, contains directly typed no-break space characters? This is enough for Chrome to sniff it as UTF-8. Chrome and IE will then send the form UTF-8 encoded, while WebKit will use Windows-1252.

- Furthermore, as WebKit seems to reuse its HTML parsing code as much as possible in its XML parser, implementing UTF-8 detection could perhaps also improve the current (not so perfect) handling of UTF-8 in XML pages.
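
To illustrate the form-submission point: a minimal Python sketch (not WebKit or Chrome code) of how the same no-break space (U+00A0) ends up as different bytes depending on which encoding the browser has settled on for the page. The field name `q` and the helper `encode_field` are hypothetical, and real form encoding has more rules than this (spaces, unencodable characters).

```python
from urllib.parse import quote


def encode_field(name: str, value: str, page_encoding: str) -> str:
    """Percent-encode one form field using the page's encoding (simplified)."""
    encoded_name = quote(name, safe="", encoding=page_encoding)
    encoded_value = quote(value, safe="", encoding=page_encoding)
    return f"{encoded_name}={encoded_value}"


# A value with a directly typed no-break space between "10" and "km".
value = "10\u00a0km"

as_utf8 = encode_field("q", value, "utf-8")     # page sniffed as UTF-8
as_cp1252 = encode_field("q", value, "cp1252")  # page defaulted to Windows-1252

print(as_utf8)    # q=10%C2%A0km
print(as_cp1252)  # q=10%A0km
```

A server that expects UTF-8 will fail to decode the single byte `%A0`, even though the page itself looked fine in both browsers.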

> But no sniffing algorithm is perfect, so there is risk of false positive detection, and some real life pages may get broken.

This sounds more like FUD than a real argument. (But I hope someone who can explain UTF-8 detection better than I can, can step in.)
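
For reference, a rough sketch of what such detection can look like, assuming the simplest possible rule: if the bytes contain non-ASCII and decode strictly as UTF-8, call it UTF-8, otherwise keep the default. This is an illustration only, not the algorithm Chrome or IE uses; the function name `sniff_encoding` and the `cp1252` fallback are assumptions. Real sniffers typically look at only a prefix of the document and must cope with a multi-byte sequence cut off at the end, which this sketch ignores.

```python
def sniff_encoding(data: bytes, fallback: str = "cp1252") -> str:
    """Return "utf-8" if data has non-ASCII bytes and is valid UTF-8, else fallback."""
    if all(byte < 0x80 for byte in data):
        # Pure ASCII: nothing to detect, so keep the default.
        return fallback
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        # At least one byte sequence is not well-formed UTF-8.
        return fallback
    return "utf-8"


print(sniff_encoding("caf\u00e9".encode("utf-8")))   # utf-8
print(sniff_encoding("caf\u00e9".encode("cp1252")))  # cp1252
print(sniff_encoding("\u00a0".encode("utf-8")))      # utf-8
```

The second call shows why false positives need unusual input: a lone Windows-1252 byte such as 0xE9 is not well-formed UTF-8. A Windows-1252 page would have to contain sequences like 0xC3 0xA9 (displayed as "Ã©") throughout to be misdetected.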

> It will not benefit Web developers, because it would make WebKit behavior less predictable. For best compatibility, they will still need to declare charset explicitly, and when they forget to, they risk that WebKit or some other browser won't detect charset. Note that different engines will implement sniffing differently, increasing the burden on Web developers.

This does not sound convincing. At least 2 major UAs (Chrome/IE) *do* perform detection. WebKit would become the 3rd, which in itself would be an argument in favour of also implementing UTF-8 detection in the fourth (Firefox). Etc. I fail to see that this would be bad for developers.

As for different implementations: if it becomes an issue, then this, too, can be standardized in a spec.

Furthermore, Chrome has already implemented it, so you may have access to an open source implementation that you, the developers, can reuse.

> It will not benefit browser developers. For us, it's just more code, with its own bugs, including possible security ones. Widely implementing a useless MAY-level feature will also mean that authors will start relying on it (intentionally or not), which further increases the barrier of entry for new browsers, hurting competition, and eventually end users, too.

Here you admit what I spoke about above: Authors might test a page in Chrome only - or in IE only.

I fail to see that it is worse to rely on UTF-8 detection than it is to rely on the Windows-1252 default. (On the contrary: it is better to rely on UTF-8, due to its many benefits.)

> Without strong evidence of end users getting incorrectly decoded pages because of this, implementing UTF-8 sniffing in WebKit will be a clear loss for every group listed above.

If this - "clear loss for every group listed above" - is how you see it, then you should perhaps file a bug against HTML5, to test your arguments?

Meanwhile, I have answered all your claims. I hope that someone who can argue more convincingly in favour of UTF-8 detection will also comment on your arguments.



