[Webkit-unassigned] [Bug 66185] Sniff UTF-8 instead of defaulting to WINDOWS-1252 (or other locale defaults)
bugzilla-daemon at webkit.org
Mon Aug 15 16:46:46 PDT 2011
https://bugs.webkit.org/show_bug.cgi?id=66185
--- Comment #10 from Leif Halvard Silli <xn--mlform-iua at xn--mlform-iua.no> 2011-08-15 16:46:46 PST ---
(In reply to comment #9)
> […] a good reason would be presenting real life Web pages that misrender or misbehave when UTF-8 sniffing is not performed.
(1) Pages without internal encoding info will misrender/misbehave when saved to the hard disk. Examples:
* http://store.apple.com/no (and http://store.apple.com/dk and http://store.apple.com/se)
* http://ntntv.gov.eg/
NOTES:
- When I save the Apple Store page as source code with Safari and reload the saved page, it works fine. But if I open the saved page in another WebKit browser, e.g. iCab, it no longer works (and vice versa: if I save with iCab and open in Safari, it doesn't work). So, clearly, Safari does something else *instead* of the UTF-8 detection algorithm. Maybe it is related to features of that page, or maybe Safari stores some metadata somewhere.
- In contrast, page number 2 (http://ntntv.gov.eg/) is misrendered as soon as it is saved to the hard disk and reloaded.
(2) HTML5's encoding sniffing algorithm lists 10 locales whose suggested default encoding is UTF-8. One would expect this to lead to a number of UTF-8 encoded pages that need to be sniffed when the user does not use one of those locales. Here are two examples:
a) http://iranlinkbox.ir
NOTES: The reason this Iranian page is misrendered in WebKit is that the <meta@charset> element in the DOM is located in the <body> rather than in the <head>. The reason Firefox nevertheless handles it is that it implements step 3 of the algorithm (which is also a MAY), where it searches for @charset in the first 1024 bytes: if one removes the <meta@charset> from the page, it fails in Firefox. But in Chrome, the encoding is detected even if the <meta@charset> is removed, so we can conclude that it is the UTF-8 detection that steps in.
b) http://www.galenika.rs/index.php?lang=RUS
NOTES: This page has badly malformed internal encoding info. Therefore it fails in both Safari and Firefox. But in Chrome, Opera and IE, it works.
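To make the fallback order discussed in (2) concrete, here is a minimal sketch in Python. It is not WebKit's, Firefox's or Chrome's actual code: the function names are hypothetical, the regular expression is a crude stand-in for HTML5's byte-level prescan, and the rule "valid UTF-8 with at least one non-ASCII byte" is only one plausible way to do the UTF-8 sniffing. It first looks for a declared charset in the first 1024 bytes (wherever the <meta> sits), then tries UTF-8 detection, and only then falls back to the locale default.

```python
import re

# All names below are hypothetical, chosen for illustration only.
PRESCAN_LIMIT = 1024  # number of leading bytes searched for a declared charset

# Simplified pattern: finds charset=... inside a <meta ...> tag.
# The real HTML5 prescan is a byte-level state machine and covers more cases.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def prescan_meta_charset(data: bytes):
    """Return the charset label declared in the first 1024 bytes, or None."""
    match = META_CHARSET.search(data[:PRESCAN_LIMIT])
    return match.group(1).decode("ascii").lower() if match else None


def looks_like_utf8(data: bytes) -> bool:
    """True if data is valid UTF-8 and contains at least one non-ASCII byte."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Pure ASCII decodes identically in both encodings, so it proves nothing.
    return any(b >= 0x80 for b in data)


def sniff_encoding(data: bytes, locale_default: str = "windows-1252") -> str:
    """Pick an encoding: declared charset, then UTF-8 sniffing, then default."""
    declared = prescan_meta_charset(data)
    if declared:
        return declared
    if looks_like_utf8(data):
        return "utf-8"
    return locale_default
```

With this sketch, a page like example a) is handled by the prescan as long as the <meta@charset> falls within the first 1024 bytes, and by the UTF-8 check once the <meta@charset> is removed. A page with no usable declaration and bytes that are not valid UTF-8 ends up with the locale default.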