[Webkit-unassigned] [Bug 66185] Sniff UTF-8 instead of defaulting to WINDOWS-1252 (or other locale defaults)

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Aug 15 16:46:46 PDT 2011


https://bugs.webkit.org/show_bug.cgi?id=66185





--- Comment #10 from Leif Halvard Silli <xn--mlform-iua at xn--mlform-iua.no>  2011-08-15 16:46:46 PST ---
(In reply to comment #9)

> […] a good reason would be presenting real life Web pages that misrender or misbehave when UTF-8 sniffing is not performed.

(1) Pages without internal encoding info will misrender/misbehave when saved to the hard disk. Examples:
    * http://store.apple.com/no (and http://store.apple.com/dk and http://store.apple.com/se)
    * http://ntntv.gov.eg/
      NOTES: 
      - When I save the Apple Store page as source code with Safari and reload the saved page, it works fine. But if I open the saved page in another WebKit browser, e.g. iCab, it no longer works (and vice versa: if I save with iCab and open in Safari, it doesn't work). So, clearly, Safari does something else *instead* of the UTF-8 detection algorithm. Maybe it is related to features of that page, or maybe Safari stores some metadata somewhere.
      - In contrast, page number 2 (http://ntntv.gov.eg/) is misrendered as soon as it is saved to the hard disk and reloaded.
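
To make concrete what "UTF-8 sniffing" of an unlabeled, locally saved page could mean, here is a minimal Python sketch. It is an illustration under stated assumptions, not WebKit's or any other browser's actual code: the function name sniff_encoding and the windows-1252 fallback parameter are hypothetical, and a real implementation would probably inspect only a prefix of the file and would have to cope with a multi-byte sequence cut off at the end of that prefix.

```python
def sniff_encoding(data: bytes, fallback: str = "windows-1252") -> str:
    """Guess the encoding of a page that carries no encoding label.

    Hypothetical helper for illustration only; not taken from any browser.
    """
    # A UTF-8 byte order mark is treated as decisive.
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Not well-formed UTF-8, so use the locale default.
        return fallback
    # Pure ASCII decodes identically either way, so it proves nothing:
    # keep the locale default in that case.
    return fallback if text.isascii() else "utf-8"
```

Non-ASCII text stored in a legacy single-byte encoding is rarely well-formed UTF-8 by accident, which is why a successful strict decode of non-ASCII bytes is a reasonably strong signal.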

(2) HTML5's encoding sniffing algorithm lists 10 locales whose suggested default encoding is UTF-8. One would think that this leads to several UTF-8 encoded pages that need to be sniffed when the user does not use one of those locales. Here are two examples:

    a)  http://iranlinkbox.ir

         NOTES: The reason why that Iranian page is misrendered in WebKit is that the <meta@charset> element in the DOM is located in the <body> rather than in the <head>. Firefox nevertheless handles it because it implements step 3 of the algorithm (which is also a MAY), where it searches for @charset in the first 1024 bytes: if one removes the <meta@charset> from the page, it fails in Firefox. But in Chrome, the encoding is detected even if the <meta@charset> is removed, so we can conclude that it is the UTF-8 detection that steps in.

    b) http://www.galenika.rs/index.php?lang=RUS

        NOTES: This page has very malformed internal encoding info. Therefore it fails in both Safari and Firefox. But in Chrome, Opera and IE, it works.
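
The byte-level search for a charset declaration mentioned in the notes for example a) can be sketched roughly as follows. This is a simplified Python illustration, not Firefox's implementation and not the exact HTML5 prescan (which is a byte-level state machine, not a regular expression); the name prescan_charset and the pattern are assumptions made for the example.

```python
import re

# Simplified stand-in for the prescan: look for "charset=..." inside a
# <meta ...> tag. The pattern is an assumption made for this sketch.
_META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?([A-Za-z0-9_:.-]+)",
    re.IGNORECASE,
)


def prescan_charset(data: bytes, limit: int = 1024):
    """Return the declared charset found in the first `limit` bytes, or None.

    Hypothetical helper for illustration only.
    """
    match = _META_CHARSET.search(data[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

Because the search works on raw bytes rather than on the parsed DOM, a declaration that ends up inside <body> is still found as long as it falls within the first 1024 bytes. A page with no usable declaration returns None, which is the case where UTF-8 detection would have to step in.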


