[Webkit-unassigned] [Bug 66056] New: The XML parser doesn't ignore user's encoding choice for XML files

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Thu Aug 11 07:16:52 PDT 2011


           Summary: The XML parser doesn't ignore user's encoding choice
                    for XML files
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: All
               URL: http://malform.no/testing/html5/bom/xml_BOM-less.html
        OS/Version: All
            Status: UNCONFIRMED
          Severity: Major
          Priority: P2
         Component: XML
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: xn--mlform-iua at xn--mlform-iua.no


   WebKit fails to *ignore* the user's choice of encoding for XML files.


   According to section 4.3.3 of the XML 1.0 spec, it is a FATAL ERROR if the page is in an encoding other than the declared (explicit or implicit/default) encoding:

   In the absence of information provided by an external transport protocol 
   (e.g. HTTP or MIME), it is a fatal error for an entity including an encoding
   declaration to be presented to the XML processor in an encoding other 
   than that named in the declaration, or for an entity which begins with 
   neither a Byte Order Mark nor an encoding declaration to use an encoding
   other than UTF-8.

THUS: It ought to be impossible (i.e. a "FATAL ERROR") to interpret the page in any encoding other than the declared (explicit or implicit/default) encoding.
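
For illustration only (this is not WebKit code), here is a minimal Python sketch of the encoding selection that the rule above implies. The function name effective_xml_encoding and its parameters are hypothetical. Giving the HTTP charset precedence over in-document information is my reading of the "external transport protocol" clause, and the declaration check is simplified to ASCII-compatible encodings. The point is that the user's menu choice is never consulted.

```python
import re

# Simplified match for an XML declaration in an ASCII-compatible encoding.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def effective_xml_encoding(body, http_charset=None, user_choice=None):
    """Return the encoding an XML processor should assume for `body`.

    `user_choice` (the encoding picked in the browser menu) is accepted
    only to make explicit that it is deliberately ignored.
    """
    if http_charset:                      # external transport information
        return http_charset
    if body.startswith(b"\xef\xbb\xbf"):  # UTF-8 Byte Order Mark
        return "UTF-8"
    if body.startswith((b"\xff\xfe", b"\xfe\xff")):  # UTF-16 BOMs
        return "UTF-16"
    match = _DECL.match(body)             # internal encoding declaration
    if match:
        return match.group(1).decode("ascii")
    return "UTF-8"                        # implicit default

declared = b'<?xml version="1.0" encoding="KOI8-R" ?><html/>'
print(effective_xml_encoding(declared, user_choice="macintosh"))
```

With this logic, both test pages resolve to KOI8-R regardless of what is selected in the encoding menu.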

However, WebKit does not behave that way.


 -- variant 1 --

1. In a WebKit-based browser (including the nightly build), open the "Text Encodings" submenu of the "View" menu and select "Western (Macintosh)". NOTE: For the current window or tab, this step changes the default encoding from "Default/Automatic" to the encoding that you selected.

2. Now, within the same window or tab, visit one of these XHTML (application/xhtml+xml) pages:
2.1. http://malform.no/testing/html5/bom/cyrillic-encoding-declaration
2.2. http://malform.no/testing/html5/bom/cyrillic-http-charset
Page 2.1. includes an internal XML encoding declaration: <?xml version="1.0" encoding="KOI8-R" ?>
Page 2.2. is served with charset=KOI8-R in the HTTP Content-Type header

 -- variant 2 -- (the opposite way)

1. With the encoding set to "Default/Automatic", visit one of these XHTML (application/xhtml+xml) pages:
1.1. http://malform.no/testing/html5/bom/cyrillic-encoding-declaration
1.2. http://malform.no/testing/html5/bom/cyrillic-http-charset

2. Now, manually choose the encoding "Western (Macintosh)" from the encoding menu 

EXPECTED RESULTS - FOR BOTH VARIANTS:  WebKit should ignore the fact that the user changed the default encoding to "Western (Macintosh)" and instead, in accordance with section 4.3.3 of XML 1.0, assume that the page is in the declared encoding.

ACTUAL RESULTS:  WebKit instead honours the user's choice of default encoding (i.e. it renders the page as "Western (Macintosh)"). It also does so without reporting a fatal error.
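
To make the symptom concrete, the following Python sketch shows what happens when KOI8-R bytes are decoded as Mac Roman, which is what I assume "Western (Macintosh)" corresponds to. The sample string is made up for illustration and is not taken from the test pages.

```python
# Hypothetical sample text; any Cyrillic string behaves the same way.
text = "Привет"

# The bytes as a KOI8-R page would deliver them.
koi8_bytes = text.encode("koi8_r")

# Decoding with the wrong (user-selected) encoding raises no error,
# it just silently produces unrelated Western characters.
garbled = koi8_bytes.decode("mac_roman")
print(garbled)

# The original bytes are still intact, so decoding them with the
# declared encoding recovers the text.
restored = garbled.encode("mac_roman").decode("koi8_r")
print(restored)
```

Because every byte value is a valid Mac Roman character, the wrong decoding can never fail on its own, which is why the override produces garbage instead of an error.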


[OTHER PARSERS:] Firefox does not have this bug. Opera *does* have a similar bug. I don't know whether IE9 has this bug. I don't think XML parsers in general (e.g. libxml2) have this bug.
