[Webkit-unassigned] [Bug 35954] New: XHTML PIs (processing instructions) are treated like HTML PIs when used inside an XHTML DOCTYPE

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Mar 9 20:17:42 PST 2010


           Summary: XHTML PIs (processing instructions) are treated like
                    HTML PIs when used inside an XHTML DOCTYPE
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: All
        OS/Version: All
            Status: UNCONFIRMED
          Severity: Major
          Priority: P2
         Component: XML
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: xn--mlform-iua at xn--mlform-iua.no

First of all: This bug relates to the XML parsing of XHTML documents (not
text/html parsing!). However, this bug is also related to text/html issues,
which I explain along the way.

How to reproduce the bug:

(1) Add this DOCTYPE to an XHTML document. The internal DTD subset inside the
DOCTYPE applies a hack in the form of an XHTML processing instruction, to keep
text/html parsers from displaying a "]>" inside the body. The whole hack is
explained in an e-mail message to the W3C validator's mailing list:
This is the code:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
<?parser-hack ><!--?>

(2) If you wish, try to load the page as text/html. However, the point of this
bug is XML, so load the page as "application/xhtml+xml".

(3) Results in Firefox, Konqueror and Opera: works 100%

(4) Result in Webkit: "yellow screen of death" in the form of the following:
     "This page contains the following errors: error on line 3 at column 1:
Extra content at the end of the document"
      In short: Nothing is displayed.

(5) Remove the "<?parser-hack ><!--?>" and reload the page - voila, it works in
Webkit as well.

(6) Place the "<?parser-hack ><!--?>" inside the body of the XHTML page.
Reload. No problems.


Apparently, when a PI is placed inside the internal subset of an XHTML
DOCTYPE, Webkit parses the XHTML PI as if it were an HTML4 PI, meaning that
it thinks the PI ends at the first ">". And thus, Webkit also sees
the HTML comment "start tag", the "<!--".

In text/html mode, the point of this hack is exactly that the browser
thinks the PI ends with the ">" and that it also sees the "<!--".
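
To make the difference concrete, here is a minimal Python sketch of the two
termination rules applied to the hack string. The function names are made up
for this illustration, and this is not how any browser implements its parser;
it only shows where each rule considers the PI to end.

```python
# The processing instruction used in the hack.
PI = '<?parser-hack ><!--?>'

def pi_end_html(text, start=0):
    """HTML4/SGML-style rule: a PI opened by '<?' ends at the first '>'."""
    return text.index('>', start + 2) + 1

def pi_end_xml(text, start=0):
    """XML rule: a PI opened by '<?' ends only at the first '?>'."""
    return text.index('?>', start + 2) + 2

html_pi = PI[:pi_end_html(PI)]    # what a text/html parser takes as the PI
leftover = PI[len(html_pi):]      # what it sees next: a comment opener
xml_pi = PI[:pi_end_xml(PI)]      # what an XML parser takes as the PI

print(repr(html_pi), repr(leftover), repr(xml_pi))
```

Under the first rule the PI stops after "parser-hack >" and the rest is left
over as markup; under the second rule the whole string is one PI.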

However, this is XHTML/XML mode, and thus it should parse the DOCTYPE,
including PIs, according to XHTML/XML rules. Hence a ">" is permitted
inside the PI, and a "<!--" should not affect the parsing.
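
As a rough cross-check of the XML rules, the PI can be fed to Python's expat
binding as a stand-in for "some XML parser". The document below is an assumed
reconstruction, not the original test page: the system identifier, the
"[" ... "]>" brackets around the internal subset and the minimal html body are
filled in only to make the example complete. It says nothing about Webkit's
own parser; it only shows that the PI is accepted inside the internal subset.

```python
import xml.parsers.expat

# Assumed complete test document (only the PUBLIC identifier and the PI line
# are taken from the report; everything else is filled in for illustration).
DOC = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    ' "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd" [\n'
    '<?parser-hack ><!--?>\n'
    ']>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title>t</title></head><body><p>ok</p></body></html>'
)

def collect_pis(document):
    """Parse the document and return every PI as a (target, data) tuple."""
    pis = []
    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = (
        lambda target, data: pis.append((target, data))
    )
    # Raises xml.parsers.expat.ExpatError if the document is not well-formed.
    parser.Parse(document, True)
    return pis

pis = collect_pis(DOC)
print(pis)
```

If the parse completes without an ExpatError, the ">" and the "<!--" were
treated as plain PI data and not as markup.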

I tested in the latest Webkit nightly, version 4.0.4 (5531.21.10, r55610), and
also in iCab and in Safari for Mac (Intel and PPC) and for Windows.

