[Webkit-unassigned] [Bug 14945] An ampersand ("&") appearing in a document is treated as a fatal error (instead of a non-fatal error)

bugzilla-daemon at webkit.org
Tue Aug 14 20:01:56 PDT 2007


http://bugs.webkit.org/show_bug.cgi?id=14945


robburns1 at mac.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |




------- Comment #9 from robburns1 at mac.com  2007-08-14 20:01 PDT -------
(In reply to comment #8)
> Section 2.1 gives this definition:
> 
> [Definition: A textual object is a well-formed XML document if:]
> 
> * Taken as a whole, it matches the production labeled document.
> * It meets all the well-formedness constraints given in this specification.
> * Each of the parsed entities which is referenced directly or indirectly within
> the document is well-formed.
> 
> I believe the production you pasted does not match the production labelled
> document. I say this because the problem is in an attribute, where the AttValue
> production would apply:
> 
> [9]  EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
>                    | "'" ([^%&'] | PEReference | Reference)* "'"
> [10] AttValue    ::= '"' ([^<&"] | Reference)* '"'
>                    | "'" ([^<&'] | Reference)* "'"
> 
> Note that & is not allowed in an attribute value except in a Reference. The
> production for Reference is:
> 
> [67] Reference ::= EntityRef | CharRef
> [68] EntityRef ::= '&' Name ';'
> [66] CharRef   ::= '&#' [0-9]+ ';' | '&#x' [0-9a-fA-F]+ ';'
> 
> A Reference can be an EntityRef or a CharRef. Either way, it must start with &
> and end with ;, and cannot contain an & in the middle. Thus, the attribute
> value here:
> 
> href='http://www.nytimes.com/2007/08/12/books/review/Hitchens-t.html?_r=2&adxnnl=1&ampladxnnlx=1186840914-OUgjhcnZejswml3KgknPNg&pagewanted=all'
> 
> does not match the AttValue production, and as a result the document as a
> whole does not match the document production, and thus it is not well-formed.
> Resolving as INVALID.
> 
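
For anyone who wants to check the quoted productions mechanically, here is a
minimal Python sketch. It assumes an ASCII-only simplification of the Name
production (the real one admits many more characters), and the function name
matches_att_value is made up for this illustration. It is not how WebKit or
any real parser does it.

```python
import re

# Assumption: ASCII-only approximation of the XML 1.0 Name production.
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"

# Reference ::= EntityRef | CharRef  (productions [67], [68], [66])
REFERENCE = rf"&(?:{NAME}|#[0-9]+|#x[0-9a-fA-F]+);"

# Body of a single-quoted AttValue: ([^<&'] | Reference)*  (production [10])
ATT_VALUE_BODY = re.compile(rf"(?:[^<&']|{REFERENCE})*")


def matches_att_value(body: str) -> bool:
    """Return True if `body` could sit between single quotes in an AttValue."""
    return ATT_VALUE_BODY.fullmatch(body) is not None


# The href value quoted in comment #8, copied as-is.
url = ("http://www.nytimes.com/2007/08/12/books/review/Hitchens-t.html"
       "?_r=2&adxnnl=1&ampladxnnlx=1186840914-OUgjhcnZejswml3KgknPNg"
       "&pagewanted=all")

print(matches_att_value(url))                        # False
print(matches_att_value(url.replace("&", "&amp;")))  # True
```

The first call fails at the first bare "&", since "&adxnnl" is followed by "="
rather than ";". Escaping every ampersand as "&amp;" makes the value match.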

I don't think the example unambiguously contains a reference (CharRef or
EntityRef) with an ampersand inside it. That is a call the parser has to make.
The WebKit parser may currently be geared toward assumptions that lead it in
that direction, and it may be difficult to parse the text any other way. Still,
it is a stretch to read the spec as requiring an XML processor to treat this
text in that particular way. In fact, I would say the spec gives XML processors
an easy way out: once another ampersand is reached, the parser can assume it's
no longer part of the previous reference.
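
To make the recovery I have in mind concrete, here is a rough Python sketch. It
is only an illustration of the idea, not WebKit code and not behaviour the spec
mandates. The function name recover_ampersands, the token shapes, and the
ASCII-only Name pattern are all assumptions of mine.

```python
import re

# Assumption: ASCII-only approximation of Name inside the Reference pattern.
REFERENCE = re.compile(
    r"&(?:[A-Za-z_:][A-Za-z0-9_:.\-]*|#[0-9]+|#x[0-9a-fA-F]+);"
)


def recover_ampersands(value: str):
    """Split `value` into ("text", s) and ("ref", s) tokens.

    An "&" that does not start a complete reference is kept as literal text,
    and its offset is recorded in `errors` so it can still be reported.
    """
    tokens, errors = [], []
    pos = 0
    text_start = 0
    while pos < len(value):
        if value[pos] == "&":
            match = REFERENCE.match(value, pos)
            if match:
                # Flush any pending literal text, then emit the reference.
                if text_start < pos:
                    tokens.append(("text", value[text_start:pos]))
                tokens.append(("ref", match.group()))
                pos = match.end()
                text_start = pos
                continue
            # Stray ampersand: note the error, keep it as literal text.
            errors.append(pos)
        pos += 1
    if text_start < len(value):
        tokens.append(("text", value[text_start:]))
    return tokens, errors


tokens, errors = recover_ampersands("_r=2&adxnnl=1&amp;pagewanted=all")
print(tokens)
print(errors)
```

Here the "&" before "adxnnl" is reported as an error at offset 4 but stays in
the text, while "&amp;" is still recognised as a reference.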

Specific character references are inherently a layer on top of the XML
processing. As long as the ampersand doesn't interfere with producing an
unambiguous DOM tree (and I don't think it does), there should be no reason
that the sequence following the ampersand cannot be compared to the known
character references.
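
As a sketch of what "compared to the known character references" could look
like, the Python below expands only sequences it recognises and leaves
everything else alone. The table holds the five entities predefined by XML 1.0.
The function name expand_known_references and the ASCII-only Name pattern are
assumptions for illustration.

```python
import re

# The five entities predefined by XML 1.0.
PREDEFINED = {"amp": "&", "lt": "<", "gt": ">", "apos": "'", "quot": '"'}

# Candidate reference: "&", then a hex/decimal CharRef body or a Name, then ";".
# Assumption: ASCII-only approximation of Name.
CANDIDATE = re.compile(
    r"&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z_:][A-Za-z0-9_:.\-]*);"
)


def expand_known_references(text: str) -> str:
    """Expand recognised references; leave unknown or malformed ones as-is."""

    def replace(match):
        body = match.group(1)
        try:
            if body.startswith("#x"):
                return chr(int(body[2:], 16))
            if body.startswith("#"):
                return chr(int(body[1:]))
        except (ValueError, OverflowError):
            # Code point out of range: keep the original text untouched.
            return match.group(0)
        # Named reference: expand only if it is one of the known entities.
        return PREDEFINED.get(body, match.group(0))

    return CANDIDATE.sub(replace, text)


print(expand_known_references("_r=2&adxnnl=1&amp;x&#38;y&#x26;z&nbsp;"))
# _r=2&adxnnl=1&x&y&z&nbsp;
```

The bare "&adxnnl=1" never reaches the lookup because it has no closing ";",
and "&nbsp;" is left alone because it is not in the table.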

Again, this is an opportunity to improve XML handling in WebKit. If this bug is
addressed, WebKit will no longer break on pages that other implementations
break on (separate bug reports should be filed with those implementations for
interoperability). WebKit could prominently display an error on the page, even
a big black bug graphic if you prefer. However, there's no reason WebKit cannot
recover from this error and display the remainder of the page.

I fail to see any down-side.

Reopening to allow further discussion.

