[Webkit-unassigned] [Bug 35831] WebCore PreloadScanner Entity Detection Bug - Non-HTML Entities are being treated as entities

bugzilla-daemon at webkit.org
Sun Mar 7 12:03:20 PST 2010


Antti Koivisto <koivisto at iki.fi> changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|                            |INVALID

--- Comment #2 from Antti Koivisto <koivisto at iki.fi>  2010-03-07 12:03:20 PST ---
PreloadScanner, unlike the main tokenizer at present, implements HTML5
entity parsing. The spec says (10.2.4):

"Consume the maximum number of characters possible, with the consumed
characters matching one of the identifiers in the first column of the named
character references table (in a case-sensitive manner).

If no match can be made, then this is a parse error. No characters are
consumed, and nothing is returned.

If the last character matched is not a U+003B SEMICOLON character (;), there is
a parse error.

If the character reference is being consumed as part of an attribute, and the
last character matched is not a U+003B SEMICOLON character (;), and the next
character is in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9),
U+0041 LATIN CAPITAL LETTER A to U+005A LATIN CAPITAL LETTER Z, or U+0061 LATIN
SMALL LETTER A to U+007A LATIN SMALL LETTER Z, then, for historical reasons,
all the characters that were matched after the U+0026 AMPERSAND character (&)
must be unconsumed, and nothing is returned.

Otherwise, return a character token for the character corresponding to the
character reference name (as given by the second column of the named character
references table)."

Basically, if a named entity in an attribute lacks the closing ; and is
followed by a non-alphanumeric character, it is a parse error but the entity
is still returned.
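
To make the quoted rules concrete, here is a minimal Python sketch of them. It
is not the PreloadScanner code. The function name, the return shape, and the
tiny NAMED_REFERENCES table are assumptions made for illustration only; the
real table in the spec is far larger.

```python
# Illustrative subset of the named character references table (an assumption
# for this sketch). Legacy names appear both with and without the ";".
NAMED_REFERENCES = {
    "amp;": "&", "amp": "&",
    "lt;": "<", "lt": "<",
    "copy;": "\u00a9", "copy": "\u00a9",
    "not;": "\u00ac", "not": "\u00ac",
    "notin;": "\u2209",
}

MAX_NAME_LENGTH = max(len(name) for name in NAMED_REFERENCES)


def consume_named_reference(text, pos, in_attribute):
    """Try to consume a named reference whose name starts at text[pos],
    i.e. just after the '&'.

    Returns (character, new_pos, parse_error). character is None when
    nothing is consumed, and new_pos == pos in that case.
    """
    # "Consume the maximum number of characters possible": try longest first.
    for length in range(min(MAX_NAME_LENGTH, len(text) - pos), 0, -1):
        candidate = text[pos:pos + length]
        if candidate in NAMED_REFERENCES:
            break
    else:
        # No match: parse error, nothing consumed, nothing returned.
        return None, pos, True

    end = pos + length
    if candidate.endswith(";"):
        return NAMED_REFERENCES[candidate], end, False

    # Missing ";" is always a parse error from here on.
    next_char = text[end:end + 1]
    if in_attribute and next_char.isascii() and next_char.isalnum():
        # Historical exception: unconsume everything matched after '&'.
        return None, pos, True
    return NAMED_REFERENCES[candidate], end, True
```

Under these assumptions, an attribute value containing "&copy=2" yields the
copyright sign (with a parse error), because "=" is not alphanumeric, while
"&copy2" in an attribute is left alone.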

As far as I can see, the implementation matches the spec. If you think HTML5 is
wrong here, you should send mail to the whatwg list and explain why.
