[webkit-dev] HTML5 tokenizer landing soon

Adam Barth abarth at webkit.org
Sun Jun 13 22:21:33 PDT 2010


People of WebKit,

As mentioned recently on webkit-dev, Eric, Tonyg, and I have been
working on implementing the HTML5 parsing algorithm in WebKit:

http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg11472.html

We're now ready to turn the new tokenization algorithm on by default
(probably early this week).  The new code passes all the existing
LayoutTests, with the exception of roughly 40 tests that "expect"
behavior that violates the HTML5 specification [1].

There are some differences between the old parser and the HTML5
parser.  We've written up a brief document outlining those
differences:

https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en

If these differences cause real compatibility issues on the web, we
should contribute this information to the working group so we can
improve the specification.  If these differences cause compatibility
issues for WebKit-specific HTML (e.g., for Dashboard widgets), we
might need to add a flag to support some subset of these parsing
quirks for non-web uses of WebKit.

Please be on the lookout for parsing-related regressions and CC Eric,
Tonyg, and me on the bugs.  There's still a lot of work to do
(including implementing the tree construction algorithm), but turning
the tokenization code on by default is an important milestone for the
project.

Happy parsing,
Adam

[1] See https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en
for details.

