[webkit-dev] Update on HTML5 parser

Adam Barth abarth at webkit.org
Fri May 28 06:18:15 PDT 2010

Hi webkit-dev,

As some of you know, Eric and I have been working on implementing the
HTML5 parsing algorithm in WebKit.  This morning, at 5:58am, we
reached an important milestone: we got Gmail working.  :)

The HTML5 parsing algorithm actually consists of two algorithms: a
tokenizer, which converts a stream of characters into a stream of
tokens, and a parser, which converts a stream of tokens into a DOM.
In this first phase, we're focused on implementing the tokenizer and
wiring it up to the existing parser.  In the second phase, we'll
replace the existing parser with the HTML5 algorithm.  As we near the
end of phase 1, we'll evaluate whether it makes sense to enable the
HTML5 tokenizer by default or whether we should wait until the end of
phase 2 before flipping the switch.
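
For anyone who wants a concrete picture of the split described above,
here is a minimal sketch in Python.  It is a toy, not the HTML5
algorithm and not WebKit code: the names Token, tokenize, and
build_tree are made up for illustration, and it ignores attributes,
comments, entities, and all of the spec's error handling.  It only
shows the shape of the pipeline: characters in, tokens out, then
tokens in, tree out.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Token:
    # Hypothetical token type; the real spec defines more kinds.
    kind: str  # "StartTag", "EndTag", or "Character"
    data: str  # tag name (lowercased) or a run of text


def tokenize(chars: str) -> Iterator[Token]:
    """Toy state machine: turns a stream of characters into tokens."""
    state = "data"
    buf: List[str] = []
    is_end = False
    for ch in chars:
        if state == "data":
            if ch == "<":
                # Flush any pending text before starting a tag.
                if buf:
                    yield Token("Character", "".join(buf))
                    buf = []
                state = "tag_name"
                is_end = False
            else:
                buf.append(ch)
        else:  # state == "tag_name"
            if ch == "/" and not buf:
                is_end = True
            elif ch == ">":
                yield Token("EndTag" if is_end else "StartTag", "".join(buf))
                buf = []
                state = "data"
            else:
                buf.append(ch.lower())
    # Emit trailing text; an unterminated tag is silently dropped here.
    if state == "data" and buf:
        yield Token("Character", "".join(buf))


def build_tree(tokens) -> dict:
    """Toy tree builder: turns a stream of tokens into nested dicts."""
    root = {"name": "#document", "children": []}
    stack = [root]  # stack of open elements
    for tok in tokens:
        if tok.kind == "StartTag":
            node = {"name": tok.data, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif tok.kind == "EndTag":
            # Close the innermost open element, never the root.
            if len(stack) > 1:
                stack.pop()
        else:
            stack[-1]["children"].append(tok.data)
    return root


tokens = list(tokenize("<p>Hi</P>"))
tree = build_tree(tokens)
```

Because the two stages only talk to each other through the token
stream, either one can be swapped out independently, which is what
makes it possible to pair a new tokenizer with the existing parser in
phase 1.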

If you're interested in keeping track of our progress, you can add
yourself to the CC list of
<https://bugs.webkit.org/show_bug.cgi?id=39259>.  If you want to play
with the parser, you can run the LayoutTests with the HTML5 parser
using the --html5-parser command line flag:

./WebKitTools/Scripts/run-webkit-tests --html5-parser

Currently, there are a lot of failures and some crashes, but we're
working on driving that list to zero.  If you're interested in helping
out, a good starting point might be to fix a crashing LayoutTest.  If
you want to run Safari with the new parsing algorithm, you can turn it
on using the following run-time setting:

defaults write com.apple.Safari WebKitHTML5ParserEnabled -bool YES

Gmail probably won't work with top-of-tree for a day or so because we
still have a couple of the necessary patches out for review.  (If you
want the tried-and-true parser back, run the same command but replace
YES with NO.)

