[webkit-dev] HTML5 tokenizer landing soon

Adam Barth abarth at webkit.org
Mon Jun 14 12:07:31 PDT 2010


On Mon, Jun 14, 2010 at 11:05 AM, Oliver Hunt <oliver at apple.com> wrote:
> Have you done perf testing?

Yes.  We've been working with our parsing benchmark:

http://trac.webkit.org/browser/trunk/WebCore/benchmarks/parser/html-parser.html

> What's the change?

Last time we measured, the new parser was ~1% slower than the old
parser.  I believe parsing accounts for <5% of PLT (page load time), so
that corresponds to a <0.05% slowdown on PLT, which is, AFAIK,
unmeasurable.  We'll double-check perf before we switch over.
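
As a rough sanity check on that arithmetic, here is a minimal Python
sketch.  The 1% and 5% inputs are the upper bounds quoted above; the
function name is hypothetical and only there for illustration:

```python
def page_load_slowdown(parser_slowdown, parser_share):
    """Upper bound on the relative PLT increase.

    parser_slowdown: relative slowdown of the parser (0.01 means 1%).
    parser_share: fraction of PLT spent parsing (0.05 means 5%).
    """
    # Only the parsing slice of the page load gets slower, so the
    # overall effect is the product of the two fractions.
    return parser_slowdown * parser_share


bound = page_load_slowdown(0.01, 0.05)
print(f"{bound:.2%}")  # prints 0.05%
```

Since parsing is believed to be less than 5% of PLT, the real effect
should sit below this bound.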

We think the new parser will end up being faster than the old parser.
We've done just enough performance optimization to remove perf as a
blocking issue for switching over.  There's a bunch more we can do.
For example, we're currently wasting a bunch of time converting
new-style tokens into old-style tokens to feed them to the old tree
constructor.  Once we start working on phase 2 (the HTML5 tree
constructor), we won't need to waste time there.

Adam


> On Jun 13, 2010, at 10:21 PM, Adam Barth wrote:
>
>> People of WebKit,
>>
>> As mentioned recently on webkit-dev, Eric, Tonyg, and I have been
>> working on implementing the HTML5 parsing algorithm in WebKit:
>>
>> http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg11472.html
>>
>> We're now ready to turn the new tokenization algorithm on by default
>> (probably early this week).  The new code passes all the existing
>> LayoutTests, with the exception of roughly 40 tests that "expect"
>> behavior that violates the HTML5 specification [1].
>>
>> There are some differences between the old parser and the HTML5
>> parser.  We've written up a brief document outlining those
>> differences:
>>
>> https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en
>>
>> If these differences cause real compatibility issues on the web, we
>> should contribute this information to the working group so we can
>> improve the specification.  If these differences cause compatibility
>> issues for WebKit-specific HTML (e.g., for Dashboard widgets), we
>> might need to add a flag to support some subset of these parsing
>> quirks for non-web uses of WebKit.
>>
>> Please be on the lookout for parsing-related regressions and CC Eric,
>> Tonyg, and me on the bugs.  There's still a lot of work to do
>> (including implementing the tree construction algorithm), but turning
>> the tokenization code on by default is an important milestone for the
>> project.
>>
>> Happy parsing,
>> Adam
>>
>> [1] See https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en
>> for details.
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev at lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>
>

