[webkit-dev] HTML5 tokenizer landing soon

Adam Barth abarth at webkit.org
Mon Jun 14 18:36:00 PDT 2010

On Mon, Jun 14, 2010 at 12:47 PM, Oliver Hunt <oliver at apple.com> wrote:
> We have historically not taken patches that "will certainly be faster" without evidence that they will be faster -- Adam already said this will regress performance, which makes me sad :-(

Oliver, I certainly don't want you to be sad.  We spent a little more
time on performance in
<https://bugs.webkit.org/show_bug.cgi?id=40592>.  I've now measured
the top-of-tree performance more carefully, and it looks like the
HTML5 parser is roughly a 5% speedup on the parsing benchmark [1].

== HTML5 ==

avg 2660.75
stdev 20.699939613438488

avg 2658.7
stdev 20.717384004743458

== Legacy ==

avg 2806.5
stdev 19.883410170290208

avg 2806.25
stdev 18.552290963651902

I suspect there's more performance headroom, but I think we should
focus more on correctness in this phase of the project.


[1] These numbers are from two runs of 10 iterations for each parser.
The performance numbers are more repeatable than the stdev would make
it seem because the runtime drifts upwards during the 10 iterations
(for reasons I don't quite understand).
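
As a rough cross-check of the arithmetic, the sketch below recomputes
the relative speedup from the four averages reported above, and the
upper bound on page-load-time (PLT) impact from the earlier estimate
quoted below (~1% parser slowdown, parsing <5% of PLT).  It assumes
the averages are runtimes where lower is better; the function and
variable names are illustrative only and are not part of the
benchmark.

```python
# Sketch only: recompute the figures quoted in this thread.
# Assumption: the "avg" values are runtimes, so lower is better.

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def relative_speedup(new_runtimes, old_runtimes):
    """Fraction by which the new mean runtime is below the old one."""
    new_avg = mean(new_runtimes)
    old_avg = mean(old_runtimes)
    return (old_avg - new_avg) / old_avg

def plt_impact_bound(parser_slowdown, parsing_share_of_plt):
    """Upper bound on PLT change if only parsing time changes."""
    return parser_slowdown * parsing_share_of_plt

# Per-run averages copied from the message above (two runs per parser).
html5_avgs = [2660.75, 2658.7]
legacy_avgs = [2806.5, 2806.25]

speedup = relative_speedup(html5_avgs, legacy_avgs)
print(f"HTML5 vs. legacy speedup: {speedup:.1%}")  # about 5.2%

# Earlier estimate: ~1% slower parser, parsing under 5% of PLT.
bound = plt_impact_bound(0.01, 0.05)
print(f"PLT slowdown bound: {bound:.2%}")  # 0.05%
```

This treats each run's average with equal weight, which seems
reasonable here since both runs used 10 iterations.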

> On Jun 14, 2010, at 12:11 PM, Eric Seidel wrote:
>> The new parser will certainly be faster than the old, mostly because
>> it's now hackable.  The old parser was un-touchable for fear of
>> breaking the world.  This one is tested, perf-tested, documented and
>> much better designed. May the optimizing begin!
>> -eric
>> On Mon, Jun 14, 2010 at 12:07 PM, Adam Barth <abarth at webkit.org> wrote:
>>> On Mon, Jun 14, 2010 at 11:05 AM, Oliver Hunt <oliver at apple.com> wrote:
>>>> Have you done perf testing?
>>> Yes.  We've been working with our parsing benchmark:
>>> http://trac.webkit.org/browser/trunk/WebCore/benchmarks/parser/html-parser.html
>>>> What's the change?
>>> Last time we measured, the new parser was ~1% slower than the old
>>> parser.  I believe parsing accounts for <5% of PLT, so that
>>> corresponds to a <0.05% slowdown on PLT, which is, AFAIK,
>>> unmeasurable.  We'll double check perf before we switch over.
>>> We think the new parser will end up being faster than the old parser.
>>> We've done just enough performance optimization to remove perf as a
>>> blocking issue for switching over.  There's a bunch more we can do.
>>> For example, we're currently wasting a bunch of time converting
>>> new-style tokens into old-style tokens to feed them to the old tree
>>> constructor.  Once we start working on phase 2 (the HTML5 tree
>>> constructor), we won't need to waste time there.
>>> Adam
>>>> On Jun 13, 2010, at 10:21 PM, Adam Barth wrote:
>>>>> People of WebKit,
>>>>> As mentioned recently on webkit-dev, Eric, Tonyg, and I have been
>>>>> working on implementing the HTML5 parsing algorithm in WebKit:
>>>>> http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg11472.html
>>>>> We're now ready to turn the new tokenization algorithm on by default
>>>>> (probably early this week).  The new code passes all the existing
>>>>> LayoutTests, with the exception of roughly 40 tests that "expect"
>>>>> behavior that violates the HTML5 specification [1].
>>>>> There are some differences between the old parser and the HTML5
>>>>> parser.  We've written up a brief document outlining those
>>>>> differences:
>>>>> https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en
>>>>> If these differences cause real compatibility issues on the web, we
>>>>> should contribute this information to the working group so we can
>>>>> improve the specification.  If these differences cause compatibility
>>>>> issues for WebKit-specific HTML (e.g., for Dashboard widgets), we
>>>>> might need to add a flag to support some subset of these parsing
>>>>> quirks for non-web uses of WebKit.
>>>>> Please be on the lookout for parsing-related regressions and CC Eric,
>>>>> Tonyg, and me on the bugs.  There's still a lot of work to do
>>>>> (including implementing the tree construction algorithm), but turning
>>>>> the tokenization code on by default is an important milestone for the
>>>>> project.
>>>>> Happy parsing,
>>>>> Adam
>>>>> [1] See https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en
>>>>> for details.
>>>>> _______________________________________________
>>>>> webkit-dev mailing list
>>>>> webkit-dev at lists.webkit.org
>>>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
