[webkit-dev] HTML5 tokenizer landing soon

Oliver Hunt oliver at apple.com
Mon Jun 14 12:47:03 PDT 2010

We have historically not taken patches that "will certainly be faster" without evidence that they will be faster -- Adam already said this will regress performance, which makes me sad :-(


On Jun 14, 2010, at 12:11 PM, Eric Seidel wrote:

> The new parser will certainly be faster than the old, mostly because
> it's now hackable.  The old parser was un-touchable for fear of
> breaking the world.  This one is tested, perf-tested, documented and
> much better designed. May the optimizing begin!
> -eric
> On Mon, Jun 14, 2010 at 12:07 PM, Adam Barth <abarth at webkit.org> wrote:
>> On Mon, Jun 14, 2010 at 11:05 AM, Oliver Hunt <oliver at apple.com> wrote:
>>> Have you done perf testing?
>> Yes.  We've been working with our parsing benchmark:
>> http://trac.webkit.org/browser/trunk/WebCore/benchmarks/parser/html-parser.html
>>> What's the change?
>> Last time we measured, the new parser was ~1% slower than the old
>> parser.  I believe parsing accounts for <5% of PLT, so that
>> corresponds to a <0.05% slowdown on PLT, which is, AFAIK,
>> unmeasurable.  We'll double check perf before we switch over.
>> We think the new parser will end up being faster than the old parser.
>> We've done just enough performance optimization to remove perf as a
>> blocking issue for switching over.  There's a bunch more we can do.
>> For example, we're currently wasting a bunch of time converting
>> new-style tokens into old-style tokens to feed them to the old tree
>> constructor.  Once we start working on phase 2 (the HTML5 tree
>> constructor), we won't need to waste time there.
>> Adam
>>> On Jun 13, 2010, at 10:21 PM, Adam Barth wrote:
>>>> People of WebKit,
>>>> As mentioned recently on webkit-dev, Eric, Tonyg, and I have been
>>>> working on implementing the HTML5 parsing algorithm in WebKit:
>>>> http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg11472.html
>>>> We're now ready to turn the new tokenization algorithm on by default
>>>> (probably early this week).  The new code passes all the existing
>>>> LayoutTests, with the exception of roughly 40 tests that "expect"
>>>> behavior that violates the HTML5 specification [1].
>>>> There are some differences between the old parser and the HTML5
>>>> parser.  We've written up a brief document outlining those
>>>> differences:
>>>> https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en
>>>> If these differences cause real compatibility issues on the web, we
>>>> should contribute this information to the working group so we can
>>>> improve the specification.  If these differences cause compatibility
>>>> issues for WebKit-specific HTML (e.g., for Dashboard widgets), we
>>>> might need to add a flag to support some subset of these parsing
>>>> quirks for non-web uses of WebKit.
>>>> Please be on the lookout for parsing-related regressions and CC Eric,
>>>> Tonyg, and me on the bugs.  There's still a lot of work to do
>>>> (including implementing the tree construction algorithm), but turning
>>>> the tokenization code on by default is an important milestone for the
>>>> project.
>>>> Happy parsing,
>>>> Adam
>>>> [1] See https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en
>>>> for details.
>>>> _______________________________________________
>>>> webkit-dev mailing list
>>>> webkit-dev at lists.webkit.org
>>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
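
The estimate quoted above (a ~1% parser slowdown, with parsing accounting for <5% of PLT) can be checked with a quick calculation. This is only a sketch of that arithmetic: the function name and parameters are hypothetical, and the 5% figure is the upper bound given in the thread, so the result is an upper bound too.

```python
def page_load_slowdown(parser_slowdown: float, parser_share_of_plt: float) -> float:
    """Estimate the fractional page-load-time (PLT) slowdown caused by a
    slower parser, assuming everything outside parsing is unchanged.

    Both arguments are fractions (0.01 means 1%). The name and signature
    are illustrative only; they do not come from WebKit.
    """
    return parser_slowdown * parser_share_of_plt


# Figures from the thread: parser ~1% slower, parsing <5% of PLT.
estimate = page_load_slowdown(0.01, 0.05)
print(f"PLT slowdown upper bound: {estimate:.4%}")
```

With those inputs the product is 0.0005, i.e. 0.05% of page load time, which matches the figure in the quoted message.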
