[webkit-dev] HTML5 tokenizer landing soon

Mon Jun 14 12:11:50 PDT 2010

The new parser will certainly be faster than the old, mostly because
it's now hackable.  The old parser was un-touchable for fear of
breaking the world.  This one is tested, perf-tested, documented and
much better designed. May the optimizing begin!

-eric

On Mon, Jun 14, 2010 at 12:07 PM, Adam Barth <abarth at webkit.org> wrote:
> On Mon, Jun 14, 2010 at 11:05 AM, Oliver Hunt <oliver at apple.com> wrote:
>> Have you done perf testing?
>
> Yes.  We've been working with our parsing benchmark:
>
> http://trac.webkit.org/browser/trunk/WebCore/benchmarks/parser/html-parser.html
>
>> What's the change?
>
> Last time we measured, the new parser was ~1% slower than the old
> parser.  I believe parsing accounts for <5% of PLT, so that
> corresponds to a <0.05% slowdown on PLT, which is, AFAIK,
> unmeasurable.  We'll double check perf before we switch over.
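The page-load estimate above is a bound in the style of Amdahl's law. Suppose parsing takes a fraction s of page load time (PLT) and becomes slower by a fraction d. If nothing else changes, total PLT grows by d × s. The Python sketch below only checks that arithmetic under that assumption. It uses the thread's figures (d ≈ 1%, s < 5%), and the function name `overall_slowdown` is made up for this illustration.

```python
def overall_slowdown(component_slowdown: float, component_share: float) -> float:
    """Relative growth in total time when a single component slows down.

    Assumes (hypothetically) that the component takes `component_share`
    of total time, becomes `component_slowdown` slower (as a fraction),
    and that all other work is unchanged, so total time grows by the product.
    """
    return component_slowdown * component_share


# Figures from the thread: parser ~1% slower, parsing <5% of PLT.
# Because 5% is an upper bound on parsing's share, the result is an upper bound too.
bound = overall_slowdown(0.01, 0.05)
print(f"{bound:.2%}")  # prints 0.05%
```

Any parsing share below 5% gives a regression below this 0.05% bound.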
>
> We think the new parser will end up being faster than the old parser.
> We've done just enough performance optimization to remove perf as a
> blocking issue for switching over.  There's a bunch more we can do.
> For example, we're currently wasting a bunch of time converting
> new-style tokens into old-style tokens to feed them to the old tree
> constructor.  Once we start working on phase 2 (the HTML5 tree
> constructor), we won't need to waste time there.
>
> Adam
>
>
>> On Jun 13, 2010, at 10:21 PM, Adam Barth wrote:
>>
>>> People of WebKit,
>>>
>>> As mentioned recently on webkit-dev, Eric, Tonyg, and I have been
>>> working on implementing the HTML5 parsing algorithm in WebKit:
>>>
>>> http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg11472.html
>>>
>>> We're now ready to turn the new tokenization algorithm on by default
>>> (probably early this week).  The new code passes all the existing
>>> LayoutTests, with the exception of roughly 40 tests that "expect"
>>> behavior that violates the HTML5 specification [1].
>>>
>>> There are some differences between the old parser and the HTML5
>>> parser.  We've written up a brief document outlining those
>>> differences:
>>>
>>> https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en
>>>
>>> If these differences cause real compatibility issues on the web, we
>>> should contribute this information to the working group so we can
>>> improve the specification.  If these differences cause compatibility
>>> issues for WebKit-specific HTML (e.g., for Dashboard widgets), we
>>> might need to add a flag to support some subset of these parsing
>>> quirks for non-web uses of WebKit.
>>>
>>> Please be on the lookout for parsing-related regressions and CC Eric,
>>> Tonyg, and me on the bugs.  There's still a lot of work to do
>>> (including implementing the tree construction algorithm), but turning
>>> the tokenization code on by default is an important milestone for the
>>> project.
>>>
>>> Happy parsing,
>>> Adam
>>>
>>> [1] See https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en
>>> for details.
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>>
>>