[webkit-dev] HTML5 tokenizer landing soon
eric at webkit.org
Mon Jun 14 12:11:50 PDT 2010
The new parser will certainly be faster than the old, mostly because
it's now hackable. The old parser was un-touchable for fear of
breaking the world. This one is tested, perf-tested, documented and
much better designed. May the optimizing begin!
On Mon, Jun 14, 2010 at 12:07 PM, Adam Barth <abarth at webkit.org> wrote:
> On Mon, Jun 14, 2010 at 11:05 AM, Oliver Hunt <oliver at apple.com> wrote:
>> Have you done perf testing?
> Yes. We've been working with our parsing benchmark:
>> What's the change?
> Last time we measured, the new parser was ~1% slower than the old
> parser. I believe parsing accounts for <5% of PLT, so that
> corresponds to a <0.05% slowdown on PLT, which is, AFAIK,
> unmeasurable. We'll double check perf before we switch over.
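The estimate above can be checked with a quick calculation. This is a minimal sketch that treats the two figures quoted in the message (a ~1% parser slowdown and parsing at <5% of page load time, PLT) as upper bounds. The function name is hypothetical and is not part of WebKit or its benchmarks.

```python
def plt_slowdown_bound(parser_slowdown: float, parsing_share_of_plt: float) -> float:
    """Upper bound on the fractional PLT slowdown.

    Assumes only the parsing portion of page load time gets slower,
    so the overall slowdown is the product of the two fractions.
    """
    return parser_slowdown * parsing_share_of_plt


# Figures quoted in the message: ~1% slower parser, parsing < 5% of PLT.
bound = plt_slowdown_bound(0.01, 0.05)
print(f"PLT slowdown bound: {bound:.4%}")
```

Because both inputs are upper bounds, the product is one as well, which matches the "<0.05%" figure in the message.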
> We think the new parser will end up being faster than the old parser.
> We've done just enough performance optimization to remove perf as a
> blocking issue for switching over. There's a bunch more we can do.
> For example, we're currently wasting a bunch of time converting
> new-style tokens into old-style tokens to feed them to the old tree
> constructor. Once we start working on phase 2 (the HTML5 tree
> constructor), we won't need to waste time there.
>> On Jun 13, 2010, at 10:21 PM, Adam Barth wrote:
>>> People of WebKit,
>>> As mentioned recently on webkit-dev, Eric, Tonyg, and I have been
>>> working on implementing the HTML5 parsing algorithm in WebKit:
>>> We're now ready to turn the new tokenization algorithm on by default
>>> (probably early this week). The new code passes all the existing
>>> LayoutTests, with the exception of roughly 40 tests that "expect"
>>> behavior that violates the HTML5 specification.
>>> There are some differences between the old parser and the HTML5
>>> parser. We've written up a brief document outlining those differences.
>>> If these differences cause real compatibility issues on the web, we
>>> should contribute this information to the working group so we can
>>> improve the specification. If these differences cause compatibility
>>> issues for WebKit-specific HTML (e.g., for Dashboard widgets), we
>>> might need to add a flag to support some subset of these parsing
>>> quirks for non-web uses of WebKit.
>>> Please be on the lookout for parsing-related regressions and CC Eric,
>>> Tonyg, and me on the bugs. There's still a lot of work to do
>>> (including implementing the tree construction algorithm), but turning
>>> the tokenization code on by default is an important milestone for the
>>> Happy parsing,
>>>  See https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en
>>> for details.
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org