[webkit-dev] HTML5 tokenizer landing soon

Mon Jun 14 12:58:45 PDT 2010

----------------------------------------
> From: abarth at webkit.org
> Date: Mon, 14 Jun 2010 12:07:31 -0700
> To: oliver at apple.com
> CC: webkit-dev at lists.webkit.org
> Subject: Re: [webkit-dev] HTML5 tokenizer landing soon
>
> On Mon, Jun 14, 2010 at 11:05 AM, Oliver Hunt wrote:
>> Have you done perf testing?
>
> Yes. We've been working with our parsing benchmark:
>
> http://trac.webkit.org/browser/trunk/WebCore/benchmarks/parser/html-parser.html
>
>> What's the change?
>
> Last time we measured, the new parser was ~1% slower than the old
> parser. I believe parsing accounts for <5% of PLT, so that
> corresponds to a <0.05% slowdown on PLT, which is, AFAIK,
> unmeasurable. We'll double check perf before we switch over.
>
> We think the new parser will end up being faster than the old parser.
> We've done just enough performance optimization to remove perf as a
> blocking issue for switching over. There's a bunch more we can do.
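
For anyone who wants to redo the arithmetic in the estimate quoted above
(~1% slower parsing, parsing <5% of PLT), here is a minimal sketch. The
function and parameter names are my own, for illustration only, and it
assumes the rest of page load time is unaffected by the parser change.

```python
def page_load_slowdown(parser_slowdown, parse_share):
    """Fractional PLT slowdown when only the parsing portion gets slower.

    parser_slowdown: fractional slowdown of the parser (0.01 means 1%).
    parse_share: fraction of page load time spent parsing (0.05 means 5%).
    Both names are illustrative assumptions, not taken from WebKit.
    """
    return parser_slowdown * parse_share


# Figures quoted in the thread: ~1% slower parser, parsing < 5% of PLT.
bound = page_load_slowdown(0.01, 0.05)
print(f"PLT slowdown bound: {bound:.2%}")  # prints "PLT slowdown bound: 0.05%"
```

Since the 5% parsing share is itself an upper bound, the 0.05% figure is
an upper bound too.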

I'm starting to fear that the next blink of my disk light will send me
into a fit. One thing you can consider right away is whether it
"plays nice with the other kids on a variety of playground equipment."
That is, it may be great when it has unlimited memory, but does
it start thrashing as soon as part of it is in VM? I'm not
sure how to test this thoroughly, but it is such a huge problem that I
thought I would mention it again. Essentially, it
comes down to memory coherence.

> For example, we're currently wasting a bunch of time converting
> new-style tokens into old-style tokens to feed them to the old tree
> constructor. Once we start working on phase 2 (the HTML5 tree
> constructor), we won't need to waste time there.
>
> Adam
>
>
>> On Jun 13, 2010, at 10:21 PM, Adam Barth wrote:
>>
>>> People of WebKit,
>>>
>>> As mentioned recently on webkit-dev, Eric, Tonyg, and I have been
>>> working on implementing the HTML5 parsing algorithm in WebKit:
>>>
>>> http://www.mail-archive.com/webkit-dev@lists.webkit.org/msg11472.html
>>>
>>> We're now ready to turn the new tokenization algorithm on by default
>>> (probably early this week).  The new code passes all the existing
>>> LayoutTests, with the exception of roughly 40 tests that "expect"
>>> behavior that violates the HTML5 specification [1].
>>>
>>> There are some differences between the old parser and the HTML5
>>> parser.  We've written up a brief document outlining those
>>> differences:
>>>
>>> https://docs.google.com/document/edit?id=1as5xYjyMSCph4960iz0-Kb7hZKf_L6f2vts57NMcVBI&hl=en
>>>
>>> If these differences cause real compatibility issues on the web, we
>>> should contribute this information to the working group so we can
>>> improve the specification.  If these differences cause compatibility
>>> issues for WebKit-specific HTML (e.g., for Dashboard widgets), we
>>> might need to add a flag to support some subset of these parsing
>>> quirks for non-web uses of WebKit.
>>>
>>> Please be on the lookout for parsing-related regressions and CC Eric,
>>> Tonyg, and me on the bugs.  There's still a lot of work to do
>>> (including implementing the tree construction algorithm), but turning
>>> the tokenization code on by default is an important milestone for the
>>> project.
>>>
>>> Happy parsing,
>>> Adam
>>>
>>> [1] See https://spreadsheets.google.com/ccc?key=0AppchfQ5mBrEdDFJUW5DOGNsdmtvZkN0ZmIzMjdaT0E&hl=en
>>> for details.
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>>
>>
