[webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

Thu Jan 10 02:22:16 PST 2013

The data Eric and Adam were using comes from a python library a few of us
have been developing called "telemetry." It's basically a bunch of python
that lets us write performance tests against any browser that speaks the
inspector websocket protocol. We're using it for a lot of "should we
parallelize X" questions, as well as regression-style "have our changes to
X stayed a win over time?"

They might have other ways in mind to obtain this data that are more
webkit-y, but I figure a bit on how we got this far might be useful for
this mailing list.

Roughly, telemetry scripts connect up to a host and port where you've
arranged to have an inspector websocket listening, e.g. $MY_PHONE_IP:9222,
or google-chrome --remote-debugging-port=9222 && telemetry
--browser=$LOCALHOST:9222. Once that's established, we have communication
with WebCore's InspectorAgent, and assuming we trust the agent, can do some
pretty powerful stuff from there.
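
To make the "connect to a host and port" step concrete, here is a minimal
sketch of how a script could discover the inspector websocket for each open
page. It is not telemetry's actual code. It assumes the browser was started
with --remote-debugging-port and serves its target list as JSON at /json on
that port, which is what Chrome does; the function names are made up for
illustration.

```python
import json
from urllib.request import urlopen


def targets_url(host, port=9222):
    """Build the URL of the HTTP endpoint that lists debuggable targets."""
    return "http://%s:%d/json" % (host, port)


def page_websocket_urls(targets_json):
    """Return the inspector websocket URL of every page-type target.

    Targets without a webSocketDebuggerUrl entry are skipped.
    """
    targets = json.loads(targets_json)
    return [t["webSocketDebuggerUrl"]
            for t in targets
            if t.get("type") == "page" and "webSocketDebuggerUrl" in t]


def fetch_page_websocket_urls(host, port=9222, timeout=5):
    """Ask a running browser which pages it exposes (needs a live browser)."""
    with urlopen(targets_url(host, port), timeout=timeout) as response:
        return page_websocket_urls(response.read().decode("utf-8"))
```

A websocket client would then open one of the returned ws:// URLs and
exchange JSON messages with the inspector backend.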

The benchmark being discussed here [webkit_benchmark] navigates the browser
from page to page, enabling inspector's TimelineAgent as it does so in order
to get performance data about the page load. We then postprocess that data
stream into a human-consumable csv and there is [some amount] of rejoicing.
Assuming we trust inspector timeline [Pavel's done a number of fixes to
help us trust it more!] this gets pretty clean results, pretty easily.
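
As a rough illustration of the postprocessing step, the sketch below turns
per-page timeline records into csv rows. It is not the benchmark's real
code. It assumes each record is a dict with "type", "startTime" and
"endTime" (taken here to be milliseconds) plus an optional "children" list,
and that "ParseHTML" is the record type of interest; the helper names are
hypothetical.

```python
import csv
import io


def total_time_ms(records, record_type):
    """Sum the duration of the outermost records of record_type.

    Children of a matching record are not visited, so nested records of
    the same type are not counted twice.
    """
    total = 0.0
    for record in records:
        if record.get("type") == record_type and "endTime" in record:
            total += record["endTime"] - record["startTime"]
        else:
            total += total_time_ms(record.get("children", []), record_type)
    return total


def timeline_to_csv(page_records, record_type="ParseHTML"):
    """Render (url, records) pairs as csv text, one row per page."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", record_type + "_ms"])
    for url, records in page_records:
        writer.writerow([url, "%.1f" % total_time_ms(records, record_type)])
    return out.getvalue()
```

Counting only the outermost matching record is a design choice for this
sketch; a real tool might instead want self time per record.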

A key challenge with telemetry has been getting stable runs on real world
sites. The archive.org techniques are cool, but they don't capture some of
the big ones, like a logged-in gmail account. We've addressed this using
tonyg and simonjam's http://code.google.com/p/web-page-replay/. If the
browser under test supports web page replay [~= redirecting dns requests to
the replay server instead of the real site], then you can get stable,
repeatable runs against super complex real world sites --- it's worked on
every site we've tried so far.
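
For anyone unfamiliar with the record/replay idea, here is a toy sketch of
the two pieces involved: name resolution that points every hostname at the
replay server, and an archive that answers requests from previously
recorded responses. This is only an illustration of the concept, not
web-page-replay's API; the class and function names are invented.

```python
class ReplayArchive:
    """Toy record/replay store keyed by (method, host, path)."""

    def __init__(self):
        self._responses = {}

    def record(self, method, host, path, body):
        """Remember the response body seen for this request."""
        self._responses[(method, host, path)] = body

    def replay(self, method, host, path):
        """Return the recorded body, or None if it was never recorded."""
        return self._responses.get((method, host, path))


def resolve(hostname, replay_ip=None):
    """Pretend DNS: while replaying, every hostname maps to the replay server."""
    if replay_ip is not None:
        return replay_ip
    raise LookupError("real DNS lookup is out of scope for this sketch")
```

Because the browser still asks for the original hostname, the replay server
can use the requested host and path to pick the right recorded response.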

The core telemetry framework is here:
http://src.chromium.org/chrome/trunk/src/tools/telemetry/

It's in the chromium repo, but please don't hold that against it --- it's
movable, given interest.

The actual webkit benchmark is pretty simple, because most of the
functionality comes from telemetry:
https://codereview.chromium.org/11791043/

With the patch above landed, obtaining the benchmarking results that Eric
got against chrome should be ~= getting a telemetry checkout and doing:
./run_multipage_benchmarks --browser=canary \
    webkit_benchmark page_sets/top_25.json

Or if you had an android with chrome on it:
./run_multipage_benchmarks --browser=android-chrome \
    webkit_benchmark page_sets/top_25.json

Anyway, I'll leave it to Eric/Adam to speak to how this maps back into the
WebKit ecosystem. The use of inspector protocol makes it a theoretical
possibility on other ports, but I know some people get nervous (or run away
angrily!) when they hear that we're using Inspector as a perf data source.
 :)

- Nat

On Thu, Jan 10, 2013 at 1:44 AM, Antti Koivisto <koivisto at iki.fi> wrote:

> When loading web pages we are very frequently in a situation where we
> already have the source data (HTML text here but the same applies to
> preloaded Javascript, CSS, images, ...) and know we are likely to need it
> soon, but can't actually utilize it for an indeterminate time. This happens
> because pending external JS resources block the main parser (and pending
> CSS resources block JS execution) for web compatibility reasons. In this
> situation it makes sense to start processing resources we have into forms
> that are faster to use when they are eventually actually needed (like a
> token stream here).
>
> One thing we already do when the main parser gets blocked is preload
> scanning. We look through the unparsed HTML source we have and trigger
> loads for any resources found. It would be beneficial if this happened off
> the main thread. We could do it when new data arrives in parallel with JS
> execution and other time consuming engine work, potentially triggering
> resource loads earlier.
>
> I think a good first step here would be to share the tokens between the
> preload scanner and the main parser and worry about the threading part
> afterwards. We often parse the HTML source more or less twice so this is an
> unquestionable win.
>
>
>   antti
>
>
> On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo <fpizlo at apple.com> wrote:
>
>> I think your biggest challenge will be ensuring that the latency of
>> shoving things to another core and then shoving them back will be smaller
>> than the latency of processing those same things on the main thread.
>>
>> For small documents, I expect concurrent tokenization to be a pure
>> regression because the latency of waking up another thread to do just a
>> small bit of work, plus the added cost of whatever synchronization
>> operations will be needed to ensure safety, will involve more total work
>> than just tokenizing locally.
>>
>> We certainly see this in the JSC parallel GC, and in line with
>> traditional parallel GC design, we ensure that parallel threads only kick
>> in when the main thread is unable to keep up with the work that it has
>> created for itself.
>>
>> Do you have a vision for how to implement a similar self-throttling,
>> where tokenizing continues on the main thread so long as it is cheap to do
>> so?
>>
>> -Filip
>>
>>
>> On Jan 9, 2013, at 6:00 PM, Eric Seidel <eric at webkit.org> wrote:
>>
>> > We're planning to move parts of the HTML Parser off of the main thread:
>> > https://bugs.webkit.org/show_bug.cgi?id=106127
>> >
>> > This is driven by our testing showing that HTML parsing on mobile is
>> > slow and long (causing user-visible delays averaging 10 frames /
>> > 150ms).
>> > https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
>> > Complete data can be found at [1].
>> >
>> > Mozilla moved their parser onto a separate thread during their HTML5
>> > parser re-write:
>> >
>> https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
>> >
>> > We plan to take a slightly simpler approach, moving only Tokenizing
>> > off of the main thread:
>> >
>> https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
>> > The left is our current design, the middle is a tokenizer-only design,
>> > and the right is more like mozilla's threaded-parser design.
>> >
>> > Profiling shows Tokenizing accounts for about 10x the number of
>> > samples as TreeBuilding.  Including Antti's recent testing (.5% vs.
>> > 3%):
>> > https://bugs.webkit.org/show_bug.cgi?id=106127#c10
>> > If after we do this we measure and find ourselves still spending a lot
>> > of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
>> > work is a nicely separable sub-set of larger work needed to move the
>> > TreeBuilder.)
>> >
>> > We welcome your thoughts and comments.
>> >
>> >
>> > 1.
>> https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
>> > (Epic thanks to Nat Duca for helping us collect that data.)
>> > _______________________________________________
>> > webkit-dev mailing list
>> > webkit-dev at lists.webkit.org
>> > http://lists.webkit.org/mailman/listinfo/webkit-dev