[webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

Antti Koivisto koivisto at iki.fi
Thu Jan 10 01:44:20 PST 2013

When loading web pages we are very frequently in a situation where we
already have the source data (HTML text here, but the same applies to
preloaded JavaScript, CSS, images, ...) and know we are likely to need it
soon, but can't actually use it for an indeterminate time. This happens
because pending external JS resources block the main parser (and pending
CSS resources block JS execution) for web compatibility reasons. In this
situation it makes sense to start processing the resources we have into
forms that are faster to use when they are eventually needed (like the
token stream here).

One thing we already do when the main parser gets blocked is preload
scanning. We look through the unparsed HTML source we have and trigger
loads for any resources found. It would be beneficial if this happened off
the main thread. We could do it as new data arrives, in parallel with JS
execution and other time-consuming engine work, potentially triggering
resource loads earlier.
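
To make the idea concrete, here is a minimal sketch in Python (not WebKit
code) of a preload scan handed to a worker thread. The names
PreloadScanner, RESOURCE_ATTRS, scan_for_preloads and scan_off_main_thread
are made up for illustration, the tag/attribute table is a simplification
(a real scanner would also check things like rel=stylesheet), and Python
threads only illustrate the hand-off, they won't give a real parallel
speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Hypothetical table: which attribute names an external resource per tag.
RESOURCE_ATTRS = {"script": "src", "img": "src", "link": "href"}


class PreloadScanner(HTMLParser):
    """Collects resource URLs from start tags without building a tree."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def scan_for_preloads(html_chunk):
    """Return the resource URLs found in a chunk of unparsed HTML."""
    scanner = PreloadScanner()
    scanner.feed(html_chunk)
    return scanner.urls


def scan_off_main_thread(executor, html_chunk):
    """Submit the scan to a worker; the caller gets a Future back."""
    return executor.submit(scan_for_preloads, html_chunk)
```

The main thread would call scan_off_main_thread() when a chunk arrives and
pick up the URL list from the returned future once it has time to start
the loads.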

I think a good first step here would be to share the tokens between the
preload scanner and the main parser and worry about the threading part
afterwards. We often parse the HTML source more or less twice so this is an
unquestionable win.
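
Roughly what I mean by sharing tokens, as a toy Python sketch rather than
WebKit code: the source is tokenized once, and both the preload scan and
the tree builder read the same token list. Tokenizer, tokenize,
preload_urls, build_tree and the VOID_TAGS set are all hypothetical names,
and the tree builder is drastically simplified (no error recovery, no
insertion modes).

```python
from html.parser import HTMLParser

# Small, incomplete set of elements that never get an end tag.
VOID_TAGS = {"img", "link", "br", "meta", "input"}


class Tokenizer(HTMLParser):
    """Turns HTML source into a flat list of (kind, name, attrs) tokens."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag, {}))

    def handle_data(self, data):
        self.tokens.append(("text", data, {}))


def tokenize(html):
    """Tokenize the source exactly once."""
    tokenizer = Tokenizer()
    tokenizer.feed(html)
    tokenizer.close()  # flush any buffered trailing text
    return tokenizer.tokens


def preload_urls(tokens):
    """Preload scan: a read-only pass over the shared tokens."""
    return [attrs["src"] for kind, name, attrs in tokens
            if kind == "start" and name in ("script", "img")
            and attrs.get("src")]


def build_tree(tokens):
    """Toy tree builder: nested [name, child, ...] lists."""
    root = ["#document"]
    stack = [root]
    for kind, name, attrs in tokens:
        if kind == "start":
            node = [name]
            stack[-1].append(node)
            if name not in VOID_TAGS:
                stack.append(node)
        elif kind == "end":
            # Only close the element that is actually open.
            if len(stack) > 1 and stack[-1][0] == name:
                stack.pop()
        else:
            stack[-1].append(name)  # for text tokens, name holds the text
    return root
```

The point is that tokenize() runs once per chunk; preload_urls() can run
as soon as the tokens exist, and build_tree() consumes the same list later
when the parser is unblocked.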


On Thu, Jan 10, 2013 at 7:41 AM, Filip Pizlo <fpizlo at apple.com> wrote:

> I think your biggest challenge will be ensuring that the latency of
> shoving things to another core and then shoving them back will be smaller
> than the latency of processing those same things on the main thread.
>
> For small documents, I expect concurrent tokenization to be a pure
> regression because the latency of waking up another thread to do just a
> small bit of work, plus the added cost of whatever synchronization
> operations will be needed to ensure safety, will involve more total work
> than just tokenizing locally.
>
> We certainly see this in the JSC parallel GC, and in line with traditional
> parallel GC design, we ensure that parallel threads only kick in when the
> main thread is unable to keep up with the work that it has created for
> itself.
>
> Do you have a vision for how to implement a similar self-throttling, where
> tokenizing continues on the main thread so long as it is cheap to do so?
>
> -Filip
>
> On Jan 9, 2013, at 6:00 PM, Eric Seidel <eric at webkit.org> wrote:
> > We're planning to move parts of the HTML Parser off of the main thread:
> > https://bugs.webkit.org/show_bug.cgi?id=106127
> >
> > This is driven by our testing showing that HTML parsing on mobile is
> > slow and long (causing user-visible delays averaging 10 frames /
> > 150ms).
> > https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
> > Complete data can be found at [1].
> >
> > Mozilla moved their parser onto a separate thread during their HTML5
> > parser re-write:
> >
> > https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
> >
> > We plan to take a slightly simpler approach, moving only Tokenizing
> > off of the main thread:
> >
> > https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
> > The left is our current design, the middle is a tokenizer-only design,
> > and the right is more like mozilla's threaded-parser design.
> >
> > Profiling shows Tokenizing accounts for about 10x the number of
> > samples as TreeBuilding.  Including Antti's recent testing (.5% vs.
> > 3%):
> > https://bugs.webkit.org/show_bug.cgi?id=106127#c10
> > If after we do this we measure and find ourselves still spending a lot
> > of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
> > work is a nicely separable sub-set of larger work needed to move the
> > TreeBuilder.)
> >
> > We welcome your thoughts and comments.
> >
> >
> > 1.
> > https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
> > (Epic thanks to Nat Duca for helping us collect that data.)
> > _______________________________________________
> > webkit-dev mailing list
> > webkit-dev at lists.webkit.org
> > http://lists.webkit.org/mailman/listinfo/webkit-dev