[webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

Oliver Hunt oliver at apple.com
Wed Jan 9 18:38:42 PST 2013


How will we ensure thread safety?  Even at just the tokenizing level, don't we use AtomicString?  AtomicString isn't threadsafe wrt StringImpl IIRC, so this seems like it would add a world of hurt.
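
To make the concern concrete, here is a minimal sketch of the kind of synchronization a shared string table would need if tokenizing ran on another thread.  InternTable and its methods are hypothetical names for illustration only, not WebKit's actual AtomicString or StringImpl code, and a plain std::mutex is just the simplest assumption I could make.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for a shared table of interned strings.
// This is NOT WebKit's AtomicString implementation.
class InternTable {
public:
    // Returns a pointer to the single shared copy of `s`, inserting it
    // if it is not present yet. The lock serializes lookups and inserts
    // so two threads can never mutate the set at the same time.
    const std::string* intern(const std::string& s)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        // Pointers to unordered_set elements stay valid across rehashing.
        return &*m_strings.insert(s).first;
    }

    // Number of distinct strings currently stored.
    std::size_t size() const
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_strings.size();
    }

private:
    mutable std::mutex m_mutex;
    std::unordered_set<std::string> m_strings;
};
```

Interning "div" twice returns the same pointer, so equality stays a pointer comparison, but every lookup now takes a lock on what is a very hot path.  That cost is part of what I'm worried about.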

I realise it's been a long time since I've worked on this so it's completely possible that I'm not aware of the current behaviour.

That aside, I question what the benefit of this will be.  All those cases where we've started parsing HTML are intrinsically tied to the web's general "single thread of execution" model.  That implies that even if we do push parsing into a separate thread, we'll just end up with the UI thread blocked on the parsing thread, which doesn't seem hugely superior.

What is the objective here? To improve performance, add parallelism, or reduce latency?

--Oliver

On Jan 9, 2013, at 6:10 PM, Adam Barth <abarth at webkit.org> wrote:

> On Wed, Jan 9, 2013 at 6:00 PM, Eric Seidel <eric at webkit.org> wrote:
>> We're planning to move parts of the HTML Parser off of the main thread:
>> https://bugs.webkit.org/show_bug.cgi?id=106127
>> 
>> This is driven by our testing showing that HTML parsing on mobile is
>> slow and long-running (causing user-visible delays averaging 10
>> frames / 150ms).
>> https://bug-106127-attachments.webkit.org/attachment.cgi?id=182002
>> Complete data can be found at [1].
> 
> In case it's not clear from that link, the "ParseHTML" column is the
> total amount of time the web inspector attributes to HTML parsing when
> loading those URLs on a Nexus 7 using a top-of-tree build of
> Chromium's content_shell (similar to WebKitTestRunner).
> 
> The HTML parser parses data a chunk at a time, which means the total
> time doesn't tell the whole story.  The "ParseHTML_max" column shows
> the largest single block of time spent in the HTML parser, which is
> more of a measure of the main thread "jank" caused by the parser.
> 
> Antti has pointed out that the inspector isn't the best source of
> data.  He measured total time using Instruments, and got numbers that
> are consistent (within a factor of 2) with the inspector measurements.
> (We were using different data sets, so we wouldn't expect perfect
> agreement even if we were measuring precisely the same thing.)
> 
> Adam
> 
> 
>> Mozilla moved their parser onto a separate thread during their HTML5
>> parser re-write:
>> https://developer.mozilla.org/en-US/docs/Mozilla/Gecko/HTML_parser_threading
>> 
>> We plan to take a slightly simpler approach, moving only Tokenizing
>> off of the main thread:
>> https://docs.google.com/drawings/d/1hwYyvkT7HFLAtTX_7LQp2lxA6LkaEWkXONmjtGCQjK0/edit
>> The left is our current design, the middle is a tokenizer-only design,
>> and the right is more like Mozilla's threaded-parser design.
>> 
>> Profiling shows Tokenizing accounts for about 10x as many samples as
>> TreeBuilding, including in Antti's recent testing (.5% vs.
>> 3%):
>> https://bugs.webkit.org/show_bug.cgi?id=106127#c10
>> If after we do this we measure and find ourselves still spending a lot
>> of main-thread time parsing, we'll move the TreeBuilder too. :)  (This
>> work is a nicely separable sub-set of larger work needed to move the
>> TreeBuilder.)
>> 
>> We welcome your thoughts and comments.
>> 
>> 
>> 1. https://docs.google.com/spreadsheet/ccc?key=0AlC4tS7Ao1fIdGtJTWlSaUItQ1hYaDFDcWkzeVAxOGc#gid=0
>> (Epic thanks to Nat Duca for helping us collect that data.)
