[webkit-dev] Feature Announcement: Moving HTML Parser off the Main Thread

Thu Jan 10 23:00:15 PST 2013

Adam,

Thanks for your detailed reply. Seems like you guys have a pretty good plan in place. 

I hope this works and produces a performance improvement. That being said, this does look like a sufficiently complex work item that success is far from guaranteed. So to play devil's advocate, what is your plan if this doesn't work out?

I.e., are we talking about adding a bunch of threading support code in the optimistic hope that it makes things run fast, and then forgetting about it if it doesn't? Or are you prepared to roll back any complexity that got landed if this does not ultimately live up to its promise? Or is this going to be one giant patch that only lands if it works?

I'm also trying to understand what would happen during the interim, when this work is incomplete, we have thread-related goop in some critical paths, and we don't yet know if the WIP code is ever going to result in a speedup. And also, what will happen some time from now if that code is never successfully optimized to the point where it is worth enabling?

I appreciate that this sort of question can be asked of any performance work but in this particular case my gut tells me that this is going to result in significantly more complexity than the usual incremental performance work. So it's good to understand what plan B is. 

Probably a good answer to this sort of question would address some fears that people may have. If this work does lead to a performance win then probably everyone will be happy. But if it doesn't then it would be great to have a "plan of retreat". 

-Filip

On Jan 10, 2013, at 12:07, Adam Barth <abarth at webkit.org> wrote:

> Thanks everyone for your feedback.  Detailed responses inline.
> 
> On Wed, Jan 9, 2013 at 9:41 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> I think your biggest challenge will be ensuring that the latency of shoving things to another core and then shoving them back will be smaller than the latency of processing those same things on the main thread.
> 
> Yes.  That's something we know we have to worry about.  Given that we
> need to retain the ability to parse HTML on the main thread to handle
> document.write and innerHTML, we should be able to easily do A/B
> comparisons to make sure we understand any performance trade-offs that
> might arise.
> 
>> For small documents, I expect concurrent tokenization to be a pure regression because the latency of waking up another thread to do just a small bit of work, plus the added cost of whatever synchronization operations will be needed to ensure safety, will involve more total work than just tokenizing locally.
> 
> Once we have the ability to tokenize on a background thread, we can
> examine cases like these and heuristically decide whether to use the
> background thread or not at runtime.  As I wrote above, we'll need
> this ability anyway, so keeping the ability to optimize these cases
> shouldn't add any new constraints to the design.
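
A rough sketch of what such a runtime heuristic could look like, in Python rather than WebKit's C++ and not taken from any patch. The function name and both cost constants are made-up assumptions, not measurements; the only point is that offloading pays off once the main-thread work saved exceeds the cost of the hand-off.

```python
# Hypothetical cost model. Neither number is measured; both are placeholders.
HANDOFF_COST_US = 200.0          # assumed cost of waking a thread and posting results back
TOKENIZE_COST_US_PER_KB = 15.0   # assumed main-thread tokenization cost per KiB of input


def should_tokenize_in_background(input_bytes: int) -> bool:
    """Offload only when the main-thread time saved exceeds the hand-off cost."""
    main_thread_cost_us = (input_bytes / 1024.0) * TOKENIZE_COST_US_PER_KB
    return main_thread_cost_us > HANDOFF_COST_US
```

With these placeholder numbers a 2 KiB document stays on the main thread and a 1 MiB document is offloaded. Real thresholds would have to come from the A/B comparisons mentioned above.
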
> 
>> We certainly see this in the JSC parallel GC, and in line with traditional parallel GC design, we ensure that parallel threads only kick in when the main thread is unable to keep up with the work that it has created for itself.
>> 
>> Do you have a vision for how to implement a similar self-throttling, where tokenizing continues on the main thread so long as it is cheap to do so?
> 
> It's certainly something we can tune in the optimization phase.  I
> don't think we need a particular vision to be able to do it.  Given
> that we want to implement speculative parsing (to replace preload
> scanning---more on this below), we'll already have the ability to
> checkpoint and restore the tokenizer state across threads.  Once you
> have that primitive, it's easy to decide whether to continue
> tokenization on the main thread or on a background thread.
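
To illustrate why a checkpoint/restore primitive makes this decision easy, here is a toy sketch. It is not WebKit's tokenizer: `Checkpoint`, `ToyTokenizer`, and `tokenize_with_handoff` are hypothetical names, the grammar only knows tags, text, and a raw-text "script" state, and the "budget" is a simple token count standing in for whatever cost measure a real implementation would use.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical snapshot: all state needed to resume tokenizing elsewhere."""
    position: int   # offset of the next unread character
    state: str      # "data" or "script" (raw text until </script>)


class ToyTokenizer:
    """Emits ('tag', ...) and ('text', ...) tokens. Not a real HTML tokenizer."""

    def __init__(self, source, checkpoint=Checkpoint(0, "data")):
        self.source = source
        self.position = checkpoint.position
        self.state = checkpoint.state

    def checkpoint(self):
        return Checkpoint(self.position, self.state)

    def next_token(self):
        src, pos = self.source, self.position
        if pos >= len(src):
            return None
        if self.state == "script":
            # Inside a script, '<' is ordinary text until the closing tag.
            end = src.find("</script>", pos)
            end = len(src) if end == -1 else end
            self.state = "data"
            if end > pos:
                self.position = end
                return ("text", src[pos:end])
        if src[pos] == "<":
            end = src.find(">", pos)
            end = len(src) - 1 if end == -1 else end
            tag = src[pos:end + 1]
            self.position = end + 1
            if tag == "<script>":
                self.state = "script"
            return ("tag", tag)
        end = src.find("<", pos)
        end = len(src) if end == -1 else end
        self.position = end
        return ("text", src[pos:end])


def tokenize_all(tokenizer):
    """Drain a tokenizer into a list."""
    return list(iter(tokenizer.next_token, None))


def tokenize_with_handoff(source, main_thread_budget):
    """Tokenize on the calling thread up to a budget, then resume on a worker."""
    main = ToyTokenizer(source)
    tokens = []
    for _ in range(main_thread_budget):
        token = main.next_token()
        if token is None:
            return tokens  # finished cheaply; never left the calling thread
        tokens.append(token)
    with ThreadPoolExecutor(max_workers=1) as pool:
        resumed = ToyTokenizer(source, main.checkpoint())
        tokens += pool.submit(tokenize_all, resumed).result()
    return tokens
```

The checkpoint has to carry the tokenizer state as well as the offset: handing off in the middle of a script with only the offset would tokenize `a<b` as text followed by a tag instead of a single text run.
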
> 
> On Wed, Jan 9, 2013 at 10:04 PM, Ian Hickson <ian at hixie.ch> wrote:
>> Parsing and (maybe to a lesser extent) compiling JS can be moved off the
>> main thread, though, right? That's probably worth examining too, if it
>> hasn't already been done.
> 
> Yes, once we have the tokenizer running on a background thread, that
> opens up the possibility of parsing other sorts of data on the
> background thread as well.  For example, when the tokenizer encounters
> an inline script block, you could imagine parsing the script on the
> background thread as well so that the main thread has less work to do.
> (You could also imagine making the optimizations without a background
> tokenizer, but the design constraints would be a bit different.)
> 
> On Thu, Jan 10, 2013 at 12:11 AM, Zoltan Herczeg <zherczeg at webkit.org> wrote:
>> Parsing, especially JS parsing still takes a large amount of time on page
>> loading. We tried to improve the preload scanner by moving it into
>> another thread, but there was no gain (except in some special cases).
>> Synchronization between threads is surprisingly (ridiculously) costly;
>> it is usually worthwhile only for tasks that need quite a few million
>> instructions to be executed (and tokenization takes far less in most
>> cases). For smaller tasks, SIMD instruction sets can help, which is
>> basically a parallel execution on a single thread. Anyway it is worth
>> trying, but it is really challenging to make it work in practice. Good
>> luck!
> 
> This is something we're worried about and will need to be careful
> about.  In the design we're proposing, preload scanning is replaced by
> speculative parsing, so the overhead of the preload scanner is removed
> entirely.  The way this works is as follows:
> 
> When running on the background thread, the tokenizer produces a queue
> of PickledTokens.  As these tokens are queued, we can scan them to
> kick off any preloads that we find.  Whenever the tokenizer queues a
> token that creates a new insertion point (in the terminology of the
> HTML specification), the tokenizer checkpoints itself but continues
> tokenizing speculatively.  (Notice that tokens produced in this
> situation are still scanned for preloads but might not ever actually
> result in DOM being constructed.)
> 
> After the main thread has processed the token that created the
> insertion point, if no characters were inserted, the main thread
> continues processing PickledTokens that were created speculatively.  If
> some characters were inserted, the main thread instead instructs the
> tokenizer to roll back to that checkpoint and continue tokenizing in a
> new state.  In this case, the queue of speculative tokens is
> discarded.
> 
> Notice that in the common case, we execute JavaScript and tokenize
> in parallel, something that's not possible with a main-thread
> tokenizer.  Once the script is done executing, we expect it to be
> common to be able to resume tree building immediately as the tokenizer
> will have already tokenized much of the subsequent data.
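
The checkpoint-and-rollback scheme described above can be sketched as a sequential simulation. This is an illustration under loud assumptions, not the proposed design: there are no real threads, plain strings stand in for PickledTokens, `</script>` is treated as the only token that creates an insertion point, the checkpoint is just an input offset (a real one would also capture tokenizer state), and `writes_by_script` is a hypothetical stand-in for running a script and collecting what it passes to document.write.

```python
import re
from collections import deque

TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")           # toy grammar: tags and text runs
PRELOAD_RE = re.compile(r'<img src="([^"]+)">')   # toy preload detection


def tokenize(source, start=0):
    """Yield (token, offset just past the token); the offset acts as the checkpoint."""
    for match in TOKEN_RE.finditer(source, start):
        yield match.group(), match.end()


def parse(source, writes_by_script):
    """Simulate speculative tokenization with rollback.

    writes_by_script maps a script body to the characters it document.write()s.
    Returns (tokens seen by the tree builder, preload URLs, rollback count).
    """
    consumed, preloads, rollbacks = [], [], 0
    queue = deque()

    def enqueue(start):
        # Stand-in for the background tokenizer: run ahead and scan for preloads.
        for token, checkpoint in tokenize(source, start):
            queue.append((token, checkpoint))
            found = PRELOAD_RE.fullmatch(token)
            if found and found.group(1) not in preloads:
                preloads.append(found.group(1))

    enqueue(0)
    while queue:
        token, checkpoint = queue.popleft()
        consumed.append(token)
        if token != "</script>":
            continue
        # Insertion point: "run" the script whose body is the previous token.
        body = consumed[-2] if len(consumed) >= 2 else ""
        inserted = writes_by_script.get(body, "")
        if inserted:
            # Characters were inserted: drop speculative tokens and roll back.
            queue.clear()
            source = source[:checkpoint] + inserted + source[checkpoint:]
            rollbacks += 1
            enqueue(checkpoint)
    return consumed, preloads, rollbacks
```

When no script writes anything, the tree builder simply drains the speculative queue and the rollback count stays at zero. When a script does write, preloads found in the discarded tokens have still been kicked off, which matches the note above that speculative tokens are scanned even if they never produce DOM.
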
> 
> On Thu, Jan 10, 2013 at 12:37 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>> I presume from your other comments that the goal of this work is responsiveness, rather than page load speed as such. I'm excited about the potential to improve responsiveness during page loading.
> 
> The goals are described in the first link Eric gave in his email:
> <https://bugs.webkit.org/show_bug.cgi?id=106127#c0>.  Specifically:
> 
> ---8<---
> 1) Moving parsing off the main thread could make web pages more
> responsive because the main thread is available for handling input
> events and executing JavaScript.
> 2) Moving parsing off the main thread could make web pages load more
> quickly because WebCore can do other work in parallel with parsing
> HTML (such as parsing CSS or attaching elements to the render tree).
> --->8---
> 
>> One question: what tests are you planning to use to validate whether this approach achieves its goals of better responsiveness?
> 
> The tests we've run so far are also described in the first link Eric
> gave in his email: <https://bugs.webkit.org/show_bug.cgi?id=106127>.
> They suggest that there's a good deal of room for improvement in this
> area.  After we have a working implementation, we'll likely re-run
> those experiments and run other experiments to do an A/B comparison of
> the two approaches.  As Filip points out, we'll likely end up with a
> hybrid of the two designs that's optimized for handling various work
> loads.
> 
>> The reason I ask is that this sounds like a significant increase in complexity, so we should be very confident that there is a real and major benefit. One thing I wonder about is how common it is to have enough of the page processed that the user could interact with it in principle, yet still have large parsing chunks remaining which would prevent that interaction from being smooth.
> 
> If you're interested in reducing the complexity of the parser, I'd
> recommend removing the NEW_XML code.  As previously discussed, that
> code creates significant complexity for zero benefit.
> 
>> Another thing I wonder about is whether yielding to the event loop more aggressively could achieve a similar benefit at a much lower complexity cost.
> 
> Yielding to the event loop more could reduce the "ParseHTML_max" time,
> but it cannot reduce the "ParseHTML" time.  Generally speaking,
> yielding to the event loop is a trade-off between throughput (i.e.,
> page load time) and responsiveness.  Moving work to a background
> thread should let us achieve a better trade-off between these
> quantities than we're likely to be able to achieve by tuning the yield
> parameter alone.
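
A back-of-the-envelope model of that trade-off, with assumptions stated up front: "ParseHTML_max" is read here as the longest uninterrupted block of parsing on the main thread, "ParseHTML" as the total, and the per-yield overhead is an invented parameter. The function name is hypothetical.

```python
import math


def yield_tradeoff(total_parse_ms, chunk_ms, yield_overhead_ms):
    """Return (longest block, total main-thread parse time, added load delay)."""
    chunks = math.ceil(total_parse_ms / chunk_ms)
    longest_block_ms = min(chunk_ms, total_parse_ms)   # shrinks with smaller chunks
    main_thread_parse_ms = total_parse_ms              # unchanged by yielding
    load_delay_ms = (chunks - 1) * yield_overhead_ms   # throughput cost of the yields
    return longest_block_ms, main_thread_parse_ms, load_delay_ms
```

For 100 ms of parsing with 1 ms of overhead per yield, chunks of 50 ms give (50, 100, 1) and chunks of 10 ms give (10, 100, 9): the longest block drops, the total main-thread parse time does not, and the load delay grows. Only moving the work to another thread changes the middle number.
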
> 
>> Having a test to drive the work would allow us to answer these types of questions. (It may also be that the test data you cited would already answer these questions but I didn't sufficiently understand it; if so, further explanation would be appreciated.)
> 
> If you're interested in building such a test, I would be interested in
> hearing the results.  We don't plan to build such a test at this time.
> 
> On Thu, Jan 10, 2013 at 1:44 AM, Antti Koivisto <koivisto at iki.fi> wrote:
>> When loading web pages we are very frequently in a situation where we
>> already have the source data (HTML text here but the same applies to
>> preloaded JavaScript, CSS, images, ...) and know we are likely to need it
>> soon, but can't actually utilize it for an indeterminate time. This happens
>> because pending external JS resources block the main parser (and pending
>> CSS resources block JS execution) for web compatibility reasons. In this
>> situation it makes sense to start processing resources we have into forms that
>> are faster to use when they are eventually actually needed (like token
>> stream here).
> 
> Indeed.
> 
>> One thing we already do when the main parser gets blocked is preload
>> scanning. We look through the unparsed HTML source we have and trigger loads
>> for any resources found. It would be beneficial if this happened off the
>> main thread. We could do it when new data arrives in parallel with JS
>> execution and other time consuming engine work, potentially triggering
>> resource loads earlier.
> 
> A couple of people have tried to move preload scanning to a background
> thread, but they haven't had much success.  Given that moving the
> parser to a background thread gets us background preload scanning for
> free, I don't think it's worth investing effort in moving just the
> preload scanner anymore.
> 
>> I think a good first step here would be to share the tokens between the
>> preload scanner and the main parser and worry about the threading part
>> afterwards. We often parse the HTML source more or less twice so this is an
>> unquestionable win.
> 
> We've discussed doing that for a number of years, but no one has
> actually succeeded in doing it.  Given that moving the parsing to a
> background thread gets us token reuse for free (because of the switch
> from preload scanning to speculative tokenization), I don't think it's
> worth investing effort in reusing the preload scanner's tokens
> anymore.
> 
> Adam