[webkit-dev] PreloadScanner aggressiveness

Thu Jan 7 12:49:07 PST 2010

On Jan 7, 2010, at 12:09 PM, Mike Belshe wrote:

> Hi -
>
> I've been working on SPDY, but I think I may have found a good  
> performance win for HTTP.  Specifically, the PreloadScanner,  
> which is responsible for scanning ahead within an HTML document to  
> find subresources, is throttled today.  The throttling is  
> intentional and probably sometimes necessary.  Nonetheless,  
> unthrottling it may lead to a 5-10% performance boost in some  
> configurations.  I believe Antti is no longer working on this?  Is  
> there anyone else working in this area that might have data on how  
> aggressive the PreloadScanner should be?  Below I'll describe some  
> of my tests.
>
> The PreloadScanner throttling happens in a couple of ways.  First,  
> the PreloadScanner only runs when we're blocked on JavaScript (see  
> HTMLTokenizer.cpp).  But further, as it discovers resources to be  
> fetched, it may delay loading a subresource, or reject it  
> altogether, due to throttling in loader.cpp and DocLoader.cpp.  The throttling is  
> very important, depending on the implementation of the HTTP  
> networking stack, because throwing too many resources (or the low- 
> priority ones) into the network stack could adversely affect HTTP  
> load performance.  This latter problem does not impact my Chromium  
> tests, because the Chromium network stack does its own  
> prioritization and throttling (not too dissimilar from the work done  
> by loader.cpp).

The reason we do this is to prevent head-of-line blocking by low- 
priority resources inside the network stack (mainly considering how  
CFNetwork / NSURLConnection works).
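
As a rough illustration of that head-of-line blocking, here is a toy model. It is only a sketch under stated assumptions, not how CFNetwork or loader.cpp actually behave: the function name, the two-slot limit, the resource names, and the 100ms durations are all made up for the example, and "throttling" is reduced to submitting the script ahead of the images.

```python
import heapq

def completion_times(requests, slots):
    """Simulate a network stack that serves requests strictly in
    submission order over a fixed number of connection slots.

    requests: list of (name, duration_ms) in submission order.
    Returns {name: finish_time_ms}.
    """
    free_at = [0] * slots      # time at which each slot becomes free
    heapq.heapify(free_at)
    finish = {}
    for name, duration in requests:
        start = heapq.heappop(free_at)   # earliest free slot
        end = start + duration
        finish[name] = end
        heapq.heappush(free_at, end)
    return finish

# Hypothetical workload: four low-priority images and one blocking script.
images = [(f"img{i}", 100) for i in range(4)]
script = ("app.js", 100)

# Unthrottled: everything is handed to the stack as soon as it is found,
# so the images occupy both slots and the script waits behind them.
unthrottled = completion_times(images + [script], slots=2)

# Throttled: the loader holds the images back so the script goes first.
throttled = completion_times([script] + images, slots=2)

print("app.js unthrottled:", unthrottled["app.js"], "ms")
print("app.js throttled:  ", throttled["app.js"], "ms")
```

In this model the last resource finishes at the same time either way; what changes is how long the high-priority script sits behind the images. A stack that reprioritizes internally would not show this difference.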

>
> Theory:
> The theory I'm working under is that when the RTT of the network is  
> sufficiently high, the *best* thing the browser can do is to  
> discover resources as quickly as possible and pass them to the  
> network layer so that we can get started with fetching.  This is not  
> speculative - these are resources which will be required to render  
> the full page.   The SPDY protocol is designed around this concept -  
> allowing the browser to schedule all resources it needs to the  
> network (rather than being throttled by connection limits).   
> However, even with SPDY enabled, WebKit itself prevents resource  
> requests from fully flowing to the network layer in 3 ways:
>    a) loader.cpp orders requests and defers requests based on the  
> state of the page load and a number of criteria.
>    b) HTMLTokenizer.cpp only looks for resources further in the body  
> when we're blocked on JS
>    c) "preload" requests are treated specially (DocLoader.cpp); if  
> they are discovered too early by the tokenizer, then they are either  
> queued or discarded.

I think your theory is correct when SPDY is enabled, and possibly when  
using HTTP with pipelining. It may be true to a lesser extent with non- 
pipelining HTTP implementations when the network stack does its own  
prioritization and throttling, by reducing latency in getting the  
request to the network stack. This is especially so when issuing a  
network request to the network stack may involve significant latency  
due to IPC or cross-thread communication or the like.

>
> Test Case
> Can aggressive preload scanning (e.g. always preload scan before  
> parsing an HTML Document) improve page load time?
>
> To test this, I'm calling the PreloadScanner basically as the first  
> part of HTMLTokenizer::write().  I've then removed all throttling  
> from loader.cpp and DocLoader.cpp.  I've also instrumented the  
> PreloadScanner to measure its effectiveness.
>
> Benchmark Setup
> Windows client (chromium).
> Simulated network with 4Mbps download, 1Mbps upload, 100ms RTT, 0%  
> packet loss.
> I run through a set of 25 URLs, loading each 30 times; not recycling  
> any connections and clearing the cache between each page.
> These are running over HTTP; there is no SPDY involved here.

I'm interested in the following:

- What kind of results do you get in Safari?
- How much of this effect is due to more aggressive preload scanning  
and how much is due to disabling throttling? Since the test includes  
multiple logically independent changes, it is hard to tell which are  
the ones that had an effect.

>
> Results:
>
> Metric                           | Baseline (without my changes) | Unthrottled
> ---------------------------------+-------------------------------+------------
> Average PLT                      | 2377ms                        | 2239ms
> Time spent in the PreloadScanner | 1160ms                        | 4540ms
> Preload Scripts discovered       | 2621                          | 9440
> Preload CSS discovered           | 348                           | 1022
> Preload Images discovered        | 11952                         | 39144
> Preload items throttled          | 9983                          | 0
> Preload Complete hits            | 3803                          | 6950
> Preload Partial hits             | 1708                          | 7230
> Preload Unreferenced             | 42                            | 130
>
> Notes:
> - Average PLT: +5.8% latency redux.
> - Time spent in the PreloadScanner: As expected, we spend about 4x  
>   more time in the PreloadScanner. In this test, we loaded 750  
>   pages, so it is about 6ms per page. My machine is fast, though.
> - Preload Scripts discovered: 4x more scripts discovered
> - Preload CSS discovered: 3x more CSS discovered
> - Preload Images discovered: 3x more images discovered
> - Preload Complete hits: This is the count of items which were  
>   completely preloaded before WebKit even tried to look them up in  
>   the cache. This is pure goodness.
> - Preload Partial hits: These are partial hits, where the item had  
>   already started loading, but not finished, before WebKit tried to  
>   look them up.
> - Preload Unreferenced: These are bad and the count should be zero.  
>   I'll try to find them and see if there isn't a fix - the  
>   PreloadScanner is just sometimes finding resources that are never  
>   used. It is likely due to clever JS which changes the DOM.
>
>
>
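
For anyone who wants to check the arithmetic behind the notes in the results table, a small sketch follows. The numbers are copied from the table above; the dictionary keys and helper name are mine and purely illustrative, and the 750 page loads come from the stated setup of 25 URLs loaded 30 times each.

```python
# (baseline, unthrottled) pairs copied from the results table.
results = {
    "average_plt_ms": (2377, 2239),
    "scanner_time_ms": (1160, 4540),
    "scripts_discovered": (2621, 9440),
    "css_discovered": (348, 1022),
    "images_discovered": (11952, 39144),
}

PAGE_LOADS = 25 * 30  # 25 URLs, each loaded 30 times

def ratio(metric):
    """Unthrottled value divided by baseline value for one metric."""
    baseline, unthrottled = results[metric]
    return unthrottled / baseline

# Relative reduction in average page load time, in percent.
base_plt, new_plt = results["average_plt_ms"]
plt_reduction_pct = (base_plt - new_plt) / base_plt * 100

# Scanner cost per page load in the unthrottled run.
scanner_ms_per_page = results["scanner_time_ms"][1] / PAGE_LOADS

print(f"PLT reduction: {plt_reduction_pct:.1f}%")
print(f"Scanner time per page: {scanner_ms_per_page:.1f}ms")
for metric in ("scanner_time_ms", "scripts_discovered",
               "css_discovered", "images_discovered"):
    print(f"{metric}: {ratio(metric):.1f}x")
```

The exact ratios come out a little under the rounded "4x" and "3x" figures quoted in the notes, which is consistent with those being approximations.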
> Conclusions:
> For this network speed/client processor, more aggressive  
> preload scanning clearly is a win.   More testing is needed for  
> slower machines and other network types.  I've tested many network  
> types; the aggressive preload scanning seems to always be either a  
> win or a wash; for very slow network connections, where we're  
> already at capacity, the extra CPU burning is basically free.  For  
> super fast networks, with very low RTT, it also appears to be a  
> wash.  The networks in the middle (including mobile simulations) see  
> nice gains.
>
> Next Steps and Questions:
> I'd like to land my changes so that we can continue to gather data.   
> I can enable these via macro definitions or I can enable these via  
> dynamic settings.  I can then try to do more A/B testing.

I'd like answers to my questions above before we consider that.

>
> Are there any existing web pages which the WebKit team would like  
> tested under these configurations?  I don't see a lot of testing  
> that I can leverage from the initial great work Antti did for  
> verifying that I'm not breaking anything.
>
> Is there any other information or data from the original  
> PreloadScanner work which I should read?

There's the original blog announcement of preload scanning:

http://webkit.org/blog/166/optimizing-page-loading-in-web-browser/

It might be a good idea to try replicating those results with the  
proposed changes.

Regards,
Maciej
