[webkit-dev] PreloadScanner aggressiveness

Thu Jan 7 12:09:12 PST 2010

Hi -

I've been working on SPDY, but I think I may have found a good performance
win for HTTP.  Specifically, the PreloadScanner, which is responsible for
scanning ahead within an HTML document to find subresources, is throttled
today.  The throttling is intentional and probably sometimes necessary.
Nonetheless, un-throttling it may lead to a 5-10% performance boost in some
configurations.  I believe Antti is no longer working on this?  Is there
anyone else working in this area who might have data on how aggressive the
PreloadScanner should be?  Below I'll describe some of my tests.

The PreloadScanner throttling happens in a couple of ways.  First, the
PreloadScanner only runs when we're blocked on JavaScript (see
HTMLTokenizer.cpp).  Second, as it discovers resources to be fetched, it
may delay loading a subresource, or refuse to load it at all, due to
throttling in loader.cpp and DocLoader.cpp.  Depending on the
implementation of the HTTP networking stack, this throttling can be very
important, because throwing too many resources (or low-priority ones) into
the network stack could adversely affect HTTP load performance.  This
latter problem does not impact
my Chromium tests, because the Chromium network stack does its own
prioritization and throttling (not too dissimilar from the work done by
loader.cpp).
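
A minimal sketch of the kind of priority-plus-concurrency throttling
described above.  It is a toy model, not the logic in loader.cpp,
DocLoader.cpp, or the Chromium network stack; the class name
`ThrottledLoader`, the `max_in_flight` limit, and the priority numbers are
all hypothetical.

```python
import heapq
import itertools


class ThrottledLoader:
    """Toy scheduler: at most `max_in_flight` requests are active at once.

    Everything else waits in a priority queue (lower number = more urgent).
    All names here are hypothetical and only illustrate the idea.
    """

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = set()
        self._pending = []              # heap of (priority, seq, url)
        self._seq = itertools.count()   # tie-breaker keeps FIFO order

    def request(self, url, priority):
        heapq.heappush(self._pending, (priority, next(self._seq), url))
        self._pump()

    def finished(self, url):
        self.in_flight.discard(url)
        self._pump()

    def _pump(self):
        # Start queued requests while there is spare capacity.
        while self._pending and len(self.in_flight) < self.max_in_flight:
            _, _, url = heapq.heappop(self._pending)
            self.in_flight.add(url)

    def pending_urls(self):
        return [url for _, _, url in sorted(self._pending)]


loader = ThrottledLoader(max_in_flight=2)
loader.request("a.css", priority=0)
loader.request("b.js", priority=1)
loader.request("c.png", priority=2)   # queued: both slots are taken
loader.request("d.js", priority=1)    # queued, but ahead of c.png

first_wave = set(loader.in_flight)
queued = loader.pending_urls()

loader.finished("a.css")              # frees a slot; d.js starts next
```

Setting `max_in_flight` very high approximates the "unthrottled" case, where
every discovered resource is handed to the network layer immediately.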

*Theory*:
The theory I'm working under is that when the RTT of the network is
sufficiently high, the *best* thing the browser can do is to discover
resources as quickly as possible and pass them to the network layer so that
we can get started with fetching.  This is not speculative - these are
resources which will be required to render the full page.   The SPDY
protocol is designed around this concept - allowing the browser to schedule
all resources it needs to the network (rather than being throttled by
connection limits).  However, even with SPDY enabled, WebKit itself prevents
resource requests from fully flowing to the network layer in 3 ways:
   a) loader.cpp orders requests and defers requests based on the state of
the page load and a number of criteria.
   b) HTMLTokenizer.cpp only looks for resources further in the body when
we're blocked on JS
   c) "preload" requests are treated specially (DocLoader.cpp); if they are
discovered too early by the tokenizer, then they are either queued or
discarded.
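
The intuition behind the theory can be sketched with a deliberately crude
model.  It assumes unlimited parallelism within a discovery stage, no
connection setup cost, and no TCP slow start; the function name
`load_time_ms` and the resource sizes are made up for illustration and are
not measurements.

```python
def load_time_ms(stages, rtt_ms, bandwidth_bps):
    """Estimate load time when resources are discovered in sequential stages.

    `stages` is a list of lists of resource sizes in bytes.  A stage is only
    requested after the previous stage has fully arrived, so every stage
    pays one round trip plus its transfer time.
    """
    total = 0.0
    for sizes in stages:
        transfer_ms = sum(sizes) * 8 * 1000 / bandwidth_bps
        total += rtt_ms + transfer_ms
    return total


RTT_MS = 100
BANDWIDTH_BPS = 4_000_000   # 4Mbps download
SIZE = 50_000               # hypothetical 50 KB per resource

# Late discovery: each resource is only found after the previous one loads.
serial = load_time_ms([[SIZE], [SIZE], [SIZE], [SIZE]], RTT_MS, BANDWIDTH_BPS)

# Early discovery: all four resources are known and requested up front.
upfront = load_time_ms([[SIZE, SIZE, SIZE, SIZE]], RTT_MS, BANDWIDTH_BPS)
```

In this model the bytes transferred are identical in both cases; the only
difference is how many round trips are spent waiting before a request can
even be issued, which is why the gap grows with RTT.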

*Test Case*
Can aggressive preload scanning (e.g. always preload scan before parsing an
HTML document) improve page load time?

To test this, I'm calling the PreloadScanner basically as the first part of
HTMLTokenizer::write().  I've then removed all throttling from loader.cpp
and DocLoader.cpp.  I've also instrumented the PreloadScanner to measure its
effectiveness.
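
To make "scanning ahead for subresources" concrete, here is a small sketch
built on Python's standard `html.parser` module.  It is not the WebKit
PreloadScanner (which is C++ and works on the tokenizer's input); the class
name `PreloadScanSketch` and the three buckets are assumptions chosen to
mirror the script/CSS/image counters reported in the results.

```python
from html.parser import HTMLParser


class PreloadScanSketch(HTMLParser):
    """Collect URLs of external scripts, stylesheets, and images."""

    def __init__(self):
        super().__init__()
        self.found = {"script": [], "css": [], "image": []}

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "script" and attr.get("src"):
            self.found["script"].append(attr["src"])
        elif tag == "link" and attr.get("href"):
            rel = (attr.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.found["css"].append(attr["href"])
        elif tag == "img" and attr.get("src"):
            self.found["image"].append(attr["src"])


def scan(html):
    """Scan a whole document up front and return the discovered URLs."""
    scanner = PreloadScanSketch()
    scanner.feed(html)
    scanner.close()
    return scanner.found


PAGE = """
<html><head>
<link rel="stylesheet" href="main.css">
<script src="app.js"></script>
</head><body>
<img src="logo.png">
<script>var x = 1;</script>
</body></html>
"""

found = scan(PAGE)
```

The inline script has no `src` attribute, so it is not reported.  A scanner
like this cannot see resources that JavaScript adds or removes later, which
is one plausible source of unreferenced preloads.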

*Benchmark Setup*
Windows client (Chromium).
Simulated network with 4Mbps download, 1Mbps upload, 100ms RTT, 0% packet
loss.
I run through a set of 25 URLs, loading each 30 times, without recycling any
connections and clearing the cache between page loads.
These are running over HTTP; there is no SPDY involved here.

*Results:*
                                   Baseline        Unthrottled
                                   (without my
                                   changes)
Average PLT                        2377ms          2239ms
Time spent in the PreloadScanner   1160ms          4540ms
Preload Scripts discovered         2621            9440
Preload CSS discovered             348             1022
Preload Images discovered          11952           39144
Preload items throttled            9983            0
Preload Complete hits              3803            6950
Preload Partial hits               1708            7230
Preload Unreferenced               42              130

Notes:

Average PLT: +5.8% latency redux.

Time spent in the PreloadScanner: As expected, we spend about 4x more time
in the PreloadScanner. In this test, we loaded 750 pages, so it is about 6ms
per page. My machine is fast, though.

Preload Scripts discovered: 4x more scripts discovered.

Preload CSS discovered: 3x more CSS discovered.

Preload Images discovered: 3x more images discovered.

Preload Complete hits: This is the count of items which were completely
preloaded before WebKit even tried to look them up in the cache. This is
pure goodness.

Preload Partial hits: These are partial hits, where the item had already
started loading, but not finished, before WebKit tried to look them up.

Preload Unreferenced: These are bad and the count should be zero. I'll try
to find them and see if there isn't a fix - the PreloadScanner is just
sometimes finding resources that are never used. It is likely due to clever
JS which changes the DOM.
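
The headline figures in the notes can be re-derived from the raw numbers.
The short check below only restates arithmetic on values reported above;
the variable names are illustrative.

```python
# Average page load time (PLT), in milliseconds.
baseline_plt_ms = 2377
unthrottled_plt_ms = 2239
plt_gain_pct = (baseline_plt_ms - unthrottled_plt_ms) / baseline_plt_ms * 100

# 25 URLs loaded 30 times each.
pages_loaded = 25 * 30

# Total time spent in the PreloadScanner across the run, in milliseconds.
scanner_baseline_ms = 1160
scanner_unthrottled_ms = 4540
scanner_ratio = scanner_unthrottled_ms / scanner_baseline_ms
scanner_ms_per_page = scanner_unthrottled_ms / pages_loaded

# Discovery ratios (unthrottled / baseline).
scripts_ratio = 9440 / 2621
css_ratio = 1022 / 348
images_ratio = 39144 / 11952

print(f"PLT improvement: {plt_gain_pct:.1f}%")
print(f"Scanner time ratio: {scanner_ratio:.1f}x")
print(f"Scanner time per page: {scanner_ms_per_page:.1f}ms")
print(f"Discovery ratios: scripts {scripts_ratio:.1f}x, "
      f"CSS {css_ratio:.1f}x, images {images_ratio:.1f}x")
```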

*Conclusions:*
For this network speed and client processor, more aggressive preload
scanning is clearly a win.  More testing is needed for slower machines and
other network types.  I've tested many network types; the aggressive preload
scanning seems to always be either a win or a wash; for very slow network
connections, where we're already at capacity, the extra CPU burning is
basically free.  For super fast networks, with very low RTT, it also appears
to be a wash.  The networks in the middle (including mobile simulations) see
nice gains.

*Next Steps and Questions:*
I'd like to land my changes so that we can continue to gather data.  I can
enable them either via macro definitions or via dynamic settings.  I can
then try to do more A/B testing.

Are there any existing web pages which the WebKit team would like tested
under these configurations?  I don't see many tests from Antti's great
initial work that I can leverage to verify that I'm not breaking
anything.

Is there any other information or data from the original PreloadScanner work
which I should read?

Thanks!
Mike