[webkit-dev] Regarding the preload scanner in WebKit

Fri Dec 31 02:28:43 PST 2010

On Fri, Dec 31, 2010 at 2:16 AM, Aneesh Bhasin <contact.aneesh at gmail.com> wrote:
> Hi Adam,
>
> Thanks for the quick reply. A few more questions for my own understanding:
>
> On Thu, Dec 30, 2010 at 10:08 AM, Adam Barth <abarth at webkit.org> wrote:
>> On Wed, Dec 29, 2010 at 8:19 PM, Aneesh Bhasin <contact.aneesh at gmail.com> wrote:
>>> I have been reading about the preload scanner, which WebKit has
>>> supported for quite some time now
>>> (http://webkit.org/blog/166/optimizing-page-loading-in-web-browser/). As
>>> per my understanding (please correct me if I am wrong), the preload scanner
>>> kicks in when the main parser has halted waiting for some JavaScript to
>>> load. To better utilize this time, a 'side' parser is started which parses
>>> the HTML to see if there are more resources (esp. scripts and CSS) which can
>>> be loaded in parallel.
>>
>> That's correct.
>
> Is this 'side parser' started in a new thread (using pthreads, etc.)? I
> did not see any threading code in WebCore/html/PreloadScanner.*

Nope.  It's run on the main thread while we're waiting for the
network.  We could probably move the preload scanner to another
thread, but we haven't tried that yet.
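
As a rough illustration of the idea (not WebKit's actual code), a
look-ahead scan over the not-yet-parsed HTML could be sketched as
below. The names PreloadScanner and scan_for_preloads are made up for
this example, and it only looks at external scripts and stylesheets.
Everything runs in the calling thread, and nothing is fetched or
executed; the scan just collects URLs a loader could request early.

```python
from html.parser import HTMLParser


class PreloadScanner(HTMLParser):
    """Hypothetical sketch: collect URLs of external scripts and stylesheets."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            # External script: a candidate for an early network request.
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only <link> elements whose rel includes "stylesheet" count.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(attrs["href"])


def scan_for_preloads(remaining_html):
    """Scan the unparsed remainder of a page, in the caller's thread."""
    scanner = PreloadScanner()
    scanner.feed(remaining_html)
    return scanner.urls
```

Inline scripts and other elements such as images are ignored in this
sketch; only the URLs are returned, in document order.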

By the way, the preload scanner is now located at
<http://trac.webkit.org/browser/trunk/WebCore/html/parser/HTMLPreloadScanner.cpp>.
You might be looking at the old version from before we rewrote the HTML
parser.

>>> I have a few questions regarding this. What exactly is meant by 'loading' in
>>> the context of the above? Does it mean downloading the script/CSS from the
>>> remote host to the local client
>>
>> Yes.  The preload scanner kicks off the network requests.  We rely on
>> the network machinery to cache the results so they'll be loaded faster
>> when the "real" parser finds them.
>>
>>> or does it also include parsing/executing
>>> (in case of script) and parsing/rule-list creation (in case of CSS)?
>>
>> Not yet, but that's certainly an area where we'd consider improving
>> the preload scanner.
>
> OK, so IIUC, all the actual parsing/execution still happens
> sequentially (the side parser just looks ahead and queues up any
> script/CSS referenced in the HTML page). Hence, if I refer to two
> JavaScript or CSS files by URL in my HTML, the second file would not
> be parsed before the first one even if the second got downloaded
> before the first one, right?

Correct.
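
A minimal sketch of that ordering rule, assuming a simplified model in
which every resource is executed as soon as it and everything before
it in the document have arrived. The function name simulate and the
file names are hypothetical.

```python
def simulate(document_order, arrival_order):
    """Return a log of (arrived_url, urls_executed_after_that_arrival).

    Resources run strictly in document order; one that arrives early
    simply waits until everything ahead of it has arrived and run.
    """
    arrived = set()
    next_index = 0  # index of the next resource allowed to execute
    log = []
    for url in arrival_order:
        arrived.add(url)
        executed_now = []
        while (next_index < len(document_order)
               and document_order[next_index] in arrived):
            executed_now.append(document_order[next_index])
            next_index += 1
        log.append((url, executed_now))
    return log
```

With document order first.js, second.js and arrival order second.js,
first.js, the log shows nothing executing when second.js arrives, and
both running in document order once first.js arrives.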

> Lastly, the sequential nature of the 'real'
> parser also means that there will be no FOUC issues, since the HTML
> code after a CSS link is evaluated only after the CSS file has been
> downloaded and parsed, right?

There's another mechanism that prevents the FOUC.  Basically, we wait
for stylesheets to be loaded before we try rendering the page.  That's
mostly independent of parsing.
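
One way to picture such a gate is a counter of outstanding
stylesheets that rendering has to wait on. This is only a sketch
under that assumption; the class and method names are hypothetical
and are not taken from WebKit.

```python
class PageSketch:
    """Hypothetical model: rendering waits until no stylesheet is pending."""

    def __init__(self):
        self.pending_stylesheets = 0
        self.render_requested = False
        self.rendered = False

    def stylesheet_requested(self):
        # A stylesheet load has started.
        self.pending_stylesheets += 1

    def stylesheet_loaded(self):
        # A stylesheet finished loading; rendering may now be possible.
        self.pending_stylesheets -= 1
        self._maybe_render()

    def request_render(self):
        # Something wants the page drawn; defer if stylesheets are pending.
        self.render_requested = True
        self._maybe_render()

    def _maybe_render(self):
        if self.render_requested and self.pending_stylesheets == 0:
            self.rendered = True
```

Note that the gate sits on rendering, not on parsing: in this model
the parser could keep going while a stylesheet is still loading.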

> Thanks again for the help.

Good luck!

Adam

>>> Also,
>>> how many such parallel loads can happen at the same time? Is it
>>> configurable in source or through some other API?
>>
>> The only limit is the network stack.  In most cases, the browser will
>> limit the number of concurrent requests outstanding to a host, but
>> some browsers might find ways to remove that limitation.  For example,
>> if the browser is using a network stack that supports SPDY, the
>> network stack might be able to multiplex many HTTP requests over a
>> single socket.
>>
>>> Thanks for answering the above!
>>
>> My pleasure.
>>
>> Adam
>>
>
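
The per-host limit described in the quoted reply above can be
sketched as a simple admission check. This assumes a fixed cap on
concurrent requests per host, with the cap passed in as a parameter;
the function name schedule and the example URLs are hypothetical, and
real network stacks differ in the limits and policies they apply.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def schedule(urls, max_per_host):
    """Split urls into those started now and those queued for later.

    Assumes a fixed cap on concurrent requests per host and that no
    request completes while this function runs.
    """
    in_flight = defaultdict(int)  # host -> number of requests started
    started, queued = [], []
    for url in urls:
        host = urlsplit(url).hostname
        if in_flight[host] < max_per_host:
            in_flight[host] += 1
            started.append(url)
        else:
            queued.append(url)
    return started, queued
```

With a cap of 2, three requests to one host and one request to
another host yield three started requests and one queued request. A
stack that multiplexes requests over a single connection, as SPDY
does, would not need a cap of this kind.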