[webkit-dev] Parallel CSS styling

Eric Seidel eric at webkit.org
Wed Jun 8 15:48:24 PDT 2011


If you are interested in optimization suggestions inside WebKit, I'm
happy to have a lengthy discussion with you over #webkit.

We've talked about moving the HTML5 parser off into its own thread,
which may be a win on some pages.
WebCore's memory usage is way too high (which also affects execution time).

James Robinson, Simon Jameson and others at Google have looked at perf
in the recent past.  Maciej Stachowiak, Geoff Garen, Oliver Hunt and
Gavin Barraclough are also perf experts @ Apple.

We're all reachable in #webkit and happy to talk with you about perf
in WebKit as we're very interested in making things faster!

Again, best of luck with your efforts.  At this time WebKit does not
believe that parallel CSS styling would yield any noticeable
performance increase on standard page loads.
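
To make the reasoning concrete, here is a rough Amdahl's-law sketch using only the figures quoted further down this thread. The helper name amdahl_speedup is made up for illustration, and it assumes the parallel portion scales perfectly across cores, which real code rarely does. It also assumes the 0.4% Shark figure is a fair share of total load time, even though that total includes network time.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on overall speedup when only `parallel_fraction` of the
    runtime is spread perfectly across `cores` (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Proposal's figure: 75% of CSS time is matching, so this bounds the
# speedup of the CSS work alone.
css_only = {n: amdahl_speedup(0.75, n) for n in (2, 4)}

# Shark figure: styleForElement is 0.4% of total load time, so this bounds
# the speedup of the whole page load.
whole_page = {n: amdahl_speedup(0.004, n) for n in (2, 4, 8)}

for n, s in css_only.items():
    print(f"CSS only,   {n} cores: {s:.2f}x")
for n, s in whole_page.items():
    print(f"whole page, {n} cores: {s:.4f}x")
```

The first set reproduces the 1.6X and 2.3X numbers from the proposal.  The second stays below 1.005x no matter how many cores are used, which is why I don't expect a noticeable change in page load time.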

-eric

On Wed, Jun 8, 2011 at 3:31 PM, Eric Seidel <eric at webkit.org> wrote:
> I used Safari's built-in Page Load Test mechanism to test the page.
>
> I created a flickr.pltsuite and placed it in
> /Applications/Safari.app/Contents/Resources/flickr.pltsuite with the
> contents:
> http://www.flickr.com/photos/tags/sanfrancisco
> http://www.flickr.com/photos/tags/sanfrancisco
> http://www.flickr.com/photos/tags/sanfrancisco
> http://www.flickr.com/photos/tags/sanfrancisco
> http://www.flickr.com/photos/tags/sanfrancisco
>
> Using http urls (instead of local file urls) in PLT suites is not
> recommended, as network load causes the results to be wildly
> inconsistent, but does provide an easy way to get Safari to reload the
> same page multiple times (and allows for easy sharking of Safari).
>
> An average run of this PLT took between 7 and 9 seconds on my machine
> (loading this flickr page 5 times in a row):
> FINISHED:    Total Time = 9178.8 ms
>              Mean Time = 1835.8 ms
>  Square-Mean-Root Time = 1816.4 ms
>              Heap Size = 11.61 MB
>
> I Sharked Safari while running my flickr PLT, and found that
> styleForElement accounted for 0.4% of total time spent
> loading/reloading flickr.com/photos/tags/sanfrancisco.  I've attached
> my shark sample (but doubt it will succeed in going through to the
> list).  Feel free to email me if you want it.
>
>
> For giggles I also loaded up the Web Inspector in Chrome and used the
> Timeline view to record events.
>
> Looking at only the "rendering" events, we see a total of 7 "resolve
> style" events, totaling 10ms in time.  Now, it's very possible that our
> internal counters are missing some time spent in rendering, but I
> don't think they're missing anywhere near enough to mean that this
> actually affects page load time.
>
> Although I suspect your optimizations are very interesting, I
> believe you're optimizing the wrong things here.
>
> Best of luck.
>
> -eric
>
> On Tue, Jun 7, 2011 at 11:54 AM, Eric Seidel <eric at webkit.org> wrote:
>> You noted it spends 66% of its "CSS time" in StyleForElement.  What
>> about total page load time?
>>
>> Then again 6450ms spent in CSS sounds like a lot of time regardless.
>> Answering what % of total page load time we're spending in CSS (or
>> StyleForElement) is important.
>>
>> Loading www.flickr.com/photos/tags/sanfrancisco on my 4-core 2.66GHz
>> Mac Pro doesn't take anywhere near 6 (or for that matter 9!) seconds,
>> so I'm confused how you got 9s in CSS code on a (supposedly faster)
>> machine loading that flickr page.
>>
>> I'm building WebCore so I can shark the page now.
>>
>> Thanks again for your investigation efforts.
>>
>> -eric
>>
>> On Tue, Jun 7, 2011 at 11:22 AM, Kulanthaivel Palanichamy
>> <kulanthaivel at codeaurora.org> wrote:
>>> Eric,
>>>
>>> You're right that on the flickr.com main page, WebKit spends very little
>>> time in StyleForElement. However, if you visit
>>> http://www.flickr.com/photos/tags/sanfrancisco/ , WebKit spends most of
>>> its CSS time in StyleForElement. For example, on our test machine (an
>>> 8-core Intel Xeon, 2.8GHz) StyleForElement takes 6450 ms out of 9748 ms
>>> spent on CSS (66%). Our algorithm focuses on that 66%, and makes it scale
>>> linearly. The version of WebKit that we tested includes this patch: Bug
>>> 49876 - Optimize matching of descendant selectors
>>>
>>> Other websites that would benefit:
>>> •       Amazon (68% in SFE)
>>> •       Google search (57%)
>>> •       Yahoo sports (56%)
>>> •       Apple (58%)
>>> •       Wikipedia article (65%)
>>>
>>> -Kulanthaivel
>>>
>>>> Do you have statistics on how much total time rendering flickr.com is
>>>> in CSS/Style code at all?  I believe it to be very low.
>>>>
>>>> -eric
>>>>
>>>> On Mon, Jun 6, 2011 at 1:16 PM, Kulanthaivel Palanichamy
>>>> <kulanthaivel at codeaurora.org> wrote:
>>>>> Hi All,
>>>>>
>>>>> At Qualcomm Innovation Center we have been working on a parallel
>>>>> algorithm for CSS styling and wanted to see if there is any interest
>>>>> in the community in seeing it implemented in WebKit. The overall idea
>>>>> is that we replace CSS matching and styling with a parallel
>>>>> implementation, assuming there is a barrier before and after the
>>>>> computation. CSS style application will be performed by the main
>>>>> thread, so that we avoid the need to make the data structures
>>>>> accessed in other passes thread-safe. The algorithm is task-based, so
>>>>> we would need to implement a thread pool and a simple task scheduler
>>>>> (or maybe use an existing one).
>>>>>
>>>>> In particular, our algorithm requires modifying
>>>>> Element::recalcStyle() and some of the methods it invokes. Code that
>>>>> calls Element::recalcStyle() will not have to be changed. By the time
>>>>> Element::recalcStyle() returns, all threads involved in the parallel
>>>>> CSS styling have completed their execution.  Effectively, there is a
>>>>> barrier when Element::recalcStyle() begins and another before it
>>>>> returns.
>>>>>
>>>>> Our experiments show that our CSS computation for complex websites
>>>>> scales rather well. For example, we observed that, for flickr.com,
>>>>> WebKit spends 75% of its time in CSS doing CSS matching. Thus, our
>>>>> algorithm would give a maximum speedup of 1.6X on 2 cores and 2.3X on
>>>>> 4 cores.
>>>>>
>>>>> Please let us know whether this would be of interest to the community.
>>>>>
>>>>>
>>>>> -Kulanthaivel

