[webkit-dev] Iterating SunSpider

Mike Belshe mike at belshe.com
Tue Jul 7 15:01:42 PDT 2009

On Mon, Jul 6, 2009 at 10:11 AM, Geoffrey Garen <ggaren at apple.com> wrote:

>> So, what you end up with is after a couple of years, the slowest test in
>> the suite is the most significant part of the score.  Further, I'll predict
>> that the slowest test will most likely be the least relevant test, because
>> the truly important parts of JS engines were already optimized.  This has
>> happened with Sunspider 0.9 - the regex portions of the test became the
>> dominant factor, even though they were not nearly as prominent in the real
>> world as they were in the benchmark.  This leads to implementors optimizing
>> for the benchmark - and that is not what we want to encourage.
> How did you determine that regex performance is "not nearly as prominent in
> the real world?"

For a while regex was 20-30% of the benchmark score on most browsers, even
though it didn't consume 20-30% of the time that browsers spent inside
JavaScript.

So, I determined this through profiling.  If you profile your browser while
browsing websites, you won't find that it spends 20-30% of its JavaScript
execution time running regex (even with the old PCRE).  It's more like 1%.
If this is true, then it's a shame to see regex consume 20-30% of any
benchmark, because it means the benchmark score is not indicative of the
real world.  Maybe I just disagree that the mix was ever very
representative?  Or maybe it changed over time?  I don't know because I
can't go back in time :-)  Perhaps one solution is to better document how a
mix is chosen.

I don't really want to turn this into a he-says/she-says debate about how
expensive regex is.  We should talk about the framework.  If the framework
is subject to this type of skew, where it can disproportionately weight a
single test, is that something we should avoid?
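
To make the skew concrete, here is a minimal sketch in Python.  It assumes
the score is a plain sum of per-test times; the test names and timings are
made up for illustration and are not measurements from any browser.  It
shows how one test's share of the total grows when everything else gets
optimized and that test does not.

```python
def share_of_total(times_ms):
    """Return each test's fraction of a summed-time score."""
    total = sum(times_ms.values())
    return {name: t / total for name, t in times_ms.items()}


# Hypothetical per-test timings in milliseconds (illustrative only).
before = {"regex": 100.0, "string": 300.0, "math": 300.0, "access": 300.0}

# Suppose engines make every test except regex 5x faster.
after = {
    name: (t if name == "regex" else t / 5.0)
    for name, t in before.items()
}

before_share = share_of_total(before)["regex"]  # 100 / 1000 = 0.10
after_share = share_of_total(after)["regex"]    # 100 / 280, about 0.36

print(f"regex share before: {before_share:.0%}")
print(f"regex share after:  {after_share:.0%}")
```

With these made-up numbers regex goes from 10% of the score to roughly 36%
without getting any slower; the weighting moved because the other tests
improved.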

Keep in mind I'm not recommending any change to existing SunSpider 0.9 -
just changes to future versions.

Maciej pointed out a case where he thought the geometric mean was worse; I
think that's a fair consideration if you have the perfect benchmark with an
exactly representative workload.  But we don't have the ability to make a
perfectly representative benchmark workload, and even if we did it would
change over time - eventually making the benchmark useless...
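
As a rough illustration of the trade-off, the Python sketch below compares
a summed-time score with a geometric mean.  The timings are hypothetical
and the two scoring functions are my own simplification, not code from any
benchmark harness.  Halving the fastest test and halving the slowest test
move the geometric mean by the same factor, while the sum barely notices
the fast test.

```python
import math


def summed_score(times_ms):
    """Score as total time: slow tests carry more weight."""
    return sum(times_ms)


def geometric_score(times_ms):
    """Score as geometric mean: every test carries equal relative weight."""
    return math.exp(sum(math.log(t) for t in times_ms) / len(times_ms))


# Hypothetical timings in milliseconds (illustrative only).
base = [10.0, 20.0, 400.0]
fast_halved = [5.0, 20.0, 400.0]   # 2x speedup on the fastest test
slow_halved = [10.0, 20.0, 200.0]  # 2x speedup on the slowest test

# Improvement factors (old score / new score; larger is better).
sum_gain_fast = summed_score(base) / summed_score(fast_halved)   # 430/425
sum_gain_slow = summed_score(base) / summed_score(slow_halved)   # 430/230
geo_gain_fast = geometric_score(base) / geometric_score(fast_halved)
geo_gain_slow = geometric_score(base) / geometric_score(slow_halved)

print(f"sum:       fast {sum_gain_fast:.3f}x, slow {sum_gain_slow:.3f}x")
print(f"geometric: fast {geo_gain_fast:.3f}x, slow {geo_gain_slow:.3f}x")
```

Under the sum, the same 2x speedup is worth about 1% or about 87%
depending on which test it lands on.  Under the geometric mean it is worth
2^(1/3), about 26%, either way.  Whether that equal weighting is right
still depends on how representative the mix of tests is.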
