[webkit-dev] Iterating SunSpider
Mike Belshe
mike at belshe.com
Tue Jul 7 15:01:42 PDT 2009
On Mon, Jul 6, 2009 at 10:11 AM, Geoffrey Garen <ggaren at apple.com> wrote:
>> So, what you end up with is after a couple of years, the slowest test in
>> the suite is the most significant part of the score. Further, I'll predict
>> that the slowest test will most likely be the least relevant test, because
>> the truly important parts of JS engines were already optimized. This has
>> happened with Sunspider 0.9 - the regex portions of the test became the
>> dominant factor, even though they were not nearly as prominent in the real
>> world as they were in the benchmark. This leads to implementors optimizing
>> for the benchmark - and that is not what we want to encourage.
>>
>
> How did you determine that regex performance is "not nearly as prominent in
> the real world?"
>
For a while regex was 20-30% of the benchmark on most browsers even though
it didn't consume 20-30% of the time that browsers spent inside JavaScript.
So, I determined this through profiling. If you profile your browser while
browsing websites, you won't find that it spends 20-30% of its JavaScript
execution time running regex (even with the old PCRE). It's more like 1%.
If this is true, then it's a shame to see this consume 20-30% of any
benchmark, because it means the benchmark scoring is not indicative of the
real world. Maybe I just disagree with the mix ever having been very
representative? Or maybe it changed over time? I don't know because I
can't go back in time :-) Perhaps one solution is to better document how a
mix is chosen.
I don't really want to make this a he-said/she-said debate about how expensive
regex is. We should talk about the framework. If the framework
is subject to this type of skew, where it can disproportionately weight a
test, is that something we should avoid?
Keep in mind I'm not recommending any change to existing SunSpider 0.9 -
just changes to future versions.
Maciej pointed out a case where he thought the geometric mean was worse; I
think that's a fair consideration if you have the perfect benchmark with an
exactly representative workload. But we don't have the ability to make a
perfectly representative benchmark workload, and even if we did, it would
change over time - eventually making the benchmark useless...
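
To make the skew concrete, here is a rough sketch comparing a summed-time
score with a geometric mean. The test names and timings are made up for
illustration (they are not SunSpider measurements), and the helper functions
are hypothetical, not part of any benchmark harness.

```python
import math


def total_time(times_ms):
    """Summed-time score: lower is better, slow tests weigh the most."""
    return sum(times_ms.values())


def geometric_mean(times_ms):
    """Geometric mean of the per-test times, computed via logarithms."""
    logs = [math.log(t) for t in times_ms.values()]
    return math.exp(sum(logs) / len(logs))


def share_of_total(times_ms, name):
    """Fraction of the summed-time score contributed by one test."""
    return times_ms[name] / total_time(times_ms)


def halve(times_ms, name):
    """Return a copy of the timings with one test made twice as fast."""
    faster = dict(times_ms)
    faster[name] = faster[name] / 2.0
    return faster


# Hypothetical timings in milliseconds (assumed, not measured).
# "before": all four tests cost the same.
# "after": everything except regexp has been optimized 10x.
before = {"regexp": 100.0, "string": 100.0, "math": 100.0, "access": 100.0}
after = {"regexp": 100.0, "string": 10.0, "math": 10.0, "access": 10.0}

if __name__ == "__main__":
    print("regexp share before:", share_of_total(before, "regexp"))
    print("regexp share after: ", share_of_total(after, "regexp"))
    for name in ("regexp", "string"):
        faster = halve(after, name)
        print(name, "2x faster:",
              "sum ratio", total_time(faster) / total_time(after),
              "geomean ratio", geometric_mean(faster) / geometric_mean(after))
```

With these assumed numbers, regexp goes from 25% of the summed score to
roughly 77% without getting any slower itself. Making regexp 2x faster then
cuts the sum from 130 to 80, while making string 2x faster only cuts it from
130 to 125. Under the geometric mean, either 2x speedup lowers the score by
the same factor (the fourth root of 2), so no single test can come to
dominate just because the others improved.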
Mike