[webkit-dev] Iterating SunSpider

Tue Jul 7 18:43:44 PDT 2009

On Tue, Jul 7, 2009 at 4:45 PM, Maciej Stachowiak <mjs at apple.com> wrote:

>
> On Jul 7, 2009, at 4:28 PM, Mike Belshe wrote:
>
>
>> When SunSpider was first created, regexps were a small proportion of the
>> total execution time in what were the fastest publicly available engines at the time.
>> Eventually, everything else got much faster. So at some point, SunSpider
>> said "it might be a good idea to quadruple the speed of regexp matching
>> now". But if it used a geometric mean, it would always say it's a good idea
>> to quadruple the speed of regexp matching, unless it omitted regexp tests
>> entirely. From any starting point, and regardless of speed of other
>> facilities, speeding up regexps by a factor of N would always show the same
>> improvement in your overall score. SunSpider, on the other hand, was
>> deliberately designed to highlight the area where an engine most needs
>> improvement.
>>
>> I don't think the optimization of regex would have been affected by using
>> a different scoring mechanism.  In both scoring methods, the score of the
>> slowest test is the best pick for improving your overall score.
>>
>
> I don't see how that's the case with geometric means. With a geometric
> mean, the score of the test you can most easily optimize is the best pick,
> assuming your goal is to most improve the overall score. Improving the
> fastest test by a factor of 2 improves your score exactly as much as
> improving the slowest test by a factor of 2. Thus, there is no bias towards
> improving the slowest test, unless there is reason to believe that test
> would be the easiest to optimize. We chose summation specifically to avoid
> this phenomenon - we wanted the benchmark to make us think about what most
> needs improvement, not just what is easiest to optimize.
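To make the arithmetic in this exchange concrete, here is a minimal sketch comparing the two scoring rules. It assumes each test's score is simply its run time (lower is better). The per-test times are made up and the function names (`sum_score`, `geomean_score`, `speed_up`) are hypothetical illustrations, not SunSpider's actual harness.

```python
from math import prod


def sum_score(times):
    """Summation-style total of per-test times (lower is better)."""
    return sum(times)


def geomean_score(times):
    """Geometric mean of per-test times (lower is better)."""
    return prod(times) ** (1.0 / len(times))


def speed_up(times, index, factor):
    """Return a copy of `times` with test `index` made `factor` times faster."""
    out = list(times)
    out[index] = out[index] / factor
    return out


# Hypothetical per-test times in ms (fastest, middle, slowest); not real data.
times = [10.0, 40.0, 200.0]
FASTEST, SLOWEST = 0, 2

for name, score in (("sum", sum_score), ("geomean", geomean_score)):
    base = score(times)
    for label, idx in (("fastest", FASTEST), ("slowest", SLOWEST)):
        improved = score(speed_up(times, idx, 2.0))
        # Ratio > 1 means the overall score got better by that factor.
        print(f"{name}: 2x on {label} test -> {base / improved:.3f}x better overall")
```

With these made-up numbers, halving the slowest test improves the summed score by about 1.67x versus about 1.02x for halving the fastest, while the geometric mean improves by 2^(1/3), about 1.26x, in both cases. That is the point being argued: under a geometric mean, a 2x win counts the same no matter which test it comes from.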

Usually with performance work, it takes exponentially increasing effort to
squeeze out each additional increment of speed. I've rarely seen a case where a
single test can continually be improved with less effort than going after the
slower test.

I don't think a benchmark is usually the right way to decide "what most
needs improvement".

> (There are other benchmarks that use summation, for example iBench, though
> I am not sure these are examples of excellent benchmarks. Any benchmark that
> consists of a single test also implicitly uses summation. I'm not sure what
> other benchmarks do is as relevant as the technical merits.)

Hehe - I don't think anyone has iBench except Apple :-)

A lot of research has been put into benchmarking over the years; there is
good reason for these choices, and they aren't arbitrary. I have not seen
research indicating that summing scores is statistically useful, but
plenty of benchmarks have chosen geometric means.

Mike

>
>
> Regards,
> Maciej
>
>