[webkit-dev] Iterating SunSpider

Mike Belshe mike at belshe.com
Sat Jul 4 11:47:28 PDT 2009


I'd like to understand what's going to happen with SunSpider in the future.
 Here is a set of questions and criticisms.  I'm interested in how these can
be addressed.

There are 3 areas I'd like to see improved in
SunSpider, some of which we've discussed before:

#1: SunSpider is currently version 0.9.  Will SunSpider ever change?
Or is it static?
I believe that benchmarks need to be able to move with the times.  As JS
engines change and improve, and as new areas need to be benchmarked, we need
to be able to roll the version, fix bugs, and benchmark new features.  The
SunSpider version has not changed for ~2 years.
 How can we change this situation?  Are there plans for a new version
already underway?

#2: Use of summing as a scoring mechanism is problematic
Unfortunately, the sum-based scoring techniques do not withstand the test of
time as browsers improve.  When the benchmark was first introduced, each
test was equally weighted and reasonably large.  Over time, however, the
suite has become dominated by its slowest tests: the effective weighting of
the individual tests varies with the performance of the JS engine under test.
Today's engines spend ~50% of their time on just the string and date tests.
The other tests are largely irrelevant at this point, and becoming
less relevant every day.  Eventually many of the tests will take near-zero
time, and the benchmark will have to be scrapped unless we figure out a
better way to score it.  Benchmarking research which long pre-dates
SunSpider confirms that geometric means provide a better basis for
comparison:  http://portal.acm.org/citation.cfm?id=5673 Can future versions
of the SunSpider driver be made so that they won't become irrelevant over
time?
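
To make the weighting problem concrete, here is a small sketch with made-up
per-test times (not real SunSpider numbers): under a sum, only the slow
string/date-style tests really move the score, while a geometric mean credits
the relative speedup on every test equally.

  // Hypothetical per-test times in ms for two engines; illustrative only.
  var engineA = [300, 250, 20, 10];  // strings, dates, math, bitops
  var engineB = [150, 250, 10, 5];   // 2x faster on everything except dates

  function sum(t) {
    var s = 0;
    for (var i = 0; i < t.length; i++) s += t[i];
    return s;
  }
  function geomean(t) {
    var logs = 0;
    for (var i = 0; i < t.length; i++) logs += Math.log(t[i]);
    return Math.exp(logs / t.length);
  }

  // sum(engineA) = 580 vs sum(engineB) = 415: only the big string win shows up.
  // geomean(engineA) ~= 62 vs geomean(engineB) ~= 37: each test's 2x
  // improvement counts equally, regardless of its absolute running time.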

#3: The SunSpider harness has a variance problem due to CPU power savings
modes.
Because each test runs a tiny amount of JavaScript (often under 10ms)
followed by a 500ms sleep, CPUs drop into power-savings modes between test
runs.  This radically changes the performance measurements and makes the
comparison between two runs dependent on the user's power-savings mode.  To
demonstrate this, run SunSpider on two machines: one using the Windows
"balanced" (default) power plan, and one using "high performance".  It's easy
to see skews of 30% between these two modes.  I think we should change the
test harness to avoid such accidental effects.

(BTW - if you change SunSpider's sleep from 500ms to 10ms, the test runs in
just a few seconds.  It is unclear to me why the pauses are so large.  My
browser gets a 650ms score, so run 5 times, the test should take ~3000ms.
But due to the pauses, it takes over 1 minute to run, leaving the CPU ~96%
idle.)
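
As a rough back-of-the-envelope model (the test count, run count, and sleep
length below are my assumptions about the 0.9 harness, not its actual code),
the wall-clock time is dominated by the sleeps rather than the JavaScript:

  // Rough model of the driver's wall-clock time; parameters are assumptions.
  var tests = 26;     // individual SunSpider tests
  var runs = 5;       // iterations per test
  var pauseMs = 500;  // sleep between test executions
  var jsMs = 650;     // total JavaScript time, per the score above

  var sleepMs = tests * runs * pauseMs;  // 65,000 ms of enforced idling
  var wallMs = sleepMs + jsMs;           // ~65.7 s, i.e. "over 1 minute"
  // With pauseMs = 10, the sleeps drop to ~1.3 s and the whole suite
  // finishes in a few seconds.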

Possible solution:
The Dromaeo test suite already incorporates the individual SunSpider tests
under a new benchmark harness which fixes all three of the above issues.
Thus, one approach would be to retire SunSpider 0.9 in favor of Dromaeo:
http://dromaeo.com/?sunspider
Dromaeo has also done a lot of good work to ensure statistical significance
of the results.  Once we have a better benchmarking framework, it would be
great to build a new microbenchmark mix which more realistically exercises
today's JavaScript.
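
For what it's worth, the scoring such a framework needs is not complicated.
Here is a sketch of the general idea (a geometric mean over per-test means,
plus a crude normal-approximation error estimate; this is illustrative, not
Dromaeo's actual implementation):

  // Illustrative scoring: geometric mean of per-test means plus a rough
  // error estimate.  Not Dromaeo's actual code.
  function mean(xs) {
    var s = 0;
    for (var i = 0; i < xs.length; i++) s += xs[i];
    return s / xs.length;
  }
  function stddev(xs) {
    var m = mean(xs), s = 0;
    for (var i = 0; i < xs.length; i++) s += (xs[i] - m) * (xs[i] - m);
    return Math.sqrt(s / (xs.length - 1));
  }

  // samples maps test name -> array of timings (ms) from repeated runs.
  function score(samples) {
    var logs = 0, count = 0;
    for (var name in samples) {
      logs += Math.log(mean(samples[name]));
      count++;
    }
    return Math.exp(logs / count);  // geometric mean of the per-test means
  }

  // Rough 95% interval for one test, assuming roughly normal run-to-run noise.
  function errorPercent(runsForTest) {
    return 100 * 1.96 * stddev(runsForTest) /
           (Math.sqrt(runsForTest.length) * mean(runsForTest));
  }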

Thanks,
Mike