[webkit-dev] Iterating SunSpider
mjs at apple.com
Sat Jul 4 15:30:06 PDT 2009
On Jul 4, 2009, at 1:06 PM, Peter Kasting wrote:
> On Sat, Jul 4, 2009 at 11:47 AM, Mike Belshe <mike at belshe.com> wrote:
> #3: The SunSpider harness has a variance problem due to CPU power
> savings modes.
> This one worries me because it decreases the consistency/
> reproducibility of test scores and makes it harder to compare
> engines or to track one engine's scores over time. For example,
> doing a bunch of CPU work just before running the benchmark can
> affect whether and when the CPU throttles down during the benchmark.
> Possible solution:
> The Dromaeo test suite already incorporates the SunSpider individual
> tests under a new benchmark harness which fixes all 3 of the above
> issues. Thus, one approach would be to retire SunSpider 0.9 in
> favor of Dromaeo. http://dromaeo.com/?sunspider Dromaeo has also
> done a lot of good work to ensure statistical significance of the
> results. Once we have a better benchmarking framework, it would be
> great to build a new microbenchmark mix which more realistically
> One complaint I have heard about the Dromaeo tests (not the harness)
> is that the actual JS that gets run differs from browser to browser
> (e.g. because it is a direct copy of a source library that does UA
> sniffing). If this is true it means that this suite as-is isn't
> useful to compare engines to each other.
> However, the Dromaeo _harness_ is probably a win as-is.
> Of course, changing anything about SunSpider raises the question of
> tracking historical performance. Perhaps the harness could support
> versioning, or perhaps people are simply willing to say "SunSpider
> 1.0 scores cannot be compared to SunSpider 0.9 scores". I believe
> this is the approach the V8 benchmark takes.
I think versioning the test content is right, and we should do that
over time. A harness change to avoid triggering power-saving mode on
Windows would be a reasonable thing to do without a version change.
I don't think Dromaeo is a good choice of harness - their results
don't seem stable enough to me, and I am not confident in the
statistical soundness of their methodology.
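
As a rough illustration of the kind of harness change discussed above (this is not SunSpider's or Dromaeo's actual code), a harness could do a short burst of busy work before each timed run so the CPU is less likely to be in a low-power state when timing starts, and then report a mean with a confidence interval rather than a single number. The names spin, time_runs and summarize are hypothetical, and whether a short spin actually prevents throttling depends on the OS and hardware, so treat that part as an assumption.

```python
import math
import statistics
import time


def spin(seconds):
    """Busy-loop for roughly `seconds` (assumed to discourage CPU throttling)."""
    deadline = time.perf_counter() + seconds
    counter = 0
    while time.perf_counter() < deadline:
        counter += 1
    return counter


def time_runs(test, runs=10, warmup_seconds=0.05):
    """Time `test` `runs` times, spinning before each run; return times in ms."""
    samples = []
    for _ in range(runs):
        spin(warmup_seconds)  # warm-up happens outside the timed region
        start = time.perf_counter()
        test()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples):
    """Return (mean, half-width of an approximate 95% confidence interval)."""
    mean = statistics.mean(samples)
    if len(samples) < 2:
        return mean, float("nan")
    # Normal approximation (z = 1.96); a t-distribution would be more
    # appropriate for small numbers of runs.
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width
```

Comparing the interval widths with and without the spin step (warmup_seconds=0) on the same machine would be one way to check whether the warm-up reduces variance at all.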