[webkit-dev] Iterating SunSpider

Tue Jul 7 17:08:35 PDT 2009

On Jul 7, 2009, at 4:19 PM, Peter Kasting wrote:

> For example, the framework could compute both sums _and_ geomeans,  
> if people thought both were valuable.

That's a plausible thing to do, but I think there's a downside: if you  
make a change that moves the two scores in opposite directions, the  
benchmark doesn't help you decide if it's good or not. Avoiding  
paralysis in the face of tradeoffs is part of the reason we look  
primarily at the total score, not the individual subtest scores. The  
whole point of a meta-benchmark like this is to force ourselves to  
simplemindedly look at only one number.
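
As a rough illustration of the tradeoff described above, here is a minimal
sketch in Python. The subtest timings are invented for the example and are
not SunSpider results; `total` and `geomean` are hypothetical helper names.
It shows one change that improves the sum while worsening the geometric mean,
so the two scores disagree about whether the change is good.

```python
import math

def total(times_ms):
    """Sum of subtest times (lower is better)."""
    return sum(times_ms)

def geomean(times_ms):
    """Geometric mean of subtest times (lower is better)."""
    return math.exp(sum(math.log(t) for t in times_ms) / len(times_ms))

# Hypothetical timings in milliseconds for two subtests.
before = [100.0, 10.0]
# A change that speeds up the slow subtest but slows down the fast one.
after = [80.0, 15.0]

sum_improved = total(after) < total(before)          # 95 vs 110
geomean_improved = geomean(after) < geomean(before)  # sqrt(1200) vs sqrt(1000)

print("sum says better:    ", sum_improved)
print("geomean says better:", geomean_improved)
```

The sum rewards the absolute saving on the slow subtest, while the geometric
mean penalizes the larger relative slowdown on the fast one.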

> We could agree on a way of benchmarking a representative sample of  
> current sites to get an idea of how widespread certain operations  
> currently are.  We could talk with the maintainers of jQuery, Dojo,  
> etc. to see what sorts of operations they think would be helpful to  
> future apps to make faster.  We could instrument browsers to have  
> some sort of (opt-in) sampling of real-world workloads.  etc.   
> Surely together we can come up with ways to make Sunspider even  
> better, while keeping its current strengths in mind.

I think these are all good ideas, though there's one way in which
sampling the Web is not quite right. To some extent, what matters is
not the average density of an operation but its peak density. An
operation that's used a *lot* by a few sites and hardly used by most
sites may deserve a weighting above its average proportion of Web use.
I would like to hear input on what is inadequately covered. I tend to
think there should be more coverage of the following:

- property access, involving at least some polymorphic access patterns
- method calls
- object-oriented programming patterns
- GC load
- programming in a style that makes significant use of closures

I think the V8 benchmark does a much better job of covering the first  
four of these things. I also think it overweights them, to the  
exclusion of most other considerations(*). As I mentioned before, I'd  
like to include some of V8's tests in a future SunSpider 2.0 content  
set.

It would be good to know what other things should be tested that are  
not sufficiently covered.
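
The distinction drawn earlier between average density and peak density can be
sketched as follows. This is only an illustration: the site names, operation
names, and fractions are all made up, and `average_density` and
`peak_density` are hypothetical helpers, not part of any existing tool.

```python
# Hypothetical per-site profiles: fraction of script time spent in each
# operation. None of these numbers come from real measurements.
site_profiles = {
    "site_a": {"string_concat": 0.30, "regexp": 0.02},
    "site_b": {"string_concat": 0.25, "regexp": 0.01},
    "site_c": {"string_concat": 0.35, "regexp": 0.03},
    "site_d": {"string_concat": 0.10, "regexp": 0.70},
}

def average_density(profiles, op):
    """Mean share of time spent in `op` across all sampled sites."""
    return sum(p.get(op, 0.0) for p in profiles.values()) / len(profiles)

def peak_density(profiles, op):
    """Largest share of time spent in `op` on any single sampled site."""
    return max(p.get(op, 0.0) for p in profiles.values())

for op in ("string_concat", "regexp"):
    print(op,
          "average:", round(average_density(site_profiles, op), 3),
          "peak:", round(peak_density(site_profiles, op), 3))
```

In this made-up sample, `regexp` ranks below `string_concat` by average
density but well above it by peak density, which is the kind of operation
that a weighting based purely on averages would undervalue.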

Regards,
Maciej

* - For example, Mozilla's TraceMonkey effort showed relatively little
improvement on the V8 benchmark, even though it showed significant
improvement on SunSpider and other benchmarks. I think the TraceMonkey
speedups are real and significant, so this tends to undermine my
confidence in the V8 benchmark's coverage. Note: I don't mean to start
a side thread about whether the V8 benchmark is good or not; I just
wanted to justify my remarks above.