[webkit-dev] Iterating SunSpider
mjs at apple.com
Tue Jul 7 17:08:35 PDT 2009
On Jul 7, 2009, at 4:19 PM, Peter Kasting wrote:
> For example, the framework could compute both sums _and_ geomeans,
> if people thought both were valuable.
That's a plausible thing to do, but I think there's a downside: if you
make a change that moves the two scores in opposite directions, the
benchmark doesn't help you decide if it's good or not. Avoiding
paralysis in the face of tradeoffs is part of the reason we look
primarily at the total score, not the individual subtest scores. The
whole point of a meta-benchmark like this is to force ourselves to
simplemindedly look at only one number.
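
To make that conflict concrete, here is a minimal sketch of a change that improves the summed score while worsening the geometric mean. The subtest timings and the function names `total` and `geomean` are made up for illustration; they are not SunSpider data or part of its harness.

```python
import math


def total(times):
    """Sum of per-subtest times (lower is better)."""
    return sum(times)


def geomean(times):
    """Geometric mean of per-subtest times (lower is better)."""
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical timings in milliseconds for two subtests.
before = [10.0, 100.0]
# The hypothetical change slows the fast subtest and speeds up the slow one.
after = [20.0, 85.0]

sum_improved = total(after) < total(before)
geomean_improved = geomean(after) < geomean(before)

print(f"sum:     {total(before):.1f} -> {total(after):.1f}")
print(f"geomean: {geomean(before):.2f} -> {geomean(after):.2f}")
print(f"sum improved: {sum_improved}, geomean improved: {geomean_improved}")
```

With these numbers the sum drops from 110 to 105 while the geometric mean rises from about 31.6 to about 41.2, so the two scores give opposite verdicts on the same change.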
> We could agree on a way of benchmarking a representative sample of
> current sites to get an idea of how widespread certain operations
> currently are. We could talk with the maintainers of jQuery, Dojo,
> etc. to see what sorts of operations they think would be helpful to
> future apps to make faster. We could instrument browsers to have
> some sort of (opt-in) sampling of real-world workloads. etc.
> Surely together we can come up with ways to make Sunspider even
> better, while keeping its current strengths in mind.
I think these are all good ideas. There is one way, though, in which
sampling the Web is not quite right. To some extent, what matters is
not average density of an operation but peak density. An operation
that's used a *lot* by a few sites and hardly used by most sites may
deserve a weighting above its average proportion of Web use. I would
like to hear input on what is inadequately covered. I tend to think
there should be more coverage of the following:
- property access, involving at least some polymorphic access patterns
- method calls
- object-oriented programming patterns
- GC load
- programming in a style that makes significant use of closures
I think the V8 benchmark does a much better job of covering the first
four of these things. I also think it overweights them, to the
exclusion of most other considerations(*). As I mentioned before, I'd
like to include some of V8's tests in a future SunSpider 2.0 content.
It would be good to know what other things should be tested that are
not sufficiently covered.
* - For example, Mozilla's TraceMonkey effort showed relatively little
improvement on the V8 benchmark, even though it showed significant
improvement on SunSpider and other benchmarks. I think TraceMonkey
speedups are real and significant, so this would tend to undermine my
confidence in the V8 benchmark's coverage. Note: I don't mean to start
a side thread about whether the V8 benchmark is good or not; I just
wanted to justify my remarks above.