[Webkit-unassigned] [Bug 172968] Consider using geometric mean in Speedometer 2.0

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Aug 28 20:39:55 PDT 2017


https://bugs.webkit.org/show_bug.cgi?id=172968

Ryosuke Niwa <rniwa at webkit.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ggaren at apple.com,
                   |                            |mjs at apple.com

--- Comment #9 from Ryosuke Niwa <rniwa at webkit.org> ---
(In reply to Mathias Bynens from comment #7)
> (In reply to Ryosuke Niwa from comment #6)
> > Another reason we should probably consider using geomean is that we now have
> > both release & debug builds of Ember.js after
> > https://trac.webkit.org/changeset/221205 and
> > https://trac.webkit.org/changeset/221206.
> > 
> > We did this because we noticed that debug build was 4x slower and therefore
> > constitutes a fundamentally different kind of a test.
> > 
> > However, if we used arithmetic mean to compute the score, then we’re
> > effectively giving 4x more weight to debug build of Ember.js compared to its
> > release build even though only ~5% of websites that use Ember.js use debug
> > builds in production.
> 
> IMHO that just means we should remove the debug build of Ember.js from the
> benchmark altogether.

The goal of the Speedometer benchmark is to measure plausible ways DOM APIs will be used, not necessarily only the most popular or most optimized way. Since 5% of websites that use Ember.js use the debug build, we should include it in the benchmark given how radically different its performance characteristics are. Additionally, removing it doesn't solve the problem that Vue.js contributes less than 1% of the total score whereas Inferno contributes more than 23%, at least in Safari.
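
To illustrate the weighting argument, here is a minimal sketch. It assumes the score is derived from a mean of per-test times. The timings (100 ms for the release build, 400 ms for the debug build) are hypothetical and chosen only to reflect the 4x gap mentioned above. The function names are illustrative and are not part of Speedometer.

```python
import math

def arithmetic_mean(times):
    return sum(times) / len(times)

def geometric_mean(times):
    # exp of the average log; equivalent to the n-th root of the product
    return math.exp(sum(math.log(t) for t in times) / len(times))

def relative_change(mean, times, name, factor=0.9):
    """Fractional drop in the mean when one test gets 10% faster."""
    improved = dict(times)
    improved[name] *= factor
    return 1 - mean(list(improved.values())) / mean(list(times.values()))

# Hypothetical timings in ms, not measured Speedometer results.
times = {"ember-release": 100.0, "ember-debug": 400.0}

for mean in (arithmetic_mean, geometric_mean):
    release = relative_change(mean, times, "ember-release")
    debug = relative_change(mean, times, "ember-debug")
    print(f"{mean.__name__}: release={release:.4f} debug={debug:.4f}")
```

Under these assumptions, the same 10% speedup moves the arithmetic mean four times as much when it lands in the debug build as when it lands in the release build, while the geometric mean moves by the same amount in both cases.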

(In reply to Addy Osmani from comment #8)
> There are a few possible options here:
> 
> 1. Switch to the Geometric mean. Avoids an issue where the lower execution
> times of frameworks like Vue and Preact don't contribute much to the final
> score. Also avoids Speedometer appearing to highlight the cost of some
> frameworks more than others.

We should probably do this.

> 2. Adopt a hybrid approach of measuring both Arithmetic and Geometric means,
> taking an average of the two.

The problem with this is that we'd still give ~2x more weight to the debug build of Ember.js than to its release build.
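
A rough check of the ~2x figure, as a sketch only: it assumes the hybrid score is the plain average of the arithmetic and geometric means, and it uses hypothetical timings for just the two Ember.js builds (100 ms release, 400 ms debug). The real ratio depends on every test in the suite. `hybrid_mean` and `sensitivity` are illustrative names.

```python
from statistics import mean, geometric_mean  # geometric_mean needs Python 3.8+

def hybrid_mean(times):
    # Assumed definition: average of the arithmetic and geometric means.
    return (mean(times) + geometric_mean(times)) / 2

def sensitivity(times, index, factor=0.9):
    """Fractional drop in the hybrid mean when one test gets 10% faster."""
    improved = list(times)
    improved[index] *= factor
    return 1 - hybrid_mean(improved) / hybrid_mean(times)

# Hypothetical timings in ms: [release build, debug build].
times = [100.0, 400.0]

ratio = sensitivity(times, 1) / sensitivity(times, 0)
print(f"debug build weighs {ratio:.2f}x the release build")
```

With these two timings the ratio comes out close to 2, which sits between the 4x of the arithmetic mean and the 1x of the geometric mean.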

> 3. Minimize the impact to overall scores by excluding the Ember debug build.
> This may not be sufficient alone.

Right. Just removing the debug build of Ember.js doesn't solve the issue of Inferno accounting for ~23% of the test score while Vue.js accounts for less than 1%.

> 4. Consider other weighting factors to each implementation to avoid any
> specific framework contributing more to the score than others.

Given that we don't have a good understanding of how popular each framework / library is, I don't think we could reasonably do this. It would also be subject to a lot of interpretation and opinion.
