<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[182587] trunk/Websites/perf.webkit.org</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/182587">182587</a></dd>
<dt>Author</dt> <dd>rniwa@webkit.org</dd>
<dt>Date</dt> <dd>2015-04-08 21:58:19 -0700 (Wed, 08 Apr 2015)</dd>
</dl>

<h3>Log Message</h3>
<pre>The results of A/B testing should state statistical significance
https://bugs.webkit.org/show_bug.cgi?id=143552

Reviewed by Chris Dumez.

Added statistical comparisons between the results of each configuration on the analysis task page using
Welch's t-test. The probability, the t-statistic, and the degrees of freedom are reported.

* public/v2/app.js:
(App.TestGroupPane._populate): Report the list of statistical comparisons between every pair of
root configurations in the results, e.g. if we've got configurations A, B, and C, then compare A/B, A/C,
and B/C.
(App.TestGroupPane._computeStatisticalSignificance): Compute the statistical significance using
Welch's t-test. Report the probability that the two samples do not come from the same distribution.
(App.TestGroupPane._createConfigurationSummary): Include the array of results for this configuration.
Also renamed &quot;items&quot; to &quot;requests&quot; for clarity.

* public/v2/index.html: Added the template for showing statistical comparisons.

* public/v2/js/statistics.js: Renamed tDistributionQuantiles to tDistributionByOneSidedProbability
for clarity. Also factored out the functions to convert from one-sided probability to two-sided
probability and vice versa.
(Statistics.supportedConfidenceIntervalProbabilities):
(Statistics.confidenceIntervalDelta):
(Statistics.probabilityRangeForWelchsT): Added. Computes the lower bound and the upper bound for
the probability that two values are sampled from distinct distributions using Welch's t-test.
(Statistics.computeWelchsT): This function now takes two-sided probability like all other functions.
(.tDistributionByOneSidedProbability): Renamed from tDistributionQuantiles.
(.oneSidedToTwoSidedProbability): Extracted.
(.twoSidedToOneSidedProbability): Extracted.
(Statistics.MovingAverageStrategies): Converted the one-sided probability to the two-sided probability
now that computeWelchsT takes two-sided probability.</pre>
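<p>For readers unfamiliar with Welch's t-test, the following is a minimal sketch of the two quantities the log message says are reported: the t-statistic and the degrees of freedom (Welch-Satterthwaite approximation). The helper names <code>sampleMeanAndVariance</code> and <code>welchsT</code> are hypothetical and are not the functions in statistics.js; taking the absolute value of the mean difference is also an assumption, since that part of the committed code is not shown in this diff.</p>
<pre>
```javascript
// Hypothetical helper, not the one in statistics.js.
function sampleMeanAndVariance(values) {
    var size = values.length;
    var mean = values.reduce(function (sum, value) { return sum + value; }, 0) / size;
    var squaredDeviations = values.reduce(function (sum, value) {
        return sum + (value - mean) * (value - mean);
    }, 0);
    // Unbiased sample variance (divides by size - 1).
    return {mean: mean, variance: squaredDeviations / (size - 1), size: size, degreesOfFreedom: size - 1};
}

// Hypothetical sketch of Welch's t-test for two independent samples.
function welchsT(values1, values2) {
    var stat1 = sampleMeanAndVariance(values1);
    var stat2 = sampleMeanAndVariance(values2);
    var varianceOverSize1 = stat1.variance / stat1.size;
    var varianceOverSize2 = stat2.variance / stat2.size;
    var sumOfVarianceOverSize = varianceOverSize1 + varianceOverSize2;
    // Assumption: the magnitude of the difference is what gets compared against the table.
    var t = Math.abs(stat1.mean - stat2.mean) / Math.sqrt(sumOfVarianceOverSize);
    // Welch-Satterthwaite approximation of the degrees of freedom.
    var degreesOfFreedom = sumOfVarianceOverSize * sumOfVarianceOverSize
        / (varianceOverSize1 * varianceOverSize1 / stat1.degreesOfFreedom
            + varianceOverSize2 * varianceOverSize2 / stat2.degreesOfFreedom);
    return {t: t, degreesOfFreedom: degreesOfFreedom};
}
```
</pre>
<p>The degrees of freedom are generally not an integer, which is presumably why the patch rounds them before indexing into the t-distribution table.</p>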

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkWebsitesperfwebkitorgChangeLog">trunk/Websites/perf.webkit.org/ChangeLog</a></li>
<li><a href="#trunkWebsitesperfwebkitorgpublicv2appjs">trunk/Websites/perf.webkit.org/public/v2/app.js</a></li>
<li><a href="#trunkWebsitesperfwebkitorgpublicv2indexhtml">trunk/Websites/perf.webkit.org/public/v2/index.html</a></li>
<li><a href="#trunkWebsitesperfwebkitorgpublicv2jsstatisticsjs">trunk/Websites/perf.webkit.org/public/v2/js/statistics.js</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkWebsitesperfwebkitorgChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Websites/perf.webkit.org/ChangeLog (182586 => 182587)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Websites/perf.webkit.org/ChangeLog        2015-04-09 04:47:24 UTC (rev 182586)
+++ trunk/Websites/perf.webkit.org/ChangeLog        2015-04-09 04:58:19 UTC (rev 182587)
</span><span class="lines">@@ -1,5 +1,40 @@
</span><span class="cx"> 2015-04-08  Ryosuke Niwa  &lt;rniwa@webkit.org&gt;
</span><span class="cx"> 
</span><ins>+        The results of A/B testing should state statistical significance
+        https://bugs.webkit.org/show_bug.cgi?id=143552
+
+        Reviewed by Chris Dumez.
+
+        Added statistical comparisons between results for each configuration on analysis task page using
+        Welch's t-test. The probability as well as t-statistics and the degrees of freedoms are reported.
+
+        * public/v2/app.js:
+        (App.TestGroupPane._populate): Report the list of statistical comparison between every pair of
+        root configurations in the results. e.g. if we've got A, B, C configurations then compare A/B, A/C
+        and B/C.
+        (App.TestGroupPane._computeStatisticalSignificance): Compute the statistical significance using
+        Welch's t-test. Report the probability by which two samples do not come from the same distribution.
+        (App.TestGroupPane._createConfigurationSummary): Include the array of results for this configuration.
+        Also renamed &quot;items&quot; to &quot;requests&quot; for clarity.
+
+        * public/v2/index.html: Added the template for showing statistical comparisons.
+
+        * public/v2/js/statistics.js: Renamed tDistributionQuantiles to tDistributionByOneSidedProbability
+        for clarity. Also factored out the functions to convert from one-sided probability to two-sided
+        probability and vice versa.
+        (Statistics.supportedConfidenceIntervalProbabilities):
+        (Statistics.confidenceIntervalDelta):
+        (Statistics.probabilityRangeForWelchsT): Added. Computes the lower bound and the upper bound for
+        the probability that two values are sampled from distinct distributions using Welch's t-test.
+        (Statistics.computeWelchsT): This function now takes two-sided probability like all other functions.
+        (.tDistributionByOneSidedProbability): Renamed from tDistributionQuantiles.
+        (.oneSidedToTwoSidedProbability): Extracted.
+        (.twoSidedToOneSidedProbability): Extracted.
+        (Statistics.MovingAverageStrategies): Converted the one-sided probability to the two-sided probability
+        now that computeWelchsT takes two-sided probability.
+
+2015-04-08  Ryosuke Niwa  &lt;rniwa@webkit.org&gt;
+
</ins><span class="cx">         Unreviewed fix after r182496 for when the cached runs JSON doesn't exist.
</span><span class="cx"> 
</span><span class="cx">         * public/v2/app.js:
</span></span></pre></div>
<a id="trunkWebsitesperfwebkitorgpublicv2appjs"></a>
<div class="modfile"><h4>Modified: trunk/Websites/perf.webkit.org/public/v2/app.js (182586 => 182587)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Websites/perf.webkit.org/public/v2/app.js        2015-04-09 04:47:24 UTC (rev 182586)
+++ trunk/Websites/perf.webkit.org/public/v2/app.js        2015-04-09 04:58:19 UTC (rev 182587)
</span><span class="lines">@@ -1334,7 +1334,39 @@
</span><span class="cx">         range.min -= margin;
</span><span class="cx"> 
</span><span class="cx">         this.set('configurations', configurations);
</span><ins>+
+        var comparisons = [];
+        for (var i = 0; i &lt; configurations.length - 1; i++) {
+            var summary1 = configurations[i].summary;
+            for (var j = i + 1; j &lt; configurations.length; j++) {
+                var summary2 = configurations[j].summary;
+                comparisons.push({
+                    label: summary1.configLetter + ' / ' + summary2.configLetter,
+                    result: this._computeStatisticalSignificance(summary1.measuredValues, summary2.measuredValues)
+                });
+            }
+        }
+        this.set('comparisons', comparisons);
</ins><span class="cx">     }.observes('testResults', 'buildRequests'),
</span><ins>+    _computeStatisticalSignificance: function (values1, values2)
+    {
+        var tFormatter = d3.format('.3g');
+        var probabilityFormatter = d3.format('.2p');
+        var statistics = Statistics.probabilityRangeForWelchsT(values1, values2);
+        if (isNaN(statistics.t) || isNaN(statistics.degreesOfFreedom))
+            return 'N/A';
+
+        var details = ' (t=' + tFormatter(statistics.t) + ' df=' + tFormatter(statistics.degreesOfFreedom) + ')';
+
+        if (!statistics.range[0])
+            return 'Not statistically significant' + details;
+
+        var lowerLimit = probabilityFormatter(statistics.range[0]);
+        if (!statistics.range[1])
+            return 'Statistical significance &gt; ' + lowerLimit + details;
+
+        return lowerLimit + ' &lt; Statistical significance &lt; ' + probabilityFormatter(statistics.range[1]) + details;
+    },
</ins><span class="cx">     _updateReferenceChart: function ()
</span><span class="cx">     {
</span><span class="cx">         var configurations = this.get('configurations');
</span><span class="lines">@@ -1458,12 +1490,13 @@
</span><span class="cx">             revisionList: summaryRevisions,
</span><span class="cx">             formattedValue: isNaN(mean) ? null : testResults.formatWithDeltaAndUnit(mean, ciDelta),
</span><span class="cx">             value: mean,
</span><ins>+            measuredValues: valuesInConfig,
</ins><span class="cx">             confidenceIntervalDelta: ciDelta,
</span><span class="cx">             valueRange: range,
</span><span class="cx">             statusLabel: App.BuildRequest.aggregateStatuses(requests),
</span><span class="cx">         });
</span><span class="cx"> 
</span><del>-        return Ember.Object.create({summary: summary, items: requests, rootSet: rootSet});
</del><ins>+        return Ember.Object.create({summary: summary, requests: requests, rootSet: rootSet});
</ins><span class="cx">     },
</span><span class="cx"> });
</span><span class="cx"> 
</span></span></pre></div>
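<p>The nested loop added to <code>_populate</code> above visits every unordered pair of configurations exactly once, so n configurations yield n(n - 1)/2 comparisons. A minimal standalone sketch of the same enumeration follows; <code>pairLabels</code> is a hypothetical helper that only builds the labels, whereas the patch inlines the loop and also computes a result for each pair.</p>
<pre>
```javascript
// Hypothetical helper: builds one "X / Y" label per unordered pair of configuration letters.
function pairLabels(configLetters) {
    var labels = [];
    configLetters.forEach(function (first, index) {
        // Pair each letter only with the letters after it, so A / B appears but B / A does not.
        configLetters.slice(index + 1).forEach(function (second) {
            labels.push(first + ' / ' + second);
        });
    });
    return labels;
}
```
</pre>
<p>With the configurations A, B, and C from the log message, this produces the labels for A/B, A/C, and B/C, in that order.</p>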
<a id="trunkWebsitesperfwebkitorgpublicv2indexhtml"></a>
<div class="modfile"><h4>Modified: trunk/Websites/perf.webkit.org/public/v2/index.html (182586 => 182587)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Websites/perf.webkit.org/public/v2/index.html        2015-04-09 04:47:24 UTC (rev 182586)
+++ trunk/Websites/perf.webkit.org/public/v2/index.html        2015-04-09 04:58:19 UTC (rev 182587)
</span><span class="lines">@@ -634,7 +634,7 @@
</span><span class="cx">                                 {{partial &quot;testGroupRow&quot;}}
</span><span class="cx">                             {{/with}}
</span><span class="cx">                         &lt;/tr&gt;
</span><del>-                        {{#each items}}
</del><ins>+                        {{#each requests}}
</ins><span class="cx">                             &lt;tr class=&quot;request&quot;&gt;
</span><span class="cx">                                 {{#with ../this}}
</span><span class="cx">                                     &lt;td class=&quot;config-letter&quot; {{action toggleShowRequestList this}}&gt;&lt;/td&gt;
</span><span class="lines">@@ -645,6 +645,19 @@
</span><span class="cx">                         {{/each}}
</span><span class="cx">                     &lt;/tbody&gt;
</span><span class="cx">                 {{/each}}
</span><ins>+                {{#each comparisons}}
+                    &lt;tbody&gt;
+                        &lt;tr&gt;
+                            &lt;td colspan=&quot;2&quot;&gt;{{label}}&lt;/td&gt;
+                            {{#with ../this}}
+                                {{#each repositories}}
+                                    &lt;td&gt;&lt;/td&gt;
+                                {{/each}}
+                            {{/with}}
+                            &lt;td colspan=&quot;2&quot;&gt;{{result}}&lt;/td&gt;
+                        &lt;/tr&gt;
+                    &lt;/tbody&gt;
+                {{/each}}
</ins><span class="cx">             &lt;/table&gt;
</span><span class="cx">             &lt;div class=&quot;reference-chart&quot;&gt;
</span><span class="cx">                 {{#if referenceChart}}
</span></span></pre></div>
<a id="trunkWebsitesperfwebkitorgpublicv2jsstatisticsjs"></a>
<div class="modfile"><h4>Modified: trunk/Websites/perf.webkit.org/public/v2/js/statistics.js (182586 => 182587)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Websites/perf.webkit.org/public/v2/js/statistics.js        2015-04-09 04:47:24 UTC (rev 182586)
+++ trunk/Websites/perf.webkit.org/public/v2/js/statistics.js        2015-04-09 04:58:19 UTC (rev 182587)
</span><span class="lines">@@ -26,21 +26,21 @@
</span><span class="cx"> 
</span><span class="cx">     this.supportedConfidenceIntervalProbabilities = function () {
</span><span class="cx">         var supportedProbabilities = [];
</span><del>-        for (var quantile in tDistributionQuantiles)
-            supportedProbabilities.push((1 - (1 - quantile) * 2).toFixed(2));
</del><ins>+        for (var probability in tDistributionByOneSidedProbability)
+            supportedProbabilities.push(oneSidedToTwoSidedProbability(probability).toFixed(2));
</ins><span class="cx">         return supportedProbabilities
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     // Computes the delta d s.t. (mean - d, mean + d) is the confidence interval with the specified probability in O(1).
</span><span class="cx">     this.confidenceIntervalDelta = function (probability, numberOfSamples, sum, squareSum) {
</span><del>-        var quantile = (1 - (1 - probability) / 2);
-        if (!(quantile in tDistributionQuantiles)) {
</del><ins>+        var oneSidedProbability = twoSidedToOneSidedProbability(probability);
+        if (!(oneSidedProbability in tDistributionByOneSidedProbability)) {
</ins><span class="cx">             throw 'We only support ' + this.supportedConfidenceIntervalProbabilities().map(function (probability)
</span><span class="cx">             { return probability * 100 + '%'; } ).join(', ') + ' confidence intervals.';
</span><span class="cx">         }
</span><span class="cx">         if (numberOfSamples - 2 &lt; 0)
</span><span class="cx">             return NaN;
</span><del>-        var deltas = tDistributionQuantiles[quantile];
</del><ins>+        var deltas = tDistributionByOneSidedProbability[oneSidedProbability];
</ins><span class="cx">         var degreesOfFreedom = numberOfSamples - 1;
</span><span class="cx">         if (degreesOfFreedom &gt; deltas.length)
</span><span class="cx">             throw 'We only support up to ' + deltas.length + ' degrees of freedom';
</span><span class="lines">@@ -61,6 +61,25 @@
</span><span class="cx">         return this.computeWelchsT(values1, 0, values1.length, values2, 0, values2.length, probability).significantlyDifferent;
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    this.probabilityRangeForWelchsT = function (values1, values2) {
+        var result = this.computeWelchsT(values1, 0, values1.length, values2, 0, values2.length);
+        if (isNaN(result.t) || isNaN(result.degreesOfFreedom))
+            return {t: NaN, degreesOfFreedom:NaN, range: [null, null]};
+
+        var lowerBound = null;
+        var upperBound = null;
+        for (var probability in tDistributionByOneSidedProbability) {
+            var twoSidedProbability = oneSidedToTwoSidedProbability(probability);
+            if (result.t &gt; tDistributionByOneSidedProbability[probability][Math.round(result.degreesOfFreedom - 1)])
+                lowerBound = twoSidedProbability;
+            else if (lowerBound) {
+                upperBound = twoSidedProbability;
+                break;
+            }
+        }
+        return {t: result.t, degreesOfFreedom: result.degreesOfFreedom, range: [lowerBound, upperBound]};
+    }
+
</ins><span class="cx">     this.computeWelchsT = function (values1, startIndex1, length1, values2, startIndex2, length2, probability) {
</span><span class="cx">         var stat1 = sampleMeanAndVarianceForValues(values1, startIndex1, length1);
</span><span class="cx">         var stat2 = sampleMeanAndVarianceForValues(values2, startIndex2, length2);
</span><span class="lines">@@ -71,10 +90,11 @@
</span><span class="cx">         var degreesOfFreedom = sumOfSampleVarianceOverSampleSize * sumOfSampleVarianceOverSampleSize
</span><span class="cx">             / (stat1.variance * stat1.variance / stat1.size / stat1.size / stat1.degreesOfFreedom
</span><span class="cx">                 + stat2.variance * stat2.variance / stat2.size / stat2.size / stat2.degreesOfFreedom);
</span><ins>+        var minT = tDistributionByOneSidedProbability[twoSidedToOneSidedProbability(probability || 0.8)][Math.round(degreesOfFreedom - 1)];
</ins><span class="cx">         return {
</span><span class="cx">             t: t,
</span><span class="cx">             degreesOfFreedom: degreesOfFreedom,
</span><del>-            significantlyDifferent: t &gt; tDistributionQuantiles[probability || 0.9][Math.round(degreesOfFreedom - 1)],
</del><ins>+            significantlyDifferent: t &gt; minT,
</ins><span class="cx">         };
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="lines">@@ -118,8 +138,7 @@
</span><span class="cx">         recursivelySplitIntoTwoSegmentsAtMaxTIfSignificantlyDifferent(values, startIndex + argTMax, length - argTMax, minLength, segments);
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    // One-sided t-distribution.
-    var tDistributionQuantiles = {
</del><ins>+    var tDistributionByOneSidedProbability = {
</ins><span class="cx">         0.9: [
</span><span class="cx">             3.077684, 1.885618, 1.637744, 1.533206, 1.475884, 1.439756, 1.414924, 1.396815, 1.383029, 1.372184,
</span><span class="cx">             1.363430, 1.356217, 1.350171, 1.345030, 1.340606, 1.336757, 1.333379, 1.330391, 1.327728, 1.325341,
</span><span class="lines">@@ -169,6 +188,8 @@
</span><span class="cx">             2.373270, 2.372687, 2.372119, 2.371564, 2.371022, 2.370493, 2.369977, 2.369472, 2.368979, 2.368497,
</span><span class="cx">             2.368026, 2.367566, 2.367115, 2.366674, 2.366243, 2.365821, 2.365407, 2.365002, 2.364606, 2.364217]
</span><span class="cx">     };
</span><ins>+    function oneSidedToTwoSidedProbability(probability) { return 2 * probability - 1; }
+    function twoSidedToOneSidedProbability(probability) { return (1 - (1 - probability) / 2); }
</ins><span class="cx"> 
</span><span class="cx">     this.MovingAverageStrategies = [
</span><span class="cx">         {
</span><span class="lines">@@ -501,7 +522,7 @@
</span><span class="cx">                 var results = new Array(values.length);
</span><span class="cx">                 var p = false;
</span><span class="cx">                 for (var i = 20; i &lt; values.length - 5; i++)
</span><del>-                    results[i] = Statistics.testWelchsT(values.slice(i - 20, i), values.slice(i, i + 5), 0.99) ? 5 : 0;
</del><ins>+                    results[i] = Statistics.testWelchsT(values.slice(i - 20, i), values.slice(i, i + 5), 0.98) ? 5 : 0;
</ins><span class="cx">                 return results;
</span><span class="cx">             }
</span><span class="cx">         },
</span></span></pre>
</div>
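<p>To illustrate how <code>probabilityRangeForWelchsT</code> brackets the significance, here is a minimal sketch restricted to a single row of the table. The two conversion functions are copied from the patch; <code>probabilityRange</code> and the table literal are hypothetical. The critical values are standard one-sided t-table entries for 10 degrees of freedom, rounded to three decimals, and the set of tabulated probabilities (0.9, 0.95, 0.975, 0.99) is an assumption because the diff does not show every key.</p>
<pre>
```javascript
function oneSidedToTwoSidedProbability(probability) { return 2 * probability - 1; }
function twoSidedToOneSidedProbability(probability) { return 1 - (1 - probability) / 2; }

// Illustrative only: rounded one-sided critical values for 10 degrees of freedom,
// listed in increasing order of probability.
var criticalValuesForTenDegreesOfFreedom = [
    {oneSidedProbability: 0.9, t: 1.372},
    {oneSidedProbability: 0.95, t: 1.812},
    {oneSidedProbability: 0.975, t: 2.228},
    {oneSidedProbability: 0.99, t: 2.764},
];

// Hypothetical sketch: returns [lowerBound, upperBound] as two-sided probabilities.
// A null bound means no tabulated probability is available on that side.
function probabilityRange(t, criticalValues) {
    var lowerBound = null;
    var upperBound = null;
    criticalValues.some(function (entry) {
        var twoSidedProbability = oneSidedToTwoSidedProbability(entry.oneSidedProbability);
        if (t > entry.t) {
            lowerBound = twoSidedProbability;
            return false; // t clears this threshold, so keep scanning.
        }
        if (lowerBound !== null)
            upperBound = twoSidedProbability;
        return true; // First threshold that t does not clear; stop here.
    });
    return [lowerBound, upperBound];
}
```
</pre>
<p>Under these assumptions, t = 2.0 clears the 80% and 90% two-sided thresholds but not the 95% one, so the range is roughly 90% to 95%. A t below the first threshold gives no lower bound, which is the case the patch reports as not statistically significant, and a t above the last threshold gives no upper bound. Because the conversions use floating-point arithmetic, results such as 2 * 0.95 - 1 may differ from 0.9 in the last digit, so compare with a tolerance or format the value before display.</p>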
</div>

</body>
</html>