<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[182587] trunk/Websites/perf.webkit.org</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/182587">182587</a></dd>
<dt>Author</dt> <dd>rniwa@webkit.org</dd>
<dt>Date</dt> <dd>2015-04-08 21:58:19 -0700 (Wed, 08 Apr 2015)</dd>
</dl>
<h3>Log Message</h3>
<pre>The results of A/B testing should state statistical significance
https://bugs.webkit.org/show_bug.cgi?id=143552
Reviewed by Chris Dumez.
Added statistical comparisons between the results for each configuration on the analysis task page using
Welch's t-test. The probability, the t-statistic, and the degrees of freedom are reported.
* public/v2/app.js:
(App.TestGroupPane._populate): Report the list of statistical comparisons between every pair of
root configurations in the results. For example, given configurations A, B, and C, compare A/B, A/C,
and B/C.
(App.TestGroupPane._computeStatisticalSignificance): Compute the statistical significance using
Welch's t-test. Report the probability that the two samples do not come from the same distribution.
(App.TestGroupPane._createConfigurationSummary): Include the array of results for this configuration.
Also renamed "items" to "requests" for clarity.
* public/v2/index.html: Added the template for showing statistical comparisons.
* public/v2/js/statistics.js: Renamed tDistributionQuantiles to tDistributionByOneSidedProbability
for clarity. Also factored out the functions to convert from one-sided probability to two-sided
probability and vice versa.
(Statistics.supportedConfidenceIntervalProbabilities):
(Statistics.confidenceIntervalDelta):
(Statistics.probabilityRangeForWelchsT): Added. Computes the lower bound and the upper bound for
the probability that two values are sampled from distinct distributions using Welch's t-test.
(Statistics.computeWelchsT): This function now takes two-sided probability like all other functions.
(.tDistributionByOneSidedProbability): Renamed from tDistributionQuantiles.
(.oneSidedToTwoSidedProbability): Extracted.
(.twoSidedToOneSidedProbability): Extracted.
(Statistics.MovingAverageStrategies): Converted the one-sided probability to the two-sided probability
now that computeWelchsT takes two-sided probability.</pre>
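<p>The t-statistic and degrees of freedom mentioned in the log message can be sketched in standalone JavaScript as follows. This is only an illustration: the names <code>sampleMeanAndVariance</code> and <code>welchsT</code> are hypothetical, and taking the absolute value of the mean difference is an assumption, since the corresponding lines of statistics.js are not shown in this diff.</p>

```javascript
// Sample size, mean, and unbiased sample variance (divides by n - 1).
// A single-element sample yields a NaN variance (0 / 0).
function sampleMeanAndVariance(values) {
    var n = values.length;
    var mean = values.reduce(function (sum, v) { return sum + v; }, 0) / n;
    var squaredError = values.reduce(function (sum, v) { return sum + (v - mean) * (v - mean); }, 0);
    return {size: n, mean: mean, variance: squaredError / (n - 1)};
}

// Welch's t statistic with Welch-Satterthwaite degrees of freedom.
// Assumption: the magnitude of the mean difference is used, so t is never negative.
function welchsT(values1, values2) {
    var s1 = sampleMeanAndVariance(values1);
    var s2 = sampleMeanAndVariance(values2);
    var v1 = s1.variance / s1.size; // variance of the first sample mean
    var v2 = s2.variance / s2.size; // variance of the second sample mean
    var t = Math.abs(s1.mean - s2.mean) / Math.sqrt(v1 + v2);
    var degreesOfFreedom = (v1 + v2) * (v1 + v2)
        / (v1 * v1 / (s1.size - 1) + v2 * v2 / (s2.size - 1));
    return {t: t, degreesOfFreedom: degreesOfFreedom};
}

var example = welchsT([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]);
console.log(example.t, example.degreesOfFreedom);
```

<p>When either sample has fewer than two values, the variance and therefore <code>t</code> become NaN in this sketch, which corresponds to the 'N/A' case handled in app.js.</p>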
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkWebsitesperfwebkitorgChangeLog">trunk/Websites/perf.webkit.org/ChangeLog</a></li>
<li><a href="#trunkWebsitesperfwebkitorgpublicv2appjs">trunk/Websites/perf.webkit.org/public/v2/app.js</a></li>
<li><a href="#trunkWebsitesperfwebkitorgpublicv2indexhtml">trunk/Websites/perf.webkit.org/public/v2/index.html</a></li>
<li><a href="#trunkWebsitesperfwebkitorgpublicv2jsstatisticsjs">trunk/Websites/perf.webkit.org/public/v2/js/statistics.js</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkWebsitesperfwebkitorgChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Websites/perf.webkit.org/ChangeLog (182586 => 182587)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Websites/perf.webkit.org/ChangeLog        2015-04-09 04:47:24 UTC (rev 182586)
+++ trunk/Websites/perf.webkit.org/ChangeLog        2015-04-09 04:58:19 UTC (rev 182587)
</span><span class="lines">@@ -1,5 +1,40 @@
</span><span class="cx"> 2015-04-08 Ryosuke Niwa &lt;rniwa@webkit.org&gt;
</span><span class="cx">
</span><ins>+ The results of A/B testing should state statistical significance
+ https://bugs.webkit.org/show_bug.cgi?id=143552
+
+ Reviewed by Chris Dumez.
+
+ Added statistical comparisons between results for each configuration on analysis task page using
+ Welch's t-test. The probability as well as t-statistics and the degrees of freedoms are reported.
+
+ * public/v2/app.js:
+ (App.TestGroupPane._populate): Report the list of statistical comparison between every pair of
+ root configurations in the results. e.g. if we've got A, B, C configurations then compare A/B, A/C
+ and B/C.
+ (App.TestGroupPane._computeStatisticalSignificance): Compute the statistical significance using
+ Welch's t-test. Report the probability by which two samples do not come from the same distribution.
+ (App.TestGroupPane._createConfigurationSummary): Include the array of results for this configuration.
+ Also renamed "items" to "requests" for clarity.
+
+ * public/v2/index.html: Added the template for showing statistical comparisons.
+
+ * public/v2/js/statistics.js: Renamed tDistributionQuantiles to tDistributionByOneSidedProbability
+ for clarity. Also factored out the functions to convert from one-sided probability to two-sided
+ probability and vice versa.
+ (Statistics.supportedConfidenceIntervalProbabilities):
+ (Statistics.confidenceIntervalDelta):
+ (Statistics.probabilityRangeForWelchsT): Added. Computes the lower bound and the upper bound for
+ the probability that two values are sampled from distinct distributions using Welch's t-test.
+ (Statistics.computeWelchsT): This function now takes two-sided probability like all other functions.
+ (.tDistributionByOneSidedProbability): Renamed from tDistributionQuantiles.
+ (.oneSidedToTwoSidedProbability): Extracted.
+ (.twoSidedToOneSidedProbability): Extracted.
+ (Statistics.MovingAverageStrategies): Converted the one-sided probability to the two-sided probability
+ now that computeWelchsT takes two-sided probability.
+
+2015-04-08 Ryosuke Niwa &lt;rniwa@webkit.org&gt;
+
</ins><span class="cx"> Unreviewed fix after r182496 for when the cached runs JSON doesn't exist.
</span><span class="cx">
</span><span class="cx"> * public/v2/app.js:
</span></span></pre></div>
<a id="trunkWebsitesperfwebkitorgpublicv2appjs"></a>
<div class="modfile"><h4>Modified: trunk/Websites/perf.webkit.org/public/v2/app.js (182586 => 182587)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Websites/perf.webkit.org/public/v2/app.js        2015-04-09 04:47:24 UTC (rev 182586)
+++ trunk/Websites/perf.webkit.org/public/v2/app.js        2015-04-09 04:58:19 UTC (rev 182587)
</span><span class="lines">@@ -1334,7 +1334,39 @@
</span><span class="cx"> range.min -= margin;
</span><span class="cx">
</span><span class="cx"> this.set('configurations', configurations);
</span><ins>+
+ var comparisons = [];
+ for (var i = 0; i &lt; configurations.length - 1; i++) {
+ var summary1 = configurations[i].summary;
+ for (var j = i + 1; j &lt; configurations.length; j++) {
+ var summary2 = configurations[j].summary;
+ comparisons.push({
+ label: summary1.configLetter + ' / ' + summary2.configLetter,
+ result: this._computeStatisticalSignificance(summary1.measuredValues, summary2.measuredValues)
+ });
+ }
+ }
+ this.set('comparisons', comparisons);
</ins><span class="cx"> }.observes('testResults', 'buildRequests'),
</span><ins>+ _computeStatisticalSignificance: function (values1, values2)
+ {
+ var tFormatter = d3.format('.3g');
+ var probabilityFormatter = d3.format('.2p');
+ var statistics = Statistics.probabilityRangeForWelchsT(values1, values2);
+ if (isNaN(statistics.t) || isNaN(statistics.degreesOfFreedom))
+ return 'N/A';
+
+ var details = ' (t=' + tFormatter(statistics.t) + ' df=' + tFormatter(statistics.degreesOfFreedom) + ')';
+
+ if (!statistics.range[0])
+ return 'Not statistically significant' + details;
+
+ var lowerLimit = probabilityFormatter(statistics.range[0]);
+ if (!statistics.range[1])
+ return 'Statistical significance > ' + lowerLimit + details;
+
+ return lowerLimit + ' &lt; Statistical significance &lt; ' + probabilityFormatter(statistics.range[1]) + details;
+ },
</ins><span class="cx"> _updateReferenceChart: function ()
</span><span class="cx"> {
</span><span class="cx"> var configurations = this.get('configurations');
</span><span class="lines">@@ -1458,12 +1490,13 @@
</span><span class="cx"> revisionList: summaryRevisions,
</span><span class="cx"> formattedValue: isNaN(mean) ? null : testResults.formatWithDeltaAndUnit(mean, ciDelta),
</span><span class="cx"> value: mean,
</span><ins>+ measuredValues: valuesInConfig,
</ins><span class="cx"> confidenceIntervalDelta: ciDelta,
</span><span class="cx"> valueRange: range,
</span><span class="cx"> statusLabel: App.BuildRequest.aggregateStatuses(requests),
</span><span class="cx"> });
</span><span class="cx">
</span><del>- return Ember.Object.create({summary: summary, items: requests, rootSet: rootSet});
</del><ins>+ return Ember.Object.create({summary: summary, requests: requests, rootSet: rootSet});
</ins><span class="cx"> },
</span><span class="cx"> });
</span><span class="cx">
</span></span></pre></div>
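<p>The pairing loop and the three-way message selection in the app.js change above can be exercised outside Ember with a sketch like the one below. The function names are hypothetical, <code>formatPercent</code> is a simplified stand-in for <code>d3.format('.2p')</code> that only produces whole-number percentages, and the "(t=... df=...)" details suffix is omitted.</p>

```javascript
// Hypothetical stand-in for d3.format('.2p'); rounds to a whole percentage.
function formatPercent(probability) {
    return Math.round(probability * 100) + '%';
}

// Mirrors the message selection in _computeStatisticalSignificance:
// range is [lowerBound, upperBound], where either bound may be null.
function describeRange(range) {
    if (!range[0])
        return 'Not statistically significant';
    if (!range[1])
        return 'Statistical significance > ' + formatPercent(range[0]);
    return formatPercent(range[0]) + ' < Statistical significance < ' + formatPercent(range[1]);
}

// Visits every unordered pair exactly once: A/B, A/C, B/C for three configurations.
// `compare` is any function taking two arrays of measured values.
function pairwiseComparisons(summaries, compare) {
    var comparisons = [];
    for (var i = 0; i < summaries.length - 1; i++) {
        for (var j = i + 1; j < summaries.length; j++) {
            comparisons.push({
                label: summaries[i].configLetter + ' / ' + summaries[j].configLetter,
                result: compare(summaries[i].measuredValues, summaries[j].measuredValues)
            });
        }
    }
    return comparisons;
}

var summaries = [
    {configLetter: 'A', measuredValues: [1, 2, 3]},
    {configLetter: 'B', measuredValues: [2, 3, 4]},
    {configLetter: 'C', measuredValues: [7, 8, 9]}
];
// The comparison callback here is a placeholder that always reports the same range.
var comparisons = pairwiseComparisons(summaries, function () { return describeRange([0.8, 0.9]); });
console.log(comparisons.map(function (c) { return c.label; }).join(', ')); // A / B, A / C, B / C
```

<p>For n configurations the loop produces n(n - 1)/2 rows, so the comparison table grows quadratically with the number of root configurations.</p>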
<a id="trunkWebsitesperfwebkitorgpublicv2indexhtml"></a>
<div class="modfile"><h4>Modified: trunk/Websites/perf.webkit.org/public/v2/index.html (182586 => 182587)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Websites/perf.webkit.org/public/v2/index.html        2015-04-09 04:47:24 UTC (rev 182586)
+++ trunk/Websites/perf.webkit.org/public/v2/index.html        2015-04-09 04:58:19 UTC (rev 182587)
</span><span class="lines">@@ -634,7 +634,7 @@
</span><span class="cx"> {{partial "testGroupRow"}}
</span><span class="cx"> {{/with}}
</span><span class="cx"> &lt;/tr&gt;
</span><del>- {{#each items}}
</del><ins>+ {{#each requests}}
</ins><span class="cx"> &lt;tr class="request"&gt;
</span><span class="cx"> {{#with ../this}}
</span><span class="cx"> &lt;td class="config-letter" {{action toggleShowRequestList this}}&gt;&lt;/td&gt;
</span><span class="lines">@@ -645,6 +645,19 @@
</span><span class="cx"> {{/each}}
</span><span class="cx"> &lt;/tbody&gt;
</span><span class="cx"> {{/each}}
</span><ins>+ {{#each comparisons}}
+ &lt;tbody&gt;
+ &lt;tr&gt;
+ &lt;td colspan="2"&gt;{{label}}&lt;/td&gt;
+ {{#with ../this}}
+ {{#each repositories}}
+ &lt;td&gt;&lt;/td&gt;
+ {{/each}}
+ {{/with}}
+ &lt;td colspan="2"&gt;{{result}}&lt;/td&gt;
+ &lt;/tr&gt;
+ &lt;/tbody&gt;
+ {{/each}}
</ins><span class="cx"> &lt;/table&gt;
</span><span class="cx"> &lt;div class="reference-chart"&gt;
</span><span class="cx"> {{#if referenceChart}}
</span></span></pre></div>
<a id="trunkWebsitesperfwebkitorgpublicv2jsstatisticsjs"></a>
<div class="modfile"><h4>Modified: trunk/Websites/perf.webkit.org/public/v2/js/statistics.js (182586 => 182587)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Websites/perf.webkit.org/public/v2/js/statistics.js        2015-04-09 04:47:24 UTC (rev 182586)
+++ trunk/Websites/perf.webkit.org/public/v2/js/statistics.js        2015-04-09 04:58:19 UTC (rev 182587)
</span><span class="lines">@@ -26,21 +26,21 @@
</span><span class="cx">
</span><span class="cx"> this.supportedConfidenceIntervalProbabilities = function () {
</span><span class="cx"> var supportedProbabilities = [];
</span><del>- for (var quantile in tDistributionQuantiles)
- supportedProbabilities.push((1 - (1 - quantile) * 2).toFixed(2));
</del><ins>+ for (var probability in tDistributionByOneSidedProbability)
+ supportedProbabilities.push(oneSidedToTwoSidedProbability(probability).toFixed(2));
</ins><span class="cx"> return supportedProbabilities
</span><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> // Computes the delta d s.t. (mean - d, mean + d) is the confidence interval with the specified probability in O(1).
</span><span class="cx"> this.confidenceIntervalDelta = function (probability, numberOfSamples, sum, squareSum) {
</span><del>- var quantile = (1 - (1 - probability) / 2);
- if (!(quantile in tDistributionQuantiles)) {
</del><ins>+ var oneSidedProbability = twoSidedToOneSidedProbability(probability);
+ if (!(oneSidedProbability in tDistributionByOneSidedProbability)) {
</ins><span class="cx"> throw 'We only support ' + this.supportedConfidenceIntervalProbabilities().map(function (probability)
</span><span class="cx"> { return probability * 100 + '%'; } ).join(', ') + ' confidence intervals.';
</span><span class="cx"> }
</span><span class="cx"> if (numberOfSamples - 2 &lt; 0)
</span><span class="cx"> return NaN;
</span><del>- var deltas = tDistributionQuantiles[quantile];
</del><ins>+ var deltas = tDistributionByOneSidedProbability[oneSidedProbability];
</ins><span class="cx"> var degreesOfFreedom = numberOfSamples - 1;
</span><span class="cx"> if (degreesOfFreedom > deltas.length)
</span><span class="cx"> throw 'We only support up to ' + deltas.length + ' degrees of freedom';
</span><span class="lines">@@ -61,6 +61,25 @@
</span><span class="cx"> return this.computeWelchsT(values1, 0, values1.length, values2, 0, values2.length, probability).significantlyDifferent;
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+ this.probabilityRangeForWelchsT = function (values1, values2) {
+ var result = this.computeWelchsT(values1, 0, values1.length, values2, 0, values2.length);
+ if (isNaN(result.t) || isNaN(result.degreesOfFreedom))
+ return {t: NaN, degreesOfFreedom:NaN, range: [null, null]};
+
+ var lowerBound = null;
+ var upperBound = null;
+ for (var probability in tDistributionByOneSidedProbability) {
+ var twoSidedProbability = oneSidedToTwoSidedProbability(probability);
+ if (result.t > tDistributionByOneSidedProbability[probability][Math.round(result.degreesOfFreedom - 1)])
+ lowerBound = twoSidedProbability;
+ else if (lowerBound) {
+ upperBound = twoSidedProbability;
+ break;
+ }
+ }
+ return {t: result.t, degreesOfFreedom: result.degreesOfFreedom, range: [lowerBound, upperBound]};
+ }
+
</ins><span class="cx"> this.computeWelchsT = function (values1, startIndex1, length1, values2, startIndex2, length2, probability) {
</span><span class="cx"> var stat1 = sampleMeanAndVarianceForValues(values1, startIndex1, length1);
</span><span class="cx"> var stat2 = sampleMeanAndVarianceForValues(values2, startIndex2, length2);
</span><span class="lines">@@ -71,10 +90,11 @@
</span><span class="cx"> var degreesOfFreedom = sumOfSampleVarianceOverSampleSize * sumOfSampleVarianceOverSampleSize
</span><span class="cx"> / (stat1.variance * stat1.variance / stat1.size / stat1.size / stat1.degreesOfFreedom
</span><span class="cx"> + stat2.variance * stat2.variance / stat2.size / stat2.size / stat2.degreesOfFreedom);
</span><ins>+ var minT = tDistributionByOneSidedProbability[twoSidedToOneSidedProbability(probability || 0.8)][Math.round(degreesOfFreedom - 1)];
</ins><span class="cx"> return {
</span><span class="cx"> t: t,
</span><span class="cx"> degreesOfFreedom: degreesOfFreedom,
</span><del>- significantlyDifferent: t > tDistributionQuantiles[probability || 0.9][Math.round(degreesOfFreedom - 1)],
</del><ins>+ significantlyDifferent: t > minT,
</ins><span class="cx"> };
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -118,8 +138,7 @@
</span><span class="cx"> recursivelySplitIntoTwoSegmentsAtMaxTIfSignificantlyDifferent(values, startIndex + argTMax, length - argTMax, minLength, segments);
</span><span class="cx"> }
</span><span class="cx">
</span><del>- // One-sided t-distribution.
- var tDistributionQuantiles = {
</del><ins>+ var tDistributionByOneSidedProbability = {
</ins><span class="cx"> 0.9: [
</span><span class="cx"> 3.077684, 1.885618, 1.637744, 1.533206, 1.475884, 1.439756, 1.414924, 1.396815, 1.383029, 1.372184,
</span><span class="cx"> 1.363430, 1.356217, 1.350171, 1.345030, 1.340606, 1.336757, 1.333379, 1.330391, 1.327728, 1.325341,
</span><span class="lines">@@ -169,6 +188,8 @@
</span><span class="cx"> 2.373270, 2.372687, 2.372119, 2.371564, 2.371022, 2.370493, 2.369977, 2.369472, 2.368979, 2.368497,
</span><span class="cx"> 2.368026, 2.367566, 2.367115, 2.366674, 2.366243, 2.365821, 2.365407, 2.365002, 2.364606, 2.364217]
</span><span class="cx"> };
</span><ins>+ function oneSidedToTwoSidedProbability(probability) { return 2 * probability - 1; }
+ function twoSidedToOneSidedProbability(probability) { return (1 - (1 - probability) / 2); }
</ins><span class="cx">
</span><span class="cx"> this.MovingAverageStrategies = [
</span><span class="cx"> {
</span><span class="lines">@@ -501,7 +522,7 @@
</span><span class="cx"> var results = new Array(values.length);
</span><span class="cx"> var p = false;
</span><span class="cx"> for (var i = 20; i &lt; values.length - 5; i++)
</span><del>- results[i] = Statistics.testWelchsT(values.slice(i - 20, i), values.slice(i, i + 5), 0.99) ? 5 : 0;
</del><ins>+ results[i] = Statistics.testWelchsT(values.slice(i - 20, i), values.slice(i, i + 5), 0.98) ? 5 : 0;
</ins><span class="cx"> return results;
</span><span class="cx"> }
</span><span class="cx"> },
</span></span></pre>
</div>
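<p>The probability conversions and the bracket search added to statistics.js can be tried in isolation with the sketch below. The two conversion functions are copied from the diff. The table <code>criticalValues</code> and the function <code>probabilityRange</code> are hypothetical stand-ins: the table holds standard t-distribution critical values for only 1 to 3 degrees of freedom, whereas the real table is much longer and its full set of probabilities is not shown in this diff.</p>

```javascript
function oneSidedToTwoSidedProbability(probability) { return 2 * probability - 1; }
function twoSidedToOneSidedProbability(probability) { return 1 - (1 - probability) / 2; }

// Hypothetical, truncated table keyed by one-sided probability.
// Each array is indexed by (degrees of freedom - 1).
var criticalValues = {
    0.9: [3.077684, 1.885618, 1.637744],
    0.95: [6.313752, 2.919986, 2.353363],
    0.975: [12.706205, 4.302653, 3.182446]
};

// Returns [lowerBound, upperBound] as two-sided probabilities.
// lowerBound is the largest tabulated probability whose critical value t exceeds;
// upperBound is the next tabulated probability, or null if t exceeds every entry.
// Relies on the keys being listed in increasing order of probability.
function probabilityRange(t, degreesOfFreedom) {
    var lowerBound = null;
    var upperBound = null;
    for (var key in criticalValues) {
        var twoSided = oneSidedToTwoSidedProbability(Number(key));
        if (t > criticalValues[key][Math.round(degreesOfFreedom - 1)])
            lowerBound = twoSided;
        else if (lowerBound) {
            upperBound = twoSided;
            break;
        }
    }
    return [lowerBound, upperBound];
}

console.log(probabilityRange(2.0, 3));  // between the 80% and 90% two-sided levels
console.log(probabilityRange(1.0, 3));  // [null, null]: below every tabulated level
console.log(probabilityRange(10.0, 3)); // lower bound only: above every tabulated level
```

<p>The same conversion explains the change from 0.99 to 0.98 in MovingAverageStrategies: a one-sided probability of 0.99 corresponds to a two-sided probability of 2 * 0.99 - 1 = 0.98.</p>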
</div>
</body>
</html>