[Webkit-unassigned] [Bug 43642] New: SunSpider confidence intervals are questionable

bugzilla-daemon at webkit.org
Fri Aug 6 14:18:34 PDT 2010


https://bugs.webkit.org/show_bug.cgi?id=43642

           Summary: SunSpider confidence intervals are questionable
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: All
        OS/Version: All
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P2
         Component: Tools / Tests
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: dmandelin at mozilla.com
                CC: pbiggar at mozilla.com


Most SunSpider users I have talked to take the confidence intervals with a grain of salt, especially the confidence metrics for comparing two runs. In particular, there are a lot of false positives: way more than 5% of things marked "95% significant" are not in fact real differences. 

I looked into this for a while, and I saw that the comparison script uses the t-test, which is of course the standard significance test for the difference of two sample means taken from normally distributed data sets. I did some simulations and simple normality tests, which show that SunSpider scores for individual benchmarks are not normally distributed. There seem to be three main deviations from normality:

1. The scores are integral numbers of milliseconds. This makes the data look very non-normal for short-running tests especially.

2. The differences from the mean are not symmetrical: there are bigger outliers on the high side than on the low side. Related to this is the fact that the range of the scores stops at 0, rather than going down to negative infinity.

3. The tails seem to be much fatter than those of a normal distribution.
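
The three deviations above can be checked on a set of timings with a few summary statistics. The following is only a minimal sketch, not the simulations or normality tests mentioned in this report: the function name `moments` and the sample timings are hypothetical, and skewness and excess kurtosis are just one simple way to look at asymmetry and tail weight.

```python
def moments(samples):
    """Return (mean, skewness, excess kurtosis) for a list of timings.

    Skewness > 0 suggests bigger outliers on the high side (deviation 2).
    Excess kurtosis > 0 suggests tails fatter than a normal's (deviation 3).
    """
    n = len(samples)
    mean = sum(samples) / n
    # Central moments, population form (divide by n).
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    if m2 == 0:
        # All samples identical: shape is undefined, report zeros.
        return mean, 0.0, 0.0
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, skewness, excess_kurtosis


# Hypothetical integer-millisecond timings for one short-running test.
timings = [3, 3, 3, 4, 3, 3, 3, 3, 9, 3]
mean, skewness, excess_kurtosis = moments(timings)
print("mean:", mean)
print("skewness:", skewness)
print("excess kurtosis:", excess_kurtosis)
# Few distinct values relative to the sample size hints at deviation 1.
print("distinct values:", len(set(timings)), "of", len(timings))
```

For normally distributed data both skewness and excess kurtosis would be close to 0, and nearly every sample would be a distinct value.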

I'm not sure exactly what should be done about this. I think it would be possible to pick a distribution that better fits the benchmark scores, and compute the confidence intervals with that instead. As a rough alternative, I did some simulations that suggested that multiplying the low range of the confidence interval (the part below the mean) by 1.5 and the high range (the part above the mean) by 2 gave something closer to 95% significance.
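
One possible reading of that adjustment, as a sketch only: take the usual t-based half-width and stretch it by 1.5 below the mean and by 2 above it. The function `asymmetric_interval`, its parameters, and the sample timings are hypothetical and are not part of the SunSpider scripts. The critical value is passed in by the caller; 2.262 is the two-sided 95% value for 9 degrees of freedom (10 samples).

```python
import math
import statistics


def asymmetric_interval(samples, t_crit, low_factor=1.5, high_factor=2.0):
    """Return (low, mean, high) for a widened, asymmetric interval.

    t_crit is the two-sided t critical value for len(samples) - 1
    degrees of freedom. The 1.5 and 2 defaults are the factors
    suggested in the report; applying them to the half-width is an
    assumption about what "low range" and "high range" mean.
    """
    n = len(samples)
    mean = statistics.mean(samples)
    # Standard symmetric half-width: t * s / sqrt(n).
    half_width = t_crit * statistics.stdev(samples) / math.sqrt(n)
    low = mean - low_factor * half_width
    high = mean + high_factor * half_width
    return low, mean, high


# Hypothetical timings in ms from 10 runs of one test.
timings = [31, 30, 32, 30, 31, 38, 30, 31, 30, 33]
low, mean, high = asymmetric_interval(timings, t_crit=2.262)
print("mean:", mean)
print("interval:", low, "to", high)
```

The result is deliberately lopsided, with more room above the mean than below it, matching the observation that outliers are larger on the high side.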

