On Tue, Oct 19, 2010 at 8:42 AM, Alexey Proskuryakov <ap@webkit.org> wrote:
On 15.10.2010, at 07:39, Eric Seidel wrote:
BTW, the commit-queue has started complaining publicly about flaky tests:
https://bugs.webkit.org/show_bug.cgi?id=47698#c5
Hopefully this will bring further awareness to the issue.
I find this extremely annoying and offensive. Half of my bugmail is already about bugs that I'm not interested in.
Sorry Alexey, I certainly didn't intend to offend you. The problem we're trying to solve is that there is currently no feedback loop for authors of flaky tests. If someone writes a flaky test, there's no mechanism for them to find out about it. It just sticks around and causes pain for everyone else. The idea behind this change is to create a feedback loop whereby authors of flaky tests can discover that their tests are flaky.

Looking back at the history since this feature was enabled, it looks like you were CCed on 3 of the 4 bugs that encountered flaky tests. Here are the tests that flaked out:

1x http://trac.webkit.org/browser/trunk/LayoutTests/http/tests/appcache/404-man...
2x http://trac.webkit.org/browser/trunk/LayoutTests/http/tests/appcache/insert-...

According to SVN, you did write both of these tests, so the tool is accurately computing the author.

This is triggering more often than we expected. I'm not sure whether that's a statistical aberration. Here's how we calculated how much traffic this tool would generate:

According to webkit-patch find-flaky-tests, the flakiest test fails about 7 times per 2000 revisions, which means it fails for about 0.35% of test runs. The commit-queue lands about 30 patches per day, so that means the author of the flakiest test should get CCed on about one bug every ten days. Also, these bugs are close to the end of their lifecycle (because their patch is about to land), so they shouldn't generate more than 3 or 4 emails each. That boils down to about one or two emails per week for the flakiest test.

Now, that calculation is a very rough approximation, and we might have missed some important factors. We're certainly open to other suggestions for how to close the loop on flaky tests if this approach generates too much email.

Adam
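
The traffic estimate in the message above can be reproduced with a few lines of arithmetic. This is only a rough sketch: the function and field names are hypothetical, and it assumes each landed patch corresponds to one independent test run, which the message does not state explicitly. With the quoted inputs it gives a CC roughly every nine to ten days and somewhere between two and three emails per week, the same ballpark as the rough figure quoted in the message.

```python
from dataclasses import dataclass


@dataclass
class TrafficEstimate:
    failure_rate: float          # fraction of test runs in which the test flakes
    ccs_per_day: float           # expected bugs per day the author is CCed on
    days_between_ccs: float      # expected days between two CCs
    emails_per_week_low: float   # lower bound, using the low emails-per-bug figure
    emails_per_week_high: float  # upper bound, using the high emails-per-bug figure


def estimate_traffic(failures, revisions, patches_per_day,
                     emails_per_bug_low, emails_per_bug_high):
    """Back-of-the-envelope bugmail estimate for the author of one flaky test.

    Assumption (not from the original message): every landed patch is one
    independent test run with the same failure probability.
    """
    failure_rate = failures / revisions
    ccs_per_day = failure_rate * patches_per_day
    ccs_per_week = ccs_per_day * 7
    return TrafficEstimate(
        failure_rate=failure_rate,
        ccs_per_day=ccs_per_day,
        days_between_ccs=1 / ccs_per_day,
        emails_per_week_low=ccs_per_week * emails_per_bug_low,
        emails_per_week_high=ccs_per_week * emails_per_bug_high,
    )


# Inputs quoted in the message: 7 failures per 2000 revisions,
# about 30 patches landed per day, 3 or 4 emails per bug.
estimate = estimate_traffic(7, 2000, 30, 3, 4)
print(f"failure rate: {estimate.failure_rate:.2%}")
print(f"days between CCs: {estimate.days_between_ccs:.1f}")
print(f"emails per week: {estimate.emails_per_week_low:.1f} "
      f"to {estimate.emails_per_week_high:.1f}")
```

Changing any one input (for example, a higher commit-queue throughput) shows how sensitive the estimate is, which may help explain why the tool triggered more often than expected.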