Closing the loop on flaky tests (was Re: Flaky test hit list)
On Tue, Oct 19, 2010 at 8:42 AM, Alexey Proskuryakov <ap@webkit.org> wrote:
On 15.10.2010, at 07:39, Eric Seidel wrote:
BTW, the commit-queue has started complaining publicly about flaky tests:
https://bugs.webkit.org/show_bug.cgi?id=47698#c5
Hopefully this will bring further awareness to the issue.
I find this extremely annoying and offensive. Half of my bugmail is already about bugs that I'm not interested in.
Sorry Alexey, I certainly didn't intend to offend you. The problem we're trying to solve is that there is currently no feedback loop for authors of flaky tests. If someone writes a flaky test, there's no mechanism for them to find out about it. It just sticks around and causes pain for everyone else. The idea behind this change is to create a feedback loop whereby authors of flaky tests can discover that their tests are flaky.
Looking back at the history since this feature was enabled, it looks like you were CCed on 3 of the 4 bugs that encountered flaky tests. Here are the tests that flaked out:
1x http://trac.webkit.org/browser/trunk/LayoutTests/http/tests/appcache/404-man...
2x http://trac.webkit.org/browser/trunk/LayoutTests/http/tests/appcache/insert-...
According to SVN, you did write both of these tests, so the tool is accurately computing the author.
This is triggering more often than we expected. I'm not sure whether that's a statistical aberration. Here's how we calculated how much traffic this tool would generate:
According to webkit-patch find-flaky-tests, the flakiest test fails about 7 times per 2000 revisions, which means it fails for 0.3% of test runs. The commit-queue lands about 30 patches per day, so that means the author of the flakiest test should get CCed on about one bug every ten days. Also, these bugs are close to the end of their lifecycle (because their patch is about to land), so they shouldn't generate more than 3 or 4 emails each. That boils down to about one or two emails per week for the flakiest test.
Now, that calculation is a very rough approximation, and we might have missed some important factors. We're certainly open to other suggestions for how to close the loop on flaky tests if this approach generates too much email.
Adam
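The traffic estimate above can be reproduced with a short calculation. This is only a sketch that plugs in the figures quoted in the message (7 failures per 2000 revisions, about 30 patches per day, 3 or 4 emails per bug); the function and variable names are illustrative and do not come from webkit-patch. It assumes every landed patch triggers one test run.

```python
def estimate_traffic(failures, revisions, patches_per_day, emails_per_bug):
    """Estimate CC traffic for the author of a single flaky test.

    Assumes one test run per landed patch (an assumption, not a measured fact).
    """
    failure_rate = failures / revisions            # chance a given run flakes
    ccs_per_day = failure_rate * patches_per_day   # expected bugs CCed per day
    days_between_ccs = 1 / ccs_per_day
    # Expected emails per week for the low and high emails-per-bug figures.
    emails_per_week = tuple(n * ccs_per_day * 7 for n in emails_per_bug)
    return failure_rate, days_between_ccs, emails_per_week


rate, days, weekly = estimate_traffic(7, 2000, 30, (3, 4))
print(f"failure rate: {rate:.2%}")
print(f"one CC every {days:.1f} days")
print(f"emails per week: {weekly[0]:.1f} to {weekly[1]:.1f}")
```

With the unrounded failure rate, the weekly figure lands a little above two emails, which is the same order of magnitude as the rough estimate in the message.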
On 19.10.2010, at 11:16, Adam Barth wrote:
Also, these bugs are close to the end of their lifecycle (because their patch is about to land), so they shouldn't generate more than 3 or 4 emails each. That boils down to about one or two emails per week for the flakiest test.
One e-mail (per week?) would perhaps make sense, even though "flaky test" is sometimes "flaky code", so the blame becomes misplaced. Getting 3-4 automated e-mails per bug seems over the board.
I agree that raising awareness of which tests or code areas are flaky seems useful. One problem I personally had was with digging up data on flakiness. The link for a dashboard that I found was <http://test-results.appspot.com/dashboards/flakiness_dashboard.html> - the URL was freezing my browser for several minutes on each move, and I couldn't make sense of what it was telling me UI-wise quickly enough. I'm not even sure how it's related to flakiness seen by commit queue, as it seems to be about chromium.
Is there a better data source that I missed?
- WBR, Alexey Proskuryakov
On Tue, Oct 19, 2010 at 11:44 AM, Alexey Proskuryakov <ap@webkit.org> wrote:
I agree that raising awareness of which tests or code areas are flaky seems useful. One problem I personally had was with digging up data on flakiness. The link for a dashboard that I found was < http://test-results.appspot.com/dashboards/flakiness_dashboard.html> - the URL was freezing my browser for several minutes on each move, and I couldn't make sense of what it was telling me UI-wise quickly enough. I'm not even sure how it's related to flakiness seen by commit queue, as it seems to be about chromium.
That dashboard currently only supports the Chromium bots. If other bots successfully switch over to new-run-webkit-tests, we'll be able to easily add them to that dashboard. The freezing issue is a recent one I plan on looking into soon. WebKit is ridiculously slow at rendering this HTML for some reason (it's a single large table).
The UI is very dense and confusing, but it gives you quite a bit of useful information. Here's some limited documentation on making sense of the dashboard UI: http://sites.google.com/a/chromium.org/dev/developers/testing/flakiness-dash...
Ojan
webkit-patch find-flaky-tests can also show you what tests are recently flaky, but it's not as nice as the dashboard.
-eric
On Tue, Oct 19, 2010 at 11:44 AM, Alexey Proskuryakov <ap@webkit.org> wrote:
One e-mail (per week?) would perhaps make sense, even though "flaky test" is sometimes "flaky code", so the blame becomes misplaced. Getting 3-4 automated e-mails per bug seems over the board.
Maybe the thing to do is CC the author of the flaky test for the one bug comment and then immediately unCC them. That way they don't see the rest of the traffic on the bug.
Adam
On 19.10.2010, at 12:33, Adam Barth wrote:
Maybe the thing to do is CC the author of the flaky test for the one bug comment and then immediately unCC them. That way they don't see the rest of the traffic on the bug.
That would still be two e-mails about a bug the person otherwise doesn't want to know about. I don't think that CC'ing is the right approach.
- WBR, Alexey Proskuryakov
On Tue, Oct 19, 2010 at 12:41 PM, Alexey Proskuryakov <ap@webkit.org> wrote:
That would still be two e-mails about a bug the person otherwise doesn't want to know about. I don't think that CC'ing is the right approach.
Do you see changes to bugs when you get removed from the CC? Do you have another suggestion for how to provide feedback to authors of flaky tests?
Adam
On Tue, Oct 19, 2010 at 1:30 PM, Adam Barth <abarth@webkit.org> wrote:
Do you see changes to bugs when you get removed from the CC? Do you have another suggestion for how to provide feedback to authors of flaky tests?
Email the author directly? Doesn't need to go through bugs.webkit.org, does it?
On Oct 19, 2010, at 1:30 PM, Adam Barth wrote:
Do you see changes to bugs when you get removed from the CC? Do you have another suggestion for how to provide feedback to authors of flaky tests?
It looks like the bot is adding a comment to the bug with the patch that was being processed when flakiness was detected, not the one that originally landed the tests believed to be flaky. Is that right? If so, that doesn't seem like a great way to notify the author of the original test. It seems like it would be better to comment in the bug that added the test.
To be fair, it's also possible that the new patch caused the flakiness, so a separate comment there could be useful. Perhaps it would be useful to determine if the test in question has a track record of flakiness. If not, then maybe the presumption should be that the patch is the problem, not the test. On the other hand, if the test has always been flaky, then the new patch probably has nothing to do with it.
Regards,
Maciej
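The track-record idea above could be sketched roughly as follows. This is a hypothetical illustration, not commit-queue code: the FlakeHistory type, the likely_culprit function, and both thresholds (min_runs, flaky_threshold) are assumptions chosen only to make the reasoning concrete.

```python
from dataclasses import dataclass


@dataclass
class FlakeHistory:
    test: str
    past_runs: int     # runs observed before the current patch was tried
    past_flakes: int   # how many of those earlier runs flaked


def likely_culprit(history, min_runs=50, flaky_threshold=0.001):
    """Guess whether a flake points at the test or at the patch under test.

    Both thresholds are hypothetical and would need tuning against real data.
    """
    if history.past_runs < min_runs:
        return "unknown"   # too little history to presume anything
    past_rate = history.past_flakes / history.past_runs
    # A test with a track record of flaking is presumed to be the problem;
    # a test with a clean record shifts suspicion onto the new patch.
    return "test" if past_rate >= flaky_threshold else "patch"


print(likely_culprit(FlakeHistory("always-flaky.html", 2000, 7)))
print(likely_culprit(FlakeHistory("clean-until-now.html", 2000, 0)))
```

A result of "unknown" leaves room for commenting on both bugs when there is not enough history to decide.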
On Tue, Oct 19, 2010 at 1:45 PM, Maciej Stachowiak <mjs@apple.com> wrote:
It looks like the bot is adding a comment to the bug with the patch that was being processed when flakiness was detected, not the one that originally landed the tests believed to be flaky. Is that right? If so, that doesn't seem like a great way to notify the author of the original test. It seems like it would be better to comment in the bug that added the test. To be fair, it's also possible that the new patch caused the flakiness, so a separate comment there could be useful. Perhaps it would be useful to determine if the test in question has a track record of flakiness. If not, then maybe the presumption should be that the patch is the problem, not the test. On the other hand, if the test has always been flaky, then the new patch probably has nothing to do with it.
Another option is to file a new bug about the flakiness and ping that bug when we observe the test flake out.
Adam
On Tue, Oct 19, 2010 at 1:51 PM, Adam Barth <abarth@webkit.org> wrote:
Another option is to file a new bug about the flakiness and ping that bug when we observe the test flake out.
I've considered this before. We'd have to write a bit of bugzilla.py code to make this work though. :)
That's probably the best long term solution. We could then add links to these bugs in our "sorry we're slow, tests are flaky" messages too.
-eric
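The "one bug per flaky test" idea could look roughly like the sketch below. It uses a hypothetical in-memory tracker rather than the real bugzilla.py code, whose API is not shown in this thread; InMemoryTracker, FlakyTestReporter, and their methods are illustrative names only.

```python
class InMemoryTracker:
    """Hypothetical stand-in for a bug tracker; not the bugzilla.py API."""

    def __init__(self):
        self.bugs = {}        # bug id -> list of comments (title first)
        self._next_id = 1

    def file_bug(self, title):
        bug_id = self._next_id
        self._next_id += 1
        self.bugs[bug_id] = [title]
        return bug_id

    def add_comment(self, bug_id, text):
        self.bugs[bug_id].append(text)


class FlakyTestReporter:
    """Files one bug per flaky test and pings it on every later flake."""

    def __init__(self, tracker):
        self.tracker = tracker
        self.bug_for_test = {}   # test path -> bug id

    def report(self, test_path, attachment_id):
        bug_id = self.bug_for_test.get(test_path)
        if bug_id is None:
            # First sighting: file a dedicated bug for this test.
            bug_id = self.tracker.file_bug(f"Flaky test: {test_path}")
            self.bug_for_test[test_path] = bug_id
        self.tracker.add_comment(
            bug_id, f"Flaked while processing attachment {attachment_id}")
        return bug_id
```

Because each test maps to a single bug, the test's author would only hear about flakes on that one bug, and the returned bug id could be linked from the commit-queue's "tests are flaky" messages.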
I'm still getting CC'ed by commit queue. Any objections to removing Bugzilla editbugs privilege from commit-queue until this is resolved?
--- Comment #10 from WebKit Commit Bot <commit-queue@webkit.org> 2010-10-20 17:01:11 PST ---
The commit-queue encountered the following flaky tests while processing attachment 71284:
transitions/transition-end-event-transform.html
http/tests/appcache/fail-on-update-2.html
Please file bugs against the tests. The author(s) of the test(s) have been CCed on this bug. The commit-queue is continuing to process your patch.
- WBR, Alexey Proskuryakov
I'll take care of it tonight.
Adam
Thank you Adam. We should look into more options here, including some of the ones Maciej proposed. Alternatively, we could have just skipped the tests which are flaking so much. Alexey seemed to get CC'd on nearly every flake because they seemed to mostly be tests he wrote. :(
-eric
participants (5)
- Adam Barth
- Alexey Proskuryakov
- Eric Seidel
- Maciej Stachowiak
- Ojan Vafai