[webkit-dev] Requesting feedback about EWS comments on Bugzilla bugs
jbedard at apple.com
Wed Jun 19 08:14:14 PDT 2019
To elaborate a little on Aakash’s comments here:
We’ve found that tracking results is not a trivial problem, given the number of tests we have, the number of configurations we test, and the number of weekly commits to the WebKit project. The naive SQL approach to this problem isn’t performant. Over the last year, I’ve developed a solution to this problem, and we’ve been reporting a subset of results to it for some time (<> is when the reporting started).
The plan is to commit this to WebKit in the next few weeks, then, as Aakash mentioned, have EWS query this service to determine whether a specific test is failing more generally, or if the failure is specific to the engineer’s patch.
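To make that last step concrete, here is a rough sketch of the kind of check EWS could run against recorded results. It is only an illustration: the `Result` record, the `classify_failure` function, and the thresholds are all hypothetical and are not the interface of the service described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    # Hypothetical record of one test run on trunk (not the real schema).
    test: str
    configuration: str
    commit: int
    passed: bool


def classify_failure(test, configuration, history,
                     min_samples=5, failing_rate=0.9, flaky_rate=0.1):
    """Guess whether a failure seen by EWS is specific to the patch.

    `history` holds recent trunk results. All thresholds are assumptions
    chosen for illustration only.
    """
    relevant = [r for r in history
                if r.test == test and r.configuration == configuration]
    if len(relevant) < min_samples:
        # Not enough data to say anything about this test.
        return "unknown"

    failure_rate = sum(1 for r in relevant if not r.passed) / len(relevant)
    if failure_rate >= failing_rate:
        return "failing-on-trunk"
    if failure_rate >= flaky_rate:
        return "flaky-on-trunk"
    # The test passes reliably on trunk, so the patch is the likely cause.
    return "patch-specific"
```

With something like this, EWS would only blame the patch when the test passes reliably on trunk for the same configuration, and would fall back to its current retry behaviour when the answer is "unknown".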
Keith’s suggestion to add a ‘frictionless process in place to address test results’ points to a larger discussion I’ve been having with Aakash and Ryan about how to handle transient test failures. Right now, we use the same mechanism to handle tests which are not supported on a specific configuration and tests which are failing on a specific configuration. This pollutes our ChangeLogs with thousands of test gardening commits (I grepped for ‘garden’ in our LayoutTests ChangeLogs and found more than 8000 results), is error-prone, and means that knowledge about a failure caused by the configuration may be lost if you’re bisecting. I have a number of ideas for improving this, but none of them are coherent enough to be proposed at the moment. I’m only mentioning this here because it seems like others are thinking about this problem too.
> On Jun 18, 2019, at 2:11 PM, Aakash Jain <aakash_jain at apple.com> wrote:
>> On Jun 17, 2019, at 1:52 PM, Keith Rollin <krollin at apple.com> wrote:
>>> On Jun 16, 2019, at 11:14, Darin Adler <darin at apple.com> wrote:
>>> If we want to augment it, we should think of what we are aiming at. I do find it useful to see which tests are failing, and when I click on the red bubble I don’t see that information. I have to click once to see the “log of activities” then click on “results”, then see a confusing giant file with lots of other information. At the bottom of that file is the one thing I want to know.
>> We might want to also start turning those failures into action items. We could have an automatic mechanism that gathers the failures, records them in a database, and then — with sufficient data — makes determinations about the flakiness or other status of the test. It could then mark the test as flaky or raise it as an issue to some responsible (and responsive) party.
> Agree. This is the plan. Jonathan Bedard is working on an improved flakiness dashboard. Once we have that, EWS will start using its API to get the test flakiness information. That should significantly reduce EWS's false positives (and also reduce the number of retries EWS has to do while trying to rule out flakiness).
>> We could also have a relatively manual process. The failures are surfaced in Bugzilla or in a Bugzilla-accessible page. The engineer posting the patch could then review the failures and mark them as “Flag as flaky”, “Flag as failing and should be fixed by someone else”, “Flag as failing and should be ignored”, etc. These responses could then be turned into action items for some responsible (and responsive) party to address.
>> As Michael says, there’s a big issue with ignoring test results. Putting a frictionless process in place to address test results would help make them more effective. When I make a change to an Xcode project and Windows builds throw up errors, that’s not something caused by my immediate patch, but I would like to see the flaky test fixed.
>> — Keith