[webkit-dev] [PSA] WebKitGTK layout testers available on the Bugzilla EWS bubbles
Carlos Alberto Lopez Perez
clopez at igalia.com
Fri Dec 24 10:23:55 PST 2021
On 24/12/2021 15:00, Michael Catanzaro via webkit-dev wrote:
> On Fri, Dec 24 2021 at 12:44:49 AM +0000, Carlos Alberto Lopez Perez via
> webkit-dev <webkit-dev at lists.webkit.org> wrote:
>> So we ended deploying a different version of the EWS that has a much
>> higher tolerance to pre-existent failures (up to 500 before exiting
>> early) and also that tries hard to discard pre-existent failures and
>> flakies by repeating each failure 10 times with patch and 10 times
>> without it. 
> Mixed thoughts on this:
> (1) Good job. Having layout tests on EWS is a great improvement. We've
> been talking about this for a long time, and you finally made it happen!
> (2) That you needed to use such a big hammer to make the EWS work
> reliably suggests either that either WebKitGTK quality or WebKit test
> quality is quite low. I'm sure it's a mix of both, but mostly the
> former, because test flakiness is not this severe for Apple ports. This
> is not encouraging.
Sorry, but I don't agree with your conclusion about quality.
So, let me explain in more detail the factors that contribute to this
issue with the tests:
1) Number of unexpected failures on the clean tree
The higher number of unexpected failures on the clean tree is caused
mainly by the following reasons:
1.1) Until now we didn't have an EWS. So it was pretty hard (if not
impossible) for any developer to notice that a patch was going to
break GTK tests. This didn't help to keep breaking patches from landing.
1.2) We don't have a rule to roll back patches that break GTK tests.
If a patch lands adding unexpected failures for GTK, those usually
stay there until one of our gardeners has time to fix the issue
or mark the new failure as expected. Also, having such a rule wouldn't
have made sense before having an EWS that developers can use.
1.3) We don't have anyone working full-time on gardening. We try
to share the effort between us on a best-effort basis. So unexpected
failures, once landed, can remain there for days until they are gardened.
1.4) Patches landing via commit-queue run layout tests on Mac before
landing. So a patch won't land if it breaks layout tests on Mac.
But it will land anyway if it breaks tests on GTK.
2) Number of unexpected flaky tests
2.1) It is true that we do have a higher number of flaky tests compared
to Apple ports. But the flakiness issue is also a problem there.
It is not unusual to see the standard EWS giving false positives due
to some test being flaky.
2.2) I'm not sure if our higher number of flaky tests is caused by
some issue in the code of the port or if it is just that we don't have
enough manpower to stay on top of flaky tests on a daily basis and
mark each flaky test as soon as it is detected.
And regarding quality or test quality:
3) Having the results of the layout tests "green" is not a synonym for quality.
Layout tests giving a "green" or "red" result is not about passing or failing
the tests, it is just about giving the "expected" result (which can be a failure).
A port can have a lot of failures marked as "expected failure" or a lot
of flaky tests marked as "expected flaky" and be more green than
another port that has fewer failures or fewer flaky tests but not marked.
If you want to compare the quality of the ports, then maybe something like
wpt.fyi can be more useful than WebKit layout tests, because tests there
can't be "expected failures". So a test will only be green if it passes.
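To illustrate point 3, here is a minimal sketch in Python. It is not the
actual WebKit tooling: the test names, the two ports and both helper
functions are hypothetical, and the only idea taken from the text above is
that a result is "green" when the actual outcome matches an expected one.

```python
from typing import Dict, List, Set, Tuple

# Map of test name -> (set of expected outcomes, actual outcome).
Results = Dict[str, Tuple[Set[str], str]]


def unexpected_results(results: Results) -> List[str]:
    """Tests that make a run "red": the actual outcome was not expected."""
    return sorted(name for name, (expected, actual) in results.items()
                  if actual not in expected)


def real_failures(results: Results) -> List[str]:
    """Tests that did not pass, whether or not that was expected."""
    return sorted(name for name, (_, actual) in results.items()
                  if actual != "PASS")


# Hypothetical port A: two real failures, both marked as expected.
port_a: Results = {
    "fast/a.html": ({"FAIL"}, "FAIL"),
    "fast/b.html": ({"FAIL"}, "FAIL"),
    "fast/c.html": ({"PASS"}, "PASS"),
}

# Hypothetical port B: only one real failure, but it is not marked.
port_b: Results = {
    "fast/a.html": ({"PASS"}, "PASS"),
    "fast/b.html": ({"PASS"}, "FAIL"),
    "fast/c.html": ({"PASS"}, "PASS"),
}
```

With this data, port A reports no unexpected results and looks fully green
even though it fails two tests, while port B fails a single test and shows
up as red.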
And looking ahead to improve things:
4) I expect the number of unexpected failures in the clean tree to start
to be more controllable now that we have this EWS working and developers
can be notified in advance of a breaking change before landing.
5) The EWS now also has code to detect flaky tests when it does all those
runs and repeats, and it is sending mails to the bot watchers with the names
of all the flaky tests that it detects. We will be gardening those with
the idea of reducing the number of unexpected flakies.
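As a rough sketch of how repeating a failure with and without the patch can
separate new failures from pre-existent ones and flakies, consider the
Python below. This is an assumption-laden illustration and not the EWS
code: the `classify` function, the `Runner` callbacks and the category
names are all hypothetical. Only the count of 10 repeats with the patch and
10 without comes from the description of the EWS quoted above.

```python
from typing import Callable

REPEATS = 10  # 10 runs with the patch and 10 without, as described above

# Hypothetical callback: runs one test once, returns True if it passes.
Runner = Callable[[str], bool]


def classify(test: str, run_with_patch: Runner, run_without_patch: Runner,
             repeats: int = REPEATS) -> str:
    """Classify a test that failed once on the EWS (illustrative rules)."""
    fails_with = sum(not run_with_patch(test) for _ in range(repeats))
    fails_without = sum(not run_without_patch(test) for _ in range(repeats))

    if fails_without == repeats:
        # Always fails on the clean tree: not caused by the patch.
        return "pre-existing failure"
    if fails_without > 0:
        # Fails only sometimes on the clean tree.
        return "flaky"
    if fails_with == repeats:
        # Always passes without the patch, always fails with it.
        return "new failure"
    if fails_with == 0:
        # The original failure never reproduced.
        return "flaky"
    # Assumed extra category: intermittent failures seen only with the patch.
    return "intermittent with patch only"
```

Only tests classified as "new failure" would count against the patch in
this sketch. The tests classified as "flaky" are the ones a system like
this could report to the bot watchers for gardening.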
> (3) Any plans for WPE?
Yes. We look forward to adding WPE testers as soon as possible. Hopefully it will happen in 2022-Q1.
Best regards and happy holidays!