[webkit-dev] handling failing tests (test_expectations, Skipped files, etc.)

Tue Apr 10 10:00:55 PDT 2012

On Tue, Apr 10, 2012 at 6:10 AM, Stephen Chenney <schenney at chromium.org> wrote:

> There is a significant practical problem to "turn the tree red and work
> with someone to rebaseline the tests". It takes multiple hours for some
> bots to build and test a given patch. That means, at any moment, you will
> have maybe tens and in some cases hundreds of failing tests associated with
> some changelist that you need to track on the bots. You might have more
> failing tests associated with a different changelist, and so on.

But you have to do this for non-Chromium ports anyway, because they don't
use test_expectations.txt, and skipping the tests won't help you generate
new baselines. In my opinion, we should not diverge further from the way
things are done in other ports.

> How do you propose to track which baselines have been computed by which
> bots for which changelists? In the past I have seen failing tests because
> baselines were checked in too soon and some bot had not generated new
> baselines. Things are even worse when a change goes in late in the day. How
> will the pending rebaselines be handed off to the new gardener?
>

You have to stick around as long as it takes to rebaseline or notify
relevant port contributors after your patch lands:

http://www.webkit.org/coding/contributing.html clearly says, under
"Keeping the tree green":

In either case, your responsibility for the patch does not end with the
patch landing in the tree. There may be regressions from your change or
additional feedback from reviewers after the patch has landed. You can
watch the tree at build.webkit.org to make sure your patch builds and
passes tests on all platforms. It is your responsibility to be available
should regressions arise and to respond to additional feedback that happens
after a check-in.

Changes should succeed on all platforms, but it can be difficult to test on
every platform WebKit supports. Be certain that your change does not
introduce new test failures on the high-traffic Mac or Windows ports by
comparing the list of failing tests before and after your change. Your
change must at least compile on all platforms.
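
As a rough illustration of the before/after comparison described above, here
is a minimal Python sketch. The function name and the test paths are
hypothetical, and it assumes you already have the two lists of failing tests
from runs made before and after applying your change; it is not part of any
WebKit tool.

```python
def new_failures(before, after):
    """Return tests that fail after the change but did not fail before it."""
    return sorted(set(after) - set(before))


# Hypothetical failing-test lists from two local runs.
before = ["fast/css/a.html", "svg/text/b.svg"]
after = ["fast/css/a.html", "svg/text/b.svg", "fast/dom/c.html"]

for test in new_failures(before, after):
    print("newly failing:", test)
```

Anything this reports is a failure the change may have introduced and should
be looked at before landing.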

> As the Chromium owner of tests that are frequently broken by third party
> contributors (they do their best to help us, but cannot generate all 15 or
> so Chromium expectations), I would much rather have a bug filed against me
> and a line in the expectations file. At least I then have visibility on
> what's happening and a chance to verify the results. In a recent pass to
> clean up expectations we found multiple cases where the wrong result had
> been committed because the person who did it failed to realize the result
> was wrong. While you might say "fix the test to make it obvious", it is not
> obvious how to do that for all tests.
>

This issue is orthogonal to adding failing expectations prior to landing
your patch. It's about how rebaselining should be done. I agree that
rebaselining a test without understanding what the test is meant to test or
what the correct output is, is bad. But let's not fold that discussion into
this thread.

> Why not simply attach an owner and a resolution date to each expectation?
> The real problem right now is accountability and a way to remind people
> that they have left expectations hanging.
>

That's what WebKit bugs are for. Ossy frequently files a bug and cc's the
patch author when a new test is added or a test starts failing and he
doesn't know whether the new result is correct. He also either skips the
test or rebaselines it as needed, and he reverts patches when a patch
clearly introduced serious regressions (e.g. crashes on hundreds of
tests).

- Ryosuke