[webkit-dev] handling failing tests (test_expectations, Skipped files, etc.)

Tue Apr 10 06:10:45 PDT 2012

There is a significant practical problem with "turn the tree red and work
with someone to rebaseline the tests". It takes multiple hours for some
bots to build and test a given patch. That means that, at any moment, you
may have tens, and in some cases hundreds, of failing tests associated with
one changelist that you need to track on the bots, plus more failing tests
associated with a different changelist, and so on.

How do you propose to track which baselines have been computed by which
bots for which changelists? In the past I have seen failing tests because
baselines were checked in too soon, before some bot had generated its new
baselines. Things are even worse when a change goes in late in the day: how
will the pending rebaselines be handed off to the next gardener?

As the Chromium owner of tests that are frequently broken by third-party
contributors (they do their best to help us, but cannot generate all 15 or
so Chromium expectations), I would much rather have a bug filed against me
and a line in the expectations file. At least then I have visibility into
what's happening and a chance to verify the results. In a recent pass to
clean up expectations, we found multiple cases where the wrong result had
been committed because the person who committed it failed to realize the
result was wrong. While you might say "fix the test to make it obvious", it
is not obvious how to do that for all tests.

Why not simply attach an owner and a resolution date to each expectation?
The real problem right now is accountability and a way to remind people
that they have left expectations hanging.

Cheers,
Stephen.

On Mon, Apr 9, 2012 at 9:19 PM, Julien Chaffraix <jchaffraix at webkit.org> wrote:

> >>> If there's consensus in the meantime that it is better on balance to
> >>> check in suppressions, perhaps we can figure out a better way to do
> >>> that. Maybe (shudder) a second test_expectations file? Or maybe it
> >>> would be better to actually check in suppressions marked as REBASELINE
> >>> (or something like that)?
> >>
> >> That sounds quirky, as it involves maintaining two sets of files.
> >>
> >> From my perspective, saying that we should discard the EWS result and
> >> allow changes into WebKit trunk, knowing they will turn the bots red,
> >> is a bad proposal regardless of how you justify it. In the window
> >> where the bots are red, you can bet people will miss something else
> >> that breaks.
> >>
> >
> > As Ryosuke points out, practically we're already in that situation -
> > from what I can tell, the tree is red virtually all of the time, at
> > least during US/Pacific working hours. It's not clear to me if the EWS
> > has made this better or worse, but perhaps others have noticed a
> > difference. That said, I doubt I like red trees any more than you do
> > :)
>
> I wasn't talking about the tree's status quo here, as it shouldn't
> affect the discussion. Just because the tree is red doesn't mean it's
> the right thing (tm) to throw in the towel and make it more red (even
> if we seem to agree on the badness of that :)).
>
> To add some thoughts here, saying that the tree is red covers several
> states (failing tests, failing builds...), and IMHO the EWS has at least
> helped on the build side. As far as the tests go, a lot of platform
> differences are unfortunately only uncovered on the bots.
>
> Thanks,
> Julien