There is a significant practical problem with "turn the tree red and work with someone to rebaseline the tests": it takes multiple hours for some bots to build and test a given patch. That means that at any moment you may have tens, and in some cases hundreds, of failing tests associated with one changelist that you need to track on the bots, more failing tests associated with a different changelist, and so on.
How do you propose to track which baselines have been computed by which bots for which changelists? In the past I have seen failing tests because baselines were checked in too soon and some bot had not generated new baselines. Things are even worse when a change goes in late in the day. How will the pending rebaselines be handed off to the new gardener?
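To make that bookkeeping concrete, here is a minimal sketch of the state a gardener would have to carry around (and hand off intact to the next gardener). The bot names and the PendingRebaseline structure are invented for illustration; nothing here corresponds to an existing WebKit tool.

    from dataclasses import dataclass, field

    # Invented bot names, purely for illustration.
    ALL_BOTS = {"mac-release", "mac-debug", "win-release", "win-debug", "linux-release"}

    @dataclass
    class PendingRebaseline:
        """Which bots still owe new baselines for one in-flight changelist."""
        changelist: str
        bots_missing: set = field(default_factory=lambda: set(ALL_BOTS))

        def record_baseline(self, bot: str) -> None:
            # Mark that this bot has cycled the change and its baselines were collected.
            self.bots_missing.discard(bot)

        def ready_to_check_in(self) -> bool:
            # Checking baselines in before every bot has cycled is exactly
            # the "checked in too soon" failure mode described above.
            return not self.bots_missing

    # A gardener juggling several changelists ends up maintaining a table like this.
    pending = {
        "r98765": PendingRebaseline("r98765"),
        "r98790": PendingRebaseline("r98790"),
    }
    pending["r98765"].record_baseline("mac-release")
    print({cl: sorted(p.bots_missing) for cl, p in pending.items()})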
In either case, your responsibility for the patch does not end with the patch landing in the tree. There may be regressions from your change or additional feedback from reviewers after the patch has landed. You can watch the tree at build.webkit.org to make sure your patch builds and passes tests on all platforms. It is your responsibility to be available should regressions arise and to respond to additional feedback that happens after a check-in.
Changes should succeed on all platforms, but it can be difficult to test on every platform WebKit supports. Be certain that your change does not introduce new test failures on the high-traffic Mac or Windows ports by comparing the list of failing tests before and after your change. Your change must at least compile on all platforms.
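One simple way to do that comparison is sketched below. It assumes you have saved the failing-test lists (from the bots or from your local test run) to two plain-text files, one test path per line; the file names are placeholders, not part of any WebKit tooling.

    def load_failures(path: str) -> set[str]:
        # One failing test path per line; blank lines are ignored.
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    before = load_failures("failures-before.txt")
    after = load_failures("failures-after.txt")

    new_failures = sorted(after - before)        # regressions introduced by the change
    no_longer_failing = sorted(before - after)   # tests the change happened to fix

    print("New failures:")
    for test in new_failures:
        print("  " + test)
    print("No longer failing:")
    for test in no_longer_failing:
        print("  " + test)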
As the Chromium owner of tests that are frequently broken by third-party contributors (they do their best to help us, but cannot generate all 15 or so Chromium expectations), I would much rather have a bug filed against me and a line in the expectations file. At least then I have visibility into what's happening and a chance to verify the results. In a recent pass to clean up expectations we found multiple cases where the wrong result had been committed because the person who committed it did not realize it was wrong. While you might say "fix the test to make it obvious", it is not obvious how to do that for all tests.
Why not simply attach an owner and a resolution date to each expectation? The real problem right now is a lack of accountability and of any mechanism to remind people that they have left expectations hanging.
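A lightweight version of that could be an owner/expiry annotation on each line plus a script that nags once the date passes. The OWNER/EXPIRES comment syntax below is invented for illustration, the entry lines only approximate the Chromium-style expectations format, and the script is a hypothetical sketch, not an existing tool.

    import re
    from datetime import date

    # Hypothetical annotated entries; the trailing OWNER/EXPIRES comment is an
    # invented convention, not part of the real expectations syntax.
    EXPECTATIONS = """\
    BUGCR12345 WIN LINUX : fast/dom/example-test.html = IMAGE+TEXT  // OWNER=someone@example.org EXPIRES=2011-11-01
    BUGWK67890 MAC : fast/forms/other-test.html = TEXT  // OWNER=someone-else@example.org EXPIRES=2011-12-15
    """

    ANNOTATION = re.compile(r"OWNER=(?P<owner>\S+)\s+EXPIRES=(?P<expires>\d{4}-\d{2}-\d{2})")

    def overdue(expectations: str, today: date):
        """Return (owner, entry) pairs for expectations past their resolution date."""
        stale = []
        for line in expectations.splitlines():
            m = ANNOTATION.search(line)
            if m and date.fromisoformat(m.group("expires")) < today:
                stale.append((m.group("owner"), line.split("//")[0].strip()))
        return stale

    for owner, entry in overdue(EXPECTATIONS, date.today()):
        print("Remind %s about: %s" % (owner, entry))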