[webkit-dev] handling failing tests (test_expectations, Skipped files, etc.)

Tue Apr 10 12:33:50 PDT 2012

I agree with Ojan. It's clear that there are arguments for both
approaches and my initial note did not address all the situations that
come up. I will write up something further and put it on the wiki.

I will also continue mulling over what sorts of changes to the tools
we could make in the short term to improve things.

-- Dirk

On Tue, Apr 10, 2012 at 12:29 PM, Ojan Vafai <ojan at chromium.org> wrote:
> I don't think we can come up with a hard and fast rule given current
> tooling. In a theoretical future world in which it's easy to get expected
> results off the EWS bots (or some other infrastructure), it would be
> reasonable to expect people to incorporate the correct expected results for
> any EWS-having ports before committing the patch. I expect we'd all agree
> that would be better than turning the bots red or adding to
> test_expectations.txt/Skipped files.
>
> In the current world, it's a judgement call. If I expect a patch to need a
> lot of platform-specific baselines, I'll make sure to commit it at a time
> when I have hours to spare to clean up any failures or, if I can't stick
> around for the bots to cycle, I'll add it to test_expectations.txt
> appropriately.
>
> Both approaches have nasty tradeoffs. It is probably worth writing up a wiki
> page outlining these two options and explaining why you might do one or the
> other for people new to the project, but I don't see benefit in trying to
> pick a hard rule that everyone must follow.
>
> Ojan
>
> On Tue, Apr 10, 2012 at 11:58 AM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>
>> On Tue, Apr 10, 2012 at 11:42 AM, Stephen Chenney <schenney at chromium.org>
>> wrote:
>>>
>>> On Tue, Apr 10, 2012 at 1:00 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>>
>>>> On Tue, Apr 10, 2012 at 6:10 AM, Stephen Chenney <schenney at chromium.org>
>>>> wrote:
>>>>>
>>>>> There is a significant practical problem with "turn the tree red and work
>>>>> with someone to rebaseline the tests". It takes multiple hours for some bots
>>>>> to build and test a given patch. That means, at any moment, you will have
>>>>> maybe tens and in some cases hundreds of failing tests associated with some
>>>>> changelist that you need to track on the bots. You might have more failing
>>>>> tests associated with a different changelist, and so on.
>>>>
>>>>
>>>> But you have to do this for non-Chromium ports anyway because they don't
>>>> use test_expectations.txt and skipping the tests won't help you generate new
>>>> baselines. In my opinion, we should not further diverge from the way things
>>>> are done in other ports.
>>>
>>>
>>> How long on average does it take a builder to get through a change on
>>> another port? Right now the Chromium Mac 10.5 and 10.6 dbg builds are
>>> processing a patch from about 3 hours ago. About 20 patches have gone in
>>> since then. For the Mac 10.5 tree to ever be green would require there being
>>> no changes at all requiring new baselines for a 3 hour window.
>>>
>>> Just because other teams do it some way does not mean that Chromium, with
>>> its greater number of bots and platforms, should do it the same way.
>>
>>
>> Yes, it does mean that we should do it the same way. What if non-Chromium
>> ports started imposing arbitrary processes like this on the rest of us?
>> It'll be total chaos, and nobody would understand the right thing to do
>> for all ports.
>>
>>>
>>> We are discussing a process here, not code, and in my mind the goal is to
>>> have the tree be as green as possible with all failures tracked with a
>>> "minimal" expectations file and as little engineer time as possible.
>>
>>
>> That's not our project goal. We have continuous builds and regression
>> tests to prevent regressions and improve stability, not to keep bots
>> green. Please review http://www.webkit.org/projects/goals.html
>>
>>> Just look at how often the non-chromium mac and win builds are red. In
>>> particular, changes submitted via the commit queue take an indeterminate
>>> amount of time to go in, anything from an hour to several hours. Patch
>>> authors do not necessarily even have control over when the CQ+ is given.
>>
>>
>> That's why I don't use commit queue when I know my patch requires
>> platform-dependent rebaselines.
>>
>>>
>>> Even when manually committing, if it takes 3 hours to create baselines
>>> then no patches go in in the afternoon. What if the bots are down or
>>> misbehaving?
>>
>>
>> We need to promptly fix those bots.
>>
>>> I would also point out the waste of resources when every contributor
>>> needs to track every failure around commit time in order to know when their
>>> own changes cause failures, and then track the bots to know when they are
>>> free to go home.
>>
>>
>> But that's clearly stated in the contribution guidelines.
>>
>>>>> Why not simply attach an owner and a resolution date to each
>>>>> expectation? The real problem right now is accountability and a way to
>>>>> remind people that they have left expectations hanging.
>>>>
>>>>
>>>> That's what WebKit bugs are for. Ossy frequently files a bug and cc'es
>>>> the patch author when a new test is added or a test starts failing and he
>>>> doesn't know whether the new result is correct. He also either skips the
>>>> test or rebaselines it as needed. He also reverts patches when the
>>>> patch clearly introduced serious regressions (e.g. crashes on hundreds of
>>>> tests).
>>>
>>>
>>> Yes, Ossy does an excellent job of gardening. Unfortunately, on Chrome we
>>> have tens if not hundreds of gardeners and, as this thread has revealed, no
>>> clear agreement on the best way to garden.
>>
>>
>> That IS the problem. We have too many inexperienced gardeners that don't
>> understand the WebKit culture or the WebKit process.
>>
>>> I strongly believe that keeping the tree green is more important than
>>> having a clean expectations file.
>>
>>
>> I disagree. You're effectively just disabling the test temporarily.
>>
>>> Finally, there is no pain free way to do this. The question is how to
>>> distribute the pain. Right now each gardener is using a process that
>>> distributes pain in their preferred way. From a community standpoint it
>>> would be nice if the Chromium team could come up with something consistent.
>>
>>
>> The process the Chromium port uses should be consistent with non-Chromium
>> ports.
>>
>> - Ryosuke
>>
>>
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev at lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>>