[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Thu Aug 16 15:50:00 PDT 2012

On Thu, Aug 16, 2012 at 6:15 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:

> On Thu, Aug 16, 2012 at 3:04 PM, Dirk Pranke <dpranke at chromium.org> wrote:
>
>> On Thu, Aug 16, 2012 at 2:23 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>> > Like Filip, I'm extremely concerned about the prospect of us introducing
>> > yet-another-way-of-doing-things, and not be able to get rid of it later.
>>
>> Presumably the only way we'd be not able to get rid of it would be if
>> some port actually liked it, in which case, why would we get rid of it?
>>
>
> Because we already have too many ways to deal with test failures.
>
> Why don't we at least merge Skipped and TestExpectations first.
>
>
I agree with the priorities above, at least. I also agree with the overall
goal of making our implementation match our philosophy on testing.

Ryosuke has raised a very valid point: it is not clear what a Gardener
should do with a test that has changed behavior. Personally, the area I
deal with is very susceptible to the "single pixel differences matter"
issue, and I would prefer gardeners to flag updates in some way so that I
can look at it myself (or one of the very few other experts can look at
it). Maybe I'm just a control freak.

In the current situation, gardeners are not likely to roll out a patch
short of build failures, obvious test or perf regressions, or security
issues. For the bulk of cases where a test result has changed, the gardener
will either add it to expectations or go ahead and update the result.

The result is a mixture of incorrect expected results and files listed in
TestExpectations that may or may not be correct. The way I deal with this
right now is to maintain a snapshot of TestExpectations and occasionally go
through and look for newly added results, and check if they are in fact
correct. And every now and then go look at older entries to see if they
have been updated. Or I get lucky and notice that a new expected result has
been checked in that is incorrect (I can mostly catch that by following
checkins in the code).

The proposed change does not alter this fundamental dynamic. Depending on
what policy gardeners adopt, they might now rename the result as failing
and remove the expected, or maybe they'll just update the expected. I'm
still in a situation where I don't know the exact status of any given
result, and where I have no easy way of knowing if a gardener has correctly
updated an expected or failing image.

Two questions come out of this:
1) What policy do we want gardeners to follow? Giving "one more option" is
not helpful as far as I can tell. The new method does nothing that
TestExpectations could not already do, so I suppose the goal is to not put
known failing tests in expectations.
2) How will it be simpler to know if something is really failing, really
correct, or whether a non-expert gardener made an understandable mistake?

Personally, I don't much care if we go to failing and remove all
just-plain-failing tests from TestExpectations provided gardeners know what
to do. What I really want is "unconfirmed" or some way for a gardener to
say "I changed a result to turn the bots green, but I don't know what I'm
doing and need an expert".

On top of that we would need a nagging system to make experts look at
unconfirmed results. We already have such a nagging system, thanks to Ojan.

My goal would also be met by the ability to easily track all expected
results changes and TestExpectation updates for a subset of tests.

Cheers,
Stephen.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.webkit.org/pipermail/webkit-dev/attachments/20120816/a32a2341/attachment.html>