[webkit-dev] A simpler proposal for handling failing tests WAS: A proposal for handling "failing" layout tests and TestExpectations

Mon Aug 20 18:03:10 PDT 2012

Sorry, I overlooked these questions earlier.

On Aug 17, 2012, at 7:36 PM, Dirk Pranke <dpranke at chromium.org> wrote:

> I'm not sure if I like this idea or not. A couple of observations/questions ...
> 
> 1) I wouldn't want to call it '-correct' unless we were sure it was
> correct; '-previous' is better in that regard

Good point. Though I think -unexpected-pass might be even better.

> 
> 2) the issue with keeping a '-correct' in the tree is that it's quite
> possible for a previous correct expectation to need to change to a
> different expectation to still be correct. i.e., they get stale.

Sure, it's possible. At worst, when somebody goes to investigate expected failures, the old result will no longer be useful to them.

> I fear that this could quickly become more confusing than it's worth.
> It's also not clear to me when -previous gets updated or removed?

It would get removed if someone decides that -expected is actually better or no worse; or if someone fixes the bug that caused the expected failure in the first place, or if the result updates in such a way that neither result is relevant.

> 
> 3) It also feels like '-previous' is something that we can just as
> easily get from SVN/Git/whatever, in a completely foolproof and
> automatic way. I grant that it's easier to just do a side by side
> compare, but "diff against previous" isn't that hard to do and we
> could easily write a wrapper to make it trivial ...

True, but in addition to a convenience improvement , it would also leave a clear record of which tests are expected failures.

> 4) I'd want to work through the various branches in the workflow more
> before I felt comfortable with this. When I was coming up with my
> original proposal I originally wanted to allow -passing and -expected
> to live side-by-side, but all sorts of complications arose, so I'd be
> worried that we'd have them here, too.

Here's how I imagine the workflow when a sheriff or just innocent bystander notices a deterministically failing test. Follow this two-step algorithm:

1) Are you confident that the new result is an improvement or no worse? If so, then simply update -expected.txt.
2) Otherwise, copy the old result to -<whatever-we-call-the-unexpected-pass-result>.txt, and check in the new result as -<whatever-we-call-the-expected-failure-result.txt>.

This replaces all other approaches to marking expected failures, including the Skipped list, overwriting -expected even you know the result is a regression, marking the test in TestExpectations as Skip, Wontfix, Image, Text, or Text+Image, or any of the other legacy techniques for marking an expected failure reult.

Regards,
Maciej

> 
> -- Dirk
> 
> On Fri, Aug 17, 2012 at 6:51 PM, Ojan Vafai <ojan at chromium.org> wrote:
>> That matches my understanding. You proposed modification sounds fine to me.
>> 
>> On Fri, Aug 17, 2012 at 6:40 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>>> 
>>> 
>>> My understanding of the current proposal is this:
>>> 
>>> 1) This applies to tests that fail deterministically, for reasons other
>>> than a crash or hang.
>>> 2) If the test has a new result that you're confident is a progression (or
>>> neither better or worse), you simply update the -expected.txt file.
>>> 3) If the test has a new result that you're confident is a regression, you
>>> delete the -expected.txt file and check in the new results as -failing.txt.
>>> 4) Ditto points 2 and 3 with respect to -expected.png, for image diffs.
>>> 5) We would stop using all other ways of marking tests that fail
>>> deterministically, including Skipped and the many things you could enter in
>>> TestExpectations.
>>> 
>>> Is that correct?
>>> 
>>> If so, I'd like to suggest a minor modification. In place of point 3
>>> above, I propose the following:
>>> 
>>> 3) If the test has a new result that you're confident is a regression, you
>>> rename the -expected.txt file to -previous.txt (or maybe -correct.txt or
>>> -pre-expected.txt or something) and check in the new results as
>>> -expected.txt (unless there is already a -previous.txt, in which case just
>>> update -expected and leave -previous).
>>> 
>>> I propose this for the following reasons:
>>> 
>>> - It maintains the longstanding approach that -expected.txt reflects what
>>> is currently *expected* to happen, not whether it is right or wrong in some
>>> abstract sense. It is an expectation, not a reference.
>>> - It still leaves a clear indication of tests that somebody needs to look
>>> at further, to determine if a regression occurred.
>>> - It leaves both old and new result in place for easy comparison by an
>>> expert.
>>> 
>>> Regards,
>>> Maciej
>>> 
>>> 
>>> On Aug 17, 2012, at 6:06 PM, Ojan Vafai <ojan at chromium.org> wrote:
>>> 
>>> +1
>>> 
>>> 
>>> On Fri, Aug 17, 2012 at 5:36 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>> 
>>>> That's my expectation although we probably can't do that for flaky tests
>>>> :(
>>>> 
>>>> e.g. sometimes fails with image diff.
>>>> 
>>>> On Fri, Aug 17, 2012 at 5:35 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>>>> 
>>>>> +1, contingent upon the following: are we agreeing that all current uses
>>>>> of TEXT, IMAGE, and so forth in TestExpectations should be in the *very near
>>>>> term* following Dirk's change be turned into -failing files?
>>>>> 
>>>>> -Filip
>>>>> 
>>>>> 
>>>>> On Aug 17, 2012, at 5:01 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>>> 
>>>>> On Fri, Aug 17, 2012 at 4:55 PM, Ojan Vafai <ojan at chromium.org> wrote:
>>>>>> 
>>>>>> Asserting a test case is 100% correct is nearly impossible for a large
>>>>>> percentage of tests. The main advantage it gives us is the ability to have
>>>>>> -expected mean "unsure".
>>>>>> 
>>>>>> Lets instead only add -failing (i.e. no -passing). Leaving -expected to
>>>>>> mean roughly what it does today to Chromium folk (roughly, as best we can
>>>>>> tell this test is passing). -failing means it's *probably* an incorrect
>>>>>> result but needs an expert to look at it to either mark it correct (i.e.
>>>>>> rename it to -expected) or figure out how the root cause of the bug.
>>>>>> 
>>>>>> This actually matches exactly what Chromium gardeners do today, except
>>>>>> instead of putting a line in TestExpectations/Skipped to look at later, they
>>>>>> checkin the -failing file to look at later, which has all the advantages
>>>>>> Dirk listed in the other thread.
>>>>> 
>>>>> 
>>>>> I'm much more comfortable with this proposal.
>>>>> 
>>>>> - Ryosuke
>>>>> 
>>>>> _______________________________________________
>>>>> webkit-dev mailing list
>>>>> webkit-dev at lists.webkit.org
>>>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
>>>>> 
>>>>> 
>>>> 
>>> 
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
>>> 
>>> 
>>