[webkit-dev] A simpler proposal for handling failing tests WAS: A proposal for handling "failing" layout tests and TestExpectations

Fri Aug 17 18:40:19 PDT 2012

My understanding of the current proposal is this:

1) This applies to tests that fail deterministically, for reasons other than a crash or hang.
2) If the test has a new result that you're confident is a progression (or neither better or worse), you simply update the -expected.txt file.
3) If the test has a new result that you're confident is a regression, you delete the -expected.txt file and check in the new results as -failing.txt.
4) Ditto points 2 and 3 with respect to -expected.png, for image diffs.
5) We would stop using all other ways of marking tests that fail deterministically, including Skipped and the many things you could enter in TestExpectations.

Is that correct?

If so, I'd like to suggest a minor modification. In place of point 3 above, I propose the following:

3) If the test has a new result that you're confident is a regression, you rename the -expected.txt file to -previous.txt (or maybe -correct.txt or -pre-expected.txt or something) and check in the new results as -expected.txt (unless there is already a -previous.txt, in which case just update -expected and leave -previous).

I propose this for the following reasons:

- It maintains the longstanding approach that -expected.txt reflects what is currently *expected* to happen, not whether it is right or wrong in some abstract sense. It is an expectation, not a reference.
- It still leaves a clear indication of tests that somebody needs to look at further, to determine if a regression occurred.
- It leaves both old and new result in place for easy comparison by an expert.

Regards,
Maciej

On Aug 17, 2012, at 6:06 PM, Ojan Vafai <ojan at chromium.org> wrote:

> +1
> 
> 
> On Fri, Aug 17, 2012 at 5:36 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> That's my expectation although we probably can't do that for flaky tests :(
> 
> e.g. sometimes fails with image diff.
> 
> On Fri, Aug 17, 2012 at 5:35 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> +1, contingent upon the following: are we agreeing that all current uses of TEXT, IMAGE, and so forth in TestExpectations should be in the *very near term* following Dirk's change be turned into -failing files?
> 
> -Filip
> 
> 
> On Aug 17, 2012, at 5:01 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> 
>> On Fri, Aug 17, 2012 at 4:55 PM, Ojan Vafai <ojan at chromium.org> wrote:
>> Asserting a test case is 100% correct is nearly impossible for a large percentage of tests. The main advantage it gives us is the ability to have -expected mean "unsure".
>> 
>> Lets instead only add -failing (i.e. no -passing). Leaving -expected to mean roughly what it does today to Chromium folk (roughly, as best we can tell this test is passing). -failing means it's *probably* an incorrect result but needs an expert to look at it to either mark it correct (i.e. rename it to -expected) or figure out how the root cause of the bug.
>> 
>> This actually matches exactly what Chromium gardeners do today, except instead of putting a line in TestExpectations/Skipped to look at later, they checkin the -failing file to look at later, which has all the advantages Dirk listed in the other thread.
>> 
>> I'm much more comfortable with this proposal.
>> 
>> - Ryosuke
>> 
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev at lists.webkit.org
>> http://lists.webkit.org/mailman/listinfo/webkit-dev
> 
> 
> 
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo/webkit-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.webkit.org/pipermail/webkit-dev/attachments/20120817/d71b2ac5/attachment.html>