[webkit-dev] A simpler proposal for handling failing tests WAS: A proposal for handling "failing" layout tests and TestExpectations

Filip Pizlo fpizlo at apple.com
Fri Aug 17 20:02:13 PDT 2012


So this is down to expected/failing and expected/previous?

I must say that expected/failing feels less confusing. Easier to remember if I have to quickly recall what it means. 

-Filip

On Aug 17, 2012, at 7:36 PM, Dirk Pranke <dpranke at chromium.org> wrote:

> I'm not sure if I like this idea or not. A couple of observations/questions ...
> 
> 1) I wouldn't want to call it '-correct' unless we were sure it was
> correct; '-previous' is better in that regard
> 
> 2) the issue with keeping a '-correct' in the tree is that it's quite
> possible for a previous correct expectation to need to change to a
> different expectation to still be correct. i.e., they get stale. I
> fear that this could quickly become more confusing than it's worth.
> It's also not clear to me when -previous gets updated or removed?
> 
> 3) It also feels like '-previous' is something that we can just as
> easily get from SVN/Git/whatever, in a completely foolproof and
> automatic way. I grant that it's easier to just do a side by side
> compare, but "diff against previous" isn't that hard to do and we
> could easily write a wrapper to make it trivial ...
> 
> 4) I'd want to work through the various branches in the workflow more
> before I felt comfortable with this. When I was coming up with my
> original proposal I originally wanted to allow -passing and -expected
> to live side-by-side, but all sorts of complications arose, so I'd be
> worried that we'd have them here, too.
> 
> -- Dirk
> 
> On Fri, Aug 17, 2012 at 6:51 PM, Ojan Vafai <ojan at chromium.org> wrote:
>> That matches my understanding. You proposed modification sounds fine to me.
>> 
>> On Fri, Aug 17, 2012 at 6:40 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>>> 
>>> 
>>> My understanding of the current proposal is this:
>>> 
>>> 1) This applies to tests that fail deterministically, for reasons other
>>> than a crash or hang.
>>> 2) If the test has a new result that you're confident is a progression (or
>>> neither better or worse), you simply update the -expected.txt file.
>>> 3) If the test has a new result that you're confident is a regression, you
>>> delete the -expected.txt file and check in the new results as -failing.txt.
>>> 4) Ditto points 2 and 3 with respect to -expected.png, for image diffs.
>>> 5) We would stop using all other ways of marking tests that fail
>>> deterministically, including Skipped and the many things you could enter in
>>> TestExpectations.
>>> 
>>> Is that correct?
>>> 
>>> If so, I'd like to suggest a minor modification. In place of point 3
>>> above, I propose the following:
>>> 
>>> 3) If the test has a new result that you're confident is a regression, you
>>> rename the -expected.txt file to -previous.txt (or maybe -correct.txt or
>>> -pre-expected.txt or something) and check in the new results as
>>> -expected.txt (unless there is already a -previous.txt, in which case just
>>> update -expected and leave -previous).
>>> 
>>> I propose this for the following reasons:
>>> 
>>> - It maintains the longstanding approach that -expected.txt reflects what
>>> is currently *expected* to happen, not whether it is right or wrong in some
>>> abstract sense. It is an expectation, not a reference.
>>> - It still leaves a clear indication of tests that somebody needs to look
>>> at further, to determine if a regression occurred.
>>> - It leaves both old and new result in place for easy comparison by an
>>> expert.
>>> 
>>> Regards,
>>> Maciej
>>> 
>>> 
>>> On Aug 17, 2012, at 6:06 PM, Ojan Vafai <ojan at chromium.org> wrote:
>>> 
>>> +1
>>> 
>>> 
>>> On Fri, Aug 17, 2012 at 5:36 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>> 
>>>> That's my expectation although we probably can't do that for flaky tests
>>>> :(
>>>> 
>>>> e.g. sometimes fails with image diff.
>>>> 
>>>> On Fri, Aug 17, 2012 at 5:35 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>>>> 
>>>>> +1, contingent upon the following: are we agreeing that all current uses
>>>>> of TEXT, IMAGE, and so forth in TestExpectations should be in the *very near
>>>>> term* following Dirk's change be turned into -failing files?
>>>>> 
>>>>> -Filip
>>>>> 
>>>>> 
>>>>> On Aug 17, 2012, at 5:01 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>>> 
>>>>> On Fri, Aug 17, 2012 at 4:55 PM, Ojan Vafai <ojan at chromium.org> wrote:
>>>>>> 
>>>>>> Asserting a test case is 100% correct is nearly impossible for a large
>>>>>> percentage of tests. The main advantage it gives us is the ability to have
>>>>>> -expected mean "unsure".
>>>>>> 
>>>>>> Lets instead only add -failing (i.e. no -passing). Leaving -expected to
>>>>>> mean roughly what it does today to Chromium folk (roughly, as best we can
>>>>>> tell this test is passing). -failing means it's *probably* an incorrect
>>>>>> result but needs an expert to look at it to either mark it correct (i.e.
>>>>>> rename it to -expected) or figure out how the root cause of the bug.
>>>>>> 
>>>>>> This actually matches exactly what Chromium gardeners do today, except
>>>>>> instead of putting a line in TestExpectations/Skipped to look at later, they
>>>>>> checkin the -failing file to look at later, which has all the advantages
>>>>>> Dirk listed in the other thread.
>>>>> 
>>>>> 
>>>>> I'm much more comfortable with this proposal.
>>>>> 
>>>>> - Ryosuke
>>>>> 
>>>>> _______________________________________________
>>>>> webkit-dev mailing list
>>>>> webkit-dev at lists.webkit.org
>>>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
>>> 
>>> _______________________________________________
>>> webkit-dev mailing list
>>> webkit-dev at lists.webkit.org
>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo/webkit-dev


More information about the webkit-dev mailing list