[webkit-dev] A simpler proposal for handling failing tests WAS: A proposal for handling "failing" layout tests and TestExpectations

Ryosuke Niwa rniwa at webkit.org
Sat Aug 18 00:24:56 PDT 2012


+1 to that. -expected.png / -failure.png is clearer than -expected.png /
-previous.png or -expected.png / -correct.png.

It's hard to grasp the difference between "expected" and "correct" unless
you fully knew how rebaselines worked in layout tests. Also, this model has
a nice one-to-one mapping with entires in TestExpectations (with single
expectation).

On Fri, Aug 17, 2012 at 8:02 PM, Filip Pizlo <fpizlo at apple.com> wrote:

> So this is down to expected/failing and expected/previous?
>
> I must say that expected/failing feels less confusing. Easier to remember
> if I have to quickly recall what it means.
>
> -Filip
>
> On Aug 17, 2012, at 7:36 PM, Dirk Pranke <dpranke at chromium.org> wrote:
>
> > I'm not sure if I like this idea or not. A couple of
> observations/questions ...
> >
> > 1) I wouldn't want to call it '-correct' unless we were sure it was
> > correct; '-previous' is better in that regard
> >
> > 2) the issue with keeping a '-correct' in the tree is that it's quite
> > possible for a previous correct expectation to need to change to a
> > different expectation to still be correct. i.e., they get stale. I
> > fear that this could quickly become more confusing than it's worth.
> > It's also not clear to me when -previous gets updated or removed?
> >
> > 3) It also feels like '-previous' is something that we can just as
> > easily get from SVN/Git/whatever, in a completely foolproof and
> > automatic way. I grant that it's easier to just do a side by side
> > compare, but "diff against previous" isn't that hard to do and we
> > could easily write a wrapper to make it trivial ...
> >
> > 4) I'd want to work through the various branches in the workflow more
> > before I felt comfortable with this. When I was coming up with my
> > original proposal I originally wanted to allow -passing and -expected
> > to live side-by-side, but all sorts of complications arose, so I'd be
> > worried that we'd have them here, too.
> >
> > -- Dirk
> >
> > On Fri, Aug 17, 2012 at 6:51 PM, Ojan Vafai <ojan at chromium.org> wrote:
> >> That matches my understanding. You proposed modification sounds fine to
> me.
> >>
> >> On Fri, Aug 17, 2012 at 6:40 PM, Maciej Stachowiak <mjs at apple.com>
> wrote:
> >>>
> >>>
> >>> My understanding of the current proposal is this:
> >>>
> >>> 1) This applies to tests that fail deterministically, for reasons other
> >>> than a crash or hang.
> >>> 2) If the test has a new result that you're confident is a progression
> (or
> >>> neither better or worse), you simply update the -expected.txt file.
> >>> 3) If the test has a new result that you're confident is a regression,
> you
> >>> delete the -expected.txt file and check in the new results as
> -failing.txt.
> >>> 4) Ditto points 2 and 3 with respect to -expected.png, for image diffs.
> >>> 5) We would stop using all other ways of marking tests that fail
> >>> deterministically, including Skipped and the many things you could
> enter in
> >>> TestExpectations.
> >>>
> >>> Is that correct?
> >>>
> >>> If so, I'd like to suggest a minor modification. In place of point 3
> >>> above, I propose the following:
> >>>
> >>> 3) If the test has a new result that you're confident is a regression,
> you
> >>> rename the -expected.txt file to -previous.txt (or maybe -correct.txt
> or
> >>> -pre-expected.txt or something) and check in the new results as
> >>> -expected.txt (unless there is already a -previous.txt, in which case
> just
> >>> update -expected and leave -previous).
> >>>
> >>> I propose this for the following reasons:
> >>>
> >>> - It maintains the longstanding approach that -expected.txt reflects
> what
> >>> is currently *expected* to happen, not whether it is right or wrong in
> some
> >>> abstract sense. It is an expectation, not a reference.
> >>> - It still leaves a clear indication of tests that somebody needs to
> look
> >>> at further, to determine if a regression occurred.
> >>> - It leaves both old and new result in place for easy comparison by an
> >>> expert.
> >>>
> >>> Regards,
> >>> Maciej
> >>>
> >>>
> >>> On Aug 17, 2012, at 6:06 PM, Ojan Vafai <ojan at chromium.org> wrote:
> >>>
> >>> +1
> >>>
> >>>
> >>> On Fri, Aug 17, 2012 at 5:36 PM,
> Ryosuke Niwa
> Software Engineer
> Google Inc.
>
>
> <rniwa at webkit.org> wrote:
> >>>>
> >>>> That's my expectation although we probably can't do that for flaky
> tests
> >>>> :(
> >>>>
> >>>> e.g. sometimes fails with image diff.
> >>>>
> >>>> On Fri, Aug 17, 2012 at 5:35 PM, Filip Pizlo <fpizlo at apple.com>
> wrote:
> >>>>>
> >>>>> +1, contingent upon the following: are we agreeing that all current
> uses
> >>>>> of TEXT, IMAGE, and so forth in TestExpectations should be in the
> *very near
> >>>>> term* following Dirk's change be turned into -failing files?
> >>>>>
> >>>>> -Filip
> >>>>>
> >>>>>
> >>>>> On Aug 17, 2012, at 5:01 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> >>>>>
> >>>>> On Fri, Aug 17, 2012 at 4:55 PM, Ojan Vafai <ojan at chromium.org>
> wrote:
> >>>>>>
> >>>>>> Asserting a test case is 100% correct is nearly impossible for a
> large
> >>>>>> percentage of tests. The main advantage it gives us is the ability
> to have
> >>>>>> -expected mean "unsure".
> >>>>>>
> >>>>>> Lets instead only add -failing (i.e. no -passing). Leaving
> -expected to
> >>>>>> mean roughly what it does today to Chromium folk (roughly, as best
> we can
> >>>>>> tell this test is passing). -failing means it's *probably* an
> incorrect
> >>>>>> result but needs an expert to look at it to either mark it correct
> (i.e.
> >>>>>> rename it to -expected) or figure out how the root cause of the bug.
> >>>>>>
> >>>>>> This actually matches exactly what Chromium gardeners do today,
> except
> >>>>>> instead of putting a line in TestExpectations/Skipped to look at
> later, they
> >>>>>> checkin the -failing file to look at later, which has all the
> advantages
> >>>>>> Dirk listed in the other thread.
> >>>>>
> >>>>>
> >>>>> I'm much more comfortable with this proposal.
> >>>>>
> >>>>> - Ryosuke
> >>>>>
> >>>>> _______________________________________________
> >>>>> webkit-dev mailing list
> >>>>> webkit-dev at lists.webkit.org
> >>>>> http://lists.webkit.org/mailman/listinfo/webkit-dev
> >>>
> >>> _______________________________________________
> >>> webkit-dev mailing list
> >>> webkit-dev at lists.webkit.org
> >>> http://lists.webkit.org/mailman/listinfo/webkit-dev
> > _______________________________________________
> > webkit-dev mailing list
> > webkit-dev at lists.webkit.org
> > http://lists.webkit.org/mailman/listinfo/webkit-dev
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo/webkit-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.webkit.org/pipermail/webkit-dev/attachments/20120818/16fef398/attachment-0001.html>


More information about the webkit-dev mailing list