[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations
rniwa at webkit.org
Thu Aug 16 14:38:41 PDT 2012
On Thu, Aug 16, 2012 at 2:32 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> On Aug 16, 2012, at 2:13 PM, Dirk Pranke wrote:
> > On Wed, Aug 15, 2012 at 6:02 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> >> 2) Possibility of the sheriff getting it wrong.
> >> (2) concerns me most. We're talking about using filenames to serve as a
> >> kind of unchecked comment. We already know that comments are usually
> >> unreliable because there is no protection against them going stale.
> > Sheriffs can already get things wrong (and rebaseline when they
> > shouldn't). I believe that adding passing/failing to expected will
> > make things better in this regard, not worse.
> In what way do things become better? Because other people will see what
> the sheriff believed about the result?
> Can you articulate some more about what happens when you have both
> -expected and -failing?
> My specific concern is that after someone checks in a fix, we will have
> some sheriff accidentally misjudge the change in behavior to be a
> regression, and check in a -failing file. And then we end up in a world of
> confusion.
> This is why I think that just having -expected files is better. It is a
> kind of recognition that we're tracking changes in behavior, rather than
> comparing against some almighty notion of what it means to be correct.
> > Another idea/observation is that if we have multiple types of
> > expectation files, it might be easier to set up watchlists, e.g., "let
> > me know whenever a file gets checked into fast/forms with an -expected
> > or -failing result". It seems like this might be useful, but I'm not
> > sure.
> >> In particular, to further clarify my position, if someone were to argue
> >> Dirk's proposal would be a wholesale replacement for TestExpectations,
> >> I would be more likely to be on board, since I very much like the idea
> >> reducing the number of ways of doing things. Maybe that's a good way to
> >> reach compromise.
> >> Dirk, what value do you see in TestExpectations were your change to be
> >> landed? Do scenarios still exist where, for some test, (a) there is no
> >> -fail.* file, (b) the test is not skipped, and (c) it's marked with some
> >> annotation in TestExpectations? I'm most interested in the question of
> >> whether such scenarios exist, since in my experience, whenever a test is
> >> not rebased, is not skipped, and is marked as failing in TestExpectations,
> >> it ends up just causing gardening overhead later.
> > This is a good question, because it is definitely my intent that this
> > change replace some existing practices, not add to them.
> > Currently, the Chromium port uses TestExpectations entries for four
> > different kinds of things: tests we don't ever plan to fix (WONTFIX),
> > tests that we skip because not doing so causes other tests to break,
> > tests that fail (reliably), and tests that are flaky.
> > Skipped files do not let you distinguish (programmatically) between
> > the first two categories, and so my plan is to replace Skipped files
> > with TestExpectations (using the new syntax discussed a month or so
> > ago) soon (next week or two at the latest).
> > I would like to replace using TestExpectations for failing tests (at
> > least for tests that are expected to keep failing indefinitely because
> > someone isn't working on an active fix) with this new mechanism.
> > That leaves flaky tests. One can debate what the right thing to do w/
> > flaky tests is here; I'm inclined to argue that flakiness is at least
> > as bad as failing, and we should probably be skipping them, but the
> > Chromium port has not yet actively tried this approach (I think other
> > ports probably have experience here, though).
> > Does that help answer your question / sway you at all?
> Yes, it does - it answers my question, though it perhaps doesn't sway me.
> My concerns are still that:
> 1) Switching to skipping flaky tests wholesale in all ports would be
> great, and then we could get rid of the flakiness support.
This is not necessarily helpful in some cases. There are cases where some
tests make subsequent tests flaky, so skipping a flaky test doesn't fix the
problem; it just makes the next test flaky. The right fix, of course, is to
skip/fix the culprit, but pinpointing the test that makes subsequent tests
more flaky has proved to be a time-consuming and tedious task.
> 2) The WONTFIX mode in TestExpectations feels to me more like a statement
> that you're just trying to see if the test doesn't crash. Correct? Either
> way, it's confusing.
Yes, I'd argue that they should just be skipped, but that's a bikeshed for
another time.