[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Thu Aug 16 14:32:25 PDT 2012

On Aug 16, 2012, at 2:13 PM, Dirk Pranke wrote:

> On Wed, Aug 15, 2012 at 6:02 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> 
>> 2) Possibility of the sheriff getting it wrong.
>> 
>> (2) concerns me most.  We're talking about using filenames to serve as a
>> kind of unchecked comment.  We already know that comments are usually bad
>> because there is no protection against them going stale.
>> 
> 
> Sheriffs can already get things wrong (and rebaseline when they
> shouldn't). I believe that adding passing/failing to expected will
> make things better in this regard, not worse.

In what way do things become better?  Because other people will see what the sheriff believed about the result?

Can you say more about what happens when you have both -expected and -failing files?

My specific concern is that after someone checks in a fix, we will have some sheriff accidentally misjudge the change in behavior to be a regression, and check in a -failing file.  And then we end up in a world of confusion.

This is why I think that just having -expected files is better.  It is a kind of recognition that we're tracking changes in behavior, rather than comparing against some almighty notion of what it means to be correct.

> 
> Another idea/observation is that if we have multiple types of
> expectation files, it might be easier to set up watchlists, e.g., "let
> me know whenever a file gets checked into fast/forms with an -expected
> or -failing result". It seems like this might be useful, but I'm not
> sure.
> 
>> In particular, to further clarify my position, if someone were to argue that
>> Dirk's proposal would be a wholesale replacement for TestExpectations, then
>> I would be more likely to be on board, since I very much like the idea of
>> reducing the number of ways of doing things.  Maybe that's a good way to
>> reach compromise.
>> 
>> Dirk, what value do you see in TestExpectations were your change to be
>> landed?  Do scenarios still exist where there would be a test for which (a)
>> there is no -fail.* file, (b) the test is not skipped, and (c) it's marked
>> with some annotation in TestExpectations?  I'm most interested in the
>> question of whether such scenarios exist, since in my experience, whenever a test is
>> not rebased, is not skipped, and is marked as failing in TestExpectations,
>> it ends up just causing gardening overhead later.
> 
> This is a good question, because it is definitely my intent that this
> change replace some existing practices, not add to them.
> 
> Currently, the Chromium port uses TestExpectations entries for four
> different kinds of things: tests we don't ever plan to fix (WONTFIX),
> tests that we skip because not doing so causes other tests to break,
> tests that fail (reliably), and tests that are flaky.
> 
> Skipped files do not let you distinguish (programmatically) between
> the first two categories, and so my plan is to replace Skipped files
> with TestExpectations (using the new syntax discussed a month or so
> ago) soon (next week or two at the latest).
> 
> I would like to replace using TestExpectations for failing tests (at
> least for tests that are expected to keep failing indefinitely because
> someone isn't working on an active fix) with this new mechanism.
> 
> That leaves flaky tests. One can debate what the right thing to do with
> flaky tests is here; I'm inclined to argue that flakiness is at least
> as bad as failing, and we should probably be skipping them, but the
> Chromium port has not yet actively tried this approach (I think other
> ports probably have experience here, though).
> 
> Does that help answer your question / sway you at all?

Yes, it does - it answers my question, though it perhaps doesn't sway me.  My concerns are still that:

1) Switching to skipping flaky tests wholesale in all ports would be great, and then we could get rid of the flakiness support.

2) The WONTFIX mode in TestExpectations feels to me more like a statement that you're just checking that the test doesn't crash.  Correct?  Either way, it's confusing.

3) Your new mechanism feels like it's already covered by our existing use of -expected files.  I'm not quite convinced that having -failing in addition to -expected files would be all that helpful.

(3) concerns me the most, and it concerns me particularly because we're still not giving good answers to (1) or (2).

-F