[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Wed Aug 15 15:06:40 PDT 2012

Apparently I was somewhat unclear.  Let me restate.  We have the following mechanisms available when a test fails:

1) Check in a new -expected.* file.

2) Modify the test.

3) Modify a TestExpectations file.

4) Add the test to a Skipped file.

5) Remove the test entirely.

I have no problem with (1) unless it is intended to mark the test as expected-to-fail-but-not-crash.  I agree that using -expected.* to accomplish what TestExpectations accomplishes is not valuable, but I further believe that even TestExpectations is not valuable.

I broadly prefer (2) whenever possible.

I believe that (3) and (4) are redundant, and I don't buy the value of (3).

I don't like (5), but we should probably do more of it for tests that have a chronically low signal-to-noise ratio.

You're proposing a new mechanism.  I'm arguing that, given the sheer number of tests and the overhead of maintaining them, (4) is broadly the more productive strategy in terms of bugs fixed per person-hour.  And adding a sixth mechanism to the five above, a 20% increase, is likely to reduce overall productivity rather than help anyone.

-F

On Aug 15, 2012, at 12:40 PM, Dirk Pranke <dpranke at chromium.org> wrote:

> On Wed, Aug 15, 2012 at 12:27 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>> This sounds like it's adding even more complication to an already complicated system.
> 
> In some ways, yes. In other ways, perhaps it will allow us to simplify
> things; e.g., if we are checking in failing tests, there is much less
> of a need for multiple failure keywords in the TestExpectations file
> (so perhaps we can simplify them back to something closer to Skipped
> files).
> 
>> Given how many tests we currently have, I also don't buy that continuing to run a test that is already known to fail provides much benefit.
> 
> I'm not sure I understand your feedback here. It's common practice (on
> all the ports as far as I know today) to rebaseline tests that are
> currently failing so that they fail differently. Of course, we also
> skip some tests while they are failing as well. However, common
> objections to skipping tests are that we can lose coverage for a
> feature and/or miss when a test starts failing worse (e.g. crashing).
> And of course, a test might start passing again, but if we're skipping
> it we wouldn't know that ...
> 
>> So, I'd rather not continue to institutionalize this notion that we should have loads of incomprehensible machinery to reason about tests that have already given us all the information they were meant to give (i.e. they failed, end of story).
> 
> Are you suggesting that, rather than checking in new baselines at all
> or having lots of logic to manage different kinds of failures, we
> should just let failing tests fail (and keep the tree red) until a
> change is either reverted or the test is fixed?
> 
> -- Dirk