[webkit-dev] A simpler proposal for handling failing tests WAS: A proposal for handling "failing" layout tests and TestExpectations

Dirk Pranke dpranke at chromium.org
Mon Aug 20 16:36:49 PDT 2012


On Sat, Aug 18, 2012 at 8:31 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>
> On Aug 18, 2012, at 5:55 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>
>>
>> On Aug 18, 2012, at 5:11 PM, Filip Pizlo <fpizlo at apple.com> wrote:
>>
>>> Maybe at this point we can agree to let Dirk land some variant of this with whatever half-way sensible name (any of the options on the table are decent) and see how it works?
>>>
>>> It seems that the only thing anyone is disagreeing over is naming and which files to keep around, which is a much smaller set of differences than status-quo versus any variant of this proposal.
>>
>> I agree that we should adopt some variant over the status quo. As you rightly noted, there are too many different ways to handle tests that deviate from the original expectation, and we have the opportunity to obsolete most of those ways with an approach that combines advantages of multiple current approaches.
>>
>> However, I fear that whatever names we pick for the first round will then be unchangeable due to status quo bias (which we see a lot of in test infrastructure discussions, indeed, even this one). And anyone arguing against change at that point will have a valid argument that a huge global rename of tests is a bad idea. So I think it's worth expending a little effort to find names that are good.
>>
>> Would you object to -expected-failure/-unexpected-pass as a naming scheme, along with the approach of keeping both around when they are used?
>
> I don't mind -expected-failure/-unexpected-pass, and I think that the slightly added verbosity will make things clearer.  Would you also advocate having the tooling mandate that the expected files are in either, but never both, of these two states:
>
> 1) -expected.foo
> 2) -expected-failure.foo/-unexpected-pass.foo
>
> That is, if we're not in a failing state, the -expected suffix is what we use.  Dirk, what do you think?  (And a possibly correct retort will be to tell us that we're bikeshedding. ;-))
>

I think I'm lost :) I think this is partially because Maciej didn't
respond to my previous questions about this proposal, and partially
because I'm not actually sure which combinations you're now proposing
we have (there were something like twelve different variants :).

Perhaps someone can recap how they expect things to work and what the
extensions being proposed for each case are?

While I agree with Maciej's point that it would be nice if
"-expected.txt" referred to whatever we currently expect to happen, as
this discussion indicates, the definition of "expected" itself starts
to become unclear. This is partially why I only wanted there to be one
baseline allowed for a given test regardless of pass/fail/unknown
status.

The other (and IMO more serious) flaw with allowing more than one
baseline to exist at a time is that the one that isn't actually being
exercised is subject to bitrot, and hence it's not clear how relevant
it will stay. But it's hard to discuss this clearly without referring
to the different names and cases, and so I'll wait until someone can
recap first.

-- Dirk





> -F
>
>
>>
>> Regards,
>> Maciej
>>
>>>
>>> -Filip
>>>
>>> On Aug 18, 2012, at 2:01 PM, Maciej Stachowiak <mjs at apple.com> wrote:
>>>
>>>>
>>>> On Aug 18, 2012, at 1:08 AM, Filip Pizlo <fpizlo at apple.com> wrote:
>>>>
>>>>> I like your idea of having both the result-we-currently-expect and the result-we-think-may-be-more-correct to be checked in.  I still prefer Dirk's naming scheme though.
>>>>
>>>> I think if we had both checked in, the result-we-think-may-be-more-correct should be named something other than -expected, since it is not, in fact, expected. That was the basis of my naming scheme.
>>>>
>>>> I think I would be happy with any scheme that had both checked in, and matched the criteria that you never have a file named -expected that is unexpected. For example, there could be schemes with no file named expected. If you let it be verbose, you could have:
>>>>
>>>> Single result:
>>>>  foo-expected.txt
>>>>
>>>> Possibly-worse current result, possibly-better older result:
>>>>  foo-expected-failure.txt
>>>>  foo-unexpected-pass.txt
>>>>
>>>>>
>>>>> I get the notion that "expected" always means literally what it seems to mean from the standpoint of whether the tooling is silent for the test (actual == expected) or has something to say.
>>>>>
>>>>> But I think that if the tooling is behaving right, your concern that "a test would fail if it did *not* match the "failing" result" would be addressed: the tooling could be silent for actual == failing (if a failing file exists) but notify you of an "unexpected pass" if actual == expected.
>>>>
>>>> But if you match neither, you get a failure for not matching the "failing" result. That still strikes me as a little goofy. Not failing is failing, and getting the expected result is unexpected. I think my extra-verbose naming scheme above would better match what you suggest the tool UI would do. Maybe there is a more concise way to get the same point across.
>>>>
>>>> Regards,
>>>> Maciej
>>>>
>>
>
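
To make the cases in the quoted discussion easier to refer to, here is a
rough Python sketch of one possible reading of the scheme: the "either, but
never both" rule from Filip's message, plus the reporting behaviour he and
Maciej describe (silent on the expected failure, "unexpected pass" on the
possibly-better result, failure when nothing matches). Nothing here is
existing tooling; Outcome, check_baselines and classify are hypothetical
names, and requiring both files of the pair in the failing state is an
assumption, not something the thread settles.

```python
from enum import Enum
from typing import Dict, Set

# Suffix names are taken from the thread; everything else is hypothetical.
SINGLE = "-expected"
FAILING = "-expected-failure"
PASSING = "-unexpected-pass"


class Outcome(Enum):
    EXPECTED = "expected"                # tooling stays silent
    UNEXPECTED_PASS = "unexpected pass"  # matched the possibly-better result
    FAILURE = "failure"                  # matched no checked-in result


def check_baselines(suffixes: Set[str]) -> None:
    """Enforce the 'either, but never both' rule for one test."""
    has_single = SINGLE in suffixes
    pair = suffixes & {FAILING, PASSING}
    if has_single and pair:
        raise ValueError("-expected must not coexist with the failure/pass pair")
    # Assumption: in the failing state both files of the pair are checked in.
    if not has_single and len(pair) != 2:
        raise ValueError("need -expected, or both -expected-failure and -unexpected-pass")


def classify(actual: str, baselines: Dict[str, str]) -> Outcome:
    """Map an actual result to what the tooling would report.

    `baselines` maps a suffix (e.g. "-expected") to that file's contents.
    """
    check_baselines(set(baselines))
    if SINGLE in baselines:
        # Non-failing state: only one result is acceptable.
        return Outcome.EXPECTED if actual == baselines[SINGLE] else Outcome.FAILURE
    if actual == baselines[FAILING]:
        return Outcome.EXPECTED          # the failure we currently expect
    if actual == baselines[PASSING]:
        return Outcome.UNEXPECTED_PASS   # worth telling someone about
    return Outcome.FAILURE               # matches neither checked-in result
```

The last branch is the case Maciej calls "a little goofy": with both files
checked in, a result that matches neither is reported as a failure for not
matching the "failing" result. The sketch also makes the bitrot concern
visible, since the -unexpected-pass contents are only consulted after the
-expected-failure comparison has already missed.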

