[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Wed Aug 15 14:25:25 PDT 2012

On Wed, Aug 15, 2012 at 1:36 PM, Michael Saboff <msaboff at apple.com> wrote:
> It seems to me that there are two issues here.  One is Chromium specific about process conformity.  It seems to me that should stay a Chromium issue without making the mechanism more complex for all ports.  The other ports seem to make things work using the existing framework.
>

I'm definitely attempting to minimize the impact on ports (and people)
that are happy with things the way they are. One thing we could do to
help with this would be to require that any '-passing'/'-failing'
tests be limited to the platform/ directories, so that a given
port could opt in to this additional complexity (and we could replace
one Chromium-specific process with another one without affecting the
other ports).
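
To make the restriction concrete, here is a minimal sketch of a check that
would flag '-passing'/'-failing' files outside platform/. It is an
illustration only: the file naming ("<test>-passing.<ext>" and
"<test>-failing.<ext>"), the function names, and the assumption that paths
are given relative to LayoutTests/ are all hypothetical and are not part of
any existing tooling.

```python
import re
from pathlib import PurePosixPath

# Hypothetical naming convention: "<test>-passing.<ext>" / "<test>-failing.<ext>".
_SUFFIX_RE = re.compile(r"-(passing|failing)\.[A-Za-z0-9]+$")


def is_allowed(path):
    """Return True if a path (relative to LayoutTests/) respects the rule."""
    p = PurePosixPath(path)
    if not _SUFFIX_RE.search(p.name):
        # Ordinary files (e.g. "-expected.txt" baselines) are unaffected.
        return True
    # A '-passing'/'-failing' file must live somewhere under platform/.
    return len(p.parts) > 1 and p.parts[0] == "platform"


def misplaced(paths):
    """Return the paths that violate the platform/-only restriction."""
    return [p for p in paths if not is_allowed(p)]
```

With a check like this, a port that never adds such files under its own
platform/ directory would see no change in behavior.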

> The other broader issue is failing tests.  If I understand part of Filip's concern it is a signal to noise issue.  We do not want the pass / fail signal to be lost in the noise of expected failures.  Failing tests should be fixed as appropriate for failing platform(s).  That fixing might involve splitting off or removing a failing sub-test so that the remaining test adds value once again.  Especially "a pass becoming a fail" edge.  For me, a test failing differently provides unknown value as the noise of it being a failing test likely exceeds the benefit of the different failure mode signal.  It takes a non-zero effort to filter that noise and that effort is likely better spent fixing the test.

If I understand you correctly, you're suggesting that if a test starts
to fail, you don't care how it is failing, and simply knowing that it
is failing is enough? And if a test doesn't meet that criterion, it
should be split into multiple tests such that each sub-test does? I
think that's a nice goal, but we're nowhere close to that in practice.

In particular, this isn't at all true of many pixel tests. Minor
changes in text rendering can cause a whole bunch of tests to fail,
and yet we still need to run those tests to look for real regressions.
Ideally we can replace some or most of these tests with text-only
tests and reftests, but we're a long way away from that, too.

-- Dirk