[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Wed Aug 15 17:39:00 PDT 2012

On Wed, Aug 15, 2012 at 5:00 PM, Filip Pizlo <fpizlo at apple.com> wrote:

> I believe that the cognitive load is greater than any benefit from
> catching bugs incidentally by continuing to run a (1-fail) or (3) test, and
> continuing to evaluate whether or not the expectation matches some notions
> of desired behavior.
>

As someone who has spent a lot of time maintaining Chromium's expectations,
this seems clearly false, if your proposed alternative is to stop running
the test.  This is because a very common course of events is for a test to
begin failing, and then later on return to passing.  We (Chromium) see this
all the time with Skia changes: for example, the Skia folks will rewrite
gradient handling to more closely match some spec, and as a result dozens or
hundreds of tests, many not explicitly intended to be about gradient
handling, will change and possibly begin passing.

By contrast, if we aren't running a test, we don't know when the test
begins passing again (except by trying to run it).  The resulting effect is
that skipped tests tend to remain skipped.  Tests that remain skipped are
no better than no tests.  And even if such tests are periodically retested,
by the time a change in a test's output is noticed, there has been a large
window of time where the test wasn't running, making it difficult to
pinpoint exactly what caused the change and whether the resulting effect is
intentional and beneficial.
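
To make the "large window" point concrete, here is a rough sketch in Python.
Everything in it is hypothetical: the revision numbers, the result history,
and the change_window helper are made up for illustration and are not part of
any WebKit or Chromium tooling.  It only shows how the range of suspect
revisions grows when a test is retested periodically instead of on every
commit.

```python
from typing import Callable, Optional, Tuple


def change_window(
    result_at: Callable[[int], str],
    first_rev: int,
    last_rev: int,
    run_every: int,
) -> Optional[Tuple[int, int]]:
    """Return (last revision seen with the old result, first revision seen
    with the new result) for a bot that runs the test only every
    `run_every` revisions.  Returns None if no change is observed."""
    prev_rev = first_rev
    prev_result = result_at(first_rev)
    for rev in range(first_rev + run_every, last_rev + 1, run_every):
        result = result_at(rev)
        if result != prev_result:
            return (prev_rev, rev)
        prev_rev, prev_result = rev, result
    return None


# Hypothetical history: the test starts passing at revision 137.
def hypothetical_result(rev: int) -> str:
    return "PASS" if rev >= 137 else "FAIL"


# Test runs on every revision: the change is pinned to a single commit.
every_commit = change_window(hypothetical_result, 100, 200, run_every=1)

# Test is skipped and only retried every 50 revisions: 50 suspects.
periodic = change_window(hypothetical_result, 100, 200, run_every=50)

print(every_commit, periodic)
```

In the first case a sheriff has one commit to look at; in the second, someone
has to bisect across the whole interval before even asking whether the new
result is the desired one.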

If we ARE running a test, then when the results change, knowing whether the
existing result was thought to be correct or not is a critical part of a
sheriff's job in deciding what to do about the change.  This is one reason
why Chromium has never gone down the path of simply checking in failure
expectations, and something that Dirk's proposal explicitly tries to
address while still allowing ports that (IMO mistakenly) don't care to
continue to not care.

We already have some good tooling (e.g. garden-o-matic) that could be
extended to show and update the small amount of additional info Dirk is
proposing.  I am very skeptical of abstract claims that this proposal
inflates complexity and decreases productivity in the absence of actually
testing a real workflow using the tools that we sheriffs really use to
maintain tree greenness.

I would like to see this proposal tested to get concrete feedback instead
of arguments on principle.

PK