[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Wed Aug 15 17:45:56 PDT 2012

The typical approach in the situations you describe is to rebaseline, not skip.  This avoids the problem of not knowing when the test started passing.  Hence, I'm not sure what you're implying.  Maybe a better example would help.

On Aug 15, 2012, at 5:39 PM, Peter Kasting <pkasting at chromium.org> wrote:

> On Wed, Aug 15, 2012 at 5:00 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> I believe that the cognitive load is greater than any benefit from catching bugs incidentally by continuing to run a (1-fail) or (3) test, and continuing to evaluate whether or not the expectation matches some notion of desired behavior.
> 
> As someone who has spent a lot of time maintaining Chromium's expectations, this seems clearly false, if your proposed alternative is to stop running the test.  This is because a very common course of events is for a test to begin failing, and then later on return to passing.  We (Chromium) see this all the time with Skia changes: for example, the Skia folks will rewrite gradient handling to more perfectly match some spec, and as a result dozens or hundreds of tests, many not explicitly intended to be about gradient handling, will change and possibly begin passing.
> 
> By contrast, if we aren't running a test, we don't know when the test begins passing again (except by trying to run it).  The resulting effect is that skipped tests tend to remain skipped.  Tests that remain skipped are no better than no tests.  And even if such tests are periodically retested, once a test's output changes, there is a large window of time where the test wasn't running, making it difficult to pinpoint exactly what caused the change and whether the resulting effect is intentional and beneficial.
> 
> If we ARE running a test, then when the results change, knowing whether the existing result was thought to be correct or not is a critical part of a sheriff's job in deciding what to do about the change.  This is one reason why Chromium has never gone down the path of simply checking in failure expectations, and something that Dirk's proposal explicitly tries to address while still allowing ports that (IMO mistakenly) don't care to continue to not care.
> 
> We already have some good tooling (e.g. garden-o-matic) that could be extended to show and update the small amount of additional info Dirk is proposing.  I am very skeptical of abstract claims that this proposal inflates complexity and decreases productivity in the absence of actually testing a real workflow using the tools that we sheriffs really use to maintain tree greenness.
> 
> I would like to see this proposal tested to get concrete feedback instead of arguments on principle.

I would not like to see our testing infrastructure get any more complicated than it already is, just because of a philosophical direction chosen unilaterally by one port.

> 
> PK
