[webkit-dev] Skipping Flakey Tests

Thu Oct 1 11:55:06 PDT 2009

On Thu, Oct 1, 2009 at 11:47 AM, Darin Adler <darin at apple.com> wrote:
> On Oct 1, 2009, at 11:41 AM, Drew Wilson wrote:
>
>> I don't have an opinion about flakey tests, but flat-out-busted tests
>> should get skipped. Any thoughts/objections?
>
> I object.
>
> If a test fails on some platforms and succeeds on others, we should have the
> success result checked in as the default case, and the failure as an
> exception. And we should structure test results and exceptions so that it’s
> easy to get the expected failure on the right platforms and success on
> others. Your story about a slight inconvenience because a test failed on the
> base Windows WebKit but succeeded on the Chromium WebKit does not seem like
> a reason to change this!
>
> Skipping the test does not seem like a good thing to do for the long term
> health of the project. It is good to exercise all the other code each test
> covers and also to notice when a test result gets even worse or gets better
> when a seemingly unrelated change is made.
>
> I think we should skip only tests that endanger the testing strategy because
> they are super-slow, crash, or adversely affect other tests in some way.
>

I agree that skipping the test is the wrong thing to do. However,
checking in an incorrect baseline over the correct baseline is also
the wrong thing to do (because, as Drew points out, this can break
other platforms that don't have the bug).

Chromium does have the concept of marking tests as expected to FAIL,
but it does not have a way to capture what the expected failure is
(i.e., there is no way to capture a "FAIL" baseline). We discussed
this recently and punted on it because it was unclear how useful this
would really be, and -- as we all probably agree -- it's better not to
have failing tests in the first place.
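
To make the distinction concrete, here is a rough sketch of the difference
between a bare "expected to FAIL" flag and a captured FAIL baseline. All
names here (Outcome, classify, expect_fail, fail_baseline) are hypothetical
and for illustration only; this is not how either Chromium's or WebKit's
framework is actually written.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    UNEXPECTED_FAIL = "unexpected fail"
    EXPECTED_FAIL = "expected fail"
    UNEXPECTED_PASS = "unexpected pass"
    CHANGED_FAIL = "expected fail, but the failure changed"


def classify(actual, correct_baseline, expect_fail=False, fail_baseline=None):
    """Classify one test run (hypothetical helper).

    expect_fail   -- the test is marked as expected to FAIL.
    fail_baseline -- optional captured output of the known failure.
    """
    if actual == correct_baseline:
        # The output matches the correct result.
        return Outcome.UNEXPECTED_PASS if expect_fail else Outcome.PASS
    if not expect_fail:
        return Outcome.UNEXPECTED_FAIL
    if fail_baseline is not None and actual != fail_baseline:
        # Only detectable when a FAIL baseline was captured.
        return Outcome.CHANGED_FAIL
    return Outcome.EXPECTED_FAIL
```

With only the flag, any wrong output counts as the expected failure. With a
captured FAIL baseline, the runner could also notice when a known failure
turns into a different one, which is the case Darin describes.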

Eric and Dimitry have suggested that we look into pulling the Chromium
expectations framework upstream into WebKit and adding the features
that WebKit's framework has that Chromium's doesn't. It sounds to me
like this might be the right long-term solution, and I'd be happy to
work on it.

In the meantime, maybe it makes sense to add Fail files alongside the
Skipped files? That would allow the bots to stay green, but would at
least keep the tests running.

-- Dirk