[webkit-dev] Skipping Flakey Tests

Thu Oct 1 11:41:40 PDT 2009

I wanted to re-open this discussion with some real-world feedback.
In this case, there was a failure in one of the layout tests on the Windows
platform, so, following the advice below, aroben correctly checked in an
update to the test expectations instead of skipping the test.

Downstream, this busted the Chromium tests, because that failure was not
happening in Chromium, and now our correct test output doesn't match the
incorrect test output that's been codified in the test expectations. We can
certainly manage this downstream by rebaselining the test and maintaining a
custom Chromium test expectation, but that's a pain and is somewhat fragile,
as it requires maintenance every time someone adds a new test case to the
test.
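
To make the failure mode concrete, here is a rough Python sketch of how a
checked-in failure can leak downstream. Everything in it is made up for
illustration: the function names, the test name, the port names, and the
assumption that the Chromium port falls back to the Windows baseline. It is
not how run-webkit-tests is actually implemented.

```python
# Hypothetical model of baseline lookup; not the real run-webkit-tests code.
# baselines maps (test, port) -> expected output text.
def expected_output(baselines, test, search_order):
    """Return the first baseline found along the port's search order."""
    for port in search_order:
        if (test, port) in baselines:
            return baselines[(test, port)]
    return None


def passes(baselines, test, search_order, actual):
    """A test passes when its actual output equals the chosen baseline."""
    return expected_output(baselines, test, search_order) == actual


TEST = "fast/example.html"  # made-up test name

# Upstream records the Windows failure as the expected result.
baselines = {(TEST, "win"): "FAIL"}

# Assumption: the Chromium port falls back to the Windows baseline.
chromium_order = ["chromium", "win"]

# Chromium's output is correct, so it no longer matches the codified failure.
chromium_ok_before = passes(baselines, TEST, chromium_order, "PASS")

# Downstream workaround: a Chromium-only baseline, maintained by hand.
baselines[(TEST, "chromium")] = "PASS"
chromium_ok_after = passes(baselines, TEST, chromium_order, "PASS")

print(chromium_ok_before, chromium_ok_after)
```

In this toy model the first check is False and the second is True, and the
second only stays True for as long as someone keeps the Chromium-only baseline
in sync with the test.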

I'd really like to suggest that we skip broken tests rather than codify
their breakages in the expectations file. Perhaps we'd make exceptions to
this rule for tests that have a bunch of working test cases (in which case
there's value in running the other test cases instead of skipping the entire
test). But in general it's less work for everyone just to skip broken tests.

I don't have an opinion about flakey tests, but flat-out-busted tests should
get skipped. Any thoughts/objections?

-atw

On Fri, Sep 25, 2009 at 1:59 PM, Darin Adler <darin at apple.com> wrote:

> Green buildbots have a lot of value.
>
> I think it’s worthwhile finding a way to have them even when there are test
> failures.
>
> For predictable failures, the best approach is to land the expected failure
> as an expected result, and use a bug to track the fact that it’s wrong. To
> me this does seem a bit like “sweeping something under the rug”; a bug
> report is much easier to overlook than a red buildbot. We don’t have a great
> system for keeping track of the most important bugs.
>
> For tests that give intermittent and inconsistent results, the best we can
> currently do is to skip the test. I think it would make sense to instead
> allow multiple expected results. I gather that one of the tools used in the
> Chromium project has this concept and I think there’s no real reason not to
> add the concept to run-webkit-tests as long as we are conscientious about
> not using it when it’s not needed. We would also use a bug to track the fact
> that the test gives inconsistent results. This has the same downsides as
> landing the expected failure results.
>
> For tests that have an adverse effect on other tests, the best we can
> currently do is to skip the test.
>
> I think we are overusing the Skipped machinery at the moment for platform
> differences. I think in many cases it would be better to instead land an
> expected failure result. On the other hand, one really great thing about the
> Skipped file is that there’s a complete list in the file, allowing everyone
> to see the list. It makes a good to-do list, probably better than just a
> list of bugs. This made Darin Fisher’s recent “why are so many tests
> skipped, let’s fix it” message possible.
>
>    -- Darin