[webkit-dev] Skipping Flakey Tests

Tue Dec 22 17:52:09 PST 2009

On Tue, Dec 22, 2009 at 4:58 PM, Darin Adler <darin at apple.com> wrote:
> On Dec 22, 2009, at 4:27 PM, Dirk Pranke wrote:
>
>> In the completely generic case, I hope we are not checking in incorrect results.
>
> We do intentionally check in incorrect results, fairly often. For example, we’ve checked in whole test suites and then generated expected results without studying the tests to see which ones are successful and which are failures.
>

Interesting. I wasn't aware of that, and I guess I hadn't noticed it yet.

>> An alternative would be to move to the more general syntax (and hopefully, just move to the tool) that Chromium uses.
>
> I’m surprised that Chromium developed a separate tool. If instead the Chromium team had enhanced the WebKit project’s shared run-webkit-tests we’d be better off. How did we end up with two separate tools?!

That I couldn't tell you, as the decision predates my joining the
team; I'm sure someone else can chime in. I don't think anyone would
dispute that one tool would be better than two, and Eric Seidel and I
have been working on a plan to merge the two feature sets so that we
do end up with only one tool. The major pluses of the Chromium tool
are that it has a more expressive syntax for tracking failures across
multiple platforms, and that it can run tests in parallel across
multiple cores, so it tends to be about 3x faster than the Perl
version (at least on my 4-CPU Mac Pro). I do know the WebKit version
supports a number of switches and features that the Chromium tool
doesn't, but they're mostly switches I've never needed to use, so I
couldn't tell you offhand what they are.
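To illustrate why running across multiple cores helps, here is a minimal sketch of fanning tests out to a pool of workers. It is not how either run-webkit-tests or the Chromium tool is actually written: `run_tests_in_parallel` and the `run_one` callback are hypothetical, and the sketch assumes each call to `run_one` drives its own test-shell subprocess, so threads spend most of their time waiting and several tests can make progress at once.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_tests_in_parallel(
    tests: Iterable[str],
    run_one: Callable[[str], str],
    workers: int = 4,
) -> Dict[str, str]:
    """Run every test through `run_one` on a pool of worker threads.

    `run_one` is a hypothetical callback that runs a single test and
    returns its outcome string; in a real harness it would talk to its
    own DumpRenderTree-style subprocess.
    """
    tests = list(tests)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so they line up with `tests`.
        outcomes = pool.map(run_one, tests)
        return dict(zip(tests, outcomes))
```

With N workers and tests of similar cost, wall-clock time drops roughly by the number of cores that can be kept busy, which is consistent with (though not proof of) the ~3x figure on a 4-CPU machine.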

>> Second, there's the question of whether or not you want to track what the "expected incorrect" results are, separate from what the "expected correct" results are. That way, you can detect when a test fails *differently* than it has been in the past.
>
> I do think we want to track this. It’s part of why the original system worked the way it did when I created it back in 2005.

Good to know. Being a fan of this feature myself, I would happily add it.
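A minimal sketch of the "expected correct" vs. "expected incorrect" idea, assuming each test can have up to two recorded baselines. The names `Outcome` and `classify` are hypothetical and are not part of run-webkit-tests or the Chromium tool. Either baseline may be missing, for instance when the engine can't yet produce the correct output.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    PASS = "pass"              # output matches the expected-correct baseline
    KNOWN_FAILURE = "known"    # output matches the recorded expected-incorrect baseline
    CHANGED = "changed"        # output matches neither: a regression or a fix, needs a look


def classify(actual: str,
             expected_correct: Optional[str],
             expected_incorrect: Optional[str]) -> Outcome:
    """Compare one test's actual output against both (optional) baselines."""
    if expected_correct is not None and actual == expected_correct:
        return Outcome.PASS
    if expected_incorrect is not None and actual == expected_incorrect:
        return Outcome.KNOWN_FAILURE
    # The test now fails *differently* than it did in the past.
    return Outcome.CHANGED
```

The point of keeping the second baseline is the `CHANGED` case: a test that was already failing can still be flagged when its failure output shifts, instead of being silently ignored.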

> Also, in some cases it may be difficult to generate “correct” results if the engine doesn’t yet have correct behavior at the time the test is being created.

True enough.

-- Dirk