[webkit-dev] An update on new-run-webkit-tests

Thu Apr 7 04:40:35 PDT 2011

On Apr 6, 2011, at 10:33 PM, Dirk Pranke wrote:

> 
> I'm not sure I understand you, but if I do, this is what I was
> attempting to talk about in the paragraph above, about expecting some
> tests to be flaky or failing under NRWT simply because NRWT isn't
> exactly identical to ORWT. NRWT may be exposing bugs in the code that
> ORWT didn't trigger (e.g., because tests ran in a slightly different
> order, or because of the concurrency issues).
> 
> It may be that you're thinking that either we run the test and it
> fails, or we put the test in the Skipped file, because that was our
> only choice with ORWT. In the new system, we can mark the test as
> expected to fail in a particular way, but continue to run it (in order
> to ensure that the test doesn't get worse and to maintain coverage).

I think if there are changes in test behavior that give worse test results (either failures or flakiness), those should be fixed before cutting over. If the new test tool causes more failures, or worse yet causes more tests to give unpredictable results, then that makes our testing system worse. The main benefit of new-run-webkit-tests, as I understand it, is that it can run the tests a lot faster. But I don't think it's a good tradeoff to run the tests a lot faster on the buildbot, if the results we get will be less reliable. I'm actually kind of shocked that anyone would consider replacing the test script with one that is known to make our existing tests less reliable.

I don't really care why tests would turn flaky. It's entirely possible that these are bugs in the tests themselves, or in the code they are testing. That should still be fixed.

Nor do I think that marking tests as flaky in the expectations file means we are not losing test coverage. If a test can randomly pass or fail, and we know that and the tool expects it, we are nonetheless not getting the benefit of learning when the test starts failing. 
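To make the coverage point concrete, here is a minimal sketch of the kind of check an expectations-based runner might do. The names (`is_unexpected`, `unexpected_results`) and the outcome sets are hypothetical, chosen for illustration; this is not NRWT's actual code or expectations-file syntax. It assumes the tool reports a result only when it falls outside the set of outcomes the test is marked as expecting.

```python
# Hypothetical sketch: not NRWT's real data structures or file syntax.
PASS, FAIL = "PASS", "FAIL"


def is_unexpected(expected_outcomes, actual):
    """Return True if the actual outcome is outside the expected set."""
    return actual not in expected_outcomes


def unexpected_results(expected_outcomes, run_results):
    """Collect the outcomes from a series of runs that would be reported."""
    return [r for r in run_results if is_unexpected(expected_outcomes, r)]


# A test expected to pass: the one failing run is reported.
stable = unexpected_results({PASS}, [PASS, PASS, FAIL])

# A test marked flaky (either outcome expected): nothing is ever reported,
# even if the test has started failing on every single run.
flaky = unexpected_results({PASS, FAIL}, [FAIL, FAIL, FAIL])
```

Under these assumptions, `stable` contains the one failing run, while `flaky` is empty even though the test failed every time. The test is still executed, but a change from "sometimes fails" to "always fails" produces no signal.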

> 
> Certainly running both systems in parallel for a while and shaking out
> bugs that the NRWT bots reveal prior to cutting over is a good idea,
> but I don't know that it's realistic to target all tests passing 100%
> of the time prior to cutover. Then again, it may be that I'm more used
> to Chromium bots where we have a large number of tests that aren't
> expected to pass for one reason or another, and the Apple Mac port
> will be more stable and easier to converge on.

OK, but we are not talking about future discoveries here. We are talking about problems that have been in bugzilla for a year or so. And we're talking about (for now) a relatively short list. Of course, once we do a test run we may discover there are more problems, but I don't think that gives us license to ignore the problems we already know about.

> Does that address your concerns?

Not really!

Regards,
Maciej