[webkit-dev] DRT/WTR should clear the cache at the beginning of each test?

Mon Oct 29 05:48:15 PDT 2012

On Oct 28, 2012, at 10:09 PM, Dirk Pranke <dpranke at chromium.org> wrote:

> 
> On Sun, Oct 28, 2012 at 6:32 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>> 
>> I think the nature of loader and cache code is that it's very hard to make tests which always fail deterministically when regressions are introduced, as opposed to randomly. The reason for this is that bugs in these areas are often timing-dependent. I think it's likely this tendency to fail randomly will be the case whether or not the tests are trying to explicitly test the cache or are just incidentally doing so in the course of other things.
>> 
> 
> I am not familiar with the loader and caching code in webkit, but I
> know enough about similar problem spaces to be puzzled by why it's
> impossible to write tests that can adequately test the code.

Has anyone claimed that? I think "impossible to write tests that can adequately test the code" is not a position that anyone in this thread has taken, certainly not me above.

My claim is only that many classes of loader and cache bugs, when first introduced, are likely to cause nondeterministic test failures. And further, this is likely to be the case even if tests are written to target that subsystem. That's not the same as saying adequate tests are impossible. It just means that to have good testing of some areas of the code, we need a good way of dealing with nondeterministic failures.

> 
>> What I personally would most wish for is good tools to catch when a test starts failing nondeterministically, and to identify the revision where the failures began. The reason we hate random failures is that they are hard to track down and diagnose. But some types of bugs are unlikely to manifest in a purely deterministic way. It would be good if we had a reliable and useful way to catch those types of bugs.
> 
> This is a fine idea -- and I'm always happy to talk about ways we can
> improve our test tooling, please feel free to start a separate thread
> on these issues -- but I don't want to lose sight of the main issue
> here.

I think the problem I identified -- that it's overly hard to track down and diagnose regressions that cause tests to fail only part of the time -- is more important and more fundamental than any of the three problems that you cite below. Our test infrastructure ultimately exists to help us notice and promptly fix regressions, and for some types of regressions, namely those that do not manifest 100% of the time, it is not working so well. The problems you mention are all secondary consequences of that fundamental problem, in my opinion.
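
To make the kind of tooling described above concrete, here is a minimal sketch in Python of how one might search for the revision where a test started failing only part of the time. Everything in it is hypothetical and for illustration only: the names runs_needed, fails_at and first_flaky_revision, the run_test callback, and the integer revision numbers are assumptions, not part of any existing WebKit script. It also assumes the test never failed before the regression and fails at a roughly constant rate after it.

```python
import math
from typing import Callable, Sequence


def runs_needed(failure_rate: float, miss_probability: float = 0.01) -> int:
    """Number of repetitions so that a test failing with probability
    `failure_rate` per run passes every time with probability at most
    `miss_probability`: solve (1 - failure_rate) ** n <= miss_probability."""
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if not 0.0 < miss_probability < 1.0:
        raise ValueError("miss_probability must be strictly between 0 and 1")
    return math.ceil(math.log(miss_probability) / math.log(1.0 - failure_rate))


def fails_at(revision: int, run_test: Callable[[int], bool], runs: int) -> bool:
    """Run the test up to `runs` times at `revision`; stop at the first failure.

    `run_test` is a hypothetical callback that returns True when the test passes.
    """
    return any(not run_test(revision) for _ in range(runs))


def first_flaky_revision(
    revisions: Sequence[int], run_test: Callable[[int], bool], runs: int
) -> int:
    """Bisect for the earliest revision at which a failure is observed.

    Assumes revisions[0] is known good (never fails) and revisions[-1] is
    known to fail at least some of the time.
    """
    good, bad = 0, len(revisions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if fails_at(revisions[mid], run_test, runs):
            bad = mid   # a failure was seen: the regression is at or before mid
        else:
            good = mid  # no failure in `runs` tries: treat mid as good
    return revisions[bad]
```

For example, runs_needed(0.1) says that a test failing one run in ten needs to be repeated 44 times per revision to keep the chance of wrongly calling a bad revision good at or below 1%. The weakness of this approach is exactly the cost: the lower the failure rate, the more repetitions each bisection step requires, and a single missed failure sends the search in the wrong direction.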

 - Maciej

> 
> It sounds like we've identified three existing problems - please
> correct me if I'm misstating them:
> 
> 1. There appears to be a bug in the caching code that is causing tests
> for other parts of the system to fail randomly.
> 
> 2. DRT and WTR on some ports are implemented in a way that is causing
> the system to be more fragile than some of us would like it to be, and
> there doesn't seem to be an a priori need for this to be the case;
> indeed some ports already don't do this.
> 
> 3. We apparently don't have dedicated test coverage for caching and
> the loader that people think is good enough, and getting such tests
> might be "hard".

P.S. I do think your problem statements are somewhat tendentious and not really supported by evidence provided in the thread. But even granting them as written, I don't think any of these is the "main issue".