[webkit-dev] DRT/WTR should clear the cache at the beginning of each test?

Fri Oct 26 14:57:56 PDT 2012

On Fri, Oct 26, 2012 at 12:43 PM, Alexey Proskuryakov <ap at webkit.org> wrote:
>
> On 26.10.2012, at 11:04, Antti Koivisto <koivisto at iki.fi> wrote:
>
>> The reality is that this "test coverage" today shows up as flakiness and
>> so is ignored anyway, meaning we don't actually have useful coverage here.
>> Even when flakiness is investigated, the "fix" is to cache-bust using unique
>> URL params, which just means we "lose" the coverage you describe for that
>> test, anyway.
>
>
> I think that this is the real issue here. Test flakiness is very important
> to investigate, this often leads to discovery of bad bugs, including
> security ones. The phrase "flaky test" often misplaces the blame.
>
> When making cache related changes I have frequently found bugs from my
> patches because some seemingly random test started failing and I
> investigated. Without the test coverage some of those bugs would probably
> now be in the tree.
>

I agree strongly with both of these sentiments. My experience, though,
is that most people are disinclined to actually spend the time figuring
out why a test is flaky; in addition, it can be very difficult to even
figure out if a test has just started becoming flaky or if your change
introduced flakiness. As a result, people tend to just suppress or skip
over flaky tests.

>
> I agree with Antti. Finding regressions is what tests are for, and it would
> be difficult to make enough explicit tests to compensate for such loss of
> coverage. It would certainly be very unfortunate to lose test coverage
> without even an attempt to compensate for that.

Because of what I've written above, flaky tests are already costing us
coverage today. So I suspect that with this change we'll be able to
unsuppress a number of failures and regain coverage that we are
currently losing. Whether this offsets Antti's concern, I am not
informed enough to know.

Moreover, in my experience, flaky tests cause far more pain than they
are worth, and as a result it is much more important to get tests that
run consistently every time than it is to keep running tests that
cause intermittent failures. I believe this is a generally accepted
industry / QA principle (i.e., I don't think I'm in a minority here).

A corollary of this is that a change that fixes or removes test
flakiness is valued highly, even if it causes the underlying problems
to stop manifesting themselves.

Of course, we have to balance the desire to find bugs against other
sources of productivity gain and loss as well. For example, there is
no question that running the layout tests in parallel also increases
test flakiness, and yet we generally think that is an acceptable
tradeoff (although perhaps not everyone agrees with this choice).

Given all this, it seems like Elliot's suggestion is a near-ideal
compromise. You can have your cake and eat it ... we get less
flakiness by default, and if you want to test for more flakiness /
additional code paths, you can still do so.
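
As a rough illustration of that compromise (this is not how DRT or WTR
actually implement it; FakeCache, run_tests, loads_image, and the
--no-clear-cache flag are all hypothetical names made up for this
sketch), a harness could clear the cache before each test by default
and let callers opt out:

```python
# Hypothetical sketch only: none of these names come from DRT or WTR.
import argparse


class FakeCache:
    """Stand-in for a resource cache shared by all tests in one process."""

    def __init__(self):
        self.entries = {}

    def clear(self):
        self.entries.clear()


def run_tests(tests, cache, clear_cache=True):
    """Run each (name, test) pair, clearing the cache first unless opted out."""
    results = {}
    for name, test in tests:
        if clear_cache:
            cache.clear()
        results[name] = test(cache)
    return results


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical opt-out flag; the default is to clear before every test.
    parser.add_argument("--no-clear-cache", dest="clear_cache",
                        action="store_false", default=True)
    return parser.parse_args(argv)


def loads_image(cache):
    """Toy test whose result depends on whether an earlier test filled the cache."""
    hit = "image.png" in cache.entries
    cache.entries["image.png"] = b"fake bytes"
    return "hit" if hit else "miss"
```

With the default, running loads_image twice reports a miss both times;
with clear_cache=False the second run sees whatever the first one left
behind, which is the order-dependent behavior being debated here.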

Perhaps a slight variant of this is that we can agree to make the
changes on the Chromium port to clear the cache (much like the Qt and
EFL ports already do), and you can continue to not clear the cache on
the Apple Mac port until you feel comfortable that you've added
additional tests?

WDYT?

-- Dirk