[webkit-reviews] review denied: [Bug 108288] On EWS bots, run tests on patch once instead of twice if some tests fail. : [Attachment 185388] Patch

Wed Jan 30 12:43:26 PST 2013

Dirk Pranke <dpranke at chromium.org> has denied Timothy Loh
<timloh at chromium.org>'s request for review:
Bug 108288: On EWS bots, run tests on patch once instead of twice if some tests
fail.
https://bugs.webkit.org/show_bug.cgi?id=108288

Attachment 185388: Patch
https://bugs.webkit.org/attachment.cgi?id=185388&action=review

------- Additional Comments from Dirk Pranke <dpranke at chromium.org>
View in context: https://bugs.webkit.org/attachment.cgi?id=185388&action=review

> Tools/Scripts/webkitpy/layout_tests/controllers/manager.py:206
> +		       repeat_each=retries, iterations=1, num_workers=1,
retrying=True)

There are important differences between --retry-failures and --repeat-each that
are getting glossed over here with your change (and with this patch as a
whole).

Currently we save the results for both the initial run (into initial_results)
and the retry run (into retry_results) and compute the aggregate results and
return code downstream based on knowing what happens on each run. This means
that, for example, the flakiness dashboard gets the results of both times the
test is invoked (although we don't actually use that today). 

When you use --repeat-each, each result gets overwritten, so you only know the
result for the last invocation. In other words, if the test fails on the last
run, you'll think the test is failing, even if it passed the previous six times
and should be considered flaky.

Put differently, --repeat-each and --iterations are not designed to be used on
the bots; I don't have any good sense of what sort of problems this'll expose.

Now, one could argue that we should combine --repeat-each and --retry-failures
as you're trying to do, but I'd want to step back a bit first and understand
*why* you want to combine them; i.e., what we're looking to gain from this.

For example, I'm not following what we're attempting to fix with this patch as
a whole, apart from the fact that we have two layers both retrying things. Is
the belief that NRWT's built in retry logic is reporting too many flaky tests
as actual failures? (i.e., EWS needs to retry more often in order to figure out
what's really failing)?

When I initially added the retry logic to NRWT, I actually had the abiliity to
retry N times. It turned out nobody really wanted to retry more than once and
we didn't really use the feature, so I simplified it to the code that we have
today. Of course, we weren't using this for EWS at the time ...