[webkit-dev] A post-mortem of today's tree redness

Mon Apr 5 23:35:56 PDT 2010

On Mon, Apr 5, 2010 at 11:27 PM, Brent Fulgham <bfulgham at gmail.com> wrote:
> On Apr 5, 2010, at 9:58 PM, Adam Barth wrote:
>> We had some trouble today keeping the tree green.  In this email, I
>> present a post-mortem analysis of what happened and what we can learn
>> from these events.  I've removed most of the names from this account
>> because the purpose isn't to assign blame but to document what
>> happened in the hopes that we can learn from it.
>
> Could rollout of patches be automated in some fashion, so that a previously-green tree becoming red could trigger a rollout of the last checkin?

We have the tools to do this currently, but the prevailing wisdom is
that we should have some human judgement involved in the process.  We
still have enough test flakiness that false positives could roll out
perfectly good patches.  In other cases, it's clear how to fix the tree
without rolling out.

> On the other hand, I've noticed that the varying speed of the various build bots makes it difficult to assess which patch might have triggered a break.  It's not uncommon for some machines to be several patches behind others, and long test runs further exacerbate the problem.

The sheriffbot does a pretty good job of narrowing down the regression
window.  Its algorithm is somewhat robust to flaky tests and other
bits of noise.  We continue to refine it based on experience.  For
example, even during today's complex overlapping failures, it correctly
narrowed the regression window to a set of five commits.
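
To make the idea of a regression window concrete, here is a minimal
Python sketch.  It is not sheriffbot's actual code: the data model
(per-builder lists of (revision, passed) pairs) and the function names
are assumptions made up for illustration, and it ignores flaky tests
entirely.  Each builder contributes the gap between its last green and
first red build, and the gaps are intersected across builders.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical data model: each builder reports (revision, passed) pairs,
# sorted by increasing revision. Slow builders skip revisions.
BuildResults = List[Tuple[int, bool]]
Window = Tuple[int, int]  # (last green revision, first red revision)


def builder_window(results: BuildResults) -> Optional[Window]:
    """Return the most recent green-to-red transition for one builder."""
    last_green = None
    window = None
    for revision, passed in results:
        if passed:
            last_green = revision
            window = None  # builder went green again; forget older breakage
        elif window is None and last_green is not None:
            window = (last_green, revision)
    return window


def regression_window(builders: Dict[str, BuildResults]) -> Optional[Window]:
    """Intersect per-builder windows; suspects are revisions in (low, high]."""
    windows = [w for w in map(builder_window, builders.values()) if w]
    if not windows:
        return None
    low = max(green for green, _ in windows)
    high = min(red for _, red in windows)
    if low >= high:
        return None  # windows do not overlap; likely separate failures
    return (low, high)


builders = {
    "fast-bot": [(100, True), (101, True), (103, False)],
    "slow-bot": [(98, True), (105, False)],
}
window = regression_window(builders)
```

In this example the fast builder narrows the suspects to revisions
102 and 103 even though the slow builder only saw 98 and 105.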

The main failure mode we're seeing currently is that if a test fails
80% of the time, sheriffbot will generate false positives because it
thinks that the test was fixed the one time it happens to be green.  I
have some ideas for how to handle that case, but we'll need to
experiment some more.
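
A rough illustration of why this case is hard, in Python.  The numbers
follow from the 80% figure above; the "require several consecutive
greens" rule is only one possible mitigation, shown as an assumption
for illustration, and is not something sheriffbot is known to
implement.  The function names are hypothetical.

```python
from typing import Sequence


def spurious_green_probability(failure_rate: float, runs: int) -> float:
    """Chance that a still-broken test passes at least once in `runs` runs."""
    return 1.0 - failure_rate ** runs


def looks_fixed(history: Sequence[bool], required_greens: int = 3) -> bool:
    """Assumed rule: call a test fixed only after N consecutive passes."""
    if len(history) < required_greens:
        return False
    return all(history[-required_greens:])


# A test that fails 80% of the time, observed over five runs.
p_any_green = spurious_green_probability(0.8, 5)

# Chance that the same test passes three times in a row by luck.
p_three_greens = (1.0 - 0.8) ** 3
```

With an 80% failure rate, at least one of five runs comes up green
about 67% of the time, while three greens in a row happen by chance
under 1% of the time.  The cost of waiting for several greens is that
real fixes are recognized more slowly.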

Adam