[webkit-dev] A post-mortem of today's tree redness

Ojan Vafai ojan at chromium.org
Tue Apr 6 11:28:37 PDT 2010


On Mon, Apr 5, 2010 at 11:35 PM, Adam Barth <abarth at webkit.org> wrote:

> On Mon, Apr 5, 2010 at 11:27 PM, Brent Fulgham <bfulgham at gmail.com> wrote:
> > On Apr 5, 2010, at 9:58 PM, Adam Barth wrote:
> >> We had some trouble today keeping the tree green.  In this email, I
> >> present a post-mortem analysis of what happened and what we can learn
> >> from these events.  I've removed most of the names from this account
> >> because the purpose isn't to assign blame but to document what
> >> happened in the hopes that we can learn from it.
> >
> > Could rollout of patches be automated in some fashion, so that a
> > previously-green tree becoming red could trigger a rollout of the last
> > checkin?
>
> We have the tools to do this currently, but the prevailing wisdom is
> that we should have some human judgement involved in the process.  We
> still have enough test flakiness that false positives could roll out
> perfectly good patches.  In other cases, it's clear how to fix the tree
> without rolling out.
>
> > On the other hand, I've noticed that the varying speed of the various
> > build bots makes it difficult to assess which patch might have triggered a
> > break.  It's not uncommon for some machines to be several patches behind
> > others, and long test runs further exacerbate the problem.
>

Another example of tree redness, which I caused a couple of weeks ago:
1. I committed a change that I thought would be green on all platforms.
2. It turned out to regress some tests on the Chromium Windows bot.
3. I committed a fix for Chromium Windows, which also fixed the WebKit Windows
bots.

The WebKit Windows bots had fallen hours behind, so they had not yet tested
the original commit. Hours later, they reached the original commit and
turned red, even though the fix was already committed. Hours after that, they
reached the followup commit and went green again. My point is that it's
really difficult to look at the state of the tree and say definitively that a
patch should be rolled out.
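
To make the timing problem concrete, here is a minimal sketch of how one
might compute which commits a lagging bot can actually implicate. It assumes
integer, monotonically increasing revision numbers; BotRun, suspect_range and
untested_commits are hypothetical names for illustration, not part of any
existing WebKit or buildbot tooling, and the revision numbers are made up.

```python
from dataclasses import dataclass


@dataclass
class BotRun:
    revision: int  # revision the bot built and tested (hypothetical numbers)
    green: bool    # True if every test passed in that run


def suspect_range(runs):
    """Return the inclusive (first, last) revision range that could have
    caused the first green-to-red transition, or None if there is none."""
    last_green = None
    for run in sorted(runs, key=lambda r: r.revision):
        if run.green:
            last_green = run.revision
        elif last_green is not None:
            return (last_green + 1, run.revision)
    return None


def untested_commits(last_tested, head):
    """Number of commits that have landed but that the bot has not run yet."""
    return max(0, head - last_tested)


# A bot that tests only every few revisions cannot pin down a single commit.
runs = [BotRun(100, True), BotRun(105, False)]
print(suspect_range(runs))         # (101, 105)
print(untested_commits(105, 112))  # 7
```

In this made-up case the red run implicates five commits, and seven more have
landed that the bot has not seen, any of which might already contain the fix.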

Also, I'd like to add a recommendation for keeping the tree green: make the
bots cycle faster. That makes it much easier to see which commit caused a
regression and to tell that a fix actually greened the tree. I'm not sure
what this involves, but some ideas:
1. Get faster/more machines.
2. Run tests in parallel. new-run-webkit-tests does this, if we can get it to
be a suitable replacement for run-webkit-tests. How much benefit we get from
parallelizing depends on the number of cores on the machine, which gets back
to idea 1.
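
As a rough illustration of idea 2, here is a minimal sketch of fanning tests
out over a worker pool sized to the machine's core count. This is not how
new-run-webkit-tests is implemented; run_test is a hypothetical stand-in that
only inspects the file name, and the wall-time estimate assumes equal-length
tests that scale perfectly, which real test suites will not.

```python
import math
import os
from concurrent.futures import ThreadPoolExecutor


def run_test(name):
    """Hypothetical stand-in for running one test; returns (name, passed).
    A real runner would launch a test process and compare its output."""
    return (name, not name.endswith("-fail.html"))


def failing_tests(tests, workers=None):
    """Run tests on a pool of workers and return the failures in input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(run_test, tests))
    return [name for name, passed in results if not passed]


def ideal_wall_time(num_tests, secs_per_test, workers):
    """Best-case wall time if tests are equal length and scale perfectly."""
    return math.ceil(num_tests / workers) * secs_per_test
```

Under those assumptions, going from 1 to 4 workers cuts the best-case cycle
time to a quarter, which is why the core count matters so much here.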

Ojan