[webkit-dev] A post-mortem of today's tree redness

Tue Apr 6 10:06:41 PDT 2010

On Tue, Apr 6, 2010 at 9:25 AM, Alexey Proskuryakov <ap at webkit.org> wrote:
> On 05.04.2010, at 21:58, Adam Barth wrote:
>> Leaving failures in the tree makes it difficult to track
>> down subsequent failures.  Rolling out a change means more work for
>> you, but fewer costs imposed on the rest of the project.
>
> While I agree with your analysis for the most part, there are costs associated with rolling out patches that you didn't mention. Some of these are:
>
> 1) Confusion about what is going on with the project. It becomes harder to know what's going on by reading webkit-changes - because you can't unsee a patch you saw landed, and because people often roll out patches with cryptic messages (roll out rXXXXX, because it breaks Tiger - how are you supposed to know that an important change you saw landed a few hours ago isn't there any more?)

I don't read webkit-changes, so I might not fully appreciate this use
case, but the way I know when things are rolled out is because we
reopen the bug and comment that the patch was rolled out in a certain
revision.  If you like, we can put more information in the ChangeLogs
created by sheriffbot (such as the title of the original bug).
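
As a rough illustration of what a more informative rollout entry could
carry, here is a minimal Python sketch.  The function name
format_rollout_message and its parameters are hypothetical; this is not
how sheriffbot or any existing WebKit tool is implemented, and the exact
wording of the message is only an assumption.

```python
def format_rollout_message(revision, reason, bug_id, bug_title):
    """Build a rollout description that names the original change.

    Hypothetical helper: not an existing sheriffbot or webkit-patch API.
    """
    return (
        f"Unreviewed, rolling out r{revision}.\n"
        f"Original bug {bug_id}: {bug_title}\n"
        f"Reason: {reason}"
    )
```

With the original bug's title in the message, someone skimming the
commit log can tell which change disappeared without looking up the
revision number.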

> 2) Confusion also happens in Bugzilla - there are several styles for dealing with such issues (make a new bug for rollout, or just roll out and reopen). People often forget to document what they were doing to fix the build, so you end up with a resolved bug for something that has been rolled out, or a reopened bug without adequate explanations. Even when the original bug is correctly reopened, it can be hard to figure out its history, because commit queue clears out flags on patches.

This problem is solvable with tooling.  The process I recommend is as follows:

1) Open a new bug for the rollout patch and mark it blocking the main
bug.  This reduces noise on the main bug and provides a location to
discuss the failures and resolve the situation, either by landing the
rollout or not.  (Creating a new bug is already automated.)

2) When landing a rollout, reopen the original bug and comment that
the patch was rolled out and provide a link to the revision and bug.
(Currently, this step is manual, but we can automate this too.)
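
The two steps above can be sketched as follows.  This is a toy model,
not the real tooling: the Bug and Tracker classes are hypothetical
in-memory stand-ins for Bugzilla, and the status strings are
assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Bug:
    bug_id: int
    title: str
    status: str = "NEW"
    blocks: list = field(default_factory=list)
    comments: list = field(default_factory=list)


class Tracker:
    """Toy in-memory stand-in for a bug tracker (hypothetical)."""

    def __init__(self):
        self.bugs = {}
        self._next_id = 1

    def create(self, title, status="NEW"):
        bug = Bug(self._next_id, title, status)
        self.bugs[bug.bug_id] = bug
        self._next_id += 1
        return bug


def open_rollout_bug(tracker, original, revision, reason):
    # Step 1: a separate bug for the rollout, marked as blocking the original.
    rollout = tracker.create(f"Roll out r{revision}: {reason}")
    rollout.blocks.append(original.bug_id)
    return rollout


def land_rollout(original, rollout, rollout_revision):
    # Step 2: reopen the original and record where the rollout happened.
    original.status = "REOPENED"
    original.comments.append(
        f"Patch rolled out in r{rollout_revision}; see bug {rollout.bug_id}."
    )
    rollout.status = "RESOLVED"
```

Keeping the rollout discussion on its own bug means the original bug
receives a single comment that links to the rollout revision and bug.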

> 3) Likelihood of more world rebuilds for developers and bots. A troublesome patch is more likely to touch common headers than a targeted build fix, so you get three world rebuilds instead of one.

I don't see this as much of a concern.  We can track statistics, but I
bet the build time attributable to rollouts is less than 5% of all
build time.
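
If we did want to track this, the calculation itself is simple.  The
sketch below assumes we can label each build with its duration and
whether a rollout triggered it; the function name and the input format
are hypothetical, and no real bot data is used here.

```python
def rollout_build_fraction(builds):
    """Return the share of total build time spent on rollout-triggered builds.

    builds: iterable of (duration_seconds, caused_by_rollout) pairs.
    Hypothetical input format; real bot logs would need to be adapted.
    """
    total = 0.0
    rollout = 0.0
    for duration, caused_by_rollout in builds:
        total += duration
        if caused_by_rollout:
            rollout += duration
    # Avoid dividing by zero when there are no builds at all.
    return rollout / total if total else 0.0
```

A result below 0.05 would support the guess above; a higher value would
mean the rebuild cost deserves more weight.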

> 4) It's harder to isolate regressions if these appear and disappear several times (aforementioned confusion doesn't help either). Screening bugs about regressions also becomes more error-prone. This argument goes both ways though - it's even harder to isolate regressions if the platform in question had a broken build at the time.

Concretely, suppose we hadn't cleaned up the Tiger bot to be green
recently.  I strongly suspect the regression caused by r57081 would
have been lost in the thought process that "the Tiger bot is always
red," even though the regression was real and affected every
platform.  Had we noticed the problem (say) a month later, we would
have had a devil of a time tracking down the issue as evidenced by the
effort required to fix the previous ancient Tiger-only failures.

Even more concretely, the Windows bots have been red for thousands of
revisions.  Today someone (it's not important who) broke the
break-blockquote-after-delete.html test on all the bots.  He resolved
the situation by updating the expected results to make the tree green
again.  However, he did not update the Windows expected results even
though the failure diff is identical:

http://build.webkit.org/results/Windows%20Release%20(Tests)/r57153%20(10992)/editing/inserting/break-blockquote-after-delete-pretty-diff.html

That means when we finally get around to tracking down the failures
some number of months from now, we'll be mystified by this failure even
though we had the knowledge yesterday to fix the failure in a few
minutes.  I strongly suspect that if the Windows bots had started the
day green and we had a culture of keeping them green, this individual
would have made them green again and we wouldn't be accumulating a
debt of mysterious failures that will drain our productivity in the
future.

I'm not saying that rollouts are always the best solution.  For
example, updating the expected results for
break-blockquote-after-delete.html yesterday appears to have been the
proper response.  What I am saying is that we should keep the bots
green, ideally by steadfastly refusing to regress tests.

Adam