[webkit-dev] Yet another email about a broken tree

Wed Mar 17 12:14:09 PDT 2010

On Wed, Mar 17, 2010 at 2:52 AM, Maciej Stachowiak <mjs at apple.com> wrote:
>
> On Mar 17, 2010, at 2:31 AM, Kenneth Russell wrote:
>
>>
>> Our best current plan is more widespread testing. We will file a Radar
>> bug as soon as working around the bugs has told us more about the
>> nature of the failure. If we knew the precise hardware configuration
>> of the bots, including graphics cards, that would help.
>
> Mark Rowe or Stephanie Lewis would probably know the exact configurations, I
> will see if one of them can get you the data tomorrow.
>
>
>>
>>>> Again, I apologize for the breakage. It would be best for everyone, I
>>>> think, if we got the tree to a green state and all committed through
>>>> the queue, thereby having a line of defense against unexpected test
>>>> failures on the bots.
>>>
>>> If we really want everyone to use the commit queue for most normal work, we
>>> really have to fix it so that it puts a meaningful value in SVN's committer
>>> field.
>>>
>>> That being said, the mechanism I'd really like to see first is better
>>> notification of when the bot goes red (I suspect a number of people involved
>>> in today's redness didn't notice right away because there is no active
>>> notification system).
>>
>> For what it's worth, the tree was red before I committed due to at
>> least two or three flaky tests which have been showing up for days. An
>> active notification system would not have helped here.
>
> The SnowLeopard bot went from 3 failures to over 20 with your commit. The
> Leopard Debug bot went from 0 failures to 4. I would think those are the
> kinds of events that a computer program could detect and report.
>
> (I also suspect Alexey hadn't noticed that his new tests were failing on the
> Gtk and Qt bots, and Daniel didn't notice that his was failing on Windows.)

(FYI, rather than try to work around the driver bugs on the fly, we
are going to revert the original patch and re-apply it later. See
https://bugs.webkit.org/show_bug.cgi?id=36233 .)

The basic problem was that I committed the patch by hand, not that
there wasn't prompt warning; at least three people contacted me within
minutes of the patch landing indicating that there was test breakage
on the bots. If the commit queue had processed it, the patch would not
have reached the tree in the first place because the test failures
would have blocked it. Unfortunately, the commit queue was blocked at
the time because of other failing tests.
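
Roughly, the gating behavior I am describing looks like the sketch below. This is an illustration of the idea only, not the commit queue's actual code. The function name and the `known_flaky` skip list are hypothetical, and the skip list assumes the flaky tests have been skipped as proposed elsewhere in this thread.

```python
def can_land(tree_failures, patch_failures, known_flaky=frozenset()):
    """Hypothetical gate: decide whether a patch may be committed.

    tree_failures:  tests failing on the tree before the patch is applied
    patch_failures: tests failing with the patch applied
    known_flaky:    tests that are skipped and therefore ignored
    Returns (allowed, reason).
    """
    tree_red = set(tree_failures) - set(known_flaky)
    patch_red = set(patch_failures) - set(known_flaky)
    if tree_red:
        # The situation at the time: the queue is blocked by other failures.
        return False, "tree is red: " + ", ".join(sorted(tree_red))
    if patch_red:
        # What should have stopped my patch from landing.
        return False, "patch fails: " + ", ".join(sorted(patch_red))
    return True, "ok"


flaky = {"websocket/tests/frame-lengths.html"}
print(can_land(flaky, []))                      # blocked: the tree is red
print(can_land(flaky, [], known_flaky=flaky))   # allowed once flaky tests are skipped
```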

>> I think what
>> should be done is to get the tree green by skipping these flaky tests;
>> file high-priority bugs against the test authors to fix the flakiness;
>> and then figure out a way the commit queue can be used for the vast
>> majority of patches. (A secure repository of committer
>> username/passwords on the machine actually executing the commit?)
>
> Were the tests flaky because they were badly designed new tests or because
> some patch caused them to start flaking out? It does seem like the
> SnowLeopard Release bot, at least, has been failing for some time. I've
> looked back to Monday morning (which seems to be as far back as we have
> data) and could not find two consecutive green builds of SnowLeopard Release
> tests. (Leopard bots, on the other hand, seem to be green going pretty far
> back, which makes me wonder if there might be something wrong with the SL
> build slaves.)

As far as I know they are longstanding tests. I've seen a couple of
their names show up before when intermittent failures caused patches
of mine to fail the commit queue. websocket/tests/frame-lengths.html
is one.

-Ken