[webkit-dev] Commit Queue Outage

Adam Barth abarth at webkit.org
Fri Apr 15 10:50:36 PDT 2011


While we're discussing commit-queue infrastructure changes, I should
mention that we're thinking about moving some or all of the
commit-queue nodes over to EC2.  Moving to EC2 means we can use faster
machines and we can scale up or down the number of machines easily.
Currently, we run the commit-queue nodes on Mac Minis, which means a
full cycle to land a patch takes about an hour.  If you do the math,
even with our six machines, we're running at near capacity, which is
why small hiccups lead to large backlogs and high latency.
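
As a rough back-of-the-envelope sketch of that math: the machine count
and cycle time come from above, but the arrival rate and backlog size
are made-up numbers for illustration, not measurements, and the model
ignores retries and failed runs.

```python
def throughput_per_hour(machines: int, cycle_hours: float) -> float:
    """Patches the cluster can land per hour if every machine stays busy."""
    return machines / cycle_hours


def hours_to_clear_backlog(backlog: int, arrivals_per_hour: float,
                           machines: int, cycle_hours: float) -> float:
    """Time to drain a backlog while new patches keep arriving.

    Returns infinity when arrivals meet or exceed throughput.
    """
    spare = throughput_per_hour(machines, cycle_hours) - arrivals_per_hour
    if spare <= 0:
        return float("inf")
    return backlog / spare


# From the message: six machines, about one hour per cycle.
capacity = throughput_per_hour(machines=6, cycle_hours=1.0)

# Hypothetical load: 5 patches/hour arriving, 8 patches queued after a hiccup.
drain = hours_to_clear_backlog(backlog=8, arrivals_per_hour=5.0,
                               machines=6, cycle_hours=1.0)

print(f"capacity: {capacity:.1f} patches/hour")
print(f"hours to clear backlog: {drain:.1f}")
```

The closer the arrival rate gets to the capacity, the smaller the spare
throughput, so even a short outage takes many hours to work off.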

Ideally, we'd continue to use the Mac port on EC2, but licensing
restrictions prevent us from running Mac OS X on EC2.  That means we'd
likely need to switch the commit-queue over to using one of the Linux
ports.  The consequences of switching ports are somewhat difficult to
foresee, which is why we're being cautious.  We're building the
infrastructure now, and we'll probably start experimenting with one or
two EC2 nodes in the coming weeks.  As always, we'll keep you updated
with any developments.

Thanks,
Adam


On Fri, Apr 15, 2011 at 9:24 AM, Eric Seidel <eric at webkit.org> wrote:
> CQ backlog is cleared.
>
> Recent CQ changes of note:
> 1.  The CQ now runs with --exit-after-N-failures=10 instead of 1.
> 2.  The CQ now knows how to upload layout-test-results.zip files when
> tests fail during a commit run.
> 3.  The CQ can now land when the tree is red with up to 9 failures.
> (Keeps a list of failures detected from a clean-tree build.  It
> ignores any failures seen from that list, but aggressively removes
> tests from that list if they ever pass.)
> 4.  Now that we're continuing after the first failure, we've found
> that flaky-tests are *much* worse than previously understood.  Flaky
> tests also defeat the new land-while-red behavior, which is
> doubly-bad. :(
> You can see a list of flaky tests found by the CQ here:
> https://bugs.webkit.org/show_bug.cgi?id=50856
>
> If you see any troubles with the CQ please do let me or abarth know.
> Filing a bug and CCing one or more of us is best.
>
> Thanks!
>
> -eric
>
> On Thu, Apr 14, 2011 at 9:03 PM, Eric Seidel <eric at webkit.org> wrote:
>> Service is restored.  The backlog should be cleared by morning.
>>
>> Thanks for your patience.
>>
>> -eric
>>
>> On Thu, Apr 14, 2011 at 2:52 PM, Eric Seidel <eric at webkit.org> wrote:
>>> The cq cluster is sick at the moment, after some changes I made
>>> yesterday to teach it how to land when the tree is red with known
>>> failures.
>>>
>>> I'm working on bringing it back online.
>>>
>>> On the bright side, as of this morning queues.webkit.org will show you
>>> how long it's been offline. :)
>>> http://queues.webkit.org/queue-status/commit-queue
>>>
>>> Right now it will tell you it hasn't landed ("Last Pass") a patch in 8 hours. :(
>>>
>>> Sorry for the inconvenience.  I expect the queue to be back working
>>> (and having cleared its backlog) by tomorrow morning.
>>>
>>> -eric
>>>
>>

