[webkit-qt] The bot infrastructure and gardening.

Thu May 10 15:07:02 PDT 2012

My reactions are inline.

On 05/10/2012 05:01 PM, Osztrogonac Csaba wrote:
> Hi All,
>
> Alexis Menard wrote:
>> Hi,
>>
>> By reading Simon's email about removing Qt4, I have seen there is a
>> plan to move to Amazon EC2.
>>
>> State of the art of Qt gardening:
>>
>> - Mostly Ossy alone is the gardener, which is unacceptable. Apple made a
>> move towards improving their bots (when you see kling gardening, it
>> tells you they changed something), Google is already pretty good, GTK
>> also; we need to be better. While people at the summit praised the Qt
>> bots for being green all the time, I think it hides a terrible truth: our
>> skiplist grows, grows, grows and nobody looks after it, which conflicts a
>> bit with trying to release a stable trunk for Qt 5.0. How many mails
>> do we receive from Ossy complaining about quality?
>
> It's not exactly true that I'm the only gardener. I'm working on
> gardening together with my buildbot group. They are part-time developers,
> because they are students and have courses, exams, etc. Most of them
> aren't WebKit committers yet, so their gardening patches are usually
> committed by me or somebody else from here. (But you can find their
> names in the commit logs and changelogs, of course. :) )
>
> I have to agree with the other point: with this small group we have
> resources (enough time) only for fire-fighting: detecting who/which commit
> broke which tests, updating expected results if needed, filing bug
> reports, commenting on bugs, build fixes, etc. We don't have enough time
> to fix all the bugs instead of those who caused them.
>
> But gardening is very hard if most developers don't care about QA at all.
> When I comment on a bug with "your patch broke X.Y. layout/API test, diff:
> ...", I regularly get the question: "How can I run this test?" And that
> isn't good, because it means this developer has never run the tests
> before. (But everybody should before committing.)
> Another problem is that many developers insist on their buggy patch
> staying in trunk, but they don't care about fixing the bug. In this case
> all we can do is skip the newly failing tests, because red bots with many
> failing tests would make catching new regressions much more complex,
> sometimes impossible. But in my opinion, rolling out a buggy patch and
> relanding it after fixing would cause less pain for everybody than a
> growing, growing and growing skiplist.
> I don't know why folks hate rolling out patches. It doesn't mean that the
> patch is wrong altogether. It isn't a death sentence for the patch or the
> author. :)
> It only means that the patch caused some trouble/regression and should be
> fixed. And fixing it offline is less painful for others than leaving a
> buggy patch in trunk.
> The Chromium guys usually roll out their own patches if they broke a test
> on the Qt bot, before I even notice. Really. We should follow their good
> practice. ;)
>
>> - There is a huge delta, machine-wise, between what the bot is running
>> and what people use to develop. The bot runs Ubuntu; many of us run
>> ArchLinux/OpenSuse while some of us run Ubuntu. It leads to results on
>> your machine that differ from what the bot produces.
>> We have encountered many, many, many times people saying: "it passes on
>> my machine but not on the bot" -> Added to the Skiplist because nobody
>> can really see what's going on with the bot. Szeged tried their best to
>> provide a virtual machine but it was a bit of a failure, as the VM
>> doesn't behave the same as the bot, and the VM behaves differently
>> depending on whether you run it on VMWare or VirtualBox.
>
> Unfortunately the VMWare image wasn't the best solution. After that we
> created a meta package for Ubuntu 11.10 which installs all dependencies:
> https://launchpad.net/~u-szeged/+archive/sedkit
> With this meta package you can install a full QtWebKit development
> environment in an hour.
>
> Now the direction is moving to an Amazon Ubuntu image. But I think it is
> still papering over the problem. It is _very good_ (but expensive) for
> ensuring everybody can simply reproduce the bot results. But we don't
> develop for only one platform. More platforms reveal more hidden and maybe
> serious bugs. If your patch works fine on the single reference platform,
> it doesn't mean there isn't any bug in it.
>
> The biggest problem is that folks who don't use Ubuntu 11.10 get thousands
> of failing tests because of minor font differences. In this case the best
> solution isn't "I can't reproduce the results, so I won't run layout tests
> anymore." It would be more valuable for the whole project if font(config)
> experts tried to make WebKit, Qt, fontconfig or anything else use the same
> fonts. I don't know if it is possible or not, I don't know anything about
> fonts. Is it possible somehow to bundle a chosen fontconfig with Qt or
> WebKit and use it for regression testing on all distros, instead of
> sweating because of different system fontconfig versions?

You are speaking about Linux, but it's not the only system where we want
coverage. For example, on Mac fontconfig does not play a role in the font
game. We could use it, but then we would lose the coverage for the real use
case. Btw, there is some light in the dark land of fonts:
     - I have done some work to unify test results between Linux and Mac;
hopefully I can finish it in the near future.
     - In Ubuntu 12.0, a strange bug has been fixed in freetype which made
the Ahem font produce wrong metrics (WidthxHeight=NxN+1 instead of NxN).
Ahem is used in a lot of tests in the css* directories. Currently our
expectations are wrong, but if we fix them these metrics will match across
distros (everybody has been using the newer freetype for a long time, except
our beloved, stable Debian :D )

>
>> - We don't have any gardening plan.
> The missing gardening plan is not the only problem. In my
> opinion introducing contributing rules would be more important.
> For example:
>  - Developers should build the patch and run tests before committing.
>    (Or at least watch the bots after landing and fix/roll out quickly if
>    something goes wrong.)
>  - What should I do if I broke the build / a layout test / an API test?
>  - What should a gardener do if somebody doesn't care about the
>    regression he/she caused?
>  - What should the boss do if somebody regularly and intentionally breaks
>    the rules? :)

I have to protest a bit. As Ossy describes it, it's really simple and
straightforward: when somebody breaks a test, it means his patch is buggy,
he should find the error in his changes, and everything will be fine. In
reality, this is not always the case. When you break a test, it could mean
different things:

     1. you did it wrong
Obviously you need to fix your patch.
     2. there is a bug in the system that you triggered somehow (even with
a change that is totally right on its own)
Of course the right thing to do is to investigate the problem. But it could
be very complex; maybe the bug exists in a different subsystem that you
don't know well. I don't think it is always possible to find the manpower
to fight these bugs.
     3. there is a bug/imperfection in the test infrastructure that you
triggered
Well, this is pretty annoying and relatively common. We should detect and
solve these issues, but it's not really fair to stop a good patch from
landing until somebody fixes the tools. Note that working out of trunk on
top of your previous work is possible, but it's not fun because you have to
struggle more with rebasing.
     4. you caused some change that is not really a bug
Like some pixel differences that actual users could not even notice. I
would say if you make such a change then let's update the expectations, but
it's not always possible since you cannot test your patch in each
environment where we want coverage. (And if you don't use Ubuntu or Debian
you cannot even produce results locally for Linux-desktop.)
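
To make the four cases above easier to compare, here is a minimal Python
sketch of them as a lookup table. The names (BreakageCause,
SUGGESTED_ACTION, suggest) are hypothetical and not part of any WebKit
tool, and the suggested actions only paraphrase the list above; they are
not an agreed policy.

```python
from enum import Enum


class BreakageCause(Enum):
    """Hypothetical labels for the four cases listed above."""
    PATCH_IS_WRONG = 1
    LATENT_BUG_ELSEWHERE = 2
    TEST_INFRASTRUCTURE_BUG = 3
    HARMLESS_CHANGE = 4


# Illustrative mapping only; it paraphrases the discussion, it is not policy.
SUGGESTED_ACTION = {
    BreakageCause.PATCH_IS_WRONG: "fix the patch",
    BreakageCause.LATENT_BUG_ELSEWHERE: "investigate if manpower allows",
    BreakageCause.TEST_INFRASTRUCTURE_BUG: "fix the tools, do not block the patch",
    BreakageCause.HARMLESS_CHANGE: "update the expectations where possible",
}


def suggest(cause: BreakageCause) -> str:
    """Return the suggested reaction for a given cause of a broken test."""
    return SUGGESTED_ACTION[cause]


if __name__ == "__main__":
    for cause in BreakageCause:
        print(f"{cause.value}. {cause.name}: {suggest(cause)}")
```

The point of writing it down this way is only that "roll out the patch" is
the obvious answer for one of the four causes, not for all of them.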

After all, I think we should be careful about what rules we introduce.
They should satisfy two requirements:
     - we have to keep them. not just the first week, not just the first
month, but always. :)
     - they must not block development too much. Who cares if we are rock
stable if we cannot follow the evolution of the web?!

I agree with Ossy that we should allocate more effort to bug fixing /
stabilisation, but I don't agree that we should banish the skip list once
and for all. Actually there is no stable port of WebKit where the skip list
is unused. I would say, let's try to find a better balance between stability
and the speed of development.

>
>
>> What could be improved :
>>
>> - We need to make a gardening plan. We can't be serious about making
>> web browsers/APIs without improving our coverage. I know we don't have
>> many resources but I think it should be OK to have one person doing it
>> for a week and then rotate. Really, it's a week that may be boring, but
>> it comes around only rarely (about once every two to three months). This
>> will free Ossy up to do something else, so Ossy can go back to
>> proper coding. I can make that list if people agree. Also it needs to
>> be enforced (maybe reviews could be the exception).
>
> Gardening isn't so simple that one person alone can do it. One person can
> be enough for fire-fighting: build fixes, updating expected files,
> reporting bugs, fixing some trivial bugs. But it isn't enough to fix all
> regressions caused by others who don't take responsibility at all, or
> regressions that occurred in a part of WebKit you don't know anything
> about. Not to mention there are many complex tests, where it isn't trivial
> to decide if the new result is correct or not.
>
> I added our gardening timetable to this wiki:
> https://trac.webkit.org/wiki/QtWebKitBuildBots
>
> All new volunteers are very welcome. ;-) It would be great if you guys at
> INdT could join; you are near the PDT timezone. And handling problems
> while they are fresh is always simpler than waiting for the Hungarian
> morning and trying to solve dozens of new regressions, broken builds,
> assertions, flaky tests, ...
>
>> - We need to be able to test/stress/break the bot environment. Today
>> the fact that none of us can mess with the bot makes it hard to
>> reproduce the failures of the bot that you can't see on your machine.
>> While I do understand that Ossy doesn't give us the key to the bot (and
>> we don't want that), we still need to have one to mess around with.
>
> We hacked many times in the past to make the layout test system able to
> run more than one bot on the same 8-24 core machine. But it is still
> limited to one Linux user. We still have a strict limitation: another user
> trying to run tests on the same machine can kill all the bots, so now only
> one user is allowed. In this case it isn't a good idea for anybody to log
> in and hack on something. When I have to do it, I'm very, very careful,
> but sometimes I have broken everything accidentally.

Not strictly connected to your points, but another infrastructural thing:
when will we be able to run tests in parallel? Is it reliable right now?
Could we make it the default configuration of nrwt - except on bots, until
it is really stable - so folks would not have to know the command line
switch by heart (as far as I know it's not simple, because you need to call
the real nrwt and not the Perl wrapper, and it's slightly different). It
would be much more fun to run the tests before uploading / landing your
patch if they did not run for years.
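
As a rough illustration of what "running tests in parallel" means here, the
Python sketch below spreads a list of tests over a pool of workers. It is
not how nrwt is implemented and it does not use any nrwt option; run_one,
run_in_parallel and the test paths are made up for the example, and run_one
is only a stand-in that "fails" any test with "fail" in its name.

```python
from concurrent.futures import ThreadPoolExecutor
import os


def run_one(test_path):
    """Hypothetical stand-in for running one layout test.

    A real runner would launch the test driver; here every test whose
    name does not contain 'fail' is treated as passing.
    """
    return test_path, "fail" not in test_path


def run_in_parallel(test_paths, workers=None):
    """Run tests on a pool of workers and return {test_path: passed}."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_one, test_paths))


if __name__ == "__main__":
    results = run_in_parallel(["fast/a.html", "fast/b-fail.html", "css1/c.html"])
    for path, passed in sorted(results.items()):
        print(path, "PASS" if passed else "FAIL")
```

Defaulting the number of workers to the number of cores is the kind of
default I have in mind: nobody would need to remember a switch to get it.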

-kbalazs