[webkit-dev] Best practices for landing new/changed layout test expectations?

Tue Feb 26 14:34:25 PST 2013

On Tue, Feb 26, 2013 at 1:03 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> On Tue, Feb 26, 2013 at 12:47 PM, Dirk Pranke <dpranke at chromium.org> wrote:
>>
>> On Tue, Feb 26, 2013 at 2:11 AM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>> > On Tue, Feb 26, 2013 at 1:55 AM, Tom Hudson <tomhudson at google.com> wrote:
>> >>
>> >> On Mon, Feb 25, 2013 at 10:34 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>> >>>
>> >>> It should be fairly straightforward to create a tool that analyzes
>> >>> the files changed in each commit and deduces which tests' expected
>> >>> results have changed. The tool can then fetch results from each
>> >>> port's bot for those tests and land them automatically. It can then
>> >>> comment on the bug about these rebaseline commits. There is no need
>> >>> to add and remove entries from TestExpectations files.
>>
>> Wait, what?
>>
>> For some reason neither I nor the mailing list archives got your
>> initial message, nor Silvia's or Tom's responses, nor your responses
>> (at least as of this writing), so I feel like I've missed a radical
>> shift in this thread, and maybe some of the context.
>
>
> https://lists.webkit.org/pipermail/webkit-dev/2013-February/023967.html
>

This link doesn't point to any of those messages, but perhaps it's not
that important.

>> You're proposing that we automatically land updated baselines without
>> review, then somehow update the bugs, and then have people go back and
>> look at the updated bugs to see whether the baseline changes represent
>> actual regressions or just expected changes?
>
>
> Right. Given that the commit already contains information as to which tests
> have been rebaselined, a script should be able to fetch new baselines for
> those affected tests on each platform and either land them or upload them
> as patches, as needed.
>

It's possible that we could fetch and cluster new baselines based on
what changed in the initial commit. I would be concerned that there
could be a fair amount of noise in either direction (some tests that
changed on the initial platform wouldn't change on others, and other
tests would), and you'd also have to figure out how to cluster changes,
since most builds on the bots contain multiple changes. But you could
probably use some of garden-o-matic's results to help here.
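
To make the first step concrete, here is a rough sketch of how a script
might deduce the rebaselined tests from the list of files changed in a
commit. It is only an illustration under assumptions: the function name
rebaselined_tests is hypothetical, it assumes baselines are named
<test>-expected.<ext> under LayoutTests/ with port-specific copies under
LayoutTests/platform/<port>/, and it does not address the noise or
clustering problems described above.

```python
import re
from collections import defaultdict

# Assumed layout: generic baselines at LayoutTests/<test>-expected.<ext>,
# port-specific ones at LayoutTests/platform/<port>/<test>-expected.<ext>.
_BASELINE = re.compile(
    r"^LayoutTests/(?:platform/(?P<port>[^/]+)/)?"
    r"(?P<test>.+)-expected\.(?:txt|png|checksum)$"
)


def rebaselined_tests(changed_paths):
    """Map each test (path without extension) to the ports whose
    baselines were touched; "generic" means the cross-platform copy."""
    tests = defaultdict(set)
    for path in changed_paths:
        match = _BASELINE.match(path)
        if match:  # anything that is not a baseline file is ignored
            tests[match.group("test")].add(match.group("port") or "generic")
    return dict(tests)


# Hypothetical list of paths touched by one commit.
changed = [
    "LayoutTests/fast/css/foo-expected.txt",
    "LayoutTests/platform/mac/fast/css/foo-expected.png",
    "LayoutTests/platform/gtk/svg/bar-expected.txt",
    "Source/WebCore/css/Foo.cpp",
]
result = rebaselined_tests(changed)
```

Ports missing from a test's set would be the candidates for fetching new
results from the bots; deciding whether those results are correct is
still a human call.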

That said, I'm not sure this workflow would actually improve things
much over garden-o-matic.

I am quite a bit more reluctant to automatically land any such
changes; it seems like it would be hard if not impossible to tell
(programmatically) whether a baseline changed as expected or if it
represented a regression.

If we were to work on new tooling, I would be much more in favor of
pushing this up to an EWS-time step, as Ossy suggests.

-- Dirk