[webkit-dev] Best practices for landing new/changed layout test expectations?

Dirk Schulze krit at webkit.org
Mon Feb 25 17:15:59 PST 2013


On Feb 25, 2013, at 4:27 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:

> On Mon, Feb 25, 2013 at 3:36 PM, Glenn Adams <glenn at skynav.com> wrote:
> 
> On Mon, Feb 25, 2013 at 3:53 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> On Mon, Feb 25, 2013 at 2:48 PM, Dirk Pranke <dpranke at google.com> wrote:
> On Mon, Feb 25, 2013 at 2:05 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> > On Mon, Feb 25, 2013 at 1:39 PM, Dirk Pranke <dpranke at google.com> wrote:
> >> Are you suggesting we should land a "failing" baseline in the meantime?
> >
> >
> > No. I'm suggesting patch authors perform their due diligence and either ask
> > port maintainers to rebaseline or rebaseline tests themselves.
> >
> 
> I think either you misunderstood my question, or I am misunderstanding
> your answer. I'm not asking "who", I'm asking "what" ...
> 
> If we know some tests are failing, and when we fix a bug the tests
> will start passing again (but that patch might not land for quite some
> time), what should we (anyone) do in the meantime? Leave the tree red,
> land "incorrect" -expected baselines so that we can catch changes in
> behavior, or add lines to TestExpectations? Many of the lines you
> cited fell into the last category.
> 
> Either of those two solutions would work (although I strongly advise we do the latter) when there are failing tests that need a fix in order to pass, but I'm not interested in discussing that matter at the moment.
> 
> I'm specifically opposed to adding new entries to TestExpectations for the sole purpose of rebaselining them later.
> 
> One of the modes that the new generic TestExpectations file permits is to specify a generic Skip and override it with a per-port Pass. This provides (IMO) a more manageable way to land a patch that you expect will require per-port modifications before it fully works on the primary ports. It also allows one to land a patch containing tests where a subsequent patch will provide the functionality to be tested.
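[As a sketch of the generic-Skip-with-per-port-Pass pattern described above, the entries might look like the following; the bug number and test path here are hypothetical, and port file locations may differ per port:]

```
# LayoutTests/TestExpectations (the generic file):
# skip the new test everywhere by default until each port is verified.
webkit.org/b/12345 fast/feature/new-feature-test.html [ Skip ]

# LayoutTests/platform/mac/TestExpectations (a port-specific file):
# the Mac port has been verified, so override the generic Skip.
webkit.org/b/12345 fast/feature/new-feature-test.html [ Pass ]
```

[Once the test passes on every port, both entries would be removed.]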
> 
> I recently used both of these modes to sub-divide a large patch and to incrementally verify (landing individual per-port fixes as needed) individual ports.
> 
> Personally, I prefer this over the "just turn the bots red" mode. I suspect the turn-the-bots-red approach results in greater churn, due to rollouts and re-landings, than the finer-grained approach I outlined above. I agree it requires the committer to follow through on the other ports and eventually remove the generic Skip and per-port Pass rules (once the patch works everywhere), but we have to assume here (IMO) that committers will do the right thing and take responsibility for the process. [And if not, then remind them.]
> 
> So I, for one, would vote against the "turn the bots red" approach. Overall, I think that approach places a higher interrupt load on our limited committer and reviewer resources. I think it would be useful to give other procedures a chance to be tried out, so we have more data for choosing between the alternatives.
> 
> This has been tried: many people, including yourself, use this approach when landing and rebaselining patches. In particular, a lot of new contributors from the Chromium port use this strategy.
> 
> When I used to work for Google, I periodically went through all the entries in TestExpectations files that needed rebaselines and manually rebaselined them myself, because people leave stale entries like the ones I cited in another email all the time.
> 
> Quite frankly, I don't want it to be my job (or anyone's but the patch author's) to take care of all these stale entries people add.

The problem with Chromium in the early days was that all tests were rebaselined when they failed. Too many tests turned the bots red at the time, and the Chromium folks didn't (couldn't?) pay attention to all of those failing tests. We lost the chance to fix a lot of those bugs immediately. It took months to file bug reports covering those tests and finally fix them. Some tests may still report that everything is OK even though they are failing, because the failures went untracked. I consider the current "needs rebaseline" approach better, since the failing tests are at least logged; I don't have an opinion on how they should be logged.

Greetings,
Dirk

> 
> - R. Niwa
> 
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> https://lists.webkit.org/mailman/listinfo/webkit-dev


