[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Fri Aug 17 11:29:55 PDT 2012

On Fri, Aug 17, 2012 at 11:06 AM, Dirk Pranke <dpranke at chromium.org> wrote:

> > On the other hand, the pixel test output that's correct to one expert
> > may not be correct to another expert. For example, I might think that one
> > editing test's output is correct because it shows the feature we're
> > testing in the test is working properly. But Stephen might realize that this
> > -expected.png contains an off-by-one Skia bug. So categorizing -correct.png
> > and -failure.png may require multiple experts looking at the output, which
> > may or may not be practical.
>
> Perhaps. Obviously (a) there's a limit to what you can do here, and
> (b) a test that requires multiple experts to verify its correctness
> is, IMO, a bad test :).
>

By that argument, almost all pixel tests are bad tests, because pixel
tests in editing, for example, involve editing, rendering, and graphics
code. I don't think any single person can comprehend the entire stack well
enough to tell with 100% confidence that the test result is exactly and
precisely correct.

> > I think we should just check in whatever result we're
> > currently seeing as -expected.png, because at least we wouldn't have any
> > ambiguity in the process then. We just check in whatever we're currently
> > seeing, file a bug if we see a problem with the new result, and possibly
> > roll out the patch after talking with the author/reviewer.
>
> This is basically saying we should just follow the "existing
> non-Chromium" process, right?

Yes. In addition, doing so will significantly reduce the complexity of the
current process.

> This would seem to bring us back to step 1: it doesn't address the
> problem that I identified with the "existing non-Chromium" process,
> namely that a non-expert can't tell by looking at the checkout what
> tests are believed to be passing or not.

What is the use case for this? I've been working on WebKit for more than 3
years, and I've never had to think about whether a test for an area outside
of my expertise has the correct output, other than when I was gardening.
And having -correct / -failing wouldn't have helped me know what the
correct output was when I was gardening anyway, because the new output
might just as well be a new -correct or -failing result.

> I don't think searching bugzilla (as it is currently used) is a
> workable alternative.
>

Why not? Bugzilla is the tool we use to triage and track bugs. I don't see
a need for an alternative method to keep track of bugs.

> > The new result we check in may not be 100% right, but experts (e.g. me
> > for editing and Stephen for Skia) can come in and look at recent changes
> > to triage any new failures.
> >
> > In fact, it might be worthwhile for us to invest our time in improving
> > tools to do this. For example, can we have a portal where I can see new
> > rebaselines that happened in LayoutTests/editing and
> > LayoutTests/platform/*/editing since the last time I visited the portal?
> > It could show a chronological timeline of baselines along with a possible
> > cause (a list of changesets, maybe?) for each baseline.
>
> We could build such a portal, sure. I would be interested to hear from
> others whether such a thing would be more or less useful than my
> proposal.
>
> Of course, you could just set up a watchlist for new expectations
> today. Probably not quite as polished as we could get with a portal,
> but dirt simple.
>

That might be useful as long as it has an option to give us a digest
instead of sending me an e-mail per commit.

- Ryosuke