[webkit-dev] A proposal for handling "failing" layout tests and TestExpectations

Thu Aug 16 14:05:05 PDT 2012

On Wed, Aug 15, 2012 at 5:19 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> I have a concern that a lot of people wouldn't know what the "correct"
> output is for a given test.
>
> For a lot of pixel tests, deciding whether a given output is correct or not
> is really hard. e.g. some seemingly insignificant anti-aliasing difference
> may turn out to be the result of a bug in Skia or another graphics library,
> or in the WebCore code that uses it.
>
> As a result,
>
> - people may check in wrong "correct" results
> - people may add "failing" results even though new results are more or less
>   correct
>
>
> This leads me to think that just checking in the current output as
> -expected.png and filing bugs separately is a better approach.
>

I think your observations are correct, but at least my experience as a
gardener/sheriff leads me to a different conclusion. Namely, when I'm
looking at a newly failing test, it is difficult if not impossible for
me to know if the existing baseline was previously believed to be
correct or not, and thus it's hard for me to tell if the new baseline
should be considered worse, better, or different. In theory I could go
look at the changelog for each test, but I am skeptical that it would
have enough useful information (I would expect most comments to be
along the lines of "rebaselining after X", with no indication of whether
the output is correct or not). This is just a theory, though.

This is why I want to test this theory :). It seems like if we got
experience with this on one (or more) ports for a couple of months, we
would have a much better-informed opinion, and I'm not seeing a
huge downside to at least trying this idea out.

-- Dirk