[webkit-dev] Does NRWT let you indicate that a test should fail with a particular failure diff?

Fri Jul 1 15:24:51 PDT 2011

On Fri, Jul 1, 2011 at 3:04 PM, Darin Adler <darin at apple.com> wrote:

> On Jul 1, 2011, at 2:54 PM, Dirk Pranke wrote:
>
> > Does that apply to -expected.txt files in the base directories, or just
> > platform-specific exceptions?
>
> Base directories.
>
> Expected files contain output reflecting the behavior of WebKit at the time
> the test was checked in, that is, the result we expect when we re-run the
> test. Many expected files contain text that says “FAIL” in them. The fact
> that these expected results are not successes, but rather expected failures,
> does not seem to me to be a subtle point, but one of the basic things about
> how these tests are set up.
>

Right, it helps us keep track of where we are, so that we don't regress, and
only make forward progress.
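
As a rough sketch of that idea (not the actual NRWT code; the function name
and the sample baseline text below are made up for illustration), a test
"passes" when its output matches the checked-in baseline, even if that
baseline records a failure:

```python
def matches_baseline(actual_output: str, expected_output: str) -> bool:
    """Return True when a run reproduces the checked-in baseline exactly."""
    return actual_output == expected_output


# Hypothetical baseline that records a known failure.
expected = "FAIL: value was null\n"

# Same failure as before: no change in behavior, so nothing is flagged.
print(matches_baseline("FAIL: value was null\n", expected))  # True

# Any change, even a progression, is flagged so the baseline gets updated.
print(matches_baseline("PASS\n", expected))  # False
```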

>
> > I wonder how it is that I've been working (admittedly, mostly on tooling)
> > in WebKit for more than two years and this is the first I'm hearing about
> > this.
>
> I’m guessing it’s because you have been working on Chrome.
>
> The Chrome project came up with a different system for testing layered on
> top of the original layout test machinery based on different concepts. I
> don’t think anyone ever discussed that system with me; I was the one who
> created the original layout test system, to help Dave Hyatt originally, and
> then later the rest of the team started using it.
>

The granular annotations (more than just SKIP) in test_expectations.txt were
something we introduced back when Chrome was failing a large percentage of
layout tests, and we needed a system to help us triage the failures.  It was
useful to distinguish tests that crash from tests that generate bad results,
for example.  We then focused on the crashing tests first.

In addition, we wanted to understand how divergent we were from the standard
WebKit port, and we wanted to know if we were failing to match text results
or just image results.  This allowed us to measure our degree of
incompatibility with standard WebKit.  We basically used this mechanism to
classify differences that mattered and differences that didn't matter.
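
A minimal sketch of that kind of triage, assuming a simplified model in
which each run reports only whether it crashed and whether its text and
image output matched the baseline.  The RunResult type and classify()
helper are hypothetical, not NRWT's real API, and combined categories are
left out:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RunResult:
    crashed: bool
    text_matches: bool
    image_matches: bool


def classify(result: RunResult) -> str:
    """Map one run to a single category, most severe first."""
    if result.crashed:
        return "CRASH"
    if not result.text_matches:
        return "TEXT"
    if not result.image_matches:
        return "IMAGE"
    return "PASS"


# Made-up results, purely for illustration.
results = [
    RunResult(crashed=False, text_matches=True, image_matches=True),
    RunResult(crashed=True, text_matches=False, image_matches=False),
    RunResult(crashed=False, text_matches=True, image_matches=False),
    RunResult(crashed=False, text_matches=False, image_matches=False),
    RunResult(crashed=False, text_matches=True, image_matches=False),
]

# The per-category counts give a rough measure of divergence.
tally = Counter(classify(r) for r in results)
print(tally)
```

Sorting by category is what lets the crashing tests be picked off first.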

I think that if we had just checked in a bunch of port-specific "failure"
expectations as -expected files, then we would have had a hard time
distinguishing failures we needed to fix for compat reasons from failures
that were expected (e.g., because we have different looking form controls).

I'm not sure if we are at a point now where this mechanism isn't useful, but
I kind of suspect that it will always be useful.  After all, it is not
uncommon for a code change to result in different rendering behavior between
the ports.  I think it is valuable to have a measure of divergence between
the various WebKit ports.  We want to minimize such divergence from a web
compat point of view, of course.  Maybe the count of SKIPPED tests is
enough?  But, then we suffer from not running the tests at all.  At least by
annotating expected IMAGE failures, we get to know that the TEXT output is
the same and that we don't expect a CRASH.
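
To make that concrete, here is a hedged sketch of checking an observed
outcome against an annotation.  The dictionary layout, the test path, and
the is_expected() helper are assumptions for illustration and do not
reflect the real test_expectations.txt syntax or parser:

```python
# Hypothetical annotations: test path -> set of outcomes we tolerate.
EXPECTATIONS = {
    "fast/forms/example-button.html": {"IMAGE"},
}


def is_expected(test: str, observed: str) -> bool:
    """Tests without an annotation are expected to pass."""
    allowed = EXPECTATIONS.get(test, {"PASS"})
    return observed in allowed


test = "fast/forms/example-button.html"

# The annotated IMAGE difference is tolerated...
print(is_expected(test, "IMAGE"))  # True

# ...but a TEXT difference or a CRASH on the same test is still caught.
print(is_expected(test, "TEXT"))   # False
print(is_expected(test, "CRASH"))  # False
```

A skipped test, by contrast, is never run, so none of these checks happen.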

I suspect this isn't the best solution to the problem though.

-Darin

>
> > Are there reasons we [are] doing things this way[?]
>
> Sure. The idea of the layout test framework is to check if the code is
> still behaving as it did when the test was created and last run; we want to
> detect any changes in behavior that are not expected. When there are
> expected changes in behavior, we change the contents of the expected results
> files.
>
> It seems possibly helpful to augment the test system with editorial
> comments about which tests show bugs that we’d want to fix. But I wouldn’t
> want to stop running all regression tests where the output reflects the
> effects of a bug or missing feature.
>
>    -- Darin
>
>