[webkit-dev] Does NRWT let you indicate that a test should fail with a particular failure diff?

Darin Fisher darin at chromium.org
Fri Jul 1 15:56:08 PDT 2011


On Fri, Jul 1, 2011 at 3:37 PM, Dirk Pranke <dpranke at chromium.org> wrote:

> On Fri, Jul 1, 2011 at 3:24 PM, Darin Fisher <darin at chromium.org> wrote:
> > On Fri, Jul 1, 2011 at 3:04 PM, Darin Adler <darin at apple.com> wrote:
> >>
> >> On Jul 1, 2011, at 2:54 PM, Dirk Pranke wrote:
> >>
> >> > Does that apply to -expected.txt files in the base directories, or just
> >> > platform-specific exceptions?
> >>
> >> Base directories.
> >>
> >> Expected files contain output reflecting the behavior of WebKit at the
> >> time the test was checked in; they are the expected result when we re-run
> >> a test. Many expected files contain text that says “FAIL” in them. The
> >> fact that these expected results are not successes, but rather expected
> >> failures, does not seem to me to be a subtle point, but one of the basic
> >> things about how these tests are set up.
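
A concrete illustration of the convention described above: for a script test,
the checked-in -expected.txt is simply the test's text output, and it can
record known failures verbatim. The test and values below are invented, not
taken from the tree:

    Tests that the hypothetical widget element reports its metrics.

    PASS widget.offsetWidth is 100
    FAIL widget.offsetHeight should be 50. Was 0.
    PASS successfullyParsed is true

    TEST COMPLETE

Re-running the test and matching this file counts as passing in the harness,
even though the behavior it captures includes a known failure.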
> >
> > Right, it helps us keep track of where we are, so that we don't regress,
> > and only make forward progress.
> >
> >>
> >> > I wonder how it is that I've been working (admittedly, mostly on
> >> > tooling) in WebKit for more than two years and this is the first I'm
> >> > hearing about this.
> >>
> >> I’m guessing it’s because you have been working on Chrome.
> >>
> >> The Chrome project came up with a different system for testing, layered
> >> on top of the original layout test machinery but based on different
> >> concepts. I don’t think anyone ever discussed that system with me; I was
> >> the one who created the original layout test system, to help Dave Hyatt
> >> originally, and then later the rest of the team started using it.
> >
> > The granular annotations (more than just SKIP) in test_expectations.txt
> > were something we introduced back when Chrome was failing a large
> > percentage of layout tests, and we needed a system to help us triage the
> > failures. It was useful to distinguish tests that crash from tests that
> > generate bad results, for example. We then focused on the crashing tests
> > first. In addition, we wanted to understand how divergent we were from the
> > standard WebKit port, and we wanted to know if we were failing to match
> > text results or just image results. This allowed us to measure our degree
> > of incompatibility with standard WebKit. We basically used this mechanism
> > to classify differences that mattered and differences that didn't matter.
> >
> > I think that if we had just checked in a bunch of port-specific "failure"
> > expectations as -expected files, then we would have had a hard time
> > distinguishing failures we needed to fix for compat reasons from failures
> > that were expected (e.g., because we have different-looking form controls).
> >
> > I'm not sure if we are at a point now where this mechanism isn't useful,
> > but I kind of suspect that it will always be useful. After all, it is not
> > uncommon for a code change to result in different rendering behavior
> > between the ports. I think it is valuable to have a measure of divergence
> > between the various WebKit ports. We want to minimize such divergence from
> > a web compat point of view, of course. Maybe the count of SKIPPED tests is
> > enough? But then we suffer from not running the tests at all. At least by
> > annotating expected IMAGE failures, we get to know that the TEXT output is
> > the same and that we don't expect a CRASH.
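
For readers who haven't worked with it, the granular annotations described
above live one entry per line in test_expectations.txt. From memory of the
2011-era syntax, entries looked roughly like the following; the bug numbers,
platforms, and test paths here are invented for illustration:

    BUGCR12345 WIN LINUX : fast/forms/textfield-metrics.html = TEXT
    BUGWK67890 MAC : fast/canvas/shadow-blur.html = IMAGE
    BUGCR13579 DEBUG : fast/js/deep-recursion.html = CRASH
    BUGCR24680 SKIP : http/tests/websocket/long-poll.html = TIMEOUT

An IMAGE-only entry records that the text output still matches and that no
crash or timeout is expected, which is the extra signal described above
compared with simply skipping the test.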
>
> There are at least two reasons for divergence: one is that the port is
> actually doing the wrong thing, and the other is that the port is
> doing the "right" thing but the output is different anyway (e.g., a
> control is rendered differently). We cannot easily separate the two if
> we have only a single convention (platform-specific -expected files),
> but SKIPPING tests seems wrong for either category.
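
For context, "platform-specific -expected files" means checking a port's own
baseline into LayoutTests/platform/<port>/, which takes precedence over the
generic one. A hypothetical layout, with made-up test names:

    LayoutTests/fast/forms/button-padding.html
    LayoutTests/fast/forms/button-padding-expected.txt            (generic baseline)
    LayoutTests/platform/chromium-win/fast/forms/button-padding-expected.txt
    LayoutTests/platform/mac/fast/forms/button-padding-expected.png

The port-specific baseline wins when present, but nothing in the file itself
says whether it encodes an acceptable platform difference or a real bug, which
is the separation problem described above.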
>
> It seems like -failing gives you the control you would want, no?
> Obviously, it wouldn't help the thousands of -expected files that are
> "wrong" but at least it could keep things from getting worse.
>
> I will note that reftests might solve some issues but not all of them
> (since obviously code could render both pages "wrong").
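
For reference, a reftest replaces the checked-in baseline with a reference
page that must render identically to the test; hypothetical names:

    LayoutTests/fast/css/float-clearance.html             (the test)
    LayoutTests/fast/css/float-clearance-expected.html    (reference, compared by rendered output)

Because both pages go through the same rendering code, a shared bug can make
them match even when both are wrong, which is the caveat above.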
>
> -- Dirk
>
>
I'm not sure.  It makes me a bit uneasy adding even more heft to the
LayoutTests directory.

-Darin




> > I suspect this isn't the best solution to the problem though.
> > -Darin
> >
> >
> >>
> >> > Are there reasons we are doing things this way?
> >>
> >> Sure. The idea of the layout test framework is to check if the code is
> >> still behaving as it did when the test was created and last run; we want
> >> to detect any changes in behavior that are not expected. When there are
> >> expected changes in behavior, we change the contents of the expected
> >> results files.
> >>
> >> It seems possibly helpful to augment the test system with editorial
> >> comments about which tests show bugs that we’d want to fix. But I
> >> wouldn’t want to stop running all regression tests where the output
> >> reflects the effects of a bug or missing feature.
> >>
> >>    -- Darin
> >>

