[webkit-dev] Does NRWT let you indicate that a test should fail with a particular failure diff?

Fri Jul 1 15:37:09 PDT 2011

On Fri, Jul 1, 2011 at 3:24 PM, Darin Fisher <darin at chromium.org> wrote:
> On Fri, Jul 1, 2011 at 3:04 PM, Darin Adler <darin at apple.com> wrote:
>>
>> On Jul 1, 2011, at 2:54 PM, Dirk Pranke wrote:
>>
>> > Does that apply to -expected.txt files in the base directories, or just
>> > platform-specific exceptions?
>>
>> Base directories.
>>
>> Expected files contain output reflecting the behavior of WebKit at the
>> time the test was checked in: the result we expect when we re-run the test.
>> Many expected files contain text that says “FAIL” in them. The fact that
>> these expected results are not successes, but rather expected failures does
>> not seem to me to be a subtle point, but one of the basic things about how
>> these tests are set up.
>
> Right, it helps us keep track of where we are, so that we don't regress, and
> only make forward progress.
>
>>
>> > I wonder how it is that I've been working (admittedly, mostly on
>> > tooling) in WebKit for more than two years and this is the first I'm hearing
>> > about this.
>>
>> I’m guessing it’s because you have been working on Chrome.
>>
>> The Chrome project came up with a different system for testing layered on
>> top of the original layout test machinery based on different concepts. I
>> don’t think anyone ever discussed that system with me; I was the one who
>> created the original layout test system, to help Dave Hyatt originally, and
>> then later the rest of the team started using it.
>
> The granular annotations (more than just SKIP) in test_expectations.txt were
> something we introduced back when Chrome was failing a large percentage of
> layout tests, and we needed a system to help us triage the failures.  It was
> useful to distinguish tests that crash from tests that generate bad results,
> for example.  We then focused on the crashing tests first.
>
> In addition, we wanted to understand how divergent we were from the standard
> WebKit port, and we wanted to know if we were failing to match text results
> or just image results.  This allowed us to measure our degree of
> incompatibility with standard WebKit.  We basically used this mechanism to
> classify differences that mattered and differences that didn't matter.
>
> I think that if we had just checked in a bunch of port-specific "failure"
> expectations as -expected files, then we would have had a hard time
> distinguishing failures we needed to fix for compat reasons from failures
> that were expected (e.g., because we have different looking form controls).
>
> I'm not sure if we are at a point now where this mechanism isn't useful, but
> I kind of suspect that it will always be useful.  After all, it is not
> uncommon for a code change to result in different rendering behavior between
> the ports.  I think it is valuable to have a measure of divergence between
> the various WebKit ports.  We want to minimize such divergence from a web
> compat point of view, of course.  Maybe the count of SKIPPED tests is
> enough?  But, then we suffer from not running the tests at all.  At least by
> annotating expected IMAGE failures, we get to know that the TEXT output is
> the same and that we don't expect a CRASH.
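
The distinction described above (CRASH vs. TEXT vs. IMAGE) can be illustrated with a minimal sketch. This is not the actual NRWT code; the function names `observed_failures` and `judge` and the result strings are hypothetical, and only the failure kinds CRASH, TEXT, and IMAGE come from the discussion.

```python
def observed_failures(crashed, text_matches, image_matches):
    """Return the set of failure kinds seen in one test run (hypothetical helper)."""
    if crashed:
        # A crash produces no usable output, so it is the only kind reported.
        return {"CRASH"}
    failures = set()
    if not text_matches:
        failures.add("TEXT")
    if not image_matches:
        failures.add("IMAGE")
    return failures


def judge(observed, annotated):
    """Compare observed failure kinds with the annotated (expected) ones.

    Assumption for this sketch: any difference from the annotation is
    reported, including a test that passes although a failure was annotated.
    """
    if observed == annotated:
        return "expected"
    if not observed:
        return "unexpected pass"
    return "unexpected failure"


# A test annotated as an expected IMAGE failure:
annotated = {"IMAGE"}
print(judge(observed_failures(False, True, False), annotated))   # image differs only
print(judge(observed_failures(False, False, False), annotated))  # text differs too
print(judge(observed_failures(True, False, False), annotated))   # crash
```

With an IMAGE-only annotation, a run whose text output also differs, or that crashes, is still flagged, which is the extra information a SKIP would throw away.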

There are at least two reasons for divergence: one is that the port is
actually doing the wrong thing, and the other is that the port is
doing the "right" thing but the output is different anyway (e.g., a
control is rendered differently). We cannot easily separate the two if
we have only a single convention (platform-specific -expected files),
but SKIPPING tests seems wrong for either category.

It seems like -failing gives you the control you would want, no?
Obviously, it wouldn't help the thousands of -expected files that are
"wrong" but at least it could keep things from getting worse.
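
A rough sketch of how a -failing baseline might sit next to -expected, under stated assumptions: the `-failing.txt` file name, the helper names `baseline_names` and `classify`, and the result labels are all hypothetical, and this is not how any existing tool is known to behave.

```python
import os


def baseline_names(test_path):
    """Return the (hypothetical) -expected and -failing baseline names for a test."""
    base, _ = os.path.splitext(test_path)
    return base + "-expected.txt", base + "-failing.txt"


def classify(actual, expected, failing=None):
    """Classify one run's text output against the two baselines.

    expected: output that is believed to be correct.
    failing:  output of a known, tracked failure, or None if there is none.
    """
    if actual == expected:
        return "PASS"
    if failing is not None and actual == failing:
        # Still wrong, but wrong in exactly the way already recorded.
        return "KNOWN_FAILURE"
    # Neither baseline matches: behavior changed in an unrecorded way.
    return "UNEXPECTED"


print(baseline_names("fast/forms/button.html"))
print(classify("FAIL: width 10", "PASS", failing="FAIL: width 10"))
print(classify("FAIL: width 12", "PASS", failing="FAIL: width 10"))
```

The point of the third outcome is that a known-bad test keeps running, so a change in how it fails is still detected.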

I will note that reftests might solve some issues but not all of them
(since obviously code could render both pages "wrong").

-- Dirk

> I suspect this isn't the best solution to the problem though.
> -Darin
>
>
>>
>> > Are there reasons we [are] doing things this way[?]
>>
>> Sure. The idea of the layout test framework is to check if the code is
>> still behaving as it did when the test was created and last run; we want to
>> detect any changes in behavior that are not expected. When there are
>> expected changes in behavior, we change the contents of the expected results
>> files.
>>
>> It seems possibly helpful to augment the test system with editorial
>> comments about which tests show bugs that we’d want to fix. But I wouldn’t
>> want to stop running all regression tests where the output reflects the
>> effects of a bug or missing feature.
>>
>>    -- Darin
>>
>
>
>