[webkit-dev] Does NRWT let you indicate that a test should fail with a particular failure diff?

Hao Zheng zhenghao at chromium.org
Sun Jul 3 22:07:25 PDT 2011


> There are at least two reasons for divergence: one is that the port is
> actually doing the wrong thing, and the other is that the port is
> doing the "right" thing but the output is different anyway (e.g., a
> control is rendered differently). We cannot easily separate the two if
> we have only a single convention (platform-specific -expected files),
> but SKIPPING tests seems wrong for either category.

Yes. I think separating the two categories is important, but we can do
it without -failing files.
1. The port is doing the "right" thing but the output is different anyway.
    We can 'rebaseline' these tests ('rebaseline' here means checking
in new -expected files).
2. The port is actually doing the wrong thing.
    We should NOT 'rebaseline' them. Instead, we should add them to
test_expectations.txt with a bug number (roughly as in the sketch
below). That way we can easily see all the failures we have at any
given time just by looking at test_expectations.txt, and open the
related bug when we want a more detailed description.
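
To make it concrete, entries for case 2 would look roughly like the
sketch below (the paths and bug numbers are made up, and the exact
modifier syntax may differ slightly from what the tree uses today):

    // genuine port bugs: tracked by a bug number plus the kind of failure we expect
    BUGCR12345 LINUX WIN : fast/forms/select-style.html = IMAGE
    BUGWK67890 : editing/selection/drag-text.html = TEXT
    BUGCR13579 DEBUG : storage/open-database-limit.html = CRASH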

Both things can be done under the current test framework. Adding
-failing files would make the already huge layout test effort even
more complicated. In any case, we only need to know which tests are
failing, not to what extent they fail. If we need to know the latter,
it means our tests are not reduced to a small enough scope. Of course
there are some 'big' tests, like the acid tests, but I think the
potential problems covered by those tests can also be covered by other
small tests; if that's not the case, we just need to add more small
tests. So IMO -failing files are not necessary.
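
For comparison, as I understand the -failing proposal, it would mean
checking in a second kind of baseline next to the existing ones,
something like the (hypothetical) layout below, with the port's
known-wrong output living in a -failing file instead of a
platform-specific -expected file:

    LayoutTests/fast/forms/select-style.html
    LayoutTests/fast/forms/select-style-expected.txt                (correct output)
    LayoutTests/platform/chromium/fast/forms/select-style-failing.txt
                                                                    (known-wrong output)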

> It seems like -failing gives you the control you would want, no?
> Obviously, it wouldn't help the thousands of -expected files that are
> "wrong" but at least it could keep things from getting worse.
>

How to correct the thousands of wrong files is really a big problem...
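
If we do decide a whole directory of old baselines is simply wrong,
NRWT can at least regenerate them in bulk; something along these
lines (flag names from memory, so double-check against
new-run-webkit-tests --help):

    # re-run a directory and overwrite the checked-in baselines in place
    new-run-webkit-tests --reset-results fast/forms
    # or save the generated results as new baselines in the port's platform dir
    new-run-webkit-tests --new-baseline fast/forms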

On Sat, Jul 2, 2011 at 6:37 AM, Dirk Pranke <dpranke at chromium.org> wrote:
> On Fri, Jul 1, 2011 at 3:24 PM, Darin Fisher <darin at chromium.org> wrote:
>> On Fri, Jul 1, 2011 at 3:04 PM, Darin Adler <darin at apple.com> wrote:
>>>
>>> On Jul 1, 2011, at 2:54 PM, Dirk Pranke wrote:
>>>
>>> > Does that apply to -expected.txt files in the base directories, or just
>>> > platform-specific exceptions?
>>>
>>> Base directories.
>>>
>>> Expected files contain output reflecting the behavior of WebKit at the
>>> time the test was checked in. The expected result when we re-run a test.
>>> Many expected files contain text that says “FAIL” in them. The fact that
>>> these expected results are not successes, but rather expected failures does
>>> not seem to me to be a subtle point, but one of the basic things about how
>>> these tests are set up.
>>
>> Right, it helps us keep track of where we are, so that we don't regress, and
>> only make forward progress.
>>
>>>
>>> > I wonder how it is that I've been working (admittedly, mostly on
>>> > tooling) in WebKit for more than two years and this is the first I'm hearing
>>> > about this.
>>>
>>> I’m guessing it’s because you have been working on Chrome.
>>>
>>> The Chrome project came up with a different system for testing layered on
>>> top of the original layout test machinery based on different concepts. I
>>> don’t think anyone ever discussed that system with me; I was the one who
>>> created the original layout test system, to help Dave Hyatt originally, and
>>> then later the rest of the team started using it.
>>
>> The granular annotations (more than just SKIP) in test_expectations.txt were
>> something we introduced back when Chrome was failing a large percentage of
>> layout tests, and we needed a system to help us triage the failures.  It was
>> useful to distinguish tests that crash from tests that generate bad results,
>> for example.  We then focused on the crashing tests first.
>> In addition, we wanted to understand how divergent we were from the standard
>> WebKit port, and we wanted to know if we were failing to match text results
>> or just image results.  This allowed us to measure our degree of
>> incompatibility with standard WebKit.  We basically used this mechanism to
>> classify differences that mattered and differences that didn't matter.
>> I think that if we had just checked in a bunch of port-specific "failure"
>> expectations as -expected files, then we would have had a hard time
>> distinguishing failures we needed to fix for compat reasons from failures
>> that were expected (e.g., because we have different looking form controls).
>> I'm not sure if we are at a point now where this mechanism isn't useful, but
>> I kind of suspect that it will always be useful.  After all, it is not
>> uncommon for a code change to result in different rendering behavior between
>> the ports.  I think it is valuable to have a measure of divergence between
>> the various WebKit ports.  We want to minimize such divergence from a web
>> compat point of view, of course.  Maybe the count of SKIPPED tests is
>> enough?  But, then we suffer from not running the tests at all.  At least by
>> annotating expected IMAGE failures, we get to know that the TEXT output is
>> the same and that we don't expect a CRASH.
>
> There are at least two reasons for divergence: one is that the port is
> actually doing the wrong thing, and the other is that the port is
> doing the "right" thing but the output is different anyway (e.g., a
> control is rendered differently). We cannot easily separate the two if
> we have only a single convention (platform-specific -expected files),
> but SKIPPING tests seems wrong for either category.
>
> It seems like -failing gives you the control you would want, no?
> Obviously, it wouldn't help the thousands of -expected files that are
> "wrong" but at least it could keep things from getting worse.
>
> I will note that reftests might solve some issues but not all of them
> (since obviously code could render both pages "wrong").
>
> -- Dirk
>
>> I suspect this isn't the best solution to the problem though.
>> -Darin
>>
>>
>>>
>>> > Are there reasons we [are] doing things this way[?]
>>>
>>> Sure. The idea of the layout test framework is to check if the code is
>>> still behaving as it did when the test was created and last run; we want to
>>> detect any changes in behavior that are not expected. When there are
>>> expected changes in behavior, we change the contents of the expected results
>>> files.
>>>
>>> It seems possibly helpful to augment the test system with editorial
>>> comments about which tests show bugs that we’d want to fix. But I wouldn’t
>>> want to stop running all regression tests where the output reflects the
>>> effects of a bug or missing feature.
>>>
>>>    -- Darin
>>>
>>
>>
>>

