Fuzzy Reftest Plans, and Metadata Locations


Sam Sneddon

28 Oct 2021

5:24 p.m.

Hi!

As part of the ongoing work on GPU Process, we’re interested in adding support for reftest fuzzy matching (i.e., allowing a certain amount of tolerance when comparing the generated images).

Our intention is to match the semantics of WPT’s reftests (https://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching):

<meta name=fuzzy content="maxDifference=15;totalPixels=300">

There are cases where we’ll want to apply these to the tests unconditionally, for example where varying behaviour is expected across ports (such as anti-aliasing differences), and in these cases for WPT tests these annotations should probably be exported upstream.

The current plan, and work is underway to do this, is to support this syntax via parsing the HTML in Python when there is a hash mismatch, which should minimise the performance impact versus always reading this metadata.

However, this doesn’t entirely suffice. There are cases where we might want to allow more tolerance on one platform or another, or vary based on GPU model or driver. As such, this requires not only platform specific metadata (i.e., similar to that which we have in TestExpectations files today), but also expectations with finer granularity.

As such I think there are a few options here:

One option is to extend the meta content to encode conditional variants, though this doesn’t work for WPT tests (unless we get buy-in to upstream these annotations into the upstream repo, though that might be desirable for the sake of results on wpt.fyi). We would need to be confident that this wouldn’t become unwieldy however; we wouldn’t want to end up with something like (if:port=Apple)maxDifference=1;totalPixels=10,(if:platform=iOS)maxDifference=10;totalPixels=20,(if:port=GTK)maxDifference=10;totalPixels=300.

Another option is to extend TestExpectations to store more specific data (though again this might become unwieldy, as we’re unlikely to add new “platforms” based on every variable we might want to distinguish results on). This also means the metadata is far away from the test itself, and the TestExpectations files would continue to grow even further (and we already have 34k lines of TestExpectations data!). TestExpectations is also a rather horrible file format to modify the parser of.

There is also test-options.json which has most of the same downsides as TestExpectations, albeit without the pain in modifying the parser.

Finally, we could add per-test or per-directory files alongside the tests. (Due to how things work, these could presumably also be in directories in platform/.) This I think is probably the best option as it keeps the metadata near the test, without needing to modify the test (which, per above, is problematic for WPT as we move to automatically exporting changes). One could imagine either a __dir__-metadata.json (to use a similar name to how WPT names directory-level metadata files) or a -expected-fuzzy.json file alongside each test.

Your opinions would be warmly welcomed!

Thanks,

Sam


Myles Maxfield

30 Oct

12:20 a.m.

...

On Oct 28, 2021, at 10:24 AM, Sam Sneddon via webkit-dev <webkit-dev@lists.webkit.org> wrote:

Hi!

As part of the ongoing work on GPU Process, we’re interested in adding support for reftest fuzzy matching (i.e., allowing a certain amount of tolerance when comparing the generated images).

Our intention is to match the semantics of WPT’s reftests (https://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching): <meta name=fuzzy content="maxDifference=15;totalPixels=300"> There are cases where we’ll want to apply these to the tests unconditionally, for example where varying behaviour is expected across ports (such as anti-aliasing differences), and in these cases for WPT tests these annotations should probably be exported upstream.
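To make the two values concrete, here is a minimal sketch of how a mismatch could be measured and checked against them. It assumes images are already decoded into equal-length lists of per-pixel channel tuples, and the function names are hypothetical. The pass rule used here (identical images always pass; otherwise both measurements must fall inside their allowed ranges) is an assumption of this sketch and should be checked against the WPT documentation linked above.

```python
def measure_difference(pixels_a, pixels_b):
    """Return (largest per-channel difference, number of differing pixels)."""
    if len(pixels_a) != len(pixels_b):
        raise ValueError("images must have the same number of pixels")
    max_difference = 0
    total_pixels = 0
    for pixel_a, pixel_b in zip(pixels_a, pixels_b):
        # Largest absolute difference across the channels of this pixel.
        channel_difference = max(abs(x - y) for x, y in zip(pixel_a, pixel_b))
        if channel_difference:
            total_pixels += 1
            max_difference = max(max_difference, channel_difference)
    return max_difference, total_pixels


def within_tolerance(pixels_a, pixels_b, max_difference, total_pixels):
    """Check a pair of images against (low, high) ranges for both values.

    Assumption of this sketch: identical images always pass.
    """
    measured_difference, measured_pixels = measure_difference(pixels_a, pixels_b)
    if measured_pixels == 0:
        return True
    return (max_difference[0] <= measured_difference <= max_difference[1]
            and total_pixels[0] <= measured_pixels <= total_pixels[1])
```

For example, two images that differ in one pixel by at most 15 in any channel would pass with ranges (0, 15) and (0, 300), and fail if the first range were tightened to (0, 14).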

The current plan, and work is underway to do this, is to support this syntax via parsing the HTML in Python when there is a hash mismatch, which should minimise the performance impact versus always reading this metadata.
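A rough sketch of what that could look like, using only the Python standard library. This is not the actual WebKit implementation; all function and class names are hypothetical. It treats a bare number N as the range 0 to N, which is an assumption of this sketch, and it does not handle the per-reference prefix form (`ref.html:...`).

```python
from html.parser import HTMLParser


class FuzzyMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name=fuzzy> element."""

    def __init__(self):
        super().__init__()
        self.fuzzy_contents = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "fuzzy":
            self.fuzzy_contents.append(attributes.get("content") or "")


def parse_range(text):
    """Turn "10-15" into (10, 15); a bare "15" becomes (0, 15) (assumption)."""
    low, separator, high = text.strip().partition("-")
    if separator:
        return int(low), int(high)
    return 0, int(low)


def parse_fuzzy_content(content):
    """Parse "maxDifference=15;totalPixels=300" or the positional "15;300"."""
    names = ("maxDifference", "totalPixels")
    parts = content.split(";")
    if len(parts) != 2:
        raise ValueError("malformed fuzzy value: %r" % content)
    result = {}
    for position, part in enumerate(parts):
        name, separator, value = part.partition("=")
        if separator:
            name = name.strip()
        else:
            # No name given, so fall back to the positional meaning.
            name, value = names[position], part
        if name not in names:
            raise ValueError("unknown fuzzy key: %r" % name)
        result[name] = parse_range(value)
    if set(result) != set(names):
        raise ValueError("malformed fuzzy value: %r" % content)
    return result


def fuzzy_from_html(source):
    parser = FuzzyMetaParser()
    parser.feed(source)
    return [parse_fuzzy_content(content) for content in parser.fuzzy_contents]


def fuzzy_for_mismatch(expected_hash, actual_hash, read_source):
    """Only read and parse the test when the image hashes differ."""
    if expected_hash == actual_hash:
        return []
    return fuzzy_from_html(read_source())
```

Passing `read_source` as a callable means the test file is never opened for the common case where the hashes already match.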

However, this doesn’t entirely suffice. There are cases where we might want to allow more tolerance on one platform or another, or vary based on GPU model or driver. As such, this requires not only platform specific metadata (i.e., similar to that which we have in TestExpectations files today), but also expectations with finer granularity.

As such I think there are a few options here:

One option is to extend the meta content to encode conditional variants, though this doesn’t work for WPT tests (unless we get buy-in to upstream these annotations into the upstream repo, though that might be desirable for the sake of results on wpt.fyi). We would need to be confident that this wouldn’t become unwieldy however; we wouldn’t want to end up with something like (if:port=Apple)maxDifference=1;totalPixels=10,(if:platform=iOS)maxDifference=10;totalPixels=20,(if:port=GTK)maxDifference=10;totalPixels=300.

Another option is to extend TestExpectations to store more specific data (though again this might become unwieldy, as we’re unlikely to add new “platforms” based on every variable we might want to distinguish results on). This also means the metadata is far away from the test itself, and the TestExpectations files would continue to grow even further (and we already have 34k lines of TestExpectations data!). TestExpectations is also a rather horrible file format to modify the parser of.

There is also test-options.json which has most of the same downsides as TestExpectations, albeit without the pain in modifying the parser.

Finally, we could add per-test or per-directory files alongside the tests. (Due to how things work, these could presumably also be in directories in platform/.) This I think is probably the best option as it keeps the metadata near the test, without needing to modify the test (which, per above, is problematic for WPT as we move to automatically exporting changes). One could imagine either a __dir__-metadata.json (to use a similar name to how WPT names directory-level metadata files) or a -expected-fuzzy.json file alongside each test.
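As an illustration only, a lookup for such files might be sketched as below. The JSON layouts are invented for this example (a per-test file holding the ranges directly, and a directory file with a "fuzzy" object keyed by test file name), as is the assumption that "-expected-fuzzy.json" is appended to the test's base name; none of this is an agreed format.

```python
import json
from pathlib import Path


def load_fuzzy_metadata(test_path):
    """Return fuzzy metadata for a test, or None if no sidecar file has any.

    Hypothetical layout: "foo.html" is looked up first in
    "foo-expected-fuzzy.json", then under the "fuzzy" key of
    "__dir__-metadata.json" in the same directory.
    """
    test_path = Path(test_path)

    per_test = test_path.with_name(test_path.stem + "-expected-fuzzy.json")
    if per_test.is_file():
        return json.loads(per_test.read_text(encoding="utf-8"))

    per_directory = test_path.with_name("__dir__-metadata.json")
    if per_directory.is_file():
        data = json.loads(per_directory.read_text(encoding="utf-8"))
        return data.get("fuzzy", {}).get(test_path.name)

    return None
```

The same function could be pointed at the matching path under platform/ first, so that a platform-specific file overrides the generic one.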

There’s a 4th option, which is one that we have used historically - make certain directories magic by hardcoding their paths in the test runner.

...

Your opinions would be warmly welcomed!

Thanks,

Sam

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev

Alejandro Garcia Castro

10:12 a.m.

On Thu, Oct 28, 2021 at 06:24:02PM +0100, Sam Sneddon via webkit-dev wrote:

...

Hi!

As part of the ongoing work on GPU Process, we’re interested in adding support for reftest fuzzy matching (i.e., allowing a certain amount of tolerance when comparing the generated images).

[...]

Finally, we could add per-test or per-directory files alongside the tests. (Due to how things work, these could presumably also be in directories in platform/.) This I think is probably the best option as it keeps the metadata near the test, without needing to modify the test (which, per above, is problematic for WPT as we move to automatically exporting changes). One could imagine either a __dir__-metadata.json (to use a similar name to how WPT names directory-level metadata files) or a -expected-fuzzy.json file alongside each test.

Your opinions would be warmly welcomed!

Thanks for working on this. I cannot provide any feedback on the solutions, but we do need this for the situations we would like to solve in WPE/GTK. Some reftests will use it because, depending on the layer structure, the blurring algorithm used can be different, and we found some of those when we activated async scrolling in the tests (I think Apple is currently not testing it this way). I guess that means the solution needs a number per test, because the tests can be in different directories.

I hope this information helps!

Alex

Ryosuke Niwa

5:45 p.m.

On Thu, Oct 28, 2021 at 10:24 AM Sam Sneddon via webkit-dev <webkit-dev@lists.webkit.org> wrote:

...

As part of the ongoing work on GPU Process, we’re interested in adding support for reftest fuzzy matching (i.e., allowing a certain amount of tolerance when comparing the generated images).

Our intention is to match the semantics of WPT’s reftests (https://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching): <meta name=fuzzy content="maxDifference=15;totalPixels=300"> There are cases where we’ll want to apply these to the tests unconditionally, for example where varying behaviour is expected across ports (such as anti-aliasing differences), and in these cases for WPT tests these annotations should probably be exported upstream.

The current plan, and work is underway to do this, is to support this syntax via parsing the HTML in Python when there is a hash mismatch, which should minimise the performance impact versus always reading this metadata.

However, this doesn’t entirely suffice. There are cases where we might want to allow more tolerance on one platform or another, or vary based on GPU model or driver. As such, this requires not only platform specific metadata (i.e., similar to that which we have in TestExpectations files today), but also expectations with finer granularity.

Are we sure we really need that? What are examples of tests that do warrant such a mechanism?

Generally, we want to keep our testing infrastructure as simple as possible.

One option is to extend the meta content to encode conditional variants, though this doesn’t work for WPT tests (unless we get buy-in to upstream these annotations into the upstream repo, though that might be desirable for the sake of results on wpt.fyi). We would need to be confident that this wouldn’t become unwieldy however; we wouldn’t want to end up with something like (if:port=Apple)maxDifference=1;totalPixels=10,(if:platform=iOS)maxDifference=10;totalPixels=20,(if:port=GTK)maxDifference=10;totalPixels=300.

Another option is to extend TestExpectations to store more specific data (though again this might become unwieldy, as we’re unlikely to add new “platforms” based on every variable we might want to distinguish results on). This also means the metadata is far away from the test itself, and the TestExpectations files would continue to grow even further (and we already have 34k lines of TestExpectations data!). TestExpectations is also a rather horrible file format to modify the parser of.

I'm fine with either of the above options but I don't think we should introduce this kind of micro syntax if we're going with meta.

We should probably specify a platform in a different attribute altogether. e.g. <meta name="fuzzy" content="platforms=mac-bigsur; maxDifference=15; totalPixels=300">

I really hate that WPT is using a micro-syntax for this. Why isn't this simply a different content attribute like this: <meta name="fuzzy" platforms="mac-bigsur" max-difference="15" total-pixels="300">

There is also test-options.json which has most of the same downsides as TestExpectations, albeit without the pain in modifying the parser.

Finally, we could add per-test or per-directory files alongside the tests. (Due to how things work, these could presumably also be in directories in platform/.) This I think is probably the best option as it keeps the metadata near the test, without needing to modify the test (which, per above, is problematic for WPT as we move to automatically exporting changes). One could imagine either a __dir__-metadata.json (to use a similar name to how WPT names directory-level metadata files) or a -expected-fuzzy.json file alongside each test.

Both of these two options seem worse than either encoding in the test or putting in the test expectations. They invent a brand new mechanism to store metadata for tests. We don't want to introduce yet another file / mechanism people need to be aware of.

- R. Niwa

Simon Fraser

1 Nov

6:10 p.m.

...

On Oct 30, 2021, at 10:45 AM, Ryosuke Niwa via webkit-dev <webkit-dev@lists.webkit.org> wrote:

On Thu, Oct 28, 2021 at 10:24 AM Sam Sneddon via webkit-dev <webkit-dev@lists.webkit.org> wrote:

As part of the ongoing work on GPU Process, we’re interested in adding support for reftest fuzzy matching (i.e., allowing a certain amount of tolerance when comparing the generated images).

Our intention is to match the semantics of WPT’s reftests (https://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching): <meta name=fuzzy content="maxDifference=15;totalPixels=300"> There are cases where we’ll want to apply these to the tests unconditionally, for example where varying behaviour is expected across ports (such as anti-aliasing differences), and in these cases for WPT tests these annotations should probably be exported upstream.

The current plan, and work is underway to do this, is to support this syntax via parsing the HTML in Python when there is a hash mismatch, which should minimise the performance impact versus always reading this metadata.

However, this doesn’t entirely suffice. There are cases where we might want to allow more tolerance on one platform or another, or vary based on GPU model or driver. As such, this requires not only platform specific metadata (i.e., similar to that which we have in TestExpectations files today), but also expectations with finer granularity.

Are we sure we really need that? What are examples of tests that do warrant such a mechanism?

Generally, we want to keep our testing infrastructure as simple as possible.

One option is to extend the meta content to encode conditional variants, though this doesn’t work for WPT tests (unless we get buy-in to upstream these annotations into the upstream repo, though that might be desirable for the sake of results on wpt.fyi). We would need to be confident that this wouldn’t become unwieldy however; we wouldn’t want to end up with something like (if:port=Apple)maxDifference=1;totalPixels=10,(if:platform=iOS)maxDifference=10;totalPixels=20,(if:port=GTK)maxDifference=10;totalPixels=300.

Another option is to extend TestExpectations to store more specific data (though again this might become unwieldy, as we’re unlikely to add new “platforms” based on every variable we might want to distinguish results on). This also means the metadata is far away from the test itself, and the TestExpectations files would continue to grow even further (and we already have 34k lines of TestExpectations data!). TestExpectations is also a rather horrible file format to modify the parser of.

I'm fine with either of the above options but I don't think we should introduce this kind of micro syntax if we're going with meta.

We should probably specify a platform in a different attribute altogether. e.g. <meta name="fuzzy" content="platforms=mac-bigsur; maxDifference=15; totalPixels=300">

I like this suggestion; WPT already allows multiple <meta name="fuzzy"> because you can specify a per-reference fuzzy value: <meta name=fuzzy content="option1-ref.html:10-15;200-300">.
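For reference, the per-reference form above can be split apart with a few lines of Python. This is a sketch with hypothetical function names, not WPT's own parser: it splits on the last colon to separate the reference from the ranges, and it treats a bare number N as the range 0 to N, which is an assumption of this sketch.

```python
def parse_range(text):
    """Turn "10-15" into (10, 15); a bare "15" becomes (0, 15) (assumption)."""
    low, separator, high = text.strip().partition("-")
    if separator:
        return int(low), int(high)
    return 0, int(low)


def parse_per_reference_fuzzy(content):
    """Parse "ref.html:10-15;200-300" into (reference, ranges).

    The reference is None when the content has no "reference:" prefix.
    """
    reference, has_reference, value = content.rpartition(":")
    parts = value.split(";")
    if len(parts) != 2:
        raise ValueError("malformed fuzzy value: %r" % content)

    ranges = {}
    for default_name, part in zip(("maxDifference", "totalPixels"), parts):
        name, has_name, text = part.partition("=")
        if not has_name:
            # Positional form: first value is maxDifference, second totalPixels.
            name, text = default_name, part
        ranges[name.strip()] = parse_range(text)
    return (reference if has_reference else None), ranges


def fuzzy_by_reference(contents):
    """Build a {reference or None: ranges} mapping from several meta values."""
    return dict(parse_per_reference_fuzzy(content) for content in contents)
```

A mapping keyed this way leaves room for an additional key, such as a platform, if multiple <meta name="fuzzy"> elements per test were also used for conditional values.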

...

I really hate that WPT is using a micro-syntax for this. Why isn't this simply a different content attribute like this: <meta name="fuzzy" platforms="mac-bigsur" max-difference="15" total-pixels="300">

Indeed. Maybe we should propose that change to avoid complicating the micro-syntax?

...

There is also test-options.json which has most of the same downsides as TestExpectations, albeit without the pain in modifying the parser.

Finally, we could add per-test or per-directory files alongside the tests. (Due to how things work, these could presumably also be in directories in platform/.) This I think is probably the best option as it keeps the metadata near the test, without needing to modify the test (which, per above, is problematic for WPT as we move to automatically exporting changes). One could imagine either a __dir__-metadata.json (to use a similar name to how WPT names directory-level metadata files) or a -expected-fuzzy.json file alongside each test.

Both of these two options seem worse than either encoding in the test or putting in the test expectations. They invent a brand new mechanism to store metadata for tests. We don't want to introduce yet another file / mechanism people need to be aware of.

It may be that, for performance, we have a run-tests-time step that extracts fuzzy data from tests and puts it in a file somewhere, but that's orthogonal to where devs go to look for/edit fuzzy data.

Also something to consider: when importing WPT, we extract "slow" metadata and store it in a file. We should converge our solutions for all these WPT features that involve metadata in tests.

Simon
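A sketch of what such an extraction step could look like, assuming tests are .html files under a single root and that the extracted data is written as JSON. The function names, the file pattern, and the output layout (relative test path mapped to the raw content strings) are all hypothetical.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class FuzzyCollector(HTMLParser):
    """Collects the raw content of every <meta name=fuzzy> element."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "fuzzy":
            self.contents.append(attributes.get("content") or "")


def extract_fuzzy_index(tests_root):
    """Map each test (relative POSIX path) to its fuzzy content strings."""
    tests_root = Path(tests_root)
    index = {}
    for path in sorted(tests_root.rglob("*.html")):
        collector = FuzzyCollector()
        collector.feed(path.read_text(encoding="utf-8", errors="replace"))
        if collector.contents:
            index[path.relative_to(tests_root).as_posix()] = collector.contents
    return index


def write_fuzzy_index(tests_root, output_path):
    """Write the index as JSON so the test runner can load it in one read."""
    index = extract_fuzzy_index(tests_root)
    Path(output_path).write_text(
        json.dumps(index, indent=2, sort_keys=True), encoding="utf-8")
    return index
```

Because the file is generated from the tests, the tests themselves stay the place where the values are edited; the JSON is only a cache.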
