On Oct 30, 2021, at 10:45 AM, Ryosuke Niwa via webkit-dev <webkit-dev@lists.webkit.org> wrote:

On Thu, Oct 28, 2021 at 10:24 AM Sam Sneddon via webkit-dev <webkit-dev@lists.webkit.org> wrote:
As part of the ongoing work on GPU Process, we’re interested in adding support for reftest fuzzy matching (i.e., allowing a certain amount of tolerance when comparing the generated images).

Our intention is to match the semantics of WPT’s reftests (https://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching):
<meta name=fuzzy content="maxDifference=15;totalPixels=300">
There are cases where we’ll want to apply these annotations to tests unconditionally, for example where behaviour is expected to vary across ports (such as anti-aliasing differences); for WPT tests, these annotations should probably be exported upstream.
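To make the WPT semantics concrete: maxDifference bounds the largest difference in any colour channel of any pixel, and totalPixels bounds how many pixels may differ at all. Below is a minimal sketch of that comparison, assuming the two images have already been decoded into equal-length lists of RGBA tuples (a stand-in for whatever representation webkitpy actually uses). The function name fuzzy_match is hypothetical, and the sketch handles only the single-value form shown above.

```python
from typing import Sequence, Tuple

Pixel = Tuple[int, int, int, int]  # RGBA, each channel 0-255 (assumed representation)


def fuzzy_match(actual: Sequence[Pixel], expected: Sequence[Pixel],
                max_difference: int, total_pixels: int) -> bool:
    """Return True if two same-sized images match within the given tolerance."""
    if len(actual) != len(expected):
        raise ValueError("images must have the same number of pixels")
    worst_channel_diff = 0
    differing_pixels = 0
    for a, e in zip(actual, expected):
        # A pixel differs if any of its channels differs.
        diff = max(abs(ca - ce) for ca, ce in zip(a, e))
        if diff:
            differing_pixels += 1
            worst_channel_diff = max(worst_channel_diff, diff)
    return worst_channel_diff <= max_difference and differing_pixels <= total_pixels
```

With maxDifference=15;totalPixels=300, a single pixel off by 10 in one channel passes, while the same image fails if maxDifference is lowered to 5.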

The current plan (and work on this is already underway) is to support this syntax by parsing the HTML in Python only when there is a hash mismatch, which should minimise the performance impact compared with always reading this metadata.
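A rough sketch of the "parse only on mismatch" idea, using the standard library's html.parser. The function and class names are hypothetical, and the test source is passed as a callable so the file is only read when the hashes differ. This parser only understands the key=value form shown above; WPT's per-reference prefix and other variants are not handled.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional


class FuzzyMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name="fuzzy"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.fuzzy_contents: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)  # html.parser lowercases attribute names
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name == "fuzzy" and content is not None:
            self.fuzzy_contents.append(content)


def parse_fuzzy_content(content: str) -> Dict[str, int]:
    """Turn 'maxDifference=15;totalPixels=300' into a dict of ints."""
    values: Dict[str, int] = {}
    for part in content.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            values[key.strip()] = int(value.strip())
    return values


def read_fuzzy_metadata(html_source: str) -> List[Dict[str, int]]:
    parser = FuzzyMetaParser()
    parser.feed(html_source)
    parser.close()
    return [parse_fuzzy_content(c) for c in parser.fuzzy_contents]


def fuzzy_metadata_on_mismatch(actual_hash: str, expected_hash: str,
                               load_source: Callable[[], str]) -> Optional[List[Dict[str, int]]]:
    # Matching hashes mean identical images, so the test file is never read.
    if actual_hash == expected_hash:
        return None
    return read_fuzzy_metadata(load_source())
```

Deferring the read behind a callable is what keeps the common (matching) case free of any extra file I/O.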

However, this doesn’t entirely suffice. There are cases where we might want to allow more tolerance on one platform than another, or to vary the tolerance by GPU model or driver. This requires not only platform-specific metadata (i.e., similar to what we have in TestExpectations files today), but also finer-grained expectations.

Are we sure we really need that? What are examples of tests that do warrant such a mechanism?

Generally, we want to keep our testing infrastructure as simple as possible.

One option is to extend the meta content to encode conditional variants, though this doesn’t work for WPT tests unless we get buy-in to add these annotations to the upstream repository (which might be desirable anyway for the sake of results on wpt.fyi). We would, however, need to be confident that this wouldn’t become unwieldy; we wouldn’t want to end up with something like (if:port=Apple)maxDifference=1;totalPixels=10,(if:platform=iOS)maxDifference=10;totalPixels=20,(if:port=GTK)maxDifference=10;totalPixels=300.

Another option is to extend TestExpectations to store more specific data, though again this might become unwieldy, as we’re unlikely to add new “platforms” for every variable we might want to distinguish results on. It also means the metadata lives far away from the test itself, and the TestExpectations files would keep growing (we already have 34k lines of TestExpectations data!). The TestExpectations parser is also rather painful to modify.

I'm fine with either of the above options, but I don't think we should introduce this kind of micro syntax if we're going with meta.

We should probably specify a platform in a different attribute altogether, e.g.:
<meta name="fuzzy" content="platforms=mac-bigsur; maxDifference=15; totalPixels=300">
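One way this could work is sketched below, under assumptions that go beyond the proposal: the "platforms" value is taken to be a comma-separated list, an entry listing the current platform wins, and an entry with no "platforms" key acts as the fallback for every other platform. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parse_fuzzy_entry(content: str) -> Tuple[Optional[List[str]], Dict[str, int]]:
    """Split one fuzzy content string into (platforms or None, numeric values)."""
    platforms: Optional[List[str]] = None
    values: Dict[str, int] = {}
    for part in content.split(";"):
        key, sep, value = part.strip().partition("=")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "platforms":
            # Assumption: several platforms may be listed, separated by commas.
            platforms = [p.strip() for p in value.split(",") if p.strip()]
        else:
            values[key] = int(value)
    return platforms, values


def select_fuzzy_for_platform(contents: List[str], platform: str) -> Optional[Dict[str, int]]:
    """Prefer an entry naming this platform; otherwise use the first unconditional one."""
    fallback: Optional[Dict[str, int]] = None
    for content in contents:
        platforms, values = parse_fuzzy_entry(content)
        if platforms is None:
            if fallback is None:
                fallback = values
        elif platform in platforms:
            return values
    return fallback
```

Keeping the platform in its own key means the rest of the content stays in exactly the syntax WPT already understands.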

I like this suggestion; WPT already allows multiple <meta name="fuzzy"> because you can specify a per-reference fuzzy value:

There is also test-options.json, which has most of the same downsides as TestExpectations, albeit without the pain of modifying the parser.

Finally, we could add per-test or per-directory files alongside the tests. (Due to how things work, these could presumably also live in directories under platform/.) I think this is probably the best option, as it keeps the metadata near the test without needing to modify the test itself (which, per above, is problematic for WPT as we move to automatically exporting changes). One could imagine either a __dir__-metadata.json (a name similar to how WPT names its directory-level metadata files) or a -expected-fuzzy.json file alongside each test.
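For illustration only, here is how a lookup over those two hypothetical files might go: check the per-test -expected-fuzzy.json first, then walk up towards the test root looking for the nearest __dir__-metadata.json. The JSON layout and the function name are assumptions, and platform/ overrides are not modelled.

```python
import json
from pathlib import Path
from typing import Optional


def find_fuzzy_metadata(test_path: Path, root: Path) -> Optional[dict]:
    """Return the most specific fuzzy metadata for a test, or None if there is none."""
    # Per-test file, e.g. foo.html -> foo-expected-fuzzy.json (hypothetical name).
    per_test = test_path.with_name(test_path.stem + "-expected-fuzzy.json")
    if per_test.is_file():
        return json.loads(per_test.read_text())
    # Otherwise use the nearest directory-level file, stopping at the root.
    directory = test_path.parent
    while True:
        dir_meta = directory / "__dir__-metadata.json"
        if dir_meta.is_file():
            return json.loads(dir_meta.read_text())
        if directory == root or directory == directory.parent:
            return None
        directory = directory.parent
```

A per-test file shadows any directory-level file, so a single flaky test can get a looser tolerance without affecting its siblings.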

Both of these options seem worse than either encoding the data in the test or putting it in the test expectations. They invent a brand-new mechanism for storing test metadata, and we don't want to introduce yet another file or mechanism that people need to be aware of.