As part of the ongoing work on GPU Process, we’re interested in adding support for reftest fuzzy matching (i.e., allowing a certain amount of tolerance when comparing the generated images).
Our intention is to match the semantics of WPT’s reftests (https://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching):
<meta name=fuzzy content="maxDifference=15;totalPixels=300">
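To make the two parameters concrete, here is a minimal sketch of the comparison they imply. It assumes, following the WPT documentation linked above, that maxDifference bounds the largest per-channel difference of any pixel and totalPixels bounds the number of pixels that differ at all. The function names and the list-of-tuples image representation are hypothetical, chosen only for illustration; a real implementation would work on decoded image buffers.

```python
def compare_pixels(actual, expected):
    """Return (max_channel_difference, differing_pixel_count).

    Both arguments are equal-length sequences of (r, g, b, a) tuples.
    This representation is an assumption made for the sketch.
    """
    if len(actual) != len(expected):
        raise ValueError("images must have the same dimensions")
    max_difference = 0
    differing_pixels = 0
    for pixel_a, pixel_b in zip(actual, expected):
        # Largest difference in any single colour channel of this pixel.
        pixel_difference = max(abs(x - y) for x, y in zip(pixel_a, pixel_b))
        if pixel_difference:
            differing_pixels += 1
            max_difference = max(max_difference, pixel_difference)
    return max_difference, differing_pixels


def within_tolerance(actual, expected, max_difference, total_pixels):
    """True if the images differ by no more than the given tolerance."""
    found_difference, found_pixels = compare_pixels(actual, expected)
    return found_difference <= max_difference and found_pixels <= total_pixels
```

With maxDifference=15;totalPixels=300, a test whose output differs from the reference in a single pixel by 10 in one channel would pass, while the same output would fail under maxDifference=5.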
There are cases where we’ll want to apply these to the tests unconditionally, for example where behaviour is expected to vary across ports (such as anti-aliasing differences). For WPT tests, annotations of this kind should probably be exported upstream.
The current plan, on which work is already underway, is to support this syntax by parsing the HTML in Python only when there is a hash mismatch, which should minimise the performance impact compared with always reading this metadata.
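As a rough illustration of that plan (not the actual patch), the lookup could be sketched with the standard library's html.parser. FuzzyMetaParser, parse_fuzzy_content and fuzzy_tolerance are hypothetical names, and the sketch only handles the named single-integer form shown above, not the range or shorthand forms that WPT also accepts.

```python
from html.parser import HTMLParser


class FuzzyMetaParser(HTMLParser):
    """Collects the content attribute of every <meta name=fuzzy> tag."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "fuzzy" and attributes.get("content"):
            self.contents.append(attributes["content"])


def parse_fuzzy_content(content):
    """Parse 'maxDifference=15;totalPixels=300' into a dict of ints."""
    result = {}
    for part in content.split(";"):
        key, _, value = part.partition("=")
        result[key.strip()] = int(value.strip())
    return result


def fuzzy_tolerance(test_path, actual_hash, expected_hash):
    """Return the tolerance dict for a test, or None if none applies.

    The test file is only opened when the hashes differ, so matching
    tests pay no parsing cost.
    """
    if actual_hash == expected_hash:
        return None
    parser = FuzzyMetaParser()
    with open(test_path, encoding="utf-8") as handle:
        parser.feed(handle.read())
    if not parser.contents:
        return None
    return parse_fuzzy_content(parser.contents[0])
```

The pixel comparison itself would then only run for tests that both mismatch on hash and carry a fuzzy annotation.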
However, this doesn’t entirely suffice. There are cases where we might want to allow more tolerance on one platform than another, or vary it based on GPU model or driver. This requires not only platform-specific metadata (i.e., similar to what we have in TestExpectations files today), but also expectations with finer granularity than platform alone.
Are we sure we really need that? What are examples of tests that do warrant such a mechanism?
Generally, we want to keep our testing infrastructure as simple as possible.
One option is to extend the meta content to encode conditional variants, though this doesn’t work for WPT tests unless we get buy-in to export these annotations upstream (which might be desirable anyway for the sake of results on wpt.fyi). We would need to be confident that this wouldn’t become unwieldy, however; we wouldn’t want to end up with something like (if:port=Apple)maxDifference=1;totalPixels=10,(if:platform=iOS)maxDifference=10;totalPixels=20,(if:port=GTK)maxDifference=10;totalPixels=300.
Another option is to extend TestExpectations to store more specific data, though again this might become unwieldy, as we’re unlikely to add new “platforms” for every variable we might want to distinguish results on. This also means the metadata is far away from the test itself, and the TestExpectations files would grow even further (we already have 34k lines of TestExpectations data!). TestExpectations is also a rather horrible file format when it comes to modifying its parser.
I'm fine with either of the above options but I don't think we should introduce this kind of micro syntax if we're going with meta.
We should probably specify a platform in a different attribute altogether, e.g.:
<meta name="fuzzy" content="platforms=mac-bigsur; maxDifference=15; totalPixels=300">
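For illustration, a sketch of how a harness might read this form follows. Nothing here is settled syntax: the comma separator for multiple platforms, the rule that an entry without a platforms key applies everywhere, and the names parse_fuzzy_with_platforms and select_tolerance are all assumptions made for the example. Whitespace around the separators is stripped because the example above contains spaces.

```python
def parse_fuzzy_with_platforms(content):
    """Split a fuzzy content string into (platforms, tolerance).

    platforms is a list of names, or None when no platforms key is given.
    """
    platforms = None
    tolerance = {}
    for part in content.split(";"):
        key, _, value = part.partition("=")
        key, value = key.strip(), value.strip()
        if key == "platforms":
            # Assumption: several platforms would be comma-separated.
            platforms = [name.strip() for name in value.split(",")]
        elif key:
            tolerance[key] = int(value)
    return platforms, tolerance


def select_tolerance(contents, platform):
    """Prefer a platform-specific entry, else fall back to an unconditional one."""
    fallback = None
    for content in contents:
        platforms, tolerance = parse_fuzzy_with_platforms(content)
        if platforms is None:
            if fallback is None:
                fallback = tolerance
        elif platform in platforms:
            return tolerance
    return fallback
```

A test could then carry one unconditional meta tag plus one tag per platform that needs different values, which keeps each tag flat rather than nesting conditions inside a single content string.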