[webkit-dev] Fuzzy Reftest Plans, and Metadata Locations

Thu Oct 28 10:24:02 PDT 2021

Hi!

As part of the ongoing work on the GPU Process, we’re interested in adding support for fuzzy matching in reftests (i.e., allowing a certain amount of tolerance when comparing the generated images).

Our intention is to match the semantics of WPT’s reftests (https://web-platform-tests.org/writing-tests/reftests.html#fuzzy-matching):

    <meta name=fuzzy content="maxDifference=15;totalPixels=300">

In some cases we’ll want to apply these annotations to the tests unconditionally, for example where behaviour is expected to vary across ports (such as anti-aliasing differences). For WPT tests, such annotations should probably be exported upstream.
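To make the shape of that metadata concrete, here is a minimal sketch of parsing the content value and checking an observed image difference against it. The names `parse_fuzzy` and `within_tolerance` are hypothetical, not existing WebKit or WPT code. It assumes a single value N means the range N-N (my reading of the WPT documentation; treating it as 0-N would be a one-line change), and it ignores the per-reference prefix that WPT also allows.

```python
import re
from typing import NamedTuple


class FuzzyRange(NamedTuple):
    lo: int
    hi: int


class Fuzzy(NamedTuple):
    max_difference: FuzzyRange
    total_pixels: FuzzyRange


_RANGE = re.compile(r"^(\d+)(?:-(\d+))?$")
_NAMES = ("maxDifference", "totalPixels")


def _parse_range(text):
    match = _RANGE.match(text.strip())
    if match is None:
        raise ValueError(f"invalid fuzzy range: {text!r}")
    lo = int(match.group(1))
    # Assumption: a single value N is shorthand for the range N-N.
    hi = int(match.group(2)) if match.group(2) is not None else lo
    if lo > hi:
        raise ValueError(f"empty fuzzy range: {text!r}")
    return FuzzyRange(lo, hi)


def parse_fuzzy(content):
    """Parse e.g. "maxDifference=15;totalPixels=300" or "10-15;200-300"."""
    parts = [part.strip() for part in content.split(";")]
    if len(parts) != 2:
        raise ValueError(f"expected two fuzzy parameters: {content!r}")
    values = {}
    for position, part in enumerate(parts):
        name, separator, value = part.partition("=")
        if not separator:
            # Names may be elided, in which case the order gives the meaning.
            name, value = _NAMES[position], part
        name = name.strip()
        if name not in _NAMES or name in values:
            raise ValueError(f"unexpected fuzzy parameter: {part!r}")
        values[name] = _parse_range(value)
    return Fuzzy(values["maxDifference"], values["totalPixels"])


def within_tolerance(fuzzy, observed_max_difference, observed_total_pixels):
    """Check an observed diff (only computed when the images differ)."""
    return (
        fuzzy.max_difference.lo <= observed_max_difference <= fuzzy.max_difference.hi
        and fuzzy.total_pixels.lo <= observed_total_pixels <= fuzzy.total_pixels.hi
    )
```

For example, `within_tolerance(parse_fuzzy("0-15;0-300"), 7, 120)` accepts a diff in which 120 pixels differ by at most 7 on any channel.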

The current plan, on which work is already underway, is to support this syntax by parsing the test’s HTML in Python only when there is a hash mismatch, which should minimise the performance impact compared with always reading this metadata.
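Roughly, the control flow I have in mind looks like the sketch below. It uses only Python’s standard `html.parser`. The names `find_fuzzy_content` and `reftest_passes`, and the `read_test_source` and `fuzzy_compare` callables, are hypothetical stand-ins rather than existing test-harness code.

```python
from html.parser import HTMLParser


class FuzzyMetaFinder(HTMLParser):
    """Collects the content attribute of every <meta name=fuzzy> element."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if attributes.get("name") == "fuzzy" and attributes.get("content"):
            self.contents.append(attributes["content"])


def find_fuzzy_content(html_text):
    finder = FuzzyMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.contents


def reftest_passes(actual_hash, expected_hash, read_test_source, fuzzy_compare):
    """Only read and parse the test when the image hashes differ.

    read_test_source: hypothetical callable returning the test's HTML text.
    fuzzy_compare: hypothetical callable taking the fuzzy content string and
    returning whether the image difference is within that tolerance.
    """
    if actual_hash == expected_hash:
        return True  # Fast path: the metadata is never read.
    contents = find_fuzzy_content(read_test_source())
    if not contents:
        return False  # Mismatch with no fuzzy annotation is a failure.
    return fuzzy_compare(contents[0])
```

Passing the test source as a callable means the file is not even opened in the common case where the hashes match.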

However, this doesn’t entirely suffice. There are cases where we might want to allow more tolerance on one platform or another, or vary the tolerance based on GPU model or driver. This requires not only platform-specific metadata (i.e., similar to what we have in TestExpectations files today), but also expectations with finer granularity.

I think there are a few options here:

One option is to extend the meta content to encode conditional variants, though this doesn’t work for WPT tests unless we get buy-in to land these annotations upstream (though that might be desirable for the sake of results on wpt.fyi). We would need to be confident that this wouldn’t become unwieldy, however; we wouldn’t want to end up with something like:

    (if:port=Apple)maxDifference=1;totalPixels=10,(if:platform=iOS)maxDifference=10;totalPixels=20,(if:port=GTK)maxDifference=10;totalPixels=300

Another option is to extend TestExpectations to store more specific data (though again this might become unwieldy, as we’re unlikely to add new “platforms” for every variable we might want to distinguish results on). This also means the metadata is far away from the test itself, and the TestExpectations files would continue to grow even further (and we already have 34k lines of TestExpectations data!). The TestExpectations format is also rather horrible to modify the parser for.

There is also test-options.json, which has most of the same downsides as TestExpectations, albeit without the pain of modifying the parser.

Finally, we could add per-test or per-directory files alongside the tests. (Due to how things work, these could presumably also live in directories under platform/.) I think this is probably the best option, as it keeps the metadata near the test without needing to modify the test itself (which, per above, is problematic for WPT as we move to automatically exporting changes). One could imagine either a __dir__-metadata.json (to use a similar name to how WPT names directory-level metadata files) or a -expected-fuzzy.json file alongside each test.
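Purely as a strawman, a directory-level file and its lookup could look something like the sketch below. The JSON schema, the `"if"` conditions, the test name `gradient.html`, and the `resolve_fuzzy` function are all invented for illustration; none of this is an agreed format. The first entry whose conditions all match the current configuration wins, and an entry without conditions acts as the default.

```python
import json

# Hypothetical contents of a __dir__-metadata.json file.
EXAMPLE_METADATA = """
{
  "fuzzy": {
    "gradient.html": [
      {"if": {"port": "GTK"}, "maxDifference": [0, 10], "totalPixels": [0, 300]},
      {"if": {"platform": "iOS"}, "maxDifference": [0, 10], "totalPixels": [0, 20]},
      {"maxDifference": [0, 1], "totalPixels": [0, 10]}
    ]
  }
}
"""


def resolve_fuzzy(metadata, test_name, configuration):
    """Return (maxDifference, totalPixels) ranges for a test, or None.

    configuration is a plain dict such as {"port": "GTK", "gpu": "..."};
    the keys are whatever variables we decide to distinguish results on.
    """
    for entry in metadata.get("fuzzy", {}).get(test_name, []):
        conditions = entry.get("if", {})
        if all(configuration.get(key) == value for key, value in conditions.items()):
            return tuple(entry["maxDifference"]), tuple(entry["totalPixels"])
    return None


metadata = json.loads(EXAMPLE_METADATA)
gtk_tolerance = resolve_fuzzy(metadata, "gradient.html", {"port": "GTK"})
default_tolerance = resolve_fuzzy(metadata, "gradient.html", {"port": "Apple"})
```

Because the conditions are arbitrary key/value pairs, adding a new variable (GPU model, driver version) would not require inventing a new “platform”.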

Your opinions would be warmly welcomed!

Thanks,

Sam