[webkit-dev] run-webkit-tests question; hashes when comparing ref test output

Thu Jan 22 12:58:39 PST 2015

On Thu, Jan 22, 2015 at 11:48 AM, Alexey Proskuryakov <ap at webkit.org> wrote:

>
> On Jan 22, 2015, at 11:30, Simon Fraser <simon.fraser at apple.com>
> wrote:
>
> > This happens when the expected and actual images are very close, but not
> > identical. ImageDiff has some built-in rounding that effectively acts as a
> > small tolerance, so the hashes are different, but ImageDiff incorrectly
> > says the images are the same. For example, some of the tests in question
> > render a green box either via CALayers, or by painting, and there’s a
> > slight color difference between the two code paths.
> >
> > My preference for how to fix this would be to fix ImageDiff to remove
> > its slight built-in tolerance, and then to expose testRunner API to allow a
> > test to set an explicit tolerance. There are many cases where we’d like to
> > use ref tests, but are unable to because of slight, justifiable rendering
> > differences, and having an explicit tolerance would permit the use of ref
> > tests in these cases.
>
> One thing about tolerance is that it is super confusing - are we talking
> about the number of pixels that are different, or about how different the
> pixels are? Also, a lot of failures only cause small differences in pixel
> results. Even a 100x100 box that becomes red instead of green is only a
> small portion of the 800x600 image, and it's even more the case for tests
> that check e.g. text rendering.
>
> It is not currently known what the root causes are for the tests that say
> "ref test hashes didn't match but diff passed". Given that the differences
> are very tiny, one guess is that even though compositing and
> non-compositing code paths are mathematically equivalent, there are
> different rendering steps taken, and rounding at each step adds up to
> slight differences. Another theory is that we have actual bugs, such as
> with color management.
>
> If it's just rounding differences, then the right thing to do is probably
> to silence the console output, keeping behavior the same otherwise.
>

Assuming nothing much has changed since I wrote that code, there are a
couple of ways you can get that message.

As you've found, the primary culprit is probably the fuzzy matching. We
turned off fuzzy matching on the Chromium ports because I never liked the
idea; I was always worried that it would mask real bugs.

As Alexey points out, the problem with fuzzy matching is that it's not
clear what exactly the definition of fuzziness is. What may be
appropriately fuzzy in one set of tests may be entirely inappropriate for
others.
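
To make the ambiguity concrete, here is a rough Python sketch. It is not
ImageDiff's actual algorithm; the helper names, the colors, and the page
layout are all made up for illustration. It compares an 800x600 page
containing a 100x100 green box against two variants: one where the box is
off by a single channel step (a rounding-style difference) and one where
the box is red (a real failure).

```python
import hashlib

WIDTH, HEIGHT = 800, 600
WHITE = (255, 255, 255, 255)
GREEN = (0, 128, 0, 255)
ALMOST_GREEN = (0, 127, 0, 255)  # off by one step in one channel
RED = (255, 0, 0, 255)


def page_with_box(color):
    """Hypothetical 800x600 white page with a 100x100 box at the top left."""
    return [color if (x < 100 and y < 100) else WHITE
            for y in range(HEIGHT) for x in range(WIDTH)]


def pixel_hash(pixels):
    """MD5 over the raw RGBA bytes; any differing byte changes the digest."""
    return hashlib.md5(bytes(c for px in pixels for c in px)).hexdigest()


def diff_stats(expected, actual):
    """Return (fraction of pixels that differ, largest per-channel delta)."""
    differing = 0
    max_delta = 0
    for e, a in zip(expected, actual):
        delta = max(abs(ec - ac) for ec, ac in zip(e, a))
        if delta:
            differing += 1
            max_delta = max(max_delta, delta)
    return differing / len(expected), max_delta


expected = page_with_box(GREEN)
rounding = page_with_box(ALMOST_GREEN)
broken = page_with_box(RED)

for name, actual in (("rounding", rounding), ("broken", broken)):
    fraction, max_delta = diff_stats(expected, actual)
    hashes_match = pixel_hash(expected) == pixel_hash(actual)
    print(name, hashes_match, round(fraction * 100, 2), max_delta)
```

Both variants fail the hash check and both touch the same roughly 2% of the
pixels, so a tolerance defined as "how many pixels differ" cannot tell them
apart. Only the per-channel delta (1 versus 255) separates the rounding case
from the real failure, and a threshold on that delta that suits one set of
tests may still hide bugs in another.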

Of course, Chromium was more willing to incur the cost of keeping all of
the pixel tests up to date when trivial things changed the output, to the
detriment of everyone's repo sizes and checkout times :).

There may be other things that can also cause mismatches; I remember Simon
bugging me from time to time when Chromium would check in a PNG with an
alpha channel, which would throw off the checksum on the Mac ports (or
something like that). I don't know that I would expect you to be hitting
those these days, though.

Hopefully that provides a little bit of context,

-- Dirk