[webkit-gtk] Rationale for disabling pixel tests on bots

Thu Aug 3 02:53:50 PDT 2017

On 03/08/17 11:11, Romain Bellessort wrote:
> Hi,
> 
> For several years (apparently since 2012), pixel tests have been disabled
> when running tests on bots (see e.g. "Pixel tests disabled" in [1]). There
> is an option to run them locally (-p), but I was wondering what was the
> rationale for disabling them.
> 
> Based on what I found, the reason seems to be that running pixel tests on
> bots has a high processing cost. In addition, in most cases, this cost is
> not needed as considered features may be tested through reftests (hence
> disabling pixel tests on bots is not a big issue as they can generally be
> avoided).
> 
> Would you say this is correct, or are there other reasons?
> 

Pixel tests do run on the bots, but only for the tests that fail the
text diff. The bot does a first run without pixel tests (checking only
text diffs). Then it does a second run over just the tests that failed,
this time with pixel tests enabled.
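
To make the two-run flow concrete, here is a rough Python sketch. It is
not the actual bot code: `run_test` is a hypothetical callback that runs
one test and returns whether the text diff and the pixel diff passed,
and `TestResult` and `run_two_pass` are names made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class TestResult:
    name: str
    text_ok: bool
    pixel_ok: Optional[bool] = None  # None means no pixel comparison was run


# Hypothetical signature: run_test(name, pixel=...) -> (text_ok, pixel_ok)
RunTest = Callable[..., Tuple[bool, Optional[bool]]]


def run_two_pass(tests: List[str], run_test: RunTest) -> List[TestResult]:
    # First run: every test, text diffs only.
    first = [TestResult(name, *run_test(name, pixel=False)) for name in tests]

    # Second run: only the tests that failed the text diff, pixel tests on.
    failed = [result.name for result in first if not result.text_ok]
    retried = {
        name: TestResult(name, *run_test(name, pixel=True)) for name in failed
    }

    # Keep the first-run result unless the test was retried.
    return [retried.get(result.name, result) for result in first]
```

In this sketch a test that passes the text diff never gets a pixel
comparison at all, which is the behaviour described above.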

As for why we don't always run pixel tests:

I'm not sure the processing cost is a concern. It would be useful to
know how much time it takes to run the whole test suite with and
without pixel tests enabled. If enabling them adds less than 25% to the
run time, I don't think this should be a concern.
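
The check I have in mind is simple arithmetic. A small Python sketch,
where the function names are made up and the timings in the example are
invented placeholders, not measurements from any bot:

```python
def pixel_overhead(seconds_without: float, seconds_with: float) -> float:
    """Extra run time with pixel tests, as a fraction of the run without them."""
    if seconds_without <= 0:
        raise ValueError("seconds_without must be positive")
    return (seconds_with - seconds_without) / seconds_without


def is_acceptable(seconds_without: float, seconds_with: float,
                  threshold: float = 0.25) -> bool:
    # 0.25 is the 25% figure suggested above, not a measured limit.
    return pixel_overhead(seconds_without, seconds_with) < threshold


# Invented timings: 3600 s without pixel tests, 4320 s with them.
overhead = pixel_overhead(3600, 4320)
print(f"overhead: {overhead:.0%}, acceptable: {is_acceptable(3600, 4320)}")
```

Real numbers would have to come from timing the same revision on the
same bot twice, once with -p and once without.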

My understanding is that currently there are 3 main reasons for not
doing this:

 1) Other ports (Mac) are not running pixel tests either, and we
currently don't see a need to do anything different here. If we end up
enabling pixel tests globally, I think this should ideally be done for
all ports.

 2) Increased burden to keep the bots green: we already have a hard time
keeping our bots green without running pixel tests by default. If we
enable them, the burden will be much higher than it is now.

 3) Difficulty of getting consistent results across distributions: we
have developers using all kinds of GNU/Linux distributions, and the test
results often depend on the specific version of some libraries. For
example, different versions of Cairo or GTK+ can cause 1-pixel
differences (or a box rendered in a slightly different color) in the
output, which may make a test fail when it actually should have passed.
We try to avoid this as much as possible by building, in our internal
JHBuild, a set of libraries that we have identified as able to cause
this kind of issue. But there are still different failures depending on
whether you use Fedora or Debian (for example). So we have not yet
mastered the art of bundling all the libraries that can cause different
test results.
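
To illustrate point 3, here is a toy Python sketch of why an exact pixel
comparison is so fragile. It is not how the real image comparison in the
test harness works: images are modelled as plain lists of rows of RGB
tuples, and both function names are invented for the example.

```python
def differing_pixels(expected, actual):
    """Count positions where two same-sized images (rows of RGB tuples) differ."""
    same_shape = len(expected) == len(actual) and all(
        len(row_e) == len(row_a) for row_e, row_a in zip(expected, actual)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    return sum(
        1
        for row_e, row_a in zip(expected, actual)
        for px_e, px_a in zip(row_e, row_a)
        if px_e != px_a
    )


def exact_pixel_test_passes(expected, actual):
    # An exact comparison fails as soon as a single pixel differs.
    return differing_pixels(expected, actual) == 0


white = (255, 255, 255)
expected = [[white, white], [white, white]]
# Same image, but one pixel is off by one in the red channel, the kind of
# difference a different library version could introduce.
actual = [[white, white], [white, (254, 255, 255)]]

print(differing_pixels(expected, actual))         # 1
print(exact_pixel_test_passes(expected, actual))  # False
```

A difference nobody could see on screen is enough to turn the result
red, which is why the library versions matter so much here.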

My 2 cents.
