[webkit-dev] Pixel test experiment

Tue Oct 12 13:43:05 PDT 2010

To add a concrete data point, http://trac.webkit.org/changeset/69517 caused
a number of SVG tests to fail.  It required 14 text rebaselines for Mac and
two more for Leopard (done by Adam Barth).  In order to pass the
pixel tests in Chromium, it required 1506 new pixel baselines (checked in by
the very brave Albert Wong, http://trac.webkit.org/changeset/69543).  None
of the rebaselining was done by the patch authors, and in general I would not
expect a patch author who doesn't work on Chromium to update
Chromium-specific baselines.  I'm a little skeptical of the claim that all
SVG changes are run through the pixel tests given that to date none of the
affected platform/mac SVG pixel baselines have been updated.  This sort of
mass-rebaselining is required fairly regularly for minor changes in SVG and
in other parts of the codebase.

I'd really like for the bots to run the pixel tests on every run, preferably
with 0 tolerance.  We catch a lot of regressions by running these tests on
the Chromium bots that would probably otherwise go unnoticed.  However, there
is a large maintenance cost associated with this coverage.  We normally have
two engineers (one in PST, one elsewhere in the world) who watch the
Chromium bots to triage, suppress, and rebaseline tests as churn is
introduced.
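
To make the tolerance question concrete, here is a minimal sketch of how a
pass/fail decision changes between tolerance 0 and 0.1.  It assumes the
tolerance is a percentage of differing pixels; the metric actually used by
the test harness may be computed differently, and the function names below
are hypothetical, not part of any WebKit tool.

```python
# Simplified illustration only: images are lists of rows of RGBA tuples,
# and the "difference" is assumed to be the percentage of pixels that
# are not identical. The real image diff metric may differ.

def diff_percent(expected, actual):
    """Return the percentage of pixels that differ between two images."""
    same_shape = len(expected) == len(actual) and all(
        len(row_e) == len(row_a) for row_e, row_a in zip(expected, actual)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_e, row_a in zip(expected, actual)
        for px_e, px_a in zip(row_e, row_a)
        if px_e != px_a
    )
    return 100.0 * changed / total


def passes(expected, actual, tolerance=0.0):
    """A test passes when the difference does not exceed the tolerance."""
    return diff_percent(expected, actual) <= tolerance


# A 100x100 white baseline and a result with a single off-by-one pixel.
white = (255, 255, 255, 255)
baseline = [[white] * 100 for _ in range(100)]
result = [row[:] for row in baseline]
result[10][10] = (254, 255, 255, 255)

print(passes(baseline, result, tolerance=0.1))  # one pixel in 10,000 is below 0.1%
print(passes(baseline, result, tolerance=0.0))  # any difference fails at 0
```

Under these assumptions a one-pixel regression is accepted at 0.1 and
rejected at 0, which is the trade-off between coverage and maintenance cost
described above.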

Questions:
- If the pixel tests were running either with a tolerance of 0 or 0.1, what
would the expectation be for a patch like
http://trac.webkit.org/changeset/69517 which requires hundreds of pixel
rebaselines?  Would the patch author be expected to update the baselines for
the platform/mac port, or would someone else?  Thus far the Chromium folks
have been the only ones actively maintaining the pixel baselines - which I
think is entirely reasonable since we're the only ones trying to run the
pixel tests on bots.

- Do we have the tools and infrastructure needed to do mass rebaselines in
WebKit currently?  We've built a number of tools to deal with the Chromium
expectations, but since this has been a need unique to Chromium so far, the
tools only work for Chromium.

- James

On Fri, Oct 8, 2010 at 11:18 PM, Nikolas Zimmermann <
zimmermann at physik.rwth-aachen.de> wrote:

>
> On 08.10.2010 at 20:14, Jeremy Orlow wrote:
>
>> I'm not an expert on Pixel tests, but my understanding is that in Chromium
>> (where we've always run with tolerance 0) we've seen real regressions that
>> would have slipped by with something like tolerance 0.1.  When you have 0
>> tolerance, it is more maintenance work, but if we can avoid regressions, it
>> seems worth it.
>>
>
> Well, that's why I initially argued for tolerance 0. Especially in SVG we
> had lots of regressions in the past that were below the 0.1 tolerance. I
> fully support --tolerance 0 as default.
>
> Dirk and I are also willing to investigate possible problem sources and
> minimize them.
> Reftests, as Simon said, are a great thing, but they won't help with
> official test suites like the W3C one; it would be a huge amount of work
> to create reftests for all of these.
>
>
> Cheers,
> Niko