[webkit-dev] Pixel test experiment
ojan at chromium.org
Thu Oct 14 09:06:21 PDT 2010
My experience is that having a non-zero tolerance makes maintaining the
pixel results *harder*. It makes it easier at first of course. But as more
and more tests only pass with a non-zero tolerance, it gets harder to figure
out if your change causes a regression (e.g. your change causes a pixel test
to fail, but when you look at the diff, it includes more changes than you
would expect from your patch).
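
To make that concrete, here is a minimal sketch in Python. It assumes that "tolerance" means the maximum percentage of pixels allowed to differ from the baseline; the real image diff tool may measure differences differently. All function names and pixel counts here are hypothetical, chosen only for illustration.

```python
# Assumption: tolerance = max percentage of differing pixels (illustrative only).

def count_different(baseline, actual):
    """Number of pixels that differ between two equally sized images."""
    if len(baseline) != len(actual) or any(
        len(b) != len(a) for b, a in zip(baseline, actual)
    ):
        raise ValueError("images must have the same dimensions")
    return sum(
        1
        for row_b, row_a in zip(baseline, actual)
        for pixel_b, pixel_a in zip(row_b, row_a)
        if pixel_b != pixel_a
    )

def percent_different(baseline, actual):
    """Percentage of pixels that differ from the baseline."""
    total = sum(len(row) for row in baseline)
    return 100.0 * count_different(baseline, actual) / total

def passes(baseline, actual, tolerance):
    """A test passes when the differing percentage is within the tolerance."""
    return percent_different(baseline, actual) <= tolerance

WHITE, OFF_WHITE, BLACK = (255, 255, 255), (254, 255, 255), (0, 0, 0)

# Checked-in baseline: a 100x100 white image (10000 pixels).
baseline = [[WHITE] * 100 for _ in range(100)]

# Current output already drifts from the baseline by 5 pixels (0.05%).
# This only passes because the tolerance is non-zero.
drifted = [row[:] for row in baseline]
for x in range(5):
    drifted[0][x] = OFF_WHITE

# A patch that changes 3 more pixels: 8 differ in total (0.08%).
small_patch = [row[:] for row in drifted]
for x in range(3):
    small_patch[50][x] = BLACK

# A patch that changes 10 more pixels: 15 differ in total (0.15%).
large_patch = [row[:] for row in drifted]
for x in range(10):
    large_patch[50][x] = BLACK

print(passes(baseline, drifted, 0.1))          # drift is tolerated
print(passes(baseline, small_patch, 0.1))      # the 3-pixel change slips through
print(passes(baseline, large_patch, 0.1))      # fails...
print(count_different(baseline, large_patch))  # ...but the diff shows 15 pixels, not 10
print(passes(baseline, drifted, 0.0))          # with zero tolerance the drift is caught
```

In the last failing case the diff against the baseline mixes the 5 pixels of old drift with the 10 pixels the patch actually changed, which is the "more changes than you would expect" situation. With zero tolerance the drift would have forced a rebaseline earlier, so any later diff would contain only the patch's own changes.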
Having no tolerance is a pain for sure, but it's much more black and white
and thus, it's usually much easier to reason about the correctness of a
On Tue, Oct 12, 2010 at 1:43 PM, James Robinson <jamesr at google.com> wrote:
> To add a concrete data point, http://trac.webkit.org/changeset/69517 caused
> a number of SVG tests to fail. It required 14 text rebaselines for Mac and
> two more for Leopard (done by Adam Barth). In order to pass the
> pixel tests in Chromium, it required 1506 new pixel baselines (checked in by
> the very brave Albert Wong, http://trac.webkit.org/changeset/69543). None
> of the rebaselining was done by the patch authors, and in general I would not
> expect a patch author who doesn't work on Chromium to update
> Chromium-specific baselines. I'm a little skeptical of the claim that all
> SVG changes are run through the pixel tests given that to date none of the
> affected platform/mac SVG pixel baselines have been updated. This sort of
> mass-rebaselining is required fairly regularly for minor changes in SVG and
> in other parts of the codebase.
> I'd really like for the bots to run the pixel tests on every run,
> preferably with 0 tolerance. We catch a lot of regressions by running these
> tests on the Chromium bots that would probably otherwise go unnoticed.
> However there is a large maintenance cost associated with this coverage.
> We normally have two engineers (one in PST, one elsewhere in the world) who
> watch the Chromium bots to triage, suppress, and rebaseline tests as churn
> is introduced.
> - If the pixel tests were running either with a tolerance of 0 or 0.1, what
> would the expectation be for a patch like
> http://trac.webkit.org/changeset/69517 which requires hundreds of pixel
> rebaselines? Would the patch author be expected to update the baselines for
> the platform/mac port, or would someone else? Thus far the Chromium folks
> have been the only ones actively maintaining the pixel baselines - which I
> think is entirely reasonable since we're the only ones trying to run the
> pixel tests on bots.
> - Do we have the tools and infrastructure needed to do mass rebaselines in
> WebKit currently? We've built a number of tools to deal with the Chromium
> expectations, but since this has been a need unique to Chromium so far the
> tools only work for Chromium.
> - James
> On Fri, Oct 8, 2010 at 11:18 PM, Nikolas Zimmermann <
> zimmermann at physik.rwth-aachen.de> wrote:
>> On 08.10.2010 at 20:14, Jeremy Orlow wrote:
>>> I'm not an expert on Pixel tests, but my understanding is that in
>>> Chromium (where we've always run with tolerance 0) we've seen real
>>> regressions that would have slipped by with something like tolerance 0.1.
>>> When you have 0 tolerance, it is more maintenance work, but if we can avoid
>>> regressions, it seems worth it.
>> Well, that's why I initially argued for tolerance 0. Especially in SVG we
>> had lots of regressions in the past that were below the 0.1 tolerance. I
>> fully support --tolerance 0 as default.
>> Dirk and I are also willing to investigate possible problem sources and
>> minimize them.
>> Reftests, as Simon said, are a great thing, but they won't help with official
>> test suites like the W3C one - it would be a huge amount of work to create
>> reftests for all of these...
>> webkit-dev mailing list
>> webkit-dev at lists.webkit.org