I'm not sure if this could be made to work with SVG (it might require some additions to LayoutTestController), but Philip Taylor's <canvas> test suite (in LayoutTests/canvas/philip) compares pixels programmatically in JavaScript. This has the major advantage that it doesn't require pixel results, and it allows a per-test level of fuzziness/tolerance if required. Obviously we would still want some tests to remain pixel tests, as checks like these only cover a subset of pixels, but it might be a good alternative to consider when writing new tests (especially for regressions, where a single well-chosen pixel can often isolate the problem).
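Roughly, tests in that suite draw to a canvas, read pixels back with getImageData(), and assert on them in script. A minimal sketch of the idea (the helper name, markup, and tolerance handling below are just illustrative, not the suite's actual harness):

    <canvas id="c" width="100" height="50"></canvas>
    <script>
    // Check one canvas pixel against an expected RGBA value, allowing a
    // per-test tolerance per channel.
    function pixelIsApproximately(ctx, x, y, expected, tolerance) {
        var actual = ctx.getImageData(x, y, 1, 1).data;
        for (var i = 0; i < 4; ++i) {
            if (Math.abs(actual[i] - expected[i]) > tolerance)
                return false;
        }
        return true;
    }

    var ctx = document.getElementById('c').getContext('2d');
    ctx.fillStyle = '#0f0';
    ctx.fillRect(0, 0, 100, 50);
    // The pixel at (50, 25) should be opaque green, within +/-2 per channel.
    var ok = pixelIsApproximately(ctx, 50, 25, [0, 255, 0, 255], 2);
    document.body.appendChild(document.createTextNode(ok ? 'PASS' : 'FAIL'));
    </script>

Stephen

On Thu, Oct 14, 2010 at 12:06 PM, Ojan Vafai <ojan@chromium.org> wrote: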
Simon, are you suggesting that we should only use pixel results for ref tests? If not, then we still need to come to a conclusion on this tolerance issue.
Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid of --tolerance will be a lot of work: making sure all the pixel results that currently pass also pass with --tolerance=0. While I would support someone doing that work, I don't think we should block moving to NRWT on it.
Ojan
On Fri, Oct 8, 2010 at 1:03 PM, Simon Fraser <simon.fraser@apple.com>wrote:
I think the best solution to this pixel matching problem is ref tests.
How practical would it be to use ref tests for SVG?
Simon
On Oct 8, 2010, at 12:43 PM, Dirk Pranke wrote:
Jeremy is correct; the Chromium port has seen real regressions that virtually no fuzzy-matching scheme I can imagine would have caught. new-run-webkit-tests doesn't currently support the tolerance concept at all, and I am inclined to argue that it shouldn't.
However, I frequently am wrong about things, so it's quite possible that there are good arguments for supporting it that I'm not aware of. I'm not particularly interested in working on a tool that doesn't do what the group wants it to do, and I would like all of the other WebKit ports to be running pixel tests by default (and new-run-webkit-tests ;) ) since I think it catches bugs.
As far as I know, the general sentiment on the list has been that we should be running pixel tests by default, and the reason we aren't is largely the work involved in getting them back up to date and keeping them up to date. I'm sure that fuzzy matching reduces the workload, especially for the sort of mismatches caused by differences in text antialiasing.
In addition, I have heard concerns that we'd like to keep fuzzy matching because people might potentially get different results on machines with different hardware configurations, but I don't know that we have any confirmed cases of that (except for arguably the case of different code paths for gpu-accelerated rendering vs. unaccelerated rendering).
If we made it easier to maintain the baselines (improved tooling like Chromium's rebaselining tool, reftest support, etc.), are there still compelling reasons for supporting --tolerance-based testing as opposed to exact matching?
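(For concreteness, here is roughly what tolerance-based matching means compared to exact matching. This is only an illustrative sketch, not the actual ImageDiff code; the function name and the "percentage of differing pixels" interpretation are assumptions made for the example.

    // Compare two same-sized RGBA pixel buffers; pass if the percentage of
    // differing pixels is no greater than the tolerance.
    function imagesMatch(actualPixels, expectedPixels, width, height, tolerancePercent) {
        var differingPixels = 0;
        for (var i = 0; i < actualPixels.length; i += 4) {
            if (actualPixels[i] !== expectedPixels[i] ||
                actualPixels[i + 1] !== expectedPixels[i + 1] ||
                actualPixels[i + 2] !== expectedPixels[i + 2] ||
                actualPixels[i + 3] !== expectedPixels[i + 3])
                ++differingPixels;
        }
        var percentDifferent = 100 * differingPixels / (width * height);
        // A tolerance of 0 reduces to an exact match; something like 0.1 lets
        // a handful of anti-aliased edge pixels differ without failing.
        return percentDifferent <= tolerancePercent;
    }

Exact matching is simply the tolerance-0 case of the above.)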
-- Dirk
On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
I'm not an expert on pixel tests, but my understanding is that in Chromium (where we've always run with tolerance 0) we've seen real regressions that would have slipped by with something like tolerance 0.1. When you have 0 tolerance, it is more maintenance work, but if we can avoid regressions, it seems worth it.

J
On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann <zimmermann@physik.rwth-aachen.de> wrote:
On Oct 8, 2010, at 19:53, Maciej Stachowiak wrote:
On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:
>
> On Oct 8, 2010, at 00:44, Maciej Stachowiak wrote:
>
>>
>> On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
>>
>>> Good evening webkit folks,
>>>
>>> I've finished landing svg/ pixel test baselines, which pass with
>>> --tolerance 0 on my 10.5 & 10.6 machines.
>>> As the pixel testing is very important for the SVG tests, I'd like to
>>> run them on the bots, experimentally, so we can catch regressions easily.
>>>
>>> Maybe someone with direct access to the leopard & snow leopard bots
>>> could just run "run-webkit-tests --tolerance 0 -p svg" and mail me the
>>> results?
>>> If it passes, we could maybe run the pixel tests for the svg/
>>> subdirectory on these bots?
>>
>> Running pixel tests would be great, but can we really expect the
>> results to be stable cross-platform with tolerance 0? Perhaps we should
>> start with a higher tolerance level.
>
> Sure, we could do that. But I'd really like to get a feeling for what's
> problematic first. If we see 95% of the SVG tests pass with --tolerance 0,
> and only a few need higher tolerances (64-bit vs. 32-bit AA differences,
> etc.), I could come up with a per-file pixel test tolerance extension to
> DRT, if it's needed.
>
> How about starting with just one build slave (say, Mac Leopard) that runs
> the pixel tests for SVG, with --tolerance 0 for a while? I'd be happy to
> identify the problems, and see if we can make it work, somehow :-)
The problem I worry about is that on future Mac OS X releases, rendering of shapes may change in some tiny way that is not visible but enough to cause failures at tolerance 0. In the past, such false positives arose from time to time, which is one reason we added pixel test tolerance in the first place. I don't think running pixel tests on just one build slave will help us understand that risk.
I think we'd just update the baselines for the newer OS X release then, as has been done for the Tiger -> Leopard and Leopard -> Snow Leopard switches? platform/mac/ should always contain the newest release's baselines; when there are differences on Leopard, the results go into platform/mac-leopard/.
Why not start with some low but non-zero tolerance (0.1?) and see if we can at least make that work consistently, before we try the bolder step of tolerance 0? Also, as a side note, we probably need to add more build slaves to run pixel tests at all, since just running the test suite without pixel tests is already slow enough that the testers are often significantly behind the builders.
Well, I thought about just running the pixel tests for the svg/ subdirectory as a separate step, hence my request for tolerance 0, as the baseline already passes without problems at least on my & Dirk's machines. I wouldn't want to argue for running 20,000+ pixel tests with tolerance 0 as a first step :-) But the 1,000 SVG tests might be fine with tolerance 0?
Even tolerance 0.1 as default for SVG would be fine with me, as long as we can get the bots to run the SVG pixel tests :-)
Cheers, Niko
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev