[webkit-dev] pixel tests and --tolerance (was Re: Pixel test experiment)

Thu Oct 14 09:06:53 PDT 2010

Simon, are you suggesting that we should only use pixel results for ref
tests? If not, then we still need to come to a conclusion on this tolerance
issue.

Dirk, implementing --tolerance in NRWT isn't that hard, is it? Getting rid of
--tolerance will take a lot of work to make sure all the pixel results that
currently pass also pass with --tolerance=0. While I would support someone
doing that work, I don't think we should block moving to NRWT on it.
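
To make the tolerance trade-off concrete, here is a minimal sketch of a
tolerance-style comparison. It assumes the tolerance is simply the percentage
of pixels allowed to differ; the metric the real tools use may well be
different, and the function names and image representation here are
hypothetical, not taken from run-webkit-tests or NRWT.

```python
def percent_pixels_differing(expected, actual):
    """Return the percentage (0-100) of pixels that differ.

    Each image is a list of rows; each row is a list of (r, g, b, a) tuples.
    This is an assumed, simplified metric, not the one the real tools use.
    """
    if len(expected) != len(actual) or any(
        len(e_row) != len(a_row) for e_row, a_row in zip(expected, actual)
    ):
        raise ValueError("images must have identical dimensions")
    total = sum(len(row) for row in expected)
    if total == 0:
        return 0.0
    differing = sum(
        1
        for e_row, a_row in zip(expected, actual)
        for e_px, a_px in zip(e_row, a_row)
        if e_px != a_px
    )
    return 100.0 * differing / total


def pixel_test_passes(expected, actual, tolerance=0.0):
    """Pass when the differing-pixel percentage is within the tolerance."""
    return percent_pixels_differing(expected, actual) <= tolerance
```

Under this assumed metric, tolerance 0 is an exact match, and any non-zero
tolerance lets through a change confined to a small enough number of pixels,
which is the kind of regression the Chromium folks are describing.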

Ojan

On Fri, Oct 8, 2010 at 1:03 PM, Simon Fraser <simon.fraser at apple.com> wrote:

> I think the best solution to this pixel matching problem is ref tests.
>
> How practical would it be to use ref tests for SVG?
>
> Simon
>
> On Oct 8, 2010, at 12:43 PM, Dirk Pranke wrote:
>
> > Jeremy is correct; the Chromium port has seen real regressions that
> > virtually no concept of a fuzzy match that I can imagine would've
> > caught.
> > new-run-webkit-tests doesn't currently support the tolerance concept
> > at all, and I am inclined to argue that it shouldn't.
> >
> > However, I frequently am wrong about things, so it's quite possible
> > that there are good arguments for supporting it that I'm not aware of.
> > I'm not particularly interested in working on a tool that doesn't do
> > what the group wants it to do, and I would like all of the other
> > WebKit ports to be running pixel tests by default (and
> > new-run-webkit-tests ;) ) since I think it catches bugs.
> >
> > As far as I know, the general sentiment on the list has been that we
> > should be running pixel tests by default, and the reason that we
> > aren't is largely due to the work involved in getting them back up to
> > date and keeping them up to date. I'm sure that fuzzy matching reduces
> > the work load, especially for the sort of mismatches caused by
> > differences in the text antialiasing.
> >
> > In addition, I have heard concerns that we'd like to keep fuzzy
> > matching because people might potentially get different results on
> > machines with different hardware configurations, but I don't know that
> > we have any confirmed cases of that (except for arguably the case of
> > different code paths for gpu-accelerated rendering vs. unaccelerated
> > rendering).
> >
> > If we made it easier to maintain the baselines (improved tooling like
> > Chromium's rebaselining tool, adding reftest support, etc.), are there
> > still compelling reasons for supporting --tolerance-based testing as
> > opposed to exact matching?
> >
> > -- Dirk
> >
> > On Fri, Oct 8, 2010 at 11:14 AM, Jeremy Orlow <jorlow at chromium.org> wrote:
> >> I'm not an expert on Pixel tests, but my understanding is that in Chromium
> >> (where we've always run with tolerance 0) we've seen real regressions that
> >> would have slipped by with something like tolerance 0.1.  When you have
> >> 0 tolerance, it is more maintenance work, but if we can avoid regressions,
> >> it seems worth it.
> >> J
> >>
> >> On Fri, Oct 8, 2010 at 10:58 AM, Nikolas Zimmermann
> >> <zimmermann at physik.rwth-aachen.de> wrote:
> >>>
> >>> Am 08.10.2010 um 19:53 schrieb Maciej Stachowiak:
> >>>
> >>>>
> >>>> On Oct 8, 2010, at 12:46 AM, Nikolas Zimmermann wrote:
> >>>>
> >>>>>
> >>>>> Am 08.10.2010 um 00:44 schrieb Maciej Stachowiak:
> >>>>>
> >>>>>>
> >>>>>> On Oct 7, 2010, at 6:34 AM, Nikolas Zimmermann wrote:
> >>>>>>
> >>>>>>> Good evening webkit folks,
> >>>>>>>
> >>>>>>> I've finished landing svg/ pixel test baselines, which pass with
> >>>>>>> --tolerance 0 on my 10.5 & 10.6 machines.
> >>>>>>> As the pixel testing is very important for the SVG tests, I'd like to
> >>>>>>> run them on the bots, experimentally, so we can catch regressions
> >>>>>>> easily.
> >>>>>>>
> >>>>>>> Maybe someone with direct access to the leopard & snow leopard bots
> >>>>>>> could just run "run-webkit-tests --tolerance 0 -p svg" and mail me the
> >>>>>>> results?
> >>>>>>> If it passes, we could maybe run the pixel tests for the svg/
> >>>>>>> subdirectory on these bots?
> >>>>>>
> >>>>>> Running pixel tests would be great, but can we really expect the
> >>>>>> results to be stable cross-platform with tolerance 0? Perhaps we should
> >>>>>> start with a higher tolerance level.
> >>>>>
> >>>>> Sure, we could do that. But I'd really like to get a feeling for what's
> >>>>> problematic first. If we see 95% of the SVG tests pass with --tolerance 0,
> >>>>> and only a few need higher tolerances (64bit vs. 32bit aa differences,
> >>>>> etc.), I could come up with a per-file pixel test tolerance extension to
> >>>>> DRT, if it's needed.
> >>>>>
> >>>>> How about starting with just one build slave (say, Mac Leopard) that
> >>>>> runs the pixel tests for SVG with --tolerance 0 for a while? I'd be happy
> >>>>> to identify the problems and see if we can make it work, somehow :-)
> >>>>
> >>>> The problem I worry about is that on future Mac OS X releases, rendering
> >>>> of shapes may change in some tiny way that is not visible but enough to
> >>>> cause failures at tolerance 0. In the past, such false positives arose from
> >>>> time to time, which is one reason we added pixel test tolerance in the first
> >>>> place. I don't think running pixel tests on just one build slave will help
> >>>> us understand that risk.
> >>>
> >>> I think we'd just update the baseline to the newer OS X release, then,
> >>> like it has been done for the tiger -> leopard, leopard -> snow leopard
> >>> switch?
> >>> platform/mac/ should always contain the newest release baseline; when
> >>> there are differences on leopard, the results go into
> >>> platform/mac-leopard/
> >>>
> >>>> Why not start with some low but non-zero tolerance (0.1?) and see if we
> >>>> can at least make that work consistently, before we try the bolder step of
> >>>> tolerance 0?
> >>>> Also, and as a side note, we probably need to add more build slaves to
> >>>> run pixel tests at all, since just running the test suite without pixel
> >>>> tests is already slow enough that the testers are often significantly behind
> >>>> the builders.
> >>>
> >>> Well, I thought about just running the pixel tests for the svg/
> >>> subdirectory as a separate step, hence my request for tolerance 0, as the
> >>> baseline passes without problems at least on my & Dirk's machines already.
> >>> I wouldn't want to argue for running 20,000+ pixel tests with tolerance 0 as
> >>> a first step :-) But the 1000 SVG tests might be fine with tolerance 0?
> >>>
> >>> Even tolerance 0.1 as default for SVG would be fine with me, as long as we
> >>> can get the bots to run the SVG pixel tests :-)
> >>>
> >>> Cheers,
> >>> Niko
> >>>
> >>> _______________________________________________
> >>> webkit-dev mailing list
> >>> webkit-dev at lists.webkit.org
> >>> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev