Re: [webkit-dev] Running pixel tests on build.webkit.org

18 Mar 2010

The thing I find most difficult about not having pixel bots is that, if I
make a change that changes pixel results, I need to actually build that
change on every platform to get the new pixel results. Could we put up pixel
bots on a separate waterfall? It's a waterfall we don't expect to keep green
all the time. This has a few advantages over the current state of the world:

1. When making cross-platform changes, it's easy to grab pixel results off
the bots.
2. When making changes that affect pixel tests, it's easier to see which
pixel failures are regressions caused by my patch.

I think these two would greatly help in stemming the tide of pixel test
regressions. Does that seem possible/reasonable?

Ojan

On Mon, Jan 11, 2010 at 9:17 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
...
Wow, much easier than I expected.  :-)
OK, then what about buy in on this approach?
I'll even file bugs on everything I rebaseline so we can track getting
things back to a correct state and/or verifying that the new baselines are
correct.
J
On Mon, Jan 11, 2010 at 9:13 AM, Dimitri Glazkov <dglazkov@chromium.org> wrote:
...
It's basically just run-webkit-tests --reset-results --pixel-tests. No
magic :)
See run-webkit-tests --help for more info.
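For anyone scripting this, the invocation above could be wrapped roughly as follows. This is only a sketch: the two flags are the ones named in this message, while the helper names, the assumption that run-webkit-tests is on PATH, and the idea of passing a test directory as a positional argument are illustrative and should be checked against run-webkit-tests --help.

```python
import shlex
import subprocess
from typing import List, Sequence

# Assumption: run-webkit-tests is reachable on PATH. In a real checkout you
# would point this at the script inside the WebKit tree instead.
RUN_WEBKIT_TESTS = "run-webkit-tests"


def rebaseline_command(tests: Sequence[str] = ()) -> List[str]:
    """Build the argument list for regenerating pixel baselines.

    The flags come from the message above; `tests` is a hypothetical
    convenience for limiting the run to some test paths.
    """
    return [RUN_WEBKIT_TESTS, "--reset-results", "--pixel-tests", *tests]


def run_rebaseline(tests: Sequence[str] = ()) -> int:
    """Run the command and return its exit status (not invoked here)."""
    return subprocess.run(rebaseline_command(tests)).returncode


# Show the command line that would be executed, without running it.
print(shlex.join(rebaseline_command(["fast/forms"])))
```

Building the argument list separately from running it makes it easy to inspect the command before it overwrites any checked-in results.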
BTW, Victor is working to port the rebaselining tool to
build.webkit.org. You may want to check with him -- maybe he's close
to finishing the patch.
:DG<
On Mon, Jan 11, 2010 at 9:06 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
...
On Fri, Jan 8, 2010 at 9:52 AM, Jeremy Orlow <jorlow@chromium.org> wrote:
...
Plan 3 seems like the best (and simplest) one until the infrastructure for
...
the others (and/or a champion for fixing currently failing tests) is
available.
What would it take to go with plan 3?  I guess someone needs to rebaseline
everything that's currently failing, check them in, and then someone (like
bdash?) needs to flip a switch on the bots...?  Did I miss anything?
Are there instructions on how to do the rebaselining anywhere?  I've only
ever created pixel baselines for Chromium before (where we have a pretty
neat tool that pretty much does it for you).
Does anyone know?
I'm happy to do the rebaselining if someone can tell me how and we agree to
turn on pixel tests on the bots.
...
On Fri, Jan 8, 2010 at 9:23 AM, Pam Greene <pam@chromium.org> wrote:
...
And one very quick, short-term solution:
3. Generate new pixel results to match the current behavior, and check
them in as hypothetically correct.
And of course if someone notices an existing problem and fixes it, they
...
...
check in corrected images then. It doesn't help find current problems, but
those are being missed now anyway. It does let the tests be run again
approximately immediately, even faster than waiting for test expectations
functionality, so we can catch regressions moving forward.
- Pam
On Thu, Jan 7, 2010 at 5:01 PM, Ojan Vafai <ojan@chromium.org> wrote:
...
On Thu, Jan 7, 2010 at 10:22 AM, Darin Adler <darin@apple.com> wrote:
...
>
> On Jan 7, 2010, at 10:19 AM, Dimitri Glazkov wrote:
> > Are we planning to run pixel tests on the build bots?
>
> If we can get them green, we should. It’s a lot of work. We need a
> volunteer to do that work. We’ve tried before.
Two possible long-term solutions come to mind:
1. Turn the bots orange on pixel failures. They still need fixing, but
are not as severe as text diff failures. I'm not a huge fan of this, but
it's an option.
2. Add in a concept of expected failures and only turn the bots red for
*unexpected* failures. More details on this below.
In chromium-land, there's an expectations file that lists expected
failures and allows for distinguishing different types of failures (e.g.
IMAGE vs. TEXT). It's like Skipped lists, but doesn't necessarily skip the
test. Fixing the expected failures still needs doing of course, but can be
done asynchronously. The primary advantage of this approach is that we can
turn on pixel tests, keep the bots green and avoid further regressions.
Would something like that make sense for WebKit as a whole? To be clear,
we would be nearly as loath to add tests to this file as we are about
adding them to the Skipped lists. This just provides a way forward.
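To make the "only red for unexpected failures" idea concrete, here is a rough sketch of the comparison a bot could make. It is not how the Chromium tooling is implemented; the function names, the dictionary shapes, and the two-color outcome are all assumptions made for illustration, and it ignores the case of an expected failure that unexpectedly passes.

```python
from typing import Dict, Set


def unexpected_failures(
    expected: Dict[str, Set[str]], actual: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Return, per test, the failure kinds not listed as expected.

    Both arguments map a test path to a set of failure kinds such as
    {"IMAGE"} or {"IMAGE", "TEXT"} (hypothetical data shapes).
    """
    surprises: Dict[str, Set[str]] = {}
    for test, kinds in actual.items():
        extra = kinds - expected.get(test, set())
        if extra:
            surprises[test] = extra
    return surprises


def bot_color(expected: Dict[str, Set[str]], actual: Dict[str, Set[str]]) -> str:
    """Red only when something failed in a way that was not expected."""
    return "red" if unexpected_failures(expected, actual) else "green"


# Hypothetical run: foo.html is a known IMAGE failure, bar.html is new.
expected = {"fast/forms/foo.html": {"IMAGE"}}
known_only = {"fast/forms/foo.html": {"IMAGE"}}
with_regression = {
    "fast/forms/foo.html": {"IMAGE"},
    "fast/forms/bar.html": {"TEXT"},
}

print(bot_color(expected, known_only))
print(bot_color(expected, with_regression))
```

With this comparison the known IMAGE failure keeps the bot green, while the new TEXT failure in a test that has no listed expectation turns it red.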
While it's true that the bots used to be red more frequently with pixel
tests turned on, for the most part, there weren't significant pixel
regressions. Now, if you run the pixel tests on a clean build, there are a
number of failures and a very large number of hash-mismatches that are
within the failure tolerance level.
-Ojan
For reference, the format of the expectations file is something like this:
// Fails the image diff but not the text diff.
fast/forms/foo.html = IMAGE
// Fails just the text diff.
fast/forms/bar.html = TEXT
// Fails both the image and text diffs.
fast/forms/baz.html = IMAGE+TEXT
// Skips this test (e.g. because it hangs run-webkit-tests or causes
// other tests to fail).
SKIP : fast/forms/foo1.html = IMAGE
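As an illustration of how little machinery this format needs, a reader for it might look something like the sketch below. The format itself is only described as "something like this", so the grammar assumed here (// comments, an optional "SKIP :" prefix, failure kinds joined with +) is inferred from the sample lines, and the Expectation type and parse_expectations function are hypothetical names, not part of any existing tool.

```python
from typing import Dict, FrozenSet, NamedTuple


class Expectation(NamedTuple):
    skip: bool                 # True when the line carries the SKIP modifier
    failures: FrozenSet[str]   # e.g. frozenset({"IMAGE", "TEXT"})


def parse_expectations(text: str) -> Dict[str, Expectation]:
    """Parse lines of the form '[SKIP :] path = KIND[+KIND]'."""
    expectations: Dict[str, Expectation] = {}
    for raw in text.splitlines():
        # Drop // comments and surrounding whitespace.
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue
        left, sep, right = line.partition("=")
        if not sep:
            raise ValueError("expected 'path = KIND', got: %r" % raw)
        # Anything before a ':' is treated as a list of modifiers.
        modifiers, _, path = left.rpartition(":")
        skip = "SKIP" in modifiers.split()
        failures = frozenset(kind.strip() for kind in right.split("+"))
        expectations[path.strip()] = Expectation(skip, failures)
    return expectations


SAMPLE = """\
// Fails the image diff but not the text diff.
fast/forms/foo.html = IMAGE
// Fails just the text diff.
fast/forms/bar.html = TEXT
// Fails both the image and text diffs.
fast/forms/baz.html = IMAGE+TEXT
// Skips this test.
SKIP : fast/forms/foo1.html = IMAGE
"""

for path, expectation in parse_expectations(SAMPLE).items():
    print(path, expectation.skip, sorted(expectation.failures))
```

Keeping the skip flag separate from the failure kinds mirrors the point made above: an entry in this file does not necessarily mean the test is skipped.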
_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev