Running pixel tests on build.webkit.org

Dimitri Glazkov

7 Jan 2010

6:19 p.m.

Are we planning to run pixel tests on the build bots? What's the general opinion here? We're running them over at Chromium and it seems like a really good idea. Case in point: change http://trac.webkit.org/changeset/52900 broke a bunch of layout tests, all of them pixel results, so the breakage didn't register on the waterfall. I rolled out the change for now. :DG<


Darin Adler

7 Jan

6:22 p.m.

On Jan 7, 2010, at 10:19 AM, Dimitri Glazkov wrote:

Are we planning to run pixel tests on the build bots?

If we can get them green, we should. It’s a lot of work. We need a volunteer to do that work. We’ve tried before. -- Darin

Ojan Vafai

8 Jan

1:01 a.m.

On Thu, Jan 7, 2010 at 10:22 AM, Darin Adler <darin@apple.com> wrote:

On Jan 7, 2010, at 10:19 AM, Dimitri Glazkov wrote:

Are we planning to run pixel tests on the build bots?

If we can get them green, we should. It’s a lot of work. We need a volunteer to do that work. We’ve tried before.

Two possible long-term solutions come to mind:

1. Turn the bots orange on pixel failures. They still need fixing, but are not as severe as text diff failures. I'm not a huge fan of this, but it's an option.

2. Add in a concept of expected failures and only turn the bots red for *unexpected* failures. More details on this below.

In chromium-land, there's an expectations file that lists expected failures and allows for distinguishing different types of failures (e.g. IMAGE vs. TEXT). It's like Skipped lists, but doesn't necessarily skip the test. Fixing the expected failures still needs doing, of course, but can be done asynchronously. The primary advantage of this approach is that we can turn on pixel tests, keep the bots green and avoid further regressions.

Would something like that make sense for WebKit as a whole? To be clear, we would be nearly as loath to add tests to this file as we are to add them to the Skipped lists. This just provides a way forward.

While it's true that the bots used to be red more frequently with pixel tests turned on, for the most part, there weren't significant pixel regressions. Now, if you run the pixel tests on a clean build, there are a number of failures and a very large number of hash mismatches that are within the failure tolerance level.

-Ojan

For reference, the format of the expectations file is something like this:

// Fails the image diff but not the text diff.
fast/forms/foo.html = IMAGE

// Fails just the text diff.
fast/forms/bar.html = TEXT

// Fails both the image and text diffs.
fast/forms/baz.html = IMAGE+TEXT

// Skips this test (e.g. because it hangs run-webkit-tests or causes other tests to fail).
SKIP : fast/forms/foo1.html = IMAGE
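To make the expected-failure idea concrete, here is a minimal Python sketch that parses the simplified format Ojan shows (an optional `SKIP :` prefix, a test path, then `IMAGE`, `TEXT` or `IMAGE+TEXT`) and turns the bot red only for unexpected failures. It is not code from run_webkit_tests.py; the names `parse_expectations`, `unexpected_failures` and `bot_color` are hypothetical, and the optional orange mode simply mirrors option 1.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Failure types allowed after "=" in this simplified syntax.
KNOWN_TYPES = {"IMAGE", "TEXT"}


@dataclass
class Expectation:
    expected_failures: set[str] = field(default_factory=set)
    skip: bool = False


def parse_expectations(text: str) -> dict[str, Expectation]:
    """Parse lines such as 'fast/forms/baz.html = IMAGE+TEXT' or
    'SKIP : fast/forms/foo1.html = IMAGE'. '//' starts a comment."""
    expectations: dict[str, Expectation] = {}
    for raw in text.splitlines():
        line = raw.split("//", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        modifiers = ""
        if ":" in line:
            modifiers, line = (part.strip() for part in line.split(":", 1))
        path, types = (part.strip() for part in line.split("=", 1))
        failures = set(types.split("+"))
        unknown = failures - KNOWN_TYPES
        if unknown:
            raise ValueError(f"{path}: unknown failure type(s) {sorted(unknown)}")
        expectations[path] = Expectation(failures, "SKIP" in modifiers.split())
    return expectations


def unexpected_failures(
    actual: dict[str, set[str]], expectations: dict[str, Expectation]
) -> dict[str, set[str]]:
    """Map each test to the failure types it hit that were not listed as expected."""
    result: dict[str, set[str]] = {}
    for path, failures in actual.items():
        exp = expectations.get(path, Expectation())
        if exp.skip:
            continue  # skipped tests never count against the bot
        extra = failures - exp.expected_failures
        if extra:
            result[path] = extra
    return result


def bot_color(unexpected: dict[str, set[str]], orange_for_pixels: bool = False) -> str:
    """Green if nothing unexpected; optionally orange if only pixel results regressed."""
    if not unexpected:
        return "green"
    if orange_for_pixels and all(f == {"IMAGE"} for f in unexpected.values()):
        return "orange"
    return "red"
```

For example, a test listed as `IMAGE` that produces only an image diff is ignored, while a new image diff in an unlisted test turns the bot red (or orange with `orange_for_pixels=True`).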

Eric Seidel

1:08 a.m.

I'm totally in favor of adding test_expectations.txt-like functionality to WebKit (and we'll get it for free when Dirk finishes upstreaming run_webkit_tests.py).

But the troubles with the pixel tests in the past had more to do with text metrics changing between OS releases, and individual font differences between machines. I suspect that those issues are very solvable.

I think we mostly need someone willing to set up the pixel test bots.

-eric

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev

Ojan Vafai

1:17 a.m.

Do we really need a separate set of bots for pixel tests? Let's just turn the pixel tests on for the current bots. The only thing stopping us from doing that is the currently failing tests, hence the suggestion of adding an expectations file (or we could skip all the failures).

I don't know enough about text metrics changes between Mac releases. With Windows releases, we've been able to support XP, Vista and 7 pretty easily by using a generic theme for OS controls. Also, I think we have some hooks to turn off ClearType or something.

There are only ~10 tests that needed custom results for Vista && Windows 7. I wonder if a similar set of steps could be taken for supporting different Mac releases.

Ojan


Darin Fisher

4:18 a.m.

On Thu, Jan 7, 2010 at 5:17 PM, Ojan Vafai <ojan@chromium.org> wrote:

Do we really need a separate set of bots for pixel tests? Let's just turn the pixel tests on for the current bots. The only thing stopping us doing that is the currently failing tests, hence the suggestion for adding an expectations file (or we could skip all the failures).

I don't know enough about text metrics changes between Mac releases. With Windows releases, we've been able to support XP, Vista and 7 pretty easily by using a generic theme for OS controls. Also, I think we have some hooks to turn off cleartype or something.

...and making sure all of the right / same fonts are installed :) -darin


Dirk Pranke

8:53 p.m.

On Thu, Jan 7, 2010 at 8:18 PM, Darin Fisher <darin@chromium.org> wrote:

On Thu, Jan 7, 2010 at 5:17 PM, Ojan Vafai <ojan@chromium.org> wrote:

Do we really need a separate set of bots for pixel tests? Let's just turn the pixel tests on for the current bots. The only thing stopping us doing that is the currently failing tests, hence the suggestion for adding an expectations file (or we could skip all the failures). I don't know enough about text metrics changes between Mac releases. With Windows releases, we've been able to support XP, Vista and 7 pretty easily by using a generic theme for OS controls. Also, I think we have some hooks to turn off cleartype or something.

...and making sure all of the right / same fonts are installed :) -darin

We disable ClearType and do some minimal checking of the system UI theme to make sure you don't have custom font or widget sizes. All of our baselines use the fonts installed with the system, so there are no custom fonts that need to be installed. You do need QuickTime installed, or a few tests will fail, but otherwise it's pretty stock. IIRC, there are far fewer diffs between Leopard and Snow Leopard than there were between XP and Vista, so I doubt you would need the "generic theme" that we did on Windows; it's probably easier to just rebaseline the few files that do diff. I can't speak to the diffs between Tiger and Leopard. -- Dirk
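A rough Python sketch of why "just rebaseline the few files that do diff" stays cheap: if each platform looks for a pixel baseline along a fallback list of directories, a Snow Leopard-specific image is only needed where its rendering differs from the shared Mac one. The directory names, the fallback order and the `find_baseline` helper are illustrative assumptions, not the actual run-webkit-tests configuration.

```python
from __future__ import annotations

from typing import Iterable, Optional

# Illustrative fallback order: most specific directory first, then the shared
# Mac directory, then the cross-platform baseline next to the test ("").
FALLBACK = {
    "mac-snowleopard": ["platform/mac-snowleopard", "platform/mac", ""],
    "mac-leopard": ["platform/mac-leopard", "platform/mac", ""],
}


def find_baseline(test: str, platform: str, existing: Iterable[str]) -> Optional[str]:
    """Return the first '-expected.png' path for `test` that exists, or None."""
    available = set(existing)
    stem = test.rsplit(".", 1)[0]  # fast/forms/foo.html -> fast/forms/foo
    for directory in FALLBACK[platform]:
        name = f"{stem}-expected.png"
        candidate = f"{directory}/{name}" if directory else name
        if candidate in available:
            return candidate
    return None
```

With only `platform/mac` baselines checked in, both releases share them; rebaselining a test that renders differently on Snow Leopard adds a single file under `platform/mac-snowleopard` and leaves everything else untouched.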


Pam Greene

5:23 p.m.

And one very quick, short-term solution:

3. Generate new pixel results to match the current behavior, and check them in as hypothetically correct.

And of course if someone notices an existing problem and fixes it, they check in corrected images then. It doesn't help find current problems, but those are being missed now anyway. It does let the tests be run again approximately immediately, even faster than waiting for test expectations functionality, so we can catch regressions moving forward.

- Pam
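Plan 3 boils down to two modes: in reset mode, record the current rendering as the baseline; afterwards, compare each new rendering against it. The Python sketch below models that flow, plus the checksum-then-tolerance check Ojan describes for hash mismatches. Everything here is a hypothetical stand-in: pixels are raw bytes, the checksum is an MD5 from `hashlib`, and `percent_different` is a simplistic byte-wise measure rather than the real image-diff algorithm.

```python
from __future__ import annotations

import hashlib


def checksum(pixels: bytes) -> str:
    """Stand-in for a stored pixel checksum: an MD5 of the raw pixel bytes."""
    return hashlib.md5(pixels).hexdigest()


def percent_different(expected: bytes, actual: bytes) -> float:
    """Simplistic diff: percentage of bytes that differ (100 if sizes differ)."""
    if len(expected) != len(actual):
        return 100.0
    if not expected:
        return 0.0
    changed = sum(1 for a, b in zip(expected, actual) if a != b)
    return 100.0 * changed / len(expected)


def check_or_reset(test: str, actual: bytes, baselines: dict[str, bytes],
                   reset: bool = False, tolerance: float = 0.0) -> str:
    """Rebaseline in reset mode; otherwise compare against the stored baseline."""
    if reset:
        baselines[test] = actual  # accept current output as "hypothetically correct"
        return "rebaselined"
    expected = baselines.get(test)
    if expected is None:
        return "missing"
    if checksum(expected) == checksum(actual):
        return "pass"  # cheap hash comparison first
    if percent_different(expected, actual) <= tolerance:
        return "hash mismatch within tolerance"
    return "IMAGE failure"
```

After one pass with `reset=True`, any later rendering change beyond the tolerance shows up as an `IMAGE failure`, which is how plan 3 catches regressions moving forward without first fixing the existing failures.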


Jeremy Orlow

5:52 p.m.

Plan 3 seems like the best (and simplest) one until the infrastructure for the others (and/or a champion for fixing currently failing tests) is available.

What would it take to go with plan 3? I guess someone needs to rebaseline everything that's currently failing, check them in, and then someone (like bdash?) needs to flip a switch on the bots...? Did I miss anything?

Are there instructions on how to do the rebaselining anywhere? I've only ever created pixel baselines for Chromium before (where we have a pretty neat tool that pretty much does it for you).

J


Jeremy Orlow

11 Jan

5:06 p.m.

On Fri, Jan 8, 2010 at 9:52 AM, Jeremy Orlow <jorlow@chromium.org> wrote:

Plan 3 seems like the best (and simplest) one until the infrastructure for the others (and/or a champion for fixing currently failing tests) is available.

What would it take to go with plan 3? I guess someone needs to rebaseline everything that's currently failing, check them in, and then someone (like bdash?) needs to flip a switch on the bots...? Did I miss anything?

Are there instructions on how to do the rebaselining anywhere? I've only ever created pixel baselines for Chromium before (where we have a pretty neat tool that pretty much does it for you).

Does anyone know? I'm happy to do the rebaselining if someone can tell me how and we agree to turn pixel tests on for the bots.


Dimitri Glazkov

5:13 p.m.

It's basically just run-webkit-tests --reset-results --pixel-tests. No magic :)

See run-webkit-tests --help for more info.

BTW, Victor is working to port the rebaselining tool to build.webkit.org. You may want to check with him -- maybe he's close to finishing the patch.

:DG<


Jeremy Orlow

5:17 p.m.

Wow, much easier than I expected. :-)

OK, then what about buy-in on this approach? I'll even file bugs on everything I rebaseline so we can track getting things back to a correct state and/or verifying that the new baselines are correct.

J

On Mon, Jan 11, 2010 at 9:13 AM, Dimitri Glazkov <dglazkov@chromium.org> wrote:

...

It's basically just run-webkit-tests --reset-results --pixel-tests. No magic :)

See run-webkit-tests --help for more info.

BTW, Victor is working to port the rebaselining tool to build.webkit.org. You may want to check with him -- maybe he's close to finishing the patch.

:DG<

On Mon, Jan 11, 2010 at 9:06 AM, Jeremy Orlow <jorlow@chromium.org> wrote:

...
On Fri, Jan 8, 2010 at 9:52 AM, Jeremy Orlow <jorlow@chromium.org> wrote:

...
Plan 3 seems like the best (and simplest) one until the infrastructure for the others (and/or a champion for fixing currently failing tests) is available. What would it take to go with plan 3? I guess someone needs to rebaseline everything that's currently failing, check them in, and then someone (like bdash?) needs to flip a switch on the bots...? Did I miss anything? Are there instructions on how to do the rebaselining anywhere? I've only ever created pixel baselines for Chromium before (where we have a pretty neat tool that pretty much does it for you).

Does anyone know? I'm happy to do the rebaselining if someone can tell me how and we agree to enable pixel tests on the bots.

...
On Fri, Jan 8, 2010 at 9:23 AM, Pam Greene <pam@chromium.org> wrote:

...
And one very quick, short-term solution: 3. Generate new pixel results to match the current behavior, and check them in as hypothetically correct. And of course if someone notices an existing problem and fixes it, they check in corrected images then. It doesn't help find current problems, but those are being missed now anyway. It does let the tests be run again approximately immediately, even faster than waiting for test expectations functionality, so we can catch regressions moving forward. - Pam

On Thu, Jan 7, 2010 at 5:01 PM, Ojan Vafai <ojan@chromium.org> wrote:

...
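Pam's plan 3 and Dimitri's command could be paired with the bug filing Jeremy offers: snapshot the pixel baselines, run the reset, snapshot again, and list what changed. This is a sketch under stated assumptions: it only builds the argument list for the command Dimitri quotes (assuming run-webkit-tests is on PATH and accepts optional test directories such as "svg"), and it assumes pixel baselines are files ending in -expected.png somewhere under LayoutTests. The helper names are hypothetical; MD5 is used purely as a change detector.

```python
import hashlib
import os
import shlex


def rebaseline_command(test_paths=()):
    """Argument list for the reset Dimitri describes.

    Assumption: run-webkit-tests is on PATH; optional test paths
    (e.g. "svg") restrict the run to a subset of LayoutTests.
    """
    return ["run-webkit-tests", "--reset-results", "--pixel-tests", *test_paths]


def snapshot_pixel_baselines(layout_tests_root):
    """Map each *-expected.png under the root to an MD5 of its bytes."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(layout_tests_root):
        for name in filenames:
            if name.endswith("-expected.png"):
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.md5(f.read()).hexdigest()
                digests[os.path.relpath(path, layout_tests_root)] = digest
    return digests


def rebaselined(before, after):
    """Baselines added or changed between two snapshots (candidates for bugs)."""
    return sorted(path for path, digest in after.items() if before.get(path) != digest)


# Print the command so it can be run by hand between the two snapshots.
print(shlex.join(rebaseline_command()))
```

The intended workflow is: take `before = snapshot_pixel_baselines("LayoutTests")`, run the printed command, take `after` the same way, then file one bug per entry in `rebaselined(before, after)` so the new images can later be verified.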

Ojan Vafai

18 Mar

8:57 p.m.

The thing I find most difficult about not having pixel bots is that, if I make a change that changes pixel results, I need to actually build that change on every platform to get the new pixel results.

Could we put up pixel bots on a separate waterfall? It's a waterfall we don't expect to keep green all the time. This has a few advantages over the current state of the world:

1. When making cross-platform changes, it's easy to grab pixel results off the bots.
2. When making changes that affect pixel tests, it's easier to see which pixel failures are regressions caused by my patch.

I think these two would greatly help in stemming the tide of pixel test regressions. Does that seem possible/reasonable?

Ojan

On Mon, Jan 11, 2010 at 9:17 AM, Jeremy Orlow <jorlow@chromium.org> wrote:

...

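Ojan's second point, seeing which pixel failures a patch introduced, reduces to a set comparison once a separate pixel waterfall publishes its failing tests. A minimal sketch, assuming the bot was built at the same base revision and on the same platform as the local run, and that both sides are available as plain lists of test paths (how those lists are fetched is not specified in the thread). The function name `compare_pixel_failures` is hypothetical. Flaky tests will still show up on one side or the other, so the "new" bucket is a list of likely regressions, not proof.

```python
def compare_pixel_failures(bot_failures, local_failures):
    """Split pixel failures into buckets relative to the pixel bot.

    bot_failures:   tests failing pixel comparison on the pixel waterfall
                    at the base revision (assumed same platform).
    local_failures: tests failing pixel comparison locally with the patch.
    """
    bot, local = set(bot_failures), set(local_failures)
    return {
        "new": sorted(local - bot),          # likely regressions from the patch
        "preexisting": sorted(local & bot),  # already failing on the bot
        "fixed": sorted(bot - local),        # patch may have fixed these
    }
```

For example, if the bot reports svg/a.svg and svg/b.svg failing and the local run reports svg/b.svg and svg/c.svg, only svg/c.svg lands in "new".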

Dirk Schulze

7 Jan

6:27 p.m.

It would be great to have pixel tests back on a bot. And it would be great if the commit queue ran them too, especially for patches from non-committers.

-Dirk

On Thursday, 07.01.2010, at 10:19 -0800, Dimitri Glazkov wrote:

...

Are we planning to run pixel tests on the build bots? What's the general opinion here?

We're running them over at Chromium and it seems like a really good idea. Case in point:

Change http://trac.webkit.org/changeset/52900 broke a bunch of layout tests, all pixel results, and as such didn't register on the waterfall.

I rolled out the change for now.

:DG<

Nikolas Zimmermann

8 Jan

12:05 a.m.

On 07.01.2010 at 19:19, Dimitri Glazkov wrote:

...

Are we planning to run pixel tests on the build bots? What's the general opinion here?

We're running them over at Chromium and it seems like a really good idea. Case in point:

Change http://trac.webkit.org/changeset/52900 broke a bunch of layout tests, all pixel results, and as such didn't register on the waterfall.

I rolled out the change for now.

I'd also love to see pixel tests again; otherwise we have to rely on, e.g., Dirk and me running SVG pixel tests on a regular basis to find these regressions. Just checking the -expected.txt files is not sufficient for SVG. Enabling pixel tests for all layout tests will be a lot of work, though, as Darin already pointed out. How about we start with only the svg/ pixel tests? Getting SVG pixel tests working across the ports would be a huge leap forward.

Cheers,
Niko
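Niko's incremental proposal could be expressed as a gating rule: text failures always count, but image failures only count for tests under directories where pixel results have been cleaned up, starting with svg/. This is a sketch under assumptions: test paths are relative to LayoutTests, results are reported as sets of failure types ("IMAGE", "TEXT"), and `PIXEL_GATED_DIRECTORIES`, `pixel_results_gate_bot` and `failing_for_bot` are hypothetical names, not existing build.webkit.org configuration.

```python
# Hypothetical: directories whose pixel results are allowed to turn the bot red.
PIXEL_GATED_DIRECTORIES = ("svg/",)


def pixel_results_gate_bot(test_path, gated_dirs=PIXEL_GATED_DIRECTORIES):
    """True if an image mismatch in this test should count as a failure."""
    return any(test_path.startswith(d) for d in gated_dirs)


def failing_for_bot(results, gated_dirs=PIXEL_GATED_DIRECTORIES):
    """Tests that should turn the bot red.

    results maps test path -> set of failure types seen in this run.
    Text failures always count; image failures count only in gated dirs.
    """
    failing = []
    for test, kinds in sorted(results.items()):
        image_counts = "IMAGE" in kinds and pixel_results_gate_bot(test, gated_dirs)
        if "TEXT" in kinds or image_counts:
            failing.append(test)
    return failing
```

Extending coverage later would then mean adding directories to the tuple as their pixel baselines are brought up to date on each port.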


participants (10)

Darin Adler
Darin Fisher
Dimitri Glazkov
Dirk Pranke
Dirk Schulze
Eric Seidel
Jeremy Orlow
Nikolas Zimmermann
Ojan Vafai
Pam Greene