Problems porting layout tests to linux

Jean-Charles VERDIE

31 Aug 2006

3:39 p.m.

Hi,

My team is currently porting the layout tests to Linux, to ease our future work on making WebKit available for Linux.

We have a big issue: FreeType does not render text the same way OS X does, so the expected results do not match. We tried to hack FreeType to make this work, but no obvious solution emerged. Today we started a thread with the FreeType project to see whether they have a solution.

Here is an example of the diff we get:

- RenderTable {TABLE} at (0,18) size 88x52 [bgcolor=#FFA500] [border: (1px outset #808080)]
- RenderTableSection {TBODY} at (1,1) size 86x50
- RenderTableRow {TR} at (0,2) size 86x22
- RenderTableCell {TD} at (2,14) size 24x22 [border: (1px inset #808080)] [r=0 c=0 rs=2 cs=1]
- RenderText {#text} at (2,2) size 20x18
- text run at (2,2) width 20: "X1"
- RenderTableCell {TD} at (28,2) size 27x22 [border: (1px inset #808080)] [r=0 c=1 rs=1 cs=1]
- RenderText {#text} at (2,2) size 20x18
- text run at (2,2) width 20: "X2"
- RenderTableRow {TR} at (0,26) size 86x22
- RenderTableCell {TD} at (28,26) size 27x22 [border: (1px inset #808080)] [r=1 c=1 rs=2 cs=1]
+ RenderTable {TABLE} at (0,18) size 87x52 [bgcolor=#FFA500] [border: (1px outset #808080)]
+ RenderTableSection {TBODY} at (1,1) size 85x50
+ RenderTableRow {TR} at (0,2) size 85x22
+ RenderTableCell {TD} at (2,14) size 23x22 [border: (1px inset #808080)] [r=0 c=0 rs=2 cs=1]
+ RenderText {#text} at (2,2) size 19x18
+ text run at (2,2) width 19: "X1"
+ RenderTableCell {TD} at (27,2) size 27x22 [border: (1px inset #808080)] [r=0 c=1 rs=1 cs=1]
+ RenderText {#text} at (2,2) size 19x18
+ text run at (2,2) width 19: "X2"
+ RenderTableRow {TR} at (0,26) size 85x22
+ RenderTableCell {TD} at (27,26) size 27x22 [border: (1px inset #808080)] [r=1 c=1 rs=2 cs=1]

One solution we can imagine is to give up on fixing the rendering difference and instead generate Linux-based -expected.txt files, and to modify DumpRenderTree to choose either the OS X or the Linux expected files, depending on the platform we are testing on.

This solution has a major drawback: it would force the community to maintain two versions of the expected files for every single test.

Before starting to work in this direction, I'd appreciate some feedback on this solution and, with luck, other ideas on how to remove this roadblock.

Best regards

-- Jean-Charles Verdié Origyn Web Browser for Embedded Systems Team CTO
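A minimal sketch of the per-platform selection proposed above, in Python rather than in DumpRenderTree itself. The `-expected-<tag>.txt` naming, the tag names and both function names are assumptions made for illustration; the thread does not fix a naming scheme. The lookup falls back to the shared file, so only tests whose results really differ would need a second file.

```python
import sys


def platform_tag(sys_platform: str = sys.platform) -> str:
    """Map Python's sys.platform to a short tag (hypothetical tag names)."""
    if sys_platform.startswith("linux"):
        return "linux"
    if sys_platform == "darwin":
        return "osx"
    return "generic"


def expected_name(test: str, tag: str, available: set[str]) -> str:
    """Pick the expected-results file for a test.

    `available` is the set of expected files that exist. A platform-specific
    file wins; otherwise the shared -expected.txt file is used.
    """
    stem = test.rsplit(".", 1)[0]  # "tables/t1.html" -> "tables/t1"
    specific = f"{stem}-expected-{tag}.txt"
    shared = f"{stem}-expected.txt"
    return specific if specific in available else shared
```

With this fallback, a test that renders identically everywhere keeps a single expected file, which softens the maintenance drawback described above without removing it.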


Krzysztof Kowalczyk

31 Aug

8:08 p.m.

On 8/31/06, Jean-Charles VERDIE <jcverdie@origyn.fr> wrote:

...

We imagine a solution which is to drop our goal to fix the problem, and instead generate "linux"-based -expected.txt files and to modify dump render tree to either chose osx or linux -based expected files, depending on the platform we are testing on.

This solution has a major drawback, which is that it will force the community to maintain two version of expected files for every single test.

Before starting to work in this direction, I'd appreciate some feedback on the feeling about this solution, and may be, fortunately, others ideas on how to remove this roadblock.

It would probably be an ugly hack, but looking at the diff, all the differences are in sizes and fall within 1-2 pixels. How about a fuzz factor (either as a percentage of the original size or in pixels), accepting failures for sizes originating from text rendering if they fall within the fuzz factor? The fuzz factor would best be determined empirically (i.e. by comparing current Linux vs. Mac differences and choosing a factor that makes them pass).

The thinking is that a major breakage would still be detected as falling outside the fuzz factor.

-- kjk
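A rough sketch of what such a fuzz comparison could look like for a single line of a render tree dump, assuming the line format shown in the diff earlier in the thread. The regular expression and the function name `matches_with_fuzz` are illustrative assumptions, not part of any existing tool. Only positions, sizes and text-run widths are compared loosely; everything else, such as colors or row and column indices, must match exactly.

```python
import re

# Geometry tokens as they appear in the dump: "at (x,y)", optionally followed
# by " size WxH", and the "width N" of a text run.
GEOMETRY = re.compile(r"at \((-?\d+),(-?\d+)\)(?: size (\d+)x(\d+))?|width (-?\d+)")


def _mask(match: re.Match) -> str:
    """Replace every number inside a geometry token with '#'."""
    return re.sub(r"-?\d+", "#", match.group(0))


def _numbers(line: str) -> list[int]:
    """Collect the geometry numbers of a line, in order of appearance."""
    return [int(g) for m in GEOMETRY.finditer(line) for g in m.groups() if g is not None]


def matches_with_fuzz(expected: str, actual: str, fuzz: int = 2) -> bool:
    """True if the lines differ only in geometry, by at most `fuzz` pixels."""
    # The non-geometry part of the line has to be identical.
    if GEOMETRY.sub(_mask, expected) != GEOMETRY.sub(_mask, actual):
        return False
    pairs = zip(_numbers(expected), _numbers(actual))
    return all(abs(e - a) <= fuzz for e, a in pairs)
```

The value of `fuzz` would be chosen empirically, as suggested above, from the current Linux vs. Mac differences.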

David Hyatt

8:18 p.m.

If you look at DumpRenderTree on Win32, we just made the results append -win to the file. Unfortunately we can't really generate good results for Win32 yet, since someone needs to rewrite DumpRenderTree to be more of a windowed app (like Spinneret).

dave

On Aug 31, 2006, at 1:08 PM, Krzysztof Kowalczyk wrote:

...


Darin Adler

11:51 p.m.

To improve the situation, I'd like to see some of the following happen:

1) Reorganize the tests so that tests that should work across all platforms are separate from ones that might need different results per platform. (Perhaps we could rationalize the organization and naming of tests in some other ways as well.)

2) Change more tests to use dumpAsText as a way of making them platform-independent.

3) Come up with some techniques to make tests independent of font widths to make them platform-independent; for example, perhaps we can create a font with widths that are consistent on all platforms and use it for most tests.

4) Consider alternate dumping formats that would include what's relevant to check if a test succeeded that don't dump the entire layout and position of each element -- something in between "dump as text" and "dump render tree".

Also, I think the DumpRenderTree tool is going to need a rename eventually. That was a good name for the original tool back when I first wrote it and it was only about render trees, but now it's more like "test engine" or something along those lines. Similarly, I think the LayoutTests directory might need a rename for similar reasons.

-- Darin
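One possible reading of item 4, sketched as a post-processing step rather than a new dump format: keep the structure of the render tree dump but drop positions, sizes and text-run widths. The regular expression and the name `strip_geometry` are assumptions for illustration, based only on the dump lines quoted at the start of the thread.

```python
import re

# " at (x,y)", optionally followed by " size WxH", and " width N" on text runs.
GEOMETRY = re.compile(r" at \(-?\d+,-?\d+\)(?: size \d+x\d+)?| width \d+")


def strip_geometry(dump: str) -> str:
    """Return the dump with all geometry removed, one output line per input line."""
    return "\n".join(GEOMETRY.sub("", line) for line in dump.splitlines())
```

Applied to the OS X and Linux dumps from the first message, both reduce to the same text, for example `text run: "X1"`. The trade-off is that such a dump no longer detects layout regressions at all, so it would only suit tests that check structure rather than geometry.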

Jean-Charles VERDIE

1 Sep

8:38 a.m.

Hi Darin & all,

On 9/1/06, Darin Adler <darin@apple.com> wrote:

...

To improve the situation, I'd like to see some of the following happen:

1) Reorganize the tests so that tests that should work across all platforms are separate from ones that might need different results per platform. (Perhaps we could rationalize the organization and naming of tests in some other ways as well.)

2) Change more tests to use dumpAsText as a way of making them platform-independent.

3) Come up with some techniques to make tests independent of font widths to make them platform-independent; for example, perhaps we can create a font with widths that are consistent on all platforms and use it for most tests.

It would surely be the best plan, but I'm not sure it will be possible in a reasonable amount of time. Maybe the fastest way to make the tests more cross-platform is to split them into three categories:

1- those which are already cross-platform.

2- those which are not cross-platform, but which do not need a font, e.g. comparing two cells of a table. We could replace the text with pictures of a specific size. The test would no longer be self-documenting, but we can solve this with a two-window solution: one window with the explanation, and one with the actual test.

3.1- those which are not cross-platform, but which need a font. Some can be addressed by switching from the render tree dump to dumpAsText.

3.2- ...but other tests in this third category genuinely need a font, e.g. CSS manipulation or font replacement. For this last part, I can't imagine a solution right now. Maybe some can be addressed by your proposal #4.

But one question is: how many of these tests remain if we address 1- to 3.1? IMHO here is the list:

- DOM: already cross-platform
- plugins / svg / http / editing: we have not started looking at them
- Tables, fast, traversal: all should be replaceable by images
- CSS 1, 2.1: this is where our problems are :) There are 362 tests in these folders, but maybe only 30% of them belong to the problematic category, so roughly a hundred tests would need to be rethought...

...

4) Consider alternate dumping formats that would include what's relevant to check if a test succeeded that don't dump the entire layout and position of each element -- something in between "dump as text" and "dump render tree".

Yes!

...

Also, I think the DumpRenderTree tool is going to need a rename eventually. That was a good name for the original tool back when I first wrote it and it was only about render trees, but now it's more like "test engine" or something along those lines. Similarly, I think the LayoutTests directory might need a rename for similar reasons.

Very good idea, for sure.

Best regards,

-- Jean-Charles Verdié Origyn Web Browser for Embedded Systems Team CTO

Darin Adler

7:42 p.m.

On Sep 1, 2006, at 1:38 AM, Jean-Charles VERDIE wrote:

...

It would surely be the best plan, but I'm not sure that it will be possible (in a measurable human-based time). May be the fastest way of having the tests more cross-platform is to split them into three categories: 1- those which already are cross-platform. 2- those which are not cross-platform, but which do not need a font. E.g. comparing 2 cells of a table. We could replace the texts by pictures with a specific size. The test would not be self-documented any more, but we can solve this by a 2-windows solution, one with the explanation, and one with the actual test. 3.1- those which are not cross-platform, but which need a font. Some can be addressed by replacing DumpRenderTree by DumpAsText. 3.2 - ...But other tests in this third will hardly need a font, e.g. css manipulation or font replacement.

For this latest part, I don't imagine a solution right now. May be some can be addressed by your proposal #4 But one question is: how much of these tests remain? if we address 1- to 3.1 ? IMHO here is the list : - DOM: already x-platform - plugins / svg / http / editing: we've not started looking at them - Tables, fast, traversal: all should be replaceable by images - CSS 1, 2.1: here is the list of our problems :) There are 362 tests in these folders, but may be only 30% of them are belonging to the problematic category so something like a big hundred of tests would need to be re-thinked...

Sounds good. If you would like to start doing some work in this area, please let us know about your specific ideas.

Maybe for your (2) above we could make tests where certain parts of the page wouldn't dump. That way the test could be self-documenting but the part that dumps wouldn't be affected by the text.

-- Darin

Jean-Charles VERDIE

2 Sep

7:52 a.m.

Darin,

We'll schedule a task to identify, in the CSS 1/2.1 folders, which tests should be replaced by picture-based ones and which cannot. For the latter, we'll try to come up with workarounds to propose. Do you have a suitable place to suggest for recording these thoughts? I was not sure whether Bugzilla or the wiki would be better...

For the other test folders, I think that if the proposed plan fits the WebKit community's interest, we should file bugs about replacing the Tables, Fast and Traversal tests with image-based ones, and also bugs about exploring the parts we haven't looked at yet (plugin, svg, http, editing)...

Let me know what you think.

Regards,
Jean-Charles

On 9/1/06, Darin Adler <darin@apple.com> wrote:

...


-- Jean-Charles Verdié Origyn Web Browser for Embedded Systems Team CTO

Darin Adler

5 Sep

6:33 p.m.

On Sep 2, 2006, at 12:52 AM, Jean-Charles VERDIE wrote:

...

We'll schedule a task for identifying in CSS 1/2.1 folders which tests should be replaced by picture-based ones, and which could not. For the latter, we'll try to think about workarounds to propose.

Sounds good. I think an earlier first step should be to reorganize layout tests so that the platform-independent ones are in a different directory than the dependent ones.

...

Have you an accurate placeholder to suggest for these thoughts? I was not sure whether bugzilla or wiki would be better....

Either or both seems fine -- I don't feel strongly about this.

...

For others tests folders, I think that if the proposed plan fits WebKit community's interest, we should file bugs about replacing Tables, Fast and Traversal by image-based tests, and also bugs about exploring the parts we haven't done yet (plugin, svg, http, editing)...

Bugs are a great way to track specific tasks to be done. But I think it's even better if there's a way that the layout tests source tree self-documents which tests are expected to have different results per platform and which are not.

-- Darin

Nikolas Zimmermann

1 Sep

10:52 a.m.

Hi Darin & Dave & Krzysztof & Jean-Charles,

During the Qt/KDE platform porting I ran into basically the same problem, though I also dislike the idea of a fuzz factor.

...

1) Reorganize the tests so that tests that should work across all platforms are separate from ones that might need different results per platform. (Perhaps we could rationalize the organization and naming of tests in some other ways as well.)

I still think the best solution is to have -expected-qt.txt, -expected-gdk.txt, -expected-osx.txt and -expected-win.txt files, so that regressions can be detected pixel-wise and a 1px change does make a difference.

...

Also, I think the DumpRenderTree tool is going to need a rename eventually. That was a good name for the original tool back when I first wrote it and it was only about render trees, but now it's more like "test engine" or something along those lines. Similarly, I think the LayoutTests directory might need a rename for similar reasons.

Something like RegressionTester may be better, indeed.

My suggestion is to create subdirectories for the specific platforms, where the -expected-<PLATFORM>.txt files, image diffs, etc. will go. Maybe it would be even wiser to adjust the "Get-WebKit-From-SVN" scripts to svn co only the specific subdirectory needed. For example, it doesn't make sense for you guys to get our -expected-gdk/qt.txt files, and we don't need the -win.txt files either.

Please let me know what you think.

Niko
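A small sketch of how a tree organized this way could also document itself, which is the property Darin asks for elsewhere in the thread. It assumes a hypothetical layout of `<platform>/<test>-expected-<platform>.txt` next to shared `<test>-expected.txt` files; neither that layout nor the function name comes from the thread.

```python
import re
from collections import defaultdict

# Hypothetical layout: "qt/tables/t1-expected-qt.txt" holds the Qt result of
# the test "tables/t1"; the backreference requires directory and suffix to agree.
PLATFORM_RESULT = re.compile(
    r"^(?P<platform>[^/]+)/(?P<test>.+)-expected-(?P=platform)\.txt$"
)


def platform_specific_tests(paths):
    """Map each test that has per-platform results to its sorted platform tags."""
    found = defaultdict(set)
    for path in paths:
        match = PLATFORM_RESULT.match(path)
        if match:
            found[match.group("test")].add(match.group("platform"))
    return {test: sorted(tags) for test, tags in found.items()}
```

Any test missing from the returned mapping is expected to produce the same result everywhere, so the list of platform-dependent tests falls out of the directory layout instead of being tracked separately.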

Jean-Charles VERDIE

8:12 a.m.

Hi Krzysztof,

We had this idea, but it brings a new problem. Let's imagine that we tolerate a 2 px error, and that for one specific test the rendered result is 1 px smaller than the reference. Everything is OK. But then I make some very bad change to the code, and tomorrow's build renders the same test 1 px bigger than the reference. It is still within tolerance, so the regression would not be detected...

Another (bigger) issue: the fuzz factor causes a kind of chain reaction. If my first block is 2 px bigger, the second block will begin 2 px after the expected position; if it is also 2 px bigger, the next one will begin 4 px after the expected position, and so on...

On 8/31/06, Krzysztof Kowalczyk <kkowalczyk@gmail.com> wrote:

...


-- Jean-Charles Verdié Origyn Web Browser for Embedded Systems Team CTO
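The two objections in this message can be checked with a few lines of arithmetic. This is only an illustration with made-up widths; the function names are hypothetical and blocks are assumed to be laid out side by side with no spacing.

```python
def passes(reference: int, actual: int, fuzz: int = 2) -> bool:
    """A measurement passes if it is within `fuzz` pixels of the reference."""
    return abs(actual - reference) <= fuzz


def left_edges(widths):
    """Left edge of each block when the blocks are placed one after another."""
    edges, x = [], 0
    for width in widths:
        edges.append(x)
        x += width
    return edges


def drift(reference_widths, extra):
    """Offset of each block's left edge when every block is `extra` px wider."""
    actual_widths = [w + extra for w in reference_widths]
    return [a - r for r, a in zip(left_edges(reference_widths), left_edges(actual_widths))]
```

With a reference width of 20 and a fuzz of 2, both 19 and 21 pass, so a 2 px change between two builds goes unnoticed. For four blocks of width 20 that are each 2 px too wide, `drift([20, 20, 20, 20], 2)` gives `[0, 2, 4, 6]`: every width is within tolerance, yet the third and fourth positions are not.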
