Hi,

I finally got the layout tests for the Qt build running a few days ago. This does, however, bring up two basic problems:

* run-webkit-tests generated results for new tests on the fly

This gives problems both for the user and the build bots. As a user, you don't see that there is a new test checked in for which you don't have a result. This is especially a problem for the Qt build currently, but will also hit the Mac build once we start checking in layout tests. A test that doesn't have a result on a platform will most probably also need some manual inspection to ensure it works correctly on that platform.

The build bots run into trouble, as they will generate/store the result for the new test locally. Once someone commits the result, you'll end up with svn merge issues, as the files the build bot wants to check out already exist on disk.

I've fixed this issue with r18976. run-webkit-tests no longer generates new results by default. You'll have to pass the --new-tests flag to force it to do so.

* All test results are stored together with the LayoutTests.

This is OK for text-only tests (as the results can be shared), except that run-webkit-tests currently doesn't know whether a test is text-only. Once we submit the Qt test results (including pixel tests), you'll get three more files per test case. In the long term we might get results from even more platforms, completely cluttering the directories.

Checking out all the test results already takes quite some time (and people working on the Qt port don't really need the Mac results and vice versa). Adding more results will at some point make this prohibitive (the LayoutTests directory is at around 600MB currently, and that number gets multiplied with every platform that has tests running).

One way of solving this is to move all test results into a separate results directory (or even a separate repository). In there one could have subdirectories shared (for the text-only tests), mac and qt. We could also add some sort of script to use instead of 'svn update' that will not check out the results you don't need.

Opinions?

Cheers,
Lars
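A minimal sketch of the kind of selective-update helper Lars proposes, assuming the results had been split out into per-platform directories. The repository URL and the LayoutTestResults/shared/mac/qt names are taken from the proposal in this thread, not an existing layout, and the script only relies on plain `svn checkout URL/subdir` and `svn update`, which Subversion supports for partial working copies.

```python
#!/usr/bin/env python
# update-tests.py -- hypothetical replacement for a plain 'svn update' that
# only fetches the result directories a developer actually needs.
# Directory names below follow the layout proposed in this thread.
import os
import subprocess
import sys

REPO = "https://svn.webkit.org/repository/webkit/trunk"  # assumed base URL

def sync(subdir):
    """Check out 'subdir' if it is missing, otherwise update it in place."""
    if os.path.isdir(os.path.join(subdir, ".svn")):
        subprocess.check_call(["svn", "update", subdir])
    else:
        parent = os.path.dirname(subdir)
        if parent and not os.path.isdir(parent):
            os.makedirs(parent)
        subprocess.check_call(["svn", "checkout", REPO + "/" + subdir, subdir])

if __name__ == "__main__":
    # e.g. "python update-tests.py qt" pulls LayoutTests plus the shared and
    # Qt results, but skips the Mac results entirely.
    platforms = sys.argv[1:] or ["qt"]
    wanted = ["LayoutTests", "LayoutTestResults/shared"]
    wanted += ["LayoutTestResults/%s" % p for p in platforms]
    for directory in wanted:
        sync(directory)
```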
On Jan 19, 2007, at 4:23 AM, Lars Knoll wrote:
* run-webkit-tests generated results for new tests on the fly
[...]
I've fixed this issue with r18976. run-webkit-tests no longer generates new results by default. You'll have to pass the --new-tests flag to force it to do so.
What is the behavior on the buildbot if a test is committed without results after applying this patch? It SHOULD fail! Currently, the buildbot will generate new results (that automatically pass) with no one the wiser.
* All test results are stored together with the LayoutTests.
Thinking out loud, I like the idea of having separate results trees, but I think it would be difficult to keep them in sync by putting them in another repository, especially when committing. It will be challenging enough to generate results for all the trees when a new test is created or an existing test is fixed.

Some example directory structures (at the same level as LayoutTests):

LayoutTestsResultsMac
LayoutTestsResultsQt

LayoutTestsTextResults
LayoutTestsImageResults/mac
LayoutTestsImageResults/qt

LayoutTestsResults/text
LayoutTestsResults/image/mac
LayoutTestsResults/image/qt

Will we need some kind of a generate-test-results-on-all-ports bot? We can't expect every developer to have "one of each" kind of system. Or must we expect a developer on each port to review new tests and create updated test results on a per-port basis?

Would test results for the Qt port on the Mac be able to use the test results for the Qt port on Linux (specifically, the image results)? I could see subtle differences occurring between the same "graphics port" on different operating systems.

Does Subversion have a way to do something like "check out this entire tree, except for this directory" and then honor that choice when updating as well? Or would a custom update script be needed, or a tool like svk?

It's too bad there isn't a way to store a set of base results, then only store "expected differences" for each port. That would cut down on the amount of space required by each new port's test results, but it might be tricky to do with image results, and a text diff might be as big or bigger than just new results.

Are there any other open source projects with multiple ports that have already solved this problem?

Sorry...more questions than answers! :)

Dave
On Friday 19 January 2007 14:31, David D. Kilzer wrote:
On Jan 19, 2007, at 4:23 AM, Lars Knoll wrote:
* run-webkit-tests generated results for new tests on the fly
[...]
I've fixed this issue with r18976. run-webkit-tests no longer generates new results by default. You'll have to pass the --new-tests flag to force it to do so.
What is the behavior on the buildbot if a test is committed without results after applying this patch? It SHOULD fail! Currently, the buildbot will generate new results (that automatically pass) with no one the wiser.
That's what the change does. It doesn't generate new results, and will mark the test as "new" (not "failed"). bdash said he can fix build.webkit.org to show these explicitly as new tests. Marking them as failures is a bit too much, as you'd get 'regressions' on the Qt build as soon as you added a test containing only the -expected files for the Mac and vice versa.
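To illustrate the policy described here: a test with no expected file is reported as "new" rather than "failed", and results are only generated when the opt-in flag is given. run-webkit-tests itself is a Perl script; this is only a Python sketch of the logic, with assumed file naming, not the actual r18976 change.

```python
import os

def classify(test_path, actual_output, new_tests=False):
    """Rough sketch of the new-test policy (file naming assumed)."""
    expected_path = os.path.splitext(test_path)[0] + "-expected.txt"
    if not os.path.exists(expected_path):
        if new_tests:
            # Only with the opt-in flag are results written for a new test,
            # so buildbots never silently generate locally-passing baselines.
            with open(expected_path, "w") as f:
                f.write(actual_output)
            return "new (results generated)"
        return "new"   # surfaced on build.webkit.org as new, not as a failure
    with open(expected_path) as f:
        expected = f.read()
    return "pass" if actual_output == expected else "fail"
```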
* All test results are stored together with the LayoutTests.
Thinking out loud, I like the idea of having separate results trees, but I think it would be difficult to keep them in sync by putting them in another repository, especially when committing. It will be challenging enough to generate results for all the trees when a new test is created or an existing test is fixed.
That's why I thought that we shouldn't mark tests without a result for the platform as failures, but as what they are: new tests. You'd see them on the buildbot, manually inspect the new test on your platform and submit the results if the test passes.
Some example directory structures (at the same level as LayoutTests):
LayoutTestsResultsMac LayoutTestsResultsQt
I don't think that's a good idea, as we'd clutter the top level directory with lots of these in the long term.
LayoutTestsTextResults LayoutTestsImageResults/mac LayoutTestsImageResults/qt
LayoutTestsResults/text LayoutTestsResults/image/mac LayoutTestsResults/image/qt
I'd prefer:

LayoutTestsResults/text
LayoutTestsResults/mac
LayoutTestsResults/qt

There are lots of results that are render tree dumps. These are platform-dependent, but not images.
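A sketch of how a harness might resolve expected results under this layout: look in the platform-specific directory first, then fall back to the shared text results. The directory names follow the proposal above and are assumptions, not an existing layout.

```python
import os

def expected_file_for(test_relpath, platform, results_root="LayoutTestsResults"):
    """Platform-specific result first (render tree dumps, pixel results),
    shared text-only result as the fallback. Names follow the proposal above."""
    base = os.path.splitext(test_relpath)[0] + "-expected.txt"
    for subdir in (platform, "text"):          # e.g. "qt" or "mac", then shared
        candidate = os.path.join(results_root, subdir, base)
        if os.path.exists(candidate):
            return candidate
    return None   # no result on any tree: report the test as "new"
```

For example, expected_file_for("fast/css/foo.html", "qt") would check LayoutTestsResults/qt/fast/css/foo-expected.txt before LayoutTestsResults/text/fast/css/foo-expected.txt.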
Will we need some kind of a generate-test-results-on-all-ports-bot? We can't expect every developer to have "one of each" kind of system. Or must we expect a developer on each port to review new tests and create updated test results on a per-port basis?
My idea was the last proposal. It's easiest to handle and verify. That's why I made sure you see new tests and why the results for new tests won't get created automatically.
Would test results with the Qt port on the Mac be able to use the test results with the Qt port on Linux (specifically, the image results)? I could see subtle differences occurring between the same "graphics port" on different operating systems.
Currently not. We're limiting our testing to Linux for now. Ideally it would be best to get 100% platform-independent test results, but that's more or less impossible. So it could very well happen that we'll at some point also have Qt-Mac results.
Does Subversion have a way to do something like "check out this entire tree, except for this directory" and then honor that commitment when updating as well? Or would a custom update script be needed, or a tool like svk?
Good question. Maybe someone with more svn knowledge than I have has an answer.
It's too bad there isn't a way to store a set of base results, then only store "expected differences" to each port. That would cut down on the amount of space required by each new port's test results, but it might be tricky to do with image results, and a text diff might be as big or bigger than just new results.
It would actually not be smaller. The only place where that works is for the text-only tests. The rendered page has slightly different coordinates and line breaks due to different font metrics, and unfortunately these differences show up as huge diffs if you try to diff our render tree against the one from the Mac. I did however add a hack (see the --strict option in run-webkit-tests) that tries to strip out all these things (coordinates, line breaks, etc.) from the render tree dump and then compares against the result on the Mac. This is a good test to see whether we have any bigger issues. Unfortunately it has two drawbacks: it doesn't work 100% reliably and can only be used for manual verification, and I would really like to have the positioning information in our render tree dumps as well.
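A rough illustration of the kind of stripping described here. The real --strict option lives in the Perl run-webkit-tests script and may behave differently; the line patterns below are assumptions about typical render tree dump output such as "RenderText {#text} at (0,0) size 53x18".

```python
import re

# Strip the metrics that vary with font differences, keeping only structure.
_METRICS = [
    re.compile(r" at \(-?\d+,-?\d+\)"),   # positions
    re.compile(r" size -?\d+x-?\d+"),     # box sizes
    re.compile(r" width -?\d+"),          # text run widths
]

def normalize_render_tree(dump):
    lines = []
    for line in dump.splitlines():
        for pattern in _METRICS:
            line = pattern.sub("", line)
        lines.append(line)
    return "\n".join(lines)

def roughly_equal(qt_dump, mac_dump):
    """Compare two dumps after stripping metrics -- useful for spotting big
    structural differences, not a substitute for real per-platform results."""
    return normalize_render_tree(qt_dump) == normalize_render_tree(mac_dump)
```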
Are there any other open source projects with multiple ports that have already solved this problem?
khtml in KDE had similar issues, due to different fonts being installed on different Linux machines. The solution was to override Qt's font system for testing purposes and have a very limited set of fonts that are rendered the same way on all platforms. We could probably do the same with WebKit (implement a hook to override WebKit's native font system with a platform-independent one). That way we could probably even get platform-independent image tests. There is however one drawback to this: you lose the ability to automatically test the text subsystem.
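The hook described here would live inside WebKit's C++ font code; purely as an illustration of the idea (every name below is hypothetical), a test-mode font chooser could collapse all requested families onto a small fixed set that renders identically everywhere.

```python
# Hypothetical sketch of the "limited font set" idea: in test mode, every
# requested family is mapped onto one of a few bundled fonts so that layout
# metrics stop depending on what happens to be installed on the machine.
FIXED_FAMILIES = {
    "serif": "TestSerif",       # placeholder names, not real fonts
    "sans-serif": "TestSans",
    "monospace": "TestMono",
}

def choose_test_font(requested_family, generic_fallback="sans-serif"):
    # Known generic families map directly; everything else collapses onto its
    # generic fallback, so "Helvetica", "Arial", ... all render identically.
    if requested_family in FIXED_FAMILIES:
        return FIXED_FAMILIES[requested_family]
    return FIXED_FAMILIES[generic_fallback]
```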
Sorry...more questions than answers! :)
I guess they help to move the discussion forward :)

Cheers,
Lars