[webkit-dev] layout tests

Fri Jan 19 06:08:05 PST 2007

On Friday 19 January 2007 14:31, David D. Kilzer wrote:
> On Jan 19, 2007, at 4:23 AM, Lars Knoll wrote:
> > * run-webkit-tests generated results for new tests on the fly
> >
> > [...]
> >
> > I've fixed this issue with r18976. run-webkit-tests no longer
> > generates new results by default. You'll have to pass the
> > --new-tests flag to it to force it to do so.
>
> What is the behavior on the buildbot if a test is committed without
> results after applying this patch?  It SHOULD fail!  Currently, the
> buildbot will generate new results (that automatically pass) with no
> one the wiser.

That's what the change does. It doesn't generate new results, and will mark 
the test as "new" (not "failed"). bdash said he can fix build.webkit.org to 
show these explicitly as new tests. Marking them as failures is a bit too 
much, as you'd get 'regressions' on the qt build as soon as you added a test 
containing only the -expected files for the Mac and vice versa.
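
The new behaviour can be sketched roughly as follows. This is an
illustration in Python, not the actual run-webkit-tests code; the function
name classify_test, the generate_new parameter (standing in for
--new-tests) and the "-expected.txt" file suffix are assumptions made for
the example.

```python
import os


def classify_test(test_path, actual_output, generate_new=False):
    """Return 'pass', 'fail' or 'new' for a single layout test.

    Hypothetical helper: the expected result is assumed to live next to
    the test as <name>-expected.txt.
    """
    expected_path = os.path.splitext(test_path)[0] + "-expected.txt"

    if not os.path.exists(expected_path):
        # No expected result for this platform yet: report the test as
        # new instead of failing it or silently creating a result.
        if generate_new:
            # Only write a result when explicitly asked to.
            with open(expected_path, "w") as f:
                f.write(actual_output)
        return "new"

    with open(expected_path) as f:
        expected = f.read()
    return "pass" if expected == actual_output else "fail"
```

A test without results is reported as "new" on every run until someone
inspects it and submits the expected file.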

> > * All test results are stored together with the LayoutTests.
>
> Thinking out loud, I like the idea of having separate results trees,
> but I think it would be difficult to keep them in sync by putting
> them in another repository, especially when committing.  It will be
> challenging enough to generate results for all the trees when a new
> test is created or an existing test is fixed.  

That's why I thought that we shouldn't mark tests without a result for the 
platform as failures, but as what they are: new tests. You'd see them on the 
buildbot, manually inspect the new test on your platform and submit the 
results if the test passes. 

> Some example directory 
> structures (at the same level as LayoutTests):
>
>    LayoutTestsResultsMac
>    LayoutTestsResultsQt

I don't think that's a good idea, as we'd clutter the top level directory with 
lots of these in the long term.
>
>    LayoutTestsTextResults
>    LayoutTestsImageResults/mac
>    LayoutTestsImageResults/qt
>
>    LayoutTestsResults/text
>    LayoutTestsResults/image/mac
>    LayoutTestsResults/image/qt

I'd prefer:

  LayoutTestsResults/text
  LayoutTestsResults/mac
  LayoutTestsResults/qt

There are lots of results that are RenderTreeDumps. These are platform 
dependent, but not images.
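
With such a layout, a test runner could look up the expected result like
this. A minimal sketch only: find_expected and its parameters are
hypothetical, the "-expected.txt" suffix is assumed, and the lookup order
(port directory first, then the shared text directory) is an assumption,
not something decided in this thread.

```python
import os


def find_expected(results_root, platform, test_rel_path):
    """Return the path of the expected result for a test, or None.

    results_root  -- e.g. the LayoutTestsResults directory
    platform      -- port subdirectory such as "mac" or "qt"
    test_rel_path -- test path relative to LayoutTests
    """
    expected_name = os.path.splitext(test_rel_path)[0] + "-expected.txt"
    # Prefer the port-specific result, fall back to the shared one.
    for subdir in (platform, "text"):
        candidate = os.path.join(results_root, subdir, expected_name)
        if os.path.exists(candidate):
            return candidate
    return None  # no result at all: the test is new for this port
```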

> Will we need some kind of a generate-test-results-on-all-ports-bot?
> We can't expect every developer to have "one of each" kind of
> system.  Or must we expect a developer on each port to review new
> tests and create updated test results on a per-port basis?

My idea was the last proposal. It's the easiest to handle and verify. That's
why I made sure you see new tests and why the results for new tests won't
get created automatically.

> Would test results with the Qt port on the Mac be able to use the
> test results with the Qt port on Linux (specifically, the image
> results)?  I could see subtle differences occurring between the same
> "graphics port" on different operating systems.

Not at the moment. We're currently limiting our testing to Linux. Ideally
it would be best to get 100% platform independent test results, but that's
more or less impossible. So it could very well happen that we'll at some
point also have Qt-Mac results.
>
> Does Subversion have a way to do something like "check out this
> entire tree, except for this directory" and then honor that
> commitment when updating as well?  Or would a custom update script be
> needed, or a tool like svk?

Good question. Maybe someone with more svn knowledge than I have has an 
answer.

> It's too bad there isn't a way to store a set of base results, then
> only store "expected differences" to each port.  That would cut down
> on the amount of space required by each new port's test results, but
> it might be tricky to do with image results, and a text diff might be
> as big or bigger than just new results.

It would actually not be smaller. The only place where it works is the
text-only tests. For everything else, the rendered page has slightly
different coordinates and line breaks due to different font metrics.
Unfortunately these differences show up as huge diffs if you compare our
RenderTree dump with the one from the Mac.
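
To illustrate why a per-port diff does not save space, the sketch below
counts the changed lines in a unified diff between two RenderTree dumps.
The two dumps are invented for the example (they only imitate the dump
format), and changed_line_count is a hypothetical helper.

```python
import difflib

# Invented sample dumps: same page, different font metrics.
MAC_DUMP = '''layer at (0,0) size 800x600
  RenderBlock {HTML} at (0,0) size 800x600
    RenderBody {BODY} at (8,8) size 784x584
      RenderBlock {P} at (0,0) size 784x36
        RenderText {#text} at (0,0) size 770x36
          text run at (0,0) width 770: "first part of a long paragraph"
          text run at (0,18) width 120: "and the rest"
'''

QT_DUMP = '''layer at (0,0) size 800x600
  RenderBlock {HTML} at (0,0) size 800x600
    RenderBody {BODY} at (8,8) size 784x584
      RenderBlock {P} at (0,0) size 784x34
        RenderText {#text} at (0,0) size 760x34
          text run at (0,0) width 760: "first part of a long"
          text run at (0,17) width 180: "paragraph and the rest"
'''


def changed_line_count(base_dump, port_dump):
    """Count the added and removed lines in a unified diff of two dumps."""
    diff = list(difflib.unified_diff(
        base_dump.splitlines(), port_dump.splitlines(),
        lineterm="", n=0))
    body = diff[2:]  # skip the '---' and '+++' file header lines
    return sum(1 for line in body if line.startswith(("+", "-")))


if __name__ == "__main__":
    print("lines in port dump:", len(QT_DUMP.splitlines()))
    print("changed lines in diff:", changed_line_count(MAC_DUMP, QT_DUMP))
```

In this made-up case four of the seven lines differ, and the diff has to
carry both the old and the new version of each of them.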

I did however add a hack (see the --strict option in run-webkit-tests) that
tries to strip out all these things (coordinates, line breaks etc.) from the
RenderTree dump and then compares it to the result on the Mac. This is a
good test to see whether we have any bigger issues. Unfortunately it has two
drawbacks: it doesn't work 100% reliably and can only be used for manual
verification, and I would really like to have the positioning information in
our RenderTree dumps as well.
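
The idea behind such a comparison can be sketched like this. This is not
the --strict code from run-webkit-tests, just a Python illustration under
assumptions: the regular expressions, the function names and the two sample
dumps are invented, and merging consecutive text runs is one simple way of
ignoring line breaks.

```python
import re

# Matches geometry such as " at (8,8) size 784x584".
_GEOMETRY = re.compile(r"\s+at \(-?\d+,-?\d+\)(?: size \d+x\d+)?")
# Matches a text run line and captures the quoted text.
_TEXT_RUN = re.compile(r'^\s*text run.*?: "(.*)"$')

# Invented sample dumps: same page, different font metrics.
MAC_DUMP = '''layer at (0,0) size 800x600
  RenderBlock {P} at (0,0) size 784x36
    RenderText {#text} at (0,0) size 770x36
      text run at (0,0) width 770: "first part of a long paragraph"
      text run at (0,18) width 120: "and the rest"
'''

QT_DUMP = '''layer at (0,0) size 800x600
  RenderBlock {P} at (0,0) size 784x34
    RenderText {#text} at (0,0) size 760x34
      text run at (0,0) width 760: "first part of a long"
      text run at (0,17) width 180: "paragraph and the rest"
'''


def normalize_dump(dump):
    """Drop coordinates and sizes, and merge consecutive text runs."""
    out = []
    pending = []  # text of the text runs seen since the last other line

    def flush():
        if pending:
            out.append('text: "' + " ".join(pending) + '"')
            pending.clear()

    for line in dump.splitlines():
        run = _TEXT_RUN.match(line)
        if run:
            pending.append(run.group(1).strip())
            continue
        flush()
        out.append(_GEOMETRY.sub("", line).rstrip())
    flush()
    return "\n".join(out)


def strict_equal(dump_a, dump_b):
    """Compare two dumps while ignoring geometry and line breaks."""
    return normalize_dump(dump_a) == normalize_dump(dump_b)
```

Because all positioning information is thrown away, a comparison like this
can only show structural differences between the two dumps.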

> Are there any other open source projects with multiple ports that
> have already solved this problem?

khtml in KDE had similar issues, due to the different fonts that are
installed on different Linux machines.

The solution was to override Qt's font system for testing purposes and have a 
very limited set of fonts that are rendered the same way on all platforms.

We could probably do the same with WebKit (implement a hook to override
WebKit's native font system with a platform independent one). That way we
could probably even get platform independent image tests. There is however
one drawback to this: you lose the ability to automatically test the text
subsystem.

> Sorry...more questions than answers!  :)

I guess they help to move the discussion forward :)

Cheers,
Lars