[webkit-dev] insanity of updating 4000+ baseline images due to font rendering change?

Thu Oct 20 08:16:37 PDT 2011

On Oct 20, 2011, at 1:04 AM, Ryosuke Niwa wrote:

> On Wed, Oct 19, 2011 at 2:04 PM, Elliot Poger <epoger at google.com> wrote:
> Here are the various approaches I can think of... what's the Hive-Mind-Approved approach?
> 1. Commit 4500 new baseline images for SnowLeopard
>    pro: known to work, will catch any regressions that come later
>    con: takes a long time to commit, chews up disk space and bandwidth for all developers, future minor changes may require yet another set of new baselines
> 2. Leave all SnowLeopard tests marked as "PASS FAIL" (or maybe mark them "SKIP") in test_expectations
>    pro: known to work, quick and easy, doesn't clog repo space and developer update bandwidth, future minor changes won't break any bots
>    con: will not catch any regressions that come later on SnowLeopard
> 3. Remove descriptive text from all these tests, so that text rendering is only evaluated in tests specifically for that purpose
>    pro: prevents this problem for future OS versions, should allow for lots more baseline images to be shared across platforms
>    con: a lot of work to replace all existing baseline images, must coordinate across community of Chromium/WebKit developers, tests will be more difficult to interpret without text
> 4. Figure out how our test pages can be rendered with a completely cross-platform pixel-equivalent font
>    pro: similar to above but tests keep their descriptive text
>    con: similar to above but more technically challenging
> 5. Augment our pixel-diff tools to allow for comparison masks (only pay attention to pixel diffs within this rectangle)
>    pro: existing baseline images can stay in place, and perhaps be shared with new OS versions and platforms
>    con: requires modification of pixel-diff tools, need to add comparison mask to each test definition
>
> I'd add another option: increase the tolerance level so that we ignore all these tiny gradient/font rendering differences. I don't think the benefit of being able to catch all regressions is worth the added maintenance cost.
> 
> But I'd argue that we should keep baselines for Snow Leopard with tolerance=0 and increase the tolerance level of Leopard since Snow Leopard is a newer platform and will probably be supported for a longer period of time than Leopard.
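
To make the tolerance and comparison-mask options above concrete, here is a minimal sketch of what such a pixel comparison could look like. It is not how the existing pixel-diff tools work; the function name, the per-channel tolerance, and the (left, top, right, bottom) mask rectangle are all assumptions made for illustration, and images are represented as plain nested lists of RGB tuples rather than decoded PNGs.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]          # rows of RGB pixels
Rect = Tuple[int, int, int, int]   # hypothetical mask: (left, top, right, bottom), right/bottom exclusive


def count_differing_pixels(expected: Image, actual: Image,
                           channel_tolerance: int = 0,
                           mask: Optional[Rect] = None) -> int:
    """Count pixels whose difference exceeds the tolerance, inside the mask only."""
    if len(expected) != len(actual) or any(
            len(e) != len(a) for e, a in zip(expected, actual)):
        raise ValueError("images must have identical dimensions")
    height = len(expected)
    width = len(expected[0]) if height else 0
    # With no mask, compare the whole image.
    left, top, right, bottom = mask if mask is not None else (0, 0, width, height)
    differing = 0
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            # A pixel differs if any channel is off by more than the tolerance.
            if any(abs(e - a) > channel_tolerance
                   for e, a in zip(expected[y][x], actual[y][x])):
                differing += 1
    return differing


# Toy 4x4 white baseline with one faint change and one large change.
baseline = [[(255, 255, 255)] * 4 for _ in range(4)]
result = [row[:] for row in baseline]
result[0][0] = (250, 250, 250)   # faint change, like a font anti-aliasing shift
result[3][3] = (0, 0, 0)         # large change, like a real regression

print(count_differing_pixels(baseline, result))                       # 2
print(count_differing_pixels(baseline, result, channel_tolerance=8))  # 1
print(count_differing_pixels(baseline, result, mask=(2, 2, 4, 4)))    # 1
```

With tolerance=0 both changes are reported; a small tolerance hides the faint one but still reports the large one, and a mask restricts the comparison to the rectangle a test actually cares about.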

Why not use Lion as the tolerance=0 baseline?

Something else to bear in mind before lots of rebaselining: I implemented mock scrollbars with the intention that they'd be enabled on all platforms, to reduce platform diffs in image results. Sadly, we can't use them in WK1 on Mac (since AppKit draws the scrollbars there), so perhaps we should have a policy that all pixel results on Mac are generated with WK2. I'm not sure how that fits into your Chromium plans.

Simon
