[webkit-dev] insanity of updating 4000+ baseline images due to font rendering change?

Thu Oct 20 20:35:58 PDT 2011

On Thu, Oct 20, 2011 at 8:16 AM, Simon Fraser <simon.fraser at apple.com> wrote:
>
> On Oct 20, 2011, at 1:04 AM, Ryosuke Niwa wrote:
>
> On Wed, Oct 19, 2011 at 2:04 PM, Elliot Poger <epoger at google.com> wrote:
>>
>> Here are the various approaches I can think of... what's the
>> Hive-Mind-Approved approach?
>>
>> Commit 4500 new baseline images for SnowLeopard
>>
>> pro: known to work, will catch any regressions that come later
>> con: takes a long time to commit, chews up disk space and bandwidth for
>> all developers, future minor changes may require yet another set of new
>> baselines
>>
>> Leave all SnowLeopard tests marked as "PASS FAIL" (or maybe mark them
>> "SKIP") in test_expectations
>>
>> pro: known to work, quick and easy, doesn't clog repo space and developer
>> update bandwidth, future minor changes won't break any bots
>> con: will not catch any regressions that come later on SnowLeopard
>>
>> Remove descriptive text from all these tests, so that text rendering is
>> only evaluated in tests specifically for that purpose
>>
>> pro: prevents this problem for future OS versions, should allow for lots
>> more baseline images to be shared across platforms
>> con: a lot of work to replace all existing baseline images, must
>> coordinate across community of Chromium/WebKit developers, tests will be
>> more difficult to interpret without text
>>
>> Figure out how our test pages can be rendered with a completely
>> cross-platform pixel-equivalent font
>>
>> pro: similar to above but tests keep their descriptive text
>> con: similar to above but more technically challenging
>>
>> Augment our pixel-diff tools to allow for comparison masks (only pay
>> attention to pixel diffs within this rectangle)
>>
>> pro: existing baseline images can stay in place, and perhaps be shared
>> with new OS versions and platforms
>> con: requires modification of pixel-diff tools, need to add comparison
>> mask to each test definition
>
> I'd add another option: increase the tolerance level so that we ignore all
> these tiny gradient/font rendering differences. I don't think the
> added maintenance cost is worth the benefit of being able to catch all
> regressions.
> But I'd argue that we should keep baselines for Snow Leopard with
> tolerance=0 and increase the tolerance level of Leopard since Snow Leopard
> is a newer platform and will probably be supported for a longer period of
> time than Leopard.
>
> Why not use Lion as the tolerance=0 baseline?
> Here's something else to bear in mind before lots of rebasing. I implemented mock
> scrollbars with the intention that they'd be enabled by all platforms, to
> reduce platform diffs in image results. Sadly, we can't use them in WK1 on
> Mac (since AppKit draws the scrollbars there), so perhaps we should have a
> policy that all pixel results on Mac are generated with WK2. I'm not sure
> how that fits into your Chromium plans.
> Simon

Chromium's DRT on Windows actually already has a custom theme that we
use to eliminate diffs between Windows versions. It probably wouldn't
be too hard to match your scrollbars.
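
To make the tolerance and comparison-mask options quoted above a bit more
concrete, here is a minimal sketch of what such a comparison could look like.
This is only an illustration, not how ImageDiff or any existing WebKit or
Chromium tool works: the names count_diff_pixels, channel_tolerance and mask
are hypothetical, and images are assumed to be plain lists of rows of
(r, g, b) tuples rather than decoded PNGs.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (r, g, b), each 0-255
Image = List[List[Pixel]]             # rows of pixels
Rect = Tuple[int, int, int, int]      # left, top, right, bottom (exclusive)


def count_diff_pixels(actual: Image,
                      expected: Image,
                      channel_tolerance: int = 0,
                      mask: Optional[Rect] = None) -> int:
    """Count pixels that differ by more than channel_tolerance.

    Hypothetical helper: if mask is given, only pixels inside that
    rectangle are compared; everything outside it is ignored.
    """
    if len(actual) != len(expected) or any(
            len(a) != len(e) for a, e in zip(actual, expected)):
        raise ValueError("images must have the same dimensions")

    height = len(actual)
    width = len(actual[0]) if height else 0
    left, top, right, bottom = mask if mask is not None else (0, 0, width, height)

    diffs = 0
    # Clamp the mask to the image bounds before iterating.
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            a, e = actual[y][x], expected[y][x]
            # A pixel counts as different if any channel is off by
            # more than the allowed tolerance.
            if any(abs(ca - ce) > channel_tolerance for ca, ce in zip(a, e)):
                diffs += 1
    return diffs


def images_match(actual: Image,
                 expected: Image,
                 channel_tolerance: int = 0,
                 mask: Optional[Rect] = None) -> bool:
    """True if no compared pixel exceeds the tolerance."""
    return count_diff_pixels(actual, expected, channel_tolerance, mask) == 0
```

With channel_tolerance=0 and no mask this is a strict pixel comparison. Raising
the tolerance hides small antialiasing-style differences, and a mask limits the
check to the part of the page a test actually cares about. Both also hide any
real regression that falls under the threshold or outside the rectangle.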

-- Dirk