[webkit-dev] Introducing run-perf-tests and Adding Performance Bots

Thu Mar 1 09:41:40 PST 2012

A Qt WebKit1 performance bot was added last week; sorry for the late
announcement.

If I'm not mistaken, run-perf-tests currently works with DRT only. What
would it take to make it work with WTR as well, so that we could also
have WebKit2 performance bots running? I'm not familiar with the
infrastructure provided by webkitpy (Drivers, etc.), so I'm not sure
how much work would be needed...

Cheers,
jesus

On Tue, Jan 31, 2012 at 8:16 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> FYI, I've added a wiki page describing how to write a new perf.
> test: https://trac.webkit.org/wiki/Writing%20Performance%20Tests
>
> On Fri, Jan 20, 2012 at 11:20 AM, Ojan Vafai <ojan at chromium.org> wrote:
>>
>> On Thu, Jan 19, 2012 at 3:20 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>>>
>>> I didn't merge it into run-webkit-tests because performance tests don't
>>> pass/fail but instead give us some values that fluctuate over time. While
>>> Chromium takes the approach of hard-coding the range of acceptable values,
>>> such an approach has a high maintenance cost and is prone to problems such
>>> as having to widen the range periodically as the score slowly degrades
>>> over time. Also, as you can see on Chromium perf bots, the test results
>>> tend to fluctuate a lot, so hard-coding a tight range of acceptable values
>>> is tricky.
>>
>>
>> While this isn't perfect, I still think it's worth doing.
>
>
> I'm afraid that the maintenance cost here will be too high. Values will
> necessarily depend on each bot, so we'll need <number of tests>×<number of
> bots> expectations, and I don't think people are enthusiastic about
> maintaining values like that over time (I don't want to do that myself
> either).
>
>> Turning the bot red when a performance test fails badly is helpful for
>> finding and reverting regressions quickly, which in turn helps identify
>> smaller regressions more easily (large regressions mask smaller ones).
>
>
> I agree. Maybe we can obtain the historical average and standard deviation
> and turn bots red if the value doesn't fall within <some value between 1 and
> 2> standard deviations of that average.
>
>> In either case, we have to get the bots running the tests and work on
>> getting reliable data first.
>
>
> After http://trac.webkit.org/changeset/106211, values for most tests have
> gotten very stable. They tend to vary within a 5% range.
>
> - Ryosuke
>
>
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>
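
For reference, the standard-deviation check suggested in the quoted thread
could look roughly like the sketch below. This is only an illustration, not
anything run-perf-tests actually does: the function name, the sample values,
and the 1.5 threshold (picked from the "between 1 and 2" range mentioned
above) are all assumptions.

```python
from statistics import mean, stdev


def is_outlier(history, value, num_stdevs=1.5):
    """Return True if `value` lies more than `num_stdevs` sample standard
    deviations away from the mean of `history`.

    `history` is a list of past results for one test on one bot
    (hypothetical data layout; the thread does not specify one).
    """
    if len(history) < 2:
        # A standard deviation needs at least two samples; with less
        # history than that, never flag the new value.
        return False
    average = mean(history)
    deviation = stdev(history)
    return abs(value - average) > num_stdevs * deviation


# Made-up historical results for a single test on a single bot.
history = [100.0, 102.0, 98.0, 101.0, 99.0]

print(is_outlier(history, 100.5))  # close to the historical average
print(is_outlier(history, 120.0))  # far outside the usual spread
```

Since each test/bot pair would be compared only against its own history, a
check like this would not need hand-maintained expected ranges.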