[webkit-dev] Introducing run-perf-tests and Adding Performance Bots

Thu Mar 1 12:36:36 PST 2012

On Thu, Mar 1, 2012 at 3:17 PM, Žan Doberšek <zandobersek at gmail.com> wrote:
> To get WKTR running the performance tests a '-2' switch must be added to
> PerfTestRunner and some refactoring is required in the WKTR itself to
> properly handle the '--no-timeout' switch when given.
>
> I've got a diff of these changes lying around I can transform into a patch
> if there isn't one yet, just point me to a bug (or let's create one).

I have opened bug https://bugs.webkit.org/show_bug.cgi?id=80042 . Can
you assign it to yourself?

Best regards,
jesus

>
> Best,
> Zan
>
>>
>>
>> Cheers,
>> jesus
>>
>> On Tue, Jan 31, 2012 at 8:16 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>> > FYI, I've added a wiki page describing how to write a new perf
>> > test: https://trac.webkit.org/wiki/Writing%20Performance%20Tests
>> >
>> > On Fri, Jan 20, 2012 at 11:20 AM, Ojan Vafai <ojan at chromium.org> wrote:
>> >>
>> >> On Thu, Jan 19, 2012 at 3:20 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
>> >>>
>> >>> I didn't merge it into run-webkit-tests because performance tests don't
>> >>> pass/fail but instead give us some values that fluctuate over time.
>> >>> While Chromium takes the approach of hard-coding the range of acceptable
>> >>> values, such an approach has a high maintenance cost and is prone to
>> >>> problems such as having to increase the range periodically as the score
>> >>> slowly degrades over time. Also, as you can see on the Chromium perf
>> >>> bots, the test results tend to fluctuate a lot, so hard-coding a tight
>> >>> range of acceptable values is tricky.
>> >>
>> >>
>> >> While this isn't perfect, I still think it's worth doing.
>> >
>> >
>> > I'm afraid that the maintenance cost here will be too high. Values will
>> > necessarily depend on each bot so we'll need
>> > <number of tests>×<number of bots> expectations, and I don't think people
>> > are enthusiastic about maintaining values like that over time (even I
>> > don't want to do that myself).
>> >
>> >> Turning the bot red when a performance test fails badly is helpful for
>> >> finding and reverting regressions quickly, which in turn helps identify
>> >> smaller regressions more easily (large regressions mask smaller ones).
>> >
>> >
>> > I agree. Maybe we can obtain the historical average and standard
>> > deviation and turn bots red if the value doesn't fall within
>> > <some value between 1 and 2> standard deviations.
>> >
>> >> In either case, we have to get the bots running the tests and work on
>> >> getting reliable data first.
>> >
>> >
>> > After http://trac.webkit.org/changeset/106211, values for most tests
>> > have gotten very stable. They tend to vary within a 5% range.
>> >
>> > - Ryosuke
>> >
>> >
>> > _______________________________________________
>> > webkit-dev mailing list
>> > webkit-dev at lists.webkit.org
>> > http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
>> >
>
>
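
For reference, the idea quoted above (compare each new value against the
historical average and standard deviation, and turn the bot red when it falls
outside some number of standard deviations) could be sketched roughly as
follows. This is only an illustration and not part of run-perf-tests: the
function name outside_expected_range, the 1.5 default threshold, and the
sample numbers are all assumptions made up for this example.

```python
from statistics import mean, stdev


def outside_expected_range(history, latest, threshold=1.5):
    """Return True if `latest` is more than `threshold` sample standard
    deviations away from the mean of `history`.

    `history` is a list of past results for one test on one bot.
    `threshold` is hypothetical; the thread only suggests "some value
    between 1 and 2".
    """
    if len(history) < 2:
        # stdev() needs at least two data points; with less history we
        # cannot judge, so do not flag anything.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values identical: any change at all is out of range.
        return latest != mu
    return abs(latest - mu) > threshold * sigma


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    past_runs = [100, 102, 98, 101, 99]
    for value in (101, 110):
        status = "red" if outside_expected_range(past_runs, value) else "green"
        print(value, status)
```

Because the mean and deviation are computed per history list, each bot would
be judged against its own past results rather than against hand-maintained
per-bot expectations. Note that this check flags deviations in both
directions; a real bot would probably only care about the direction that
means "slower".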