[webkit-dev] Introducing run-perf-tests and Adding Performance Bots

Thu Mar 1 10:17:31 PST 2012

On Thu, Mar 1, 2012 at 6:41 PM, Jesus Sanchez-Palencia <jesus at webkit.org> wrote:

> A Qt WebKit1 performance bot was added last week, sorry for the late
> announcement.
>
> If I'm not mistaken, run-perf-tests currently works with DRT only, but
> what if we wanted to make it work with WTR as well, so that we could
> also have WebKit2 performance bots running? I'm not familiar with the
> infrastructure provided by webkitpy (Drivers, etc.), so I'm not sure
> about the amount of work needed...
>

To get WKTR running the performance tests, a '-2' switch must be added to
PerfTestRunner, and some refactoring is required in WKTR itself so that it
properly handles the '--no-timeout' switch when given.
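The runner-side half of this change boils down to choosing which driver binary to launch. Below is a minimal Python sketch of that choice (webkitpy is Python). The helpers parse_args and driver_command, the long option name --webkit-test-runner, and the way --no-timeout is forwarded are all hypothetical illustrations, not webkitpy's actual PerfTestRunner or Driver code.

```python
import argparse

# Hypothetical constants: the real webkitpy Port/Driver classes decide which
# binary to launch; this only models the choice the '-2' switch makes.
DRT_BINARY = "DumpRenderTree"      # WebKit1 test driver
WKTR_BINARY = "WebKitTestRunner"   # WebKit2 test driver


def parse_args(argv):
    """Parse a hypothetical subset of run-perf-tests options."""
    parser = argparse.ArgumentParser(prog="run-perf-tests")
    # '-2' mirrors the switch discussed above; the long name is an assumption.
    parser.add_argument("-2", "--webkit-test-runner", dest="webkit_test_runner",
                        action="store_true",
                        help="run tests with WebKitTestRunner (WebKit2)")
    parser.add_argument("--no-timeout", dest="no_timeout", action="store_true",
                        help="ask the driver not to time out long-running tests")
    return parser.parse_args(argv)


def driver_command(options):
    """Return the argv used to launch the selected test driver."""
    binary = WKTR_BINARY if options.webkit_test_runner else DRT_BINARY
    command = [binary]
    if options.no_timeout:
        # The driver itself must honour this flag; per the message above,
        # WKTR needed refactoring before it could.
        command.append("--no-timeout")
    return command
```

For instance, driver_command(parse_args(["-2", "--no-timeout"])) yields ["WebKitTestRunner", "--no-timeout"], while an empty argument list keeps the WebKit1 default, ["DumpRenderTree"].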

I've got a diff of these changes lying around that I can turn into a patch
if there isn't one yet; just point me to a bug (or let's create one).

Best,
Zan

>
> Cheers,
> jesus
>
> On Tue, Jan 31, 2012 at 8:16 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> > FYI, I've added a wiki page describing how to write a new perf
> > test: https://trac.webkit.org/wiki/Writing%20Performance%20Tests
> >
> > On Fri, Jan 20, 2012 at 11:20 AM, Ojan Vafai <ojan at chromium.org> wrote:
> >>
> >> On Thu, Jan 19, 2012 at 3:20 PM, Ryosuke Niwa <rniwa at webkit.org> wrote:
> >>>
> >>> I didn't merge it into run-webkit-tests because performance tests don't
> >>> pass/fail but instead give us values that fluctuate over time. While
> >>> Chromium takes the approach of hard-coding the range of acceptable values,
> >>> such an approach has a high maintenance cost and is prone to problems such
> >>> as having to increase the range periodically as the score slowly degrades
> >>> over time. Also, as you can see on the Chromium perf bots, the test results
> >>> tend to fluctuate a lot, so hard-coding a tight range of acceptable values
> >>> is tricky.
> >>
> >>
> >> While this isn't perfect, I still think it's worth doing.
> >
> >
> > I'm afraid that the maintenance cost here will be too high. Values will
> > necessarily differ from bot to bot, so we'll need <number of tests>×<number
> > of bots> expectations, and I don't think people are enthusiastic about
> > maintaining values like that over time (even I don't want to do that
> > myself).
> >
> >> Turning the bot red when a performance test fails badly is helpful for
> >> finding and reverting regressions quickly, which in turn helps identify
> >> smaller regressions more easily (large regressions mask smaller ones).
> >
> >
> > I agree. Maybe we can obtain the historical average and standard deviation
> > and turn bots red if the value doesn't fall within <some value between 1 and
> > 2> standard deviations.
> >
> >> In either case, we have to get the bots running the tests and work on
> >> getting reliable data first.
> >
> >
> > After http://trac.webkit.org/changeset/106211, values for most tests have
> > become very stable. They tend to vary within a 5% range.
> >
> > - Ryosuke
> >
> >
> > _______________________________________________
> > webkit-dev mailing list
> > webkit-dev at lists.webkit.org
> > http://lists.webkit.org/mailman/listinfo.cgi/webkit-dev
> >
>
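Ryosuke's two options in the thread above, a hard-coded acceptable range per test and per bot versus a band derived from the historical average and standard deviation, can be contrasted in a short Python sketch. It assumes each run of a test on a bot produces a single number and that earlier results for the same test on the same bot are available as a list; the function names and the k=1.5 default (picked from his "between 1 and 2" suggestion) are illustrative assumptions, not the perf bots' actual logic.

```python
import statistics


def within_fixed_range(value, low, high):
    """Hard-coded approach: needs its own (low, high) pair for every test on
    every bot, i.e. <number of tests> x <number of bots> entries to maintain."""
    return low <= value <= high


def within_sigma_band(value, history, k=1.5):
    """Historical approach: accept `value` if it lies within `k` sample
    standard deviations of the mean of earlier results for the same test on
    the same bot. `k` is meant to be somewhere between 1 and 2."""
    if len(history) < 2:
        # Too little history to estimate the spread; don't turn the bot red.
        return True
    mean = statistics.mean(history)
    spread = statistics.stdev(history)
    return abs(value - mean) <= k * spread
```

For example, with a history of [100, 102, 98, 101, 99] (mean 100, sample standard deviation about 1.58), a new result of 102 passes with the default k=1.5, while 105 would turn the bot red. Because the band is symmetric, this sketch also flags unusually good results; a bot could instead check only the direction that counts as a regression.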