[webkit-dev] Introducing run-perf-tests and Adding Performance Bots

Ryosuke Niwa rniwa at webkit.org
Thu Jan 19 15:20:27 PST 2012

Hi WebKittens,

*Executive Summary*

   - I've added Tools/Scripts/run-perf-tests; please try it out
   - Please add --no-timeout and --timeout options to your DRT
   - Perf-o-matic coming on webkit-perf.appspot.com, a clone of
   graphs.mozilla.org
   - Chromium Mac perf bots coming on build.webkit.org
   - Use PerformanceTests/Parser/resources/runner.js to write new
   performance tests

We have some performance tests in PerformanceTests but they're not run by
any bots. In fact, there are no performance bots at all on build.webkit.org.
While Chromium has perf bots, we can only see progressions and regressions
triggered by WebKit changes when Chromium gets a WebKit roll (pulling a
newer version of WebKit), which happens only a handful of times a day. That
doesn't scale to the rate at which we're making changes to WebKit, and the
visibility and usability of those bots are not great for non-Chromium
WebKit contributors. Furthermore, Chromium perf bots will not catch JSC
progressions and regressions at all.

*Means to Run Performance Tests*
I've added Tools/Scripts/run-perf-tests to run PerformanceTests in
DRT based on the work Ilya Tikhonovsky (loislo) has done for
run-inspector-perf-tests.py. The script aims to run performance tests both
locally and on bots, similar to the way run-webkit-tests works, and
currently runs on the Mac (WebKit1) and Chromium ports. Please try it out
and give me feedback (you can file a bug with "run-perf-tests: " in the
summary and cc me).

I didn't merge it into run-webkit-tests because performance tests don't
pass/fail but instead give us values that fluctuate over time. While
Chromium takes the approach of hard-coding the range of acceptable values,
such an approach has a high maintenance cost and is prone to problems such
as having to widen the range periodically as the score slowly degrades over
time. Also, as you can see on the Chromium perf bots, the test results tend
to fluctuate a lot, so hard-coding a tight range of acceptable values is
tricky.
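
To illustrate the trade-off, here is a minimal Python sketch. It is not
part of run-perf-tests; the function names, the tolerance, and the numbers
are all made up for illustration. It compares a hard-coded range check with
a check relative to recent runs.

```python
from statistics import mean, stdev


def within_fixed_range(value, low, high):
    """Hard-coded range check: someone has to maintain low/high by hand."""
    return low <= value <= high


def within_recent_history(value, history, tolerance=3.0):
    """Accept a value within `tolerance` standard deviations of recent runs.

    Hypothetical alternative, shown only for comparison.
    """
    if len(history) < 2:
        return True  # not enough data to judge
    mu = mean(history)
    sigma = stdev(history)
    return abs(value - mu) <= tolerance * sigma


# Made-up timings in ms that slowly degrade from run to run.
history = [100.0, 101.0, 102.0, 103.0, 104.0]
new_value = 105.0

fixed_ok = within_fixed_range(new_value, 95.0, 104.0)
history_ok = within_recent_history(new_value, history)
```

With these numbers the fixed range rejects the new value while the
history-based check accepts it. The fixed range has to be widened by hand
as scores drift; the history-based check adapts, but will not flag a slow
drift.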

Unlike run-webkit-tests, run-perf-tests doesn't generate any HTML or JSON
files to summarize the results by default, since the only output you get
out of performance tests is the time taken to run tests or scores, which
are already reported on stdout. The output of run-perf-tests is designed to
be compatible with Chromium perf bots, but we can easily change that to
something more human friendly if people are so inclined. The script
optionally generates a JSON file to be used by perf bots.
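
The JSON schema isn't described in this mail, so the following Python
sketch only shows the general idea of serializing results for a bot. The
layout, the keys ("revision", "results", "avg", "unit"), the test name, and
the revision number are all assumptions, not the real format.

```python
import json


def results_to_json(results, revision):
    """Serialize per-test results into a JSON string (hypothetical layout)."""
    payload = {"revision": revision, "results": results}
    return json.dumps(payload, sort_keys=True, indent=2)


# Made-up test name, value, and revision, for illustration only.
results = {"Parser/example-test": {"avg": 12.5, "unit": "ms"}}
text = results_to_json(results, 105000)
```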

In order for other ports (e.g. Windows, Qt, GTK) to support run-perf-tests,
their respective DRTs simply need to support a --no-timeout option that
disables the watchdog timer. This is necessary as some performance tests
take a long time to run. Also, we'd appreciate your help in adding a
--timeout option per https://bugs.webkit.org/show_bug.cgi?id=76662 for
code sanity.

*Adding Performance Bots*
In the next couple of days, I'm going to post a patch to add a Chromium Mac
Perf bot to build.webkit.org (of course, upon appropriate reviews) that
runs run-perf-tests and uploads a JSON file to webkit-perf.appspot.com, a
clone of graphs.mozilla.org.

While we could have adopted Chromium's perf bot output, where each slave
generates a JSON file with an HTML front end that loads the JSON, the
approach didn't scale well for Chromium when the number of historical
values stored on each slave soared and the size of the JSON increased
proportionally over time. Furthermore, it's hard to compare values between
different bots or tests. On the other hand, creating a new front end seemed
like too much work. As such, I've decided to port Mozilla's Graph Server
<https://github.com/mozilla/graphs> to WebKit after consulting with
tony^work, ojan, and evmar.

While we could have added another dedicated Apache server with all the nice
features Graph Server's native backend provides, the cost of maintaining
such a server seemed too high. Also, Robert Helmer (rhelmer), a Mozilla
contributor who is actively working on the Graph Server, told me that
Mozilla is planning to replace the backend with a key-value database.
Given these circumstances and some experimentation, I wrote our own
backend using Google App Engine <http://code.google.com/appengine/> for its
low maintenance cost and ease of use; note App Engine is already used by
commit-queue and flakiness dashboard.

My work to port the Graph Server is near completion and I expect it to be
working in the next couple of days, just as I add a Chromium Mac Perf bot.
If you're interested in adding new perf bots for your port, please contact
me directly and I'll give you detailed instructions on what needs to
happen (it's super trivial but involves giving out or receiving a password).

*How to Write Performance Tests*
If you're interested in adding more performance tests (you should be!),
then use one of the existing tests as an example. It uses
PerformanceTests/Parser/resources/runner.js, which automatically
aggregates results over multiple runs and outputs the results in the
preferred format run-perf-tests understands.
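
The aggregation over multiple runs can be pictured roughly as follows. This
sketch is in Python rather than JavaScript and is not the actual runner.js
code; the function names, the chosen statistics, and the output line format
are assumptions made for illustration.

```python
from statistics import mean, median, stdev


def summarize_runs(times_ms):
    """Aggregate the timings of repeated runs into summary statistics."""
    if not times_ms:
        raise ValueError("need at least one run")
    return {
        "avg": mean(times_ms),
        "median": median(times_ms),
        # The sample standard deviation needs at least two data points.
        "stdev": stdev(times_ms) if len(times_ms) > 1 else 0.0,
        "min": min(times_ms),
        "max": max(times_ms),
    }


def format_summary(name, summary, unit="ms"):
    """Render one line per statistic (hypothetical output format)."""
    lines = [name]
    for key in ("avg", "median", "stdev", "min", "max"):
        lines.append("%s %.2f %s" % (key, summary[key], unit))
    return "\n".join(lines)


# Made-up timings for five runs of a hypothetical test.
runs = [10.0, 12.0, 11.0, 13.0, 14.0]
summary = summarize_runs(runs)
report = format_summary("Parser/example-test", summary)
```

Reporting the spread alongside the average matters here because, as noted
earlier, the values fluctuate from run to run.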

Since there hasn't been any script to run performance tests, tests in
PerformanceTests don't have a uniform output format. As a result,
run-perf-tests only supports running tests in Bindings, Parser, and
inspector at the moment. I'd really appreciate your help if you could
convert the existing tests to use runner.js to increase the number of
performance tests run-perf-tests can run, or modify run-perf-tests so that
it can run more tests. Obviously, our goal is to be able to run all tests
in PerformanceTests with run-perf-tests.

Note that Hajime Morita (morrita) has taken the initiative on the effort to
run Dromaeo in DRT <https://bugs.webkit.org/show_bug.cgi?id=76156>.

Best regards,
Ryosuke Niwa
Software Engineer
Google Inc.
