[webkit-dev] handling failing tests (test_expectations, Skipped files, etc.)

Mon Apr 9 14:42:33 PDT 2012

Hi all,

Recently I've noticed more people making changes and adding test
failure suppressions to various ports' test_expectations.txt files.
This is great!

However, I don't think we have agreed on what the "best practices"
are here, so I thought I'd list what I think they are, and others can
comment / correct me as necessary:

1) Don't mix test_expectations.txt files and Skipped files. This is
really confusing to everyone involved ... your port should use one or
the other where possible. (*)

2) Entries in the test_expectations files should be short-lived. We
should be checking in what we believe the "current output" is,
*regardless of whether it's "right" or "wrong"*. Put differently, we
are more interested in changes in behavior (regressions) than in
correctness. If you expect a test to keep failing in the same
consistent way for a long time, check in the failing output and file a
bug. Don't use test_expectations to suppress it. (**)

3) Don't use test_expectations.txt to suppress failures across a
single cycle of the bot, just so you can gather updated baselines
without the tree going red. While it might seem that you're doing tree
maintainers a favor, in my experience this just makes things confusing
and it's better to let the tree go red until you can actually
rebaseline things. It's too easy to add a suppression "for now" and
then forget about it.
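
To make the reasoning behind (2) concrete, here is a rough Python
sketch. It is not new-run-webkit-tests code: the classify() function,
its result strings, and the sample outputs are all hypothetical, and
it assumes a simplified model in which a suppression accepts any
failing output. It compares checking in the failing output as the
baseline against suppressing the failure.

```python
# Hypothetical sketch; not how new-run-webkit-tests is implemented.

def classify(actual, baseline, suppressed=False):
    """Classify one test run.

    actual     -- output the test produced on this run
    baseline   -- checked-in expected output
    suppressed -- True if an expectations entry suppresses failures
                  (simplified: any failing output is accepted)
    """
    if actual == baseline:
        return "pass"
    return "expected-failure" if suppressed else "fail"


right = "size 800x600"      # what the test ought to produce
wrong_a = "size 800x585"    # today's known-wrong output
wrong_b = "size 640x480"    # a different wrong output (a regression)

# Approach A: check in the failing output as the baseline.
print(classify(wrong_a, baseline=wrong_a))  # pass
print(classify(wrong_b, baseline=wrong_a))  # fail

# Approach B: keep the "right" baseline and suppress the failure.
print(classify(wrong_a, baseline=right, suppressed=True))  # expected-failure
print(classify(wrong_b, baseline=right, suppressed=True))  # expected-failure
```

Under approach A the change from wrong_a to wrong_b turns the tree
red; under approach B both runs look the same, so the change in
behavior goes unnoticed.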

These rules are not set in stone; they're just what I try to do. If
people think there are better ways of doing this, please speak up! If
there is consensus, I'll update the wikis accordingly.

Thanks!

-- Dirk

(*) I have an outstanding to-do to give new-run-webkit-tests a better
way of tracking expectations, one that merges the inheritance/cascade
aspects of Skipped files with the flexibility in types of failures
that you get from expectations. Eventually we should have a mechanism
that replaces both, but for now, we don't. See
https://bugs.webkit.org/show_bug.cgi?id=83508 .

(**) Note that Chromium has not historically worked this way -- we
suppress failures rather than check in failing output -- but I believe
many / most of the Chromium gardeners have come to believe that this
is not the way things should work and we'd be better off checking in
the failing output instead. Note that it may make sense to do
different things based on your port's maturity, so you suppress things
while bringing a port up, and stop suppressing once your port is
stable.