[webkit-dev] Reconsidering test expectations (PASS, TEXT, IMAGE, TEXT+IMAGE, TIMEOUT, CRASH, etc...)
rniwa at webkit.org
Fri May 18 00:18:04 PDT 2012
On Fri, May 18, 2012 at 12:05 AM, Maciej Stachowiak <mjs at apple.com> wrote:
> On May 17, 2012, at 4:57 PM, Ojan Vafai <ojan at chromium.org> wrote:
> > Sure TEXT, IMAGE, etc are not very clear, but no one has actually
> proposed something better.
> I believe proposals were made which were more clear, but were rejected for
> other reasons (mainly not being as familiar to people used to the format
> afaict). Let me add another. I would propose the following replacements for
> current states:
> neither TEXT nor IMAGE
> ==> (continue to say nothing)
I think we currently use PASS for these.
> TEXT or TEXT+IMAGE
> ==> FAIL
> FAIL would mean the test fails - for text-only tests, it means text
> failure, for render tree tests it means text failure (who cares if the
> pixel test somehow accidentally passes at that point, that's not a
> meaningfully distinct state), for ref tests it would mean a reference
There have been cases where only render tree dump changed (e.g. ways in
which render objects were created changed but the difference was not
visible in pixel results). I think the idea of having both TEXT and
TEXT+IMAGE is to detect a later regression that causes the image to
start to mismatch as well.
FWIW, I think this simplification is for the better.
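To make the trade-off concrete, here is a minimal sketch of what collapsing
TEXT and TEXT+IMAGE into FAIL gives up. The function names and the
`simplified` flag are made up for illustration; this is not how
new-run-webkit-tests is implemented.

```python
# Hypothetical sketch; these names are illustrative, not WebKit tooling.
COLLAPSED = {"TEXT": "FAIL", "TEXT+IMAGE": "FAIL"}


def collapse(outcome):
    """Map a current-style outcome to the proposed simplified state."""
    return COLLAPSED.get(outcome, outcome)


def is_unexpected(expected, actual, simplified):
    """Return True if the actual outcome differs from the expectation."""
    if simplified:
        return collapse(expected) != collapse(actual)
    return expected != actual


# A test expected to fail only in its render tree dump later starts to
# mismatch in pixels as well.
print(is_unexpected("TEXT", "TEXT+IMAGE", simplified=False))  # True
print(is_unexpected("TEXT", "TEXT+IMAGE", simplified=True))   # False
```

With the current states the second regression is reported; with a single
FAIL state it goes unnoticed, which is the cost of the simplification.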
> ==> FLAKY
> If one of the text tests or the image tests will fail but maybe not both,
> that means the test is nondeterministic, so it should be marked as flaky
> and its results should not affect greenness of the bots, so long as it does
> not hang or crash. It doesn't seem like we currently have a FLAKY result
> expectation based on the bots; you are supposed to indicate it by listing
> all possible kinds of failures, but that seems unhelpful. Also, a flaky
> test that sometimes entirely passes on multiple runs in a row will turn the
> bots red, which seems bad. Let's just have FLAKY state instead where we
> don't get upset whether the test passes or fails.
There are some tests that are flaky between PASS, CRASH, TIMEOUT, etc... so
we have to take those into account as well.
In the past, some people have told me that they want to be able to document
the kind of flakiness each test exhibits.
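One way to reconcile the two, sketched below purely as an assumption: a
FLAKY expectation that carries the set of outcomes the test is known to
flip between. The `Flaky` class and `turns_bot_red` function are
hypothetical names, not anything in the existing tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flaky:
    # Outcomes the test is known to flip between. By default only PASS and
    # FAIL are tolerated, so a hang or crash still counts (assumed policy).
    outcomes: frozenset = frozenset({"PASS", "FAIL"})


def turns_bot_red(expectation, actual):
    """A flaky test is a problem only when it does something undocumented."""
    return actual not in expectation.outcomes


plain = Flaky()
crashy = Flaky(frozenset({"PASS", "CRASH", "TIMEOUT"}))

print(turns_bot_red(plain, "PASS"))      # False
print(turns_bot_red(plain, "CRASH"))     # True
print(turns_bot_red(crashy, "TIMEOUT"))  # False
```

This keeps the bots green when a flaky test happens to pass, and still
records what kind of flakiness each test exhibits.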