[webkit-dev] Reconsidering test expectations (PASS, TEXT, IMAGE, TEXT+IMAGE, TIMEOUT, CRASH, etc...)

Maciej Stachowiak mjs at apple.com
Fri May 18 00:43:55 PDT 2012

On May 18, 2012, at 12:18 AM, Ryosuke Niwa <rniwa at webkit.org> wrote:

> On Fri, May 18, 2012 at 12:05 AM, Maciej Stachowiak <mjs at apple.com> wrote:
> On May 17, 2012, at 4:57 PM, Ojan Vafai <ojan at chromium.org> wrote:
> > Sure, TEXT, IMAGE, etc. are not very clear, but no one has actually proposed something better.
> I believe proposals were made which were more clear, but were rejected for other reasons (mainly not being as familiar to people used to the format afaict). Let me add another. I would propose the following replacements for current states:
> neither TEXT nor IMAGE
>   ==> (continue to say nothing)
> I think we currently use PASS for these.

I guess we do. I think there is no point in saying PASS: if a test crashes, hangs, or is skipped, PASS is meaningless, and if none of those apply, the test shouldn't be listed at all. But I could be missing something.

>   ==> FAIL
> FAIL would mean the test fails - for text-only tests, it means text failure, for render tree tests it means text failure (who cares if the pixel test somehow accidentally passes at that point, that's not a meaningfully distinct state), for ref tests it would mean a reference failure.
> There have been cases where only the render tree dump changed (e.g. the ways in which render objects were created changed but the difference was not visible in pixel results). I think the idea of having both TEXT and TEXT+IMAGE is to detect another regression that causes the image to start to mismatch.
> FWIW, I think this simplification is for the better.

I think when there are regressions which do not cause a crash or hang, we should be checking in a new expectation, so further regressions to either text or image would be detected. More detail to come in an upcoming more comprehensive proposal.

>  ==> FLAKY
> If either the text test or the image test may fail, but not necessarily both, that means the test is nondeterministic, so it should be marked as flaky and its results should not affect greenness of the bots, so long as it does not hang or crash. As far as I can tell from the bots, we don't currently have a FLAKY result expectation; you are supposed to indicate flakiness by listing all possible kinds of failures, but that seems unhelpful. Also, a flaky test that sometimes entirely passes on multiple runs in a row will turn the bots red, which seems bad. Let's just have a FLAKY state instead, where we don't get upset whether the test passes or fails.
> There are some tests that are flaky between PASS, CRASH, TIMEOUT, etc... so we have to take those into account as well.
> In the past, some people have told me that they want to be able to document the kind of flakiness each test exhibits.

Tests that could randomly crash or time out should probably not be run until fixed. For tests that randomly fail in one of several ways, I am not sure it is very useful to list which files might be affected but nothing else. If a test gets one of N results, bugzilla is a fine way to document that in full detail.
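
To make the proposed states concrete, here is a rough sketch in Python of how a bot might decide whether a single result keeps it green. This is only an illustration, not how the existing tools work: the names Expectation, Outcome and keeps_bot_green are made up, crash/hang/skip expectations are left out, and treating an unexpected pass of a FAIL test as red is my assumption.

```python
from enum import Enum


class Expectation(Enum):
    """Hypothetical expectation states from the proposal."""
    NONE = "none"    # test is not listed, so it is expected to pass
    FAIL = "FAIL"    # test is expected to fail deterministically
    FLAKY = "FLAKY"  # test may pass or fail from run to run


class Outcome(Enum):
    """Hypothetical result of one test run."""
    PASS = "pass"
    FAIL = "fail"
    CRASH = "crash"
    TIMEOUT = "timeout"


def keeps_bot_green(expectation: Expectation, outcome: Outcome) -> bool:
    """Return True if this result should not turn the bot red."""
    # A crash or hang is never covered by NONE, FAIL or FLAKY.
    if outcome in (Outcome.CRASH, Outcome.TIMEOUT):
        return False
    # FLAKY: we don't get upset whether the test passes or fails.
    if expectation is Expectation.FLAKY:
        return True
    # FAIL: only a failure matches (assumption: an unexpected pass is flagged).
    if expectation is Expectation.FAIL:
        return outcome is Outcome.FAIL
    # Unlisted test: it must pass.
    return outcome is Outcome.PASS
```

With this, a FLAKY test that passes several runs in a row stays green, which is the behavior argued for above, while a crash or timeout still shows up as red.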
