<div class="gmail_quote">On Fri, May 18, 2012 at 12:05 AM, Maciej Stachowiak <span dir="ltr"><<a href="mailto:mjs@apple.com" target="_blank">mjs@apple.com</a>></span> wrote:<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


<div class="im">

On May 17, 2012, at 4:57 PM, Ojan Vafai <<a href="mailto:ojan@chromium.org">ojan@chromium.org</a>> wrote:<br></div><div class="im">

> Sure TEXT, IMAGE, etc. are not very clear, but no one has actually proposed something better. </div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">


 </div></blockquote><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">I believe proposals were made which were more clear, but were rejected for other reasons (mainly not being as familiar to people used to the format afaict). Let me add another. I would propose the following replacements for current states:<br>


<br>

neither TEXT nor IMAGE<br>

   ==> (continue to say nothing)<br></blockquote><div><br></div><div>I think we currently use PASS for these.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


TEXT or TEXT+IMAGE<br>

   ==> FAIL<br>

FAIL would mean the test fails: for text-only tests, it means a text failure; for render tree tests, it also means a text failure (who cares if the pixel test somehow accidentally passes at that point, that's not a meaningfully distinct state); for ref tests, it would mean a reference failure.<br>


</blockquote><div><br></div><div>There have been cases where only the render tree dump changed (e.g. the way render objects were created changed, but the difference was not visible in pixel results). I think the idea of having both TEXT and TEXT+IMAGE is to detect another regression that causes the image to start mismatching.</div>


<div><br></div><div>FWIW, I think this simplification is for the better.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

TEXT IMAGE<br>

  ==> FLAKY<br>

If either the text test or the image test may fail, but not necessarily both, that means the test is nondeterministic, so it should be marked as flaky and its results should not affect the greenness of the bots, so long as it does not hang or crash. Based on the bots, it doesn't seem like we currently have a FLAKY result expectation; you are supposed to indicate flakiness by listing all possible kinds of failures, but that seems unhelpful. Also, a flaky test that sometimes entirely passes on multiple runs in a row will turn the bots red, which seems bad. Let's just have a FLAKY state instead, where we don't get upset whether the test passes or fails.<br>


</blockquote></div><div><br></div><div>There are some tests that are flaky between PASS, CRASH, TIMEOUT, etc., so we have to take those into account as well.</div><div><br></div><div>In the past, some people have told me that they want to be able to document the kind of flakiness each test exhibits.</div>


<div><br></div><div>- Ryosuke</div><div><br></div>