<br><div class="gmail_quote">On Thu, Aug 16, 2012 at 2:32 PM, Filip Pizlo <span dir="ltr"><<a href="mailto:fpizlo@apple.com" target="_blank">fpizlo@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
On Aug 16, 2012, at 2:13 PM, Dirk Pranke wrote:<br>
<br>
> On Wed, Aug 15, 2012 at 6:02 PM, Filip Pizlo <<a href="mailto:fpizlo@apple.com">fpizlo@apple.com</a>> wrote:<br>
>><br>
>> 2) Possibility of the sheriff getting it wrong.<br>
>><br>
>> (2) concerns me most. We're talking about using filenames to serve as a<br>
>> kind of unchecked comment. We already know that comments are usually bad<br>
>> because there is no protection against them going stale.<br>
>><br>
><br>
> Sheriffs can already get things wrong (and rebaseline when they<br>
> shouldn't). I believe that adding passing/failing to expected will<br>
> make things better in this regard, not worse.<br>
<br>
> In what way do things become better? Because other people will see what the sheriff believed about the result?
>
> Can you articulate some more about what happens when you have both -expected and -failing?
>
> My specific concern is that after someone checks in a fix, we will have some sheriff accidentally misjudge the change in behavior to be a regression, and check in a -failing file. And then we end up in a world of confusion.
>
> This is why I think that just having -expected files is better. It is a kind of recognition that we're tracking changes in behavior, rather than comparing against some almighty notion of what it means to be correct.
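
For concreteness, here is one way a harness could treat the two kinds of baseline under the proposal. This is purely a hypothetical sketch, not webkitpy code and not necessarily what Dirk has in mind; the function name and labels are made up:

    # Hypothetical: classify one test's output when foo-expected.txt and/or
    # foo-failing.txt are present. Either baseline may be absent (None).
    def classify(actual, expected_text=None, failing_text=None):
        if expected_text is not None and actual == expected_text:
            return 'PASS'              # matches the believed-correct baseline
        if failing_text is not None and actual == failing_text:
            return 'EXPECTED FAILURE'  # matches the checked-in known failure
        return 'UNEXPECTED'            # matches neither; a sheriff has to triage

The concern above is about the third branch: after a fix lands, the output matches neither file, and everything depends on whether the gardener promotes the new output to -expected or mistakenly checks in a fresh -failing baseline.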
<div><div class="h5"><br>
><br>
> Another idea/observation is that if we have multiple types of<br>
> expectation files, it might be easier to set up watchlists, e.g., "let<br>
> me know whenever a file gets checked into fast/forms with an -expected<br>
> or -failing result". It seems like this might be useful, but I'm not<br>
> sure.<br>
><br>
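
If we wanted to try that, the existing watchlist mechanism (Tools/Scripts/webkitpy/common/config/watchlist) can already match on the file paths a patch touches, so an entry along these lines might do it. The definition name, pattern, and address below are made up, and I'm writing the schema from memory, so treat it as a sketch:

    "DEFINITIONS": {
        "FastFormsBaselines": {
            "filename": r"LayoutTests/fast/forms/.*-(expected|failing)\.",
        },
    },
    "CC_RULES": {
        "FastFormsBaselines": [ "someone-who-cares@example.com" ],
    },
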
>>> In particular, to further clarify my position, if someone were to argue that
>>> Dirk's proposal would be a wholesale replacement for TestExpectations, then
>>> I would be more likely to be on board, since I very much like the idea of
>>> reducing the number of ways of doing things. Maybe that's a good way to
>>> reach compromise.
>>>
>>> Dirk, what value do you see in TestExpectations were your change to be
>>> landed? Do scenarios still exist where there would be a test for which (a)
>>> there is no -fail.* file, (b) the test is not skipped, and (c) it's marked
>>> with some annotation in TestExpectations? I'm most interested in the
>>> question of whether such scenarios exist, since in my experience, whenever a
>>> test is not rebased, is not skipped, and is marked as failing in
>>> TestExpectations, it ends up just causing gardening overhead later.
>>
>> This is a good question, because it is definitely my intent that this
>> change replace some existing practices, not add to them.
>>
>> Currently, the Chromium port uses TestExpectations entries for four
>> different kinds of things: tests we don't ever plan to fix (WONTFIX),
>> tests that we skip because not doing so causes other tests to break,
>> tests that fail (reliably), and tests that are flaky.
>>
>> Skipped files do not let you distinguish (programmatically) between
>> the first two categories, and so my plan is to replace Skipped files
>> with TestExpectations (using the new syntax discussed a month or so
>> ago) soon (next week or two at the latest).
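
For anyone who didn't follow the new-syntax thread, those four categories would presumably end up looking roughly like this in a single TestExpectations file. The bug numbers and test names are made up, and I'm quoting the keywords from memory, so take the details as illustrative:

    # Tests we don't ever plan to fix:
    webkit.org/b/11111 fast/forms/example-a.html [ WontFix ]
    # Tests skipped only because running them breaks other tests:
    webkit.org/b/22222 fast/forms/example-b.html [ Skip ]
    # Tests that fail reliably:
    webkit.org/b/33333 fast/forms/example-c.html [ Failure ]
    # Flaky tests (sometimes pass, sometimes fail):
    webkit.org/b/44444 fast/forms/example-d.html [ Pass Failure ]
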
>>
>> I would like to replace using TestExpectations for failing tests (at
>> least for tests that are expected to keep failing indefinitely because
>> someone isn't working on an active fix) with this new mechanism.
>>
>> That leaves flaky tests. One can debate what the right thing to do w/
>> flaky tests is here; I'm inclined to argue that flakiness is at least
>> as bad as failing, and we should probably be skipping them, but the
>> Chromium port has not yet actively tried this approach (I think other
>> ports probably have experience here, though).
>>
>> Does that help answer your question / sway you at all?
>
> Yes, it does - it answers my question, though it perhaps doesn't sway me. My concerns are still that:
>
> 1) Switching to skipping flaky tests wholesale in all ports would be great, and then we could get rid of the flakiness support.

This is not necessarily helpful in some cases. There are cases where one test makes subsequent tests flaky, so skipping a flaky test doesn't fix the problem. Instead, it just makes the next test flaky. The right fix, of course, is to skip or fix the culprit, but pinpointing the test that makes subsequent tests flaky has proved to be a time-consuming and challenging task.
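
To make the "pinpointing" part concrete: when the flake really is triggered by one earlier test, the search can at least be mechanized as a bisection over the tests that ran before it. The sketch below is hypothetical, not existing webkitpy code; runs_flaky is an assumed helper that runs the listed tests followed by the flaky test several times and reports whether it flaked:

    # Hypothetical: find which of the tests that precede a flaky test is
    # making it flaky, assuming a single culprit and a reproducible trigger.
    def find_culprit(preceding_tests, runs_flaky):
        candidates = list(preceding_tests)
        while len(candidates) > 1:
            half = candidates[:len(candidates) // 2]
            # If running just the first half before the flaky test still
            # triggers the flake, the culprit is in that half; otherwise
            # it must be in the second half.
            candidates = half if runs_flaky(half) else candidates[len(half):]
        return candidates[0] if candidates and runs_flaky(candidates) else None

The catch, of course, is that flakiness is probabilistic, so runs_flaky has to repeat each configuration enough times to be convincing, which is exactly why this ends up being so time-consuming in practice.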

> 2) The WONTFIX mode in TestExpectations feels to me more like a statement that you're just trying to see if the test doesn't crash. Correct? Either way, it's confusing.

Yes, I'd argue that they should just be skipped, but that's bikeshedding for another time.

- Ryosuke