<br><div class="gmail_quote">On Thu, Aug 16, 2012 at 2:32 PM, Filip Pizlo <span dir="ltr"><<a href="mailto:fpizlo@apple.com" target="_blank">fpizlo@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im"><br>
On Aug 16, 2012, at 2:13 PM, Dirk Pranke wrote:<br>
<br>
> On Wed, Aug 15, 2012 at 6:02 PM, Filip Pizlo <<a href="mailto:fpizlo@apple.com">fpizlo@apple.com</a>> wrote:<br>
>><br>
>> 2) Possibility of the sheriff getting it wrong.<br>
>><br>
>> (2) concerns me most. We're talking about using filenames to serve as a<br>
>> kind of unchecked comment. We already know that comments are usually bad<br>
>> because there is no protection against them going stale.<br>
>><br>
><br>
> Sheriffs can already get things wrong (and rebaseline when they<br>
> shouldn't). I believe that adding passing/failing to expected will<br>
> make things better in this regard, not worse.<br>
<br>
> In what way do things become better? Because other people will see what the sheriff believed about the result?
>
> Can you articulate some more about what happens when you have both -expected and -failing?
>
> My specific concern is that after someone checks in a fix, we will have some sheriff accidentally misjudge the change in behavior to be a regression, and check in a -failing file. And then we end up in a world of confusion.
>
> This is why I think that just having -expected files is better. It is a kind of recognition that we're tracking changes in behavior, rather than comparing against some almighty notion of what it means to be correct.
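
For concreteness, here is one way a harness could treat the two kinds of baseline under the proposal. This is purely a hypothetical sketch, not webkitpy code and not necessarily what Dirk has in mind; the function name and labels are made up:

    # Hypothetical: classify one test's output when foo-expected.txt and/or
    # foo-failing.txt are present. Either baseline may be absent (None).
    def classify(actual, expected_text=None, failing_text=None):
        if expected_text is not None and actual == expected_text:
            return 'PASS'              # matches the believed-correct baseline
        if failing_text is not None and actual == failing_text:
            return 'EXPECTED FAILURE'  # matches the checked-in known failure
        return 'UNEXPECTED'            # matches neither; a sheriff has to triage

The concern above is about the third branch: after a fix lands, the output matches neither file, and everything depends on whether the gardener promotes the new output to -expected or mistakenly checks in a fresh -failing baseline.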
<div><div class="h5"><br>
><br>
> Another idea/observation is that if we have multiple types of<br>
> expectation files, it might be easier to set up watchlists, e.g., "let<br>
> me know whenever a file gets checked into fast/forms with an -expected<br>
> or -failing result". It seems like this might be useful, but I'm not<br>
> sure.<br>
><br>
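
If we wanted to try that, the existing watchlist mechanism (Tools/Scripts/webkitpy/common/config/watchlist) can already match on the file paths a patch touches, so an entry along these lines might do it. The definition name, pattern, and address below are made up, and I'm writing the schema from memory, so treat it as a sketch:

    "DEFINITIONS": {
        "FastFormsBaselines": {
            "filename": r"LayoutTests/fast/forms/.*-(expected|failing)\.",
        },
    },
    "CC_RULES": {
        "FastFormsBaselines": [ "someone-who-cares@example.com" ],
    },
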
>>> In particular, to further clarify my position, if someone were to argue that
>>> Dirk's proposal would be a wholesale replacement for TestExpectations, then
>>> I would be more likely to be on board, since I very much like the idea of
>>> reducing the number of ways of doing things. Maybe that's a good way to
>>> reach compromise.
>>>
>>> Dirk, what value do you see in TestExpectations were your change to be
>>> landed? Do scenarios still exist where there would be a test for which (a)
>>> there is no -fail.* file, (b) the test is not skipped, and (c) it's marked
>>> with some annotation in TestExpectations? I'm most interested in the
>>> question of whether such scenarios exist, since in my experience, whenever a
>>> test is not rebased, is not skipped, and is marked as failing in
>>> TestExpectations, it ends up just causing gardening overhead later.
>>
>> This is a good question, because it is definitely my intent that this
>> change replace some existing practices, not add to them.
>>
>> Currently, the Chromium port uses TestExpectations entries for four
>> different kinds of things: tests we don't ever plan to fix (WONTFIX),
>> tests that we skip because not doing so causes other tests to break,
>> tests that fail (reliably), and tests that are flaky.
>>
>> Skipped files do not let you distinguish (programmatically) between
>> the first two categories, and so my plan is to replace Skipped files
>> with TestExpectations (using the new syntax discussed a month or so
>> ago) soon (next week or two at the latest).
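
For anyone who didn't follow the new-syntax thread, those four categories would presumably end up looking roughly like this in a single TestExpectations file. The bug numbers and test names are made up, and I'm quoting the keywords from memory, so take the details as illustrative:

    # Tests we don't ever plan to fix:
    webkit.org/b/11111 fast/forms/example-a.html [ WontFix ]
    # Tests skipped only because running them breaks other tests:
    webkit.org/b/22222 fast/forms/example-b.html [ Skip ]
    # Tests that fail reliably:
    webkit.org/b/33333 fast/forms/example-c.html [ Failure ]
    # Flaky tests (sometimes pass, sometimes fail):
    webkit.org/b/44444 fast/forms/example-d.html [ Pass Failure ]
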
>>
>> I would like to replace using TestExpectations for failing tests (at
>> least for tests that are expected to keep failing indefinitely because
>> someone isn't working on an active fix) with this new mechanism.
>>
>> That leaves flaky tests. One can debate what the right thing to do w/
>> flaky tests is here; I'm inclined to argue that flakiness is at least
>> as bad as failing, and we should probably be skipping them, but the
>> Chromium port has not yet actively tried this approach (I think other
>> ports probably have experience here, though).
>>
>> Does that help answer your question / sway you at all?
>
> Yes, it does - it answers my question, though it perhaps doesn't sway me. My concerns are still that:
>
> 1) Switching to skipping flaky tests wholesale in all ports would be great, and then we could get rid of the flakiness support.

This is not necessarily helpful in some cases. There are cases where one test makes subsequent tests flaky, so skipping a flaky test doesn't fix the problem. Instead, it just makes the next test flaky. The right fix, of course, is to skip or fix the culprit, but pinpointing the test that makes subsequent tests flaky has proved to be a time-consuming and challenging task.
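
To make the "pinpointing" part concrete: when the flake really is triggered by one earlier test, the search can at least be mechanized as a bisection over the tests that ran before it. The sketch below is hypothetical, not existing webkitpy code; runs_flaky is an assumed helper that runs the listed tests followed by the flaky test several times and reports whether it flaked:

    # Hypothetical: find which of the tests that precede a flaky test is
    # making it flaky, assuming a single culprit and a reproducible trigger.
    def find_culprit(preceding_tests, runs_flaky):
        candidates = list(preceding_tests)
        while len(candidates) > 1:
            half = candidates[:len(candidates) // 2]
            # If running just the first half before the flaky test still
            # triggers the flake, the culprit is in that half; otherwise
            # it must be in the second half.
            candidates = half if runs_flaky(half) else candidates[len(half):]
        return candidates[0] if candidates and runs_flaky(candidates) else None

The catch, of course, is that flakiness is probabilistic, so runs_flaky has to repeat each configuration enough times to be convincing, which is exactly why this ends up being so time-consuming in practice.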

> 2) The WONTFIX mode in TestExpectations feels to me more like a statement that you're just trying to see if the test doesn't crash. Correct? Either way, it's confusing.

Yes, I'd argue that they should just be skipped, but that's bikeshedding for another time.

- Ryosuke