[webkit-dev] can we stop using Skipped files?

Fri Jun 8 12:50:37 PDT 2012

On Jun 8, 2012, at 12:31 PM, Dirk Pranke wrote:

> On Fri, Jun 8, 2012 at 10:56 AM, Filip Pizlo <fpizlo at apple.com> wrote:
>> 
>> It's a lot harder to dive into, a lot more cumbersome to improve, and not
>> any easier to maintain.
>> 
> 
> I definitely agree that NRWT is more complicated than it seems like it
> should be; it got contorted as we added all the features we needed to
> add, and I have been in a "simplify" mode over the past few months. I
> would welcome any feedback where you think things are overly complex.

This is a difficult question - it's unfortunately easier to observe that something is complex than to pinpoint why it is complex.  But I will try.

1) Code locality.  I can open Tools/Scripts/old-run-webkit-tests and pretty rapidly discover (a) how options are parsed, (b) how platform differences are handled, (c) how tests are found, and (d) how tests are run.  I can hack all of this code because it's all in one place.  I don't have to be a domain expert to do it.  Hell, I don't even have to be good at Perl to find my way around.
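To make the locality point concrete, here is a hypothetical, heavily simplified runner that keeps all four phases (a)-(d) in one file. It is written in Python and is not old-run-webkit-tests itself, which is Perl. The `--root`/`--skip` options, the platform-name mapping, and the `driver` callable (a stand-in for launching DumpRenderTree) are illustrative assumptions. The `<name>-expected.txt` baseline naming and the `platform/<name>/` override directory follow the LayoutTests layout. This is a sketch of the shape, not a working replacement.

```python
import argparse
import os
import sys

# (a) How options are parsed. These flags are hypothetical.
def parse_options(argv):
    parser = argparse.ArgumentParser(description="toy layout-test runner")
    parser.add_argument("--root", default="LayoutTests")
    parser.add_argument("--skip", action="append", default=[],
                        help="relative path prefix to skip; may be repeated")
    return parser.parse_args(argv)

# (b) How platform differences are handled: one small, hypothetical lookup.
def platform_name():
    return {"darwin": "mac", "win32": "win"}.get(sys.platform, "linux")

# (c) How tests are found: every .html file under root, minus skipped prefixes.
def find_tests(root, skip):
    tests = []
    for dirpath, dirnames, filenames in os.walk(root):
        # platform/ holds per-platform baselines, not tests, so don't descend.
        dirnames[:] = [d for d in dirnames if d != "platform"]
        for name in filenames:
            if name.endswith(".html"):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                rel = rel.replace(os.sep, "/")
                if not any(rel.startswith(prefix) for prefix in skip):
                    tests.append(rel)
    return sorted(tests)

# (d) How tests are run: compare driver output with <test>-expected.txt,
# preferring a baseline under platform/<name>/ when one exists.
def expected_path(root, test):
    base = os.path.splitext(test)[0] + "-expected.txt"
    specific = os.path.join(root, "platform", platform_name(), base)
    return specific if os.path.exists(specific) else os.path.join(root, base)

def run_tests(root, tests, driver):
    results = {}
    for test in tests:
        path = expected_path(root, test)
        if not os.path.exists(path):
            results[test] = "MISSING"  # no baseline checked in yet
            continue
        with open(path) as f:
            expected = f.read()
        actual = driver(os.path.join(root, test))  # hypothetical driver hook
        results[test] = "PASS" if actual == expected else "FAIL"
    return results

def main(argv, driver):
    options = parse_options(argv)
    tests = find_tests(options.root, options.skip)
    return run_tests(options.root, tests, driver)
```

Each question a newcomer might ask ("where are options parsed?", "where does it find tests?") maps to one function in one file. That is the property described above, whatever language it is written in.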

2) Code size:

[pizlo at wartooth OpenSource] wc Tools/Scripts/old-run-webkit-tests 
    2796   10316   98733 Tools/Scripts/old-run-webkit-tests

[pizlo at wartooth OpenSource] wc Tools/Scripts/new-run-webkit-tests Tools/Scripts/webkitpy/layout_tests/*.py Tools/Scripts/webkitpy/layout_tests/*/*.py
.... bunch of stuff
   23197   91897 1049914 total

That's a *HUGE* difference.  Consider that NRWT adds only one thing that most people care about: parallelism.  Is an 8x increase in code size justified?

I know that LoC metrics are evil in most cases.  But this is not most cases.  This is an order-of-magnitude difference.  That's 8x more code I have to look at to find what I want.  That's 8x more code that I potentially have to edit to add a feature.  That's 8x more code that could have a bug.  And so on.  Badness!
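The ratios behind the "8x" figure can be checked directly from the two `wc` totals quoted above. This snippet only reuses those numbers; it does not re-measure anything.

```python
# (lines, words, bytes) totals copied from the two `wc` runs above.
OLD_RWT = (2796, 10316, 98733)    # old-run-webkit-tests alone
NRWT = (23197, 91897, 1049914)    # new-run-webkit-tests plus webkitpy/layout_tests

def growth_ratios(old, new):
    """Return how many times larger `new` is than `old` for each wc column."""
    return {label: n / o for label, o, n in zip(("lines", "words", "bytes"), old, new)}

for label, ratio in growth_ratios(OLD_RWT, NRWT).items():
    print(f"{label}: {ratio:.1f}x")
# prints:
# lines: 8.3x
# words: 8.9x
# bytes: 10.6x
```

By line count the growth is about 8x, and by bytes it is a little over 10x. That is where the order-of-magnitude description comes from.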

> 
>> The fact that it is unittested is part of the problem.  It restricts what
>> you can do to the interfaces used internally in the code, which makes larger
>> changes much harder to pull off.  It also steepens the learning curve.
> 
> This is true. Of course, like all tests, it's a tradeoff. If it's not
> tested, it's very easy for one to change things and break some other
> feature that people need.

But the act of running layout tests is in itself a regression test of the layout test harness just as much as it is a regression test of WebKit.

So the unit tests are superfluous.  To be fair, if I had to choose between having only unit tests and having only regression tests, I might pick unit tests.  But if I already have regression tests, then I'm unlikely to want to incur technical debt to build unit tests, particularly since unit tests require changing the infrastructure to make the code more testable, which then leads to the problems listed above.

> 
>> Look, the test harness should be a thing that helps us get work done, not an
>> end unto itself.  The harness should err on the side of understandability,
>> hackability, and, in short, simplicity.
> 
> I definitely agree that the harness should be a means and not an end.
> I of course like those ilities, too, but we also  include
> functionality and performance, and both of those usually bring in
> complexity :(.

Right, and we are all for functionality and performance.  That's good stuff.

But to me, one of the primary goals of tooling infrastructure should be that it is as simple as possible.  This is of course a huge challenge - doing more with less is always harder than allowing the code to bloat ad infinitum.  But the need for simplicity is particularly acute in something that is as central to our understanding of WebKit's conformance and correctness as the test harness.  This is the part of the system that must not be wrong or flaky, because if it is, then it diminishes the value of having tests at all, which then leads to horrible bugs not getting caught.

> 
>> Python feels like the wrong technology to use for multithreading.  I'm
>> curious if, in the future, next time we rewrite RWT (and I do believe that
>> there will be such a "next time"), we could pick a language that allows this
>> more naturally.
> 
> It is true that Python's multithreading support is not great (it works
> as long as you don't need to interact with subprocesses, and as long
> as you don't need truly concurrent threads executing python code,
> because of the global interpreter lock). However, much of NRWT is built
> around a shared-nothing message-passing approach; do you think that
> was the wrong architecture, or would you have preferred to use a tool
> that had the same architecture but did it more easily (perhaps in
> something like Erlang)?

I really like the approach of shared-nothing message-passing, but was under the (possibly mistaken) impression that NRWT has some threading in it.

I just wish that the message passing could be done with less code.  8x less code.
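For a sense of scale, the shared-nothing message-passing shape discussed above can be sketched in a few dozen lines. This is a minimal illustration and not NRWT's code. It uses threads and `queue.Queue` for brevity, and the workers are "shared-nothing" only by convention: each one touches nothing but its own inbox and outbox messages. Because of the global interpreter lock mentioned in the quoted reply, a harness that needs truly concurrent Python would use processes instead, for example `multiprocessing.Queue`, which offers the same `put`/`get` calls. `run_one_test` is a hypothetical placeholder for whatever actually drives a test.

```python
import queue
import threading

STOP = None  # sentinel message telling a worker to exit

def worker(worker_id, inbox, outbox, run_one_test):
    """Read test names from inbox and post (worker_id, test, outcome) to outbox."""
    while True:
        test = inbox.get()
        if test is STOP:
            break
        outbox.put((worker_id, test, run_one_test(test)))

def run_in_parallel(tests, run_one_test, num_workers=4):
    inbox, outbox = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, inbox, outbox, run_one_test))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for test in tests:
        inbox.put(test)
    for _ in threads:
        inbox.put(STOP)  # queued after every test, so all tests are handed out first
    results = {}
    for _ in tests:
        _, test, outcome = outbox.get()
        results[test] = outcome
    for t in threads:
        t.join()
    return results
```

The manager only ever sends test names and receives results. Swapping the threads for processes would change the transport, not the protocol.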

I am intrigued by the notion of using Erlang, but worry that it would reduce effective hackability due to there being less Erlang experience in the universe.  I also don't want the porting of Erlang's runtime to be a gating factor for porting WebKit itself.  Hence, I fear that we should stick to broadly accepted languages, like Python, or Perl, or if need be, C++.

-Filip