[webkit-dev] Parallel JavaScript: Why a separate ParallelArray types

Fri Apr 12 11:54:37 PDT 2013

On Apr 12, 2013, at 8:36 AM, "Hudson, Rick" <rick.hudson at intel.com> wrote:

> I'm assuming that we agree that JavaScript must evolve to leverage the hardware's power stingy parallelism.

True, but there is also the question of how high of a priority we should give to this relative to other project goals, and also what opportunity costs are introduced from any particular implementation of parallelism that we choose.  What you are proposing will only work for code that performs non-effectful operations.  This is a limited form of parallelism, and I'm not sure that it's a general enough programming model to warrant adding new things to the platform.

> For completeness there seems to be the following approaches. 
> 
> 1)	Add nothing to the language and rely on increasingly sophisticated compilers to detect opportunities to safely parallelize computations within the current semantics. 
> 2)	Add methods to Array that provide parallel semantics. 
> 3)	Add syntax and keywords such as parallelFor to introduce parallelism.
> 4)	Add a parallel type (ParallelArray) and associated methods that provide parallel semantics.

You forgot the obvious ones:

5) Message passing
6) Shared mutable memory with either threads and locks, or something along those lines.

We sort of already have (5) but it is half baked; I think there is opportunity to improve it.

We don't have (6).  (6) subsumes all of (1), (2), (3), (4), and (5) as it would allow the programmer to roll all of those primitives in JS.

We should not reject shared mutable memory out of hand.  It is the prevailing technique for implementing concurrency and parallelism in the other languages that people use.

> 
> The _nothing_ approach would have to deal with array methods, such as map and reduce, that have sequential semantics. For example reduce is really reduce right (or fold right) and these semantics defeat parallelization. For example the EIC signature for the function/closure passed to reduce assumes E === C[I] but the value E could be the result of previous invocation of the function.  This approach seems to be strictly worse than the methods approach. A similar nothing approach is to analyze loops to mechanically discover opportunities for parallelization. While there exists decades of literature that have been full of optimism the reality is that this approach has not achieved wide spread success. 

Part of that lack of success stems from the fact that most code that does things to arrays has subtle loop-carried side-effects.  The closures passed to ParallelArray methods could have those, also; and if they did then you wouldn't be able to soundly parallelize.

You haven't convinced me that ParallelArray will be any more successful than the full autoparallelization approach.  You also haven't convinced me that it will be any more successful than other eclectic ideas, like OpenMP, for example.  I don't think that such annotation-based approaches have been particularly successful in other languages.  The languages that have seen the most success in parallel development are precisely those that give *complete* control to the programmer, such as Java, which gives you threads and locks combined with a sensible memory model - and allows you to craft your algorithm in the way that best suits your problem domain.  Why is this approach being rejected?

> 
> The _methods_ approach side steps the semantics issue raised in the nothing approach but does not compose well with the original methods. For example Array has several inherently sequential  methods, such as push and pop, that do not have a reasonable parallel semantics since they force mutation. 
> That said this approach has its proponents. Nico http://smallcultfollowing.com/babysteps/blog/2013/02/26/splitting-the-pjs-api/ suggested that we add unorderedMap and unorderedReduce to indicate that the scheduling of the computation be unordered. While the idea is compelling on the surface it is likely to lead to the development of algorithms that are at first sequential and then hacked to make them parallel. This approach will impede the goal of having programmers develop parallel algorithms and the appropriate parallel data structures. The methods approach would miss a real opportunity here to move programmers away from the sequential models and on to parallel models.

ParallelArray also misses this opportunity.  The only thing that ParallelArray does is adds methods that could have been added to Array, and requires you to say, when you write your code, if you want parallelization or not.  The only problem this solves is inferring the "is it profitable to parallelize" question.  I'm not sure that inferring the answer to this question is hard enough to warrant a new type.

> 
> The _syntax_ approach suggests that we follow the lead of Amp and add parallelFor and siblings to JavaScript. This approach seems to suffer from not being backwardly compatible and like the methods approach misses an opportunity to move programmers away from their sequential mindset.
> 
> The _parallel type_ approach provides a clear separation from the semantics of Array and in doing so does a much better job of encouraging programmers to build parallel algorithms and data structures instead of attempting to parallelize sequential algorithms.

The methods approach also provides such a clear separation.

Just adding the word "Parallel" to the name of a new type and saying that this is the only type that benefits from parallelism does not constitute positive encouragement of any kind.  Quite the opposite, it seems the only thing it encourages is for VM engineers to give up on the inference approach entirely.

> The contract between the programmer and the system can be better defined if ParallelArray exists independently of Array and the two prototypes could develop independently. That is not to say that we shouldn't follow the principle of least surprise and leverage the programmer's familiarity with the Array methods signatures. For example the kernel/callback function should follow the same conventions found in the Array.map signature such as the call back (kernel) function formals being element, index, array.

But this is a double-edged sword: having a separate array type with separate semantics incurs opportunity costs for optimization.  As the two prototypes diverge, there will be optimizations that get implemented less well in one than in the other.

Supporting more types in a compiler is never free.

> 
> Furthermore the parallel type approach is a clear indication to the SDE and JIT implementers that the programmer wishes to execute this code in parallel. The SDE will be able to assist the programmer when they don't follow the contract, for example by throwing when it encounters a kernel with side effects.

But how will developers debug this exception?  Likely by using something like the Inspector's exception view which highlights parts of the code that threw.

We can do the same thing for autoparallelization in the "methods" approach: we can highlight those array method calls that were parallelized, as well as those that didn't.

I don't buy your argument that ParallelArray handles this developer scenario any better than the other approaches.  Whether the failure case throws an exception, or not, either way we can highlight the fact that one kernel or another wasn't parallelized.

> While the parallel type approach could be implemented on top of the methods or even the syntax approach information about the intent of the programmer would be lost to the JIT and the SDE. In summary the real value in adding ParallelArray as its own type is that it helps move the developer structure his programs and choose his algorithms with parallelism in mind and convey that intent clearly. 

This argument can be used in favor of any parallel or concurrent programming model.  If I run s/ParallelArray as its own type/threads and locks/ on your statement, I get a convincing argument for the traditional approach to shared memory parallelism.

I do believe that parallelism, and concurrency, are features that should be added to JavaScript.  But I think we should aim higher than adding a limited-use restricted programming model for this purpose.

-Filip