rick.hudson at intel.com
Fri Apr 12 08:36:08 PDT 2013
1) Add nothing to the language and rely on increasingly sophisticated compilers to detect opportunities to safely parallelize computations within the current semantics.
2) Add methods to Array that provide parallel semantics.
3) Add syntax and keywords such as parallelFor to introduce parallelism.
4) Add a parallel type (ParallelArray) and associated methods that provide parallel semantics.
The _nothing_ approach would have to deal with array methods, such as map and reduce, that have sequential semantics. For example reduce is really reduce right (or fold right) and these semantics defeat parallelization. For example the EIC signature for the function/closure passed to reduce assumes E === C[I] but the value E could be the result of previous invocation of the function. This approach seems to be strictly worse than the methods approach. A similar nothing approach is to analyze loops to mechanically discover opportunities for parallelization. While there exists decades of literature that have been full of optimism the reality is that this approach has not achieved wide spread success.
The _methods_ approach side steps the semantics issue raised in the nothing approach but does not compose well with the original methods. For example Array has several inherently sequential methods, such as push and pop, that do not have a reasonable parallel semantics since they force mutation.
That said this approach has its proponents. Nico http://smallcultfollowing.com/babysteps/blog/2013/02/26/splitting-the-pjs-api/ suggested that we add unorderedMap and unorderedReduce to indicate that the scheduling of the computation be unordered. While the idea is compelling on the surface it is likely to lead to the development of algorithms that are at first sequential and then hacked to make them parallel. This approach will impede the goal of having programmers develop parallel algorithms and the appropriate parallel data structures. The methods approach would miss a real opportunity here to move programmers away from the sequential models and on to parallel models.
The _parallel type_ approach provides a clear separation from the semantics of Array and in doing so does a much better job of encouraging programmers to build parallel algorithms and data structures instead of attempting to parallelize sequential algorithms. The contract between the programmer and the system can be better defined if ParallelArray exists independently of Array and the two prototypes could develop independently. That is not to say that we shouldn't follow the principle of least surprise and leverage the programmer's familiarity with the Array methods signatures. For example the kernel/callback function should follow the same conventions found in the Array.map signature such as the call back (kernel) function formals being element, index, array.
Furthermore the parallel type approach is a clear indication to the SDE and JIT implementers that the programmer wishes to execute this code in parallel. The SDE will be able to assist the programmer when they don't follow the contract, for example by throwing when it encounters a kernel with side effects. While the parallel type approach could be implemented on top of the methods or even the syntax approach information about the intent of the programmer would be lost to the JIT and the SDE. In summary the real value in adding ParallelArray as its own type is that it helps move the developer structure his programs and choose his algorithms with parallelism in mind and convey that intent clearly.
More information about the webkit-dev