[webkit-dev] SIMD support in JavaScript

Sun Sep 28 11:52:00 PDT 2014

-Filip
> On Sep 28, 2014, at 6:44 AM, Dan Gohman <sunfish at mozilla.com> wrote:
> 
> Hi Nadav,
> 
> I agree with much of your assessment of the the proposed SIMD.js API.
> However, I don't believe it's unsuitability for some problems
> invalidates it for solving other very important problems, which it is
> well suited for. Performance portability is actually one of SIMD.js'
> biggest strengths: it's not the kind of performance portability that
> aims for a consistent percentage of peak on every machine (which, as you
> note, of course an explicit 128-bit SIMD API won't achieve), it's the
> kind of performance portability that achieves predictable performance
> and minimizes surprises across machines (though yes, there are some
> unavoidable ones, but overall the picture is quite good).

Would you claim that if there exists even just one corner case use of a language feature then this is sufficient to add the feature to the language specification and have all browsers implement it?

The problem with SIMD.js is that it is even less useful than native SIMD. To me, "less useful than SIMD" is a euphemism for "useless".

-Filip

> 
>> On 09/26/2014 03:16 PM, Nadav Rotem wrote:
>> So far, I’ve explained why I believe SIMD.js will not be
>> performance-portable and why it will not utilize modern instruction
>> sets, but I have not made a suggestion on how to use vector
>> instructions to accelerate JavaScript programs. Vectorization, like
>> instruction scheduling and register allocation, is a code-generation
>> problem. In order to solve these problems, it is necessary for the
>> compiler to have intimate knowledge of the architecture. Forcing the
>> compiler to use a specific instruction or a specific data-type is the
>> wrong answer. We can learn a lesson from the design of compilers for
>> data-parallel languages. GPU programs (shaders and compute languages,
>> such as OpenCL and GLSL) are written using vector instructions because
>> the domain of the problem requires vectors (colors and coordinates).
>> One of the first thing that data-parallel compilers do is to break
>> vector instructions into scalars (this process is called
>> scalarization). After getting rid of the vectors that resulted from
>> the problem domain, the compiler may begin to analyze the program,
>> calculate profitability, and make use of the available instruction set.
> 
>> I believe that it is the responsibility of JIT compilers to use vector
>> instructions. In the implementation of the Webkit’s FTL JIT compiler,
>> we took one step in the direction of using vector instructions. LLVM
>> already vectorizes some code sequences during instruction selection,
>> and we started investigating the use of LLVM’s Loop and SLP
>> vectorizers. We found that despite nice performance gains on a number
>> of workloads, we experienced some performance regressions on Intel’s
>> Sandybridge processors, which is currently a very popular desktop
>> processor. JavaScript code contains many branches (due to dynamic
>> speculation). Unfortunately, branches on Sandybridge execute on Port5,
>> which is also where many vector instructions are executed. So,
>> pressure on Port5 prevented performance gains. The LLVM vectorizer
>> currently does not model execution port pressure and we had to disable
>> vectorization in FTL. In the future, we intend to enable more
>> vectorization features in FTL.
> 
> This is an example of a weakness of depending on automatic vectorization
> alone. High-level language features create complications which can lead
> to surprising performance problems. Compiler transformations to target
> specialized hardware features often have widely varying applicability.
> Expensive analyses can sometimes enable more and better vectorization,
> but when a compiler has to do an expensive complex analysis in order to
> optimize, it's unlikely that a programmer can count on other compilers
> doing the exact same analysis and optimizing in all the same cases. This
> is a problem we already face in many areas of compilers, but it's more
> pronounced with vectorization than many other optimizations.
> 
> In contrast, the proposed SIMD.js has the property that code using it
> will not depend on expensive compiler analysis in the JIT, and is much
> more likely to deliver predictable performance in practice between
> different JIT implementations and across a very practical variety of
> hardware architectures.
> 
>> 
>> To summarize, SIMD.js will not provide a portable performance solution
>> because vector instruction sets are sparse and vary between
>> architectures and generations. Emscripten should not generate vector
>> instructions because it can’t model the target machine. SIMD.js will
>> not make use of modern SIMD features such as predication or
>> scatter/gather. Vectorization is a compiler code generation problem
>> that should be solved by JIT compilers, and not by the language
>> itself. JIT compilers should continue to evolve and to start
>> vectorizing code like modern compilers.
> 
> As I mentioned above, performance portability is actually one of
> SIMD.js's core strengths.
> 
> I have found it useful to think of the API propsed in SIMD.js as a
> "short vector" API. It hits a sweet spot, being a convenient size for
> many XYZW and RGB/RGBA and similar algorithms, being implementable on a
> wide variety of very relevant hardware architectures, being long enough
> to deliver worthwhile speedups for many tasks, and being short enough to
> still be convenient to manipulate.
> 
> I agree that the "short vector" model doesn't address all use cases, so
> I also believe a "long vector" approach would be very desirable as well.
> Such an approach could be based on automatic loop vectorization, a SPMD
> programming model, or something else. I look forward to discussing ideas
> for this. Such approaches have the potential to be much more scalable
> and adaptable, and can be much better positioned to solve those problems
> that the presently proposed SIMD.js API doesn't attempt to solve. I
> believe there is room for both approaches to coexist, and to serve
> distinct sets of needs.
> 
> In fact, a good example of short and long vector models coexisting is in
> these popular GPU programming models that you mentioned, where short
> vectors represent things in the problem domains like colors and
> coordinates, and are then broken down by the compiler to participate in
> the long vectors, as you described. It's very plausible that the
> proposed SIMD.js could be adapted to combine with a future long-vector
> approach in the same way.
> 
> Dan
> _______________________________________________
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
> https://lists.webkit.org/mailman/listinfo/webkit-dev