[webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled

Thu Sep 20 00:00:44 PDT 2018

I've just set up a MacBook Pro to measure the effect on macOS.

The results are as follows.

VMs tested:
"baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/jsc
"patched" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/jsc

Collected 2 samples per benchmark/VM, with 2 VM invocations per benchmark.
Emitted a call to gc() between sample measurements. Used 1 benchmark
iteration per VM invocation for warm-up. Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in milliseconds.

                                           baseline                  patched

ai-astar                              1738.056+-49.666     ^    1568.904+-44.535        ^ definitely 1.1078x faster
audio-beat-detection                  1127.677+-15.749     ^     972.323+-23.908        ^ definitely 1.1598x faster
audio-dft                              942.952+-107.209          919.933+-310.247         might be 1.0250x faster
audio-fft                              985.489+-47.414     ^     796.955+-25.476        ^ definitely 1.2366x faster
audio-oscillator                       967.891+-34.854     ^     801.778+-18.226        ^ definitely 1.2072x faster
imaging-darkroom                      1265.340+-114.464    ^    1099.233+-2.372         ^ definitely 1.1511x faster
imaging-desaturate                    1737.826+-40.791     ?    1749.010+-167.969       ?
imaging-gaussian-blur                 7846.369+-52.165     ^    6392.379+-1025.168      ^ definitely 1.2275x faster
json-parse-financial                    33.141+-0.473             33.054+-1.058
json-stringify-tinderbox                20.803+-0.901             20.664+-0.717
stanford-crypto-aes                    401.589+-39.750           376.622+-12.111          might be 1.0663x faster
stanford-crypto-ccm                    245.629+-45.322           228.013+-8.976           might be 1.0773x faster
stanford-crypto-pbkdf2                 941.178+-28.744           864.462+-60.083          might be 1.0887x faster
stanford-crypto-sha256-iterative       299.988+-47.729           270.849+-32.356          might be 1.1076x faster

<arithmetic>                          1325.281+-2.613      ^    1149.584+-75.875        ^ definitely 1.1528x faster

Interestingly, the improvement is smaller here. On the Linux box it was 2x,
but on macOS it is about 15% (1.1528x in the arithmetic mean). Still, I think
a 15% boost without any drawbacks would be very nice.
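For reference, each "Nx faster" figure in these reports matches the baseline
mean divided by the patched mean (for ai-astar, 1738.056 / 1568.904 is about
1.1078). A minimal sketch, assuming a POSIX shell with awk, that recomputes the
two <arithmetic> ratios from the means in the tables; `speedup` is a
hypothetical helper, not part of WebKit's benchmark tooling:

```shell
#!/bin/sh
# speedup BASELINE_MS PATCHED_MS
# Prints baseline/patched with four decimals, in the same form as the
# "definitely 1.1528x faster" lines of the report.
# (Hypothetical helper for illustration, not part of the WebKit tools.)
speedup() {
    awk -v b="$1" -v p="$2" 'BEGIN { printf "%.4fx\n", b / p }'
}

# <arithmetic> means (milliseconds) from the two Kraken runs.
speedup 1325.281 1149.584   # macOS
speedup 2596.775 1312.383   # x64 Linux
```

This prints 1.1528x for macOS and 1.9787x for x64 Linux, i.e. roughly a 15%
speedup versus roughly 2x.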

On Thu, Sep 20, 2018 at 3:08 PM Saam Barati <sbarati at apple.com> wrote:

> Interesting! I must not have run this experiment correctly when I did it.
>
> - Saam
>
> On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki <yusukesuzuki at slowstart.org>
> wrote:
>
> On Thu, Sep 20, 2018 at 12:54 AM Saam Barati <sbarati at apple.com> wrote:
>
>> To elaborate: I ran this same experiment before. And I forgot to turn off
>> the RegExp JIT and got results similar to what you got. Once I turned off
>> the RegExp JIT, I saw no perf difference.
>>
>
> Yeah, I disabled JIT and RegExpJIT explicitly by using
>
> export JSC_useJIT=false
> export JSC_useRegExpJIT=false
>
> and checked that no JIT code is generated by running with dumpDisassembly.
> I also put `CRASH()` in ExecutableAllocator::singleton() to ensure no
> executable memory is allocated.
> The result is the same. I think `useJIT=false` disables RegExp JIT too.
>
>                                            baseline                  patched
>
> ai-astar                              3499.046+-14.772     ^    1897.624+-234.517       ^ definitely 1.8439x faster
> audio-beat-detection                  1803.466+-491.965          970.636+-428.051         might be 1.8580x faster
> audio-dft                             1756.985+-68.710     ^     954.312+-528.406       ^ definitely 1.8411x faster
> audio-fft                             1637.969+-458.129          850.083+-449.228         might be 1.9268x faster
> audio-oscillator                      1866.006+-569.581    ^     967.194+-82.521        ^ definitely 1.9293x faster
> imaging-darkroom                      2156.526+-591.042    ^    1231.318+-187.297       ^ definitely 1.7514x faster
> imaging-desaturate                    3059.335+-284.740    ^    1754.128+-339.941       ^ definitely 1.7441x faster
> imaging-gaussian-blur                16034.828+-1930.938   ^    7389.919+-2228.020      ^ definitely 2.1698x faster
> json-parse-financial                    60.273+-4.143             53.935+-28.957          might be 1.1175x faster
> json-stringify-tinderbox                39.497+-3.915             38.146+-9.652           might be 1.0354x faster
> stanford-crypto-aes                    873.623+-208.225    ^     486.350+-132.379       ^ definitely 1.7963x faster
> stanford-crypto-ccm                    538.707+-33.979     ^     285.944+-41.570        ^ definitely 1.8840x faster
> stanford-crypto-pbkdf2                1929.960+-649.861    ^    1044.320+-1.182         ^ definitely 1.8481x faster
> stanford-crypto-sha256-iterative       614.344+-200.228          342.574+-123.524         might be 1.7933x faster
>
> <arithmetic>                          2562.183+-207.456    ^    1304.749+-312.963       ^ definitely 1.9637x faster
>
> I think this result is not related to RegExp JIT since ai-astar is not
> using RegExp.
>
> Best regards,
> Yusuke Suzuki
>
>
>>
>> - Saam
>>
>> On Sep 19, 2018, at 8:53 AM, Saam Barati <sbarati at apple.com> wrote:
>>
>> Did you turn off the RegExp JIT?
>>
>> - Saam
>>
>> On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki <yusukesuzuki at slowstart.org>
>> wrote:
>>
>> Hi WebKittens!
>>
>> Recently, node-jsc was announced[1]. When I read that project's
>> documentation, I found that it uses the LLInt ASM interpreter instead of
>> CLoop in non-JIT environments. So one question came to mind: how fast is
>> the LLInt ASM interpreter compared to CLoop?
>>
>> I set up two builds: a CLoop build (-DENABLE_JIT=OFF) and a JIT-enabled
>> JSC build run with `JSC_useJIT=false`. Then I ran the Kraken benchmarks
>> with both builds on an x64 Linux machine. The results are as follows.
>>
>> Benchmark report for Kraken on sakura-trick.
>>
>> VMs tested:
>> "baseline" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/bin/jsc
>> "patched" at /home/yusukesuzuki/dev/WebKit/WebKitBuild/nojit-llint/Release/bin/jsc
>>
>> Collected 10 samples per benchmark/VM, with 10 VM invocations per
>> benchmark. Emitted a call to gc() between sample
>> measurements. Used 1 benchmark iteration per VM invocation for warm-up.
>> Used the jsc-specific preciseTime()
>> function to get microsecond-level timing. Reporting benchmark execution
>> times with 95% confidence intervals in
>> milliseconds.
>>
>>                                            baseline                  patched
>>
>> ai-astar                              3619.974+-57.095     ^    2014.835+-59.016        ^ definitely 1.7967x faster
>> audio-beat-detection                  1762.085+-24.853     ^    1030.902+-19.743        ^ definitely 1.7093x faster
>> audio-dft                             1822.426+-28.704     ^     909.262+-16.640        ^ definitely 2.0043x faster
>> audio-fft                             1651.070+-9.994      ^     865.203+-7.912         ^ definitely 1.9083x faster
>> audio-oscillator                      1853.697+-26.539     ^     992.406+-12.811        ^ definitely 1.8679x faster
>> imaging-darkroom                      2118.737+-23.219     ^    1303.729+-8.071         ^ definitely 1.6251x faster
>> imaging-desaturate                    3133.654+-28.545     ^    1759.738+-18.182        ^ definitely 1.7808x faster
>> imaging-gaussian-blur                16321.090+-154.893    ^    7228.017+-58.508        ^ definitely 2.2580x faster
>> json-parse-financial                    57.256+-2.876             56.101+-4.265           might be 1.0206x faster
>> json-stringify-tinderbox                38.470+-2.788      ?      38.771+-0.935         ?
>> stanford-crypto-aes                    851.341+-7.738      ^     485.438+-13.904        ^ definitely 1.7538x faster
>> stanford-crypto-ccm                    556.133+-6.606      ^     264.161+-3.970         ^ definitely 2.1053x faster
>> stanford-crypto-pbkdf2                1945.718+-15.968     ^    1075.013+-13.337        ^ definitely 1.8099x faster
>> stanford-crypto-sha256-iterative       623.203+-7.604      ^     349.782+-12.810        ^ definitely 1.7817x faster
>>
>> <arithmetic>                          2596.775+-14.857     ^    1312.383+-8.840         ^ definitely 1.9787x faster
>>
>> Surprisingly, the LLInt ASM interpreter is significantly faster than
>> CLoop. I expected it to be faster, but only by around 10%. In reality, it
>> is about 2x faster. That is a large enough win that I think we should
>> consider enabling the LLInt ASM interpreter in the non-JIT build
>> configuration. As a bonus, the LLInt ASM interpreter offers sampling
>> profiler support even in non-JIT environments.
>>
>> So my proposal is: how about enabling the LLInt ASM interpreter in non-JIT
>> configurations on major architectures (x64 and ARM64)?
>>
>> Best regards,
>> Yusuke Suzuki
>>
>> [1]: https://lists.webkit.org/pipermail/webkit-dev/2018-September/030140.html
>>
>> _______________________________________________
>> webkit-dev mailing list
>> webkit-dev at lists.webkit.org
>> https://lists.webkit.org/mailman/listinfo/webkit-dev
>>
>> _______________________________________________
>> jsc-dev mailing list
>> jsc-dev at lists.webkit.org
>> https://lists.webkit.org/mailman/listinfo/jsc-dev
>>
>>