[webkit-dev] [jsc-dev] Proposal: Using LLInt Asm in major architectures even if JIT is disabled
yusukesuzuki at slowstart.org
Thu Sep 20 00:00:44 PDT 2018
I've just set up a MacBook Pro to measure the effect on macOS.
The results are as follows.
"baseline" at /Users/yusukesuzuki/dev/WebKit/WebKitBuild/nojit/Release/jsc
Collected 2 samples per benchmark/VM, with 2 VM invocations per benchmark.
Emitted a call to gc() between sample
measurements. Used 1 benchmark iteration per VM invocation for warm-up.
Used the jsc-specific preciseTime()
function to get microsecond-level timing. Reporting benchmark execution
times with 95% confidence intervals in
ai-astar 1738.056+-49.666 ^
1568.904+-44.535 ^ definitely 1.1078x faster
audio-beat-detection 1127.677+-15.749 ^
972.323+-23.908 ^ definitely 1.1598x faster
919.933+-310.247 might be 1.0250x faster
audio-fft 985.489+-47.414 ^
796.955+-25.476 ^ definitely 1.2366x faster
audio-oscillator 967.891+-34.854 ^
801.778+-18.226 ^ definitely 1.2072x faster
imaging-darkroom 1265.340+-114.464 ^
1099.233+-2.372 ^ definitely 1.1511x faster
imaging-desaturate 1737.826+-40.791 ?
imaging-gaussian-blur 7846.369+-52.165 ^
6392.379+-1025.168 ^ definitely 1.2275x faster
376.622+-12.111 might be 1.0663x faster
228.013+-8.976 might be 1.0773x faster
864.462+-60.083 might be 1.0887x faster
270.849+-32.356 might be 1.1076x faster
<arithmetic> 1325.281+-2.613 ^
1149.584+-75.875 ^ definitely 1.1528x faster
Interestingly, the improvement is not as large here. On the Linux box it was about 2x,
but on macOS it is about 15%.
Still, I think it would be very nice to get a 15% boost without any drawbacks.
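
For readers who want to check where the "15%" and "2x" figures come from, here is a minimal sketch. It assumes the "Nx faster" value in each report is simply the baseline mean divided by the patched mean, which matches the printed numbers. The function name `speedup` is hypothetical and is not part of any WebKit tooling. The inputs are the <arithmetic> rows from the macOS report above and from the original Linux report quoted below.

```python
def speedup(baseline_time: float, patched_time: float) -> float:
    """Return how many times faster the patched build is than the baseline.

    Assumption: the report's "Nx faster" is the ratio of the two means.
    """
    if patched_time <= 0:
        raise ValueError("patched time must be positive")
    return baseline_time / patched_time


# <arithmetic> means copied from the two reports in this thread.
macos = speedup(1325.281, 1149.584)
linux = speedup(2596.775, 1312.383)

print(f"macOS: {macos:.4f}x faster ({(macos - 1) * 100:.1f}% boost)")
print(f"Linux: {linux:.4f}x faster ({(linux - 1) * 100:.1f}% boost)")
```

The ratio ignores the confidence intervals, so it only reproduces the headline number and not the "definitely" or "might be" verdict.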
On Thu, Sep 20, 2018 at 3:08 PM Saam Barati <sbarati at apple.com> wrote:
> Interesting! I must have not run this experiment correctly when I did it.
> - Saam
> On Sep 19, 2018, at 7:31 PM, Yusuke Suzuki <yusukesuzuki at slowstart.org>
> On Thu, Sep 20, 2018 at 12:54 AM Saam Barati <sbarati at apple.com> wrote:
>> To elaborate: I ran this same experiment before. And I forgot to turn off
>> the RegExp JIT and got results similar to what you got. Once I turned off
>> the RegExp JIT, I saw no perf difference.
> Yeah, I disabled JIT and RegExpJIT explicitly by using
> export JSC_useJIT=false
> export JSC_useRegExpJIT=false
> and I checked that no JIT code is generated by running dumpDisassembly. And I
> also put `CRASH()` in ExecutableAllocator::singleton() to ensure no
> executable memory is allocated.
> The result is the same. I think `useJIT=false` disables RegExp JIT too.
> ai-astar 3499.046+-14.772 ^
> 1897.624+-234.517 ^ definitely 1.8439x faster
> audio-beat-detection 1803.466+-491.965
> 970.636+-428.051 might be 1.8580x faster
> audio-dft 1756.985+-68.710 ^
> 954.312+-528.406 ^ definitely 1.8411x faster
> audio-fft 1637.969+-458.129
> 850.083+-449.228 might be 1.9268x faster
> audio-oscillator 1866.006+-569.581 ^
> 967.194+-82.521 ^ definitely 1.9293x faster
> imaging-darkroom 2156.526+-591.042 ^
> 1231.318+-187.297 ^ definitely 1.7514x faster
> imaging-desaturate 3059.335+-284.740 ^
> 1754.128+-339.941 ^ definitely 1.7441x faster
> imaging-gaussian-blur 16034.828+-1930.938 ^
> 7389.919+-2228.020 ^ definitely 2.1698x faster
> json-parse-financial 60.273+-4.143
> 53.935+-28.957 might be 1.1175x faster
> json-stringify-tinderbox 39.497+-3.915
> 38.146+-9.652 might be 1.0354x faster
> stanford-crypto-aes 873.623+-208.225 ^
> 486.350+-132.379 ^ definitely 1.7963x faster
> stanford-crypto-ccm 538.707+-33.979 ^
> 285.944+-41.570 ^ definitely 1.8840x faster
> stanford-crypto-pbkdf2 1929.960+-649.861 ^
> 1044.320+-1.182 ^ definitely 1.8481x faster
> stanford-crypto-sha256-iterative 614.344+-200.228
> 342.574+-123.524 might be 1.7933x faster
> <arithmetic> 2562.183+-207.456 ^
> 1304.749+-312.963 ^ definitely 1.9637x faster
> I think this result is not related to RegExp JIT since ai-astar is not
> using RegExp.
> Best regards,
> Yusuke Suzuki
>> - Saam
>> On Sep 19, 2018, at 8:53 AM, Saam Barati <sbarati at apple.com> wrote:
>> Did you turn off the RegExp JIT?
>> - Saam
>> On Sep 18, 2018, at 11:23 PM, Yusuke Suzuki <yusukesuzuki at slowstart.org>
>> Hi WebKittens!
>> Recently, node-jsc was announced. When I read the documents of that
>> I found that they use the LLInt ASM interpreter instead of CLoop in non-JIT
>> So I had one question in mind: how fast is the LLInt ASM interpreter
>> compared to CLoop?
>> I've set up two builds. One is a CLoop build (-DENABLE_JIT=OFF) and the other
>> is a JIT build of JSC run with `JSC_useJIT=false`.
>> I ran the Kraken benchmarks with these two builds on an x64 Linux
>> machine. The results are as follows.
>> Benchmark report for Kraken on sakura-trick.
>> VMs tested:
>> "baseline" at
>> "patched" at
>> Collected 10 samples per benchmark/VM, with 10 VM invocations per
>> benchmark. Emitted a call to gc() between sample
>> measurements. Used 1 benchmark iteration per VM invocation for warm-up.
>> Used the jsc-specific preciseTime()
>> function to get microsecond-level timing. Reporting benchmark execution
>> times with 95% confidence intervals in
>> ai-astar 3619.974+-57.095 ^
>> 2014.835+-59.016 ^ definitely 1.7967x faster
>> audio-beat-detection 1762.085+-24.853 ^
>> 1030.902+-19.743 ^ definitely 1.7093x faster
>> audio-dft 1822.426+-28.704 ^
>> 909.262+-16.640 ^ definitely 2.0043x faster
>> audio-fft 1651.070+-9.994 ^
>> 865.203+-7.912 ^ definitely 1.9083x faster
>> audio-oscillator 1853.697+-26.539 ^
>> 992.406+-12.811 ^ definitely 1.8679x faster
>> imaging-darkroom 2118.737+-23.219 ^
>> 1303.729+-8.071 ^ definitely 1.6251x faster
>> imaging-desaturate 3133.654+-28.545 ^
>> 1759.738+-18.182 ^ definitely 1.7808x faster
>> imaging-gaussian-blur 16321.090+-154.893 ^
>> 7228.017+-58.508 ^ definitely 2.2580x faster
>> json-parse-financial 57.256+-2.876
>> 56.101+-4.265 might be 1.0206x faster
>> json-stringify-tinderbox 38.470+-2.788 ?
>> 38.771+-0.935 ?
>> stanford-crypto-aes 851.341+-7.738 ^
>> 485.438+-13.904 ^ definitely 1.7538x faster
>> stanford-crypto-ccm 556.133+-6.606 ^
>> 264.161+-3.970 ^ definitely 2.1053x faster
>> stanford-crypto-pbkdf2 1945.718+-15.968 ^
>> 1075.013+-13.337 ^ definitely 1.8099x faster
>> stanford-crypto-sha256-iterative 623.203+-7.604 ^
>> 349.782+-12.810 ^ definitely 1.7817x faster
>> <arithmetic> 2596.775+-14.857 ^
>> 1312.383+-8.840 ^ definitely 1.9787x faster
>> Surprisingly, the LLInt ASM interpreter is significantly faster than CLoop. I
>> expected it to be faster, but only by around 10%.
>> In reality it is 2x faster. That is a large enough difference for me to
>> consider enabling the LLInt ASM interpreter for the non-JIT build configuration.
>> As a bonus, the LLInt ASM interpreter offers sampling profiler support even
>> in a non-JIT environment.
>> So my proposal is: how about enabling the LLInt ASM interpreter in the non-JIT
>> configuration on major architectures (x64 and ARM64)?
>> Best regards,
>> Yusuke Suzuki