<html>
<head>
<base href="https://bugs.webkit.org/" />
</head>
<body>
<p>
<div>
<b><a class="bz_bug_link
bz_status_NEW "
title="NEW - Web Inspector: Iterating over a Set/Map is too slow"
href="https://bugs.webkit.org/show_bug.cgi?id=152691#c5">Comment # 5</a>
on <a class="bz_bug_link
bz_status_NEW "
title="NEW - Web Inspector: Iterating over a Set/Map is too slow"
href="https://bugs.webkit.org/show_bug.cgi?id=152691">bug 152691</a>
from <span class="vcard"><a class="email" href="mailto:utatane.tea@gmail.com" title="Yusuke Suzuki <utatane.tea@gmail.com>"> <span class="fn">Yusuke Suzuki</span></a>
</span></b>
<pre>(In reply to <a href="show_bug.cgi?id=152691#c3">comment #3</a>)
<span class="quote">> (In reply to <a href="show_bug.cgi?id=152691#c1">comment #1</a>)
> > I guess we can improve this by making Set.prototype.forEach written in JS.
> > Currently, it is implemented in C++, it avoid inlining the given callback.
>
> But if you have many calls to Set.prototype.forEach, we're not going
> to inline the call.
>
> Is there a fast way to get an array of the elements in the set?
> That way we could just iterate over that array?</span >
I created the small patch and measure the bottlenecks with Linux perf.
The test script, <a href="https://gist.github.com/Constellation/b747d897066ebbc567c8">https://gist.github.com/Constellation/b747d897066ebbc567c8</a>
And I executed the code with `jsc lodash.min.js benchmark.js above-code.js`.
And collect the perf result with `perf record jsc lodash.min.js benchmark.js above-code.js`.
Set forEach x 13,138 ops/sec ±0.21% (66 runs sampled)
And the perf report shows the following.
Samples: 23K of event 'cycles', Event count (approx.): 21446074385
34.04% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::Interpreter::execute(JSC::CallFrameClosure&)
20.48% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] vmEntryToJavaScript
9.80% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::JITCode::execute(JSC::VM*, JSC::ProtoCallFrame*)
7.95% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::setProtoFuncForEach(JSC::ExecState*)
5.65% jsc perf-22854.map [.] 0x00007f5d2c204a6f
2.76% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::JSLock::currentThreadIsHoldingLock()
0.69% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::FTL::JITCode::executableAddressAtOffset(unsigned long)
0.68% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] pthread_self@plt
0.65% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::Interpreter::startSampling()
0.61% jsc libpthread-2.19.so [.] pthread_self
0.60% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::Interpreter::stopSampling()
0.60% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] _ZN3JSC11Interpreter12stopSamplingEv@plt
0.54% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] _ZN3JSC7JITCode7executeEPNS_2VMEPNS_14ProtoCallFrameE@plt
0.52% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] _ZN3JSC6JSLock26currentThreadIsHoldingLockEv@plt
0.38% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] _ZN3JSC11Interpreter7executeERNS_16CallFrameClosureE@plt
0.29% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] _ZN3JSC11Interpreter13startSamplingEv@plt
0.25% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::Lexer<unsigned char>::lex(JSC::JSToken*, unsigned int, bool)
We can see that C++ -> JS call becomes large performance overhead.
So, I've created the small change, adding builtin's Set.prototype.forEach. It results the following,
Set forEach x 20,827 ops/sec ±3.69% (40 runs sampled)
Samples: 23K of event 'cycles', Event count (approx.): 21255691651
62.91% jsc perf-22890.map [.] 0x00007fd117c0a3b9
24.89% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::privateFuncSetIteratorNext(JSC::ExecState*)
0.29% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::CodeBlock::updateAllPredictionsAndCountLiveness(unsigned int&, unsigned int&)
0.24% jsc [vdso] [.] 0x00000000000008e8
0.22% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::CodeBlock::predictedMachineCodeSize()
0.16% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] WTF::MetaAllocator::currentStatistics()
0.15% jsc libjavascriptcoregtk-4.0.so.18.3.1 [.] JSC::Lexer<unsigned char>::lex(JSC::JSToken*, unsigned int, bool)
The one of the bottlenecks was C++ -> JS calls. It can be eliminated and now privateFuncSetIteratorNext (SetIterator's next op) becomes the main contribution.
It's still slow. But at least, dropping C++ -> JS calls improve the performance; 1.6x.
$ WebKitBuild/set-map-old/Release/bin/jsc tmp/lodash.min.js tmp/benchmark.js tmp/hello.js
Linked List x 212,160 ops/sec ±0.39% (66 runs sampled)
Array x 376,348 ops/sec ±0.21% (66 runs sampled)
Array forEach x 20,054 ops/sec ±4.26% (49 runs sampled)
Set forEach x 13,294 ops/sec ±0.20% (67 runs sampled)
Fastest is Array
$ WebKitBuild/set-map/Release/bin/jsc tmp/lodash.min.js tmp/benchmark.js tmp/hello.js
Linked List x 210,151 ops/sec ±0.65% (66 runs sampled)
Array x 370,408 ops/sec ±0.51% (66 runs sampled)
Array forEach x 20,345 ops/sec ±4.35% (46 runs sampled)
Set forEach x 21,007 ops/sec ±3.51% (40 runs sampled)
Fastest is Array</pre>
</div>
</p>
<hr>
<span>You are receiving this mail because:</span>
<ul>
<li>You are the assignee for the bug.</li>
</ul>
</body>
</html>