[Webkit-unassigned] [Bug 252981] New: useConcurrentJIT=true makes JS2-wasm slower on higher core counts

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Mon Feb 27 02:58:49 PST 2023


https://bugs.webkit.org/show_bug.cgi?id=252981

            Bug ID: 252981
           Summary: useConcurrentJIT=true makes JS2-wasm slower on higher
                    core counts
           Product: WebKit
           Version: WebKit Nightly Build
          Hardware: Unspecified
                OS: Unspecified
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: JavaScriptCore
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: angelos at igalia.com

Created attachment 465195

  --> https://bugs.webkit.org/attachment.cgi?id=465195&action=review

Plot of richards-wasm score vs number of worklist threads, --useBBQJIT=false --useOMGJIT=false

Running the wasm parts of JS2 (wasm-cli.js) with useConcurrentJIT=true on an 80-core ARM64 box results in a performance slowdown. Specifically, with --useBBQJIT=false --useOMGJIT=false, the total score for wasm-cli.js hovers around 10 with useConcurrentJIT=true and stays between 13 and 14 with useConcurrentJIT=false.

As far as I can tell, this is because of contention when claiming functions from the Wasm::LLintPlan in the Wasm::Worklist. I can easily bring it back to parity with the useConcurrentJIT=false case by changing the worklist code to spawn a single thread or by only notifying one of the threads for the compilation phase.

The test most affected seems to be richards-wasm. Experimenting just with richards-wasm, I see performance start dropping off when the number of threads goes over 12 or so.

This may not currently be a big problem on end user devices, but it does affect benchmarking and is probably going to become more of an issue in the future.

I've experimented with simply batching the workload: each worklist thread claims a run of consecutive functions (up to functionBytes/batchSize, where functionBytes is the sum of the function sizes in the module and batchSize is a tunable). With a batchSize of 16KB, I can finally get a modest (~10%) speedup for the useConcurrentJIT=true case. I didn't put a lot of effort into tuning.
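
To make the batching idea concrete, here is a minimal standalone sketch. It is not the actual patch and does not use any Wasm::Worklist or Wasm::LLintPlan code. It assumes one reading of the description above: consecutive functions are grouped until a batch holds at least batchSize bytes, and a thread then claims a whole batch with a single atomic increment instead of one per function. The names Batch, makeBatches and claimBatch are hypothetical.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical half-open range [begin, end) of function indices.
struct Batch {
    std::size_t begin;
    std::size_t end;
};

// Groups consecutive functions into batches. A batch is closed as soon as
// its accumulated size reaches batchSizeBytes (an assumed policy); any
// leftover functions at the end form a final, smaller batch.
std::vector<Batch> makeBatches(const std::vector<std::size_t>& functionSizes, std::size_t batchSizeBytes)
{
    std::vector<Batch> batches;
    std::size_t begin = 0;
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < functionSizes.size(); ++i) {
        bytes += functionSizes[i];
        if (bytes >= batchSizeBytes) {
            batches.push_back({ begin, i + 1 });
            begin = i + 1;
            bytes = 0;
        }
    }
    if (begin < functionSizes.size())
        batches.push_back({ begin, functionSizes.size() });
    return batches;
}

// Claims the next unclaimed batch. Each call performs exactly one atomic
// read-modify-write on the shared counter, regardless of how many functions
// the batch contains. Returns false once all batches have been handed out.
bool claimBatch(std::atomic<std::size_t>& nextBatch, const std::vector<Batch>& batches, Batch& claimed)
{
    std::size_t index = nextBatch.fetch_add(1, std::memory_order_relaxed);
    if (index >= batches.size())
        return false;
    claimed = batches[index];
    return true;
}
```

Each worker thread would loop on claimBatch and compile the functions in [begin, end) before claiming again. A larger batchSize means fewer touches of the shared counter but coarser load balancing, which is presumably why the value needs tuning.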

This probably applies to other wasm tiers too, but I haven't verified that. I can submit a PR with the batchSize changes, but the batch size would have to be tuned for each user of Wasm::Worklist (and probably for each arch too).
