[Webkit-unassigned] [Bug 162095] New: Speed up Heap::isMarkedConcurrently

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Sep 16 15:32:33 PDT 2016


https://bugs.webkit.org/show_bug.cgi?id=162095

            Bug ID: 162095
           Summary: Speed up Heap::isMarkedConcurrently
    Classification: Unclassified
           Product: WebKit
           Version: WebKit Nightly Build
          Hardware: Unspecified
                OS: Unspecified
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: JavaScriptCore
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: jfbastien at apple.com

Heap::isMarkedConcurrently has a fairly expensive load-load fence.

This fence is there because, if the read of m_version in MarkedBlock::needsFlip isn't what's expected, then the read of m_marks in MarkedBlock::isMarked needs to observe the value that was stored *before* m_version was stored.
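
To make the two reads concrete, here is a minimal sketch of the pattern. It is a simplified model and not WebKit's actual code: BlockSketch, publishMarks, isMarkedWithFence and the "version mismatch means not marked" rule are all assumptions made for illustration. C++ has no pure load-load fence, so the sketch uses an acquire fence as the closest portable spelling.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a block with a version stamp and one word of mark bits.
struct BlockSketch {
    std::atomic<std::uint32_t> version{0};
    std::atomic<std::uint32_t> marks{0};
};

// Writer side: store the marks first, then the version that validates them.
inline void publishMarks(BlockSketch& block, std::uint32_t newMarks, std::uint32_t newVersion)
{
    block.marks.store(newMarks, std::memory_order_relaxed);
    block.version.store(newVersion, std::memory_order_release);
}

// Reader side: read the version, fence, then read the marks.
inline bool isMarkedWithFence(const BlockSketch& block, std::uint32_t expectedVersion, unsigned bit)
{
    assert(bit < 32);
    std::uint32_t version = block.version.load(std::memory_order_relaxed);
    // Keeps the marks load from being satisfied before the version load.
    // An acquire fence is slightly stronger than a pure load-load fence.
    std::atomic_thread_fence(std::memory_order_acquire);
    std::uint32_t marks = block.marks.load(std::memory_order_relaxed);
    // Assumption: a version mismatch means the mark bits are stale.
    return version == expectedVersion && ((marks >> bit) & 1u) != 0;
}
```

Without the fence, a weakly ordered CPU may satisfy the marks load first and pair a new version with old marks.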

This ordering isn't guaranteed on ARM, which has a weak memory model.

There are 3 ways to guarantee this ordering:
 A. Use a barrier instruction.
 B. Use a load-acquire (new in ARMv8).
 C. Use ARM's address dependency rule, which C++ calls memory_order_consume.

In general:
 A. is slow but orders all of memory in an intuitive manner.
 B. is faster-ish and has the same property-ish.
 C. should be faster still, but *only orders dependent loads*. This last part is critical! Consume isn't an all-out replacement for acquire (acquire is rather a superset of consume).
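
In portable C++ the three options look roughly like the sketch below. Payload, Slot and the function names are hypothetical. Which instructions each variant becomes depends on the compiler and target, so treat the A/B/C labels as the intent rather than a guarantee.

```cpp
#include <atomic>
#include <cassert>

struct Payload {
    int value;
};

// Hypothetical publication slot: a writer fills in a Payload, then publishes its address.
struct Slot {
    std::atomic<const Payload*> ptr{nullptr};
};

inline void publishPayload(Slot& slot, const Payload* payload)
{
    assert(payload != nullptr);
    slot.ptr.store(payload, std::memory_order_release);
}

// A. Relaxed load followed by an explicit fence: orders all later accesses.
inline int readWithFence(const Slot& slot)
{
    const Payload* p = slot.ptr.load(std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_acquire);
    return p ? p->value : -1;
}

// B. Load-acquire: the ordering is attached to this one load.
inline int readWithAcquire(const Slot& slot)
{
    const Payload* p = slot.ptr.load(std::memory_order_acquire);
    return p ? p->value : -1;
}

// C. Consume: only loads whose address depends on p are ordered.
inline int readWithConsume(const Slot& slot)
{
    const Payload* p = slot.ptr.load(std::memory_order_consume);
    return p ? p->value : -1; // p->value is address-dependent on the load above
}
```

All three return the same value here. The difference is that C would leave a later, unrelated load (one whose address does not come from p) unordered.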


ARM explains the address dependency rule in their document "Barrier Litmus Tests and Cookbook":

> *Resolving by the use of barriers and address dependency*
>
> There is a rule within the ARM architecture that:
>
> Where the value returned by a read is used to compute the virtual address of a subsequent read or write (this is known as an address dependency), then these two memory accesses will be observed in program order. An address dependency exists even if the value read by the first read has no effect in changing the virtual address (as might be the case if the value returned is masked off before it is used, or if it had no effect on changing a predicted address value).
> 
> This restriction applies only when the data value returned from one read is used as a data value to calculate the address of a subsequent read or write. This does not apply if the data value returned from one read is used to determine the condition code flags, and the values of the flags are used for condition code evaluation to determine the address of a subsequent reads, either through conditional execution or the evaluation of a branch. This is known as a control dependency.
>
> Where both a control and address dependency exist, the ordering behaviour is consistent with the address dependency. 
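
The distinction between the two kinds of dependency can be sketched in source form as follows. The function names are hypothetical. The sketch only illustrates the shape of each dependency as the hardware rule sees it: with relaxed loads the C++ language itself promises no ordering in either function, and a compiler is free to transform the code.

```cpp
#include <atomic>
#include <cassert>

// Address dependency: the value returned by the first read *is* the address
// of the second read, so ARM observes the two reads in program order.
inline int readThroughPointer(const std::atomic<const int*>& slot)
{
    const int* p = slot.load(std::memory_order_relaxed); // first read
    assert(p != nullptr);
    return *p; // second read, address computed from the first
}

// Control dependency only: the first read merely steers a branch. The address
// of `data` is known up front, so the rule quoted above does not order the reads.
inline int readBehindFlag(const std::atomic<bool>& ready, const int& data)
{
    if (ready.load(std::memory_order_relaxed)) // first read feeds a condition
        return data;                           // second read, fixed address
    return -1;
}
```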


C++'s memory_order_consume is unfortunately unimplemented by C++ compilers (they simply treat it as acquire), and may be unimplementable as spec'd. I'm working with interested folks in the committee to fix this situation: http://wg21.link/p0190r2

You'll note that this paper has a bunch of proposed solutions, no C++ implementation of any of them, and that Linux uses consume ordering in RCU successfully. I therefore intend to:
 1. Implement our own special-purpose dependency + consume in WebKit's Atomics.h;
 2. Benchmark and use it in this location;
 3. If that works out, commit and slowly start using it in other locations (which may require improving the hacky API I have in mind);
 4. Feed this information back to the C++ standards committee so that C++Next has a proper way to use consume ordering.
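
For illustration only, one possible shape for step 1 is sketched below. This is not the API referred to above: LoadedVersion, consumeLoadSketch, withDependency and BlockSketch are hypothetical names, and the inline assembly assumes GCC or Clang targeting AArch64. On other targets the sketch falls back to a plain acquire load.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical result of a dependency-carrying load: the loaded value plus a
// token that is always zero but that the CPU treats as derived from the load.
struct LoadedVersion {
    std::uint32_t value;
    std::uintptr_t dependency;
};

inline LoadedVersion consumeLoadSketch(const std::atomic<std::uint32_t>& source)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    std::uint32_t value = source.load(std::memory_order_relaxed);
    std::uintptr_t dependency;
    // x ^ x is always zero, but the compiler cannot see through the asm and
    // the hardware still tracks the result as dependent on the loaded value.
    __asm__ __volatile__("eor %0, %1, %1"
                         : "=r"(dependency)
                         : "r"(static_cast<std::uintptr_t>(value))
                         : "memory");
    return { value, dependency };
#else
    // Assumed fallback: acquire is a superset of consume.
    return { source.load(std::memory_order_acquire), 0 };
#endif
}

// Folds the (zero) token into a pointer so the next load is address-dependent.
template<typename T>
inline const T* withDependency(const T* pointer, std::uintptr_t dependency)
{
    return reinterpret_cast<const T*>(reinterpret_cast<std::uintptr_t>(pointer) + dependency);
}

// Hypothetical block, as a simplified model of a version stamp plus mark bits.
struct BlockSketch {
    std::atomic<std::uint32_t> version{0};
    std::atomic<std::uint32_t> marks{0};
};

inline bool isMarkedWithDependency(const BlockSketch& block, std::uint32_t expectedVersion, unsigned bit)
{
    assert(bit < 32);
    LoadedVersion loaded = consumeLoadSketch(block.version);
    // The marks address now depends on the version load, so no fence is used.
    const std::atomic<std::uint32_t>* marks = withDependency(&block.marks, loaded.dependency);
    std::uint32_t bits = marks->load(std::memory_order_relaxed);
    return loaded.value == expectedVersion && ((bits >> bit) & 1u) != 0;
}
```

Only the load made through the pointer returned by withDependency is ordered. Any other load in the caller would still need its own dependency or a fence.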
