[webkit-dev] PSA: you should use WTF::Lock and WTF::Condition instead of WTF::SpinLock, WTF::Mutex, WTF::ThreadCondition, std::mutex, std::condition_variable, or std::condition_variable_any
cabanier at gmail.com
Fri Aug 21 21:54:45 PDT 2015
very interesting to see you change out such a fundamental algorithm! It's
amazing that it works.
Reading the code, it seems very comparable to Windows' critical sections
. Is that the case?
Looking at github and msdn, 4000 seems to be the most common number of
I believe your code would also useful be outside of WebKit. Have you given
any thought on breaking it off and making it part of the operating system?
On Wed, Aug 19, 2015 at 1:08 PM, Filip Pizlo <fpizlo at apple.com> wrote:
> Hi everyone,
> Over the past two weeks or so I’ve been doing some work to improve the
> locking primitives that we use in WebKit. These new primitives have
> landed, and they are simply called Lock and Condition. You should use Lock
> instead of SpinLock, Mutex, or std::mutex, because it combines the best
> qualities of those other locks into one simple implementation. You should
> use Condition instead of WTF::ThreadCondition, std::condition_variable, or
> std::condition_variable_any because Condition is always smaller and
> faster. Also, Condition has a richer API for timed wait, which allows you
> to use either std::chrono time or the double-based time from
> <wtf/CurrentTime.h>. It can even use our notion of monotonic time.
> Prior to this change, we had to choose between WTF::SpinLock, which was
> fast, and WTF::Mutex/std::mutex, which didn’t waste CPU cycles during
> contention. After this change, we no longer have to make such choices: the
> new Lock is fast and doesn’t waste CPU cycles.
> I’ve already landed changes to all of WebKit that replace all uses of
> those other locking primitives with Lock/Condition.
> The specific benefits of these new primitives are:
> - Lock and Condition take 1 byte of space. That’s the total space that
> they will ever consume. Contention is handled using thread local data
> structures managed by WTF::ParkingLot and that space usage is O(number of
> threads). By comparison, WTF::Mutex and std::mutex take 64 bytes on
> Darwin. They are usually take a lot of space on all OSes. WTF::SpinLock
> takes 4 bytes. So, Lock is 4x more compact than SpinLock and 64x more
> compact than std::mutex/WTF::Mutex.
> - Lock is “adaptive”: it will not waste CPU time or power when a lock is
> held for a long time. SpinLock will peg a CPU at 100% while it’s trying to
> acquire a lock, which makes SpinLock unsuitable for any critical section
> that may be held for a while. This means never using SpinLock in critical
> sections that do I/O or that may block on other locks. Lock doesn’t have
> this problem; like std::mutex and WTF::Mutex, it has the ability to block
> threads indefinitely while waking them up as soon as the lock is available
> again. WTF::Lock behaves like a spinlock so long as no thread waits for
> too long (currently, too long = 40 spins), and otherwise turns into a
> queue-based lock with barging (a hot thread that calls lock() just as
> another thread is dequeued may barge in, which increases throughput and
> somewhat avoids the convoy problem) and random fairness (in the long term,
> every contending thread has an equal shot at getting the lock) and
> thundering herd avoidance (unlock could cause one thread to try to contend
> for the lock, but it won’t ever wake up threads only to have them go back
> to sleep).
> - Lock is very fast in the uncontended case. Locking and unlocking fast
> paths are just inline CAS instructions (or LL/SC sequences on ARM and
> friends). This makes Lock about 3x faster than some system mutexes, and
> within 2x of a spin lock. For Lock performance details, see
> http://trac.webkit.org/changeset/188323/trunk/Source/WTF/ChangeLog, and
> - Lock is very fast in the case of microcontention. Microcontention is
> when you have multiple threads all piling up on a lock that is held for a
> short time. You may remember that this case is important in WebKit, in
> particular in our parallel GC: https://trac.webkit.org/changeset/117478,
> https://www.webkit.org/blog/2136/on-spinlocks-and-sleep/. Well, Lock
> takes all of the good ideas from the WTF::SpinLock, which was our previous
> best answer for microcontention. Lock is up to 100x faster than WTF::Mutex
> and std::mutex in case of microcontention on some OSes.
> - Condition::notifyOne()/notifyAll() are just a fast inlined
> load-and-branch in the case that nobody is waiting on the condition. This
> feature, combined with the fact that Condition can be used with Lock, and
> the fact that Lock is very fast to acquire and release, means that
> producer/consumer scenarios can run up to 58x faster with Lock/Condition
> than with Mutex/ThreadCondition or std::mutex/std::condition_variable. For
> Condition performance details, see
> http://trac.webkit.org/changeset/188594/trunk/Source/WTF/ChangeLog and
> The algorithms used for Lock and Condition have novelty in them, but the
> overall heuristics are consistent with what I take to be existing best
> practices as documented in other places. The way that Lock handles
> contention is most similar to how Jikes RVM did it. That algorithm is best
> documented in http://www.filpizlo.com/papers/pizlo-pppj2011-fable.pdf (though
> ignore all of the stuff about biasing and inflation of “thin” locks -
> WTF::Lock doesn’t do any of that). That basic algorithm had been around
> for a while and took in many influences (see that paper for citations to
> other relevant stuff).
> FWIW, in parallel and concurrent code in JSC, switching over to
> Lock/Condition from the other primitives gave us speed-ups on SunSpider and
> Kraken, as well as speed-ups on some Octane subtests. Both parallel GC and
> concurrent/parallel JIT appeared to be happy with this change.
> Another nice outcome of this work is that the underlying queueing
> mechanism used by both Lock and Condition is available for implementing
> other locking protocols. It’s called ParkingLot, because its job is to
> allow you to “park” threads that shouldn’t run until someone else “unparks”
> them. Any valid memory address can be used as a handle for a ParkingLot
> queue. ParkingLot can be used to implement any locking protocol that uses
> futexes (https://www.kernel.org/doc/ols/2002/ols2002-pages-479-495.pdf) -
> just use “parkConditionally” whenever you would have used “FUTEX_WAIT” and
> use “unparkOne” when you would have used “FUTEX_WAKE”, but it can also do a
> lot more than that, since it allows you to pass std::function callbacks
> that run while ParkingLot holds its internal queue lock. This is what
> enables Lock to avoid the thundering herd and enables Condition to have a
> lock-free notify fast path when nobody is waiting. In theory, you can use
> ParkingLot to implement completely fair locks, counting semaphores,
> read-write locks, and other things. It’ll also come in handy for
> implementing new locking primitives for things like WebAssembly.
> TL;DR. Don’t use WTF::SpinLock, WTF::Mutex, std::mutex,
> WTF::ThreadCondition, std::condition_variable, or
> std::condition_variable_any. They waste CPU time and they waste memory.
> Use WTF::Lock and WTF::Condition instead.
> webkit-dev mailing list
> webkit-dev at lists.webkit.org
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the webkit-dev