<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[188374] trunk</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/188374">188374</a></dd>
<dt>Author</dt> <dd>fpizlo@apple.com</dd>
<dt>Date</dt> <dd>2015-08-12 20:51:25 -0700 (Wed, 12 Aug 2015)</dd>
</dl>

<h3>Log Message</h3>
<pre>WTF::Lock should not suffer from the thundering herd
https://bugs.webkit.org/show_bug.cgi?id=147947

Reviewed by Geoffrey Garen.

Source/WTF:

This changes Lock::unlockSlow() to use unparkOne() instead of unparkAll(). The problem with
doing this is that it's not obvious after calling unparkOne() if there are any other threads
that are still parked on the lock's queue. If we assume that there are and leave the
hasParkedBit set, then future calls to unlock() will take the slow path. We don't want that
if there aren't actually any threads parked. On the other hand, if we assume that there
aren't any threads parked and clear the hasParkedBit, then if there actually were some
threads parked, then they may never be awoken since future calls to unlock() won't take slow
path and so won't call unparkOne(). In other words, we need a way to be very precise about
when we clear the hasParkedBit and we need to do it in a race-free way: it can't be the case
that we clear the bit just as some thread gets parked on the queue.

A similar problem arises in futexes, and one of the solutions is to have a thread that
acquires a lock after parking sets the hasParkedBit. This is what Rusty Russel's usersem
does. It's a subtle algorithm. Also, it means that if a thread barges in before the unparked
thread runs, then that barging thread will not know that there are threads parked. This
could increase the severity of barging.

Since ParkingLot is a user-level API, we don't have to worry about the kernel-user security
issues and so we can expose callbacks while ParkingLot is holding its internal locks. This
change does exactly that for unparkOne(). The new variant of unparkOne() will call a user
function while the queue from which we are unparking is locked. The callback is told basic
stats about the queue: did we unpark a thread this time, and could there be more threads to
unpark in the future. The callback runs while it's impossible for the queue state to change,
since the ParkingLot's internal locks for the queue is held. This means that
Lock::unlockSlow() can either clear, or leave, the hasParkedBit while releasing the lock
inside the callback from unparkOne(). This takes care of the thundering herd problem while
also reducing the greed that arises from barging threads.

This required some careful reworking of the ParkingLot algorithm. The first thing I noticed
was that the ThreadData::shouldPark flag was useless, since it's set exactly when
ThreadData::address is non-null. Then I had to make sure that dequeue() could lazily create
both hashtables and buckets, since the &quot;callback is called while queue is locked&quot; invariant
requires that we didn't exit early due to the hashtable or bucket not being present. Note
that all of this is done in such a way that the old unparkOne() and unparkAll() don't have
to create any buckets, though they now may create the hashtable. We don't care as much about
the hashtable being created by unpark since it's just such an unlikely scenario and it would
only happen once.

This change reduces the kernel CPU usage of WTF::Lock for the long critical section test by
about 8x and makes it always perform as well as WTF::WordLock and WTF::Mutex for that
benchmark.

* benchmarks/LockSpeedTest.cpp:
* wtf/Lock.cpp:
(WTF::LockBase::unlockSlow):
* wtf/Lock.h:
(WTF::LockBase::isLocked):
(WTF::LockBase::isFullyReset):
* wtf/ParkingLot.cpp:
(WTF::ParkingLot::parkConditionally):
(WTF::ParkingLot::unparkOne):
(WTF::ParkingLot::unparkAll):
* wtf/ParkingLot.h:
* wtf/WordLock.h:
(WTF::WordLock::isLocked):
(WTF::WordLock::isFullyReset):

Tools:

Add testing that checks that locks return to a pristine state after contention is over.

* TestWebKitAPI/Tests/WTF/Lock.cpp:
(TestWebKitAPI::LockInspector::isFullyReset):
(TestWebKitAPI::runLockTest):
(TestWebKitAPI::TEST):</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceWTFChangeLog">trunk/Source/WTF/ChangeLog</a></li>
<li><a href="#trunkSourceWTFbenchmarksLockSpeedTestcpp">trunk/Source/WTF/benchmarks/LockSpeedTest.cpp</a></li>
<li><a href="#trunkSourceWTFwtfLockcpp">trunk/Source/WTF/wtf/Lock.cpp</a></li>
<li><a href="#trunkSourceWTFwtfLockh">trunk/Source/WTF/wtf/Lock.h</a></li>
<li><a href="#trunkSourceWTFwtfParkingLotcpp">trunk/Source/WTF/wtf/ParkingLot.cpp</a></li>
<li><a href="#trunkSourceWTFwtfParkingLoth">trunk/Source/WTF/wtf/ParkingLot.h</a></li>
<li><a href="#trunkSourceWTFwtfWordLockh">trunk/Source/WTF/wtf/WordLock.h</a></li>
<li><a href="#trunkToolsChangeLog">trunk/Tools/ChangeLog</a></li>
<li><a href="#trunkToolsTestWebKitAPITestsWTFLockcpp">trunk/Tools/TestWebKitAPI/Tests/WTF/Lock.cpp</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceWTFChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/ChangeLog (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/ChangeLog        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Source/WTF/ChangeLog        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -1,3 +1,67 @@
</span><ins>+2015-08-12  Filip Pizlo  &lt;fpizlo@apple.com&gt;
+
+        WTF::Lock should not suffer from the thundering herd
+        https://bugs.webkit.org/show_bug.cgi?id=147947
+
+        Reviewed by Geoffrey Garen.
+
+        This changes Lock::unlockSlow() to use unparkOne() instead of unparkAll(). The problem with
+        doing this is that it's not obvious after calling unparkOne() if there are any other threads
+        that are still parked on the lock's queue. If we assume that there are and leave the
+        hasParkedBit set, then future calls to unlock() will take the slow path. We don't want that
+        if there aren't actually any threads parked. On the other hand, if we assume that there
+        aren't any threads parked and clear the hasParkedBit, then if there actually were some
+        threads parked, then they may never be awoken since future calls to unlock() won't take slow
+        path and so won't call unparkOne(). In other words, we need a way to be very precise about
+        when we clear the hasParkedBit and we need to do it in a race-free way: it can't be the case
+        that we clear the bit just as some thread gets parked on the queue.
+
+        A similar problem arises in futexes, and one of the solutions is to have a thread that
+        acquires a lock after parking sets the hasParkedBit. This is what Rusty Russel's usersem
+        does. It's a subtle algorithm. Also, it means that if a thread barges in before the unparked
+        thread runs, then that barging thread will not know that there are threads parked. This
+        could increase the severity of barging.
+
+        Since ParkingLot is a user-level API, we don't have to worry about the kernel-user security
+        issues and so we can expose callbacks while ParkingLot is holding its internal locks. This
+        change does exactly that for unparkOne(). The new variant of unparkOne() will call a user
+        function while the queue from which we are unparking is locked. The callback is told basic
+        stats about the queue: did we unpark a thread this time, and could there be more threads to
+        unpark in the future. The callback runs while it's impossible for the queue state to change,
+        since the ParkingLot's internal locks for the queue is held. This means that
+        Lock::unlockSlow() can either clear, or leave, the hasParkedBit while releasing the lock
+        inside the callback from unparkOne(). This takes care of the thundering herd problem while
+        also reducing the greed that arises from barging threads.
+
+        This required some careful reworking of the ParkingLot algorithm. The first thing I noticed
+        was that the ThreadData::shouldPark flag was useless, since it's set exactly when
+        ThreadData::address is non-null. Then I had to make sure that dequeue() could lazily create
+        both hashtables and buckets, since the &quot;callback is called while queue is locked&quot; invariant
+        requires that we didn't exit early due to the hashtable or bucket not being present. Note
+        that all of this is done in such a way that the old unparkOne() and unparkAll() don't have
+        to create any buckets, though they now may create the hashtable. We don't care as much about
+        the hashtable being created by unpark since it's just such an unlikely scenario and it would
+        only happen once.
+
+        This change reduces the kernel CPU usage of WTF::Lock for the long critical section test by
+        about 8x and makes it always perform as well as WTF::WordLock and WTF::Mutex for that
+        benchmark.
+
+        * benchmarks/LockSpeedTest.cpp:
+        * wtf/Lock.cpp:
+        (WTF::LockBase::unlockSlow):
+        * wtf/Lock.h:
+        (WTF::LockBase::isLocked):
+        (WTF::LockBase::isFullyReset):
+        * wtf/ParkingLot.cpp:
+        (WTF::ParkingLot::parkConditionally):
+        (WTF::ParkingLot::unparkOne):
+        (WTF::ParkingLot::unparkAll):
+        * wtf/ParkingLot.h:
+        * wtf/WordLock.h:
+        (WTF::WordLock::isLocked):
+        (WTF::WordLock::isFullyReset):
+
</ins><span class="cx"> 2015-08-11  Filip Pizlo  &lt;fpizlo@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Always use a byte-sized lock implementation
</span></span></pre></div>
<a id="trunkSourceWTFbenchmarksLockSpeedTestcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/benchmarks/LockSpeedTest.cpp (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/benchmarks/LockSpeedTest.cpp        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Source/WTF/benchmarks/LockSpeedTest.cpp        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -44,7 +44,7 @@
</span><span class="cx">     
</span><span class="cx"> NO_RETURN void usage()
</span><span class="cx"> {
</span><del>-    printf(&quot;Usage: LockSpeedTest spinlock|wordlock|lock|bytelock|mutex|all &lt;num thread groups&gt; &lt;num threads per group&gt; &lt;work per critical section&gt; &lt;num noise threads&gt; &lt;num iterations&gt;\n&quot;);
</del><ins>+    printf(&quot;Usage: LockSpeedTest spinlock|wordlock|lock|mutex|all &lt;num thread groups&gt; &lt;num threads per group&gt; &lt;work per critical section&gt; &lt;num noise threads&gt; &lt;num iterations&gt;\n&quot;);
</ins><span class="cx">     exit(1);
</span><span class="cx"> }
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceWTFwtfLockcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/Lock.cpp (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/Lock.cpp        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Source/WTF/wtf/Lock.cpp        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -69,37 +69,43 @@
</span><span class="cx">         // We now expect the value to be isHeld|hasParked. So long as that's the case, we can park.
</span><span class="cx">         ParkingLot::compareAndPark(&amp;m_byte, isHeldBit | hasParkedBit);
</span><span class="cx"> 
</span><del>-        // We have awaken, or we never parked because the byte value changed. Either way, we loop
</del><ins>+        // We have awoken, or we never parked because the byte value changed. Either way, we loop
</ins><span class="cx">         // around and try again.
</span><span class="cx">     }
</span><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> void LockBase::unlockSlow()
</span><span class="cx"> {
</span><del>-    // Release the lock while finding out if someone is parked. Note that as soon as we do this,
-    // someone might barge in.
-    uint8_t oldByteValue;
</del><ins>+    // We could get here because the weak CAS in unlock() failed spuriously, or because there is
+    // someone parked. So, we need a CAS loop: even if right now the lock is just held, it could
+    // be held and parked if someone attempts to lock just as we are unlocking.
</ins><span class="cx">     for (;;) {
</span><del>-        oldByteValue = m_byte.load();
-        if (verbose)
-            dataLog(toString(currentThread(), &quot;: unlocking with &quot;, oldByteValue, &quot;\n&quot;));
-        ASSERT(oldByteValue &amp; isHeldBit);
-        if (m_byte.compareExchangeWeak(oldByteValue, 0))
-            break;
-    }
</del><ins>+        uint8_t oldByteValue = m_byte.load();
+        ASSERT(oldByteValue == isHeldBit || oldByteValue == (isHeldBit | hasParkedBit));
+        
+        if (oldByteValue == isHeldBit) {
+            if (m_byte.compareExchangeWeak(isHeldBit, 0))
+                return;
+            continue;
+        }
</ins><span class="cx"> 
</span><del>-    // Note that someone could try to park right now. If that happens, they will return immediately
-    // because our parking predicate is that m_byte == isHeldBit | hasParkedBit, but we've already set
-    // m_byte = 0.
</del><ins>+        // Someone is parked. Unpark exactly one thread, possibly leaving the parked bit set if
+        // there is a chance that there are still other threads parked.
+        ASSERT(oldByteValue == (isHeldBit | hasParkedBit));
+        ParkingLot::unparkOne(
+            &amp;m_byte,
+            [this] (bool, bool mayHaveMoreThreads) {
+                // We are the only ones that can clear either the isHeldBit or the hasParkedBit,
+                // so we should still see both bits set right now.
+                ASSERT(m_byte.load() == (isHeldBit | hasParkedBit));
</ins><span class="cx"> 
</span><del>-    // If there had been threads parked, unpark all of them. This causes a thundering herd, but while
-    // that is theoretically scary, it's totally fine in WebKit because we usually don't have enough
-    // threads for this to matter.
-    // FIXME: We don't really need this to exhibit thundering herd. We could use unparkOne(), and if
-    // that returns true, just set the parked bit again. If in the process of setting the parked bit
-    // we fail the CAS, then just unpark again.
-    if (oldByteValue &amp; hasParkedBit)
-        ParkingLot::unparkAll(&amp;m_byte);
</del><ins>+                if (mayHaveMoreThreads)
+                    m_byte.store(hasParkedBit);
+                else
+                    m_byte.store(0);
+            });
+        return;
+    }
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> } // namespace WTF
</span></span></pre></div>
<a id="trunkSourceWTFwtfLockh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/Lock.h (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/Lock.h        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Source/WTF/wtf/Lock.h        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -31,6 +31,10 @@
</span><span class="cx"> #include &lt;wtf/Locker.h&gt;
</span><span class="cx"> #include &lt;wtf/Noncopyable.h&gt;
</span><span class="cx"> 
</span><ins>+namespace TestWebKitAPI {
+struct LockInspector;
+};
+
</ins><span class="cx"> namespace WTF {
</span><span class="cx"> 
</span><span class="cx"> // This is a fully adaptive mutex that only requires 1 byte of storage. It has fast paths that are
</span><span class="lines">@@ -73,12 +77,20 @@
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx"> protected:
</span><ins>+    friend struct TestWebKitAPI::LockInspector;
+    
</ins><span class="cx">     static const uint8_t isHeldBit = 1;
</span><span class="cx">     static const uint8_t hasParkedBit = 2;
</span><span class="cx"> 
</span><span class="cx">     WTF_EXPORT_PRIVATE void lockSlow();
</span><span class="cx">     WTF_EXPORT_PRIVATE void unlockSlow();
</span><span class="cx"> 
</span><ins>+    // Method used for testing only.
+    bool isFullyReset() const
+    {
+        return !m_byte.load();
+    }
+
</ins><span class="cx">     Atomic&lt;uint8_t&gt; m_byte;
</span><span class="cx"> };
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceWTFwtfParkingLotcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/ParkingLot.cpp (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/ParkingLot.cpp        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Source/WTF/wtf/ParkingLot.cpp        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -52,11 +52,11 @@
</span><span class="cx"> 
</span><span class="cx">     ThreadIdentifier threadIdentifier;
</span><span class="cx">     
</span><del>-    bool shouldPark { false };
</del><span class="cx">     std::mutex parkingLock;
</span><span class="cx">     std::condition_variable parkingCondition;
</span><span class="cx"> 
</span><del>-    const void* address;
</del><ins>+    const void* address { nullptr };
+    
</ins><span class="cx">     ThreadData* nextInQueue { nullptr };
</span><span class="cx"> };
</span><span class="cx"> 
</span><span class="lines">@@ -212,26 +212,37 @@
</span><span class="cx">     return WTF::PtrHash&lt;const void*&gt;::hash(address);
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-// Locks the hashtable. This reloops in case of rehashing, so the current hashtable may be different
-// after this returns than when you called it. Guarantees that there is a hashtable. This is pretty
-// slow and not scalable, so it's only used during thread creation and for debugging/testing.
-Vector&lt;Bucket*&gt; lockHashtable()
</del><ins>+Hashtable* ensureHashtable()
</ins><span class="cx"> {
</span><span class="cx">     for (;;) {
</span><del>-        Hashtable* currentHashtable;
</del><ins>+        Hashtable* currentHashtable = hashtable.load();
</ins><span class="cx"> 
</span><del>-        currentHashtable = hashtable.load();
</del><ins>+        if (currentHashtable)
+            return currentHashtable;
+
</ins><span class="cx">         if (!currentHashtable) {
</span><del>-            // Try to be the first to create the hashtable.
</del><span class="cx">             currentHashtable = Hashtable::create(maxLoadFactor);
</span><del>-            if (!hashtable.compareExchangeWeak(nullptr, currentHashtable)) {
-                Hashtable::destroy(currentHashtable);
-                continue;
</del><ins>+            if (hashtable.compareExchangeWeak(nullptr, currentHashtable)) {
+                if (verbose)
+                    dataLog(toString(currentThread(), &quot;: created initial hashtable &quot;, RawPointer(currentHashtable), &quot;\n&quot;));
+                return currentHashtable;
</ins><span class="cx">             }
</span><del>-            if (verbose)
-                dataLog(toString(currentThread(), &quot;: created initial hashtable &quot;, RawPointer(currentHashtable), &quot;\n&quot;));
</del><ins>+
+            Hashtable::destroy(currentHashtable);
</ins><span class="cx">         }
</span><ins>+    }
+}
</ins><span class="cx"> 
</span><ins>+// Locks the hashtable. This reloops in case of rehashing, so the current hashtable may be different
+// after this returns than when you called it. Guarantees that there is a hashtable. This is pretty
+// slow and not scalable, so it's only used during thread creation and for debugging/testing.
+Vector&lt;Bucket*&gt; lockHashtable()
+{
+    for (;;) {
+        Hashtable* currentHashtable = ensureHashtable();
+
+        ASSERT(currentHashtable);
+
</ins><span class="cx">         // Now find all of the buckets. This makes sure that the hashtable is full of buckets so that
</span><span class="cx">         // we can lock all of the buckets, not just the ones that are materialized.
</span><span class="cx">         Vector&lt;Bucket*&gt; buckets;
</span><span class="lines">@@ -405,7 +416,7 @@
</span><span class="cx">     unsigned hash = hashAddress(address);
</span><span class="cx"> 
</span><span class="cx">     for (;;) {
</span><del>-        Hashtable* myHashtable = hashtable.load();
</del><ins>+        Hashtable* myHashtable = ensureHashtable();
</ins><span class="cx">         unsigned index = hash % myHashtable-&gt;size;
</span><span class="cx">         Atomic&lt;Bucket*&gt;&amp; bucketPointer = myHashtable-&gt;data[index];
</span><span class="cx">         Bucket* bucket;
</span><span class="lines">@@ -444,50 +455,51 @@
</span><span class="cx">     }
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-template&lt;typename Functor&gt;
-bool dequeue(const void* address, const Functor&amp; functor)
</del><ins>+enum class BucketMode {
+    EnsureNonEmpty,
+    IgnoreEmpty
+};
+
+template&lt;typename DequeueFunctor, typename FinishFunctor&gt;
+bool dequeue(
+    const void* address, BucketMode bucketMode, const DequeueFunctor&amp; dequeueFunctor,
+    const FinishFunctor&amp; finishFunctor)
</ins><span class="cx"> {
</span><del>-    if (verbose)
-        dataLog(toString(currentThread(), &quot;: dequeueing address &quot;, RawPointer(address), &quot;\n&quot;));
</del><span class="cx">     unsigned hash = hashAddress(address);
</span><del>-    if (verbose)
-        dataLog(toString(currentThread(), &quot;: hash = &quot;, hash, &quot;\n&quot;));
</del><span class="cx"> 
</span><span class="cx">     for (;;) {
</span><del>-        Hashtable* myHashtable = hashtable.load();
-        if (!myHashtable) {
-            if (verbose)
-                dataLog(toString(currentThread(), &quot;: no hashtable.\n&quot;));
-            return false;
-        }
</del><ins>+        Hashtable* myHashtable = ensureHashtable();
</ins><span class="cx">         unsigned index = hash % myHashtable-&gt;size;
</span><del>-        if (verbose)
-            dataLog(toString(currentThread(), &quot;: index = &quot;, index, &quot;\n&quot;));
-        Bucket* bucket = myHashtable-&gt;data[index].load();
</del><ins>+        Atomic&lt;Bucket*&gt;&amp; bucketPointer = myHashtable-&gt;data[index];
+        Bucket* bucket = bucketPointer.load();
</ins><span class="cx">         if (!bucket) {
</span><del>-            if (verbose)
-                dataLog(toString(currentThread(), &quot;: no bucket at &quot;, RawPointer(bucket), &quot;.\n&quot;));
-            return false;
</del><ins>+            if (bucketMode == BucketMode::IgnoreEmpty)
+                return false;
+
+            for (;;) {
+                bucket = bucketPointer.load();
+                if (!bucket) {
+                    bucket = new Bucket();
+                    if (!bucketPointer.compareExchangeWeak(nullptr, bucket)) {
+                        delete bucket;
+                        continue;
+                    }
+                }
+                break;
+            }
</ins><span class="cx">         }
</span><span class="cx"> 
</span><del>-        if (verbose)
-            dataLog(toString(currentThread(), &quot;: locking bucket at &quot;, RawPointer(bucket), &quot;\n&quot;));
</del><span class="cx">         bucket-&gt;lock.lock();
</span><del>-        if (verbose)
-            dataLog(toString(currentThread(), &quot;: locked bucket at &quot;, RawPointer(bucket), &quot;\n&quot;));
</del><span class="cx"> 
</span><span class="cx">         // At this point the hashtable could have rehashed under us.
</span><span class="cx">         if (hashtable.load() != myHashtable) {
</span><del>-            if (verbose)
-                dataLog(toString(currentThread(), &quot;: hashtable changed.\n&quot;));
</del><span class="cx">             bucket-&gt;lock.unlock();
</span><span class="cx">             continue;
</span><span class="cx">         }
</span><span class="cx"> 
</span><del>-        if (verbose)
-            dataLog(toString(currentThread(), &quot;: found bucket.\n&quot;));
-        bucket-&gt;genericDequeue(functor);
</del><ins>+        bucket-&gt;genericDequeue(dequeueFunctor);
</ins><span class="cx">         bool result = !!bucket-&gt;queueHead;
</span><ins>+        finishFunctor(result);
</ins><span class="cx">         bucket-&gt;lock.unlock();
</span><span class="cx">         return result;
</span><span class="cx">     }
</span><span class="lines">@@ -502,8 +514,6 @@
</span><span class="cx">     
</span><span class="cx">     ThreadData* me = myThreadData();
</span><span class="cx"> 
</span><del>-    ASSERT(!me-&gt;shouldPark);
-    
</del><span class="cx">     bool result = enqueue(
</span><span class="cx">         address,
</span><span class="cx">         [&amp;] () -&gt; ThreadData* {
</span><span class="lines">@@ -511,7 +521,6 @@
</span><span class="cx">                 return nullptr;
</span><span class="cx"> 
</span><span class="cx">             me-&gt;address = address;
</span><del>-            me-&gt;shouldPark = true;
</del><span class="cx">             return me;
</span><span class="cx">         });
</span><span class="cx"> 
</span><span class="lines">@@ -522,10 +531,8 @@
</span><span class="cx">         dataLog(toString(currentThread(), &quot;: parking self: &quot;, RawPointer(me), &quot;\n&quot;));
</span><span class="cx">     {
</span><span class="cx">         std::unique_lock&lt;std::mutex&gt; locker(me-&gt;parkingLock);
</span><del>-        while (me-&gt;shouldPark)
</del><ins>+        while (me-&gt;address)
</ins><span class="cx">             me-&gt;parkingCondition.wait(locker);
</span><del>-
-        me-&gt;address = nullptr;
</del><span class="cx">     }
</span><span class="cx">     if (verbose)
</span><span class="cx">         dataLog(toString(currentThread(), &quot;: unparked self: &quot;, RawPointer(me), &quot;\n&quot;));
</span><span class="lines">@@ -540,27 +547,62 @@
</span><span class="cx">     ThreadData* threadData = nullptr;
</span><span class="cx">     bool result = dequeue(
</span><span class="cx">         address,
</span><ins>+        BucketMode::IgnoreEmpty,
</ins><span class="cx">         [&amp;] (ThreadData* element) {
</span><span class="cx">             if (element-&gt;address != address)
</span><span class="cx">                 return DequeueResult::Ignore;
</span><span class="cx">             threadData = element;
</span><span class="cx">             return DequeueResult::RemoveAndStop;
</span><del>-        });
</del><ins>+        },
+        [] (bool) { });
</ins><span class="cx"> 
</span><span class="cx">     if (!threadData)
</span><span class="cx">         return false;
</span><span class="cx"> 
</span><del>-    ASSERT(threadData-&gt;shouldPark);
</del><ins>+    ASSERT(threadData-&gt;address);
</ins><span class="cx">     
</span><span class="cx">     {
</span><span class="cx">         std::unique_lock&lt;std::mutex&gt; locker(threadData-&gt;parkingLock);
</span><del>-        threadData-&gt;shouldPark = false;
-        threadData-&gt;parkingCondition.notify_all();
</del><ins>+        threadData-&gt;address = nullptr;
</ins><span class="cx">     }
</span><ins>+    threadData-&gt;parkingCondition.notify_one();
</ins><span class="cx"> 
</span><span class="cx">     return result;
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+void ParkingLot::unparkOne(
+    const void* address,
+    std::function&lt;void(bool didUnparkThread, bool mayHaveMoreThreads)&gt; callback)
+{
+    if (verbose)
+        dataLog(toString(currentThread(), &quot;: unparking one the hard way.\n&quot;));
+
+    ThreadData* threadData = nullptr;
+    dequeue(
+        address,
+        BucketMode::EnsureNonEmpty,
+        [&amp;] (ThreadData* element) {
+            if (element-&gt;address != address)
+                return DequeueResult::Ignore;
+            threadData = element;
+            return DequeueResult::RemoveAndStop;
+        },
+        [&amp;] (bool mayHaveMoreThreads) {
+            callback(!!threadData, threadData &amp;&amp; mayHaveMoreThreads);
+        });
+
+    if (!threadData)
+        return;
+
+    ASSERT(threadData-&gt;address);
+    
+    {
+        std::unique_lock&lt;std::mutex&gt; locker(threadData-&gt;parkingLock);
+        threadData-&gt;address = nullptr;
+    }
+    threadData-&gt;parkingCondition.notify_one();
+}
+
</ins><span class="cx"> void ParkingLot::unparkAll(const void* address)
</span><span class="cx"> {
</span><span class="cx">     if (verbose)
</span><span class="lines">@@ -569,6 +611,7 @@
</span><span class="cx">     Vector&lt;ThreadData*, 8&gt; threadDatas;
</span><span class="cx">     dequeue(
</span><span class="cx">         address,
</span><ins>+        BucketMode::IgnoreEmpty,
</ins><span class="cx">         [&amp;] (ThreadData* element) {
</span><span class="cx">             if (verbose)
</span><span class="cx">                 dataLog(toString(currentThread(), &quot;: Observing element with address = &quot;, RawPointer(element-&gt;address), &quot;\n&quot;));
</span><span class="lines">@@ -576,15 +619,18 @@
</span><span class="cx">                 return DequeueResult::Ignore;
</span><span class="cx">             threadDatas.append(element);
</span><span class="cx">             return DequeueResult::RemoveAndContinue;
</span><del>-        });
</del><ins>+        },
+        [] (bool) { });
</ins><span class="cx"> 
</span><span class="cx">     for (ThreadData* threadData : threadDatas) {
</span><span class="cx">         if (verbose)
</span><span class="cx">             dataLog(toString(currentThread(), &quot;: unparking &quot;, RawPointer(threadData), &quot; with address &quot;, RawPointer(threadData-&gt;address), &quot;\n&quot;));
</span><del>-        ASSERT(threadData-&gt;shouldPark);
-        std::unique_lock&lt;std::mutex&gt; locker(threadData-&gt;parkingLock);
-        threadData-&gt;shouldPark = false;
-        threadData-&gt;parkingCondition.notify_all();
</del><ins>+        ASSERT(threadData-&gt;address);
+        {
+            std::unique_lock&lt;std::mutex&gt; locker(threadData-&gt;parkingLock);
+            threadData-&gt;address = nullptr;
+        }
+        threadData-&gt;parkingCondition.notify_one();
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     if (verbose)
</span></span></pre></div>
<a id="trunkSourceWTFwtfParkingLoth"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/ParkingLot.h (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/ParkingLot.h        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Source/WTF/wtf/ParkingLot.h        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -59,6 +59,21 @@
</span><span class="cx">     // are no more threads on the queue.
</span><span class="cx">     WTF_EXPORT_PRIVATE static bool unparkOne(const void* address);
</span><span class="cx"> 
</span><ins>+    // Unparks one thread from the queue associated with the given address, and calls the given
+    // functor while the address is locked. Reports to the callback whether any thread got unparked
+    // and whether there may be any other threads still on the queue. This is an expert-mode version
+    // of unparkOne() that allows for really good thundering herd avoidance in adaptive mutexes.
+    // Without this, a lock implementation that uses unparkOne() has to have some trick for knowing
+    // if there are still threads parked on the queue, so that it can set some bit in its lock word
+    // to indicate that the next unlock() also needs to unparkOne(). But there is a race between
+    // manipulating that bit and some other thread acquiring the lock. It's possible to work around
+    // that race - see Rusty Russel's well-known usersem library - but it's not pretty. This form
+    // allows that race to be completely avoided, since there is no way that a thread can be parked
+    // while the callback is running.
+    WTF_EXPORT_PRIVATE static void unparkOne(
+        const void* address,
+        std::function&lt;void(bool didUnparkThread, bool mayHaveMoreThreads)&gt; callback);
+
</ins><span class="cx">     // Unparks every thread from the queue associated with the given address, which cannot be null.
</span><span class="cx">     WTF_EXPORT_PRIVATE static void unparkAll(const void* address);
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceWTFwtfWordLockh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/WordLock.h (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/WordLock.h        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Source/WTF/wtf/WordLock.h        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -31,6 +31,10 @@
</span><span class="cx"> #include &lt;wtf/Locker.h&gt;
</span><span class="cx"> #include &lt;wtf/Noncopyable.h&gt;
</span><span class="cx"> 
</span><ins>+namespace TestWebKitAPI {
+struct LockInspector;
+};
+
</ins><span class="cx"> namespace WTF {
</span><span class="cx"> 
</span><span class="cx"> // A WordLock is a fully adaptive mutex that uses sizeof(void*) storage. It has a fast path that is
</span><span class="lines">@@ -77,6 +81,8 @@
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx"> private:
</span><ins>+    friend struct TestWebKitAPI::LockInspector;
+    
</ins><span class="cx">     static const uintptr_t isLockedBit = 1;
</span><span class="cx">     static const uintptr_t isQueueLockedBit = 2;
</span><span class="cx">     static const uintptr_t queueHeadMask = 3;
</span><span class="lines">@@ -84,6 +90,12 @@
</span><span class="cx">     WTF_EXPORT_PRIVATE void lockSlow();
</span><span class="cx">     WTF_EXPORT_PRIVATE void unlockSlow();
</span><span class="cx"> 
</span><ins>+    // Method used for testing only.
+    bool isFullyReset() const
+    {
+        return !m_word.load();
+    }
+
</ins><span class="cx">     Atomic&lt;uintptr_t&gt; m_word;
</span><span class="cx"> };
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkToolsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Tools/ChangeLog (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Tools/ChangeLog        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Tools/ChangeLog        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -1,3 +1,17 @@
</span><ins>+2015-08-12  Filip Pizlo  &lt;fpizlo@apple.com&gt;
+
+        WTF::Lock should not suffer from the thundering herd
+        https://bugs.webkit.org/show_bug.cgi?id=147947
+
+        Reviewed by Geoffrey Garen.
+
+        Add testing that checks that locks return to a pristine state after contention is over.
+
+        * TestWebKitAPI/Tests/WTF/Lock.cpp:
+        (TestWebKitAPI::LockInspector::isFullyReset):
+        (TestWebKitAPI::runLockTest):
+        (TestWebKitAPI::TEST):
+
</ins><span class="cx"> 2015-08-12  Dewei Zhu  &lt;dewei_zhu@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Benchmarks supported by run_benchmark script should not assume we have internet access.
</span></span></pre></div>
<a id="trunkToolsTestWebKitAPITestsWTFLockcpp"></a>
<div class="modfile"><h4>Modified: trunk/Tools/TestWebKitAPI/Tests/WTF/Lock.cpp (188373 => 188374)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Tools/TestWebKitAPI/Tests/WTF/Lock.cpp        2015-08-13 02:12:13 UTC (rev 188373)
+++ trunk/Tools/TestWebKitAPI/Tests/WTF/Lock.cpp        2015-08-13 03:51:25 UTC (rev 188374)
</span><span class="lines">@@ -33,6 +33,14 @@
</span><span class="cx"> 
</span><span class="cx"> namespace TestWebKitAPI {
</span><span class="cx"> 
</span><ins>+struct LockInspector {
+    template&lt;typename LockType&gt;
+    static bool isFullyReset(LockType&amp; lock)
+    {
+        return lock.isFullyReset();
+    }
+};
+
</ins><span class="cx"> template&lt;typename LockType&gt;
</span><span class="cx"> void runLockTest(unsigned numThreadGroups, unsigned numThreadsPerGroup, unsigned workPerCriticalSection, unsigned numIterations)
</span><span class="cx"> {
</span><span class="lines">@@ -66,6 +74,17 @@
</span><span class="cx"> 
</span><span class="cx">     for (unsigned threadGroupIndex = numThreadGroups; threadGroupIndex--;)
</span><span class="cx">         EXPECT_EQ(expected, words[threadGroupIndex]);
</span><ins>+
+    // Now test that the locks correctly reset themselves. We expect that if a single thread locks
+    // each of the locks twice in a row, then the lock should be in a pristine state.
+    for (unsigned threadGroupIndex = numThreadGroups; threadGroupIndex--;) {
+        for (unsigned i = 2; i--;) {
+            locks[threadGroupIndex].lock();
+            locks[threadGroupIndex].unlock();
+        }
+
+        EXPECT_EQ(true, LockInspector::isFullyReset(locks[threadGroupIndex]));
+    }
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> TEST(WTF_WordLock, UncontendedShortSection)
</span><span class="lines">@@ -128,6 +147,11 @@
</span><span class="cx">     runLockTest&lt;Lock&gt;(10, 10, 10000, 1000);
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+TEST(WTF_Lock, ManyContendedLongerSections)
+{
+    runLockTest&lt;Lock&gt;(10, 10, 100000, 100);
+}
+
</ins><span class="cx"> TEST(WTF_Lock, SectionAddressCollision)
</span><span class="cx"> {
</span><span class="cx">     runLockTest&lt;Lock&gt;(4, 2, 10000, 2000);
</span></span></pre>
</div>
</div>

</body>
</html>