<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[204744] trunk/Source/bmalloc</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/204744">204744</a></dd>
<dt>Author</dt> <dd>ggaren@apple.com</dd>
<dt>Date</dt> <dd>2016-08-22 16:18:09 -0700 (Mon, 22 Aug 2016)</dd>
</dl>

<h3>Log Message</h3>
<pre>bmalloc: speed up the lock slow path
https://bugs.webkit.org/show_bug.cgi?id=161058

Reviewed by Filip Pizlo.

It is generally accepted practice that a lock should yield instead of
spinning when a lock acquisition fails, to avoid wasting CPU and power.

There are two problems with this generally accepted practice:

(1) It's a fallacy that yielding is free. In reality, yielding itself
consumes CPU and power -- by performing a syscall, running the OS
scheduler, and possibly performing a context switch. (Instruments
traces of MallocBench show the cost of yielding.) Therefore, spinning a
little to avoid yielding can actually *save* CPU and power.

(2) std::this_thread::yield() on Darwin is way too aggressive: It not only
yields but also depresses your priority to absolute zero for 10ms. A
recent PLT trace showed a few spots where the main thread just gave up
on loading and rendering a page for 10ms so an unimportant background
task could run.

To correct these problems, this patch adds a little bit of spinning to
the bmalloc lock slow path.

Below are performance results on various CPUs.

Mac Pro (12 hyperthreaded cores = 24 threads):

                                                    Baseline                       Patch                           Δ
    Execution Time:
        message_one                                    173ms                       173ms                            
        message_many                                   953ms                       927ms              ^ 1.03x faster
        churn --parallel                                60ms                        41ms              ^ 1.46x faster
        list_allocate --parallel                       224ms                       143ms              ^ 1.57x faster
        tree_allocate --parallel                     1,190ms                       758ms              ^ 1.57x faster
        tree_churn --parallel                        1,517ms                       906ms              ^ 1.67x faster
        facebook --parallel                          6,519ms                     4,580ms              ^ 1.42x faster
        reddit --parallel                            5,097ms                     3,411ms              ^ 1.49x faster
        flickr --parallel                            4,903ms                     3,501ms               ^ 1.4x faster
        theverge --parallel                          6,641ms                     4,505ms              ^ 1.47x faster

        &lt;geometric mean&gt;                             1,158ms                       832ms              ^ 1.39x faster
        &lt;arithmetic mean&gt;                            2,728ms                     1,895ms              ^ 1.44x faster
        &lt;harmonic mean&gt;                                332ms                       240ms              ^ 1.38x faster

MacBook Air (2 hyperthreaded cores = 4 threads):

                                                    Baseline                       Patch                           Δ
    Execution Time:
        message_one                                    911ms                       907ms               ^ 1.0x faster
        message_many                                   515ms                       513ms               ^ 1.0x faster
        churn --parallel                               132ms                       134ms              ! 1.02x slower
        list_allocate --parallel                       104ms                       102ms              ^ 1.02x faster
        tree_allocate --parallel                       117ms                       111ms              ^ 1.05x faster
        tree_churn --parallel                          154ms                       151ms              ^ 1.02x faster
        facebook --parallel                            719ms                       687ms              ^ 1.05x faster
        reddit --parallel                              382ms                       341ms              ^ 1.12x faster
        flickr --parallel                              372ms                       345ms              ^ 1.08x faster
        theverge --parallel                            489ms                       444ms               ^ 1.1x faster

        &lt;geometric mean&gt;                               299ms                       287ms              ^ 1.04x faster
        &lt;arithmetic mean&gt;                              390ms                       374ms              ^ 1.04x faster
        &lt;harmonic mean&gt;                                227ms                       220ms              ^ 1.03x faster

iPad (2 cores = 2 threads):

    [ Doesn't run Ruby, so no pretty subtest output. ]

                                                    Baseline                       Patch                           Δ
    Execution Time:                                 174.14ms                     171.5ms              ^ 1.02x faster

* bmalloc.xcodeproj/project.pbxproj:

* bmalloc/ScopeExit.h: Added. A barebones very wimpy version of
WTF::ScopeExit.
(bmalloc::ScopeExit::ScopeExit):
(bmalloc::ScopeExit::~ScopeExit):
(bmalloc::makeScopeExit):

* bmalloc/StaticMutex.cpp:
(bmalloc::StaticMutex::lockSlowCase): Spin before yielding -- that's the
speedup. Don't spin if another CPU is already spinning. In theory, more
than one spinner accomplishes nothing, and I found that there's a cutoff
around 8 or 16 spinners that becomes performance negative on Mac Pro.

(Note: Another way to accomplish a similar result, if you don't want to
use a bit of state in the lock, is to spin for a random duration between
0 and aLot. I tested a version of WTF::WeakRandom with unsynchronized
static state and it worked great. But I ultimately opted for the explicit
bit because I thought it was clearer.)

* bmalloc/StaticMutex.h:
(bmalloc::StaticMutex::init): Initialize our new bit.

* bmalloc/ThreadSwitch.h: Added.
(bmalloc::threadSwitch): Don't call yield() on Darwin because it's too
aggressive. swtch() does what we want: Go run something else, without
any other side-effects.</pre>
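<p>The note in the log message mentions an alternative that was tested but not committed: spinning for a random duration between 0 and aLot instead of keeping an "is spinning" bit in the lock. A minimal sketch of that idea follows. Everything in it is an illustrative assumption rather than bmalloc code: the names sketch, RandomSpinLock and randomSpinCount are hypothetical, a thread_local xorshift32 generator stands in for WTF::WeakRandom, and std::this_thread::yield() stands in for threadSwitch() so the example stays portable.</p>
<pre><![CDATA[
```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>

namespace sketch {

// Upper bound on spin attempts; 256 matches aLot in the patch.
static const std::size_t aLot = 256;

// Returns a pseudo-random spin count in the range [0, limit].
// Assumption: a thread_local xorshift32 generator stands in for
// WTF::WeakRandom, which the log message says was used in the experiment.
inline std::size_t randomSpinCount(std::size_t limit)
{
    assert(limit + 1 != 0); // limit + 1 must not wrap around to zero.

    // Seed from the thread id so that threads tend to pick different
    // durations. "| 1u" keeps the state nonzero, which xorshift requires.
    thread_local std::uint32_t state = static_cast<std::uint32_t>(
        std::hash<std::thread::id>()(std::this_thread::get_id())) | 1u;

    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state % (limit + 1);
}

// Hypothetical lock, not bmalloc::StaticMutex. It uses std::atomic<bool>
// rather than std::atomic_flag to keep initialization simple.
class RandomSpinLock {
public:
    bool try_lock()
    {
        return !m_isLocked.exchange(true, std::memory_order_acquire);
    }

    void lock()
    {
        if (!try_lock())
            lockSlowCase();
    }

    void unlock()
    {
        m_isLocked.store(false, std::memory_order_release);
    }

private:
    void lockSlowCase()
    {
        // Spin for a random number of attempts instead of tracking
        // whether another thread is already spinning.
        const std::size_t spins = randomSpinCount(aLot);
        for (std::size_t i = 0; i < spins; ++i) {
            if (try_lock())
                return;
        }

        // Fall back to yielding so the spin phase stays bounded.
        while (!try_lock())
            std::this_thread::yield();
    }

    std::atomic<bool> m_isLocked { false };
};

} // namespace sketch
```
]]></pre>
<p>Because each waiter picks its own duration, waiters are unlikely to all spin for the full budget at once, which approximates the effect of the explicit bit without adding state to the lock. The committed patch uses the explicit bit instead.</p>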

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourcebmallocChangeLog">trunk/Source/bmalloc/ChangeLog</a></li>
<li><a href="#trunkSourcebmallocbmallocStaticMutexcpp">trunk/Source/bmalloc/bmalloc/StaticMutex.cpp</a></li>
<li><a href="#trunkSourcebmallocbmallocStaticMutexh">trunk/Source/bmalloc/bmalloc/StaticMutex.h</a></li>
<li><a href="#trunkSourcebmallocbmallocxcodeprojprojectpbxproj">trunk/Source/bmalloc/bmalloc.xcodeproj/project.pbxproj</a></li>
</ul>

<h3>Added Paths</h3>
<ul>
<li><a href="#trunkSourcebmallocbmallocScopeExith">trunk/Source/bmalloc/bmalloc/ScopeExit.h</a></li>
<li><a href="#trunkSourcebmallocbmallocThreadSwitchh">trunk/Source/bmalloc/bmalloc/ThreadSwitch.h</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourcebmallocChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/bmalloc/ChangeLog (204743 => 204744)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/bmalloc/ChangeLog        2016-08-22 22:44:42 UTC (rev 204743)
+++ trunk/Source/bmalloc/ChangeLog        2016-08-22 23:18:09 UTC (rev 204744)
</span><span class="lines">@@ -1,3 +1,105 @@
</span><ins>+2016-08-22  Geoffrey Garen  &lt;ggaren@apple.com&gt;
+
+        bmalloc: speed up the lock slow path
+        https://bugs.webkit.org/show_bug.cgi?id=161058
+
+        Reviewed by Filip Pizlo.
+
+        It is generally accepted practice that a lock should yield instead of
+        spinning when a lock acquisition fails, to avoid wasting CPU and power.
+
+        There are two problems with this generally accepted practice:
+
+        (1) It's a fallacy that yielding is free. In reality, yielding itself
+        consumes CPU and power -- by performing a syscall, running the OS
+        scheduler, and possibly performing a context switch. (Instruments
+        traces of MallocBench show the cost of yielding.) Therefore, spinning a
+        little to avoid yielding can actually *save* CPU and power.
+
+        (2) std::this_thread_yield() on Darwin is way too aggressive: It not only
+        yields but also depresses your priority to absolute zero for 10ms. A
+        recent PLT trace showed a few spots where the main thread just gave up
+        on loading and rendering a page for 10ms so an unimportant background
+        task could run.
+
+        To correct these problems, this patch adds a little bit of spinning to
+        the bmalloc lock slow path.
+
+        Below are performance results on various CPUs.
+
+        Mac Pro (12 hyperthreaded cores = 24 threads):
+
+                                                            Baseline                       Patch                           Δ
+            Execution Time:
+                message_one                                    173ms                       173ms                            
+                message_many                                   953ms                       927ms              ^ 1.03x faster
+                churn --parallel                                60ms                        41ms              ^ 1.46x faster
+                list_allocate --parallel                       224ms                       143ms              ^ 1.57x faster
+                tree_allocate --parallel                     1,190ms                       758ms              ^ 1.57x faster
+                tree_churn --parallel                        1,517ms                       906ms              ^ 1.67x faster
+                facebook --parallel                          6,519ms                     4,580ms              ^ 1.42x faster
+                reddit --parallel                            5,097ms                     3,411ms              ^ 1.49x faster
+                flickr --parallel                            4,903ms                     3,501ms               ^ 1.4x faster
+                theverge --parallel                          6,641ms                     4,505ms              ^ 1.47x faster
+
+                &lt;geometric mean&gt;                             1,158ms                       832ms              ^ 1.39x faster
+                &lt;arithmetic mean&gt;                            2,728ms                     1,895ms              ^ 1.44x faster
+                &lt;harmonic mean&gt;                                332ms                       240ms              ^ 1.38x faster
+
+        MacBook Air (2 hyperthreaded cores = 4 threads):
+
+                                                            Baseline                       Patch                           Δ
+            Execution Time:
+                message_one                                    911ms                       907ms               ^ 1.0x faster
+                message_many                                   515ms                       513ms               ^ 1.0x faster
+                churn --parallel                               132ms                       134ms              ! 1.02x slower
+                list_allocate --parallel                       104ms                       102ms              ^ 1.02x faster
+                tree_allocate --parallel                       117ms                       111ms              ^ 1.05x faster
+                tree_churn --parallel                          154ms                       151ms              ^ 1.02x faster
+                facebook --parallel                            719ms                       687ms              ^ 1.05x faster
+                reddit --parallel                              382ms                       341ms              ^ 1.12x faster
+                flickr --parallel                              372ms                       345ms              ^ 1.08x faster
+                theverge --parallel                            489ms                       444ms               ^ 1.1x faster
+
+                &lt;geometric mean&gt;                               299ms                       287ms              ^ 1.04x faster
+                &lt;arithmetic mean&gt;                              390ms                       374ms              ^ 1.04x faster
+                &lt;harmonic mean&gt;                                227ms                       220ms              ^ 1.03x faster
+
+        iPad (2 cores = 2 threads):
+
+            [ Doesn't run Ruby, so no pretty subtest output. ]
+
+                                                            Baseline                       Patch                           Δ
+            Execution Time:                                 174.14ms                     171.5ms              ^ 1.02x faster
+
+        * bmalloc.xcodeproj/project.pbxproj:
+
+        * bmalloc/ScopeExit.h: Added. A barebones very wimpy version of
+        WTF::ScopeExit.
+        (bmalloc::ScopeExit::ScopeExit):
+        (bmalloc::ScopeExit::~ScopeExit):
+        (bmalloc::makeScopeExit):
+
+        * bmalloc/StaticMutex.cpp:
+        (bmalloc::StaticMutex::lockSlowCase): Spin before yielding -- that's the
+        speedup. Don't spin if another CPU is already spinning. In theory, more
+        than one spinner accomplishes nothing, and I found that there's a cutoff
+        around 8 or 16 spinners that becomes performance negative on Mac Pro.
+
+        (Note: Another way to accomplish a similar result, if you don't want to
+        use a bit of state in the lock, is to spin for a random duration between
+        0 and aLot. I tested a version of WTF::WeakRandom with unsynchronized
+        static state and it worked great. But I ultimately opted for the explicit
+        bit because I thought it was clearer.)
+
+        * bmalloc/StaticMutex.h:
+        (bmalloc::StaticMutex::init): Initialize our new bit.
+
+        * bmalloc/ThreadSwitch.h: Added.
+        (bmalloc::threadSwitch): Don't call yield() on Darwin because it's too
+        aggressive. swtch() does what we want: Go run something else, without
+        any other side-effects.
+
</ins><span class="cx"> 2016-08-03  Geoffrey Garen  &lt;ggaren@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         [bmalloc] Merging of XLargeRanges can leak the upper range
</span></span></pre></div>
<a id="trunkSourcebmallocbmallocScopeExith"></a>
<div class="addfile"><h4>Added: trunk/Source/bmalloc/bmalloc/ScopeExit.h (0 => 204744)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/bmalloc/bmalloc/ScopeExit.h                                (rev 0)
+++ trunk/Source/bmalloc/bmalloc/ScopeExit.h        2016-08-22 23:18:09 UTC (rev 204744)
</span><span class="lines">@@ -0,0 +1,53 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include &lt;type_traits&gt;
+
+namespace bmalloc {
+
+template&lt;typename ExitFunction&gt;
+class ScopeExit {
+public:
+    explicit ScopeExit(ExitFunction&amp;&amp; exitFunction)
+        : m_exitFunction(exitFunction)
+    {
+    }
+
+    ~ScopeExit()
+    {
+        m_exitFunction();
+    }
+
+private:
+    ExitFunction m_exitFunction;
+};
+
+template&lt;typename ExitFunction&gt;
+ScopeExit&lt;ExitFunction&gt; makeScopeExit(ExitFunction&amp;&amp; exitFunction)
+{
+    return ScopeExit&lt;ExitFunction&gt;(std::forward&lt;ExitFunction&gt;(exitFunction));
+}
+    
+} // namespace bmalloc
</ins></span></pre></div>
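<p>The scope guard above exists so that a flag set on entry is cleared on every exit path, including an early return from inside a loop. The sketch below illustrates that behavior in isolation. It is an assumption-laden illustration, not the committed header: it lives in a hypothetical sketch namespace, includes &lt;utility&gt; for std::forward, forwards the callable into the member, and tryWithFlag is an invented helper.</p>
<pre><![CDATA[
```cpp
#include <atomic>
#include <cassert>
#include <utility>

namespace sketch {

// Minimal scope guard: runs the stored callable when it goes out of scope.
template<typename ExitFunction>
class ScopeExit {
public:
    explicit ScopeExit(ExitFunction&& exitFunction)
        : m_exitFunction(std::forward<ExitFunction>(exitFunction))
    {
    }

    ~ScopeExit()
    {
        m_exitFunction();
    }

private:
    ExitFunction m_exitFunction;
};

template<typename ExitFunction>
ScopeExit<ExitFunction> makeScopeExit(ExitFunction&& exitFunction)
{
    return ScopeExit<ExitFunction>(std::forward<ExitFunction>(exitFunction));
}

// Hypothetical helper: claims the "busy" flag, makes up to `attempts`
// tries, and reports success when the try index equals `succeedAt`.
inline bool tryWithFlag(std::atomic_flag& busy, int attempts, int succeedAt)
{
    if (busy.test_and_set())
        return false; // Another caller holds the flag; leave it untouched.

    // From here on, the flag is cleared no matter how we leave the function.
    auto clear = makeScopeExit([&] { busy.clear(); });

    for (int i = 0; i < attempts; ++i) {
        if (i == succeedAt)
            return true; // Early return: ~ScopeExit still clears the flag.
    }
    return false; // Normal return: ~ScopeExit clears the flag here too.
}

} // namespace sketch
```
]]></pre>
<p>In the patch, the same pattern clears m_isSpinning whether the spin loop acquires the lock and returns early or runs out of attempts and falls through to the yielding loop.</p>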
<a id="trunkSourcebmallocbmallocStaticMutexcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/bmalloc/bmalloc/StaticMutex.cpp (204743 => 204744)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/bmalloc/bmalloc/StaticMutex.cpp        2016-08-22 22:44:42 UTC (rev 204743)
+++ trunk/Source/bmalloc/bmalloc/StaticMutex.cpp        2016-08-22 23:18:09 UTC (rev 204744)
</span><span class="lines">@@ -23,15 +23,31 @@
</span><span class="cx">  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
</span><span class="cx">  */
</span><span class="cx"> 
</span><ins>+#include &quot;ScopeExit.h&quot;
</ins><span class="cx"> #include &quot;StaticMutex.h&quot;
</span><del>-#include &lt;thread&gt;
</del><ins>+#include &quot;ThreadSwitch.h&quot;
</ins><span class="cx"> 
</span><span class="cx"> namespace bmalloc {
</span><span class="cx"> 
</span><span class="cx"> void StaticMutex::lockSlowCase()
</span><span class="cx"> {
</span><ins>+    // The longest critical section in bmalloc is much shorter than the
+    // time it takes to make a system call to yield to the OS scheduler.
+    // So, we try again a lot before we yield.
+    static const size_t aLot = 256;
+    
+    if (!m_isSpinning.test_and_set()) {
+        auto clear = makeScopeExit([&amp;] { m_isSpinning.clear(); });
+
+        for (size_t i = 0; i &lt; aLot; ++i) {
+            if (try_lock())
+                return;
+        }
+    }
+
+    // Avoid spinning pathologically.
</ins><span class="cx">     while (!try_lock())
</span><del>-        std::this_thread::yield();
</del><ins>+        threadSwitch();
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> } // namespace bmalloc
</span></span></pre></div>
<a id="trunkSourcebmallocbmallocStaticMutexh"></a>
<div class="modfile"><h4>Modified: trunk/Source/bmalloc/bmalloc/StaticMutex.h (204743 => 204744)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/bmalloc/bmalloc/StaticMutex.h        2016-08-22 22:44:42 UTC (rev 204743)
+++ trunk/Source/bmalloc/bmalloc/StaticMutex.h        2016-08-22 23:18:09 UTC (rev 204744)
</span><span class="lines">@@ -52,6 +52,7 @@
</span><span class="cx">     void lockSlowCase();
</span><span class="cx"> 
</span><span class="cx">     std::atomic_flag m_flag;
</span><ins>+    std::atomic_flag m_isSpinning;
</ins><span class="cx"> };
</span><span class="cx"> 
</span><span class="cx"> static inline void sleep(
</span><span class="lines">@@ -78,6 +79,7 @@
</span><span class="cx"> inline void StaticMutex::init()
</span><span class="cx"> {
</span><span class="cx">     m_flag.clear();
</span><ins>+    m_isSpinning.clear();
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> inline bool StaticMutex::try_lock()
</span></span></pre></div>
<a id="trunkSourcebmallocbmallocThreadSwitchh"></a>
<div class="addfile"><h4>Added: trunk/Source/bmalloc/bmalloc/ThreadSwitch.h (0 => 204744)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/bmalloc/bmalloc/ThreadSwitch.h                                (rev 0)
+++ trunk/Source/bmalloc/bmalloc/ThreadSwitch.h        2016-08-22 23:18:09 UTC (rev 204744)
</span><span class="lines">@@ -0,0 +1,44 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#if BOS(DARWIN)
+#include &lt;mach/thread_switch.h&gt;
+#endif
+#include &lt;thread&gt;
+
+namespace bmalloc {
+    
+inline void threadSwitch()
+{
+    // yield() on Darwin will depress your priority to absolute 0 for 10ms,
+    // and possibly clock down the CPU -- so we avoid it.
+#if BOS(DARWIN)
+    swtch();
+#else
+    std::this_thread::yield();
+#endif
+}
+
+} // namespace bmalloc
</ins></span></pre></div>
<a id="trunkSourcebmallocbmallocxcodeprojprojectpbxproj"></a>
<div class="modfile"><h4>Modified: trunk/Source/bmalloc/bmalloc.xcodeproj/project.pbxproj (204743 => 204744)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/bmalloc/bmalloc.xcodeproj/project.pbxproj        2016-08-22 22:44:42 UTC (rev 204743)
+++ trunk/Source/bmalloc/bmalloc.xcodeproj/project.pbxproj        2016-08-22 23:18:09 UTC (rev 204744)
</span><span class="lines">@@ -24,6 +24,8 @@
</span><span class="cx">                 147DC6E31CA5B70B00724E8D /* Chunk.h in Headers */ = {isa = PBXBuildFile; fileRef = 147DC6E21CA5B70B00724E8D /* Chunk.h */; settings = {ATTRIBUTES = (Private, ); }; };
</span><span class="cx">                 14895D911A3A319C0006235D /* Environment.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 14895D8F1A3A319C0006235D /* Environment.cpp */; };
</span><span class="cx">                 14895D921A3A319C0006235D /* Environment.h in Headers */ = {isa = PBXBuildFile; fileRef = 14895D901A3A319C0006235D /* Environment.h */; settings = {ATTRIBUTES = (Private, ); }; };
</span><ins>+                148EFAE81D6B953B008E721E /* ScopeExit.h in Headers */ = {isa = PBXBuildFile; fileRef = 148EFAE61D6B953B008E721E /* ScopeExit.h */; };
+                148EFAE91D6B953B008E721E /* ThreadSwitch.h in Headers */ = {isa = PBXBuildFile; fileRef = 148EFAE71D6B953B008E721E /* ThreadSwitch.h */; };
</ins><span class="cx">                 14C8992B1CC485E70027A057 /* Map.h in Headers */ = {isa = PBXBuildFile; fileRef = 14C8992A1CC485E70027A057 /* Map.h */; settings = {ATTRIBUTES = (Private, ); }; };
</span><span class="cx">                 14C8992D1CC578330027A057 /* XLargeRange.h in Headers */ = {isa = PBXBuildFile; fileRef = 14C8992C1CC578330027A057 /* XLargeRange.h */; settings = {ATTRIBUTES = (Private, ); }; };
</span><span class="cx">                 14C919C918FCC59F0028DB43 /* BPlatform.h in Headers */ = {isa = PBXBuildFile; fileRef = 14C919C818FCC59F0028DB43 /* BPlatform.h */; settings = {ATTRIBUTES = (Private, ); }; };
</span><span class="lines">@@ -110,6 +112,8 @@
</span><span class="cx">                 1485656018A43DBA00ED6942 /* ObjectType.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = ObjectType.h; path = bmalloc/ObjectType.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 14895D8F1A3A319C0006235D /* Environment.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = Environment.cpp; path = bmalloc/Environment.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 14895D901A3A319C0006235D /* Environment.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = Environment.h; path = bmalloc/Environment.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><ins>+                148EFAE61D6B953B008E721E /* ScopeExit.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = ScopeExit.h; path = bmalloc/ScopeExit.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                148EFAE71D6B953B008E721E /* ThreadSwitch.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = ThreadSwitch.h; path = bmalloc/ThreadSwitch.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</ins><span class="cx">                 14B650C518F39F4800751968 /* Base.xcconfig */ = {isa = PBXFileReference; lastKnownFileType = text.xcconfig; path = Base.xcconfig; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 14B650C618F39F4800751968 /* bmalloc.xcconfig */ = {isa = PBXFileReference; lastKnownFileType = text.xcconfig; path = bmalloc.xcconfig; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 14B650C718F39F4800751968 /* DebugRelease.xcconfig */ = {isa = PBXFileReference; lastKnownFileType = text.xcconfig; path = DebugRelease.xcconfig; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="lines">@@ -262,9 +266,11 @@
</span><span class="cx">                                 14446A0717A61FA400F9EA1D /* PerProcess.h */,
</span><span class="cx">                                 144469FD17A61F1F00F9EA1D /* PerThread.h */,
</span><span class="cx">                                 145F6878179E3A4400D65598 /* Range.h */,
</span><ins>+                                148EFAE61D6B953B008E721E /* ScopeExit.h */,
</ins><span class="cx">                                 143CB81A19022BC900B16A45 /* StaticMutex.cpp */,
</span><span class="cx">                                 143CB81B19022BC900B16A45 /* StaticMutex.h */,
</span><span class="cx">                                 1417F64F18B7280C0076FA3F /* Syscall.h */,
</span><ins>+                                148EFAE71D6B953B008E721E /* ThreadSwitch.h */,
</ins><span class="cx">                                 1479E21217A1A255006D4E9D /* Vector.h */,
</span><span class="cx">                                 1479E21417A1A63E006D4E9D /* VMAllocate.h */,
</span><span class="cx">                         );
</span><span class="lines">@@ -309,6 +315,7 @@
</span><span class="cx">                                 4426E2831C839547008EB042 /* BSoftLinking.h in Headers */,
</span><span class="cx">                                 14DD789018F48CEB00950702 /* Sizes.h in Headers */,
</span><span class="cx">                                 14DD78C718F48D7500950702 /* BAssert.h in Headers */,
</span><ins>+                                148EFAE91D6B953B008E721E /* ThreadSwitch.h in Headers */,
</ins><span class="cx">                                 14DD78D018F48D7500950702 /* VMAllocate.h in Headers */,
</span><span class="cx">                                 14DD78CE18F48D7500950702 /* Syscall.h in Headers */,
</span><span class="cx">                                 14DD78C618F48D7500950702 /* AsyncTask.h in Headers */,
</span><span class="lines">@@ -316,6 +323,7 @@
</span><span class="cx">                                 14895D921A3A319C0006235D /* Environment.h in Headers */,
</span><span class="cx">                                 1400274A18F89C2300115C97 /* VMHeap.h in Headers */,
</span><span class="cx">                                 1400274918F89C1300115C97 /* Heap.h in Headers */,
</span><ins>+                                148EFAE81D6B953B008E721E /* ScopeExit.h in Headers */,
</ins><span class="cx">                                 140FA00319CE429C00FFD3C8 /* BumpRange.h in Headers */,
</span><span class="cx">                                 4426E2811C838EE0008EB042 /* Logging.h in Headers */,
</span><span class="cx">                                 14DD78C518F48D7500950702 /* Algorithm.h in Headers */,
</span></span></pre>
</div>
</div>

</body>
</html>