<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[214187] trunk/Source/JavaScriptCore</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/214187">214187</a></dd>
<dt>Author</dt> <dd>fpizlo@apple.com</dd>
<dt>Date</dt> <dd>2017-03-20 11:58:59 -0700 (Mon, 20 Mar 2017)</dd>
</dl>

<h3>Log Message</h3>
<pre>Graph coloring should use coalescable moves when spilling
https://bugs.webkit.org/show_bug.cgi?id=169820

Reviewed by Michael Saboff.

This makes our graph coloring register allocator use a new family of move instructions when
spilling both operands of the move. It's a three-operand move:

    Move (src), (dst), %scratch

Previously, if both operands got spilled, we would emit a new instruction to load or store that
spill slot. But this made it hard for allocateStack to see that the two spill locations are
coalescable. This new kind of instruction makes it obvious that it's a coalescable move.

This change implements the coalescing of spill slots inside allocateStack.

This is an outrageous speed-up on the tsf_ir_speed benchmark from http://filpizlo.com/tsf/. This
is an interesting benchmark because it has a super ugly interpreter loop with ~20 live variables
carried around the loop back edge. This change makes that interpreter run 5x faster.

This isn't a speed-up on any other benchmarks. It also doesn't regress anything. Compile time is
neither progressed or regressed, since the coalescing is super cheap, and this does not add any
significant new machinery to the register allocator (it's just a small change to spill codegen).
Overall on our wasm benchmarks, this is a 16% throughput progression.

* assembler/MacroAssembler.h:
(JSC::MacroAssembler::move):
(JSC::MacroAssembler::move32):
(JSC::MacroAssembler::moveFloat):
(JSC::MacroAssembler::moveDouble):
* b3/air/AirAllocateRegistersByGraphColoring.cpp:
(JSC::B3::Air::allocateRegistersByGraphColoring):
* b3/air/AirAllocateStack.cpp:
(JSC::B3::Air::allocateStack):
* b3/air/AirInst.cpp:
(JSC::B3::Air::Inst::hasEarlyDef):
(JSC::B3::Air::Inst::hasLateUseOrDef):
(JSC::B3::Air::Inst::needsPadding):
* b3/air/AirInst.h:
* b3/air/AirOpcode.opcodes:
* b3/air/AirPadInterference.cpp:
(JSC::B3::Air::padInterference):
* runtime/Options.h:</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerMacroAssemblerh">trunk/Source/JavaScriptCore/assembler/MacroAssembler.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirAllocateRegistersByGraphColoringcpp">trunk/Source/JavaScriptCore/b3/air/AirAllocateRegistersByGraphColoring.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirAllocateStackcpp">trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirInstcpp">trunk/Source/JavaScriptCore/b3/air/AirInst.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirInsth">trunk/Source/JavaScriptCore/b3/air/AirInst.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirOpcodeopcodes">trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirPadInterferencecpp">trunk/Source/JavaScriptCore/b3/air/AirPadInterference.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreruntimeOptionsh">trunk/Source/JavaScriptCore/runtime/Options.h</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/ChangeLog        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -1,3 +1,49 @@
</span><ins>+2017-03-20  Filip Pizlo  &lt;fpizlo@apple.com&gt;
+
+        Graph coloring should use coalescable moves when spilling
+        https://bugs.webkit.org/show_bug.cgi?id=169820
+
+        Reviewed by Michael Saboff.
+        
+        This makes our graph coloring register allocator use a new family of move instructions when
+        spilling both operands of the move. It's a three-operand move:
+        
+            Move (src), (dst), %scratch
+        
+        Previously, if both operands got spilled, we would emit a new instruction to load or store that
+        spill slot. But this made it hard for allocateStack to see that the two spill locations are
+        coalescable. This new kind of instruction makes it obvious that it's a coalescable move.
+        
+        This change implements the coalescing of spill slots inside allocateStack.
+        
+        This is an outrageous speed-up on the tsf_ir_speed benchmark from http://filpizlo.com/tsf/. This
+        is an interesting benchmark because it has a super ugly interpreter loop with ~20 live variables
+        carried around the loop back edge. This change makes that interpreter run 5x faster.
+        
+        This isn't a speed-up on any other benchmarks. It also doesn't regress anything. Compile time is
+        neither progressed or regressed, since the coalescing is super cheap, and this does not add any
+        significant new machinery to the register allocator (it's just a small change to spill codegen).
+        Overall on our wasm benchmarks, this is a 16% throughput progression.
+        
+        * assembler/MacroAssembler.h:
+        (JSC::MacroAssembler::move):
+        (JSC::MacroAssembler::move32):
+        (JSC::MacroAssembler::moveFloat):
+        (JSC::MacroAssembler::moveDouble):
+        * b3/air/AirAllocateRegistersByGraphColoring.cpp:
+        (JSC::B3::Air::allocateRegistersByGraphColoring):
+        * b3/air/AirAllocateStack.cpp:
+        (JSC::B3::Air::allocateStack):
+        * b3/air/AirInst.cpp:
+        (JSC::B3::Air::Inst::hasEarlyDef):
+        (JSC::B3::Air::Inst::hasLateUseOrDef):
+        (JSC::B3::Air::Inst::needsPadding):
+        * b3/air/AirInst.h:
+        * b3/air/AirOpcode.opcodes:
+        * b3/air/AirPadInterference.cpp:
+        (JSC::B3::Air::padInterference):
+        * runtime/Options.h:
+
</ins><span class="cx"> 2017-03-19  Chris Dumez  &lt;cdumez@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         `const location = &quot;foo&quot;` throws in a worker
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerMacroAssemblerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/MacroAssembler.h (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/MacroAssembler.h        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssembler.h        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -111,6 +111,7 @@
</span><span class="cx">     using MacroAssemblerBase::branch32;
</span><span class="cx">     using MacroAssemblerBase::compare32;
</span><span class="cx">     using MacroAssemblerBase::move;
</span><ins>+    using MacroAssemblerBase::moveDouble;
</ins><span class="cx">     using MacroAssemblerBase::add32;
</span><span class="cx">     using MacroAssemblerBase::mul32;
</span><span class="cx">     using MacroAssemblerBase::and32;
</span><span class="lines">@@ -487,6 +488,30 @@
</span><span class="cx">         return !(random() &amp; (BlindingModulus - 1));
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void move(Address src, Address dest, RegisterID scratch)
+    {
+        loadPtr(src, scratch);
+        storePtr(scratch, dest);
+    }
+    
+    void move32(Address src, Address dest, RegisterID scratch)
+    {
+        load32(src, scratch);
+        store32(scratch, dest);
+    }
+    
+    void moveFloat(Address src, Address dest, FPRegisterID scratch)
+    {
+        loadFloat(src, scratch);
+        storeFloat(scratch, dest);
+    }
+    
+    void moveDouble(Address src, Address dest, FPRegisterID scratch)
+    {
+        loadDouble(src, scratch);
+        storeDouble(scratch, dest);
+    }
+
</ins><span class="cx">     // Ptr methods
</span><span class="cx">     // On 32-bit platforms (i.e. x86), these methods directly map onto their 32-bit equivalents.
</span><span class="cx">     // FIXME: should this use a test for 32-bitness instead of this specific exception?
</span><span class="lines">@@ -648,7 +673,7 @@
</span><span class="cx">     {
</span><span class="cx">         move(Imm32(imm.asTrustedImmPtr()), dest);
</span><span class="cx">     }
</span><del>-
</del><ins>+    
</ins><span class="cx">     void comparePtr(RelationalCondition cond, RegisterID left, TrustedImm32 right, RegisterID dest)
</span><span class="cx">     {
</span><span class="cx">         compare32(cond, left, right, dest);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirAllocateRegistersByGraphColoringcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirAllocateRegistersByGraphColoring.cpp (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirAllocateRegistersByGraphColoring.cpp        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/b3/air/AirAllocateRegistersByGraphColoring.cpp        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -1750,7 +1750,9 @@
</span><span class="cx">             break;
</span><span class="cx">         }
</span><span class="cx"> 
</span><del>-        ASSERT_WITH_MESSAGE(inst.args.size() == 2, &quot;We assume coalecable moves only have two arguments in a few places.&quot;);
</del><ins>+        // Avoid the three-argument coalescable spill moves.
+        if (inst.args.size() != 2)
+            return false;
</ins><span class="cx"> 
</span><span class="cx">         if (!inst.args[0].isTmp() || !inst.args[1].isTmp())
</span><span class="cx">             return false;
</span><span class="lines">@@ -2015,6 +2017,7 @@
</span><span class="cx">                 // Move is the canonical way to move data between GPRs.
</span><span class="cx">                 bool canUseMove32IfDidSpill = false;
</span><span class="cx">                 bool didSpill = false;
</span><ins>+                bool needScratch = false;
</ins><span class="cx">                 if (bank == GP &amp;&amp; inst.kind.opcode == Move) {
</span><span class="cx">                     if ((inst.args[0].isTmp() &amp;&amp; m_tmpWidth.width(inst.args[0].tmp()) &lt;= Width32)
</span><span class="cx">                         || (inst.args[1].isTmp() &amp;&amp; m_tmpWidth.width(inst.args[1].tmp()) &lt;= Width32))
</span><span class="lines">@@ -2034,8 +2037,30 @@
</span><span class="cx">                         auto stackSlotEntry = stackSlots.find(arg.tmp());
</span><span class="cx">                         if (stackSlotEntry == stackSlots.end())
</span><span class="cx">                             return;
</span><del>-                        if (!inst.admitsStack(arg))
-                            return;
</del><ins>+                        bool needScratchIfSpilledInPlace = false;
+                        if (!inst.admitsStack(arg)) {
+                            if (traceDebug)
+                                dataLog(&quot;Have an inst that won't admit stack: &quot;, inst, &quot;\n&quot;);
+                            switch (inst.kind.opcode) {
+                            case Move:
+                            case MoveDouble:
+                            case MoveFloat:
+                            case Move32: {
+                                unsigned argIndex = &amp;arg - &amp;inst.args[0];
+                                unsigned otherArgIndex = argIndex ^ 1;
+                                Arg otherArg = inst.args[otherArgIndex];
+                                if (inst.args.size() == 2
+                                    &amp;&amp; otherArg.isStack()
+                                    &amp;&amp; otherArg.stackSlot()-&gt;isSpill()) {
+                                    needScratchIfSpilledInPlace = true;
+                                    break;
+                                }
+                                return;
+                            }
+                            default:
+                                return;
+                            }
+                        }
</ins><span class="cx">                         
</span><span class="cx">                         // If the Tmp holds a constant then we want to rematerialize its
</span><span class="cx">                         // value rather than loading it from the stack. In order for that
</span><span class="lines">@@ -2048,8 +2073,20 @@
</span><span class="cx">                         }
</span><span class="cx">                         
</span><span class="cx">                         Width spillWidth = m_tmpWidth.requiredWidth(arg.tmp());
</span><del>-                        if (Arg::isAnyDef(role) &amp;&amp; width &lt; spillWidth)
</del><ins>+                        if (Arg::isAnyDef(role) &amp;&amp; width &lt; spillWidth) {
+                            // Either there are users of this tmp who will use more than width,
+                            // or there are producers who will produce more than width non-zero
+                            // bits.
+                            // FIXME: It's not clear why we should have to return here. We have
+                            // a ZDef fixup in allocateStack. And if this isn't a ZDef, then it
+                            // doesn't seem like it matters what happens to the high bits. Note
+                            // that this isn't the case where we're storing more than what the
+                            // spill slot can hold - we already got that covered because we
+                            // stretch the spill slot on demand. One possibility is that it's ZDefs of
+                            // smaller width than 32-bit.
+                            // https://bugs.webkit.org/show_bug.cgi?id=169823
</ins><span class="cx">                             return;
</span><ins>+                        }
</ins><span class="cx">                         ASSERT(inst.kind.opcode == Move || !(Arg::isAnyUse(role) &amp;&amp; width &gt; spillWidth));
</span><span class="cx">                         
</span><span class="cx">                         if (spillWidth != Width32)
</span><span class="lines">@@ -2059,11 +2096,48 @@
</span><span class="cx">                             canUseMove32IfDidSpill ? 4 : bytes(width));
</span><span class="cx">                         arg = Arg::stack(stackSlotEntry-&gt;value);
</span><span class="cx">                         didSpill = true;
</span><ins>+                        if (needScratchIfSpilledInPlace)
+                            needScratch = true;
</ins><span class="cx">                     });
</span><span class="cx"> 
</span><span class="cx">                 if (didSpill &amp;&amp; canUseMove32IfDidSpill)
</span><span class="cx">                     inst.kind.opcode = Move32;
</span><del>-
</del><ins>+                
+                if (needScratch) {
+                    Bank instBank;
+                    switch (inst.kind.opcode) {
+                    case Move:
+                    case Move32:
+                        instBank = GP;
+                        break;
+                    case MoveDouble:
+                    case MoveFloat:
+                        instBank = FP;
+                        break;
+                    default:
+                        RELEASE_ASSERT_NOT_REACHED();
+                        instBank = GP;
+                        break;
+                    }
+                    
+                    RELEASE_ASSERT(instBank == bank);
+                    
+                    Tmp tmp = m_code.newTmp(bank);
+                    unspillableTmps.add(AbsoluteTmpMapper&lt;bank&gt;::absoluteIndex(tmp));
+                    inst.args.append(tmp);
+                    RELEASE_ASSERT(inst.args.size() == 3);
+                    
+                    // Without this, a chain of spill moves would need two registers, not one, because
+                    // the scratch registers of successive moves would appear to interfere with each
+                    // other. As well, we need this if the previous instruction had any late effects,
+                    // since otherwise the scratch would appear to interfere with those. On the other
+                    // hand, the late use added at the end of this spill move (previously it was just a
+                    // late def) doesn't change the padding situation: the late def would have already
+                    // caused it to report hasLateUseOrDef in Inst::needsPadding.
+                    insertionSet.insert(instIndex, Nop, inst.origin);
+                    continue;
+                }
+                
</ins><span class="cx">                 // For every other case, add Load/Store as needed.
</span><span class="cx">                 inst.forEachTmp([&amp;] (Tmp&amp; tmp, Arg::Role role, Bank argBank, Width) {
</span><span class="cx">                     if (tmp.isReg() || argBank != bank)
</span><span class="lines">@@ -2184,6 +2258,9 @@
</span><span class="cx"> void allocateRegistersByGraphColoring(Code&amp; code)
</span><span class="cx"> {
</span><span class="cx">     PhaseScope phaseScope(code, &quot;Air::allocateRegistersByGraphColoring&quot;);
</span><ins>+    
+    if (false)
+        dataLog(&quot;Code before graph coloring:\n&quot;, code);
</ins><span class="cx"> 
</span><span class="cx">     UseCounts&lt;Tmp&gt; useCounts(code);
</span><span class="cx">     GraphColoringRegisterAllocation graphColoringRegisterAllocation(code, useCounts);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirAllocateStackcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/b3/air/AirAllocateStack.cpp        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -90,6 +90,45 @@
</span><span class="cx">     RELEASE_ASSERT_NOT_REACHED();
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+struct CoalescableMove {
+    CoalescableMove()
+    {
+    }
+
+    CoalescableMove(StackSlot* src, StackSlot* dst, double frequency)
+        : src(src)
+        , dst(dst)
+        , frequency(frequency)
+    {
+    }
+    
+    bool operator==(const CoalescableMove&amp; other) const
+    {
+        return src == other.src
+            &amp;&amp; dst == other.dst
+            &amp;&amp; frequency == other.frequency;
+    }
+    
+    bool operator!=(const CoalescableMove&amp; other) const
+    {
+        return !(*this == other);
+    }
+    
+    explicit operator bool() const
+    {
+        return *this != CoalescableMove();
+    }
+    
+    void dump(PrintStream&amp; out) const
+    {
+        out.print(pointerDump(src), &quot;-&gt;&quot;, pointerDump(dst), &quot;(&quot;, frequency, &quot;)&quot;);
+    }
+    
+    StackSlot* src { nullptr };
+    StackSlot* dst { nullptr };
+    double frequency { PNaN };
+};
+
</ins><span class="cx"> } // anonymous namespace
</span><span class="cx"> 
</span><span class="cx"> void allocateStack(Code&amp; code)
</span><span class="lines">@@ -125,6 +164,56 @@
</span><span class="cx">     IndexMap&lt;StackSlot, HashSet&lt;StackSlot*&gt;&gt; interference(code.stackSlots().size());
</span><span class="cx">     Vector&lt;StackSlot*&gt; slots;
</span><span class="cx"> 
</span><ins>+    // We will perform some spill coalescing. To make that effective, we need to be able to identify
+    // coalescable moves and handle them specially in interference analysis.
+    auto isCoalescableMove = [&amp;] (Inst&amp; inst) -&gt; bool {
+        Width width;
+        switch (inst.kind.opcode) {
+        case Move:
+            width = pointerWidth();
+            break;
+        case Move32:
+        case MoveFloat:
+            width = Width32;
+            break;
+        case MoveDouble:
+            width = Width64;
+            break;
+        default:
+            return false;
+        }
+        
+        if (!Options::coalesceSpillSlots())
+            return false;
+        
+        if (inst.args.size() != 3)
+            return false;
+        
+        for (unsigned i = 0; i &lt; 2; ++i) {
+            Arg arg = inst.args[i];
+            if (!arg.isStack())
+                return false;
+            StackSlot* slot = arg.stackSlot();
+            if (slot-&gt;kind() != StackSlotKind::Spill)
+                return false;
+            if (slot-&gt;byteSize() != bytes(width))
+                return false;
+        }
+        
+        return true;
+    };
+    
+    auto isUselessMove = [&amp;] (Inst&amp; inst) -&gt; bool {
+        return isCoalescableMove(inst) &amp;&amp; inst.args[0] == inst.args[1];
+    };
+    
+    auto addEdge = [&amp;] (StackSlot* a, StackSlot* b) {
+        interference[a].add(b);
+        interference[b].add(a);
+    };
+    
+    Vector&lt;CoalescableMove&gt; coalescableMoves;
+
</ins><span class="cx">     for (BasicBlock* block : code) {
</span><span class="cx">         StackSlotLiveness::LocalCalc localCalc(liveness, block);
</span><span class="cx"> 
</span><span class="lines">@@ -132,8 +221,22 @@
</span><span class="cx">             if (verbose)
</span><span class="cx">                 dataLog(&quot;Interfering: &quot;, WTF::pointerListDump(localCalc.live()), &quot;\n&quot;);
</span><span class="cx"> 
</span><ins>+            Inst* prevInst = block-&gt;get(instIndex);
+            Inst* nextInst = block-&gt;get(instIndex + 1);
+            if (prevInst &amp;&amp; isCoalescableMove(*prevInst)) {
+                CoalescableMove move(prevInst-&gt;args[0].stackSlot(), prevInst-&gt;args[1].stackSlot(), block-&gt;frequency());
+                
+                coalescableMoves.append(move);
+                
+                for (StackSlot* otherSlot : localCalc.live()) {
+                    if (otherSlot != move.src)
+                        addEdge(move.dst, otherSlot);
+                }
+                
+                prevInst = nullptr;
+            }
</ins><span class="cx">             Inst::forEachDef&lt;Arg&gt;(
</span><del>-                block-&gt;get(instIndex), block-&gt;get(instIndex + 1),
</del><ins>+                prevInst, nextInst,
</ins><span class="cx">                 [&amp;] (Arg&amp; arg, Arg::Role, Bank, Width) {
</span><span class="cx">                     if (!arg.isStack())
</span><span class="cx">                         return;
</span><span class="lines">@@ -141,10 +244,8 @@
</span><span class="cx">                     if (slot-&gt;kind() != StackSlotKind::Spill)
</span><span class="cx">                         return;
</span><span class="cx"> 
</span><del>-                    for (StackSlot* otherSlot : localCalc.live()) {
-                        interference[slot].add(otherSlot);
-                        interference[otherSlot].add(slot);
-                    }
</del><ins>+                    for (StackSlot* otherSlot : localCalc.live())
+                        addEdge(slot, otherSlot);
</ins><span class="cx">                 });
</span><span class="cx">         };
</span><span class="cx"> 
</span><span class="lines">@@ -200,12 +301,67 @@
</span><span class="cx">         for (StackSlot* slot : code.stackSlots())
</span><span class="cx">             dataLog(&quot;Interference of &quot;, pointerDump(slot), &quot;: &quot;, pointerListDump(interference[slot]), &quot;\n&quot;);
</span><span class="cx">     }
</span><del>-
</del><ins>+    
+    // Now try to coalesce some moves.
+    std::sort(
+        coalescableMoves.begin(), coalescableMoves.end(),
+        [&amp;] (CoalescableMove&amp; a, CoalescableMove&amp; b) -&gt; bool {
+            return a.frequency &gt; b.frequency;
+        });
+    
+    IndexMap&lt;StackSlot, StackSlot*&gt; remappedStackSlots(code.stackSlots().size());
+    auto remap = [&amp;] (StackSlot* slot) -&gt; StackSlot* {
+        if (!slot)
+            return nullptr;
+        for (;;) {
+            StackSlot* remappedSlot = remappedStackSlots[slot];
+            if (!remappedSlot)
+                return slot;
+            slot = remappedSlot;
+        }
+    };
+    
+    for (CoalescableMove&amp; move : coalescableMoves) {
+        move.src = remap(move.src);
+        move.dst = remap(move.dst);
+        if (move.src == move.dst)
+            continue;
+        if (interference[move.src].contains(move.dst))
+            continue;
+        
+        StackSlot* slotToKill = move.src;
+        StackSlot* slotToKeep = move.dst;
+        
+        remappedStackSlots[slotToKill] = slotToKeep;
+        for (StackSlot* interferingSlot : interference[slotToKill]) {
+            if (interferingSlot == slotToKill)
+                continue;
+            interference[interferingSlot].remove(slotToKill);
+            interference[interferingSlot].add(slotToKeep);
+        }
+        interference[slotToKeep].add(interference[slotToKill].begin(), interference[slotToKill].end());
+        interference[slotToKill].clear();
+    }
+    
+    for (BasicBlock* block : code) {
+        for (Inst&amp; inst : *block) {
+            for (Arg&amp; arg : inst.args) {
+                if (arg.isStack())
+                    arg = Arg::stack(remap(arg.stackSlot()), arg.offset());
+            }
+            if (isUselessMove(inst))
+                inst = Inst();
+        }
+    }
+    
</ins><span class="cx">     // Now we assign stack locations. At its heart this algorithm is just first-fit. For each
</span><span class="cx">     // StackSlot we just want to find the offsetFromFP that is closest to zero while ensuring no
</span><span class="cx">     // overlap with other StackSlots that this overlaps with.
</span><span class="cx">     Vector&lt;StackSlot*&gt; otherSlots = assignedEscapedStackSlots;
</span><span class="cx">     for (StackSlot* slot : code.stackSlots()) {
</span><ins>+        if (remappedStackSlots[slot])
+            continue;
+        
</ins><span class="cx">         if (slot-&gt;offsetFromFP()) {
</span><span class="cx">             // Already assigned an offset.
</span><span class="cx">             continue;
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirInstcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirInst.cpp (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirInst.cpp        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/b3/air/AirInst.cpp        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -34,6 +34,35 @@
</span><span class="cx"> 
</span><span class="cx"> namespace JSC { namespace B3 { namespace Air {
</span><span class="cx"> 
</span><ins>+bool Inst::hasEarlyDef()
+{
+    if (kind.opcode == Patch &amp;&amp; !extraEarlyClobberedRegs().isEmpty())
+        return true;
+    bool result = false;
+    forEachArg(
+        [&amp;] (Arg&amp;, Arg::Role role, Bank, Width) {
+            result |= Arg::isEarlyDef(role);
+        });
+    return result;
+}
+
+bool Inst::hasLateUseOrDef()
+{
+    if (kind.opcode == Patch &amp;&amp; !extraClobberedRegs().isEmpty())
+        return true;
+    bool result = false;
+    forEachArg(
+        [&amp;] (Arg&amp;, Arg::Role role, Bank, Width) {
+            result |= Arg::isLateUse(role) || Arg::isLateDef(role);
+        });
+    return result;
+}
+
+bool Inst::needsPadding(Inst* prevInst, Inst* nextInst)
+{
+    return prevInst &amp;&amp; nextInst &amp;&amp; prevInst-&gt;hasLateUseOrDef() &amp;&amp; nextInst-&gt;hasEarlyDef();
+}
+
</ins><span class="cx"> bool Inst::hasArgEffects()
</span><span class="cx"> {
</span><span class="cx">     bool result = false;
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirInsth"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirInst.h (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirInst.h        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/b3/air/AirInst.h        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -143,7 +143,14 @@
</span><span class="cx">     // registers. Note that Thing can only be Arg or Tmp when you use this functor.
</span><span class="cx">     template&lt;typename Thing, typename Functor&gt;
</span><span class="cx">     static void forEachDefWithExtraClobberedRegs(Inst* prevInst, Inst* nextInst, const Functor&amp;);
</span><del>-
</del><ins>+    
+    // Some summaries about all arguments. These are useful for needsPadding().
+    bool hasEarlyDef();
+    bool hasLateUseOrDef();
+    
+    // Check if there needs to be a padding Nop between these two instructions.
+    static bool needsPadding(Inst* prevInst, Inst* nextInst);
+    
</ins><span class="cx">     // Use this to report which registers are live. This should be done just before codegen. Note
</span><span class="cx">     // that for efficiency, reportUsedRegisters() only works for the Patch opcode.
</span><span class="cx">     void reportUsedRegisters(const RegisterSet&amp;);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirOpcodeopcodes"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -623,6 +623,10 @@
</span><span class="cx">     Tmp, Index as storePtr
</span><span class="cx">     x86: Imm, Addr as storePtr
</span><span class="cx"> 
</span><ins>+# This is for moving between spill slots.
+Move U:G:Ptr, D:G:Ptr, S:G:Ptr
+    Addr, Addr, Tmp
+
</ins><span class="cx"> x86: Swap32 UD:G:32, UD:G:32
</span><span class="cx">     Tmp, Tmp
</span><span class="cx">     Tmp, Addr
</span><span class="lines">@@ -641,6 +645,10 @@
</span><span class="cx">     x86: Imm, Addr as store32
</span><span class="cx">     x86: Imm, Index as store32
</span><span class="cx"> 
</span><ins>+# This is for moving between spill slots.
+Move32 U:G:32, ZD:G:32, S:G:32
+    Addr, Addr, Tmp
+
</ins><span class="cx"> StoreZero32 U:G:32
</span><span class="cx">     Addr
</span><span class="cx">     Index
</span><span class="lines">@@ -675,6 +683,9 @@
</span><span class="cx">     Tmp, Addr as storeFloat
</span><span class="cx">     Tmp, Index as storeFloat
</span><span class="cx"> 
</span><ins>+MoveFloat U:F:32, D:F:32, S:F:32
+    Addr, Addr, Tmp
+
</ins><span class="cx"> MoveDouble U:F:64, D:F:64
</span><span class="cx">     Tmp, Tmp
</span><span class="cx">     Addr, Tmp as loadDouble
</span><span class="lines">@@ -682,6 +693,9 @@
</span><span class="cx">     Tmp, Addr as storeDouble
</span><span class="cx">     Tmp, Index as storeDouble
</span><span class="cx"> 
</span><ins>+MoveDouble U:F:64, D:F:64, S:F:64
+    Addr, Addr, Tmp
+
</ins><span class="cx"> MoveZeroToDouble D:F:64
</span><span class="cx">     Tmp
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirPadInterferencecpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirPadInterference.cpp (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirPadInterference.cpp        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/b3/air/AirPadInterference.cpp        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -38,46 +38,10 @@
</span><span class="cx"> {
</span><span class="cx">     InsertionSet insertionSet(code);
</span><span class="cx">     for (BasicBlock* block : code) {
</span><del>-        bool prevHadLate = false;
-        for (unsigned instIndex = 0; instIndex &lt; block-&gt;size(); ++instIndex) {
</del><ins>+        for (unsigned instIndex = 1; instIndex &lt; block-&gt;size(); ++instIndex) {
</ins><span class="cx">             Inst&amp; inst = block-&gt;at(instIndex);
</span><del>-            
-            bool hasEarlyDef = false;
-            bool hasLate = false;
-            inst.forEachArg(
-                [&amp;] (Arg&amp;, Arg::Role role, Bank, Width) {
-                    switch (role) {
-                    case Arg::EarlyDef:
-                    case Arg::EarlyZDef:
-                        hasEarlyDef = true;
-                        break;
-                    case Arg::LateUse:
-                    case Arg::Def:
-                    case Arg::ZDef:
-                    case Arg::LateColdUse:
-                    case Arg::UseDef:
-                    case Arg::UseZDef:
-                        hasLate = true;
-                        break;
-                    case Arg::Scratch:
-                        hasEarlyDef = true;
-                        hasLate = true;
-                        break;
-                    case Arg::Use:
-                    case Arg::ColdUse:
-                    case Arg::UseAddr:
-                        break;
-                    }
-                });
-            if (inst.kind.opcode == Patch) {
-                hasEarlyDef |= !inst.extraEarlyClobberedRegs().isEmpty();
-                hasLate |= !inst.extraClobberedRegs().isEmpty();
-            }
-            
-            if (hasEarlyDef &amp;&amp; prevHadLate)
</del><ins>+            if (Inst::needsPadding(&amp;block-&gt;at(instIndex - 1), &amp;inst))
</ins><span class="cx">                 insertionSet.insert(instIndex, Nop, inst.origin);
</span><del>-            
-            prevHadLate = hasLate;
</del><span class="cx">         }
</span><span class="cx">         insertionSet.execute(block);
</span><span class="cx">     }
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreruntimeOptionsh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/runtime/Options.h (214186 => 214187)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/runtime/Options.h        2017-03-20 18:51:56 UTC (rev 214186)
+++ trunk/Source/JavaScriptCore/runtime/Options.h        2017-03-20 18:58:59 UTC (rev 214187)
</span><span class="lines">@@ -397,6 +397,7 @@
</span><span class="cx">     v(bool, airSpillsEverything, false, Normal, nullptr) \
</span><span class="cx">     v(bool, airForceBriggsAllocator, false, Normal, nullptr) \
</span><span class="cx">     v(bool, airForceIRCAllocator, false, Normal, nullptr) \
</span><ins>+    v(bool, coalesceSpillSlots, true, Normal, nullptr) \
</ins><span class="cx">     v(bool, logAirRegisterPressure, false, Normal, nullptr) \
</span><span class="cx">     v(unsigned, maxB3TailDupBlockSize, 3, Normal, nullptr) \
</span><span class="cx">     v(unsigned, maxB3TailDupBlockSuccessors, 3, Normal, nullptr) \
</span></span></pre>
</div>
</div>

</body>
</html>