<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[192946] trunk/Source/JavaScriptCore</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/192946">192946</a></dd>
<dt>Author</dt> <dd>commit-queue@webkit.org</dd>
<dt>Date</dt> <dd>2015-12-02 10:49:09 -0800 (Wed, 02 Dec 2015)</dd>
</dl>

<h3>Log Message</h3>
<pre>[JSC] Handle x86 partial register stalls in Air
https://bugs.webkit.org/show_bug.cgi?id=151735

Patch by Benjamin Poulain &lt;bpoulain@apple.com&gt; on 2015-12-02
Reviewed by Filip Pizlo.

This patch adds a primitive false-dependency breaking
algorithm to Air. We look for redefinition of the same
variable that is too close to a partial definition.

There is no explicit dependency tracking going on,
but it is pretty fast and the extra xorps added on false positives
is cheap anyway.

Typically, partial register stalls appear from instructions
interfering with themselves in small loops. Something like:

  Label0:
    cvtsi2sdq %eax, %xmm0
    ...
    jmp Label0

Those are correctly detected by propagating the local distance
information from block to block until no unsafe chain is found.

The test testInt32ToDoublePartialRegisterStall() checks the kind
of cases we typically find from JavaScript.
The execution time is 20% faster with a register reset (which is
astounding since the very next instruction has a real dependency).

Future tweaks will be needed when we can run more JavaScript:
-Handle function calls differently.
-Anything with a special can have hidden instructions.
 We need to take them into account.

* JavaScriptCore.xcodeproj/project.pbxproj:
* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::moveZeroToDouble):
* assembler/X86Assembler.h:
(JSC::X86Assembler::xorps_rr):
(JSC::X86Assembler::xorpd_rr):
According to the documentation, starting with Sandy Bridge,
register resets can be done in the frontend with xorps.

* b3/B3IndexSet.h:
(JSC::B3::IndexSet::remove):
* b3/air/AirFixPartialRegisterStalls.cpp: Added.
(JSC::B3::Air::fixPartialRegisterStalls):
* b3/air/AirFixPartialRegisterStalls.h: Added.
* b3/air/AirGenerate.cpp:
(JSC::B3::Air::prepareForGeneration):
* b3/testb3.cpp:
(JSC::B3::testInt32ToDoublePartialRegisterStall):
(JSC::B3::run):
* jit/FPRInfo.h:</pre>
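<p>The per-block idea described in the log message can be sketched as follows. This is a simplified illustration, not the code from the patch: the <code>Inst</code> struct, <code>fixBlock</code>, and the constants are hypothetical names, and distance is counted directly in instructions since the last definition rather than with the patch's distance-to-block-end bookkeeping.</p>
<pre>
#include &lt;array&gt;
#include &lt;cstdio&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical, simplified instruction model; not the Air Inst class.
struct Inst {
    std::string name;
    int defReg;         // FP register written, or -1 if none.
    bool partialUpdate; // Writes only part of defReg (e.g. cvtsi2sd).
    bool readsDef;      // Also reads defReg, so a real dependency exists.
};

constexpr unsigned kNumRegs = 16;
constexpr unsigned kMinimumSafeDistance = 16; // Same default as the patch.
constexpr unsigned kFar = 255;                // Means: no recent definition.

// Returns a copy of the block with a register reset inserted before each
// partial update whose destination was defined too recently.
std::vector&lt;Inst&gt; fixBlock(const std::vector&lt;Inst&gt;&amp; block)
{
    std::array&lt;unsigned, kNumRegs&gt; sinceDef;
    sinceDef.fill(kFar);

    std::vector&lt;Inst&gt; out;
    for (const Inst&amp; inst : block) {
        if (inst.partialUpdate &amp;&amp; !inst.readsDef &amp;&amp; inst.defReg &gt;= 0
            &amp;&amp; sinceDef[inst.defReg] &lt; kMinimumSafeDistance)
            out.push_back({ &quot;xorps&quot;, inst.defReg, false, false });
        out.push_back(inst);

        // Age every tracked definition by one instruction.
        for (unsigned&amp; distance : sinceDef) {
            if (distance &lt; kFar)
                ++distance;
        }
        if (inst.defReg &gt;= 0)
            sinceDef[inst.defReg] = 0;
    }
    return out;
}

int main()
{
    std::vector&lt;Inst&gt; block = {
        { &quot;movsd&quot;, 0, false, false },
        { &quot;add&quot;, -1, false, false },
        { &quot;cvtsi2sd&quot;, 0, true, false },
    };
    // The conversion follows a definition of register 0 by two
    // instructions, so a reset is inserted right before it.
    for (const Inst&amp; inst : fixBlock(block))
        std::printf(&quot;%s\n&quot;, inst.name.c_str());
    return 0;
}
</pre>
<p>Like the patch, the sketch skips registers that the instruction also reads, since those carry a real dependency that a reset cannot remove.</p>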

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreJavaScriptCorexcodeprojprojectpbxproj">trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commonh">trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerX86Assemblerh">trunk/Source/JavaScriptCore/assembler/X86Assembler.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3IndexSeth">trunk/Source/JavaScriptCore/b3/B3IndexSet.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirGeneratecpp">trunk/Source/JavaScriptCore/b3/air/AirGenerate.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3testb3cpp">trunk/Source/JavaScriptCore/b3/testb3.cpp</a></li>
</ul>

<h3>Added Paths</h3>
<ul>
<li><a href="#trunkSourceJavaScriptCoreb3airAirFixPartialRegisterStallscpp">trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirFixPartialRegisterStallsh">trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.h</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (192945 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog        2015-12-02 17:27:37 UTC (rev 192945)
+++ trunk/Source/JavaScriptCore/ChangeLog        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -1,3 +1,60 @@
</span><ins>+2015-12-02  Benjamin Poulain  &lt;bpoulain@apple.com&gt;
+
+        [JSC] Handle x86 partial register stalls in Air
+        https://bugs.webkit.org/show_bug.cgi?id=151735
+
+        Reviewed by Filip Pizlo.
+
+        This patch adds a primitive false-dependency breaking
+        algorithm to Air. We look for redefinition of the same
+        variable that is too close to a partial definition.
+
+        There is no explicit dependency tracking going on,
+        but it is pretty fast and the extra xorps added on false positives
+        is cheap anyway.
+
+        Typically, partial register stalls appear from instructions
+        interfering with themselves in small loops. Something like:
+
+          Label0:
+            cvtsi2sdq %eax, %xmm0
+            ...
+            jmp Label0
+
+        Those are correctly detected by propagating the local distance
+        information from block to block until no unsafe chain is found.
+
+        The test testInt32ToDoublePartialRegisterStall() checks the kind
+        of cases we typically find from JavaScript.
+        The execution time is 20% faster with a register reset (which is
+        astounding since the very next instruction has a real dependency).
+
+        Future tweaks will be needed when we can run more JavaScript:
+        -Handle function calls differently.
+        -Anything with a special can have hidden instructions.
+         We need to take them into account.
+
+        * JavaScriptCore.xcodeproj/project.pbxproj:
+        * assembler/MacroAssemblerX86Common.h:
+        (JSC::MacroAssemblerX86Common::moveZeroToDouble):
+        * assembler/X86Assembler.h:
+        (JSC::X86Assembler::xorps_rr):
+        (JSC::X86Assembler::xorpd_rr):
+        According to the documentation, starting with Sandy Bridge,
+        register resets can be done in the frontend with xorps.
+
+        * b3/B3IndexSet.h:
+        (JSC::B3::IndexSet::remove):
+        * b3/air/AirFixPartialRegisterStalls.cpp: Added.
+        (JSC::B3::Air::fixPartialRegisterStalls):
+        * b3/air/AirFixPartialRegisterStalls.h: Added.
+        * b3/air/AirGenerate.cpp:
+        (JSC::B3::Air::prepareForGeneration):
+        * b3/testb3.cpp:
+        (JSC::B3::testInt32ToDoublePartialRegisterStall):
+        (JSC::B3::run):
+        * jit/FPRInfo.h:
+
</ins><span class="cx"> 2015-12-01  Yusuke Suzuki  &lt;utatane.tea@gmail.com&gt;
</span><span class="cx"> 
</span><span class="cx">         [ES6] Implement LLInt/Baseline Support for ES6 Generators and enable this feature
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreJavaScriptCorexcodeprojprojectpbxproj"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj (192945 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj        2015-12-02 17:27:37 UTC (rev 192945)
+++ trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -1086,6 +1086,8 @@
</span><span class="cx">                 1ACF7377171CA6FB00C9BB1E /* Weak.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 1ACF7376171CA6FB00C9BB1E /* Weak.cpp */; };
</span><span class="cx">                 2600B5A6152BAAA70091EE5F /* JSStringJoiner.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 2600B5A4152BAAA70091EE5F /* JSStringJoiner.cpp */; };
</span><span class="cx">                 2600B5A7152BAAA70091EE5F /* JSStringJoiner.h in Headers */ = {isa = PBXBuildFile; fileRef = 2600B5A5152BAAA70091EE5F /* JSStringJoiner.h */; };
</span><ins>+                262D85B61C0D650F006ACB61 /* AirFixPartialRegisterStalls.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 262D85B41C0D650F006ACB61 /* AirFixPartialRegisterStalls.cpp */; };
+                262D85B71C0D650F006ACB61 /* AirFixPartialRegisterStalls.h in Headers */ = {isa = PBXBuildFile; fileRef = 262D85B51C0D650F006ACB61 /* AirFixPartialRegisterStalls.h */; };
</ins><span class="cx">                 26718BA41BE99F780052017B /* AirIteratedRegisterCoalescing.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 26718BA21BE99F780052017B /* AirIteratedRegisterCoalescing.cpp */; };
</span><span class="cx">                 26718BA51BE99F780052017B /* AirIteratedRegisterCoalescing.h in Headers */ = {isa = PBXBuildFile; fileRef = 26718BA31BE99F780052017B /* AirIteratedRegisterCoalescing.h */; };
</span><span class="cx">                 2684D4381C00161C0081D663 /* AirLiveness.h in Headers */ = {isa = PBXBuildFile; fileRef = 2684D4371C00161C0081D663 /* AirLiveness.h */; };
</span><span class="lines">@@ -3138,6 +3140,8 @@
</span><span class="cx">                 1CAA8B4B0D32C39A0041BCFF /* JavaScriptCore.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JavaScriptCore.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 2600B5A4152BAAA70091EE5F /* JSStringJoiner.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = JSStringJoiner.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 2600B5A5152BAAA70091EE5F /* JSStringJoiner.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JSStringJoiner.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><ins>+                262D85B41C0D650F006ACB61 /* AirFixPartialRegisterStalls.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirFixPartialRegisterStalls.cpp; path = b3/air/AirFixPartialRegisterStalls.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
+                262D85B51C0D650F006ACB61 /* AirFixPartialRegisterStalls.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirFixPartialRegisterStalls.h; path = b3/air/AirFixPartialRegisterStalls.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</ins><span class="cx">                 264091FA1BE2FD4100684DB2 /* AirOpcode.opcodes */ = {isa = PBXFileReference; lastKnownFileType = text; name = AirOpcode.opcodes; path = b3/air/AirOpcode.opcodes; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 26718BA21BE99F780052017B /* AirIteratedRegisterCoalescing.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirIteratedRegisterCoalescing.cpp; path = b3/air/AirIteratedRegisterCoalescing.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 26718BA31BE99F780052017B /* AirIteratedRegisterCoalescing.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirIteratedRegisterCoalescing.h; path = b3/air/AirIteratedRegisterCoalescing.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="lines">@@ -4655,6 +4659,8 @@
</span><span class="cx">                                 0FEC85511BDACDC70080FF74 /* AirCode.h */,
</span><span class="cx">                                 0F4570361BE44C910062A629 /* AirEliminateDeadCode.cpp */,
</span><span class="cx">                                 0F4570371BE44C910062A629 /* AirEliminateDeadCode.h */,
</span><ins>+                                262D85B41C0D650F006ACB61 /* AirFixPartialRegisterStalls.cpp */,
+                                262D85B51C0D650F006ACB61 /* AirFixPartialRegisterStalls.h */,
</ins><span class="cx">                                 0FEC85521BDACDC70080FF74 /* AirFrequentedBlock.h */,
</span><span class="cx">                                 0FEC85531BDACDC70080FF74 /* AirGenerate.cpp */,
</span><span class="cx">                                 0FEC85541BDACDC70080FF74 /* AirGenerate.h */,
</span><span class="lines">@@ -7841,6 +7847,7 @@
</span><span class="cx">                                 86704B8712DBA33700A9FE7B /* YarrJIT.h in Headers */,
</span><span class="cx">                                 86704B8812DBA33700A9FE7B /* YarrParser.h in Headers */,
</span><span class="cx">                                 86704B8A12DBA33700A9FE7B /* YarrPattern.h in Headers */,
</span><ins>+                                262D85B71C0D650F006ACB61 /* AirFixPartialRegisterStalls.h in Headers */,
</ins><span class="cx">                                 86704B4312DB8A8100A9FE7B /* YarrSyntaxChecker.h in Headers */,
</span><span class="cx">                         );
</span><span class="cx">                         runOnlyForDeploymentPostprocessing = 0;
</span><span class="lines">@@ -8754,6 +8761,7 @@
</span><span class="cx">                                 A1B9E23D1B4E0D6700BC7FED /* IntlCollatorPrototype.cpp in Sources */,
</span><span class="cx">                                 A1587D6D1B4DC14100D69849 /* IntlDateTimeFormat.cpp in Sources */,
</span><span class="cx">                                 A1587D6F1B4DC14100D69849 /* IntlDateTimeFormatConstructor.cpp in Sources */,
</span><ins>+                                262D85B61C0D650F006ACB61 /* AirFixPartialRegisterStalls.cpp in Sources */,
</ins><span class="cx">                                 70B7919B1C024A46002481E2 /* JSGeneratorFunction.cpp in Sources */,
</span><span class="cx">                                 A1587D711B4DC14100D69849 /* IntlDateTimeFormatPrototype.cpp in Sources */,
</span><span class="cx">                                 A1D792FC1B43864B004516F5 /* IntlNumberFormat.cpp in Sources */,
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commonh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h (192945 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h        2015-12-02 17:27:37 UTC (rev 192945)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -996,7 +996,7 @@
</span><span class="cx"> 
</span><span class="cx">     void moveZeroToDouble(FPRegisterID reg)
</span><span class="cx">     {
</span><del>-        m_assembler.xorpd_rr(reg, reg);
</del><ins>+        m_assembler.xorps_rr(reg, reg);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     Jump branchDoubleNonZero(FPRegisterID reg, FPRegisterID scratch)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerX86Assemblerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/X86Assembler.h (192945 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/X86Assembler.h        2015-12-02 17:27:37 UTC (rev 192945)
+++ trunk/Source/JavaScriptCore/assembler/X86Assembler.h        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -2120,8 +2120,17 @@
</span><span class="cx">         m_formatter.twoByteOp(OP2_DIVSD_VsdWsd, (RegisterID)dst, base, offset);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void xorps_rr(XMMRegisterID src, XMMRegisterID dst)
+    {
+        m_formatter.twoByteOp(OP2_XORPD_VpdWpd, (RegisterID)dst, (RegisterID)src);
+    }
+
</ins><span class="cx">     void xorpd_rr(XMMRegisterID src, XMMRegisterID dst)
</span><span class="cx">     {
</span><ins>+        if (src == dst) {
+            xorps_rr(src, dst);
+            return;
+        }
</ins><span class="cx">         m_formatter.prefix(PRE_SSE_66);
</span><span class="cx">         m_formatter.twoByteOp(OP2_XORPD_VpdWpd, (RegisterID)dst, (RegisterID)src);
</span><span class="cx">     }
</span></span></pre></div>
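<p>In the hunk above, <code>xorps_rr</code> reuses <code>OP2_XORPD_VpdWpd</code> without emitting <code>PRE_SSE_66</code>. That works because XORPS and XORPD share the opcode bytes <code>0F 57</code> and differ only in the <code>66</code> prefix. The standalone sketch below illustrates the byte layout; <code>encodeXor</code> is a hypothetical helper, not part of X86Assembler, and it only handles xmm0 through xmm7 (no REX prefix).</p>
<pre>
#include &lt;cstdint&gt;
#include &lt;cstdio&gt;
#include &lt;vector&gt;

// Hypothetical helper: encodes the register-to-register form of
// xorps (packedDouble == false) or xorpd (packedDouble == true).
std::vector&lt;uint8_t&gt; encodeXor(bool packedDouble, unsigned dst, unsigned src)
{
    std::vector&lt;uint8_t&gt; bytes;
    if (packedDouble)
        bytes.push_back(0x66); // The 66 prefix selects xorpd.
    bytes.push_back(0x0F);     // Two-byte opcode escape.
    bytes.push_back(0x57);     // Opcode shared by xorps and xorpd.
    // ModRM byte: mod = 11 (register direct), reg = dst, rm = src.
    bytes.push_back(static_cast&lt;uint8_t&gt;(0xC0 | (dst &lt;&lt; 3) | src));
    return bytes;
}

int main()
{
    // xorps xmm0, xmm0 is 0F 57 C0; xorpd xmm0, xmm0 is 66 0F 57 C0.
    for (bool packedDouble : { false, true }) {
        for (uint8_t byte : encodeXor(packedDouble, 0, 0))
            std::printf(&quot;%02x &quot;, static_cast&lt;unsigned&gt;(byte));
        std::printf(&quot;\n&quot;);
    }
    return 0;
}
</pre>
<p>The xorps form is one byte shorter, and both forms zero the full register when source and destination are the same.</p>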
<a id="trunkSourceJavaScriptCoreb3B3IndexSeth"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3IndexSet.h (192945 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3IndexSet.h        2015-12-02 17:27:37 UTC (rev 192945)
+++ trunk/Source/JavaScriptCore/b3/B3IndexSet.h        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -48,6 +48,11 @@
</span><span class="cx">         return !m_set.set(value-&gt;index());
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    bool remove(T* value)
+    {
+        return m_set.clear(value-&gt;index());
+    }
+
</ins><span class="cx">     bool contains(T* value) const
</span><span class="cx">     {
</span><span class="cx">         if (!value)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirFixPartialRegisterStallscpp"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp (0 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.cpp        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -0,0 +1,230 @@
</span><ins>+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include &quot;config.h&quot;
+#include &quot;AirFixPartialRegisterStalls.h&quot;
+
+#if ENABLE(B3_JIT)
+
+#include &quot;AirBasicBlock.h&quot;
+#include &quot;AirCode.h&quot;
+#include &quot;AirInsertionSet.h&quot;
+#include &quot;AirInst.h&quot;
+#include &quot;AirInstInlines.h&quot;
+#include &quot;AirPhaseScope.h&quot;
+#include &quot;B3IndexMap.h&quot;
+#include &quot;B3IndexSet.h&quot;
+#include &quot;MacroAssembler.h&quot;
+#include &lt;wtf/Vector.h&gt;
+
+namespace JSC { namespace B3 { namespace Air {
+
+namespace {
+
+bool hasPartialXmmRegUpdate(const Inst&amp; inst)
+{
+    switch (inst.opcode) {
+    case ConvertInt32ToDouble:
+    case ConvertInt64ToDouble:
+    case SqrtDouble:
+        return true;
+    default:
+        break;
+    }
+    return false;
+}
+
+bool isDependencyBreaking(const Inst&amp; inst)
+{
+    // &quot;xorps reg, reg&quot; is used by the frontend to remove the dependency on its argument.
+    return inst.opcode == MoveZeroToDouble;
+}
+
+// FIXME: find a good distance per architecture experimentally.
+// LLVM uses a distance of 16 but that comes from Nehalem.
+unsigned char minimumSafeDistance = 16;
+
+struct FPDefDistance {
+    FPDefDistance()
+    {
+        for (unsigned i = 0; i &lt; MacroAssembler::numberOfFPRegisters(); ++i)
+            distance[i] = 255;
+    }
+
+    void reset(FPRReg reg)
+    {
+        unsigned index = MacroAssembler::fpRegisterIndex(reg);
+        distance[index] = 255;
+    }
+
+    void add(FPRReg reg, unsigned registerDistance)
+    {
+        unsigned index = MacroAssembler::fpRegisterIndex(reg);
+        if (registerDistance &lt; distance[index])
+            distance[index] = static_cast&lt;unsigned char&gt;(registerDistance);
+    }
+
+    bool updateFromPrecessor(FPDefDistance&amp; precessorDistance, unsigned constantOffset = 0)
+    {
+        bool changed = false;
+        for (unsigned i = 0; i &lt; MacroAssembler::numberOfFPRegisters(); ++i) {
+            unsigned regDistance = precessorDistance.distance[i] + constantOffset;
+            if (regDistance &lt; minimumSafeDistance &amp;&amp; regDistance &lt; distance[i]) {
+                distance[i] = regDistance;
+                changed = true;
+            }
+        }
+        return changed;
+    }
+
+    unsigned char distance[MacroAssembler::numberOfFPRegisters()];
+};
+
+void updateDistances(Inst&amp; inst, FPDefDistance&amp; localDistance, unsigned&amp; distanceToBlockEnd)
+{
+    --distanceToBlockEnd;
+
+    if (isDependencyBreaking(inst)) {
+        localDistance.reset(inst.args[0].tmp().fpr());
+        return;
+    }
+
+    inst.forEachTmp([&amp;] (Tmp&amp; tmp, Arg::Role role, Arg::Type) {
+        ASSERT_WITH_MESSAGE(tmp.isReg(), &quot;This phase must be run after register allocation.&quot;);
+
+        if (tmp.isFPR() &amp;&amp; Arg::isDef(role))
+            localDistance.add(tmp.fpr(), distanceToBlockEnd);
+    });
+}
+
+}
+
+void fixPartialRegisterStalls(Code&amp; code)
+{
+    if (!isX86())
+        return;
+
+    PhaseScope phaseScope(code, &quot;fixPartialRegisterStalls&quot;);
+
+    Vector&lt;BasicBlock*&gt; candidates;
+
+    for (BasicBlock* block : code) {
+        for (const Inst&amp; inst : *block) {
+            if (hasPartialXmmRegUpdate(inst)) {
+                candidates.append(block);
+                break;
+            }
+        }
+    }
+
+    // Fortunately, instructions with partial register updates are rare. Return early
+    // if no block contains one.
+    if (candidates.isEmpty())
+        return;
+
+    // For each block, this provides the distance to the last instruction setting each register
+    // on block *entry*.
+    IndexMap&lt;BasicBlock, FPDefDistance&gt; lastDefDistance(code.size());
+
+    // Blocks with dirty distance at head.
+    IndexSet&lt;BasicBlock&gt; dirty;
+
+    // First, we compute the local distance for each block and push it to the successors.
+    for (BasicBlock* block : code) {
+        FPDefDistance localDistance;
+
+        unsigned distanceToBlockEnd = block-&gt;size();
+        for (Inst&amp; inst : *block)
+            updateDistances(inst, localDistance, distanceToBlockEnd);
+
+        for (BasicBlock* successor : block-&gt;successorBlocks()) {
+            if (lastDefDistance[successor].updateFromPrecessor(localDistance))
+                dirty.add(successor);
+        }
+    }
+
+    // Now we propagate the minimums across blocks.
+    bool changed;
+    do {
+        changed = false;
+
+        for (BasicBlock* block : code) {
+            if (!dirty.remove(block))
+                continue;
+
+            // Little shortcut: if the block is big enough, propagating it won't add any information.
+            if (block-&gt;size() &gt;= minimumSafeDistance)
+                continue;
+
+            unsigned blockSize = block-&gt;size();
+            FPDefDistance&amp; blockDistance = lastDefDistance[block];
+            for (BasicBlock* successor : block-&gt;successorBlocks()) {
+                if (lastDefDistance[successor].updateFromPrecessor(blockDistance, blockSize)) {
+                    dirty.add(successor);
+                    changed = true;
+                }
+            }
+        }
+    } while (changed);
+
+    // Finally, update each block as needed.
+    InsertionSet insertionSet(code);
+    for (BasicBlock* block : candidates) {
+        unsigned distanceToBlockEnd = block-&gt;size();
+        FPDefDistance&amp; localDistance = lastDefDistance[block];
+
+        for (unsigned i = 0; i &lt; block-&gt;size(); ++i) {
+            Inst&amp; inst = block-&gt;at(i);
+
+            if (hasPartialXmmRegUpdate(inst)) {
+                RegisterSet defs;
+                RegisterSet uses;
+                inst.forEachTmp([&amp;] (Tmp&amp; tmp, Arg::Role role, Arg::Type) {
+                    if (tmp.isFPR()) {
+                        if (Arg::isDef(role))
+                            defs.set(tmp.fpr());
+                        if (Arg::isAnyUse(role))
+                            uses.set(tmp.fpr());
+                    }
+                });
+                // We only care about values we define but not use. Otherwise we have to wait
+                // for the value to be resolved anyway.
+                defs.exclude(uses);
+
+                defs.forEach([&amp;] (Reg reg) {
+                    if (localDistance.distance[MacroAssembler::fpRegisterIndex(reg.fpr())] &lt; minimumSafeDistance)
+                        insertionSet.insert(i, MoveZeroToDouble, inst.origin, Tmp(reg));
+                });
+            }
+
+            updateDistances(inst, localDistance, distanceToBlockEnd);
+        }
+        insertionSet.execute(block);
+    }
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
</ins></span></pre></div>
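<p>The block-to-block propagation in the file above can be illustrated in isolation. The sketch below is a simplified model under stated assumptions: it tracks a single register, the <code>Block</code> struct and <code>entryDistances</code> are hypothetical names, and it omits the patch's dirty set and its shortcut for large blocks.</p>
<pre>
#include &lt;cstddef&gt;
#include &lt;cstdio&gt;
#include &lt;vector&gt;

constexpr unsigned kMinimumSafeDistance = 16; // Same default as the patch.
constexpr unsigned kFar = 255;                // Means: no nearby definition.

// Hypothetical block summary for a single FP register.
struct Block {
    unsigned size;          // Number of instructions in the block.
    unsigned localDefToEnd; // Distance from the last def to the block end, or kFar.
    std::vector&lt;unsigned&gt; successors;
};

// Returns, for each block, the shortest distance at block entry back to a
// definition of the register, or kFar if every path is long enough.
std::vector&lt;unsigned&gt; entryDistances(const std::vector&lt;Block&gt;&amp; blocks)
{
    std::vector&lt;unsigned&gt; entry(blocks.size(), kFar);
    auto relax = [&amp;](unsigned target, unsigned candidate) {
        if (candidate &lt; kMinimumSafeDistance &amp;&amp; candidate &lt; entry[target]) {
            entry[target] = candidate;
            return true;
        }
        return false;
    };

    // Step 1: push each block's local distance to its successors.
    bool changed = false;
    for (const Block&amp; block : blocks) {
        for (unsigned successor : block.successors)
            changed |= relax(successor, block.localDefToEnd);
    }

    // Step 2: propagate entry distances through blocks until nothing changes.
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i &lt; blocks.size(); ++i) {
            if (entry[i] == kFar)
                continue;
            for (unsigned successor : blocks[i].successors)
                changed |= relax(successor, entry[i] + blocks[i].size);
        }
    }
    return entry;
}

int main()
{
    // Block 0 defines the register one instruction before its end and jumps
    // to block 1, a small loop that branches to itself or to block 2.
    std::vector&lt;Block&gt; blocks = {
        { 3, 1, { 1 } },
        { 4, kFar, { 1, 2 } },
        { 2, kFar, {} },
    };
    for (unsigned distance : entryDistances(blocks))
        std::printf(&quot;%u\n&quot;, distance);
    return 0;
}
</pre>
<p>Distances only ever decrease and candidates at or above the safe distance are dropped, so the loop terminates even when the control flow graph contains cycles.</p>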
<a id="trunkSourceJavaScriptCoreb3airAirFixPartialRegisterStallsh"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.h (0 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.h                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirFixPartialRegisterStalls.h        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -0,0 +1,49 @@
</span><ins>+/*
+ * Copyright (C) 2015 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef AirFixPartialRegisterStalls_h
+#define AirFixPartialRegisterStalls_h
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+// x86 has a pipelining hazard caused by false dependencies between instructions.
+//
+// Some instructions update only part of a register, so they can only be scheduled after
+// the previous definition is computed. The compiler can avoid this problem
+// by explicitly resetting the entire register before executing the instruction with
+// the partial update.
+//
+// See &quot;Partial XMM Register Stalls&quot; and &quot;Dependency Breaking Idioms&quot; in the manual.
+void fixPartialRegisterStalls(Code&amp;);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirFixPartialRegisterStalls_h
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirGeneratecpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirGenerate.cpp (192945 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirGenerate.cpp        2015-12-02 17:27:37 UTC (rev 192945)
+++ trunk/Source/JavaScriptCore/b3/air/AirGenerate.cpp        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -31,6 +31,7 @@
</span><span class="cx"> #include &quot;AirAllocateStack.h&quot;
</span><span class="cx"> #include &quot;AirCode.h&quot;
</span><span class="cx"> #include &quot;AirEliminateDeadCode.h&quot;
</span><ins>+#include &quot;AirFixPartialRegisterStalls.h&quot;
</ins><span class="cx"> #include &quot;AirGenerationContext.h&quot;
</span><span class="cx"> #include &quot;AirHandleCalleeSaves.h&quot;
</span><span class="cx"> #include &quot;AirIteratedRegisterCoalescing.h&quot;
</span><span class="lines">@@ -95,6 +96,10 @@
</span><span class="cx">     // frequency successor is also the fall-through target.
</span><span class="cx">     optimizeBlockOrder(code);
</span><span class="cx"> 
</span><ins>+    // Attempt to remove the false dependencies that partial register updates create between instructions.
+    // This must run as late as possible because it depends on the instruction order and register use.
+    fixPartialRegisterStalls(code);
+
</ins><span class="cx">     // This is needed to satisfy a requirement of B3::StackmapValue.
</span><span class="cx">     reportUsedRegisters(code);
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3testb3cpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/testb3.cpp (192945 => 192946)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/testb3.cpp        2015-12-02 17:27:37 UTC (rev 192945)
+++ trunk/Source/JavaScriptCore/b3/testb3.cpp        2015-12-02 18:49:09 UTC (rev 192946)
</span><span class="lines">@@ -2667,6 +2667,105 @@
</span><span class="cx">     compileAndRun&lt;double&gt;(proc, 1.1, 2.5);
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+void testInt32ToDoublePartialRegisterStall()
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    BasicBlock* loop = proc.addBlock();
+    BasicBlock* done = proc.addBlock();
+
+    // Head.
+    Value* total = root-&gt;appendNew&lt;ConstDoubleValue&gt;(proc, Origin(), 0.);
+    Value* counter = root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0);
+    UpsilonValue* originalTotal = root-&gt;appendNew&lt;UpsilonValue&gt;(proc, Origin(), total);
+    UpsilonValue* originalCounter = root-&gt;appendNew&lt;UpsilonValue&gt;(proc, Origin(), counter);
+    root-&gt;appendNew&lt;ControlValue&gt;(proc, Jump, Origin(), FrequentedBlock(loop));
+
+    // Loop.
+    Value* loopCounter = loop-&gt;appendNew&lt;Value&gt;(proc, Phi, Int64, Origin());
+    Value* loopTotal = loop-&gt;appendNew&lt;Value&gt;(proc, Phi, Double, Origin());
+    originalCounter-&gt;setPhi(loopCounter);
+    originalTotal-&gt;setPhi(loopTotal);
+
+    Value* truncatedCounter = loop-&gt;appendNew&lt;Value&gt;(proc, Trunc, Origin(), loopCounter);
+    Value* doubleCounter = loop-&gt;appendNew&lt;Value&gt;(proc, IToD, Origin(), truncatedCounter);
+    Value* updatedTotal = loop-&gt;appendNew&lt;Value&gt;(proc, Add, Origin(), doubleCounter, loopTotal);
+    UpsilonValue* updatedTotalUpsilon = loop-&gt;appendNew&lt;UpsilonValue&gt;(proc, Origin(), updatedTotal);
+    updatedTotalUpsilon-&gt;setPhi(loopTotal);
+
+    Value* decCounter = loop-&gt;appendNew&lt;Value&gt;(proc, Sub, Origin(), loopCounter, loop-&gt;appendNew&lt;Const64Value&gt;(proc, Origin(), 1));
+    UpsilonValue* decCounterUpsilon = loop-&gt;appendNew&lt;UpsilonValue&gt;(proc, Origin(), decCounter);
+    decCounterUpsilon-&gt;setPhi(loopCounter);
+    loop-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Branch, Origin(),
+        decCounter,
+        FrequentedBlock(loop), FrequentedBlock(done));
+
+    // Tail.
+    done-&gt;appendNew&lt;ControlValue&gt;(proc, Return, Origin(), updatedTotal);
+    CHECK(isIdentical(compileAndRun&lt;double&gt;(proc, 100000), 5000050000.));
+}
+
+void testInt32ToDoublePartialRegisterWithoutStall()
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    BasicBlock* loop = proc.addBlock();
+    BasicBlock* done = proc.addBlock();
+
+    // Head.
+    Value* total = root-&gt;appendNew&lt;ConstDoubleValue&gt;(proc, Origin(), 0.);
+    Value* counter = root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0);
+    UpsilonValue* originalTotal = root-&gt;appendNew&lt;UpsilonValue&gt;(proc, Origin(), total);
+    UpsilonValue* originalCounter = root-&gt;appendNew&lt;UpsilonValue&gt;(proc, Origin(), counter);
+    uint64_t forPaddingInput = 0; // Initialized so the padding load in the loop reads a defined value.
+    Value* forPaddingInputAddress = root-&gt;appendNew&lt;ConstPtrValue&gt;(proc, Origin(), &amp;forPaddingInput);
+    uint64_t forPaddingOutput;
+    Value* forPaddingOutputAddress = root-&gt;appendNew&lt;ConstPtrValue&gt;(proc, Origin(), &amp;forPaddingOutput);
+    root-&gt;appendNew&lt;ControlValue&gt;(proc, Jump, Origin(), FrequentedBlock(loop));
+
+    // Loop.
+    Value* loopCounter = loop-&gt;appendNew&lt;Value&gt;(proc, Phi, Int64, Origin());
+    Value* loopTotal = loop-&gt;appendNew&lt;Value&gt;(proc, Phi, Double, Origin());
+    originalCounter-&gt;setPhi(loopCounter);
+    originalTotal-&gt;setPhi(loopTotal);
+
+    Value* truncatedCounter = loop-&gt;appendNew&lt;Value&gt;(proc, Trunc, Origin(), loopCounter);
+    Value* doubleCounter = loop-&gt;appendNew&lt;Value&gt;(proc, IToD, Origin(), truncatedCounter);
+    Value* updatedTotal = loop-&gt;appendNew&lt;Value&gt;(proc, Add, Origin(), doubleCounter, loopTotal);
+
+    // Add enough padding instructions to avoid a stall.
+    Value* loadPadding = loop-&gt;appendNew&lt;MemoryValue&gt;(proc, Load, Int64, Origin(), forPaddingInputAddress);
+    Value* padding = loop-&gt;appendNew&lt;Value&gt;(proc, BitXor, Origin(), loadPadding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, Add, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, BitOr, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, Sub, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, BitXor, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, Add, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, BitOr, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, Sub, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, BitXor, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, Add, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, BitOr, Origin(), padding, loopCounter);
+    padding = loop-&gt;appendNew&lt;Value&gt;(proc, Sub, Origin(), padding, loopCounter);
+    loop-&gt;appendNew&lt;MemoryValue&gt;(proc, Store, Origin(), padding, forPaddingOutputAddress);
+
+    UpsilonValue* updatedTotalUpsilon = loop-&gt;appendNew&lt;UpsilonValue&gt;(proc, Origin(), updatedTotal);
+    updatedTotalUpsilon-&gt;setPhi(loopTotal);
+
+    Value* decCounter = loop-&gt;appendNew&lt;Value&gt;(proc, Sub, Origin(), loopCounter, loop-&gt;appendNew&lt;Const64Value&gt;(proc, Origin(), 1));
+    UpsilonValue* decCounterUpsilon = loop-&gt;appendNew&lt;UpsilonValue&gt;(proc, Origin(), decCounter);
+    decCounterUpsilon-&gt;setPhi(loopCounter);
+    loop-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Branch, Origin(),
+        decCounter,
+        FrequentedBlock(loop), FrequentedBlock(done));
+
+    // Tail.
+    done-&gt;appendNew&lt;ControlValue&gt;(proc, Return, Origin(), updatedTotal);
+    CHECK(isIdentical(compileAndRun&lt;double&gt;(proc, 100000), 5000050000.));
+}
+
</ins><span class="cx"> void testBranch()
</span><span class="cx"> {
</span><span class="cx">     Procedure proc;
</span><span class="lines">@@ -5888,6 +5987,9 @@
</span><span class="cx">     RUN(testSpillGP());
</span><span class="cx">     RUN(testSpillFP());
</span><span class="cx"> 
</span><ins>+    RUN(testInt32ToDoublePartialRegisterStall());
+    RUN(testInt32ToDoublePartialRegisterWithoutStall());
+
</ins><span class="cx">     RUN(testCallSimple(1, 2));
</span><span class="cx">     RUN(testCallFunctionWithHellaArguments());
</span><span class="cx"> 
</span></span></pre>
</div>
</div>

</body>
</html>