<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[198873] trunk/Source/JavaScriptCore</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/198873">198873</a></dd>
<dt>Author</dt> <dd>benjamin@webkit.org</dd>
<dt>Date</dt> <dd>2016-03-30 19:05:13 -0700 (Wed, 30 Mar 2016)</dd>
</dl>

<h3>Log Message</h3>
<pre>[JSC][x86] Add the 3 operands forms of floating point addition and multiplication
https://bugs.webkit.org/show_bug.cgi?id=156043

Reviewed by Geoffrey Garen.

When they are available, VADD and VMUL are better options to lower
floating point addition and multiplication.

In the simple cases when one of the operands is aliased to the destination,
those forms are the same size or 1 byte shorter depending on the registers.

In the more advanced cases, we gain nice advantages with the new forms:
-We can get rid of the MoveDouble in front of the instruction when we cannot
 alias.
-We can disable aliasing entirely in Air. That is useful for latency
 since computing coalescing is not exactly cheap.

* assembler/MacroAssemblerX86Common.cpp:
* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::and32):
(JSC::MacroAssemblerX86Common::mul32):
(JSC::MacroAssemblerX86Common::or32):
(JSC::MacroAssemblerX86Common::xor32):
(JSC::MacroAssemblerX86Common::branchAdd32):
The change in B3LowerToAir exposed a bug in the fake 3-operand
forms of those instructions. If the base register of the address
was also the destination, we overwrote it before reading from it.

For example,
    Add32([%r11], %eax, %r11)
would generate:
    move %eax, %r11
    add32 [%r11], %r11
which crashes.

Codegen for those cases now loads from the address first:
    load32 [%r11], %r11
    add32 %eax, %r11

The weird case where all arguments use the same register
is handled too.

(JSC::MacroAssemblerX86Common::addDouble):
(JSC::MacroAssemblerX86Common::addFloat):
(JSC::MacroAssemblerX86Common::mulDouble):
(JSC::MacroAssemblerX86Common::mulFloat):
(JSC::MacroAssemblerX86Common::supportsFloatingPointRounding):
(JSC::MacroAssemblerX86Common::supportsAVX):
(JSC::MacroAssemblerX86Common::updateEax1EcxFlags):
* assembler/MacroAssemblerX86_64.h:
(JSC::MacroAssemblerX86_64::branchAdd64):
* assembler/X86Assembler.h:
(JSC::X86Assembler::vaddsd_rr):
(JSC::X86Assembler::vaddsd_mr):
(JSC::X86Assembler::vaddss_rr):
(JSC::X86Assembler::vaddss_mr):
(JSC::X86Assembler::vmulsd_rr):
(JSC::X86Assembler::vmulsd_mr):
(JSC::X86Assembler::vmulss_rr):
(JSC::X86Assembler::vmulss_mr):
(JSC::X86Assembler::X86InstructionFormatter::SingleInstructionBufferWriter::memoryModRM):
* b3/B3LowerToAir.cpp:
(JSC::B3::Air::LowerToAir::appendBinOp):
Add the 3 operand forms so that we lower Add and Mul
to the best form directly.

I will change how we lower the fake 3 operands instructions
but the codegen should end up the same in most cases.
The new codegen is the load32 + op above.

* b3/air/AirInstInlines.h:
(JSC::B3::Air::Inst::shouldTryAliasingDef):
* b3/air/testair.cpp:
(JSC::B3::Air::testX86VMULSD):
(JSC::B3::Air::testX86VMULSDDestRex):
(JSC::B3::Air::testX86VMULSDOp1DestRex):
(JSC::B3::Air::testX86VMULSDOp2DestRex):
(JSC::B3::Air::testX86VMULSDOpsDestRex):
(JSC::B3::Air::testX86VMULSDAddr):
(JSC::B3::Air::testX86VMULSDAddrOpRexAddr):
(JSC::B3::Air::testX86VMULSDDestRexAddr):
(JSC::B3::Air::testX86VMULSDRegOpDestRexAddr):
(JSC::B3::Air::testX86VMULSDAddrOpDestRexAddr):
Make sure we have some coverage for AVX encoding of instructions.</pre>
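<p>A minimal sketch of the aliasing fix described in the log message above, written against a toy register machine rather than the real MacroAssembler. <code>ToyMachine</code> and all of its members are hypothetical names made up for this illustration; only the order of the three cases is meant to mirror the new <code>and32</code>, <code>mul32</code>, <code>or32</code> and <code>xor32</code> bodies in the diff. Memory accesses wrap around instead of faulting, so the old lowering produces a wrong value here where real code would crash.</p>
<pre>
```cpp
// Toy model (hypothetical, not JSC code): four unsigned registers and eight
// words of memory. An "address" is just a base register holding an index.
struct ToyMachine {
    unsigned reg[4];
    unsigned mem[8];

    // Two-operand primitives, in the style of x86 instructions.
    void move(int src, int dest) { reg[dest] = reg[src]; }
    void load32(int base, int dest) { reg[dest] = mem[reg[base] % 8]; }
    void add32(int src, int dest) { reg[dest] += reg[src]; }
    void add32FromMemory(int base, int dest) { reg[dest] += mem[reg[base] % 8]; }

    // Old lowering of Add32([base], op2, dest): copy op2 into dest, then add
    // from memory. When base == dest, the copy destroys the address.
    void add32Old(int base, int op2, int dest)
    {
        if (op2 != dest)
            move(op2, dest);
        add32FromMemory(base, dest);
    }

    // New lowering, with the same case order as the patch.
    void add32New(int base, int op2, int dest)
    {
        if (op2 == dest)
            add32FromMemory(base, dest);
        else if (base == dest) {
            load32(base, dest); // read memory while the address is intact
            add32(op2, dest);
        } else {
            move(op2, dest);
            add32FromMemory(base, dest);
        }
    }
};
```
</pre>
<p>With <code>reg[1]</code> holding index 2, <code>mem[2]</code> holding 40 and <code>reg[0]</code> holding 5, <code>add32New(1, 0, 1)</code> leaves 45 in <code>reg[1]</code>. <code>add32Old(1, 0, 1)</code> reads <code>mem[5]</code> instead, because the move has already replaced the index.</p>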

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commoncpp">trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commonh">trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerMacroAssemblerX86_64h">trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerX86Assemblerh">trunk/Source/JavaScriptCore/assembler/X86Assembler.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3LowerToAircpp">trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirInstInlinesh">trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airtestaircpp">trunk/Source/JavaScriptCore/b3/air/testair.cpp</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (198872 => 198873)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog        2016-03-31 02:03:57 UTC (rev 198872)
+++ trunk/Source/JavaScriptCore/ChangeLog        2016-03-31 02:05:13 UTC (rev 198873)
</span><span class="lines">@@ -1,3 +1,90 @@
</span><ins>+2016-03-30  Benjamin Poulain  &lt;benjamin@webkit.org&gt;
+
+        [JSC][x86] Add the 3 operands forms of floating point addition and multiplication
+        https://bugs.webkit.org/show_bug.cgi?id=156043
+
+        Reviewed by Geoffrey Garen.
+
+        When they are available, VADD and VMUL are better options to lower
+        floating point addition and multiplication.
+
+        In the simple cases when one of the operands is aliased to the destination,
+        those forms have the same size or 1 byte shorter depending on the registers.
+
+        In the more advanced cases, we gain nice advantages with the new forms:
+        -We can get rid of the MoveDouble in front the instruction when we cannot
+         alias.
+        -We can disable aliasing entirely in Air. That is useful for latency
+         since computing coalescing is not exactly cheap.
+
+        * assembler/MacroAssemblerX86Common.cpp:
+        * assembler/MacroAssemblerX86Common.h:
+        (JSC::MacroAssemblerX86Common::and32):
+        (JSC::MacroAssemblerX86Common::mul32):
+        (JSC::MacroAssemblerX86Common::or32):
+        (JSC::MacroAssemblerX86Common::xor32):
+        (JSC::MacroAssemblerX86Common::branchAdd32):
+        The change in B3LowerToAir exposed a bug in the fake 3 operands
+        forms of those instructions. If the address is equal to
+        the destination, we were nuking the address.
+
+        For example,
+            Add32([%r11], %eax, %r11)
+        would generate:
+            move %eax, %r11
+            add32 [%r11], %r11
+        which crashes.
+
+        I updated codegen of those cases to support that case through
+            load32 [%r11], %r11
+            add32 %eax, %r11
+
+        The weird case were all arguments have the same registers
+        is handled too.
+
+        (JSC::MacroAssemblerX86Common::addDouble):
+        (JSC::MacroAssemblerX86Common::addFloat):
+        (JSC::MacroAssemblerX86Common::mulDouble):
+        (JSC::MacroAssemblerX86Common::mulFloat):
+        (JSC::MacroAssemblerX86Common::supportsFloatingPointRounding):
+        (JSC::MacroAssemblerX86Common::supportsAVX):
+        (JSC::MacroAssemblerX86Common::updateEax1EcxFlags):
+        * assembler/MacroAssemblerX86_64.h:
+        (JSC::MacroAssemblerX86_64::branchAdd64):
+        * assembler/X86Assembler.h:
+        (JSC::X86Assembler::vaddsd_rr):
+        (JSC::X86Assembler::vaddsd_mr):
+        (JSC::X86Assembler::vaddss_rr):
+        (JSC::X86Assembler::vaddss_mr):
+        (JSC::X86Assembler::vmulsd_rr):
+        (JSC::X86Assembler::vmulsd_mr):
+        (JSC::X86Assembler::vmulss_rr):
+        (JSC::X86Assembler::vmulss_mr):
+        (JSC::X86Assembler::X86InstructionFormatter::SingleInstructionBufferWriter::memoryModRM):
+        * b3/B3LowerToAir.cpp:
+        (JSC::B3::Air::LowerToAir::appendBinOp):
+        Add the 3 operand forms so that we lower Add and Mul
+        to the best form directly.
+
+        I will change how we lower the fake 3 operands instructions
+        but the codegen should end up the same in most cases.
+        The new codegen is the load32 + op above.
+
+        * b3/air/AirInstInlines.h:
+        (JSC::B3::Air::Inst::shouldTryAliasingDef):
+        * b3/air/testair.cpp:
+        (JSC::B3::Air::testX86VMULSD):
+        (JSC::B3::Air::testX86VMULSDDestRex):
+        (JSC::B3::Air::testX86VMULSDOp1DestRex):
+        (JSC::B3::Air::testX86VMULSDOp2DestRex):
+        (JSC::B3::Air::testX86VMULSDOpsDestRex):
+        (JSC::B3::Air::testX86VMULSDAddr):
+        (JSC::B3::Air::testX86VMULSDAddrOpRexAddr):
+        (JSC::B3::Air::testX86VMULSDDestRexAddr):
+        (JSC::B3::Air::testX86VMULSDRegOpDestRexAddr):
+        (JSC::B3::Air::testX86VMULSDAddrOpDestRexAddr):
+        Make sure we have some coverage for AVX encoding of instructions.
+
</ins><span class="cx"> 2016-03-30  Saam Barati  &lt;sbarati@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Change some release asserts in CodeBlock linking into debug asserts
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commoncpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.cpp (198872 => 198873)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.cpp        2016-03-31 02:03:57 UTC (rev 198872)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.cpp        2016-03-31 02:05:13 UTC (rev 198873)
</span><span class="lines">@@ -553,6 +553,7 @@
</span><span class="cx"> #endif
</span><span class="cx"> 
</span><span class="cx"> MacroAssemblerX86Common::CPUIDCheckState MacroAssemblerX86Common::s_sse4_1CheckState = CPUIDCheckState::NotChecked;
</span><ins>+MacroAssemblerX86Common::CPUIDCheckState MacroAssemblerX86Common::s_avxCheckState = CPUIDCheckState::NotChecked;
</ins><span class="cx"> MacroAssemblerX86Common::CPUIDCheckState MacroAssemblerX86Common::s_lzcntCheckState = CPUIDCheckState::NotChecked;
</span><span class="cx"> 
</span><span class="cx"> } // namespace JSC
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commonh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h (198872 => 198873)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h        2016-03-31 02:03:57 UTC (rev 198872)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h        2016-03-31 02:05:13 UTC (rev 198873)
</span><span class="lines">@@ -270,14 +270,20 @@
</span><span class="cx"> 
</span><span class="cx">     void and32(Address op1, RegisterID op2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move32IfNeeded(op2, dest);
-        and32(op1, dest);
</del><ins>+        if (op2 == dest)
+            and32(op1, dest);
+        else if (op1.base == dest) {
+            load32(op1, dest);
+            and32(op2, dest);
+        } else {
+            zeroExtend32ToPtr(op2, dest);
+            and32(op1, dest);
+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void and32(RegisterID op1, Address op2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move32IfNeeded(op1, dest);
-        and32(op2, dest);
</del><ins>+        and32(op2, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void and32(TrustedImm32 imm, RegisterID src, RegisterID dest)
</span><span class="lines">@@ -360,16 +366,22 @@
</span><span class="cx">         m_assembler.imull_mr(src.offset, src.base, dest);
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    void mul32(Address src1, RegisterID src2, RegisterID dest)
</del><ins>+    void mul32(Address op1, RegisterID op2, RegisterID dest)
</ins><span class="cx">     {
</span><del>-        move32IfNeeded(src2, dest);
-        mul32(src1, dest);
</del><ins>+        if (op2 == dest)
+            mul32(op1, dest);
+        else if (op1.base == dest) {
+            load32(op1, dest);
+            mul32(op2, dest);
+        } else {
+            zeroExtend32ToPtr(op2, dest);
+            mul32(op1, dest);
+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mul32(RegisterID src1, Address src2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move32IfNeeded(src1, dest);
-        mul32(src2, dest);
</del><ins>+        mul32(src2, src1, dest);
</ins><span class="cx">     }
</span><span class="cx">     
</span><span class="cx">     void mul32(TrustedImm32 imm, RegisterID src, RegisterID dest)
</span><span class="lines">@@ -450,14 +462,20 @@
</span><span class="cx"> 
</span><span class="cx">     void or32(Address op1, RegisterID op2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move32IfNeeded(op2, dest);
-        or32(op1, dest);
</del><ins>+        if (op2 == dest)
+            or32(op1, dest);
+        else if (op1.base == dest) {
+            load32(op1, dest);
+            or32(op2, dest);
+        } else {
+            zeroExtend32ToPtr(op2, dest);
+            or32(op1, dest);
+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void or32(RegisterID op1, Address op2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move32IfNeeded(op1, dest);
-        or32(op2, dest);
</del><ins>+        or32(op2, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void or32(TrustedImm32 imm, RegisterID src, RegisterID dest)
</span><span class="lines">@@ -609,14 +627,20 @@
</span><span class="cx"> 
</span><span class="cx">     void xor32(Address op1, RegisterID op2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move32IfNeeded(op2, dest);
-        xor32(op1, dest);
</del><ins>+        if (op2 == dest)
+            xor32(op1, dest);
+        else if (op1.base == dest) {
+            load32(op1, dest);
+            xor32(op2, dest);
+        } else {
+            zeroExtend32ToPtr(op2, dest);
+            xor32(op1, dest);
+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void xor32(RegisterID op1, Address op2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move32IfNeeded(op1, dest);
-        xor32(op2, dest);
</del><ins>+        xor32(op2, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void xor32(TrustedImm32 imm, RegisterID src, RegisterID dest)
</span><span class="lines">@@ -1066,96 +1090,94 @@
</span><span class="cx"> 
</span><span class="cx">     void addDouble(FPRegisterID src, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        m_assembler.addsd_rr(src, dest);
</del><ins>+        addDouble(src, dest, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addDouble(FPRegisterID op1, FPRegisterID op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op1 == dest)
-            addDouble(op2, dest);
</del><ins>+        if (supportsAVX())
+            m_assembler.vaddsd_rr(op1, op2, dest);
</ins><span class="cx">         else {
</span><del>-            moveDouble(op2, dest);
-            addDouble(op1, dest);
</del><ins>+            ASSERT(isSSE2Present());
+            if (op1 == dest)
+                m_assembler.addsd_rr(op2, dest);
+            else {
+                moveDouble(op2, dest);
+                m_assembler.addsd_rr(op1, dest);
+            }
</ins><span class="cx">         }
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addDouble(Address src, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        m_assembler.addsd_mr(src.offset, src.base, dest);
</del><ins>+        addDouble(src, dest, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addDouble(Address op1, FPRegisterID op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op2 == dest) {
-            addDouble(op1, dest);
-            return;
</del><ins>+        if (supportsAVX())
+            m_assembler.vaddsd_mr(op1.offset, op1.base, op2, dest);
+        else {
+            ASSERT(isSSE2Present());
+            if (op2 == dest) {
+                m_assembler.addsd_mr(op1.offset, op1.base, dest);
+                return;
+            }
+
+            loadDouble(op1, dest);
+            addDouble(op2, dest);
</ins><span class="cx">         }
</span><del>-
-        loadDouble(op1, dest);
-        addDouble(op2, dest);
</del><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addDouble(FPRegisterID op1, Address op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op1 == dest) {
-            addDouble(op2, dest);
-            return;
-        }
-
-        loadDouble(op2, dest);
-        addDouble(op1, dest);
</del><ins>+        addDouble(op2, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addFloat(FPRegisterID src, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        m_assembler.addss_rr(src, dest);
</del><ins>+        addFloat(src, dest, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addFloat(Address src, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        m_assembler.addss_mr(src.offset, src.base, dest);
</del><ins>+        addFloat(src, dest, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addFloat(FPRegisterID op1, FPRegisterID op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op1 == dest)
-            addFloat(op2, dest);
</del><ins>+        if (supportsAVX())
+            m_assembler.vaddss_rr(op1, op2, dest);
</ins><span class="cx">         else {
</span><del>-            moveDouble(op2, dest);
-            addFloat(op1, dest);
</del><ins>+            ASSERT(isSSE2Present());
+            if (op1 == dest)
+                m_assembler.addss_rr(op2, dest);
+            else {
+                moveDouble(op2, dest);
+                m_assembler.addss_rr(op1, dest);
+            }
</ins><span class="cx">         }
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addFloat(Address op1, FPRegisterID op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op2 == dest) {
-            addFloat(op1, dest);
-            return;
</del><ins>+        if (supportsAVX())
+            m_assembler.vaddss_mr(op1.offset, op1.base, op2, dest);
+        else {
+            ASSERT(isSSE2Present());
+            if (op2 == dest) {
+                m_assembler.addss_mr(op1.offset, op1.base, dest);
+                return;
+            }
+
+            loadFloat(op1, dest);
+            addFloat(op2, dest);
</ins><span class="cx">         }
</span><del>-
-        loadFloat(op1, dest);
-        addFloat(op2, dest);
</del><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void addFloat(FPRegisterID op1, Address op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op1 == dest) {
-            addFloat(op2, dest);
-            return;
-        }
-
-        loadFloat(op2, dest);
-        addFloat(op1, dest);
</del><ins>+        addFloat(op2, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void divDouble(FPRegisterID src, FPRegisterID dest)
</span><span class="lines">@@ -1226,92 +1248,92 @@
</span><span class="cx"> 
</span><span class="cx">     void mulDouble(FPRegisterID src, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        m_assembler.mulsd_rr(src, dest);
</del><ins>+        mulDouble(src, dest, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulDouble(FPRegisterID op1, FPRegisterID op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op1 == dest)
-            mulDouble(op2, dest);
</del><ins>+        if (supportsAVX())
+            m_assembler.vmulsd_rr(op1, op2, dest);
</ins><span class="cx">         else {
</span><del>-            moveDouble(op2, dest);
-            mulDouble(op1, dest);
</del><ins>+            ASSERT(isSSE2Present());
+            if (op1 == dest)
+                m_assembler.mulsd_rr(op2, dest);
+            else {
+                moveDouble(op2, dest);
+                m_assembler.mulsd_rr(op1, dest);
+            }
</ins><span class="cx">         }
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulDouble(Address src, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        m_assembler.mulsd_mr(src.offset, src.base, dest);
</del><ins>+        mulDouble(src, dest, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulDouble(Address op1, FPRegisterID op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op2 == dest) {
-            mulDouble(op1, dest);
-            return;
</del><ins>+        if (supportsAVX())
+            m_assembler.vmulsd_mr(op1.offset, op1.base, op2, dest);
+        else {
+            ASSERT(isSSE2Present());
+            if (op2 == dest) {
+                m_assembler.mulsd_mr(op1.offset, op1.base, dest);
+                return;
+            }
+            loadDouble(op1, dest);
+            mulDouble(op2, dest);
</ins><span class="cx">         }
</span><del>-        loadDouble(op1, dest);
-        mulDouble(op2, dest);
</del><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulDouble(FPRegisterID op1, Address op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op1 == dest) {
-            mulDouble(op2, dest);
-            return;
-        }
-        loadDouble(op2, dest);
-        mulDouble(op1, dest);
</del><ins>+        return mulDouble(op2, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulFloat(FPRegisterID src, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        m_assembler.mulss_rr(src, dest);
</del><ins>+        mulFloat(src, dest, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulFloat(Address src, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        m_assembler.mulss_mr(src.offset, src.base, dest);
</del><ins>+        mulFloat(src, dest, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulFloat(FPRegisterID op1, FPRegisterID op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op1 == dest)
-            mulFloat(op2, dest);
</del><ins>+        if (supportsAVX())
+            m_assembler.vmulss_rr(op1, op2, dest);
</ins><span class="cx">         else {
</span><del>-            moveDouble(op2, dest);
-            mulFloat(op1, dest);
</del><ins>+            ASSERT(isSSE2Present());
+            if (op1 == dest)
+                m_assembler.mulss_rr(op2, dest);
+            else {
+                moveDouble(op2, dest);
+                m_assembler.mulss_rr(op1, dest);
+            }
</ins><span class="cx">         }
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulFloat(Address op1, FPRegisterID op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op2 == dest) {
-            mulFloat(op1, dest);
-            return;
</del><ins>+        if (supportsAVX())
+            m_assembler.vmulss_mr(op1.offset, op1.base, op2, dest);
+        else {
+            ASSERT(isSSE2Present());
+            if (op2 == dest) {
+                m_assembler.mulss_mr(op1.offset, op1.base, dest);
+                return;
+            }
+            loadFloat(op1, dest);
+            mulFloat(op2, dest);
</ins><span class="cx">         }
</span><del>-        loadFloat(op1, dest);
-        mulFloat(op2, dest);
</del><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void mulFloat(FPRegisterID op1, Address op2, FPRegisterID dest)
</span><span class="cx">     {
</span><del>-        ASSERT(isSSE2Present());
-        if (op1 == dest) {
-            mulFloat(op2, dest);
-            return;
-        }
-        loadFloat(op2, dest);
-        mulFloat(op1, dest);
</del><ins>+        mulFloat(op2, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void andDouble(FPRegisterID src, FPRegisterID dst)
</span><span class="lines">@@ -2143,16 +2165,21 @@
</span><span class="cx">         return branchAdd32(cond, src1, dest);
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    Jump branchAdd32(ResultCondition cond, Address src1, RegisterID src2, RegisterID dest)
</del><ins>+    Jump branchAdd32(ResultCondition cond, Address op1, RegisterID op2, RegisterID dest)
</ins><span class="cx">     {
</span><del>-        move32IfNeeded(src2, dest);
-        return branchAdd32(cond, src1, dest);
</del><ins>+        if (op2 == dest)
+            return branchAdd32(cond, op1, dest);
+        if (op1.base == dest) {
+            load32(op1, dest);
+            return branchAdd32(cond, op2, dest);
+        }
+        zeroExtend32ToPtr(op2, dest);
+        return branchAdd32(cond, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     Jump branchAdd32(ResultCondition cond, RegisterID src1, Address src2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move32IfNeeded(src1, dest);
-        return branchAdd32(cond, src2, dest);
</del><ins>+        return branchAdd32(cond, src2, src1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     Jump branchAdd32(ResultCondition cond, RegisterID src, TrustedImm32 imm, RegisterID dest)
</span><span class="lines">@@ -2452,38 +2479,50 @@
</span><span class="cx"> 
</span><span class="cx">     static bool supportsFloatingPointRounding()
</span><span class="cx">     {
</span><del>-        if (s_sse4_1CheckState == CPUIDCheckState::NotChecked) {
-            int flags = 0;
</del><ins>+        if (s_sse4_1CheckState == CPUIDCheckState::NotChecked)
+            updateEax1EcxFlags();
+        return s_sse4_1CheckState == CPUIDCheckState::Set;
+    }
+
+    static bool supportsAVX()
+    {
+        if (s_avxCheckState == CPUIDCheckState::NotChecked)
+            updateEax1EcxFlags();
+        return s_avxCheckState == CPUIDCheckState::Set;
+    }
+
+    static void updateEax1EcxFlags()
+    {
+        int flags = 0;
</ins><span class="cx"> #if COMPILER(MSVC)
</span><del>-            int cpuInfo[4];
-            __cpuid(cpuInfo, 0x1);
-            flags = cpuInfo[2];
</del><ins>+        int cpuInfo[4];
+        __cpuid(cpuInfo, 0x1);
+        flags = cpuInfo[2];
</ins><span class="cx"> #elif COMPILER(GCC_OR_CLANG)
</span><span class="cx"> #if CPU(X86_64)
</span><del>-            asm (
-                &quot;movl $0x1, %%eax;&quot;
-                &quot;cpuid;&quot;
-                &quot;movl %%ecx, %0;&quot;
-                : &quot;=g&quot; (flags)
-                :
-                : &quot;%eax&quot;, &quot;%ebx&quot;, &quot;%ecx&quot;, &quot;%edx&quot;
-                );
</del><ins>+        asm (
+            &quot;movl $0x1, %%eax;&quot;
+            &quot;cpuid;&quot;
+            &quot;movl %%ecx, %0;&quot;
+            : &quot;=g&quot; (flags)
+            :
+            : &quot;%eax&quot;, &quot;%ebx&quot;, &quot;%ecx&quot;, &quot;%edx&quot;
+            );
</ins><span class="cx"> #else
</span><del>-            asm (
-                &quot;movl $0x1, %%eax;&quot;
-                &quot;pushl %%ebx;&quot;
-                &quot;cpuid;&quot;
-                &quot;popl %%ebx;&quot;
-                &quot;movl %%ecx, %0;&quot;
-                : &quot;=g&quot; (flags)
-                :
-                : &quot;%eax&quot;, &quot;%ecx&quot;, &quot;%edx&quot;
-                );
</del><ins>+        asm (
+            &quot;movl $0x1, %%eax;&quot;
+            &quot;pushl %%ebx;&quot;
+            &quot;cpuid;&quot;
+            &quot;popl %%ebx;&quot;
+            &quot;movl %%ecx, %0;&quot;
+            : &quot;=g&quot; (flags)
+            :
+            : &quot;%eax&quot;, &quot;%ecx&quot;, &quot;%edx&quot;
+            );
</ins><span class="cx"> #endif
</span><span class="cx"> #endif // COMPILER(GCC_OR_CLANG)
</span><del>-            s_sse4_1CheckState = (flags &amp; (1 &lt;&lt; 19)) ? CPUIDCheckState::Set : CPUIDCheckState::Clear;
-        }
-        return s_sse4_1CheckState == CPUIDCheckState::Set;
</del><ins>+        s_sse4_1CheckState = (flags &amp; (1 &lt;&lt; 19)) ? CPUIDCheckState::Set : CPUIDCheckState::Clear;
+        s_avxCheckState = (flags &amp; (1 &lt;&lt; 28)) ? CPUIDCheckState::Set : CPUIDCheckState::Clear;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx"> #if ENABLE(MASM_PROBE)
</span><span class="lines">@@ -2731,7 +2770,8 @@
</span><span class="cx">         Clear,
</span><span class="cx">         Set
</span><span class="cx">     };
</span><del>-    static CPUIDCheckState s_sse4_1CheckState;
</del><ins>+    JS_EXPORT_PRIVATE static CPUIDCheckState s_sse4_1CheckState;
+    JS_EXPORT_PRIVATE static CPUIDCheckState s_avxCheckState;
</ins><span class="cx">     static CPUIDCheckState s_lzcntCheckState;
</span><span class="cx"> };
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerMacroAssemblerX86_64h"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h (198872 => 198873)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h        2016-03-31 02:03:57 UTC (rev 198872)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h        2016-03-31 02:05:13 UTC (rev 198873)
</span><span class="lines">@@ -922,16 +922,21 @@
</span><span class="cx">         return branchAdd64(cond, src1, dest);
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    Jump branchAdd64(ResultCondition cond, Address src1, RegisterID src2, RegisterID dest)
</del><ins>+    Jump branchAdd64(ResultCondition cond, Address op1, RegisterID op2, RegisterID dest)
</ins><span class="cx">     {
</span><del>-        move(src2, dest);
-        return branchAdd64(cond, src1, dest);
</del><ins>+        if (op2 == dest)
+            return branchAdd64(cond, op1, dest);
+        if (op1.base == dest) {
+            load32(op1, dest);
+            return branchAdd64(cond, op2, dest);
+        }
+        move(op2, dest);
+        return branchAdd64(cond, op1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     Jump branchAdd64(ResultCondition cond, RegisterID src1, Address src2, RegisterID dest)
</span><span class="cx">     {
</span><del>-        move(src1, dest);
-        return branchAdd64(cond, src2, dest);
</del><ins>+        return branchAdd64(cond, src2, src1, dest);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     Jump branchAdd64(ResultCondition cond, RegisterID src, RegisterID dest)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerX86Assemblerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/X86Assembler.h (198872 => 198873)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/X86Assembler.h        2016-03-31 02:03:57 UTC (rev 198872)
+++ trunk/Source/JavaScriptCore/assembler/X86Assembler.h        2016-03-31 02:05:13 UTC (rev 198873)
</span><span class="lines">@@ -305,6 +305,17 @@
</span><span class="cx">         OP3_MFENCE           = 0xF0,
</span><span class="cx">     } ThreeByteOpcodeID;
</span><span class="cx"> 
</span><ins>+    struct VexPrefix {
+        enum : uint8_t {
+            TwoBytes = 0xC5,
+            ThreeBytes = 0xC4
+        };
+    };
+    enum class VexImpliedBytes : uint8_t {
+        TwoBytesOp = 1,
+        ThreeBytesOp38 = 2,
+        ThreeBytesOp3A = 3
+    };
</ins><span class="cx">     
</span><span class="cx">     TwoByteOpcodeID cmovcc(Condition cond)
</span><span class="cx">     {
</span><span class="lines">@@ -2087,24 +2098,44 @@
</span><span class="cx">         m_formatter.twoByteOp(OP2_ADDSD_VsdWsd, (RegisterID)dst, (RegisterID)src);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void vaddsd_rr(XMMRegisterID a, XMMRegisterID b, XMMRegisterID dst)
+    {
+        m_formatter.vexNdsLigWigCommutativeTwoByteOp(PRE_SSE_F2, OP2_ADDSD_VsdWsd, (RegisterID)dst, (RegisterID)a, (RegisterID)b);
+    }
+
</ins><span class="cx">     void addsd_mr(int offset, RegisterID base, XMMRegisterID dst)
</span><span class="cx">     {
</span><span class="cx">         m_formatter.prefix(PRE_SSE_F2);
</span><span class="cx">         m_formatter.twoByteOp(OP2_ADDSD_VsdWsd, (RegisterID)dst, base, offset);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void vaddsd_mr(int offset, RegisterID base, XMMRegisterID b, XMMRegisterID dst)
+    {
+        m_formatter.vexNdsLigWigTwoByteOp(PRE_SSE_F2, OP2_ADDSD_VsdWsd, (RegisterID)dst, (RegisterID)b, base, offset);
+    }
+
</ins><span class="cx">     void addss_rr(XMMRegisterID src, XMMRegisterID dst)
</span><span class="cx">     {
</span><span class="cx">         m_formatter.prefix(PRE_SSE_F3);
</span><span class="cx">         m_formatter.twoByteOp(OP2_ADDSD_VsdWsd, (RegisterID)dst, (RegisterID)src);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void vaddss_rr(XMMRegisterID a, XMMRegisterID b, XMMRegisterID dst)
+    {
+        m_formatter.vexNdsLigWigCommutativeTwoByteOp(PRE_SSE_F3, OP2_ADDSD_VsdWsd, (RegisterID)dst, (RegisterID)a, (RegisterID)b);
+    }
+
</ins><span class="cx">     void addss_mr(int offset, RegisterID base, XMMRegisterID dst)
</span><span class="cx">     {
</span><span class="cx">         m_formatter.prefix(PRE_SSE_F3);
</span><span class="cx">         m_formatter.twoByteOp(OP2_ADDSD_VsdWsd, (RegisterID)dst, base, offset);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void vaddss_mr(int offset, RegisterID base, XMMRegisterID b, XMMRegisterID dst)
+    {
+        m_formatter.vexNdsLigWigTwoByteOp(PRE_SSE_F3, OP2_ADDSD_VsdWsd, (RegisterID)dst, (RegisterID)b, base, offset);
+    }
+
</ins><span class="cx"> #if !CPU(X86_64)
</span><span class="cx">     void addsd_mr(const void* address, XMMRegisterID dst)
</span><span class="cx">     {
</span><span class="lines">@@ -2295,24 +2326,44 @@
</span><span class="cx">         m_formatter.twoByteOp(OP2_MULSD_VsdWsd, (RegisterID)dst, (RegisterID)src);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void vmulsd_rr(XMMRegisterID a, XMMRegisterID b, XMMRegisterID dst)
+    {
+        m_formatter.vexNdsLigWigCommutativeTwoByteOp(PRE_SSE_F2, OP2_MULSD_VsdWsd, (RegisterID)dst, (RegisterID)a, (RegisterID)b);
+    }
+
</ins><span class="cx">     void mulsd_mr(int offset, RegisterID base, XMMRegisterID dst)
</span><span class="cx">     {
</span><span class="cx">         m_formatter.prefix(PRE_SSE_F2);
</span><span class="cx">         m_formatter.twoByteOp(OP2_MULSD_VsdWsd, (RegisterID)dst, base, offset);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void vmulsd_mr(int offset, RegisterID base, XMMRegisterID b, XMMRegisterID dst)
+    {
+        m_formatter.vexNdsLigWigTwoByteOp(PRE_SSE_F2, OP2_MULSD_VsdWsd, (RegisterID)dst, (RegisterID)b, base, offset);
+    }
+
</ins><span class="cx">     void mulss_rr(XMMRegisterID src, XMMRegisterID dst)
</span><span class="cx">     {
</span><span class="cx">         m_formatter.prefix(PRE_SSE_F3);
</span><span class="cx">         m_formatter.twoByteOp(OP2_MULSD_VsdWsd, (RegisterID)dst, (RegisterID)src);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void vmulss_rr(XMMRegisterID a, XMMRegisterID b, XMMRegisterID dst)
+    {
+        m_formatter.vexNdsLigWigCommutativeTwoByteOp(PRE_SSE_F3, OP2_MULSD_VsdWsd, (RegisterID)dst, (RegisterID)a, (RegisterID)b);
+    }
+
</ins><span class="cx">     void mulss_mr(int offset, RegisterID base, XMMRegisterID dst)
</span><span class="cx">     {
</span><span class="cx">         m_formatter.prefix(PRE_SSE_F3);
</span><span class="cx">         m_formatter.twoByteOp(OP2_MULSD_VsdWsd, (RegisterID)dst, base, offset);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void vmulss_mr(int offset, RegisterID base, XMMRegisterID b, XMMRegisterID dst)
+    {
+        m_formatter.vexNdsLigWigTwoByteOp(PRE_SSE_F3, OP2_MULSD_VsdWsd, (RegisterID)dst, (RegisterID)b, base, offset);
+    }
+
</ins><span class="cx">     void pextrw_irr(int whichWord, XMMRegisterID src, RegisterID dst)
</span><span class="cx">     {
</span><span class="cx">         m_formatter.prefix(PRE_SSE_66);
</span><span class="lines">@@ -3068,6 +3119,46 @@
</span><span class="cx">                 putIntUnchecked(reinterpret_cast&lt;int32_t&gt;(address));
</span><span class="cx">             }
</span><span class="cx"> #endif
</span><ins>+            ALWAYS_INLINE void twoBytesVex(OneByteOpcodeID simdPrefix, RegisterID inOpReg, RegisterID r)
+            {
+                putByteUnchecked(VexPrefix::TwoBytes);
+
+                uint8_t secondByte = vexEncodeSimdPrefix(simdPrefix);
+                secondByte |= (~inOpReg &amp; 0xf) &lt;&lt; 3;
+                secondByte |= !regRequiresRex(r) &lt;&lt; 7;
+                putByteUnchecked(secondByte);
+            }
+
+            ALWAYS_INLINE void threeBytesVexNds(OneByteOpcodeID simdPrefix, VexImpliedBytes impliedBytes, RegisterID r, RegisterID inOpReg, RegisterID b)
+            {
+                putByteUnchecked(VexPrefix::ThreeBytes);
+
+                uint8_t secondByte = static_cast&lt;uint8_t&gt;(impliedBytes);
+                secondByte |= !regRequiresRex(r) &lt;&lt; 7;
+                secondByte |= 1 &lt;&lt; 6; // REX.X
+                secondByte |= !regRequiresRex(b) &lt;&lt; 5;
+                putByteUnchecked(secondByte);
+
+                uint8_t thirdByte = vexEncodeSimdPrefix(simdPrefix);
+                thirdByte |= (~inOpReg &amp; 0xf) &lt;&lt; 3;
+                putByteUnchecked(thirdByte);
+            }
+        private:
+            uint8_t vexEncodeSimdPrefix(OneByteOpcodeID simdPrefix)
+            {
+                switch (simdPrefix) {
+                case 0x66:
+                    return 1;
+                case 0xF3:
+                    return 2;
+                case 0xF2:
+                    return 3;
+                default:
+                    RELEASE_ASSERT_NOT_REACHED();
+                }
+                return 0;
+            }
+
</ins><span class="cx">         };
</span><span class="cx"> 
</span><span class="cx">         // Word-sized operands / no operand instruction formatters.
</span><span class="lines">@@ -3189,7 +3280,33 @@
</span><span class="cx">             writer.memoryModRM(reg, address);
</span><span class="cx">         }
</span><span class="cx"> #endif
</span><ins>+        void vexNdsLigWigCommutativeTwoByteOp(OneByteOpcodeID simdPrefix, TwoByteOpcodeID opcode, RegisterID dest, RegisterID a, RegisterID b)
+        {
+            SingleInstructionBufferWriter writer(m_buffer);
</ins><span class="cx"> 
</span><ins>+            // Since this is a commutative operation, we can try switching the arguments.
+            if (regRequiresRex(b))
+                std::swap(a, b);
+
+            if (regRequiresRex(b))
+                writer.threeBytesVexNds(simdPrefix, VexImpliedBytes::TwoBytesOp, dest, a, b);
+            else
+                writer.twoBytesVex(simdPrefix, a, dest);
+            writer.putByteUnchecked(opcode);
+            writer.registerModRM(dest, b);
+        }
+
+        void vexNdsLigWigTwoByteOp(OneByteOpcodeID simdPrefix, TwoByteOpcodeID opcode, RegisterID dest, RegisterID a, RegisterID base, int offset)
+        {
+            SingleInstructionBufferWriter writer(m_buffer);
+            if (regRequiresRex(base))
+                writer.threeBytesVexNds(simdPrefix, VexImpliedBytes::TwoBytesOp, dest, a, base);
+            else
+                writer.twoBytesVex(simdPrefix, a, dest);
+            writer.putByteUnchecked(opcode);
+            writer.memoryModRM(dest, base, offset);
+        }
+
</ins><span class="cx">         void threeByteOp(TwoByteOpcodeID twoBytePrefix, ThreeByteOpcodeID opcode)
</span><span class="cx">         {
</span><span class="cx">             SingleInstructionBufferWriter writer(m_buffer);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3B3LowerToAircpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp (198872 => 198873)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp        2016-03-31 02:03:57 UTC (rev 198872)
+++ trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp        2016-03-31 02:05:13 UTC (rev 198873)
</span><span class="lines">@@ -715,8 +715,13 @@
</span><span class="cx">         // over three operand forms.
</span><span class="cx"> 
</span><span class="cx">         if (left != right) {
</span><ins>+            ArgPromise leftAddr = loadPromise(left);
+            if (isValidForm(opcode, leftAddr.kind(), Arg::Tmp, Arg::Tmp)) {
+                append(opcode, leftAddr.consume(*this), tmp(right), result);
+                return;
+            }
+
</ins><span class="cx">             if (commutativity == Commutative) {
</span><del>-                ArgPromise leftAddr = loadPromise(left);
</del><span class="cx">                 if (isValidForm(opcode, leftAddr.kind(), Arg::Tmp)) {
</span><span class="cx">                     append(relaxedMoveForType(m_value-&gt;type()), tmp(right), result);
</span><span class="cx">                     append(opcode, leftAddr.consume(*this), result);
</span><span class="lines">@@ -725,6 +730,10 @@
</span><span class="cx">             }
</span><span class="cx"> 
</span><span class="cx">             ArgPromise rightAddr = loadPromise(right);
</span><ins>+            if (isValidForm(opcode, Arg::Tmp, rightAddr.kind(), Arg::Tmp)) {
+                append(opcode, tmp(left), rightAddr.consume(*this), result);
+                return;
+            }
</ins><span class="cx">             if (isValidForm(opcode, rightAddr.kind(), Arg::Tmp)) {
</span><span class="cx">                 append(relaxedMoveForType(m_value-&gt;type()), tmp(left), result);
</span><span class="cx">                 append(opcode, rightAddr.consume(*this), result);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirInstInlinesh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h (198872 => 198873)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h        2016-03-31 02:03:57 UTC (rev 198872)
+++ trunk/Source/JavaScriptCore/b3/air/AirInstInlines.h        2016-03-31 02:05:13 UTC (rev 198873)
</span><span class="lines">@@ -180,17 +180,24 @@
</span><span class="cx">     case Or64:
</span><span class="cx">     case Xor32:
</span><span class="cx">     case Xor64:
</span><del>-    case AddDouble:
-    case AddFloat:
</del><span class="cx">     case AndFloat:
</span><span class="cx">     case AndDouble:
</span><del>-    case MulDouble:
-    case MulFloat:
</del><span class="cx">     case XorDouble:
</span><span class="cx">     case XorFloat:
</span><span class="cx">         if (args.size() == 3)
</span><span class="cx">             return 2;
</span><span class="cx">         break;
</span><ins>+    case AddDouble:
+    case AddFloat:
+    case MulDouble:
+    case MulFloat:
+#if CPU(X86) || CPU(X86_64)
+        if (MacroAssembler::supportsAVX())
+            return Nullopt;
+#endif
+        if (args.size() == 3)
+            return 2;
+        break;
</ins><span class="cx">     case BranchAdd32:
</span><span class="cx">     case BranchAdd64:
</span><span class="cx">         if (args.size() == 4)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airtestaircpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/testair.cpp (198872 => 198873)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/testair.cpp        2016-03-31 02:03:57 UTC (rev 198872)
+++ trunk/Source/JavaScriptCore/b3/air/testair.cpp        2016-03-31 02:05:13 UTC (rev 198873)
</span><span class="lines">@@ -1633,6 +1633,151 @@
</span><span class="cx">     CHECK(things[3] == 3);
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+#if CPU(X86) || CPU(X86_64)
+void testX86VMULSD()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(MulDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Tmp(FPRInfo::argumentFPR1), Tmp(FPRInfo::argumentFPR2));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::argumentFPR2), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, 4.2, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDDestRex()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(MulDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Tmp(FPRInfo::argumentFPR1), Tmp(X86Registers::xmm15));
+    root-&gt;append(MoveDouble, nullptr, Tmp(X86Registers::xmm15), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, 4.2, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDOp1DestRex()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Tmp(X86Registers::xmm14));
+    root-&gt;append(MulDouble, nullptr, Tmp(X86Registers::xmm14), Tmp(FPRInfo::argumentFPR1), Tmp(X86Registers::xmm15));
+    root-&gt;append(MoveDouble, nullptr, Tmp(X86Registers::xmm15), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, 4.2, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDOp2DestRex()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::argumentFPR1), Tmp(X86Registers::xmm14));
+    root-&gt;append(MulDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Tmp(X86Registers::xmm14), Tmp(X86Registers::xmm15));
+    root-&gt;append(MoveDouble, nullptr, Tmp(X86Registers::xmm15), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, 4.2, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDOpsDestRex()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Tmp(X86Registers::xmm14));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::argumentFPR1), Tmp(X86Registers::xmm13));
+    root-&gt;append(MulDouble, nullptr, Tmp(X86Registers::xmm14), Tmp(X86Registers::xmm13), Tmp(X86Registers::xmm15));
+    root-&gt;append(MoveDouble, nullptr, Tmp(X86Registers::xmm15), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, 4.2, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDAddr()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(MulDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Arg::addr(Tmp(GPRInfo::argumentGPR0), - 16), Tmp(FPRInfo::argumentFPR2));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::argumentFPR2), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    double secondArg = 4.2;
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, &amp;secondArg + 2, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDAddrOpRexAddr()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::argumentGPR0), Tmp(X86Registers::r13));
+    root-&gt;append(MulDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Arg::addr(Tmp(X86Registers::r13), - 16), Tmp(FPRInfo::argumentFPR2));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::argumentFPR2), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    double secondArg = 4.2;
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, &amp;secondArg + 2, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDDestRexAddr()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(MulDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Arg::addr(Tmp(GPRInfo::argumentGPR0), 16), Tmp(X86Registers::xmm15));
+    root-&gt;append(MoveDouble, nullptr, Tmp(X86Registers::xmm15), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    double secondArg = 4.2;
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, &amp;secondArg - 2, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDRegOpDestRexAddr()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Tmp(X86Registers::xmm14));
+    root-&gt;append(MulDouble, nullptr, Arg::addr(Tmp(GPRInfo::argumentGPR0)), Tmp(X86Registers::xmm14), Tmp(X86Registers::xmm15));
+    root-&gt;append(MoveDouble, nullptr, Tmp(X86Registers::xmm15), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    double secondArg = 4.2;
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, &amp;secondArg, pureNaN()) == 2.4 * 4.2);
+}
+
+void testX86VMULSDAddrOpDestRexAddr()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::argumentGPR0), Tmp(X86Registers::r13));
+    root-&gt;append(MulDouble, nullptr, Tmp(FPRInfo::argumentFPR0), Arg::addr(Tmp(X86Registers::r13), 8), Tmp(X86Registers::xmm15));
+    root-&gt;append(MoveDouble, nullptr, Tmp(X86Registers::xmm15), Tmp(FPRInfo::returnValueFPR));
+    root-&gt;append(RetDouble, nullptr, Tmp(FPRInfo::returnValueFPR));
+
+    double secondArg = 4.2;
+    CHECK(compileAndRun&lt;double&gt;(proc, 2.4, &amp;secondArg - 1, pureNaN()) == 2.4 * 4.2);
+}
+
+#endif
+
</ins><span class="cx"> #define RUN(test) do {                          \
</span><span class="cx">         if (!shouldRun(#test))                  \
</span><span class="cx">             break;                              \
</span><span class="lines">@@ -1693,6 +1838,20 @@
</span><span class="cx">     RUN(testShuffleSwapDouble());
</span><span class="cx">     RUN(testShuffleShiftDouble());
</span><span class="cx"> 
</span><ins>+#if CPU(X86) || CPU(X86_64)
+    RUN(testX86VMULSD());
+    RUN(testX86VMULSDDestRex());
+    RUN(testX86VMULSDOp1DestRex());
+    RUN(testX86VMULSDOp2DestRex());
+    RUN(testX86VMULSDOpsDestRex());
+
+    RUN(testX86VMULSDAddr());
+    RUN(testX86VMULSDAddrOpRexAddr());
+    RUN(testX86VMULSDDestRexAddr());
+    RUN(testX86VMULSDRegOpDestRexAddr());
+    RUN(testX86VMULSDAddrOpDestRexAddr());
+#endif
+
</ins><span class="cx">     if (tasks.isEmpty())
</span><span class="cx">         usage();
</span><span class="cx"> 
</span></span></pre>
</div>
</div>

</body>
</html>