<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[195084] trunk/Source/JavaScriptCore</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/195084">195084</a></dd>
<dt>Author</dt> <dd>fpizlo@apple.com</dd>
<dt>Date</dt> <dd>2016-01-14 16:58:22 -0800 (Thu, 14 Jan 2016)</dd>
</dl>

<h3>Log Message</h3>
<pre>Air needs a Shuffle instruction
https://bugs.webkit.org/show_bug.cgi?id=152952

Reviewed by Saam Barati.

This adds an instruction called Shuffle. Shuffle performs multiple moves simultaneously,
which lets you express arbitrary permutations over registers and memory. We call these
rotations. It also allows you to perform &quot;shifts&quot;, like (a =&gt; b, b =&gt; c): after the shift,
c will have b's old value, b will have a's old value, and a will be unchanged. Shifts can
use immediates as their source.

Shuffle is added as a custom instruction, since it has a variable number of arguments. It
takes any number of triplets of arguments, where each triplet describes one mapping of the
shuffle. For example, to represent (a =&gt; b, b =&gt; c), we might say:

    Shuffle %a, %b, 64, %b, %c, 64

Note the &quot;64&quot;s: those are width arguments that describe how many bits of the register are
being moved. Each triplet is referred to as a &quot;shuffle pair&quot;. We call it a pair because the
most relevant part of it is the pair of registers or memory locations (i.e. %a, %b form one
of the pairs in the example). For GP arguments, the width follows ZDef semantics.

In the future, we will be able to use Shuffle for a lot of things. This patch is modest about
how to use it:

- C calling convention argument marshalling. Previously we used move instructions. But that's
  problematic since it introduces artificial interference between the argument registers and
  the inputs. Using Shuffle removes that interference. This helps a bit.

- Cold C calls. This is what really motivated me to write this patch. If we have a C call on
  a cold path, then we want it to appear to the register allocator like it doesn't clobber
  any registers. Only after register allocation should we handle the clobbering by simply
  saving all of the live volatile registers to the stack. If you imagine the saving and the
  argument marshalling, you can see how before the call, we want to have a Shuffle that does
  both of those things. This is important. If argument marshalling were separate from the
  saving, then we'd still appear to clobber argument registers. Doing them together as one
  Shuffle means that the cold call doesn't appear to even clobber the argument registers.

Unfortunately, I was wrong about cold C calls being the dominant problem with our register
allocator right now. Fixing this revealed other problems in my current tuning benchmark,
Octane/encrypt. Nonetheless, this is a small speed-up across the board, and gives us some
functionality we will need to implement other optimizations.

* CMakeLists.txt:
* JavaScriptCore.xcodeproj/project.pbxproj:
* assembler/AbstractMacroAssembler.h:
(JSC::isX86_64):
(JSC::isIOS):
(JSC::optimizeForARMv7IDIVSupported):
* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::zeroExtend32ToPtr):
(JSC::MacroAssemblerX86Common::swap32):
(JSC::MacroAssemblerX86Common::moveConditionally32):
* assembler/MacroAssemblerX86_64.h:
(JSC::MacroAssemblerX86_64::store64WithAddressOffsetPatch):
(JSC::MacroAssemblerX86_64::swap64):
(JSC::MacroAssemblerX86_64::move64ToDouble):
* assembler/X86Assembler.h:
(JSC::X86Assembler::xchgl_rr):
(JSC::X86Assembler::xchgl_rm):
(JSC::X86Assembler::xchgq_rr):
(JSC::X86Assembler::xchgq_rm):
(JSC::X86Assembler::movl_rr):
* b3/B3CCallValue.h:
* b3/B3Compilation.cpp:
(JSC::B3::Compilation::Compilation):
(JSC::B3::Compilation::~Compilation):
* b3/B3Compilation.h:
(JSC::B3::Compilation::code):
* b3/B3LowerToAir.cpp:
(JSC::B3::Air::LowerToAir::run):
(JSC::B3::Air::LowerToAir::createSelect):
(JSC::B3::Air::LowerToAir::lower):
(JSC::B3::Air::LowerToAir::marshallCCallArgument): Deleted.
* b3/B3OpaqueByproducts.h:
(JSC::B3::OpaqueByproducts::count):
* b3/B3StackmapSpecial.cpp:
(JSC::B3::StackmapSpecial::isArgValidForValue):
(JSC::B3::StackmapSpecial::isArgValidForRep):
* b3/air/AirArg.cpp:
(JSC::B3::Air::Arg::isStackMemory):
(JSC::B3::Air::Arg::isRepresentableAs):
(JSC::B3::Air::Arg::usesTmp):
(JSC::B3::Air::Arg::canRepresent):
(JSC::B3::Air::Arg::isCompatibleType):
(JSC::B3::Air::Arg::dump):
(WTF::printInternal):
* b3/air/AirArg.h:
(JSC::B3::Air::Arg::forEachType):
(JSC::B3::Air::Arg::isWarmUse):
(JSC::B3::Air::Arg::cooled):
(JSC::B3::Air::Arg::isEarlyUse):
(JSC::B3::Air::Arg::imm64):
(JSC::B3::Air::Arg::immPtr):
(JSC::B3::Air::Arg::addr):
(JSC::B3::Air::Arg::special):
(JSC::B3::Air::Arg::widthArg):
(JSC::B3::Air::Arg::operator==):
(JSC::B3::Air::Arg::isImm64):
(JSC::B3::Air::Arg::isSomeImm):
(JSC::B3::Air::Arg::isAddr):
(JSC::B3::Air::Arg::isIndex):
(JSC::B3::Air::Arg::isMemory):
(JSC::B3::Air::Arg::isRelCond):
(JSC::B3::Air::Arg::isSpecial):
(JSC::B3::Air::Arg::isWidthArg):
(JSC::B3::Air::Arg::isAlive):
(JSC::B3::Air::Arg::base):
(JSC::B3::Air::Arg::hasOffset):
(JSC::B3::Air::Arg::offset):
(JSC::B3::Air::Arg::width):
(JSC::B3::Air::Arg::isGPTmp):
(JSC::B3::Air::Arg::isGP):
(JSC::B3::Air::Arg::isFP):
(JSC::B3::Air::Arg::isType):
(JSC::B3::Air::Arg::isGPR):
(JSC::B3::Air::Arg::isValidForm):
(JSC::B3::Air::Arg::forEachTmpFast):
* b3/air/AirBasicBlock.h:
(JSC::B3::Air::BasicBlock::insts):
(JSC::B3::Air::BasicBlock::appendInst):
(JSC::B3::Air::BasicBlock::append):
* b3/air/AirCCallingConvention.cpp: Added.
(JSC::B3::Air::computeCCallingConvention):
(JSC::B3::Air::cCallResult):
(JSC::B3::Air::buildCCall):
* b3/air/AirCCallingConvention.h: Added.
* b3/air/AirCode.h:
(JSC::B3::Air::Code::proc):
* b3/air/AirCustom.cpp: Added.
(JSC::B3::Air::CCallCustom::isValidForm):
(JSC::B3::Air::CCallCustom::generate):
(JSC::B3::Air::ShuffleCustom::isValidForm):
(JSC::B3::Air::ShuffleCustom::generate):
* b3/air/AirCustom.h:
(JSC::B3::Air::PatchCustom::forEachArg):
(JSC::B3::Air::PatchCustom::generate):
(JSC::B3::Air::CCallCustom::forEachArg):
(JSC::B3::Air::CCallCustom::isValidFormStatic):
(JSC::B3::Air::CCallCustom::admitsStack):
(JSC::B3::Air::CCallCustom::hasNonArgNonControlEffects):
(JSC::B3::Air::ColdCCallCustom::forEachArg):
(JSC::B3::Air::ShuffleCustom::forEachArg):
(JSC::B3::Air::ShuffleCustom::isValidFormStatic):
(JSC::B3::Air::ShuffleCustom::admitsStack):
(JSC::B3::Air::ShuffleCustom::hasNonArgNonControlEffects):
* b3/air/AirEmitShuffle.cpp: Added.
(JSC::B3::Air::ShufflePair::dump):
(JSC::B3::Air::emitShuffle):
* b3/air/AirEmitShuffle.h: Added.
(JSC::B3::Air::ShufflePair::ShufflePair):
(JSC::B3::Air::ShufflePair::src):
(JSC::B3::Air::ShufflePair::dst):
(JSC::B3::Air::ShufflePair::width):
* b3/air/AirGenerate.cpp:
(JSC::B3::Air::prepareForGeneration):
* b3/air/AirGenerate.h:
* b3/air/AirInsertionSet.cpp:
(JSC::B3::Air::InsertionSet::insertInsts):
(JSC::B3::Air::InsertionSet::execute):
* b3/air/AirInsertionSet.h:
(JSC::B3::Air::InsertionSet::insertInst):
(JSC::B3::Air::InsertionSet::insert):
* b3/air/AirInst.h:
(JSC::B3::Air::Inst::operator bool):
(JSC::B3::Air::Inst::append):
* b3/air/AirLowerAfterRegAlloc.cpp: Added.
(JSC::B3::Air::lowerAfterRegAlloc):
* b3/air/AirLowerAfterRegAlloc.h: Added.
* b3/air/AirLowerMacros.cpp: Added.
(JSC::B3::Air::lowerMacros):
* b3/air/AirLowerMacros.h: Added.
* b3/air/AirOpcode.opcodes:
* b3/air/AirRegisterPriority.h:
(JSC::B3::Air::regsInPriorityOrder):
* b3/air/testair.cpp: Added.
(hiddenTruthBecauseNoReturnIsStupid):
(usage):
(JSC::B3::Air::compile):
(JSC::B3::Air::invoke):
(JSC::B3::Air::compileAndRun):
(JSC::B3::Air::testSimple):
(JSC::B3::Air::loadConstantImpl):
(JSC::B3::Air::loadConstant):
(JSC::B3::Air::loadDoubleConstant):
(JSC::B3::Air::testShuffleSimpleSwap):
(JSC::B3::Air::testShuffleSimpleShift):
(JSC::B3::Air::testShuffleLongShift):
(JSC::B3::Air::testShuffleLongShiftBackwards):
(JSC::B3::Air::testShuffleSimpleRotate):
(JSC::B3::Air::testShuffleSimpleBroadcast):
(JSC::B3::Air::testShuffleBroadcastAllRegs):
(JSC::B3::Air::testShuffleTreeShift):
(JSC::B3::Air::testShuffleTreeShiftBackward):
(JSC::B3::Air::testShuffleTreeShiftOtherBackward):
(JSC::B3::Air::testShuffleMultipleShifts):
(JSC::B3::Air::testShuffleRotateWithFringe):
(JSC::B3::Air::testShuffleRotateWithLongFringe):
(JSC::B3::Air::testShuffleMultipleRotates):
(JSC::B3::Air::testShuffleShiftAndRotate):
(JSC::B3::Air::testShuffleShiftAllRegs):
(JSC::B3::Air::testShuffleRotateAllRegs):
(JSC::B3::Air::testShuffleSimpleSwap64):
(JSC::B3::Air::testShuffleSimpleShift64):
(JSC::B3::Air::testShuffleSwapMixedWidth):
(JSC::B3::Air::testShuffleShiftMixedWidth):
(JSC::B3::Air::testShuffleShiftMemory):
(JSC::B3::Air::testShuffleShiftMemoryLong):
(JSC::B3::Air::testShuffleShiftMemoryAllRegs):
(JSC::B3::Air::testShuffleShiftMemoryAllRegs64):
(JSC::B3::Air::combineHiLo):
(JSC::B3::Air::testShuffleShiftMemoryAllRegsMixedWidth):
(JSC::B3::Air::testShuffleRotateMemory):
(JSC::B3::Air::testShuffleRotateMemory64):
(JSC::B3::Air::testShuffleRotateMemoryMixedWidth):
(JSC::B3::Air::testShuffleRotateMemoryAllRegs64):
(JSC::B3::Air::testShuffleRotateMemoryAllRegsMixedWidth):
(JSC::B3::Air::testShuffleSwapDouble):
(JSC::B3::Air::testShuffleShiftDouble):
(JSC::B3::Air::run):
(run):
(main):
* b3/testb3.cpp:
(JSC::B3::testCallSimple):
(JSC::B3::testCallRare):
(JSC::B3::testCallRareLive):
(JSC::B3::testCallSimplePure):
(JSC::B3::run):</pre>
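<p>The simultaneous-move semantics described in the log message can be modelled in a few lines. The following is a minimal sketch in Python, not the Air implementation. The names <code>shuffle</code> and <code>state</code> are hypothetical; locations are modelled as strings and immediates as plain integers. It assumes that ZDef means the destination's upper bits are zeroed when the width is 32, and that no destination may appear in more than one shuffle pair.</p>
<pre>
```python
# Hypothetical model of Shuffle semantics; not the code from this patch.
# A machine state is a dict that maps a location name to an integer value.

def shuffle(state, *args):
    """Apply flat (src, dst, width) triplets as one simultaneous move."""
    if len(args) % 3 != 0:
        raise ValueError("arguments must come in (src, dst, width) triplets")
    pairs = [tuple(args[i:i + 3]) for i in range(0, len(args), 3)]

    # Assumption: each destination is written by at most one shuffle pair.
    dsts = [dst for _, dst, _ in pairs]
    if len(set(dsts)) != len(dsts):
        raise ValueError("each destination may be written only once")

    # Read every source before writing anything. This is what makes the
    # moves simultaneous rather than sequential.
    pending = []
    for src, dst, width in pairs:
        value = src if isinstance(src, int) else state[src]
        # Assumed ZDef behaviour: keep the low `width` bits, zero the rest.
        pending.append((dst, value % (2 ** width)))

    result = dict(state)
    for dst, value in pending:
        result[dst] = value
    return result


if __name__ == "__main__":
    # The shift from the log message: Shuffle %a, %b, 64, %b, %c, 64
    print(shuffle({"a": 1, "b": 2, "c": 3}, "a", "b", 64, "b", "c", 64))
    # A two-element rotation, i.e. a swap.
    print(shuffle({"a": 1, "b": 2}, "a", "b", 64, "b", "a", 64))
```
</pre>
<p>With the state a=1, b=2, c=3, the shift (a =&gt; b, b =&gt; c) yields a=1, b=1, c=2, which matches the description above: c receives b's old value, b receives a's old value, and a is unchanged. A naive sequence of two moves in the same order would instead copy a's value into both b and c, which is why the sources are read up front.</p>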

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceJavaScriptCoreCMakeListstxt">trunk/Source/JavaScriptCore/CMakeLists.txt</a></li>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreJavaScriptCorexcodeprojprojectpbxproj">trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerAbstractMacroAssemblerh">trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commonh">trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerMacroAssemblerX86_64h">trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerX86Assemblerh">trunk/Source/JavaScriptCore/assembler/X86Assembler.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3CCallValueh">trunk/Source/JavaScriptCore/b3/B3CCallValue.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3Compilationcpp">trunk/Source/JavaScriptCore/b3/B3Compilation.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3Compilationh">trunk/Source/JavaScriptCore/b3/B3Compilation.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3LowerToAircpp">trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3OpaqueByproductsh">trunk/Source/JavaScriptCore/b3/B3OpaqueByproducts.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3StackmapSpecialcpp">trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirArgcpp">trunk/Source/JavaScriptCore/b3/air/AirArg.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirArgh">trunk/Source/JavaScriptCore/b3/air/AirArg.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirBasicBlockh">trunk/Source/JavaScriptCore/b3/air/AirBasicBlock.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirCodeh">trunk/Source/JavaScriptCore/b3/air/AirCode.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirCustomh">trunk/Source/JavaScriptCore/b3/air/AirCustom.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirGeneratecpp">trunk/Source/JavaScriptCore/b3/air/AirGenerate.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirGenerateh">trunk/Source/JavaScriptCore/b3/air/AirGenerate.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirInsertionSetcpp">trunk/Source/JavaScriptCore/b3/air/AirInsertionSet.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirInsertionSeth">trunk/Source/JavaScriptCore/b3/air/AirInsertionSet.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirInsth">trunk/Source/JavaScriptCore/b3/air/AirInst.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirOpcodeopcodes">trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirRegisterPriorityh">trunk/Source/JavaScriptCore/b3/air/AirRegisterPriority.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3testb3cpp">trunk/Source/JavaScriptCore/b3/testb3.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreftlFTLLowerDFGToLLVMcpp">trunk/Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreftlFTLOSRExitcpp">trunk/Source/JavaScriptCore/ftl/FTLOSRExit.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreftlFTLOSRExitHandlecpp">trunk/Source/JavaScriptCore/ftl/FTLOSRExitHandle.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreftlFTLOSRExitHandleh">trunk/Source/JavaScriptCore/ftl/FTLOSRExitHandle.h</a></li>
</ul>

<h3>Added Paths</h3>
<ul>
<li><a href="#trunkSourceJavaScriptCoreb3airAirCCallingConventioncpp">trunk/Source/JavaScriptCore/b3/air/AirCCallingConvention.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirCCallingConventionh">trunk/Source/JavaScriptCore/b3/air/AirCCallingConvention.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirCustomcpp">trunk/Source/JavaScriptCore/b3/air/AirCustom.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirEmitShufflecpp">trunk/Source/JavaScriptCore/b3/air/AirEmitShuffle.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirEmitShuffleh">trunk/Source/JavaScriptCore/b3/air/AirEmitShuffle.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirLowerAfterRegAlloccpp">trunk/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirLowerAfterRegAlloch">trunk/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirLowerMacroscpp">trunk/Source/JavaScriptCore/b3/air/AirLowerMacros.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirLowerMacrosh">trunk/Source/JavaScriptCore/b3/air/AirLowerMacros.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airtestaircpp">trunk/Source/JavaScriptCore/b3/air/testair.cpp</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceJavaScriptCoreCMakeListstxt"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/CMakeLists.txt (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/CMakeLists.txt        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/CMakeLists.txt        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -73,8 +73,11 @@
</span><span class="cx">     b3/air/AirArg.cpp
</span><span class="cx">     b3/air/AirBasicBlock.cpp
</span><span class="cx">     b3/air/AirCCallSpecial.cpp
</span><ins>+    b3/air/AirCCallingConvention.cpp
</ins><span class="cx">     b3/air/AirCode.cpp
</span><ins>+    b3/air/AirCustom.cpp
</ins><span class="cx">     b3/air/AirEliminateDeadCode.cpp
</span><ins>+    b3/air/AirEmitShuffle.cpp
</ins><span class="cx">     b3/air/AirFixPartialRegisterStalls.cpp
</span><span class="cx">     b3/air/AirGenerate.cpp
</span><span class="cx">     b3/air/AirGenerated.cpp
</span><span class="lines">@@ -82,6 +85,8 @@
</span><span class="cx">     b3/air/AirInsertionSet.cpp
</span><span class="cx">     b3/air/AirInst.cpp
</span><span class="cx">     b3/air/AirIteratedRegisterCoalescing.cpp
</span><ins>+    b3/air/AirLowerAfterRegAlloc.cpp
+    b3/air/AirLowerMacros.cpp
</ins><span class="cx">     b3/air/AirOptimizeBlockOrder.cpp
</span><span class="cx">     b3/air/AirPhaseScope.cpp
</span><span class="cx">     b3/air/AirRegisterPriority.cpp
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/ChangeLog        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,3 +1,234 @@
</span><ins>+2016-01-14  Filip Pizlo  &lt;fpizlo@apple.com&gt;
+
+        Air needs a Shuffle instruction
+        https://bugs.webkit.org/show_bug.cgi?id=152952
+
+        Reviewed by Saam Barati.
+
+        This adds an instruction called Shuffle. Shuffle performs multiple moves simultaneously,
+        which lets you express arbitrary permutations over registers and memory. We call these
+        rotations. It also allows you to perform &quot;shifts&quot;, like (a =&gt; b, b =&gt; c): after the shift,
+        c will have b's old value, b will have a's old value, and a will be unchanged. Shifts can
+        use immediates as their source.
+
+        Shuffle is added as a custom instruction, since it has a variable number of arguments. It
+        takes any number of triplets of arguments, where each triplet describes one mapping of the
+        shuffle. For example, to represent (a =&gt; b, b =&gt; c), we might say:
+
+            Shuffle %a, %b, 64, %b, %c, 64
+
+        Note the &quot;64&quot;s: those are width arguments that describe how many bits of the register are
+        being moved. Each triplet is referred to as a &quot;shuffle pair&quot;. We call it a pair because the
+        most relevant part of it is the pair of registers or memory locations (i.e. %a, %b form one
+        of the pairs in the example). For GP arguments, the width follows ZDef semantics.
+
+        In the future, we will be able to use Shuffle for a lot of things. This patch is modest about
+        how to use it:
+
+        - C calling convention argument marshalling. Previously we used move instructions. But that's
+          problematic since it introduces artificial interference between the argument registers and
+          the inputs. Using Shuffle removes that interference. This helps a bit.
+
+        - Cold C calls. This is what really motivated me to write this patch. If we have a C call on
+          a cold path, then we want it to appear to the register allocator like it doesn't clobber
+          any registers. Only after register allocation should we handle the clobbering by simply
+          saving all of the live volatile registers to the stack. If you imagine the saving and the
+          argument marshalling, you can see how before the call, we want to have a Shuffle that does
+          both of those things. This is important. If argument marshalling were separate from the
+          saving, then we'd still appear to clobber argument registers. Doing them together as one
+          Shuffle means that the cold call doesn't appear to even clobber the argument registers.
+
+        Unfortunately, I was wrong about cold C calls being the dominant problem with our register
+        allocator right now. Fixing this revealed other problems in my current tuning benchmark,
+        Octane/encrypt. Nonetheless, this is a small speed-up across the board, and gives us some
+        functionality we will need to implement other optimizations.
+
+        * CMakeLists.txt:
+        * JavaScriptCore.xcodeproj/project.pbxproj:
+        * assembler/AbstractMacroAssembler.h:
+        (JSC::isX86_64):
+        (JSC::isIOS):
+        (JSC::optimizeForARMv7IDIVSupported):
+        * assembler/MacroAssemblerX86Common.h:
+        (JSC::MacroAssemblerX86Common::zeroExtend32ToPtr):
+        (JSC::MacroAssemblerX86Common::swap32):
+        (JSC::MacroAssemblerX86Common::moveConditionally32):
+        * assembler/MacroAssemblerX86_64.h:
+        (JSC::MacroAssemblerX86_64::store64WithAddressOffsetPatch):
+        (JSC::MacroAssemblerX86_64::swap64):
+        (JSC::MacroAssemblerX86_64::move64ToDouble):
+        * assembler/X86Assembler.h:
+        (JSC::X86Assembler::xchgl_rr):
+        (JSC::X86Assembler::xchgl_rm):
+        (JSC::X86Assembler::xchgq_rr):
+        (JSC::X86Assembler::xchgq_rm):
+        (JSC::X86Assembler::movl_rr):
+        * b3/B3CCallValue.h:
+        * b3/B3Compilation.cpp:
+        (JSC::B3::Compilation::Compilation):
+        (JSC::B3::Compilation::~Compilation):
+        * b3/B3Compilation.h:
+        (JSC::B3::Compilation::code):
+        * b3/B3LowerToAir.cpp:
+        (JSC::B3::Air::LowerToAir::run):
+        (JSC::B3::Air::LowerToAir::createSelect):
+        (JSC::B3::Air::LowerToAir::lower):
+        (JSC::B3::Air::LowerToAir::marshallCCallArgument): Deleted.
+        * b3/B3OpaqueByproducts.h:
+        (JSC::B3::OpaqueByproducts::count):
+        * b3/B3StackmapSpecial.cpp:
+        (JSC::B3::StackmapSpecial::isArgValidForValue):
+        (JSC::B3::StackmapSpecial::isArgValidForRep):
+        * b3/air/AirArg.cpp:
+        (JSC::B3::Air::Arg::isStackMemory):
+        (JSC::B3::Air::Arg::isRepresentableAs):
+        (JSC::B3::Air::Arg::usesTmp):
+        (JSC::B3::Air::Arg::canRepresent):
+        (JSC::B3::Air::Arg::isCompatibleType):
+        (JSC::B3::Air::Arg::dump):
+        (WTF::printInternal):
+        * b3/air/AirArg.h:
+        (JSC::B3::Air::Arg::forEachType):
+        (JSC::B3::Air::Arg::isWarmUse):
+        (JSC::B3::Air::Arg::cooled):
+        (JSC::B3::Air::Arg::isEarlyUse):
+        (JSC::B3::Air::Arg::imm64):
+        (JSC::B3::Air::Arg::immPtr):
+        (JSC::B3::Air::Arg::addr):
+        (JSC::B3::Air::Arg::special):
+        (JSC::B3::Air::Arg::widthArg):
+        (JSC::B3::Air::Arg::operator==):
+        (JSC::B3::Air::Arg::isImm64):
+        (JSC::B3::Air::Arg::isSomeImm):
+        (JSC::B3::Air::Arg::isAddr):
+        (JSC::B3::Air::Arg::isIndex):
+        (JSC::B3::Air::Arg::isMemory):
+        (JSC::B3::Air::Arg::isRelCond):
+        (JSC::B3::Air::Arg::isSpecial):
+        (JSC::B3::Air::Arg::isWidthArg):
+        (JSC::B3::Air::Arg::isAlive):
+        (JSC::B3::Air::Arg::base):
+        (JSC::B3::Air::Arg::hasOffset):
+        (JSC::B3::Air::Arg::offset):
+        (JSC::B3::Air::Arg::width):
+        (JSC::B3::Air::Arg::isGPTmp):
+        (JSC::B3::Air::Arg::isGP):
+        (JSC::B3::Air::Arg::isFP):
+        (JSC::B3::Air::Arg::isType):
+        (JSC::B3::Air::Arg::isGPR):
+        (JSC::B3::Air::Arg::isValidForm):
+        (JSC::B3::Air::Arg::forEachTmpFast):
+        * b3/air/AirBasicBlock.h:
+        (JSC::B3::Air::BasicBlock::insts):
+        (JSC::B3::Air::BasicBlock::appendInst):
+        (JSC::B3::Air::BasicBlock::append):
+        * b3/air/AirCCallingConvention.cpp: Added.
+        (JSC::B3::Air::computeCCallingConvention):
+        (JSC::B3::Air::cCallResult):
+        (JSC::B3::Air::buildCCall):
+        * b3/air/AirCCallingConvention.h: Added.
+        * b3/air/AirCode.h:
+        (JSC::B3::Air::Code::proc):
+        * b3/air/AirCustom.cpp: Added.
+        (JSC::B3::Air::CCallCustom::isValidForm):
+        (JSC::B3::Air::CCallCustom::generate):
+        (JSC::B3::Air::ShuffleCustom::isValidForm):
+        (JSC::B3::Air::ShuffleCustom::generate):
+        * b3/air/AirCustom.h:
+        (JSC::B3::Air::PatchCustom::forEachArg):
+        (JSC::B3::Air::PatchCustom::generate):
+        (JSC::B3::Air::CCallCustom::forEachArg):
+        (JSC::B3::Air::CCallCustom::isValidFormStatic):
+        (JSC::B3::Air::CCallCustom::admitsStack):
+        (JSC::B3::Air::CCallCustom::hasNonArgNonControlEffects):
+        (JSC::B3::Air::ColdCCallCustom::forEachArg):
+        (JSC::B3::Air::ShuffleCustom::forEachArg):
+        (JSC::B3::Air::ShuffleCustom::isValidFormStatic):
+        (JSC::B3::Air::ShuffleCustom::admitsStack):
+        (JSC::B3::Air::ShuffleCustom::hasNonArgNonControlEffects):
+        * b3/air/AirEmitShuffle.cpp: Added.
+        (JSC::B3::Air::ShufflePair::dump):
+        (JSC::B3::Air::emitShuffle):
+        * b3/air/AirEmitShuffle.h: Added.
+        (JSC::B3::Air::ShufflePair::ShufflePair):
+        (JSC::B3::Air::ShufflePair::src):
+        (JSC::B3::Air::ShufflePair::dst):
+        (JSC::B3::Air::ShufflePair::width):
+        * b3/air/AirGenerate.cpp:
+        (JSC::B3::Air::prepareForGeneration):
+        * b3/air/AirGenerate.h:
+        * b3/air/AirInsertionSet.cpp:
+        (JSC::B3::Air::InsertionSet::insertInsts):
+        (JSC::B3::Air::InsertionSet::execute):
+        * b3/air/AirInsertionSet.h:
+        (JSC::B3::Air::InsertionSet::insertInst):
+        (JSC::B3::Air::InsertionSet::insert):
+        * b3/air/AirInst.h:
+        (JSC::B3::Air::Inst::operator bool):
+        (JSC::B3::Air::Inst::append):
+        * b3/air/AirLowerAfterRegAlloc.cpp: Added.
+        (JSC::B3::Air::lowerAfterRegAlloc):
+        * b3/air/AirLowerAfterRegAlloc.h: Added.
+        * b3/air/AirLowerMacros.cpp: Added.
+        (JSC::B3::Air::lowerMacros):
+        * b3/air/AirLowerMacros.h: Added.
+        * b3/air/AirOpcode.opcodes:
+        * b3/air/AirRegisterPriority.h:
+        (JSC::B3::Air::regsInPriorityOrder):
+        * b3/air/testair.cpp: Added.
+        (hiddenTruthBecauseNoReturnIsStupid):
+        (usage):
+        (JSC::B3::Air::compile):
+        (JSC::B3::Air::invoke):
+        (JSC::B3::Air::compileAndRun):
+        (JSC::B3::Air::testSimple):
+        (JSC::B3::Air::loadConstantImpl):
+        (JSC::B3::Air::loadConstant):
+        (JSC::B3::Air::loadDoubleConstant):
+        (JSC::B3::Air::testShuffleSimpleSwap):
+        (JSC::B3::Air::testShuffleSimpleShift):
+        (JSC::B3::Air::testShuffleLongShift):
+        (JSC::B3::Air::testShuffleLongShiftBackwards):
+        (JSC::B3::Air::testShuffleSimpleRotate):
+        (JSC::B3::Air::testShuffleSimpleBroadcast):
+        (JSC::B3::Air::testShuffleBroadcastAllRegs):
+        (JSC::B3::Air::testShuffleTreeShift):
+        (JSC::B3::Air::testShuffleTreeShiftBackward):
+        (JSC::B3::Air::testShuffleTreeShiftOtherBackward):
+        (JSC::B3::Air::testShuffleMultipleShifts):
+        (JSC::B3::Air::testShuffleRotateWithFringe):
+        (JSC::B3::Air::testShuffleRotateWithLongFringe):
+        (JSC::B3::Air::testShuffleMultipleRotates):
+        (JSC::B3::Air::testShuffleShiftAndRotate):
+        (JSC::B3::Air::testShuffleShiftAllRegs):
+        (JSC::B3::Air::testShuffleRotateAllRegs):
+        (JSC::B3::Air::testShuffleSimpleSwap64):
+        (JSC::B3::Air::testShuffleSimpleShift64):
+        (JSC::B3::Air::testShuffleSwapMixedWidth):
+        (JSC::B3::Air::testShuffleShiftMixedWidth):
+        (JSC::B3::Air::testShuffleShiftMemory):
+        (JSC::B3::Air::testShuffleShiftMemoryLong):
+        (JSC::B3::Air::testShuffleShiftMemoryAllRegs):
+        (JSC::B3::Air::testShuffleShiftMemoryAllRegs64):
+        (JSC::B3::Air::combineHiLo):
+        (JSC::B3::Air::testShuffleShiftMemoryAllRegsMixedWidth):
+        (JSC::B3::Air::testShuffleRotateMemory):
+        (JSC::B3::Air::testShuffleRotateMemory64):
+        (JSC::B3::Air::testShuffleRotateMemoryMixedWidth):
+        (JSC::B3::Air::testShuffleRotateMemoryAllRegs64):
+        (JSC::B3::Air::testShuffleRotateMemoryAllRegsMixedWidth):
+        (JSC::B3::Air::testShuffleSwapDouble):
+        (JSC::B3::Air::testShuffleShiftDouble):
+        (JSC::B3::Air::run):
+        (run):
+        (main):
+        * b3/testb3.cpp:
+        (JSC::B3::testCallSimple):
+        (JSC::B3::testCallRare):
+        (JSC::B3::testCallRareLive):
+        (JSC::B3::testCallSimplePure):
+        (JSC::B3::run):
+
</ins><span class="cx"> 2016-01-14  Keith Miller  &lt;keith_miller@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Unreviewed mark passing es6 tests as no longer failing.
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreJavaScriptCorexcodeprojprojectpbxproj"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/JavaScriptCore.xcodeproj/project.pbxproj        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -25,6 +25,7 @@
</span><span class="cx">                         buildPhases = (
</span><span class="cx">                         );
</span><span class="cx">                         dependencies = (
</span><ins>+                                0F6183471C45F67A0072450B /* PBXTargetDependency */,
</ins><span class="cx">                                 0F93275D1C20BF3A00CF6564 /* PBXTargetDependency */,
</span><span class="cx">                                 0FEC85B11BDB5D8F0080FF74 /* PBXTargetDependency */,
</span><span class="cx">                                 5D6B2A4F152B9E23005231DE /* PBXTargetDependency */,
</span><span class="lines">@@ -400,6 +401,21 @@
</span><span class="cx">                 0F5EF91E16878F7A003E5C25 /* JITThunks.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F5EF91B16878F78003E5C25 /* JITThunks.cpp */; };
</span><span class="cx">                 0F5EF91F16878F7D003E5C25 /* JITThunks.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F5EF91C16878F78003E5C25 /* JITThunks.h */; settings = {ATTRIBUTES = (Private, ); }; };
</span><span class="cx">                 0F5F08CF146C7633000472A9 /* UnconditionalFinalizer.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F5F08CE146C762F000472A9 /* UnconditionalFinalizer.h */; settings = {ATTRIBUTES = (Private, ); }; };
</span><ins>+                0F6183291C45BF070072450B /* AirCCallingConvention.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183201C45BF070072450B /* AirCCallingConvention.cpp */; };
+                0F61832A1C45BF070072450B /* AirCCallingConvention.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183211C45BF070072450B /* AirCCallingConvention.h */; };
+                0F61832B1C45BF070072450B /* AirCustom.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183221C45BF070072450B /* AirCustom.cpp */; };
+                0F61832C1C45BF070072450B /* AirEmitShuffle.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183231C45BF070072450B /* AirEmitShuffle.cpp */; };
+                0F61832D1C45BF070072450B /* AirEmitShuffle.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183241C45BF070072450B /* AirEmitShuffle.h */; };
+                0F61832E1C45BF070072450B /* AirLowerAfterRegAlloc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183251C45BF070072450B /* AirLowerAfterRegAlloc.cpp */; };
+                0F61832F1C45BF070072450B /* AirLowerAfterRegAlloc.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183261C45BF070072450B /* AirLowerAfterRegAlloc.h */; };
+                0F6183301C45BF070072450B /* AirLowerMacros.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183271C45BF070072450B /* AirLowerMacros.cpp */; };
+                0F6183311C45BF070072450B /* AirLowerMacros.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183281C45BF070072450B /* AirLowerMacros.h */; };
+                0F6183331C45F35C0072450B /* AirOpcode.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183321C45F35C0072450B /* AirOpcode.h */; };
+                0F6183361C45F3B60072450B /* AirOpcodeGenerated.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183341C45F3B60072450B /* AirOpcodeGenerated.h */; };
+                0F6183371C45F3B60072450B /* AirOpcodeUtils.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F6183351C45F3B60072450B /* AirOpcodeUtils.h */; };
+                0F61833C1C45F62A0072450B /* Foundation.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 51F0EB6105C86C6B00E6DF1B /* Foundation.framework */; };
+                0F61833D1C45F62A0072450B /* JavaScriptCore.framework in Frameworks */ = {isa = PBXBuildFile; fileRef = 932F5BD90822A1C700736975 /* JavaScriptCore.framework */; };
+                0F6183451C45F6600072450B /* testair.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 0F6183441C45F6600072450B /* testair.cpp */; };
</ins><span class="cx">                 0F620174143FCD330068B77C /* DFGVariableAccessData.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F620172143FCD2F0068B77C /* DFGVariableAccessData.h */; };
</span><span class="cx">                 0F620176143FCD3B0068B77C /* DFGBasicBlock.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F620170143FCD2F0068B77C /* DFGBasicBlock.h */; };
</span><span class="cx">                 0F620177143FCD3F0068B77C /* DFGAbstractValue.h in Headers */ = {isa = PBXBuildFile; fileRef = 0F62016F143FCD2F0068B77C /* DFGAbstractValue.h */; };
</span><span class="lines">@@ -2079,6 +2095,13 @@
</span><span class="cx"> /* End PBXBuildFile section */
</span><span class="cx"> 
</span><span class="cx"> /* Begin PBXContainerItemProxy section */
</span><ins>+                0F6183461C45F67A0072450B /* PBXContainerItemProxy */ = {
+                        isa = PBXContainerItemProxy;
+                        containerPortal = 0867D690FE84028FC02AAC07 /* Project object */;
+                        proxyType = 1;
+                        remoteGlobalIDString = 0F6183381C45F62A0072450B;
+                        remoteInfo = testair;
+                };
</ins><span class="cx">                 0F93275C1C20BF3A00CF6564 /* PBXContainerItemProxy */ = {
</span><span class="cx">                         isa = PBXContainerItemProxy;
</span><span class="cx">                         containerPortal = 0867D690FE84028FC02AAC07 /* Project object */;
</span><span class="lines">@@ -2542,6 +2565,20 @@
</span><span class="cx">                 0F5EF91B16878F78003E5C25 /* JITThunks.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = JITThunks.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 0F5EF91C16878F78003E5C25 /* JITThunks.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JITThunks.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 0F5F08CE146C762F000472A9 /* UnconditionalFinalizer.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = UnconditionalFinalizer.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><ins>+                0F6183201C45BF070072450B /* AirCCallingConvention.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirCCallingConvention.cpp; path = b3/air/AirCCallingConvention.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183211C45BF070072450B /* AirCCallingConvention.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirCCallingConvention.h; path = b3/air/AirCCallingConvention.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183221C45BF070072450B /* AirCustom.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirCustom.cpp; path = b3/air/AirCustom.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183231C45BF070072450B /* AirEmitShuffle.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirEmitShuffle.cpp; path = b3/air/AirEmitShuffle.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183241C45BF070072450B /* AirEmitShuffle.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirEmitShuffle.h; path = b3/air/AirEmitShuffle.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183251C45BF070072450B /* AirLowerAfterRegAlloc.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirLowerAfterRegAlloc.cpp; path = b3/air/AirLowerAfterRegAlloc.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183261C45BF070072450B /* AirLowerAfterRegAlloc.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirLowerAfterRegAlloc.h; path = b3/air/AirLowerAfterRegAlloc.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183271C45BF070072450B /* AirLowerMacros.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = AirLowerMacros.cpp; path = b3/air/AirLowerMacros.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183281C45BF070072450B /* AirLowerMacros.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = AirLowerMacros.h; path = b3/air/AirLowerMacros.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183321C45F35C0072450B /* AirOpcode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AirOpcode.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183341C45F3B60072450B /* AirOpcodeGenerated.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AirOpcodeGenerated.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183351C45F3B60072450B /* AirOpcodeUtils.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = AirOpcodeUtils.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                0F6183431C45F62A0072450B /* testair */ = {isa = PBXFileReference; explicitFileType = &quot;compiled.mach-o.executable&quot;; includeInIndex = 0; path = testair; sourceTree = BUILT_PRODUCTS_DIR; };
+                0F6183441C45F6600072450B /* testair.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = testair.cpp; path = b3/air/testair.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
</ins><span class="cx">                 0F62016F143FCD2F0068B77C /* DFGAbstractValue.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGAbstractValue.h; path = dfg/DFGAbstractValue.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 0F620170143FCD2F0068B77C /* DFGBasicBlock.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGBasicBlock.h; path = dfg/DFGBasicBlock.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 0F620172143FCD2F0068B77C /* DFGVariableAccessData.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = DFGVariableAccessData.h; path = dfg/DFGVariableAccessData.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="lines">@@ -3372,8 +3409,8 @@
</span><span class="cx">                 7013CA8A1B491A9400CAE613 /* JSJob.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = JSJob.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 7035587C1C418419004BD7BF /* MapPrototype.js */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.javascript; path = MapPrototype.js; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 7035587D1C418419004BD7BF /* SetPrototype.js */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.javascript; path = SetPrototype.js; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><del>-                7035587E1C418458004BD7BF /* MapPrototype.lut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = MapPrototype.lut.h; path = MapPrototype.lut.h; sourceTree = &quot;&lt;group&gt;&quot;; };
-                7035587F1C418458004BD7BF /* SetPrototype.lut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = SetPrototype.lut.h; path = SetPrototype.lut.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</del><ins>+                7035587E1C418458004BD7BF /* MapPrototype.lut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = MapPrototype.lut.h; sourceTree = &quot;&lt;group&gt;&quot;; };
+                7035587F1C418458004BD7BF /* SetPrototype.lut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = SetPrototype.lut.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</ins><span class="cx">                 704FD35305697E6D003DBED9 /* BooleanObject.h */ = {isa = PBXFileReference; fileEncoding = 30; indentWidth = 4; lastKnownFileType = sourcecode.c.h; path = BooleanObject.h; sourceTree = &quot;&lt;group&gt;&quot;; tabWidth = 8; };
</span><span class="cx">                 705B41A31A6E501E00716757 /* Symbol.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; path = Symbol.cpp; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="cx">                 705B41A41A6E501E00716757 /* Symbol.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; path = Symbol.h; sourceTree = &quot;&lt;group&gt;&quot;; };
</span><span class="lines">@@ -4306,6 +4343,15 @@
</span><span class="cx"> /* End PBXFileReference section */
</span><span class="cx"> 
</span><span class="cx"> /* Begin PBXFrameworksBuildPhase section */
</span><ins>+                0F61833B1C45F62A0072450B /* Frameworks */ = {
+                        isa = PBXFrameworksBuildPhase;
+                        buildActionMask = 2147483647;
+                        files = (
+                                0F61833C1C45F62A0072450B /* Foundation.framework in Frameworks */,
+                                0F61833D1C45F62A0072450B /* JavaScriptCore.framework in Frameworks */,
+                        );
+                        runOnlyForDeploymentPostprocessing = 0;
+                };
</ins><span class="cx">                 0F9327511C20BCBA00CF6564 /* Frameworks */ = {
</span><span class="cx">                         isa = PBXFrameworksBuildPhase;
</span><span class="cx">                         buildActionMask = 2147483647;
</span><span class="lines">@@ -4403,6 +4449,7 @@
</span><span class="cx">                                 0FEC85AD1BDB5CF10080FF74 /* testb3 */,
</span><span class="cx">                                 6511230514046A4C002B101D /* testRegExp */,
</span><span class="cx">                                 0F9327591C20BCBA00CF6564 /* dynbench */,
</span><ins>+                                0F6183431C45F62A0072450B /* testair */,
</ins><span class="cx">                         );
</span><span class="cx">                         name = Products;
</span><span class="cx">                         sourceTree = &quot;&lt;group&gt;&quot;;
</span><span class="lines">@@ -4793,13 +4840,18 @@
</span><span class="cx">                                 0FEC854C1BDACDC70080FF74 /* AirBasicBlock.cpp */,
</span><span class="cx">                                 0FEC854D1BDACDC70080FF74 /* AirBasicBlock.h */,
</span><span class="cx">                                 0FB3878B1BFBC44D00E3AB1E /* AirBlockWorklist.h */,
</span><ins>+                                0F6183201C45BF070072450B /* AirCCallingConvention.cpp */,
+                                0F6183211C45BF070072450B /* AirCCallingConvention.h */,
</ins><span class="cx">                                 0FEC854E1BDACDC70080FF74 /* AirCCallSpecial.cpp */,
</span><span class="cx">                                 0FEC854F1BDACDC70080FF74 /* AirCCallSpecial.h */,
</span><span class="cx">                                 0FEC85501BDACDC70080FF74 /* AirCode.cpp */,
</span><span class="cx">                                 0FEC85511BDACDC70080FF74 /* AirCode.h */,
</span><ins>+                                0F6183221C45BF070072450B /* AirCustom.cpp */,
</ins><span class="cx">                                 0F10F1A21C420BF0001C07D2 /* AirCustom.h */,
</span><span class="cx">                                 0F4570361BE44C910062A629 /* AirEliminateDeadCode.cpp */,
</span><span class="cx">                                 0F4570371BE44C910062A629 /* AirEliminateDeadCode.h */,
</span><ins>+                                0F6183231C45BF070072450B /* AirEmitShuffle.cpp */,
+                                0F6183241C45BF070072450B /* AirEmitShuffle.h */,
</ins><span class="cx">                                 262D85B41C0D650F006ACB61 /* AirFixPartialRegisterStalls.cpp */,
</span><span class="cx">                                 262D85B51C0D650F006ACB61 /* AirFixPartialRegisterStalls.h */,
</span><span class="cx">                                 0F4C91671C2B3D68004341A6 /* AirFixSpillSlotZDef.h */,
</span><span class="lines">@@ -4818,6 +4870,10 @@
</span><span class="cx">                                 26718BA21BE99F780052017B /* AirIteratedRegisterCoalescing.cpp */,
</span><span class="cx">                                 26718BA31BE99F780052017B /* AirIteratedRegisterCoalescing.h */,
</span><span class="cx">                                 2684D4371C00161C0081D663 /* AirLiveness.h */,
</span><ins>+                                0F6183251C45BF070072450B /* AirLowerAfterRegAlloc.cpp */,
+                                0F6183261C45BF070072450B /* AirLowerAfterRegAlloc.h */,
+                                0F6183271C45BF070072450B /* AirLowerMacros.cpp */,
+                                0F6183281C45BF070072450B /* AirLowerMacros.h */,
</ins><span class="cx">                                 264091FA1BE2FD4100684DB2 /* AirOpcode.opcodes */,
</span><span class="cx">                                 0FB3878C1BFBC44D00E3AB1E /* AirOptimizeBlockOrder.cpp */,
</span><span class="cx">                                 0FB3878D1BFBC44D00E3AB1E /* AirOptimizeBlockOrder.h */,
</span><span class="lines">@@ -4843,6 +4899,7 @@
</span><span class="cx">                                 0F3730921C0D67EE00052BFA /* AirUseCounts.h */,
</span><span class="cx">                                 0FEC856B1BDACDC70080FF74 /* AirValidate.cpp */,
</span><span class="cx">                                 0FEC856C1BDACDC70080FF74 /* AirValidate.h */,
</span><ins>+                                0F6183441C45F6600072450B /* testair.cpp */,
</ins><span class="cx">                         );
</span><span class="cx">                         name = air;
</span><span class="cx">                         sourceTree = &quot;&lt;group&gt;&quot;;
</span><span class="lines">@@ -5300,6 +5357,9 @@
</span><span class="cx">                 650FDF8D09D0FCA700769E54 /* Derived Sources */ = {
</span><span class="cx">                         isa = PBXGroup;
</span><span class="cx">                         children = (
</span><ins>+                                0F6183321C45F35C0072450B /* AirOpcode.h */,
+                                0F6183341C45F3B60072450B /* AirOpcodeGenerated.h */,
+                                0F6183351C45F3B60072450B /* AirOpcodeUtils.h */,
</ins><span class="cx">                                 996B73151BDA05AA00331B84 /* ArrayConstructor.lut.h */,
</span><span class="cx">                                 996B73161BDA05AA00331B84 /* ArrayIteratorPrototype.lut.h */,
</span><span class="cx">                                 996B73071BD9FA2C00331B84 /* BooleanPrototype.lut.h */,
</span><span class="lines">@@ -6899,6 +6959,7 @@
</span><span class="cx">                                 0F4570391BE44C910062A629 /* AirEliminateDeadCode.h in Headers */,
</span><span class="cx">                                 79CFC6F01C33B10000C768EA /* LLIntPCRanges.h in Headers */,
</span><span class="cx">                                 79D5CD5B1C1106A900CECA07 /* SamplingProfiler.h in Headers */,
</span><ins>+                                0F6183311C45BF070072450B /* AirLowerMacros.h in Headers */,
</ins><span class="cx">                                 0FEC85771BDACDC70080FF74 /* AirFrequentedBlock.h in Headers */,
</span><span class="cx">                                 0FEC85791BDACDC70080FF74 /* AirGenerate.h in Headers */,
</span><span class="cx">                                 79DF66B11BF26A570001CF11 /* FTLExceptionHandlerManager.h in Headers */,
</span><span class="lines">@@ -6961,6 +7022,7 @@
</span><span class="cx">                                 0FEC85141BDACDAC0080FF74 /* B3ControlValue.h in Headers */,
</span><span class="cx">                                 0FEC85C11BE167A00080FF74 /* B3Effects.h in Headers */,
</span><span class="cx">                                 0FEC85161BDACDAC0080FF74 /* B3FrequencyClass.h in Headers */,
</span><ins>+                                0F61832F1C45BF070072450B /* AirLowerAfterRegAlloc.h in Headers */,
</ins><span class="cx">                                 0FEC85171BDACDAC0080FF74 /* B3FrequentedBlock.h in Headers */,
</span><span class="cx">                                 0FEC85191BDACDAC0080FF74 /* B3Generate.h in Headers */,
</span><span class="cx">                                 0FEC851A1BDACDAC0080FF74 /* B3GenericFrequentedBlock.h in Headers */,
</span><span class="lines">@@ -7154,6 +7216,7 @@
</span><span class="cx">                                 A77A424017A0BBFD00A8DB81 /* DFGClobberize.h in Headers */,
</span><span class="cx">                                 0F37308D1C0BD29100052BFA /* B3PhiChildren.h in Headers */,
</span><span class="cx">                                 A77A424217A0BBFD00A8DB81 /* DFGClobberSet.h in Headers */,
</span><ins>+                                0F61832D1C45BF070072450B /* AirEmitShuffle.h in Headers */,
</ins><span class="cx">                                 0F3C1F1B1B868E7900ABB08B /* DFGClobbersExitState.h in Headers */,
</span><span class="cx">                                 0F04396E1B03DC0B009598B7 /* DFGCombinedLiveness.h in Headers */,
</span><span class="cx">                                 0F7B294D14C3CD4C007C3DB1 /* DFGCommon.h in Headers */,
</span><span class="lines">@@ -7258,6 +7321,7 @@
</span><span class="cx">                                 0F2B9CED19D0BA7D00B1D1B5 /* DFGPromotedHeapLocation.h in Headers */,
</span><span class="cx">                                 0FFC92161B94FB3E0071DD66 /* DFGPropertyTypeKey.h in Headers */,
</span><span class="cx">                                 0FB17663196B8F9E0091052A /* DFGPureValue.h in Headers */,
</span><ins>+                                0F6183361C45F3B60072450B /* AirOpcodeGenerated.h in Headers */,
</ins><span class="cx">                                 0F3730911C0CD70C00052BFA /* AllowMacroScratchRegisterUsage.h in Headers */,
</span><span class="cx">                                 0F3A1BFA1A9ECB7D000DE01A /* DFGPutStackSinkingPhase.h in Headers */,
</span><span class="cx">                                 86EC9DD11328DF82002B2AD7 /* DFGRegisterBank.h in Headers */,
</span><span class="lines">@@ -7274,6 +7338,7 @@
</span><span class="cx">                                 0FC20CBA18556A3500C9E954 /* DFGSSALoweringPhase.h in Headers */,
</span><span class="cx">                                 0F9FB4F517FCB91700CB67F8 /* DFGStackLayoutPhase.h in Headers */,
</span><span class="cx">                                 0F4F29E018B6AD1C0057BC15 /* DFGStaticExecutionCountEstimationPhase.h in Headers */,
</span><ins>+                                0F6183371C45F3B60072450B /* AirOpcodeUtils.h in Headers */,
</ins><span class="cx">                                 0F9E32641B05AB0400801ED5 /* DFGStoreBarrierInsertionPhase.h in Headers */,
</span><span class="cx">                                 0FC20CB61852E2C600C9E954 /* DFGStrengthReductionPhase.h in Headers */,
</span><span class="cx">                                 0F63947815DCE34B006A597C /* DFGStructureAbstractValue.h in Headers */,
</span><span class="lines">@@ -7628,6 +7693,7 @@
</span><span class="cx">                                 0F2B66F717B6B5AB00A7AE3F /* JSInt8Array.h in Headers */,
</span><span class="cx">                                 A76C51761182748D00715B05 /* JSInterfaceJIT.h in Headers */,
</span><span class="cx">                                 E33F50811B8429A400413856 /* JSInternalPromise.h in Headers */,
</span><ins>+                                0F61832A1C45BF070072450B /* AirCCallingConvention.h in Headers */,
</ins><span class="cx">                                 E33F50791B84225700413856 /* JSInternalPromiseConstructor.h in Headers */,
</span><span class="cx">                                 E33F50871B8449EF00413856 /* JSInternalPromiseConstructor.lut.h in Headers */,
</span><span class="cx">                                 E33F50851B8437A000413856 /* JSInternalPromiseDeferred.h in Headers */,
</span><span class="lines">@@ -8018,6 +8084,7 @@
</span><span class="cx">                                 142E313C134FF0A600AFADB5 /* Weak.h in Headers */,
</span><span class="cx">                                 14E84F9F14EE1ACC00D6D5D4 /* WeakBlock.h in Headers */,
</span><span class="cx">                                 14BFCE6910CDB1FC00364CCE /* WeakGCMap.h in Headers */,
</span><ins>+                                0F6183331C45F35C0072450B /* AirOpcode.h in Headers */,
</ins><span class="cx">                                 AD86A93E1AA4D88D002FE77F /* WeakGCMapInlines.h in Headers */,
</span><span class="cx">                                 14F7256614EE265E00B1652B /* WeakHandleOwner.h in Headers */,
</span><span class="cx">                                 14E84FA214EE1ACC00D6D5D4 /* WeakImpl.h in Headers */,
</span><span class="lines">@@ -8050,6 +8117,22 @@
</span><span class="cx"> /* End PBXHeadersBuildPhase section */
</span><span class="cx"> 
</span><span class="cx"> /* Begin PBXNativeTarget section */
</span><ins>+                0F6183381C45F62A0072450B /* testair */ = {
+                        isa = PBXNativeTarget;
+                        buildConfigurationList = 0F61833E1C45F62A0072450B /* Build configuration list for PBXNativeTarget &quot;testair&quot; */;
+                        buildPhases = (
+                                0F6183391C45F62A0072450B /* Sources */,
+                                0F61833B1C45F62A0072450B /* Frameworks */,
+                        );
+                        buildRules = (
+                        );
+                        dependencies = (
+                        );
+                        name = testair;
+                        productName = testapi;
+                        productReference = 0F6183431C45F62A0072450B /* testair */;
+                        productType = &quot;com.apple.product-type.tool&quot;;
+                };
</ins><span class="cx">                 0F93274E1C20BCBA00CF6564 /* dynbench */ = {
</span><span class="cx">                         isa = PBXNativeTarget;
</span><span class="cx">                         buildConfigurationList = 0F9327541C20BCBA00CF6564 /* Build configuration list for PBXNativeTarget &quot;dynbench&quot; */;
</span><span class="lines">@@ -8257,6 +8340,7 @@
</span><span class="cx">                                 0FEC85941BDB5CF10080FF74 /* testb3 */,
</span><span class="cx">                                 5D6B2A47152B9E17005231DE /* Test Tools */,
</span><span class="cx">                                 0F93274E1C20BCBA00CF6564 /* dynbench */,
</span><ins>+                                0F6183381C45F62A0072450B /* testair */,
</ins><span class="cx">                         );
</span><span class="cx">                 };
</span><span class="cx"> /* End PBXProject section */
</span><span class="lines">@@ -8493,6 +8577,14 @@
</span><span class="cx"> /* End PBXShellScriptBuildPhase section */
</span><span class="cx"> 
</span><span class="cx"> /* Begin PBXSourcesBuildPhase section */
</span><ins>+                0F6183391C45F62A0072450B /* Sources */ = {
+                        isa = PBXSourcesBuildPhase;
+                        buildActionMask = 2147483647;
+                        files = (
+                                0F6183451C45F6600072450B /* testair.cpp in Sources */,
+                        );
+                        runOnlyForDeploymentPostprocessing = 0;
+                };
</ins><span class="cx">                 0F93274F1C20BCBA00CF6564 /* Sources */ = {
</span><span class="cx">                         isa = PBXSourcesBuildPhase;
</span><span class="cx">                         buildActionMask = 2147483647;
</span><span class="lines">@@ -8697,6 +8789,7 @@
</span><span class="cx">                                 52B717B51A0597E1007AF4F3 /* ControlFlowProfiler.cpp in Sources */,
</span><span class="cx">                                 0FBADF541BD1F4B800E073C1 /* CopiedBlock.cpp in Sources */,
</span><span class="cx">                                 C240305514B404E60079EB64 /* CopiedSpace.cpp in Sources */,
</span><ins>+                                0F6183301C45BF070072450B /* AirLowerMacros.cpp in Sources */,
</ins><span class="cx">                                 C2239D1716262BDD005AC5FD /* CopyVisitor.cpp in Sources */,
</span><span class="cx">                                 2A111245192FCE79005EE18D /* CustomGetterSetter.cpp in Sources */,
</span><span class="cx">                                 62E3D5F01B8D0B7300B868BB /* DataFormat.cpp in Sources */,
</span><span class="lines">@@ -8745,6 +8838,7 @@
</span><span class="cx">                                 0F0981F71BC5E565004814F8 /* DFGCopyBarrierOptimizationPhase.cpp in Sources */,
</span><span class="cx">                                 0FBE0F7216C1DB030082C5E8 /* DFGCPSRethreadingPhase.cpp in Sources */,
</span><span class="cx">                                 A7D89CF517A0B8CC00773AD8 /* DFGCriticalEdgeBreakingPhase.cpp in Sources */,
</span><ins>+                                0F6183291C45BF070072450B /* AirCCallingConvention.cpp in Sources */,
</ins><span class="cx">                                 0FFFC95914EF90A600C72532 /* DFGCSEPhase.cpp in Sources */,
</span><span class="cx">                                 0F2FC77216E12F710038D976 /* DFGDCEPhase.cpp in Sources */,
</span><span class="cx">                                 0F338E121BF0276C0013C88F /* B3OpaqueByproducts.cpp in Sources */,
</span><span class="lines">@@ -8761,6 +8855,7 @@
</span><span class="cx">                                 A78A9774179738B8009DF744 /* DFGFailedFinalizer.cpp in Sources */,
</span><span class="cx">                                 A78A9776179738B8009DF744 /* DFGFinalizer.cpp in Sources */,
</span><span class="cx">                                 0F2BDC15151C5D4D00CD8910 /* DFGFixupPhase.cpp in Sources */,
</span><ins>+                                0F61832C1C45BF070072450B /* AirEmitShuffle.cpp in Sources */,
</ins><span class="cx">                                 0F9D339617FFC4E60073C2BC /* DFGFlushedAt.cpp in Sources */,
</span><span class="cx">                                 A7D89CF717A0B8CC00773AD8 /* DFGFlushFormat.cpp in Sources */,
</span><span class="cx">                                 0F69CC88193AC60A0045759E /* DFGFrozenValue.cpp in Sources */,
</span><span class="lines">@@ -9072,6 +9167,7 @@
</span><span class="cx">                                 E33F50841B8437A000413856 /* JSInternalPromiseDeferred.cpp in Sources */,
</span><span class="cx">                                 E33F50741B8421C000413856 /* JSInternalPromisePrototype.cpp in Sources */,
</span><span class="cx">                                 A503FA1B188E0FB000110F14 /* JSJavaScriptCallFrame.cpp in Sources */,
</span><ins>+                                0F61832E1C45BF070072450B /* AirLowerAfterRegAlloc.cpp in Sources */,
</ins><span class="cx">                                 A503FA1D188E0FB000110F14 /* JSJavaScriptCallFramePrototype.cpp in Sources */,
</span><span class="cx">                                 7013CA8B1B491A9400CAE613 /* JSJob.cpp in Sources */,
</span><span class="cx">                                 140B7D1D0DC69AF7009C42B8 /* JSLexicalEnvironment.cpp in Sources */,
</span><span class="lines">@@ -9214,6 +9310,7 @@
</span><span class="cx">                                 0F9D4C0C1C3E1C11006CD984 /* FTLExceptionTarget.cpp in Sources */,
</span><span class="cx">                                 0FB1058B1675483100F8AB6E /* ProfilerOSRExit.cpp in Sources */,
</span><span class="cx">                                 0FB1058D1675483700F8AB6E /* ProfilerOSRExitSite.cpp in Sources */,
</span><ins>+                                0F61832B1C45BF070072450B /* AirCustom.cpp in Sources */,
</ins><span class="cx">                                 0F13912B16771C3A009CCB07 /* ProfilerProfiledBytecodes.cpp in Sources */,
</span><span class="cx">                                 0FD3E40D1B618B6600C80E1E /* PropertyCondition.cpp in Sources */,
</span><span class="cx">                                 A7FB60A4103F7DC20017A286 /* PropertyDescriptor.cpp in Sources */,
</span><span class="lines">@@ -9366,6 +9463,11 @@
</span><span class="cx"> /* End PBXSourcesBuildPhase section */
</span><span class="cx"> 
</span><span class="cx"> /* Begin PBXTargetDependency section */
</span><ins>+                0F6183471C45F67A0072450B /* PBXTargetDependency */ = {
+                        isa = PBXTargetDependency;
+                        target = 0F6183381C45F62A0072450B /* testair */;
+                        targetProxy = 0F6183461C45F67A0072450B /* PBXContainerItemProxy */;
+                };
</ins><span class="cx">                 0F93275D1C20BF3A00CF6564 /* PBXTargetDependency */ = {
</span><span class="cx">                         isa = PBXTargetDependency;
</span><span class="cx">                         target = 0F93274E1C20BCBA00CF6564 /* dynbench */;
</span><span class="lines">@@ -9477,6 +9579,42 @@
</span><span class="cx">                         };
</span><span class="cx">                         name = Production;
</span><span class="cx">                 };
</span><ins>+                0F61833F1C45F62A0072450B /* Debug */ = {
+                        isa = XCBuildConfiguration;
+                        baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
+                        buildSettings = {
+                                CODE_SIGN_ENTITLEMENTS_ios_testair = entitlements.plist;
+                                PRODUCT_NAME = testair;
+                        };
+                        name = Debug;
+                };
+                0F6183401C45F62A0072450B /* Release */ = {
+                        isa = XCBuildConfiguration;
+                        baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
+                        buildSettings = {
+                                CODE_SIGN_ENTITLEMENTS_ios_testair = entitlements.plist;
+                                PRODUCT_NAME = testair;
+                        };
+                        name = Release;
+                };
+                0F6183411C45F62A0072450B /* Profiling */ = {
+                        isa = XCBuildConfiguration;
+                        baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
+                        buildSettings = {
+                                CODE_SIGN_ENTITLEMENTS_ios_testair = entitlements.plist;
+                                PRODUCT_NAME = testair;
+                        };
+                        name = Profiling;
+                };
+                0F6183421C45F62A0072450B /* Production */ = {
+                        isa = XCBuildConfiguration;
+                        baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
+                        buildSettings = {
+                                CODE_SIGN_ENTITLEMENTS_ios_testair = entitlements.plist;
+                                PRODUCT_NAME = testair;
+                        };
+                        name = Production;
+                };
</ins><span class="cx">                 0F9327551C20BCBA00CF6564 /* Debug */ = {
</span><span class="cx">                         isa = XCBuildConfiguration;
</span><span class="cx">                         baseConfigurationReference = BC021BF2136900C300FC5467 /* ToolExecutable.xcconfig */;
</span><span class="lines">@@ -9906,6 +10044,17 @@
</span><span class="cx">                         defaultConfigurationIsVisible = 0;
</span><span class="cx">                         defaultConfigurationName = Production;
</span><span class="cx">                 };
</span><ins>+                0F61833E1C45F62A0072450B /* Build configuration list for PBXNativeTarget &quot;testair&quot; */ = {
+                        isa = XCConfigurationList;
+                        buildConfigurations = (
+                                0F61833F1C45F62A0072450B /* Debug */,
+                                0F6183401C45F62A0072450B /* Release */,
+                                0F6183411C45F62A0072450B /* Profiling */,
+                                0F6183421C45F62A0072450B /* Production */,
+                        );
+                        defaultConfigurationIsVisible = 0;
+                        defaultConfigurationName = Production;
+                };
</ins><span class="cx">                 0F9327541C20BCBA00CF6564 /* Build configuration list for PBXNativeTarget &quot;dynbench&quot; */ = {
</span><span class="cx">                         isa = XCConfigurationList;
</span><span class="cx">                         buildConfigurations = (
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerAbstractMacroAssemblerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/assembler/AbstractMacroAssembler.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2008, 2012, 2014, 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2008, 2012, 2014-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -76,6 +76,15 @@
</span><span class="cx"> #endif
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+inline bool isIOS()
+{
+#if PLATFORM(IOS)
+    return true;
+#else
+    return false;
+#endif
+}
+
</ins><span class="cx"> inline bool optimizeForARMv7IDIVSupported()
</span><span class="cx"> {
</span><span class="cx">     return isARMv7IDIVSupported() &amp;&amp; Options::useArchitectureSpecificOptimizations();
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commonh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1411,6 +1411,16 @@
</span><span class="cx">     }
</span><span class="cx"> #endif
</span><span class="cx"> 
</span><ins>+    void swap32(RegisterID src, RegisterID dest)
+    {
+        m_assembler.xchgl_rr(src, dest);
+    }
+
+    void swap32(RegisterID src, Address dest)
+    {
+        m_assembler.xchgl_rm(src, dest.offset, dest.base);
+    }
+
</ins><span class="cx">     void moveConditionally32(RelationalCondition cond, RegisterID left, RegisterID right, RegisterID src, RegisterID dest)
</span><span class="cx">     {
</span><span class="cx">         m_assembler.cmpl_rr(right, left);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerMacroAssemblerX86_64h"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86_64.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2008, 2012, 2014, 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2008, 2012, 2014-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -655,6 +655,16 @@
</span><span class="cx">         return DataLabel32(this);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void swap64(RegisterID src, RegisterID dest)
+    {
+        m_assembler.xchgq_rr(src, dest);
+    }
+
+    void swap64(RegisterID src, Address dest)
+    {
+        m_assembler.xchgq_rm(src, dest.offset, dest.base);
+    }
+
</ins><span class="cx">     void move64ToDouble(RegisterID src, FPRegisterID dest)
</span><span class="cx">     {
</span><span class="cx">         m_assembler.movq_rr(src, dest);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerX86Assemblerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/X86Assembler.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/X86Assembler.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/assembler/X86Assembler.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2008, 2012-2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2008, 2012-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -1431,6 +1431,11 @@
</span><span class="cx">             m_formatter.oneByteOp(OP_XCHG_EvGv, src, dst);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void xchgl_rm(RegisterID src, int offset, RegisterID base)
+    {
+        m_formatter.oneByteOp(OP_XCHG_EvGv, src, base, offset);
+    }
+
</ins><span class="cx"> #if CPU(X86_64)
</span><span class="cx">     void xchgq_rr(RegisterID src, RegisterID dst)
</span><span class="cx">     {
</span><span class="lines">@@ -1441,6 +1446,11 @@
</span><span class="cx">         else
</span><span class="cx">             m_formatter.oneByteOp64(OP_XCHG_EvGv, src, dst);
</span><span class="cx">     }
</span><ins>+
+    void xchgq_rm(RegisterID src, int offset, RegisterID base)
+    {
+        m_formatter.oneByteOp64(OP_XCHG_EvGv, src, base, offset);
+    }
</ins><span class="cx"> #endif
</span><span class="cx"> 
</span><span class="cx">     void movl_rr(RegisterID src, RegisterID dst)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3B3CCallValueh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3CCallValue.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3CCallValue.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/B3CCallValue.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -39,7 +39,7 @@
</span><span class="cx"> 
</span><span class="cx">     ~CCallValue();
</span><span class="cx"> 
</span><del>-    Effects effects;
</del><ins>+    Effects effects { Effects::forCall() };
</ins><span class="cx"> 
</span><span class="cx"> private:
</span><span class="cx">     friend class Procedure;
</span><span class="lines">@@ -47,7 +47,6 @@
</span><span class="cx">     template&lt;typename... Arguments&gt;
</span><span class="cx">     CCallValue(unsigned index, Type type, Origin origin, Arguments... arguments)
</span><span class="cx">         : Value(index, CheckedOpcode, CCall, type, origin, arguments...)
</span><del>-        , effects(Effects::forCall())
</del><span class="cx">     {
</span><span class="cx">         RELEASE_ASSERT(numChildren() &gt;= 1);
</span><span class="cx">     }
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3B3Compilationcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3Compilation.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3Compilation.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/B3Compilation.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -52,6 +52,12 @@
</span><span class="cx">     m_byproducts = proc.releaseByproducts();
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+Compilation::Compilation(MacroAssemblerCodeRef codeRef, std::unique_ptr&lt;OpaqueByproducts&gt; byproducts)
+    : m_codeRef(codeRef)
+    , m_byproducts(WTFMove(byproducts))
+{
+}
+
</ins><span class="cx"> Compilation::~Compilation()
</span><span class="cx"> {
</span><span class="cx"> }
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3B3Compilationh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3Compilation.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3Compilation.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/B3Compilation.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -55,6 +55,11 @@
</span><span class="cx"> 
</span><span class="cx"> public:
</span><span class="cx">     JS_EXPORT_PRIVATE Compilation(VM&amp;, Procedure&amp;, unsigned optLevel = 1);
</span><ins>+
+    // This constructor allows you to manually create a Compilation. It's currently only used by test
+    // code. Probably best to keep it that way.
+    JS_EXPORT_PRIVATE Compilation(MacroAssemblerCodeRef, std::unique_ptr&lt;OpaqueByproducts&gt;);
+    
</ins><span class="cx">     JS_EXPORT_PRIVATE ~Compilation();
</span><span class="cx"> 
</span><span class="cx">     MacroAssemblerCodePtr code() const { return m_codeRef.code(); }
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3B3LowerToAircpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -35,6 +35,7 @@
</span><span class="cx"> #include &quot;AirStackSlot.h&quot;
</span><span class="cx"> #include &quot;B3ArgumentRegValue.h&quot;
</span><span class="cx"> #include &quot;B3BasicBlockInlines.h&quot;
</span><ins>+#include &quot;B3BlockWorklist.h&quot;
</ins><span class="cx"> #include &quot;B3CCallValue.h&quot;
</span><span class="cx"> #include &quot;B3CheckSpecial.h&quot;
</span><span class="cx"> #include &quot;B3Commutativity.h&quot;
</span><span class="lines">@@ -99,6 +100,15 @@
</span><span class="cx">             }
</span><span class="cx">         }
</span><span class="cx"> 
</span><ins>+        // Figure out which blocks are not rare.
+        m_fastWorklist.push(m_procedure[0]);
+        while (B3::BasicBlock* block = m_fastWorklist.pop()) {
+            for (B3::FrequentedBlock&amp; successor : block-&gt;successors()) {
+                if (!successor.isRare())
+                    m_fastWorklist.push(successor.block());
+            }
+        }
+
</ins><span class="cx">         m_procedure.resetValueOwners(); // Used by crossesInterference().
</span><span class="cx"> 
</span><span class="cx">         // Lower defs before uses on a global level. This is a good heuristic to lock down a
</span><span class="lines">@@ -108,6 +118,8 @@
</span><span class="cx">             // Reset some state.
</span><span class="cx">             m_insts.resize(0);
</span><span class="cx"> 
</span><ins>+            m_isRare = !m_fastWorklist.saw(block);
+
</ins><span class="cx">             if (verbose)
</span><span class="cx">                 dataLog(&quot;Lowering Block &quot;, *block, &quot;:\n&quot;);
</span><span class="cx">             
</span><span class="lines">@@ -1552,37 +1564,6 @@
</span><span class="cx">             inverted);
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    template&lt;typename BankInfo&gt;
-    Arg marshallCCallArgument(unsigned&amp; argumentCount, unsigned&amp; stackOffset, Value* child)
-    {
-        unsigned argumentIndex = argumentCount++;
-        if (argumentIndex &lt; BankInfo::numberOfArgumentRegisters) {
-            Tmp result = Tmp(BankInfo::toArgumentRegister(argumentIndex));
-            append(relaxedMoveForType(child-&gt;type()), immOrTmp(child), result);
-            return result;
-        }
-
-#if CPU(ARM64) &amp;&amp; PLATFORM(IOS)
-        // iOS does not follow the ARM64 ABI regarding function calls.
-        // Arguments must be packed.
-        unsigned slotSize = sizeofType(child-&gt;type());
-        stackOffset = WTF::roundUpToMultipleOf(slotSize, stackOffset);
-#else
-        unsigned slotSize = sizeof(void*);
-#endif
-        Arg result = Arg::callArg(stackOffset);
-        stackOffset += slotSize;
-        
-        // Put the code for storing the argument before anything else. This significantly eases the
-        // burden on the register allocator. If we could, we'd hoist these stores as far as
-        // possible.
-        // FIXME: Add a phase to hoist stores as high as possible to relieve register pressure.
-        // https://bugs.webkit.org/show_bug.cgi?id=151063
-        m_insts.last().insert(0, createStore(child, result));
-        
-        return result;
-    }
-
</del><span class="cx">     void lower()
</span><span class="cx">     {
</span><span class="cx">         switch (m_value-&gt;opcode()) {
</span><span class="lines">@@ -1934,14 +1915,10 @@
</span><span class="cx">             return;
</span><span class="cx">         }
</span><span class="cx"> 
</span><del>-        case CCall: {
</del><ins>+        case B3::CCall: {
</ins><span class="cx">             CCallValue* cCall = m_value-&gt;as&lt;CCallValue&gt;();
</span><del>-            Inst inst(Patch, cCall, Arg::special(m_code.cCallSpecial()));
</del><span class="cx"> 
</span><del>-            // This is a bit weird - we have a super intense contract with Arg::CCallSpecial. It might
-            // be better if we factored Air::CCallSpecial out of the Air namespace and made it a B3
-            // thing.
-            // FIXME: https://bugs.webkit.org/show_bug.cgi?id=151045
</del><ins>+            Inst inst(m_isRare ? Air::ColdCCall : Air::CCall, cCall);
</ins><span class="cx"> 
</span><span class="cx">             // We have a ton of flexibility regarding the callee argument, but currently, we don't
</span><span class="cx">             // use it yet. It gets weird for reasons:
</span><span class="lines">@@ -1954,48 +1931,13 @@
</span><span class="cx">             // FIXME: https://bugs.webkit.org/show_bug.cgi?id=151052
</span><span class="cx">             inst.args.append(tmp(cCall-&gt;child(0)));
</span><span class="cx"> 
</span><del>-            // We need to tell Air what registers this defines.
-            inst.args.append(Tmp(GPRInfo::returnValueGPR));
-            inst.args.append(Tmp(GPRInfo::returnValueGPR2));
-            inst.args.append(Tmp(FPRInfo::returnValueFPR));
</del><ins>+            if (cCall-&gt;type() != Void)
+                inst.args.append(tmp(cCall));
</ins><span class="cx"> 
</span><del>-            // Now marshall the arguments. This is where we implement the C calling convention. After
-            // this, Air does not know what the convention is; it just takes our word for it.
-            unsigned gpArgumentCount = 0;
-            unsigned fpArgumentCount = 0;
-            unsigned stackOffset = 0;
-            for (unsigned i = 1; i &lt; cCall-&gt;numChildren(); ++i) {
-                Value* argChild = cCall-&gt;child(i);
-                Arg arg;
-                
-                switch (Arg::typeForB3Type(argChild-&gt;type())) {
-                case Arg::GP:
-                    arg = marshallCCallArgument&lt;GPRInfo&gt;(gpArgumentCount, stackOffset, argChild);
-                    break;
</del><ins>+            for (unsigned i = 1; i &lt; cCall-&gt;numChildren(); ++i)
+                inst.args.append(immOrTmp(cCall-&gt;child(i)));
</ins><span class="cx"> 
</span><del>-                case Arg::FP:
-                    arg = marshallCCallArgument&lt;FPRInfo&gt;(fpArgumentCount, stackOffset, argChild);
-                    break;
-                }
-
-                if (arg.isTmp())
-                    inst.args.append(arg);
-            }
-            
</del><span class="cx">             m_insts.last().append(WTFMove(inst));
</span><del>-
-            switch (cCall-&gt;type()) {
-            case Void:
-                break;
-            case Int32:
-            case Int64:
-                append(Move, Tmp(GPRInfo::returnValueGPR), tmp(cCall));
-                break;
-            case Float:
-            case Double:
-                append(MoveDouble, Tmp(FPRInfo::returnValueFPR), tmp(cCall));
-                break;
-            }
</del><span class="cx">             return;
</span><span class="cx">         }
</span><span class="cx"> 
</span><span class="lines">@@ -2287,11 +2229,13 @@
</span><span class="cx"> 
</span><span class="cx">     UseCounts m_useCounts;
</span><span class="cx">     PhiChildren m_phiChildren;
</span><ins>+    BlockWorklist m_fastWorklist;
</ins><span class="cx"> 
</span><span class="cx">     Vector&lt;Vector&lt;Inst, 4&gt;&gt; m_insts;
</span><span class="cx">     Vector&lt;Inst&gt; m_prologue;
</span><span class="cx"> 
</span><span class="cx">     B3::BasicBlock* m_block;
</span><ins>+    bool m_isRare;
</ins><span class="cx">     unsigned m_index;
</span><span class="cx">     Value* m_value;
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3B3OpaqueByproductsh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3OpaqueByproducts.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3OpaqueByproducts.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/B3OpaqueByproducts.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -39,7 +39,7 @@
</span><span class="cx">     WTF_MAKE_FAST_ALLOCATED;
</span><span class="cx"> public:
</span><span class="cx">     OpaqueByproducts();
</span><del>-    ~OpaqueByproducts();
</del><ins>+    JS_EXPORT_PRIVATE ~OpaqueByproducts();
</ins><span class="cx"> 
</span><span class="cx">     size_t count() const { return m_byproducts.size(); }
</span><span class="cx">     
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3B3StackmapSpecialcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/B3StackmapSpecial.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -200,21 +200,14 @@
</span><span class="cx">     case Arg::Tmp:
</span><span class="cx">     case Arg::Imm:
</span><span class="cx">     case Arg::Imm64:
</span><del>-    case Arg::Stack:
-    case Arg::CallArg:
-        break; // OK
-    case Arg::Addr:
-        if (arg.base() != Tmp(GPRInfo::callFrameRegister)
-            &amp;&amp; arg.base() != Tmp(MacroAssembler::stackPointerRegister))
-            return false;
</del><span class="cx">         break;
</span><span class="cx">     default:
</span><del>-        return false;
</del><ins>+        if (!arg.isStackMemory())
+            return false;
+        break;
</ins><span class="cx">     }
</span><del>-    
-    Arg::Type type = Arg::typeForB3Type(value-&gt;type());
</del><span class="cx"> 
</span><del>-    return arg.isType(type);
</del><ins>+    return arg.canRepresent(value);
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> bool StackmapSpecial::isArgValidForRep(Air::Code&amp; code, const Air::Arg&amp; arg, const ValueRep&amp; rep)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirArgcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirArg.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirArg.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirArg.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -30,6 +30,9 @@
</span><span class="cx"> 
</span><span class="cx"> #include &quot;AirSpecial.h&quot;
</span><span class="cx"> #include &quot;AirStackSlot.h&quot;
</span><ins>+#include &quot;B3Value.h&quot;
+#include &quot;FPRInfo.h&quot;
+#include &quot;GPRInfo.h&quot;
</ins><span class="cx"> 
</span><span class="cx"> #if COMPILER(GCC) &amp;&amp; ASSERT_DISABLED
</span><span class="cx"> #pragma GCC diagnostic push
</span><span class="lines">@@ -38,6 +41,20 @@
</span><span class="cx"> 
</span><span class="cx"> namespace JSC { namespace B3 { namespace Air {
</span><span class="cx"> 
</span><ins>+bool Arg::isStackMemory() const
+{
+    switch (kind()) {
+    case Addr:
+        return base() == Air::Tmp(GPRInfo::callFrameRegister)
+            || base() == Air::Tmp(MacroAssembler::stackPointerRegister);
+    case Stack:
+    case CallArg:
+        return true;
+    default:
+        return false;
+    }
+}
+
</ins><span class="cx"> bool Arg::isRepresentableAs(Width width, Signedness signedness) const
</span><span class="cx"> {
</span><span class="cx">     switch (signedness) {
</span><span class="lines">@@ -67,6 +84,31 @@
</span><span class="cx">     ASSERT_NOT_REACHED();
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+bool Arg::usesTmp(Air::Tmp tmp) const
+{
+    bool uses = false;
+    const_cast&lt;Arg*&gt;(this)-&gt;forEachTmpFast(
+        [&amp;] (Air::Tmp otherTmp) {
+            if (otherTmp == tmp)
+                uses = true;
+        });
+    return uses;
+}
+
+bool Arg::canRepresent(Value* value) const
+{
+    return isType(typeForB3Type(value-&gt;type()));
+}
+
+bool Arg::isCompatibleType(const Arg&amp; other) const
+{
+    if (hasType())
+        return other.isType(type());
+    if (other.hasType())
+        return isType(other.type());
+    return true;
+}
+
</ins><span class="cx"> void Arg::dump(PrintStream&amp; out) const
</span><span class="cx"> {
</span><span class="cx">     switch (m_kind) {
</span><span class="lines">@@ -117,6 +159,9 @@
</span><span class="cx">     case Special:
</span><span class="cx">         out.print(pointerDump(special()));
</span><span class="cx">         return;
</span><ins>+    case WidthArg:
+        out.print(width());
+        return;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     RELEASE_ASSERT_NOT_REACHED();
</span><span class="lines">@@ -167,6 +212,9 @@
</span><span class="cx">     case Arg::Special:
</span><span class="cx">         out.print(&quot;Special&quot;);
</span><span class="cx">         return;
</span><ins>+    case Arg::WidthArg:
+        out.print(&quot;WidthArg&quot;);
+        return;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     RELEASE_ASSERT_NOT_REACHED();
</span><span class="lines">@@ -231,16 +279,16 @@
</span><span class="cx"> {
</span><span class="cx">     switch (width) {
</span><span class="cx">     case Arg::Width8:
</span><del>-        out.print(&quot;Width8&quot;);
</del><ins>+        out.print(&quot;8&quot;);
</ins><span class="cx">         return;
</span><span class="cx">     case Arg::Width16:
</span><del>-        out.print(&quot;Width16&quot;);
</del><ins>+        out.print(&quot;16&quot;);
</ins><span class="cx">         return;
</span><span class="cx">     case Arg::Width32:
</span><del>-        out.print(&quot;Width32&quot;);
</del><ins>+        out.print(&quot;32&quot;);
</ins><span class="cx">         return;
</span><span class="cx">     case Arg::Width64:
</span><del>-        out.print(&quot;Width64&quot;);
</del><ins>+        out.print(&quot;64&quot;);
</ins><span class="cx">         return;
</span><span class="cx">     }
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirArgh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirArg.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirArg.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirArg.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -38,8 +38,12 @@
</span><span class="cx"> #pragma GCC diagnostic ignored &quot;-Wreturn-type&quot;
</span><span class="cx"> #endif // COMPILER(GCC) &amp;&amp; ASSERT_DISABLED
</span><span class="cx"> 
</span><del>-namespace JSC { namespace B3 { namespace Air {
</del><ins>+namespace JSC { namespace B3 {
</ins><span class="cx"> 
</span><ins>+class Value;
+
+namespace Air {
+
</ins><span class="cx"> class Special;
</span><span class="cx"> class StackSlot;
</span><span class="cx"> 
</span><span class="lines">@@ -74,7 +78,8 @@
</span><span class="cx">         RelCond,
</span><span class="cx">         ResCond,
</span><span class="cx">         DoubleCond,
</span><del>-        Special
</del><ins>+        Special,
+        WidthArg
</ins><span class="cx">     };
</span><span class="cx"> 
</span><span class="cx">     enum Role : int8_t {
</span><span class="lines">@@ -161,6 +166,13 @@
</span><span class="cx"> 
</span><span class="cx">     static const unsigned numTypes = 2;
</span><span class="cx"> 
</span><ins>+    template&lt;typename Functor&gt;
+    static void forEachType(const Functor&amp; functor)
+    {
+        functor(GP);
+        functor(FP);
+    }
+
</ins><span class="cx">     enum Width : int8_t {
</span><span class="cx">         Width8,
</span><span class="cx">         Width16,
</span><span class="lines">@@ -227,6 +239,26 @@
</span><span class="cx">         return isAnyUse(role) &amp;&amp; !isColdUse(role);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    static Role cooled(Role role)
+    {
+        switch (role) {
+        case ColdUse:
+        case LateColdUse:
+        case UseDef:
+        case UseZDef:
+        case Def:
+        case ZDef:
+        case UseAddr:
+        case Scratch:
+        case EarlyDef:
+            return role;
+        case Use:
+            return ColdUse;
+        case LateUse:
+            return LateColdUse;
+        }
+    }
+
</ins><span class="cx">     // Returns true if the Role implies that the Inst will Use the Arg before doing anything else.
</span><span class="cx">     static bool isEarlyUse(Role role)
</span><span class="cx">     {
</span><span class="lines">@@ -449,6 +481,11 @@
</span><span class="cx">         return result;
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    static Arg immPtr(const void* address)
+    {
+        return imm64(bitwise_cast&lt;intptr_t&gt;(address));
+    }
+
</ins><span class="cx">     static Arg addr(Air::Tmp base, int32_t offset = 0)
</span><span class="cx">     {
</span><span class="cx">         ASSERT(base.isGP());
</span><span class="lines">@@ -563,6 +600,14 @@
</span><span class="cx">         return result;
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    static Arg widthArg(Width width)
+    {
+        Arg result;
+        result.m_kind = WidthArg;
+        result.m_offset = width;
+        return result;
+    }
+
</ins><span class="cx">     bool operator==(const Arg&amp; other) const
</span><span class="cx">     {
</span><span class="cx">         return m_offset == other.m_offset
</span><span class="lines">@@ -599,6 +644,11 @@
</span><span class="cx">         return kind() == Imm64;
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    bool isSomeImm() const
+    {
+        return isImm() || isImm64();
+    }
+
</ins><span class="cx">     bool isAddr() const
</span><span class="cx">     {
</span><span class="cx">         return kind() == Addr;
</span><span class="lines">@@ -619,6 +669,21 @@
</span><span class="cx">         return kind() == Index;
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    bool isMemory() const
+    {
+        switch (kind()) {
+        case Addr:
+        case Stack:
+        case CallArg:
+        case Index:
+            return true;
+        default:
+            return false;
+        }
+    }
+
+    bool isStackMemory() const;
+
</ins><span class="cx">     bool isRelCond() const
</span><span class="cx">     {
</span><span class="cx">         return kind() == RelCond;
</span><span class="lines">@@ -651,6 +716,11 @@
</span><span class="cx">         return kind() == Special;
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    bool isWidthArg() const
+    {
+        return kind() == WidthArg;
+    }
+
</ins><span class="cx">     bool isAlive() const
</span><span class="cx">     {
</span><span class="cx">         return isTmp() || isStack();
</span><span class="lines">@@ -694,18 +764,7 @@
</span><span class="cx">         return m_base;
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    bool hasOffset() const
-    {
-        switch (kind()) {
-        case Addr:
-        case Stack:
-        case CallArg:
-        case Index:
-            return true;
-        default:
-            return false;
-        }
-    }
</del><ins>+    bool hasOffset() const { return isMemory(); }
</ins><span class="cx">     
</span><span class="cx">     int32_t offset() const
</span><span class="cx">     {
</span><span class="lines">@@ -744,6 +803,12 @@
</span><span class="cx">         return bitwise_cast&lt;Air::Special*&gt;(m_offset);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    Width width() const
+    {
+        ASSERT(kind() == WidthArg);
+        return static_cast&lt;Width&gt;(m_offset);
+    }
+
</ins><span class="cx">     bool isGPTmp() const
</span><span class="cx">     {
</span><span class="cx">         return isTmp() &amp;&amp; tmp().isGP();
</span><span class="lines">@@ -768,6 +833,7 @@
</span><span class="cx">         case ResCond:
</span><span class="cx">         case DoubleCond:
</span><span class="cx">         case Special:
</span><ins>+        case WidthArg:
</ins><span class="cx">             return true;
</span><span class="cx">         case Tmp:
</span><span class="cx">             return isGPTmp();
</span><span class="lines">@@ -786,6 +852,7 @@
</span><span class="cx">         case ResCond:
</span><span class="cx">         case DoubleCond:
</span><span class="cx">         case Special:
</span><ins>+        case WidthArg:
</ins><span class="cx">         case Invalid:
</span><span class="cx">             return false;
</span><span class="cx">         case Addr:
</span><span class="lines">@@ -829,6 +896,10 @@
</span><span class="cx">         ASSERT_NOT_REACHED();
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    bool canRepresent(Value* value) const;
+
+    bool isCompatibleType(const Arg&amp; other) const;
+
</ins><span class="cx">     bool isGPR() const
</span><span class="cx">     {
</span><span class="cx">         return isTmp() &amp;&amp; tmp().isGPR();
</span><span class="lines">@@ -970,6 +1041,7 @@
</span><span class="cx">         case ResCond:
</span><span class="cx">         case DoubleCond:
</span><span class="cx">         case Special:
</span><ins>+        case WidthArg:
</ins><span class="cx">             return true;
</span><span class="cx">         }
</span><span class="cx">         ASSERT_NOT_REACHED();
</span><span class="lines">@@ -992,6 +1064,8 @@
</span><span class="cx">         }
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    bool usesTmp(Air::Tmp tmp) const;
+
</ins><span class="cx">     // This is smart enough to know that an address arg in a Def or UseDef rule will use its
</span><span class="cx">     // tmps and never def them. For example, this:
</span><span class="cx">     //
</span></span></pre></div>
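<p>The new <code>Arg::cooled()</code> helper in the hunk above maps hot uses to their cold counterparts and leaves every other role alone. The sketch below restates that mapping in isolation. The trimmed-down <code>Role</code> enum is a hypothetical stand-in for illustration, not the real <code>Arg::Role</code>.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, trimmed-down stand-in for Arg::Role (illustration only).
enum class Role : int8_t { Use, ColdUse, LateUse, LateColdUse, Def, UseDef, Scratch };

// Hot uses become cold uses; every other role is returned unchanged.
inline Role cooled(Role role)
{
    switch (role) {
    case Role::Use:
        return Role::ColdUse;
    case Role::LateUse:
        return Role::LateColdUse;
    default:
        return role;
    }
}
```

<p>Under these assumptions the mapping is idempotent: cooling an already cold role, or a def, returns the same role.</p>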
<a id="trunkSourceJavaScriptCoreb3airAirBasicBlockh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirBasicBlock.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirBasicBlock.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirBasicBlock.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -78,15 +78,17 @@
</span><span class="cx">     InstList&amp; insts() { return m_insts; }
</span><span class="cx"> 
</span><span class="cx">     template&lt;typename Inst&gt;
</span><del>-    void appendInst(Inst&amp;&amp; inst)
</del><ins>+    Inst&amp; appendInst(Inst&amp;&amp; inst)
</ins><span class="cx">     {
</span><span class="cx">         m_insts.append(std::forward&lt;Inst&gt;(inst));
</span><ins>+        return m_insts.last();
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     template&lt;typename... Arguments&gt;
</span><del>-    void append(Arguments&amp;&amp;... arguments)
</del><ins>+    Inst&amp; append(Arguments&amp;&amp;... arguments)
</ins><span class="cx">     {
</span><span class="cx">         m_insts.append(Inst(std::forward&lt;Arguments&gt;(arguments)...));
</span><ins>+        return m_insts.last();
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     // The &quot;0&quot; case is the case to which the branch jumps, so the &quot;then&quot; case. The &quot;1&quot; case is the
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirCCallingConventioncpp"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirCCallingConvention.cpp (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirCCallingConvention.cpp                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirCCallingConvention.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,127 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include &quot;config.h&quot;
+#include &quot;AirCCallingConvention.h&quot;
+
+#if ENABLE(B3_JIT)
+
+#include &quot;AirCCallSpecial.h&quot;
+#include &quot;AirCode.h&quot;
+#include &quot;B3CCallValue.h&quot;
+#include &quot;B3ValueInlines.h&quot;
+
+namespace JSC { namespace B3 { namespace Air {
+
+namespace {
+
+template&lt;typename BankInfo&gt;
+Arg marshallCCallArgumentImpl(unsigned&amp; argumentCount, unsigned&amp; stackOffset, Value* child)
+{
+    unsigned argumentIndex = argumentCount++;
+    if (argumentIndex &lt; BankInfo::numberOfArgumentRegisters)
+        return Tmp(BankInfo::toArgumentRegister(argumentIndex));
+
+    unsigned slotSize;
+    if (isARM64() &amp;&amp; isIOS()) {
+        // Arguments are packed.
+        slotSize = sizeofType(child-&gt;type());
+    } else {
+        // Arguments are aligned.
+        slotSize = 8;
+    }
+
+    stackOffset = WTF::roundUpToMultipleOf(slotSize, stackOffset);
+    Arg result = Arg::callArg(stackOffset);
+    stackOffset += slotSize;
+    return result;
+}
+
+Arg marshallCCallArgument(
+    unsigned&amp; gpArgumentCount, unsigned&amp; fpArgumentCount, unsigned&amp; stackOffset, Value* child)
+{
+    switch (Arg::typeForB3Type(child-&gt;type())) {
+    case Arg::GP:
+        return marshallCCallArgumentImpl&lt;GPRInfo&gt;(gpArgumentCount, stackOffset, child);
+    case Arg::FP:
+        return marshallCCallArgumentImpl&lt;FPRInfo&gt;(fpArgumentCount, stackOffset, child);
+    }
+    RELEASE_ASSERT_NOT_REACHED();
+    return Arg();
+}
+
+} // anonymous namespace
+
+Vector&lt;Arg&gt; computeCCallingConvention(Code&amp; code, CCallValue* value)
+{
+    Vector&lt;Arg&gt; result;
+    result.append(Tmp(CCallSpecial::scratchRegister));
+    unsigned gpArgumentCount = 0;
+    unsigned fpArgumentCount = 0;
+    unsigned stackOffset = 0;
+    for (unsigned i = 1; i &lt; value-&gt;numChildren(); ++i) {
+        result.append(
+            marshallCCallArgument(gpArgumentCount, fpArgumentCount, stackOffset, value-&gt;child(i)));
+    }
+    code.requestCallArgAreaSize(WTF::roundUpToMultipleOf(stackAlignmentBytes(), stackOffset));
+    return result;
+}
+
+Tmp cCallResult(Type type)
+{
+    switch (type) {
+    case Void:
+        return Tmp();
+    case Int32:
+    case Int64:
+        return Tmp(GPRInfo::returnValueGPR);
+    case Float:
+    case Double:
+        return Tmp(FPRInfo::returnValueFPR);
+    }
+
+    RELEASE_ASSERT_NOT_REACHED();
+    return Tmp();
+}
+
+Inst buildCCall(Code&amp; code, Value* origin, const Vector&lt;Arg&gt;&amp; arguments)
+{
+    Inst inst(Patch, origin, Arg::special(code.cCallSpecial()));
+    inst.args.append(arguments[0]);
+    inst.args.append(Tmp(GPRInfo::returnValueGPR));
+    inst.args.append(Tmp(GPRInfo::returnValueGPR2));
+    inst.args.append(Tmp(FPRInfo::returnValueFPR));
+    for (unsigned i = 1; i &lt; arguments.size(); ++i) {
+        Arg arg = arguments[i];
+        if (arg.isTmp())
+            inst.args.append(arg);
+    }
+    return inst;
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
</ins></span></pre></div>
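<p>The marshalling logic in <code>marshallCCallArgumentImpl</code> can be read as: hand out argument registers first, then assign stack offsets, with each slot either packed to the argument's own size or padded to 8 bytes. The following is a minimal sketch of that idea for a single register bank. The <code>Location</code> struct, the <code>marshal</code> function, and its <code>numRegs</code> and <code>packed</code> parameters are assumptions made for illustration and are not part of the patch.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type: either a register index or a stack byte offset.
struct Location {
    bool inRegister;
    unsigned index;
};

// Rounds x up to a multiple of divisor; divisor must be a power of two.
inline unsigned roundUpToMultipleOf(unsigned divisor, unsigned x)
{
    assert(divisor && !(divisor & (divisor - 1)));
    return (x + divisor - 1) & ~(divisor - 1);
}

// sizes: argument sizes in bytes (4 or 8 in this sketch).
// numRegs: number of argument registers in the bank (assumed parameter).
// packed: true models packed stack arguments, false models 8-byte slots.
// stackOffset: running stack offset, updated for each stack argument.
inline std::vector<Location> marshal(
    const std::vector<unsigned>& sizes, unsigned numRegs, bool packed, unsigned& stackOffset)
{
    std::vector<Location> result;
    unsigned argumentCount = 0;
    for (unsigned size : sizes) {
        unsigned argumentIndex = argumentCount++;
        if (argumentIndex < numRegs) {
            result.push_back(Location { true, argumentIndex });
            continue;
        }
        unsigned slotSize = packed ? size : 8;
        stackOffset = roundUpToMultipleOf(slotSize, stackOffset);
        result.push_back(Location { false, stackOffset });
        stackOffset += slotSize;
    }
    return result;
}
```

<p>With no argument registers and packed slots, a 4-byte argument followed by an 8-byte argument lands at offsets 0 and 8, because the second slot is aligned to its own size before it is placed.</p>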
<a id="trunkSourceJavaScriptCoreb3airAirCCallingConventionh"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirCCallingConvention.h (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirCCallingConvention.h                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirCCallingConvention.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,55 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef AirCCallingConvention_h
+#define AirCCallingConvention_h
+
+#if ENABLE(B3_JIT)
+
+#include &quot;AirArg.h&quot;
+#include &quot;AirInst.h&quot;
+#include &quot;B3Type.h&quot;
+#include &lt;wtf/Vector.h&gt;
+
+namespace JSC { namespace B3 {
+
+class CCallValue;
+
+namespace Air {
+
+class Code;
+
+Vector&lt;Arg&gt; computeCCallingConvention(Code&amp;, CCallValue*);
+
+Tmp cCallResult(Type);
+
+Inst buildCCall(Code&amp;, Value* origin, const Vector&lt;Arg&gt;&amp;);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirCCallingConvention_h
+
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirCodeh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirCode.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirCode.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirCode.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -60,12 +60,13 @@
</span><span class="cx"> 
</span><span class="cx">     Procedure&amp; proc() { return m_proc; }
</span><span class="cx"> 
</span><del>-    BasicBlock* addBlock(double frequency = 1);
</del><ins>+    JS_EXPORT_PRIVATE BasicBlock* addBlock(double frequency = 1);
</ins><span class="cx"> 
</span><span class="cx">     // Note that you can rely on stack slots always getting indices that are larger than the index
</span><span class="cx">     // of any prior stack slot. In fact, all stack slots you create in the future will have an index
</span><span class="cx">     // that is &gt;= stackSlots().size().
</span><del>-    StackSlot* addStackSlot(unsigned byteSize, StackSlotKind, StackSlotValue* = nullptr);
</del><ins>+    JS_EXPORT_PRIVATE StackSlot* addStackSlot(
+        unsigned byteSize, StackSlotKind, StackSlotValue* = nullptr);
</ins><span class="cx">     StackSlot* addStackSlot(StackSlotValue*);
</span><span class="cx"> 
</span><span class="cx">     Special* addSpecial(std::unique_ptr&lt;Special&gt;);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirCustomcpp"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirCustom.cpp (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirCustom.cpp                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirCustom.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,160 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include &quot;config.h&quot;
+#include &quot;AirCustom.h&quot;
+
+#if ENABLE(B3_JIT)
+
+#include &quot;B3CCallValue.h&quot;
+#include &quot;B3ValueInlines.h&quot;
+
+namespace JSC { namespace B3 { namespace Air {
+
+bool CCallCustom::isValidForm(Inst&amp; inst)
+{
+    CCallValue* value = inst.origin-&gt;as&lt;CCallValue&gt;();
+    if (!value)
+        return false;
+
+    if (inst.args.size() != (value-&gt;type() == Void ? 0 : 1) + value-&gt;numChildren())
+        return false;
+
+    // The arguments can only refer to the stack, tmps, or immediates.
+    for (Arg&amp; arg : inst.args) {
+        if (!arg.isTmp() &amp;&amp; !arg.isStackMemory() &amp;&amp; !arg.isSomeImm())
+            return false;
+    }
+
+    unsigned offset = 0;
+
+    if (!inst.args[0].isGP())
+        return false;
+
+    // If there is a result then it cannot be an immediate.
+    if (value-&gt;type() != Void) {
+        if (inst.args[1].isSomeImm())
+            return false;
+        if (!inst.args[1].canRepresent(value))
+            return false;
+        offset++;
+    }
+
+    for (unsigned i = value-&gt;numChildren(); i-- &gt; 1;) {
+        Value* child = value-&gt;child(i);
+        Arg arg = inst.args[offset + i];
+        if (!arg.canRepresent(child))
+            return false;
+    }
+
+    return true;
+}
+
+CCallHelpers::Jump CCallCustom::generate(Inst&amp; inst, CCallHelpers&amp;, GenerationContext&amp;)
+{
+    dataLog(&quot;FATAL: Unlowered C call: &quot;, inst, &quot;\n&quot;);
+    UNREACHABLE_FOR_PLATFORM();
+    return CCallHelpers::Jump();
+}
+
+bool ShuffleCustom::isValidForm(Inst&amp; inst)
+{
+    if (inst.args.size() % 3)
+        return false;
+
+    // A destination may only appear once. This requirement allows us to avoid the undefined behavior
+    // of having a destination that is supposed to get multiple inputs simultaneously. It also
+    // imposes some interesting constraints on the &quot;shape&quot; of the shuffle. If we treat a shuffle pair
+    // as an edge and the Args as nodes, then the single-destination requirement means that the
+    // shuffle graph consists of two kinds of subgraphs:
+    //
+    // - Spanning trees. We call these shifts. They can be executed as a sequence of Move
+    //   instructions and don't usually require scratch registers.
+    //
+    // - Closed loops. These loops consist of nodes that have one successor and one predecessor, so
+    //   there is no way to &quot;get into&quot; the loop from outside of it. These can be executed using swaps
+    //   or by saving one of the Args to a scratch register and executing it as a shift.
+    HashSet&lt;Arg&gt; dsts;
+
+    for (unsigned i = 0; i &lt; inst.args.size(); ++i) {
+        Arg arg = inst.args[i];
+        unsigned mode = i % 3;
+
+        if (mode == 2) {
+            // It's the width.
+            if (!arg.isWidthArg())
+                return false;
+            continue;
+        }
+
+        // The source can be an immediate.
+        if (!mode) {
+            if (arg.isSomeImm())
+                continue;
+
+            if (!arg.isCompatibleType(inst.args[i + 1]))
+                return false;
+        } else {
+            ASSERT(mode == 1);
+            if (!dsts.add(arg).isNewEntry)
+                return false;
+        }
+
+        if (arg.isTmp() || arg.isMemory())
+            continue;
+
+        return false;
+    }
+
+    // No destination register may appear in any address expressions. The lowering can't handle it
+    // and it's not useful for the way we end up using Shuffles. Normally, Shuffles are only used for
+    // stack addresses and non-stack registers.
+    for (Arg&amp; arg : inst.args) {
+        if (!arg.isMemory())
+            continue;
+        bool ok = true;
+        arg.forEachTmpFast(
+            [&amp;] (Tmp tmp) {
+                if (dsts.contains(tmp))
+                    ok = false;
+            });
+        if (!ok)
+            return false;
+    }
+
+    return true;
+}
+
+CCallHelpers::Jump ShuffleCustom::generate(Inst&amp; inst, CCallHelpers&amp;, GenerationContext&amp;)
+{
+    dataLog(&quot;FATAL: Unlowered shuffle: &quot;, inst, &quot;\n&quot;);
+    UNREACHABLE_FOR_PLATFORM();
+    return CCallHelpers::Jump();
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
</ins></span></pre></div>
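<p>The comment in <code>ShuffleCustom::isValidForm</code> argues that once every destination appears at most once, the shuffle graph splits into shifts and closed loops. A rough sketch of that reasoning follows. It models Args as plain integers, and the <code>Pair</code> alias and both function names are hypothetical; the patch itself works on <code>Arg</code> with a <code>HashSet</code>.</p>

```cpp
#include <cassert>
#include <map>
#include <set>
#include <utility>
#include <vector>

using Pair = std::pair<int, int>; // (src, dst); hypothetical stand-in for a shuffle pair

// True if no destination is written by more than one pair.
inline bool destinationsAreUnique(const std::vector<Pair>& pairs)
{
    std::set<int> dsts;
    for (const Pair& pair : pairs) {
        if (!dsts.insert(pair.second).second)
            return false;
    }
    return true;
}

// Counts closed loops. Unique destinations mean each node has at most one
// predecessor, so walking predecessors either runs out (a shift) or cycles.
inline unsigned countLoops(const std::vector<Pair>& pairs)
{
    assert(destinationsAreUnique(pairs));
    std::map<int, int> predecessor; // dst -> src
    for (const Pair& pair : pairs)
        predecessor[pair.second] = pair.first;

    std::set<int> visited;
    unsigned loops = 0;
    for (const auto& entry : predecessor) {
        std::set<int> path;
        int node = entry.first;
        while (true) {
            if (path.count(node)) {
                ++loops; // came back to a node on this walk: a new loop
                break;
            }
            if (visited.count(node))
                break; // joined a walk that was already classified
            path.insert(node);
            visited.insert(node);
            auto it = predecessor.find(node);
            if (it == predecessor.end())
                break; // reached a pure source: this chain is a shift
            node = it->second;
        }
    }
    return loops;
}
```

<p>Numbering a, b, c, d as 0 to 3, the example (a =&gt; b, b =&gt; a, a =&gt; c, b =&gt; d) contains one loop with two fringe moves, while (a =&gt; b, b =&gt; c) contains none.</p>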
<a id="trunkSourceJavaScriptCoreb3airAirCustomh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirCustom.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirCustom.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirCustom.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -30,6 +30,7 @@
</span><span class="cx"> 
</span><span class="cx"> #include &quot;AirInst.h&quot;
</span><span class="cx"> #include &quot;AirSpecial.h&quot;
</span><ins>+#include &quot;B3Value.h&quot;
</ins><span class="cx"> 
</span><span class="cx"> namespace JSC { namespace B3 { namespace Air {
</span><span class="cx"> 
</span><span class="lines">@@ -51,6 +52,8 @@
</span><span class="cx"> // you need to carry extra state around with the instruction. Also, Specials mean that you
</span><span class="cx"> // always have access to Code&amp; even in methods that don't take a GenerationContext.
</span><span class="cx"> 
</span><ins>+// Definition of Patch instruction. Patch is used to delegate the behavior of the instruction to the
+// Special object, which will be the first argument to the instruction.
</ins><span class="cx"> struct PatchCustom {
</span><span class="cx">     template&lt;typename Functor&gt;
</span><span class="cx">     static void forEachArg(Inst&amp; inst, const Functor&amp; functor)
</span><span class="lines">@@ -96,6 +99,114 @@
</span><span class="cx">     }
</span><span class="cx"> };
</span><span class="cx"> 
</span><ins>+// Definition of CCall instruction. CCall is used for hot path C function calls. It's lowered to a
+// Patch with an Air CCallSpecial along with code to marshal arguments. The lowering happens
+// before register allocation, so that the register allocator sees the clobbers.
+struct CCallCustom {
+    template&lt;typename Functor&gt;
+    static void forEachArg(Inst&amp; inst, const Functor&amp; functor)
+    {
+        Value* value = inst.origin;
+
+        unsigned index = 0;
+
+        functor(inst.args[index++], Arg::Use, Arg::GP, Arg::pointerWidth()); // callee
+        
+        if (value-&gt;type() != Void) {
+            functor(
+                inst.args[index++], Arg::Def,
+                Arg::typeForB3Type(value-&gt;type()),
+                Arg::widthForB3Type(value-&gt;type()));
+        }
+
+        for (unsigned i = 1; i &lt; value-&gt;numChildren(); ++i) {
+            Value* child = value-&gt;child(i);
+            functor(
+                inst.args[index++], Arg::Use,
+                Arg::typeForB3Type(child-&gt;type()),
+                Arg::widthForB3Type(child-&gt;type()));
+        }
+    }
+
+    template&lt;typename... Arguments&gt;
+    static bool isValidFormStatic(Arguments...)
+    {
+        return false;
+    }
+
+    static bool isValidForm(Inst&amp;);
+
+    static bool admitsStack(Inst&amp;, unsigned)
+    {
+        return true;
+    }
+
+    static bool hasNonArgNonControlEffects(Inst&amp;)
+    {
+        return true;
+    }
+
+    // This just crashes, since we expect C calls to be lowered before generation.
+    static CCallHelpers::Jump generate(Inst&amp;, CCallHelpers&amp;, GenerationContext&amp;);
+};
+
+struct ColdCCallCustom : CCallCustom {
+    template&lt;typename Functor&gt;
+    static void forEachArg(Inst&amp; inst, const Functor&amp; functor)
+    {
+        // This is just like a call, but uses become cold.
+        CCallCustom::forEachArg(
+            inst,
+            [&amp;] (Arg&amp; arg, Arg::Role role, Arg::Type type, Arg::Width width) {
+                functor(arg, Arg::cooled(role), type, width);
+            });
+    }
+};
+
+struct ShuffleCustom {
+    template&lt;typename Functor&gt;
+    static void forEachArg(Inst&amp; inst, const Functor&amp; functor)
+    {
+        unsigned limit = inst.args.size() / 3 * 3;
+        for (unsigned i = 0; i &lt; limit; i += 3) {
+            Arg&amp; src = inst.args[i + 0];
+            Arg&amp; dst = inst.args[i + 1];
+            Arg&amp; widthArg = inst.args[i + 2];
+            Arg::Width width = widthArg.width();
+            Arg::Type type = src.isGP() &amp;&amp; dst.isGP() ? Arg::GP : Arg::FP;
+            functor(src, Arg::Use, type, width);
+            functor(dst, Arg::Def, type, width);
+            functor(widthArg, Arg::Use, Arg::GP, Arg::Width8);
+        }
+    }
+
+    template&lt;typename... Arguments&gt;
+    static bool isValidFormStatic(Arguments...)
+    {
+        return false;
+    }
+
+    static bool isValidForm(Inst&amp;);
+    
+    static bool admitsStack(Inst&amp;, unsigned index)
+    {
+        switch (index % 3) {
+        case 0:
+        case 1:
+            return true;
+        default:
+            return false;
+        }
+    }
+
+    static bool hasNonArgNonControlEffects(Inst&amp;)
+    {
+        return false;
+    }
+
+    static CCallHelpers::Jump generate(Inst&amp;, CCallHelpers&amp;, GenerationContext&amp;);
+};
+
</ins><span class="cx"> } } } // namespace JSC::B3::Air
</span><span class="cx"> 
</span><span class="cx"> #endif // ENABLE(B3_JIT)
</span></span></pre></div>
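<p><code>ShuffleCustom</code> treats the instruction's arguments as flat (source, destination, width) triples, so the position of an argument modulo 3 decides its meaning and whether it may be a stack location. The sketch below isolates that indexing convention. The <code>Slot</code> enum and the helper names are assumptions for illustration only.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for the three positions within a shuffle triple.
enum class Slot { Source, Destination, Width };

inline Slot slotForIndex(std::size_t index)
{
    switch (index % 3) {
    case 0:
        return Slot::Source;
    case 1:
        return Slot::Destination;
    default:
        return Slot::Width;
    }
}

// Sources and destinations may live on the stack; the width marker may not.
inline bool admitsStack(std::size_t index)
{
    return slotForIndex(index) != Slot::Width;
}

// Number of args covered by complete triples; a trailing partial triple is ignored.
inline std::size_t completeTripleLimit(std::size_t argCount)
{
    return argCount / 3 * 3;
}
```

<p>For instance, with seven arguments only the first six form complete triples, which mirrors how <code>forEachArg</code> computes its <code>limit</code>.</p>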
<a id="trunkSourceJavaScriptCoreb3airAirEmitShufflecpp"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirEmitShuffle.cpp (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirEmitShuffle.cpp                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirEmitShuffle.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,520 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include &quot;config.h&quot;
+#include &quot;AirEmitShuffle.h&quot;
+
+#if ENABLE(B3_JIT)
+
+#include &quot;AirInstInlines.h&quot;
+#include &quot;AirRegisterPriority.h&quot;
+#include &lt;wtf/GraphNodeWorklist.h&gt;
+#include &lt;wtf/ListDump.h&gt;
+
+namespace JSC { namespace B3 { namespace Air {
+
+namespace {
+
+bool verbose = false;
+
+template&lt;typename Functor&gt;
+Tmp findPossibleScratch(Arg::Type type, const Functor&amp; functor) {
+    for (Reg reg : regsInPriorityOrder(type)) {
+        Tmp tmp(reg);
+        if (functor(tmp))
+            return tmp;
+    }
+    return Tmp();
+}
+
+Tmp findPossibleScratch(Arg::Type type, const Arg&amp; arg1, const Arg&amp; arg2) {
+    return findPossibleScratch(
+        type,
+        [&amp;] (Tmp tmp) -&gt; bool {
+            return !arg1.usesTmp(tmp) &amp;&amp; !arg2.usesTmp(tmp);
+        });
+}
+
+// Example: (a =&gt; b, b =&gt; a, a =&gt; c, b =&gt; d)
+struct Rotate {
+    Vector&lt;ShufflePair&gt; loop; // in the example, this is the loop: (a =&gt; b, b =&gt; a)
+    Vector&lt;ShufflePair&gt; fringe; // in the example, these are the associated shifts: (a =&gt; c, b =&gt; d)
+};
+
+} // anonymous namespace
+
+void ShufflePair::dump(PrintStream&amp; out) const
+{
+    out.print(width(), &quot;:&quot;, src(), &quot;=&gt;&quot;, dst());
+}
+
+Vector&lt;Inst&gt; emitShuffle(
+    Vector&lt;ShufflePair&gt; pairs, std::array&lt;Arg, 2&gt; scratches, Arg::Type type, Value* origin)
+{
+    pairs.removeAllMatching(
+        [&amp;] (const ShufflePair&amp; pair) -&gt; bool {
+            return pair.src() == pair.dst();
+        });
+    
+    // First validate that this is the kind of shuffle that we know how to deal with.
+#if !ASSERT_DISABLED
+    for (const ShufflePair&amp; pair : pairs) {
+        ASSERT(pair.src().isType(type));
+        ASSERT(pair.dst().isType(type));
+        ASSERT(pair.dst().isTmp() || pair.dst().isMemory());
+    }
+#endif // !ASSERT_DISABLED
+
+    // There are two possible kinds of operations that we will do:
+    //
+    // - Shift. Example: (a =&gt; b, b =&gt; c). We emit this as &quot;Move b, c; Move a, b&quot;. This only requires
+    //   scratch registers if there are memory-&gt;memory moves. We want to find as many of these as
+    //   possible because they are cheaper. Note that shifts can involve the same source mentioned
+    //   multiple times. Example: (a =&gt; b, a =&gt; c, b =&gt; d, b =&gt; e).
+    //
+    // - Rotate. Example: (a =&gt; b, b =&gt; a). We want to emit this as &quot;Swap a, b&quot;, but that instruction
+    //   may not be available, in which case we may need a scratch register or a scratch memory
+    //   location. A gnarlier example is (a =&gt; b, b =&gt; c, c =&gt; a). We can emit this as &quot;Swap b, c;
+    //   Swap a, b&quot;. Note that swapping has to be careful about differing widths.
+    //
+    // Note that a rotate can have &quot;fringe&quot;. For example, we might have (a =&gt; b, b =&gt; a, a =&gt; c,
+    // b =&gt; d). This has a rotate loop (a =&gt; b, b =&gt; a) and some fringe (a =&gt; c, b =&gt; d). We treat
+    // the whole thing as a single rotate.
+    //
+    // We will find multiple disjoint such operations. We can execute them in any order.
+
+    // We interpret these as Moves that should be executed backwards. All shifts are keyed by their
+    // starting source.
+    HashMap&lt;Arg, Vector&lt;ShufflePair&gt;&gt; shifts;
+
+    // We interpret these as Swaps over src()'s that should be executed backwards, i.e. for a list
+    // of size 3 we would do &quot;Swap list[1].src(), list[2].src(); Swap list[0].src(), list[1].src()&quot;.
+    // Note that we actually can't do that if the widths don't match or other bad things happen.
+    // But, prior to executing all of that, we need to execute the fringe: the shifts coming off the
+    // rotate.
+    Vector&lt;Rotate&gt; rotates;
+
+    {
+        HashMap&lt;Arg, Vector&lt;ShufflePair&gt;&gt; mapping;
+        for (const ShufflePair&amp; pair : pairs)
+            mapping.add(pair.src(), Vector&lt;ShufflePair&gt;()).iterator-&gt;value.append(pair);
+
+        Vector&lt;ShufflePair&gt; currentPairs;
+
+        while (!mapping.isEmpty()) {
+            ASSERT(currentPairs.isEmpty());
+            Arg originalSrc = mapping.begin()-&gt;key;
+            ASSERT(!shifts.contains(originalSrc));
+            if (verbose)
+                dataLog(&quot;Processing from &quot;, originalSrc, &quot;\n&quot;);
+            
+            GraphNodeWorklist&lt;Arg&gt; worklist;
+            worklist.push(originalSrc);
+            while (Arg src = worklist.pop()) {
+                HashMap&lt;Arg, Vector&lt;ShufflePair&gt;&gt;::iterator iter = mapping.find(src);
+                if (iter == mapping.end()) {
+                    // With a shift it's possible that we previously built the tail of this shift.
+                    // See if that's the case now.
+                    if (verbose)
+                        dataLog(&quot;Trying to append shift at &quot;, src, &quot;\n&quot;);
+                    currentPairs.appendVector(shifts.take(src));
+                    continue;
+                }
+                Vector&lt;ShufflePair&gt; pairs = WTFMove(iter-&gt;value);
+                mapping.remove(iter);
+
+                for (const ShufflePair&amp; pair : pairs) {
+                    currentPairs.append(pair);
+                    ASSERT(pair.src() == src);
+                    worklist.push(pair.dst());
+                }
+            }
+
+            ASSERT(currentPairs.size());
+            ASSERT(currentPairs[0].src() == originalSrc);
+
+            if (verbose)
+                dataLog(&quot;currentPairs = &quot;, listDump(currentPairs), &quot;\n&quot;);
+
+            bool isRotate = false;
+            for (const ShufflePair&amp; pair : currentPairs) {
+                if (pair.dst() == originalSrc) {
+                    isRotate = true;
+                    break;
+                }
+            }
+
+            if (isRotate) {
+                if (verbose)
+                    dataLog(&quot;It's a rotate.\n&quot;);
+                Rotate rotate;
+                
+                // The common case is that the rotate does not have fringe. When this happens, the
+                // last destination is the first source.
+                if (currentPairs.last().dst() == originalSrc)
+                    rotate.loop = WTFMove(currentPairs);
+                else {
+                    // This is the slow path. The rotate has fringe.
+                    
+                    HashMap&lt;Arg, ShufflePair&gt; dstMapping;
+                    for (const ShufflePair&amp; pair : currentPairs)
+                        dstMapping.add(pair.dst(), pair);
+
+                    ShufflePair pair = dstMapping.take(originalSrc);
+                    for (;;) {
+                        rotate.loop.append(pair);
+
+                        auto iter = dstMapping.find(pair.src());
+                        if (iter == dstMapping.end())
+                            break;
+                        pair = iter-&gt;value;
+                        dstMapping.remove(iter);
+                    }
+
+                    rotate.loop.reverse();
+
+                    // Make sure that the fringe appears in the same order as how it appeared in the
+                    // currentPairs, since that's the DFS order.
+                    for (const ShufflePair&amp; pair : currentPairs) {
+                        // But of course we only include it if it's not in the loop.
+                        if (dstMapping.contains(pair.dst()))
+                            rotate.fringe.append(pair);
+                    }
+                }
+                
+                // If the graph search terminates because we returned to the first source, then the
+                // pair list has to have a very particular shape.
+                for (unsigned i = rotate.loop.size() - 1; i--;)
+                    ASSERT(rotate.loop[i].dst() == rotate.loop[i + 1].src());
+                rotates.append(WTFMove(rotate));
+                currentPairs.resize(0);
+            } else {
+                if (verbose)
+                    dataLog(&quot;It's a shift.\n&quot;);
+                shifts.add(originalSrc, WTFMove(currentPairs));
+            }
+        }
+    }
+
+    if (verbose) {
+        dataLog(&quot;Shifts:\n&quot;);
+        for (auto&amp; entry : shifts)
+            dataLog(&quot;    &quot;, entry.key, &quot;: &quot;, listDump(entry.value), &quot;\n&quot;);
+        dataLog(&quot;Rotates:\n&quot;);
+        for (auto&amp; rotate : rotates)
+            dataLog(&quot;    loop = &quot;, listDump(rotate.loop), &quot;, fringe = &quot;, listDump(rotate.fringe), &quot;\n&quot;);
+    }
+
+    // In the worst case, we need two scratch registers. The way we do this is that the client passes
+    // us whatever scratch registers it happens to have lying around. We will need scratch registers in
+    // the following cases:
+    //
+    // - Shuffle pairs where both src and dst refer to memory.
+    // - Rotate when no Swap instruction is available.
+    //
+    // Lucky for us, we are guaranteed to have extra scratch registers anytime we have a Shift that
+    // ends with a register. We search for such a register right now.
+
+    auto moveForWidth = [&amp;] (Arg::Width width) -&gt; Opcode {
+        switch (width) {
+        case Arg::Width32:
+            return type == Arg::GP ? Move32 : MoveFloat;
+        case Arg::Width64:
+            return type == Arg::GP ? Move : MoveDouble;
+        default:
+            RELEASE_ASSERT_NOT_REACHED();
+        }
+    };
+
+    Opcode conservativeMove = moveForWidth(Arg::conservativeWidth(type));
+
+    // We will emit things in reverse. We maintain a list of packs of instructions, and then we
+    // append them together in reverse (for example the thing at the end of resultPacks is placed
+    // first). This is useful because the last thing we emit frees up its destination registers, so
+    // it affects how we emit things before it.
+    Vector&lt;Vector&lt;Inst&gt;&gt; resultPacks;
+    Vector&lt;Inst&gt; result;
+
+    auto commitResult = [&amp;] () {
+        resultPacks.append(WTFMove(result));
+    };
+
+    auto getScratch = [&amp;] (unsigned index, Tmp possibleScratch) -&gt; Tmp {
+        if (scratches[index].isTmp())
+            return scratches[index].tmp();
+
+        if (!possibleScratch)
+            return Tmp();
+        result.append(Inst(conservativeMove, origin, possibleScratch, scratches[index]));
+        return possibleScratch;
+    };
+
+    auto returnScratch = [&amp;] (unsigned index, Tmp tmp) {
+        if (Arg(tmp) != scratches[index])
+            result.append(Inst(conservativeMove, origin, scratches[index], tmp));
+    };
+
+    auto handleShiftPair = [&amp;] (const ShufflePair&amp; pair, unsigned scratchIndex) {
+        Opcode move = moveForWidth(pair.width());
+        
+        if (!isValidForm(move, pair.src().kind(), pair.dst().kind())) {
+            Tmp scratch =
+                getScratch(scratchIndex, findPossibleScratch(type, pair.src(), pair.dst()));
+            RELEASE_ASSERT(scratch);
+            if (isValidForm(move, pair.src().kind(), Arg::Tmp))
+                result.append(Inst(moveForWidth(pair.width()), origin, pair.src(), scratch));
+            else {
+                ASSERT(pair.src().isSomeImm());
+                ASSERT(move == Move32);
+                result.append(Inst(Move, origin, Arg::imm64(pair.src().value()), scratch));
+            }
+            result.append(Inst(moveForWidth(pair.width()), origin, scratch, pair.dst()));
+            returnScratch(scratchIndex, scratch);
+            return;
+        }
+        
+        result.append(Inst(move, origin, pair.src(), pair.dst()));
+    };
+
+    auto handleShift = [&amp;] (Vector&lt;ShufflePair&gt;&amp; shift) {
+        // FIXME: We could optimize the spill behavior of the shifter by checking if any of the
+        // shifts need spills. If they do, then we could try to get a register out here. Note that
+        // this may fail where the current strategy succeeds: out here we need a register that does
+        // not interfere with any of the shifts, while the current strategy only needs to find a
+        // scratch register that does not interfere with a particular shift. So, this optimization
+        // will be opportunistic: if it succeeds, then the individual shifts can use that scratch,
+        // otherwise they will do what they do now.
+        
+        for (unsigned i = shift.size(); i--;)
+            handleShiftPair(shift[i], 0);
+
+        Arg lastDst = shift.last().dst();
+        if (lastDst.isTmp()) {
+            for (Arg&amp; scratch : scratches) {
+                ASSERT(scratch != lastDst);
+                if (!scratch.isTmp()) {
+                    scratch = lastDst;
+                    break;
+                }
+            }
+        }
+    };
+
+    // First handle shifts whose last destination is a tmp because these free up scratch registers.
+    // These end up last in the final sequence, so the final destination of these shifts will be
+    // available as a scratch location for anything emitted prior (so, after, since we're emitting in
+    // reverse).
+    for (auto&amp; entry : shifts) {
+        Vector&lt;ShufflePair&gt;&amp; shift = entry.value;
+        if (shift.last().dst().isTmp())
+            handleShift(shift);
+        commitResult();
+    }
+
+    // Now handle the rest of the shifts.
+    for (auto&amp; entry : shifts) {
+        Vector&lt;ShufflePair&gt;&amp; shift = entry.value;
+        if (!shift.last().dst().isTmp())
+            handleShift(shift);
+        commitResult();
+    }
+
+    for (Rotate&amp; rotate : rotates) {
+        if (!rotate.fringe.isEmpty()) {
+            // Make sure we do the fringe first! This won't clobber any of the registers that are
+            // part of the rotation.
+            handleShift(rotate.fringe);
+        }
+        
+        bool canSwap = false;
+        Opcode swap = Oops;
+        Arg::Width swapWidth = Arg::Width8; // bogus value
+
+        // Currently, the swap instruction is not available for floating point on any architecture we
+        // support.
+        if (type == Arg::GP) {
+            // Figure out whether we will be doing 64-bit swaps or 32-bit swaps. If we have a mix of
+            // widths we handle that by fixing up the relevant register with zero-extends.
+            swap = Swap32;
+            swapWidth = Arg::Width32;
+            bool hasMemory = false;
+            bool hasIndex = false;
+            for (ShufflePair&amp; pair : rotate.loop) {
+                switch (pair.width()) {
+                case Arg::Width32:
+                    break;
+                case Arg::Width64:
+                    swap = Swap64;
+                    swapWidth = Arg::Width64;
+                    break;
+                default:
+                    RELEASE_ASSERT_NOT_REACHED();
+                    break;
+                }
+
+                hasMemory |= pair.src().isMemory() || pair.dst().isMemory();
+                hasIndex |= pair.src().isIndex() || pair.dst().isIndex();
+            }
+            
+            canSwap = isValidForm(swap, Arg::Tmp, Arg::Tmp);
+
+            // We can totally use swaps even if there are shuffles involving memory. But, we play it
+            // safe in that case. There are corner cases we don't handle, and our ability to do it is
+            // contingent upon swap form availability.
+            
+            if (hasMemory) {
+                canSwap &amp;= isValidForm(swap, Arg::Tmp, Arg::Addr);
+                
+                // We don't take the swapping path if there is a mix of widths and some of the
+                // shuffles involve memory. That gets too confusing. We might be able to relax this
+                // to only bail if there are subwidth pairs involving memory, but I haven't thought
+                // about it very hard. Anyway, this case is not common: rotates involving memory
+                // don't arise for function calls, and they will only happen for rotates in user code
+                // if some of the variables get spilled. It's hard to imagine a program that rotates
+                // data around in variables while also doing a combination of uint32-&gt;uint64 and
+                // int64-&gt;int32 casts.
+                for (ShufflePair&amp; pair : rotate.loop)
+                    canSwap &amp;= pair.width() == swapWidth;
+            }
+
+            if (hasIndex)
+                canSwap &amp;= isValidForm(swap, Arg::Tmp, Arg::Index);
+        }
+
+        if (canSwap) {
+            for (unsigned i = rotate.loop.size() - 1; i--;) {
+                Arg left = rotate.loop[i].src();
+                Arg right = rotate.loop[i + 1].src();
+
+                if (left.isMemory() &amp;&amp; right.isMemory()) {
+                    // Note that this is a super rare outcome. Rotates are rare. Spills are rare.
+                    // Moving data between two spills is rare. To get here a lot of rare stuff has to
+                    // all happen at once.
+                    
+                    Tmp scratch = getScratch(0, findPossibleScratch(type, left, right));
+                    RELEASE_ASSERT(scratch);
+                    result.append(Inst(moveForWidth(swapWidth), origin, left, scratch));
+                    result.append(Inst(swap, origin, scratch, right));
+                    result.append(Inst(moveForWidth(swapWidth), origin, scratch, left));
+                    returnScratch(0, scratch);
+                    continue;
+                }
+
+                if (left.isMemory())
+                    std::swap(left, right);
+                
+                result.append(Inst(swap, origin, left, right));
+            }
+
+            for (ShufflePair pair : rotate.loop) {
+                if (pair.width() == swapWidth)
+                    continue;
+
+                RELEASE_ASSERT(pair.width() == Arg::Width32);
+                RELEASE_ASSERT(swapWidth == Arg::Width64);
+                RELEASE_ASSERT(pair.dst().isTmp());
+
+                // Need to do an extra zero extension.
+                result.append(Inst(Move32, origin, pair.dst(), pair.dst()));
+            }
+        } else {
+            // We can treat this as a shift so long as we take the last destination (i.e. first
+            // source) and save it first. Then we handle the first entry in the pair in the rotate
+            // specially, after we restore the last destination. This requires some special care to
+            // find a scratch register. It's possible that we have a rotate that uses the entire
+            // available register file.
+
+            Tmp scratch = findPossibleScratch(
+                type,
+                [&amp;] (Tmp tmp) -&gt; bool {
+                    for (ShufflePair pair : rotate.loop) {
+                        if (pair.src().usesTmp(tmp))
+                            return false;
+                        if (pair.dst().usesTmp(tmp))
+                            return false;
+                    }
+                    return true;
+                });
+
+            // NOTE: This is the most likely use of scratch registers.
+            scratch = getScratch(0, scratch);
+
+            // We may not have found a scratch register. When this happens, we can just use the spill
+            // slot directly.
+            Arg rotateSave = scratch ? Arg(scratch) : scratches[0];
+            
+            handleShiftPair(
+                ShufflePair(rotate.loop.last().dst(), rotateSave, rotate.loop[0].width()), 1);
+
+            for (unsigned i = rotate.loop.size(); i-- &gt; 1;)
+                handleShiftPair(rotate.loop[i], 1);
+
+            handleShiftPair(
+                ShufflePair(rotateSave, rotate.loop[0].dst(), rotate.loop[0].width()), 1);
+
+            if (scratch)
+                returnScratch(0, scratch);
+        }
+
+        commitResult();
+    }
+
+    ASSERT(result.isEmpty());
+
+    for (unsigned i = resultPacks.size(); i--;)
+        result.appendVector(resultPacks[i]);
+
+    return result;
+}
+
+Vector&lt;Inst&gt; emitShuffle(
+    const Vector&lt;ShufflePair&gt;&amp; pairs,
+    const std::array&lt;Arg, 2&gt;&amp; gpScratch, const std::array&lt;Arg, 2&gt;&amp; fpScratch,
+    Value* origin)
+{
+    Vector&lt;ShufflePair&gt; gpPairs;
+    Vector&lt;ShufflePair&gt; fpPairs;
+    for (const ShufflePair&amp; pair : pairs) {
+        if (pair.src().isMemory() &amp;&amp; pair.dst().isMemory() &amp;&amp; pair.width() &gt; Arg::pointerWidth()) {
+            // 8-byte memory-to-memory moves on a 32-bit platform are best handled as float moves.
+            fpPairs.append(pair);
+        } else if (pair.src().isGP() &amp;&amp; pair.dst().isGP()) {
+            // This means that gpPairs gets memory-to-memory shuffles. The assumption is that we
+            // can do that more efficiently using GPRs, except in the special case above.
+            gpPairs.append(pair);
+        } else
+            fpPairs.append(pair);
+    }
+
+    Vector&lt;Inst&gt; result;
+    result.appendVector(emitShuffle(gpPairs, gpScratch, Arg::GP, origin));
+    result.appendVector(emitShuffle(fpPairs, fpScratch, Arg::FP, origin));
+    return result;
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
</ins></span></pre></div>
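<div>
<p>Illustration only, not part of r195084. The comments in AirEmitShuffle.cpp describe turning a parallel move into shifts and rotates, and falling back to a saved scratch location when no Swap is available. The following is a minimal sketch of that fallback idea using a simpler worklist formulation instead of the commit's DFS-based classification. The names Loc, MovePair and sequentialize are hypothetical, locations are plain integers, and the sketch assumes each destination is written at most once and that the scratch location appears in none of the pairs.</p>
<pre><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-ins: a location is an int, a move is (src, dst).
using Loc = int;
using MovePair = std::pair<Loc, Loc>;

// Turn a parallel move into an ordered list of moves with the same effect.
// Assumes every destination is written at most once and that `scratch`
// is not mentioned by any pair.
std::vector<MovePair> sequentialize(std::vector<MovePair> pending, Loc scratch)
{
    for (const MovePair& m : pending)
        assert(m.first != scratch && m.second != scratch);

    // Drop no-op moves, like emitShuffle() does for src == dst.
    pending.erase(
        std::remove_if(pending.begin(), pending.end(),
            [](const MovePair& m) { return m.first == m.second; }),
        pending.end());

    std::vector<MovePair> out;
    while (!pending.empty()) {
        // A move is safe once no pending move still reads its destination.
        // This peels off shifts and fringe from their tails.
        auto safe = std::find_if(pending.begin(), pending.end(),
            [&](const MovePair& m) {
                return std::none_of(pending.begin(), pending.end(),
                    [&](const MovePair& other) { return other.first == m.second; });
            });
        if (safe != pending.end()) {
            out.push_back(*safe);
            pending.erase(safe);
            continue;
        }

        // Only rotate loops remain. Save the value that the first move would
        // clobber, then make its readers take it from the scratch instead.
        Loc clobbered = pending.front().second;
        out.push_back({ clobbered, scratch });
        for (MovePair& m : pending) {
            if (m.first == clobbered)
                m.first = scratch;
        }
    }
    return out;
}
```
]]></pre>
<p>For the example from the source comments, (a =&gt; b, b =&gt; a, a =&gt; c, b =&gt; d), this sketch emits the two fringe moves first and then breaks the loop with one save into the scratch location. Unlike emitShuffle(), it never uses Swap and does not model widths or memory operands.</p>
</div>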
<a id="trunkSourceJavaScriptCoreb3airAirEmitShuffleh"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirEmitShuffle.h (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirEmitShuffle.h                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirEmitShuffle.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,115 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef AirEmitShuffle_h
+#define AirEmitShuffle_h
+
+#if ENABLE(B3_JIT)
+
+#include &quot;AirArg.h&quot;
+#include &quot;AirInst.h&quot;
+#include &lt;wtf/Vector.h&gt;
+
+namespace JSC { namespace B3 {
+
+class Value;
+
+namespace Air {
+
+class ShufflePair {
+public:
+    ShufflePair()
+    {
+    }
+    
+    ShufflePair(const Arg&amp; src, const Arg&amp; dst, Arg::Width width)
+        : m_src(src)
+        , m_dst(dst)
+        , m_width(width)
+    {
+    }
+
+    const Arg&amp; src() const { return m_src; }
+    const Arg&amp; dst() const { return m_dst; }
+
+    // The width determines the kind of move we do. You can only choose Width32 or Width64 right now.
+    // For GP, it picks between Move32 and Move. For FP, it picks between MoveFloat and MoveDouble.
+    Arg::Width width() const { return m_width; }
+
+    void dump(PrintStream&amp;) const;
+    
+private:
+    Arg m_src;
+    Arg m_dst;
+    Arg::Width m_width { Arg::Width8 };
+};
+
+// Perform a shuffle of a given type. The scratch argument is mandatory. You should pass it as
+// follows: If you know that you have scratch registers or temporaries available - that is, they're
+// registers that are not mentioned in the shuffle, have the same type as the shuffle, and are not
+// live at the shuffle - then you can pass them. If you don't have scratch registers available or if
+// you don't feel like looking for them, you can pass memory locations. It's always safe to pass a
+// pair of memory locations, and replacing either memory location with a register can be viewed as an
+// optimization. It's a pretty important optimization. Some more notes:
+//
+// - We define scratch registers as things that are not live before the shuffle and are not one of
+//   the destinations of the shuffle. Not being live before the shuffle also means that they cannot
+//   be used for any of the sources of the shuffle.
+//
+// - A second scratch location is only needed when you have shuffle pairs where memory is used both
+//   as source and destination.
+//
+// - You're guaranteed not to need any scratch locations if there is a Swap instruction available for
+//   the type and you don't have any memory locations that are both the source and the destination of
+//   some pairs. GP supports Swap on x86 while FP never supports Swap.
+//
+// - Passing memory locations as scratch if you are running emitShuffle() before register allocation is
+//   silly, since that will cause emitShuffle() to pick some specific registers when it does need
+//   scratch. One easy way to avoid that predicament is to ensure that you call emitShuffle() after
+//   register allocation. For this reason we could add a Shuffle instruction so that we can defer
+//   shufflings until after regalloc.
+//
+// - Shuffles with memory=&gt;memory pairs are not very well tuned. You should avoid them if you want
+//   performance. If you need to do them, then making sure that you reserve a temporary is one way to
+//   get acceptable performance.
+//
+// NOTE: Use this method (and its friend below) to emit shuffles after register allocation. Before
+// register allocation it is much better to simply use the Shuffle instruction.
+Vector&lt;Inst&gt; emitShuffle(
+    Vector&lt;ShufflePair&gt;, std::array&lt;Arg, 2&gt; scratch, Arg::Type, Value* origin);
+
+// Perform a shuffle that involves any number of types. Pass scratch registers or memory locations
+// for each type according to the rules above.
+Vector&lt;Inst&gt; emitShuffle(
+    const Vector&lt;ShufflePair&gt;&amp;,
+    const std::array&lt;Arg, 2&gt;&amp; gpScratch, const std::array&lt;Arg, 2&gt;&amp; fpScratch,
+    Value* origin);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirEmitShuffle_h
+
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirGeneratecpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirGenerate.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirGenerate.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirGenerate.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -35,6 +35,8 @@
</span><span class="cx"> #include &quot;AirGenerationContext.h&quot;
</span><span class="cx"> #include &quot;AirHandleCalleeSaves.h&quot;
</span><span class="cx"> #include &quot;AirIteratedRegisterCoalescing.h&quot;
</span><ins>+#include &quot;AirLowerAfterRegAlloc.h&quot;
+#include &quot;AirLowerMacros.h&quot;
</ins><span class="cx"> #include &quot;AirOpcodeUtils.h&quot;
</span><span class="cx"> #include &quot;AirOptimizeBlockOrder.h&quot;
</span><span class="cx"> #include &quot;AirReportUsedRegisters.h&quot;
</span><span class="lines">@@ -65,6 +67,8 @@
</span><span class="cx">         dataLog(code);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    lowerMacros(code);
+
</ins><span class="cx">     // This is where we run our optimizations and transformations.
</span><span class="cx">     // FIXME: Add Air optimizations.
</span><span class="cx">     // https://bugs.webkit.org/show_bug.cgi?id=150456
</span><span class="lines">@@ -80,6 +84,8 @@
</span><span class="cx">     else
</span><span class="cx">         iteratedRegisterCoalescing(code);
</span><span class="cx"> 
</span><ins>+    lowerAfterRegAlloc(code);
+
</ins><span class="cx">     // Prior to this point the prologue and epilogue is implicit. This makes it explicit. It also
</span><span class="cx">     // does things like identify which callee-saves we're using and saves them.
</span><span class="cx">     handleCalleeSaves(code);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirGenerateh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirGenerate.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirGenerate.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirGenerate.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -38,11 +38,11 @@
</span><span class="cx"> 
</span><span class="cx"> // This takes an Air::Code that hasn't had any stack allocation and optionally hasn't had any
</span><span class="cx"> // register allocation and does both of those things.
</span><del>-void prepareForGeneration(Code&amp;);
</del><ins>+JS_EXPORT_PRIVATE void prepareForGeneration(Code&amp;);
</ins><span class="cx"> 
</span><span class="cx"> // This generates the code using the given CCallHelpers instance. Note that this may call callbacks
</span><span class="cx"> // in the supplied code as it is generating.
</span><del>-void generate(Code&amp;, CCallHelpers&amp;);
</del><ins>+JS_EXPORT_PRIVATE void generate(Code&amp;, CCallHelpers&amp;);
</ins><span class="cx"> 
</span><span class="cx"> } } } // namespace JSC::B3::Air
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirInsertionSetcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirInsertionSet.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirInsertionSet.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirInsertionSet.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -33,6 +33,18 @@
</span><span class="cx"> 
</span><span class="cx"> namespace JSC { namespace B3 { namespace Air {
</span><span class="cx"> 
</span><ins>+void InsertionSet::insertInsts(size_t index, const Vector&lt;Inst&gt;&amp; insts)
+{
+    for (const Inst&amp; inst : insts)
+        insertInst(index, inst);
+}
+
+void InsertionSet::insertInsts(size_t index, Vector&lt;Inst&gt;&amp;&amp; insts)
+{
+    for (Inst&amp; inst : insts)
+        insertInst(index, WTFMove(inst));
+}
+
</ins><span class="cx"> void InsertionSet::execute(BasicBlock* block)
</span><span class="cx"> {
</span><span class="cx">     bubbleSort(m_insertions.begin(), m_insertions.end());
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirInsertionSeth"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirInsertionSet.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirInsertionSet.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirInsertionSet.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -59,6 +59,9 @@
</span><span class="cx">     {
</span><span class="cx">         appendInsertion(Insertion(index, std::forward&lt;Inst&gt;(inst)));
</span><span class="cx">     }
</span><ins>+
+    void insertInsts(size_t index, const Vector&lt;Inst&gt;&amp;);
+    void insertInsts(size_t index, Vector&lt;Inst&gt;&amp;&amp;);
</ins><span class="cx">     
</span><span class="cx">     template&lt;typename... Arguments&gt;
</span><span class="cx">     void insert(size_t index, Arguments&amp;&amp;... arguments)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirInsth"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirInst.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirInst.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirInst.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -85,6 +85,15 @@
</span><span class="cx"> 
</span><span class="cx">     explicit operator bool() const { return origin || opcode != Nop || args.size(); }
</span><span class="cx"> 
</span><ins>+    void append() { }
+    
+    template&lt;typename... Arguments&gt;
+    void append(Arg arg, Arguments... arguments)
+    {
+        args.append(arg);
+        append(arguments...);
+    }
+
</ins><span class="cx">     // Note that these functors all avoid using &quot;const&quot; because we want to use them for things that
</span><span class="cx">     // edit IR. IR is meant to be edited; if you're carrying around a &quot;const Inst&amp;&quot; then you're
</span><span class="cx">     // probably doing it wrong.
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirLowerAfterRegAlloccpp"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.cpp (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.cpp                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,242 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include &quot;config.h&quot;
+#include &quot;AirLowerAfterRegAlloc.h&quot;
+
+#if ENABLE(B3_JIT)
+
+#include &quot;AirCCallingConvention.h&quot;
+#include &quot;AirCode.h&quot;
+#include &quot;AirEmitShuffle.h&quot;
+#include &quot;AirInsertionSet.h&quot;
+#include &quot;AirInstInlines.h&quot;
+#include &quot;AirLiveness.h&quot;
+#include &quot;AirPhaseScope.h&quot;
+#include &quot;AirRegisterPriority.h&quot;
+#include &quot;B3CCallValue.h&quot;
+#include &quot;B3ValueInlines.h&quot;
+#include &quot;RegisterSet.h&quot;
+#include &lt;wtf/HashMap.h&gt;
+
+namespace JSC { namespace B3 { namespace Air {
+
+namespace {
+
+bool verbose = false;
+    
+} // anonymous namespace
+
+void lowerAfterRegAlloc(Code&amp; code)
+{
+    PhaseScope phaseScope(code, &quot;lowerAfterRegAlloc&quot;);
+
+    if (verbose)
+        dataLog(&quot;Code before lowerAfterRegAlloc:\n&quot;, code);
+
+    HashMap&lt;Inst*, RegisterSet&gt; usedRegisters;
+
+    RegLiveness liveness(code);
+    for (BasicBlock* block : code) {
+        RegLiveness::LocalCalc localCalc(liveness, block);
+
+        for (unsigned instIndex = block-&gt;size(); instIndex--;) {
+            Inst&amp; inst = block-&gt;at(instIndex);
+            
+            RegisterSet set;
+
+            bool isRelevant = inst.opcode == Shuffle || inst.opcode == ColdCCall;
+            
+            if (isRelevant) {
+                for (Reg reg : localCalc.live())
+                    set.set(reg);
+            }
+            
+            localCalc.execute(instIndex);
+
+            if (isRelevant)
+                usedRegisters.add(&amp;inst, set);
+        }
+    }
+
+    auto getScratches = [&amp;] (RegisterSet set, Arg::Type type) -&gt; std::array&lt;Arg, 2&gt; {
+        std::array&lt;Arg, 2&gt; result;
+        for (unsigned i = 0; i &lt; 2; ++i) {
+            bool found = false;
+            for (Reg reg : regsInPriorityOrder(type)) {
+                if (!set.get(reg)) {
+                    result[i] = Tmp(reg);
+                    set.set(reg);
+                    found = true;
+                    break;
+                }
+            }
+            if (!found) {
+                result[i] = Arg::stack(
+                    code.addStackSlot(
+                        Arg::bytes(Arg::conservativeWidth(type)),
+                        StackSlotKind::Anonymous));
+            }
+        }
+        return result;
+    };
+
+    // Now transform the code.
+    InsertionSet insertionSet(code);
+    for (BasicBlock* block : code) {
+        for (unsigned instIndex = 0; instIndex &lt; block-&gt;size(); ++instIndex) {
+            Inst&amp; inst = block-&gt;at(instIndex);
+
+            switch (inst.opcode) {
+            case Shuffle: {
+                RegisterSet set = usedRegisters.get(&amp;inst);
+                Vector&lt;ShufflePair&gt; pairs;
+                for (unsigned i = 0; i &lt; inst.args.size(); i += 3) {
+                    Arg src = inst.args[i + 0];
+                    Arg dst = inst.args[i + 1];
+                    Arg::Width width = inst.args[i + 2].width();
+
+                    // The used register set contains things live after the shuffle. But
+                    // emitShuffle() wants a scratch register that is not just dead but also does not
+                    // interfere with either sources or destinations.
+                    auto excludeRegisters = [&amp;] (Tmp tmp) {
+                        if (tmp.isReg())
+                            set.set(tmp.reg());
+                    };
+                    src.forEachTmpFast(excludeRegisters);
+                    dst.forEachTmpFast(excludeRegisters);
+                    
+                    pairs.append(ShufflePair(src, dst, width));
+                }
+                std::array&lt;Arg, 2&gt; gpScratch = getScratches(set, Arg::GP);
+                std::array&lt;Arg, 2&gt; fpScratch = getScratches(set, Arg::FP);
+                insertionSet.insertInsts(
+                    instIndex, emitShuffle(pairs, gpScratch, fpScratch, inst.origin));
+                inst = Inst();
+                break;
+            }
+
+            case ColdCCall: {
+                CCallValue* value = inst.origin-&gt;as&lt;CCallValue&gt;();
+
+                RegisterSet liveRegs = usedRegisters.get(&amp;inst);
+                RegisterSet regsToSave = liveRegs;
+                regsToSave.exclude(RegisterSet::calleeSaveRegisters());
+                regsToSave.exclude(RegisterSet::stackRegisters());
+                regsToSave.exclude(RegisterSet::reservedHardwareRegisters());
+
+                RegisterSet preUsed = regsToSave;
+                Vector&lt;Arg&gt; destinations = computeCCallingConvention(code, value);
+                Tmp result = cCallResult(value-&gt;type());
+                Arg originalResult = result ? inst.args[1] : Arg();
+                
+                Vector&lt;ShufflePair&gt; pairs;
+                for (unsigned i = 0; i &lt; destinations.size(); ++i) {
+                    Value* child = value-&gt;child(i);
+                    Arg src = inst.args[result ? (i &gt;= 1 ? i + 1 : i) : i ];
+                    Arg dst = destinations[i];
+                    Arg::Width width = Arg::widthForB3Type(child-&gt;type());
+                    pairs.append(ShufflePair(src, dst, width));
+
+                    auto excludeRegisters = [&amp;] (Tmp tmp) {
+                        if (tmp.isReg())
+                            preUsed.set(tmp.reg());
+                    };
+                    src.forEachTmpFast(excludeRegisters);
+                    dst.forEachTmpFast(excludeRegisters);
+                }
+
+                std::array&lt;Arg, 2&gt; gpScratch = getScratches(preUsed, Arg::GP);
+                std::array&lt;Arg, 2&gt; fpScratch = getScratches(preUsed, Arg::FP);
+                
+                // Also need to save all live registers. Don't need to worry about the result
+                // register.
+                if (originalResult.isReg())
+                    regsToSave.clear(originalResult.reg());
+                Vector&lt;StackSlot*&gt; stackSlots;
+                regsToSave.forEach(
+                    [&amp;] (Reg reg) {
+                        Tmp tmp(reg);
+                        Arg arg(tmp);
+                        Arg::Width width = Arg::conservativeWidth(arg.type());
+                        StackSlot* stackSlot =
+                            code.addStackSlot(Arg::bytes(width), StackSlotKind::Anonymous);
+                        pairs.append(ShufflePair(arg, Arg::stack(stackSlot), width));
+                        stackSlots.append(stackSlot);
+                    });
+
+                if (verbose)
+                    dataLog(&quot;Pre-call pairs for &quot;, inst, &quot;: &quot;, listDump(pairs), &quot;\n&quot;);
+                
+                insertionSet.insertInsts(
+                    instIndex, emitShuffle(pairs, gpScratch, fpScratch, inst.origin));
+
+                inst = buildCCall(code, inst.origin, destinations);
+
+                // Now we need to emit code to restore registers.
+                pairs.resize(0);
+                unsigned stackSlotIndex = 0;
+                regsToSave.forEach(
+                    [&amp;] (Reg reg) {
+                        Tmp tmp(reg);
+                        Arg arg(tmp);
+                        Arg::Width width = Arg::conservativeWidth(arg.type());
+                        StackSlot* stackSlot = stackSlots[stackSlotIndex++];
+                        pairs.append(ShufflePair(Arg::stack(stackSlot), arg, width));
+                    });
+                if (result) {
+                    ShufflePair pair(result, originalResult, Arg::widthForB3Type(value-&gt;type()));
+                    pairs.append(pair);
+                }
+
+                gpScratch = getScratches(liveRegs, Arg::GP);
+                fpScratch = getScratches(liveRegs, Arg::FP);
+                
+                insertionSet.insertInsts(
+                    instIndex + 1, emitShuffle(pairs, gpScratch, fpScratch, inst.origin));
+                break;
+            }
+
+            default:
+                break;
+            }
+        }
+
+        insertionSet.execute(block);
+
+        block-&gt;insts().removeAllMatching(
+            [&amp;] (Inst&amp; inst) -&gt; bool {
+                return !inst;
+            });
+    }
+
+    if (verbose)
+        dataLog(&quot;Code after lowerAfterRegAlloc:\n&quot;, code);
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirLowerAfterRegAlloch"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.h (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.h                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirLowerAfterRegAlloc.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,44 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef AirLowerAfterRegAlloc_h
+#define AirLowerAfterRegAlloc_h
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+// This lowers Shuffle and ColdCCall instructions. This phase is designed to be run after register
+// allocation.
+
+void lowerAfterRegAlloc(Code&amp;);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirLowerAfterRegAlloc_h
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirLowerMacroscpp"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirLowerMacros.cpp (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirLowerMacros.cpp                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirLowerMacros.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,105 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include &quot;config.h&quot;
+#include &quot;AirLowerMacros.h&quot;
+
+#if ENABLE(B3_JIT)
+
+#include &quot;AirCCallingConvention.h&quot;
+#include &quot;AirCode.h&quot;
+#include &quot;AirInsertionSet.h&quot;
+#include &quot;AirInstInlines.h&quot;
+#include &quot;AirPhaseScope.h&quot;
+#include &quot;B3CCallValue.h&quot;
+#include &quot;B3ValueInlines.h&quot;
+
+namespace JSC { namespace B3 { namespace Air {
+
+void lowerMacros(Code&amp; code)
+{
+    PhaseScope phaseScope(code, &quot;lowerMacros&quot;);
+
+    InsertionSet insertionSet(code);
+    for (BasicBlock* block : code) {
+        for (unsigned instIndex = 0; instIndex &lt; block-&gt;size(); ++instIndex) {
+            Inst&amp; inst = block-&gt;at(instIndex);
+
+            switch (inst.opcode) {
+            case CCall: {
+                CCallValue* value = inst.origin-&gt;as&lt;CCallValue&gt;();
+
+                Vector&lt;Arg&gt; destinations = computeCCallingConvention(code, value);
+
+                Inst shuffleArguments(Shuffle, value);
+                unsigned offset = value-&gt;type() == Void ? 0 : 1;
+                for (unsigned i = 1; i &lt; destinations.size(); ++i) {
+                    Value* child = value-&gt;child(i);
+                    shuffleArguments.args.append(inst.args[offset + i]);
+                    shuffleArguments.args.append(destinations[i]);
+                    shuffleArguments.args.append(Arg::widthArg(Arg::widthForB3Type(child-&gt;type())));
+                }
+                insertionSet.insertInst(instIndex, WTFMove(shuffleArguments));
+
+                // Indicate that we're using our original callee argument.
+                destinations[0] = inst.args[0];
+
+                // Save where the original instruction put its result.
+                Arg resultDst = value-&gt;type() == Void ? Arg() : inst.args[1];
+                
+                inst = buildCCall(code, inst.origin, destinations);
+
+                Tmp result = cCallResult(value-&gt;type());
+                switch (value-&gt;type()) {
+                case Void:
+                    break;
+                case Float:
+                    insertionSet.insert(instIndex + 1, MoveFloat, value, result, resultDst);
+                    break;
+                case Double:
+                    insertionSet.insert(instIndex + 1, MoveDouble, value, result, resultDst);
+                    break;
+                case Int32:
+                    insertionSet.insert(instIndex + 1, Move32, value, result, resultDst);
+                    break;
+                case Int64:
+                    insertionSet.insert(instIndex + 1, Move, value, result, resultDst);
+                    break;
+                }
+                break;
+            }
+
+            default:
+                break;
+            }
+        }
+        insertionSet.execute(block);
+    }
+}
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirLowerMacrosh"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/AirLowerMacros.h (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirLowerMacros.h                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/AirLowerMacros.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,45 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#ifndef AirLowerMacros_h
+#define AirLowerMacros_h
+
+#if ENABLE(B3_JIT)
+
+namespace JSC { namespace B3 { namespace Air {
+
+class Code;
+
+// Air has some opcodes that are very high-level and are meant to reduce the amount of low-level
+// knowledge in the B3-&gt;Air lowering. The current example is CCall.
+
+void lowerMacros(Code&amp;);
+
+} } } // namespace JSC::B3::Air
+
+#endif // ENABLE(B3_JIT)
+
+#endif // AirLowerMacros_h
+
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirOpcodeopcodes"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -417,6 +417,14 @@
</span><span class="cx">     Tmp, Index as storePtr
</span><span class="cx">     x86: Imm, Addr as storePtr
</span><span class="cx"> 
</span><ins>+x86: Swap32 UD:G:32, UD:G:32
+    Tmp, Tmp
+    Tmp, Addr
+
+x86_64: Swap64 UD:G:64, UD:G:64
+    Tmp, Tmp
+    Tmp, Addr
+
</ins><span class="cx"> Move32 U:G:32, ZD:G:32
</span><span class="cx">     Tmp, Tmp as zeroExtend32ToPtr
</span><span class="cx">     Addr, Tmp as load32
</span><span class="lines">@@ -682,7 +690,19 @@
</span><span class="cx"> 
</span><span class="cx"> Oops /terminal
</span><span class="cx"> 
</span><ins>+# A Shuffle is a multi-source, multi-destination move. It performs all of its moves simultaneously.
+# The moves are specified as triplets of src, dst, and width. For example, you can request a swap this
+# way:
+#     Shuffle %tmp1, %tmp2, 64, %tmp2, %tmp1, 64
+custom Shuffle
+
</ins><span class="cx"> # Air allows for exotic behavior. A Patch's behavior is determined entirely by the Special operand,
</span><span class="cx"> # which must be the first operand.
</span><span class="cx"> custom Patch
</span><span class="cx"> 
</span><ins>+# Instructions used for lowering C calls. These don't make it to Air generation. They get lowered to
+# something else first. The origin Value must be a CCallValue.
+custom CCall
+custom ColdCCall
+
+
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airAirRegisterPriorityh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirRegisterPriority.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirRegisterPriority.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/air/AirRegisterPriority.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -52,7 +52,7 @@
</span><span class="cx">     return RegistersInPriorityOrder&lt;Bank&gt;::inPriorityOrder();
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-const Vector&lt;Reg&gt;&amp; regsInPriorityOrder(Arg::Type);
</del><ins>+JS_EXPORT_PRIVATE const Vector&lt;Reg&gt;&amp; regsInPriorityOrder(Arg::Type);
</ins><span class="cx"> 
</span><span class="cx"> } } } // namespace JSC::B3::Air
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3airtestaircpp"></a>
<div class="addfile"><h4>Added: trunk/Source/JavaScriptCore/b3/air/testair.cpp (0 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/testair.cpp                                (rev 0)
+++ trunk/Source/JavaScriptCore/b3/air/testair.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -0,0 +1,1707 @@
</span><ins>+/*
+ * Copyright (C) 2016 Apple Inc. All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY APPLE INC. ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL APPLE INC. OR
+ * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
+ * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
+ * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
+ * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
+ * OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */
+
+#include &quot;config.h&quot;
+
+#include &quot;AirCode.h&quot;
+#include &quot;AirGenerate.h&quot;
+#include &quot;AirInstInlines.h&quot;
+#include &quot;AirRegisterPriority.h&quot;
+#include &quot;AllowMacroScratchRegisterUsage.h&quot;
+#include &quot;B3Compilation.h&quot;
+#include &quot;B3Procedure.h&quot;
+#include &quot;CCallHelpers.h&quot;
+#include &quot;InitializeThreading.h&quot;
+#include &quot;JSCInlines.h&quot;
+#include &quot;LinkBuffer.h&quot;
+#include &quot;PureNaN.h&quot;
+#include &quot;VM.h&quot;
+#include &lt;cmath&gt;
+#include &lt;map&gt;
+#include &lt;string&gt;
+#include &lt;wtf/Lock.h&gt;
+#include &lt;wtf/NumberOfCores.h&gt;
+#include &lt;wtf/Threading.h&gt;
+
+// We don't have a NO_RETURN_DUE_TO_EXIT, nor should we. That's ridiculous.
+static bool hiddenTruthBecauseNoReturnIsStupid() { return true; }
+
+static void usage()
+{
+    dataLog(&quot;Usage: testair [&lt;filter&gt;]\n&quot;);
+    if (hiddenTruthBecauseNoReturnIsStupid())
+        exit(1);
+}
+
+#if ENABLE(B3_JIT)
+
+using namespace JSC;
+using namespace JSC::B3::Air;
+
+namespace {
+
+StaticLock crashLock;
+
+// Nothing fancy for now; we just use the existing WTF assertion machinery.
+#define CHECK(x) do {                                                   \
+        if (!!(x))                                                      \
+            break;                                                      \
+        crashLock.lock();                                               \
+        WTFReportAssertionFailure(__FILE__, __LINE__, WTF_PRETTY_FUNCTION, #x); \
+        CRASH();                                                        \
+    } while (false)
+
+VM* vm;
+
+std::unique_ptr&lt;B3::Compilation&gt; compile(B3::Procedure&amp; proc)
+{
+    prepareForGeneration(proc.code());
+    CCallHelpers jit(vm);
+    generate(proc.code(), jit);
+    LinkBuffer linkBuffer(*vm, jit, nullptr);
+
+    return std::make_unique&lt;B3::Compilation&gt;(
+        FINALIZE_CODE(linkBuffer, (&quot;testair compilation&quot;)), proc.releaseByproducts());
+}
+
+template&lt;typename T, typename... Arguments&gt;
+T invoke(const B3::Compilation&amp; code, Arguments... arguments)
+{
+    T (*function)(Arguments...) = bitwise_cast&lt;T(*)(Arguments...)&gt;(code.code().executableAddress());
+    return function(arguments...);
+}
+
+template&lt;typename T, typename... Arguments&gt;
+T compileAndRun(B3::Procedure&amp; procedure, Arguments... arguments)
+{
+    return invoke&lt;T&gt;(*compile(procedure), arguments...);
+}
+
+void testSimple()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(Move, nullptr, Arg::imm(42), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(compileAndRun&lt;int&gt;(proc) == 42);
+}
+
+// Use this to put a constant into a register without Air being able to see the constant.
+template&lt;typename T&gt;
+void loadConstantImpl(BasicBlock* block, T value, B3::Air::Opcode move, Tmp tmp, Tmp scratch)
+{
+    static StaticLock lock;
+    static std::map&lt;T, T*&gt;* map; // I'm not messing with HashMap's problems with integers.
+
+    LockHolder locker(lock);
+    if (!map)
+        map = new std::map&lt;T, T*&gt;();
+
+    if (!map-&gt;count(value))
+        (*map)[value] = new T(value);
+
+    T* ptr = (*map)[value];
+    block-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(ptr)), scratch);
+    block-&gt;append(move, nullptr, Arg::addr(scratch), tmp);
+}
+
+void loadConstant(BasicBlock* block, intptr_t value, Tmp tmp)
+{
+    loadConstantImpl&lt;intptr_t&gt;(block, value, Move, tmp, tmp);
+}
+
+void loadDoubleConstant(BasicBlock* block, double value, Tmp tmp, Tmp scratch)
+{
+    loadConstantImpl&lt;double&gt;(block, value, MoveDouble, tmp, scratch);
+}
+
+void testShuffleSimpleSwap()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32));
+
+    int32_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 2);
+    CHECK(things[2] == 4);
+    CHECK(things[3] == 3);
+}
+
+void testShuffleSimpleShift()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32));
+
+    int32_t things[5];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 2);
+    CHECK(things[2] == 3);
+    CHECK(things[3] == 3);
+    CHECK(things[4] == 4);
+}
+
+void testShuffleLongShift()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT6), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 3);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+    CHECK(things[6] == 6);
+    CHECK(things[7] == 7);
+}
+
+void testShuffleLongShiftBackwards()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT6), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 3);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+    CHECK(things[6] == 6);
+    CHECK(things[7] == 7);
+}
+
+void testShuffleSimpleRotate()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32));
+
+    int32_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 4);
+}
+
+void testShuffleSimpleBroadcast()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32));
+
+    int32_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 1);
+    CHECK(things[3] == 1);
+}
+
+void testShuffleBroadcastAllRegs()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    const Vector&lt;Reg&gt;&amp; regs = regsInPriorityOrder(Arg::GP);
+
+    BasicBlock* root = code.addBlock();
+    root-&gt;append(Move, nullptr, Arg::imm(35), Tmp(GPRInfo::regT0));
+    unsigned count = 1;
+    for (Reg reg : regs) {
+        if (reg != Reg(GPRInfo::regT0))
+            loadConstant(root, count++, Tmp(reg));
+    }
+    Inst&amp; shuffle = root-&gt;append(Shuffle, nullptr);
+    for (Reg reg : regs) {
+        if (reg != Reg(GPRInfo::regT0))
+            shuffle.append(Tmp(GPRInfo::regT0), Tmp(reg), Arg::widthArg(Arg::Width32));
+    }
+
+    StackSlot* slot = code.addStackSlot(sizeof(int32_t) * regs.size(), B3::StackSlotKind::Locked);
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        root-&gt;append(Move32, nullptr, Tmp(regs[i]), Arg::stack(slot, i * sizeof(int32_t)));
+
+    Vector&lt;int32_t&gt; things(regs.size(), 666);
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things[0])), base);
+    for (unsigned i = 0; i &lt; regs.size(); ++i) {
+        root-&gt;append(Move32, nullptr, Arg::stack(slot, i * sizeof(int32_t)), Tmp(GPRInfo::regT0));
+        root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, i * sizeof(int32_t)));
+    }
+    
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    for (int32_t thing : things)
+        CHECK(thing == 35);
+}
+
+void testShuffleTreeShift()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 1);
+    CHECK(things[3] == 2);
+    CHECK(things[4] == 2);
+    CHECK(things[5] == 3);
+    CHECK(things[6] == 3);
+    CHECK(things[7] == 4);
+}
+
+void testShuffleTreeShiftBackward()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 1);
+    CHECK(things[3] == 2);
+    CHECK(things[4] == 2);
+    CHECK(things[5] == 3);
+    CHECK(things[6] == 3);
+    CHECK(things[7] == 4);
+}
+
+void testShuffleTreeShiftOtherBackward()
+{
+    // NOTE: This test was my original attempt at TreeShiftBackward but mistakes were made. So, this
+    // ends up being just a weird test. But weird tests are useful, so I kept it.
+    
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    loadConstant(root, 7, Tmp(GPRInfo::regT6));
+    loadConstant(root, 8, Tmp(GPRInfo::regT7));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT7), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT6), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT6), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT6), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT7), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT7), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32));
+
+    int32_t things[8];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT6), Arg::addr(base, 6 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT7), Arg::addr(base, 7 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 8);
+    CHECK(things[2] == 8);
+    CHECK(things[3] == 7);
+    CHECK(things[4] == 7);
+    CHECK(things[5] == 6);
+    CHECK(things[6] == 6);
+    CHECK(things[7] == 5);
+}
+
+void testShuffleMultipleShifts()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 3);
+    CHECK(things[3] == 3);
+    CHECK(things[4] == 3);
+    CHECK(things[5] == 1);
+}
+
+void testShuffleRotateWithFringe()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 1);
+    CHECK(things[4] == 2);
+    CHECK(things[5] == 3);
+}
+
+void testShuffleRotateWithLongFringe()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 1);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+}
+
+void testShuffleMultipleRotates()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT5), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 6);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+}
+
+void testShuffleShiftAndRotate()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    loadConstant(root, 4, Tmp(GPRInfo::regT3));
+    loadConstant(root, 5, Tmp(GPRInfo::regT4));
+    loadConstant(root, 6, Tmp(GPRInfo::regT5));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT1), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT0), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT4), Tmp(GPRInfo::regT5), Arg::widthArg(Arg::Width32));
+
+    int32_t things[6];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT5), Arg::addr(base, 5 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 3);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 2);
+    CHECK(things[3] == 4);
+    CHECK(things[4] == 4);
+    CHECK(things[5] == 5);
+}
+
+void testShuffleShiftAllRegs()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    const Vector&lt;Reg&gt;&amp; regs = regsInPriorityOrder(Arg::GP);
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        loadConstant(root, 35 + i, Tmp(regs[i]));
+    Inst&amp; shuffle = root-&gt;append(Shuffle, nullptr);
+    for (unsigned i = 1; i &lt; regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width32));
+
+    StackSlot* slot = code.addStackSlot(sizeof(int32_t) * regs.size(), B3::StackSlotKind::Locked);
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        root-&gt;append(Move32, nullptr, Tmp(regs[i]), Arg::stack(slot, i * sizeof(int32_t)));
+
+    Vector&lt;int32_t&gt; things(regs.size(), 666);
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things[0])), base);
+    for (unsigned i = 0; i &lt; regs.size(); ++i) {
+        root-&gt;append(Move32, nullptr, Arg::stack(slot, i * sizeof(int32_t)), Tmp(GPRInfo::regT0));
+        root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, i * sizeof(int32_t)));
+    }
+    
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 35);
+    for (unsigned i = 1; i &lt; regs.size(); ++i)
+        CHECK(things[i] == 35 + static_cast&lt;int32_t&gt;(i) - 1);
+}
+
+void testShuffleRotateAllRegs()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    const Vector&lt;Reg&gt;&amp; regs = regsInPriorityOrder(Arg::GP);
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        loadConstant(root, 35 + i, Tmp(regs[i]));
+    Inst&amp; shuffle = root-&gt;append(Shuffle, nullptr);
+    for (unsigned i = 1; i &lt; regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width32));
+    shuffle.append(Tmp(regs.last()), Tmp(regs[0]), Arg::widthArg(Arg::Width32));
+
+    StackSlot* slot = code.addStackSlot(sizeof(int32_t) * regs.size(), B3::StackSlotKind::Locked);
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        root-&gt;append(Move32, nullptr, Tmp(regs[i]), Arg::stack(slot, i * sizeof(int32_t)));
+
+    Vector&lt;int32_t&gt; things(regs.size(), 666);
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things[0])), base);
+    for (unsigned i = 0; i &lt; regs.size(); ++i) {
+        root-&gt;append(Move32, nullptr, Arg::stack(slot, i * sizeof(int32_t)), Tmp(GPRInfo::regT0));
+        root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, i * sizeof(int32_t)));
+    }
+    
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 35 + static_cast&lt;int32_t&gt;(regs.size()) - 1);
+    for (unsigned i = 1; i &lt; regs.size(); ++i)
+        CHECK(things[i] == 35 + static_cast&lt;int32_t&gt;(i) - 1);
+}
+
+void testShuffleSimpleSwap64()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 10000000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 20000000000000000ll, Tmp(GPRInfo::regT1));
+    loadConstant(root, 30000000000000000ll, Tmp(GPRInfo::regT2));
+    loadConstant(root, 40000000000000000ll, Tmp(GPRInfo::regT3));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width64),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width64));
+
+    int64_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 10000000000000000ll);
+    CHECK(things[1] == 20000000000000000ll);
+    CHECK(things[2] == 40000000000000000ll);
+    CHECK(things[3] == 30000000000000000ll);
+}
+
+void testShuffleSimpleShift64()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 10000000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 20000000000000000ll, Tmp(GPRInfo::regT1));
+    loadConstant(root, 30000000000000000ll, Tmp(GPRInfo::regT2));
+    loadConstant(root, 40000000000000000ll, Tmp(GPRInfo::regT3));
+    loadConstant(root, 50000000000000000ll, Tmp(GPRInfo::regT4));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width64),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width64));
+
+    int64_t things[5];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 10000000000000000ll);
+    CHECK(things[1] == 20000000000000000ll);
+    CHECK(things[2] == 30000000000000000ll);
+    CHECK(things[3] == 30000000000000000ll);
+    CHECK(things[4] == 40000000000000000ll);
+}
+
+void testShuffleSwapMixedWidth()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 10000000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 20000000000000000ll, Tmp(GPRInfo::regT1));
+    loadConstant(root, 30000000000000000ll, Tmp(GPRInfo::regT2));
+    loadConstant(root, 40000000000000000ll, Tmp(GPRInfo::regT3));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width32),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT2), Arg::widthArg(Arg::Width64));
+
+    int64_t things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 10000000000000000ll);
+    CHECK(things[1] == 20000000000000000ll);
+    CHECK(things[2] == 40000000000000000ll);
+    CHECK(things[3] == static_cast&lt;uint32_t&gt;(30000000000000000ll));
+}
+
+void testShuffleShiftMixedWidth()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 10000000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 20000000000000000ll, Tmp(GPRInfo::regT1));
+    loadConstant(root, 30000000000000000ll, Tmp(GPRInfo::regT2));
+    loadConstant(root, 40000000000000000ll, Tmp(GPRInfo::regT3));
+    loadConstant(root, 50000000000000000ll, Tmp(GPRInfo::regT4));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT2), Tmp(GPRInfo::regT3), Arg::widthArg(Arg::Width64),
+        Tmp(GPRInfo::regT3), Tmp(GPRInfo::regT4), Arg::widthArg(Arg::Width32));
+
+    int64_t things[5];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT3), Arg::addr(base, 3 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT4), Arg::addr(base, 4 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 10000000000000000ll);
+    CHECK(things[1] == 20000000000000000ll);
+    CHECK(things[2] == 30000000000000000ll);
+    CHECK(things[3] == 30000000000000000ll);
+    CHECK(things[4] == static_cast&lt;uint32_t&gt;(40000000000000000ll));
+}
+
+void testShuffleShiftMemory()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int32_t memory[2];
+    memory[0] = 35;
+    memory[1] = 36;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT2));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int32_t)),
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int32_t)), Arg::widthArg(Arg::Width32));
+
+    int32_t things[2];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(memory[0] == 35);
+    CHECK(memory[1] == 35);
+}
+
+void testShuffleShiftMemoryLong()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int32_t memory[2];
+    memory[0] = 35;
+    memory[1] = 36;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    loadConstant(root, 3, Tmp(GPRInfo::regT2));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT3));
+    root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+        
+        Tmp(GPRInfo::regT1), Arg::addr(Tmp(GPRInfo::regT3), 0 * sizeof(int32_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT3), 0 * sizeof(int32_t)),
+        Arg::addr(Tmp(GPRInfo::regT3), 1 * sizeof(int32_t)), Arg::widthArg(Arg::Width32),
+
+        Arg::addr(Tmp(GPRInfo::regT3), 1 * sizeof(int32_t)), Tmp(GPRInfo::regT2),
+        Arg::widthArg(Arg::Width32));
+
+    int32_t things[3];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT2), Arg::addr(base, 2 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 1);
+    CHECK(things[2] == 36);
+    CHECK(memory[0] == 2);
+    CHECK(memory[1] == 35);
+}
+
+void testShuffleShiftMemoryAllRegs()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int32_t memory[2];
+    memory[0] = 35;
+    memory[1] = 36;
+
+    Vector&lt;Reg&gt; regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        loadConstant(root, i + 1, Tmp(regs[i]));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT0));
+    Inst&amp; shuffle = root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int32_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int32_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int32_t)), Arg::widthArg(Arg::Width32),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int32_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width32));
+
+    for (unsigned i = 2; i &lt; regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width32));
+
+    Vector&lt;int32_t&gt; things(regs.size(), 666);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i &lt; regs.size(); ++i) {
+        root-&gt;append(
+            Move32, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int32_t)));
+    }
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 36);
+    for (unsigned i = 2; i &lt; regs.size(); ++i)
+        CHECK(things[i] == static_cast&lt;int32_t&gt;(i));
+    CHECK(memory[0] == 1);
+    CHECK(memory[1] == 35);
+}
+
+void testShuffleShiftMemoryAllRegs64()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    Vector&lt;Reg&gt; regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        loadConstant(root, (i + 1) * 1000000000000ll, Tmp(regs[i]));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT0));
+    Inst&amp; shuffle = root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width64),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width64));
+
+    for (unsigned i = 2; i &lt; regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width64));
+
+    Vector&lt;int64_t&gt; things(regs.size(), 666);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i &lt; regs.size(); ++i) {
+        root-&gt;append(
+            Move, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int64_t)));
+    }
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1000000000000ll);
+    CHECK(things[1] == 36000000000000ll);
+    for (unsigned i = 2; i &lt; regs.size(); ++i)
+        CHECK(things[i] == static_cast&lt;int64_t&gt;(i) * 1000000000000ll);
+    CHECK(memory[0] == 1000000000000ll);
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+int64_t combineHiLo(int64_t high, int64_t low)
+{
+    union {
+        int64_t value;
+        int32_t halves[2];
+    } u;
+    u.value = high;
+    u.halves[0] = static_cast&lt;int32_t&gt;(low);
+    return u.value;
+}
+
+void testShuffleShiftMemoryAllRegsMixedWidth()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    Vector&lt;Reg&gt; regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        loadConstant(root, (i + 1) * 1000000000000ll, Tmp(regs[i]));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT0));
+    Inst&amp; shuffle = root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width32));
+
+    for (unsigned i = 2; i &lt; regs.size(); ++i) {
+        shuffle.append(
+            Tmp(regs[i - 1]), Tmp(regs[i]),
+            (i &amp; 1) ? Arg::widthArg(Arg::Width32) : Arg::widthArg(Arg::Width64));
+    }
+
+    Vector&lt;int64_t&gt; things(regs.size(), 666);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i &lt; regs.size(); ++i) {
+        root-&gt;append(
+            Move, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int64_t)));
+    }
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1000000000000ll);
+    CHECK(things[1] == static_cast&lt;uint32_t&gt;(36000000000000ll));
+    for (unsigned i = 2; i &lt; regs.size(); ++i) {
+        int64_t value = static_cast&lt;int64_t&gt;(i) * 1000000000000ll;
+        CHECK(things[i] == ((i &amp; 1) ? static_cast&lt;uint32_t&gt;(value) : value));
+    }
+    CHECK(memory[0] == combineHiLo(35000000000000ll, 1000000000000ll));
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+void testShuffleRotateMemory()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int32_t memory[2];
+    memory[0] = 35;
+    memory[1] = 36;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2, Tmp(GPRInfo::regT1));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT2));
+    root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+
+        Tmp(GPRInfo::regT1), Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int32_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int32_t)),
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int32_t)), Arg::widthArg(Arg::Width32),
+
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int32_t)), Tmp(GPRInfo::regT0),
+        Arg::widthArg(Arg::Width32));
+
+    int32_t things[2];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int32_t)));
+    root-&gt;append(Move32, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int32_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 36);
+    CHECK(things[1] == 1);
+    CHECK(memory[0] == 2);
+    CHECK(memory[1] == 35);
+}
+
+void testShuffleRotateMemory64()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2000000000000ll, Tmp(GPRInfo::regT1));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT2));
+    root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width64),
+
+        Tmp(GPRInfo::regT1), Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width64),
+        
+        Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int64_t)), Tmp(GPRInfo::regT0),
+        Arg::widthArg(Arg::Width64));
+
+    int64_t things[2];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 36000000000000ll);
+    CHECK(things[1] == 1000000000000ll);
+    CHECK(memory[0] == 2000000000000ll);
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+void testShuffleRotateMemoryMixedWidth()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    BasicBlock* root = code.addBlock();
+    loadConstant(root, 1000000000000ll, Tmp(GPRInfo::regT0));
+    loadConstant(root, 2000000000000ll, Tmp(GPRInfo::regT1));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT2));
+    root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(GPRInfo::regT0), Tmp(GPRInfo::regT1), Arg::widthArg(Arg::Width32),
+
+        Tmp(GPRInfo::regT1), Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width64),
+        
+        Arg::addr(Tmp(GPRInfo::regT2), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width32),
+
+        Arg::addr(Tmp(GPRInfo::regT2), 1 * sizeof(int64_t)), Tmp(GPRInfo::regT0),
+        Arg::widthArg(Arg::Width64));
+
+    int64_t things[2];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT0), Arg::addr(base, 0 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Tmp(GPRInfo::regT1), Arg::addr(base, 1 * sizeof(int64_t)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 36000000000000ll);
+    CHECK(things[1] == static_cast&lt;uint32_t&gt;(1000000000000ll));
+    CHECK(memory[0] == 2000000000000ll);
+    CHECK(memory[1] == combineHiLo(36000000000000ll, 35000000000000ll));
+}
+
+void testShuffleRotateMemoryAllRegs64()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    Vector&lt;Reg&gt; regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        loadConstant(root, (i + 1) * 1000000000000ll, Tmp(regs[i]));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT0));
+    Inst&amp; shuffle = root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width64),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width64),
+
+        regs.last(), regs[0], Arg::widthArg(Arg::Width64));
+
+    for (unsigned i = 2; i &lt; regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width64));
+
+    Vector&lt;int64_t&gt; things(regs.size(), 666);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i &lt; regs.size(); ++i) {
+        root-&gt;append(
+            Move, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int64_t)));
+    }
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == static_cast&lt;int64_t&gt;(regs.size()) * 1000000000000ll);
+    CHECK(things[1] == 36000000000000ll);
+    for (unsigned i = 2; i &lt; regs.size(); ++i)
+        CHECK(things[i] == static_cast&lt;int64_t&gt;(i) * 1000000000000ll);
+    CHECK(memory[0] == 1000000000000ll);
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+void testShuffleRotateMemoryAllRegsMixedWidth()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    int64_t memory[2];
+    memory[0] = 35000000000000ll;
+    memory[1] = 36000000000000ll;
+
+    Vector&lt;Reg&gt; regs = regsInPriorityOrder(Arg::GP);
+    regs.removeFirst(Reg(GPRInfo::regT0));
+
+    BasicBlock* root = code.addBlock();
+    for (unsigned i = 0; i &lt; regs.size(); ++i)
+        loadConstant(root, (i + 1) * 1000000000000ll, Tmp(regs[i]));
+    root-&gt;append(Move, nullptr, Arg::immPtr(&amp;memory), Tmp(GPRInfo::regT0));
+    Inst&amp; shuffle = root-&gt;append(
+        Shuffle, nullptr,
+        
+        Tmp(regs[0]), Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::widthArg(Arg::Width32),
+        
+        Arg::addr(Tmp(GPRInfo::regT0), 0 * sizeof(int64_t)),
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Arg::widthArg(Arg::Width64),
+
+        Arg::addr(Tmp(GPRInfo::regT0), 1 * sizeof(int64_t)), Tmp(regs[1]),
+        Arg::widthArg(Arg::Width32),
+
+        regs.last(), regs[0], Arg::widthArg(Arg::Width32));
+
+    for (unsigned i = 2; i &lt; regs.size(); ++i)
+        shuffle.append(Tmp(regs[i - 1]), Tmp(regs[i]), Arg::widthArg(Arg::Width64));
+
+    Vector&lt;int64_t&gt; things(regs.size(), 666);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things[0])), Tmp(GPRInfo::regT0));
+    for (unsigned i = 0; i &lt; regs.size(); ++i) {
+        root-&gt;append(
+            Move, nullptr, Tmp(regs[i]), Arg::addr(Tmp(GPRInfo::regT0), i * sizeof(int64_t)));
+    }
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == static_cast&lt;uint32_t&gt;(static_cast&lt;int64_t&gt;(regs.size()) * 1000000000000ll));
+    CHECK(things[1] == static_cast&lt;uint32_t&gt;(36000000000000ll));
+    for (unsigned i = 2; i &lt; regs.size(); ++i)
+        CHECK(things[i] == static_cast&lt;int64_t&gt;(i) * 1000000000000ll);
+    CHECK(memory[0] == combineHiLo(35000000000000ll, 1000000000000ll));
+    CHECK(memory[1] == 35000000000000ll);
+}
+
+void testShuffleSwapDouble()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadDoubleConstant(root, 1, Tmp(FPRInfo::fpRegT0), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 2, Tmp(FPRInfo::fpRegT1), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 3, Tmp(FPRInfo::fpRegT2), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 4, Tmp(FPRInfo::fpRegT3), Tmp(GPRInfo::regT0));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(FPRInfo::fpRegT2), Tmp(FPRInfo::fpRegT3), Arg::widthArg(Arg::Width64),
+        Tmp(FPRInfo::fpRegT3), Tmp(FPRInfo::fpRegT2), Arg::widthArg(Arg::Width64));
+
+    double things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT0), Arg::addr(base, 0 * sizeof(double)));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT1), Arg::addr(base, 1 * sizeof(double)));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT2), Arg::addr(base, 2 * sizeof(double)));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT3), Arg::addr(base, 3 * sizeof(double)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 2);
+    CHECK(things[2] == 4);
+    CHECK(things[3] == 3);
+}
+
+void testShuffleShiftDouble()
+{
+    B3::Procedure proc;
+    Code&amp; code = proc.code();
+
+    BasicBlock* root = code.addBlock();
+    loadDoubleConstant(root, 1, Tmp(FPRInfo::fpRegT0), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 2, Tmp(FPRInfo::fpRegT1), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 3, Tmp(FPRInfo::fpRegT2), Tmp(GPRInfo::regT0));
+    loadDoubleConstant(root, 4, Tmp(FPRInfo::fpRegT3), Tmp(GPRInfo::regT0));
+    root-&gt;append(
+        Shuffle, nullptr,
+        Tmp(FPRInfo::fpRegT2), Tmp(FPRInfo::fpRegT3), Arg::widthArg(Arg::Width64));
+
+    double things[4];
+    Tmp base = code.newTmp(Arg::GP);
+    root-&gt;append(Move, nullptr, Arg::imm64(bitwise_cast&lt;intptr_t&gt;(&amp;things)), base);
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT0), Arg::addr(base, 0 * sizeof(double)));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT1), Arg::addr(base, 1 * sizeof(double)));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT2), Arg::addr(base, 2 * sizeof(double)));
+    root-&gt;append(MoveDouble, nullptr, Tmp(FPRInfo::fpRegT3), Arg::addr(base, 3 * sizeof(double)));
+    root-&gt;append(Move, nullptr, Arg::imm(0), Tmp(GPRInfo::returnValueGPR));
+    root-&gt;append(Ret32, nullptr, Tmp(GPRInfo::returnValueGPR));
+
+    memset(things, 0, sizeof(things));
+    
+    CHECK(!compileAndRun&lt;int&gt;(proc));
+
+    CHECK(things[0] == 1);
+    CHECK(things[1] == 2);
+    CHECK(things[2] == 3);
+    CHECK(things[3] == 3);
+}
+
+#define RUN(test) do {                          \
+        if (!shouldRun(#test))                  \
+            break;                              \
+        tasks.append(                           \
+            createSharedTask&lt;void()&gt;(           \
+                [&amp;] () {                        \
+                    dataLog(#test &quot;...\n&quot;);     \
+                    test;                       \
+                    dataLog(#test &quot;: OK!\n&quot;);   \
+                }));                            \
+    } while (false);
+
+void run(const char* filter)
+{
+    JSC::initializeThreading();
+    vm = &amp;VM::create(LargeHeap).leakRef();
+
+    Deque&lt;RefPtr&lt;SharedTask&lt;void()&gt;&gt;&gt; tasks;
+
+    auto shouldRun = [&amp;] (const char* testName) -&gt; bool {
+        return !filter || !!strcasestr(testName, filter);
+    };
+
+    RUN(testSimple());
+    
+    RUN(testShuffleSimpleSwap());
+    RUN(testShuffleSimpleShift());
+    RUN(testShuffleLongShift());
+    RUN(testShuffleLongShiftBackwards());
+    RUN(testShuffleSimpleRotate());
+    RUN(testShuffleSimpleBroadcast());
+    RUN(testShuffleBroadcastAllRegs());
+    RUN(testShuffleTreeShift());
+    RUN(testShuffleTreeShiftBackward());
+    RUN(testShuffleTreeShiftOtherBackward());
+    RUN(testShuffleMultipleShifts());
+    RUN(testShuffleRotateWithFringe());
+    RUN(testShuffleRotateWithLongFringe());
+    RUN(testShuffleMultipleRotates());
+    RUN(testShuffleShiftAndRotate());
+    RUN(testShuffleShiftAllRegs());
+    RUN(testShuffleRotateAllRegs());
+    RUN(testShuffleSimpleSwap64());
+    RUN(testShuffleSimpleShift64());
+    RUN(testShuffleSwapMixedWidth());
+    RUN(testShuffleShiftMixedWidth());
+    RUN(testShuffleShiftMemory());
+    RUN(testShuffleShiftMemoryLong());
+    RUN(testShuffleShiftMemoryAllRegs());
+    RUN(testShuffleShiftMemoryAllRegs64());
+    RUN(testShuffleShiftMemoryAllRegsMixedWidth());
+    RUN(testShuffleRotateMemory());
+    RUN(testShuffleRotateMemory64());
+    RUN(testShuffleRotateMemoryMixedWidth());
+    RUN(testShuffleRotateMemoryAllRegs64());
+    RUN(testShuffleRotateMemoryAllRegsMixedWidth());
+    RUN(testShuffleSwapDouble());
+    RUN(testShuffleShiftDouble());
+
+    if (tasks.isEmpty())
+        usage();
+
+    Lock lock;
+
+    Vector&lt;ThreadIdentifier&gt; threads;
+    for (unsigned i = filter ? 1 : WTF::numberOfProcessorCores(); i--;) {
+        threads.append(
+            createThread(
+                &quot;testb3 thread&quot;,
+                [&amp;] () {
+                    for (;;) {
+                        RefPtr&lt;SharedTask&lt;void()&gt;&gt; task;
+                        {
+                            LockHolder locker(lock);
+                            if (tasks.isEmpty())
+                                return;
+                            task = tasks.takeFirst();
+                        }
+
+                        task-&gt;run();
+                    }
+                }));
+    }
+
+    for (ThreadIdentifier thread : threads)
+        waitForThreadCompletion(thread);
+    crashLock.lock();
+}
+
+} // anonymous namespace
+
+#else // ENABLE(B3_JIT)
+
+static void run(const char*)
+{
+    dataLog(&quot;B3 JIT is not enabled.\n&quot;);
+}
+
+#endif // ENABLE(B3_JIT)
+
+int main(int argc, char** argv)
+{
+    const char* filter = nullptr;
+    switch (argc) {
+    case 1:
+        break;
+    case 2:
+        filter = argv[1];
+        break;
+    default:
+        usage();
+        break;
+    }
+    
+    run(filter);
+    return 0;
+}
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3testb3cpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/testb3.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/testb3.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/b3/testb3.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -7775,6 +7775,65 @@
</span><span class="cx">     CHECK(compileAndRun&lt;int&gt;(proc, a, b) == a + b);
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+void testCallRare(int a, int b)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    BasicBlock* common = proc.addBlock();
+    BasicBlock* rare = proc.addBlock();
+
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Branch, Origin(),
+        root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0),
+        FrequentedBlock(rare, FrequencyClass::Rare),
+        FrequentedBlock(common));
+
+    common-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(), common-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 0));
+    
+    rare-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        rare-&gt;appendNew&lt;CCallValue&gt;(
+            proc, Int32, Origin(),
+            rare-&gt;appendNew&lt;ConstPtrValue&gt;(proc, Origin(), bitwise_cast&lt;void*&gt;(simpleFunction)),
+            rare-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR1),
+            rare-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR2)));
+
+    CHECK(compileAndRun&lt;int&gt;(proc, true, a, b) == a + b);
+}
+
+void testCallRareLive(int a, int b, int c)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    BasicBlock* common = proc.addBlock();
+    BasicBlock* rare = proc.addBlock();
+
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Branch, Origin(),
+        root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0),
+        FrequentedBlock(rare, FrequencyClass::Rare),
+        FrequentedBlock(common));
+
+    common-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(), common-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 0));
+    
+    rare-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        rare-&gt;appendNew&lt;Value&gt;(
+            proc, Add, Origin(),
+            rare-&gt;appendNew&lt;CCallValue&gt;(
+                proc, Int32, Origin(),
+                rare-&gt;appendNew&lt;ConstPtrValue&gt;(proc, Origin(), bitwise_cast&lt;void*&gt;(simpleFunction)),
+                rare-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR1),
+                rare-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR2)),
+            rare-&gt;appendNew&lt;Value&gt;(
+                proc, Trunc, Origin(),
+                rare-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR3))));
+
+    CHECK(compileAndRun&lt;int&gt;(proc, true, a, b, c) == a + b + c);
+}
+
</ins><span class="cx"> void testCallSimplePure(int a, int b)
</span><span class="cx"> {
</span><span class="cx">     Procedure proc;
</span><span class="lines">@@ -10069,6 +10128,8 @@
</span><span class="cx">     RUN(testInt32ToDoublePartialRegisterWithoutStall());
</span><span class="cx"> 
</span><span class="cx">     RUN(testCallSimple(1, 2));
</span><ins>+    RUN(testCallRare(1, 2));
+    RUN(testCallRareLive(1, 2, 3));
</ins><span class="cx">     RUN(testCallSimplePure(1, 2));
</span><span class="cx">     RUN(testCallFunctionWithHellaArguments());
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreftlFTLLowerDFGToLLVMcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/ftl/FTLLowerDFGToLLVM.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -3540,7 +3540,7 @@
</span><span class="cx">             m_out.branch(
</span><span class="cx">                 m_out.aboveOrEqual(
</span><span class="cx">                     prevLength, m_out.load32(storage, m_heaps.Butterfly_vectorLength)),
</span><del>-                rarely(slowPath), usually(fastPath));
</del><ins>+                unsure(slowPath), unsure(fastPath));
</ins><span class="cx">             
</span><span class="cx">             LBasicBlock lastNext = m_out.appendTo(fastPath, slowPath);
</span><span class="cx">             m_out.store(
</span><span class="lines">@@ -8225,7 +8225,7 @@
</span><span class="cx">                 LBasicBlock holeCase =
</span><span class="cx">                     FTL_NEW_BLOCK(m_out, (&quot;PutByVal hole case&quot;));
</span><span class="cx">                     
</span><del>-                m_out.branch(isOutOfBounds, unsure(outOfBoundsCase), unsure(holeCase));
</del><ins>+                m_out.branch(isOutOfBounds, rarely(outOfBoundsCase), usually(holeCase));
</ins><span class="cx">                     
</span><span class="cx">                 LBasicBlock innerLastNext = m_out.appendTo(outOfBoundsCase, holeCase);
</span><span class="cx">                     
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreftlFTLOSRExitcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ftl/FTLOSRExit.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ftl/FTLOSRExit.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/ftl/FTLOSRExit.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -73,7 +73,7 @@
</span><span class="cx"> {
</span><span class="cx">     RefPtr&lt;OSRExitHandle&gt; handle =
</span><span class="cx">         prepareOSRExitHandle(state, exitKind, nodeOrigin, params, offset, isExceptionHandler);
</span><del>-    handle-&gt;emitExitThunk(jit);
</del><ins>+    handle-&gt;emitExitThunk(state, jit);
</ins><span class="cx">     return handle;
</span><span class="cx"> }
</span><span class="cx"> 
</span><span class="lines">@@ -84,8 +84,8 @@
</span><span class="cx">     RefPtr&lt;OSRExitHandle&gt; handle =
</span><span class="cx">         prepareOSRExitHandle(state, exitKind, nodeOrigin, params, offset, isExceptionHandler);
</span><span class="cx">     params.addLatePath(
</span><del>-        [handle] (CCallHelpers&amp; jit) {
-            handle-&gt;emitExitThunk(jit);
</del><ins>+        [handle, &amp;state] (CCallHelpers&amp; jit) {
+            handle-&gt;emitExitThunk(state, jit);
</ins><span class="cx">         });
</span><span class="cx">     return handle;
</span><span class="cx"> }
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreftlFTLOSRExitHandlecpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ftl/FTLOSRExitHandle.cpp (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ftl/FTLOSRExitHandle.cpp        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/ftl/FTLOSRExitHandle.cpp        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2015-2016 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -29,24 +29,30 @@
</span><span class="cx"> #if ENABLE(FTL_JIT) &amp;&amp; FTL_USES_B3
</span><span class="cx"> 
</span><span class="cx"> #include &quot;FTLOSRExit.h&quot;
</span><ins>+#include &quot;FTLState.h&quot;
</ins><span class="cx"> #include &quot;FTLThunks.h&quot;
</span><span class="cx"> #include &quot;LinkBuffer.h&quot;
</span><ins>+#include &quot;ProfilerCompilation.h&quot;
</ins><span class="cx"> 
</span><span class="cx"> namespace JSC { namespace FTL {
</span><span class="cx"> 
</span><del>-void OSRExitHandle::emitExitThunk(CCallHelpers&amp; jit)
</del><ins>+void OSRExitHandle::emitExitThunk(State&amp; state, CCallHelpers&amp; jit)
</ins><span class="cx"> {
</span><del>-    label = jit.label();
</del><ins>+    Profiler::Compilation* compilation = state.graph.compilation();
+    CCallHelpers::Label myLabel = jit.label();
+    label = myLabel;
</ins><span class="cx">     jit.pushToSaveImmediateWithoutTouchingRegisters(CCallHelpers::TrustedImm32(index));
</span><span class="cx">     CCallHelpers::PatchableJump jump = jit.patchableJump();
</span><span class="cx">     RefPtr&lt;OSRExitHandle&gt; self = this;
</span><span class="cx">     jit.addLinkTask(
</span><del>-        [self, jump] (LinkBuffer&amp; linkBuffer) {
</del><ins>+        [self, jump, myLabel, compilation] (LinkBuffer&amp; linkBuffer) {
</ins><span class="cx">             self-&gt;exit.m_patchableJump = CodeLocationJump(linkBuffer.locationOf(jump));
</span><span class="cx"> 
</span><span class="cx">             linkBuffer.link(
</span><span class="cx">                 jump.m_jump,
</span><span class="cx">                 CodeLocationLabel(linkBuffer.vm().getCTIStub(osrExitGenerationThunkGenerator).code()));
</span><ins>+            if (compilation)
+                compilation-&gt;addOSRExitSite({ linkBuffer.locationOf(myLabel).executableAddress() });
</ins><span class="cx">         });
</span><span class="cx"> }
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreftlFTLOSRExitHandleh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ftl/FTLOSRExitHandle.h (195083 => 195084)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ftl/FTLOSRExitHandle.h        2016-01-15 00:50:15 UTC (rev 195083)
+++ trunk/Source/JavaScriptCore/ftl/FTLOSRExitHandle.h        2016-01-15 00:58:22 UTC (rev 195084)
</span><span class="lines">@@ -35,6 +35,7 @@
</span><span class="cx"> 
</span><span class="cx"> namespace JSC { namespace FTL {
</span><span class="cx"> 
</span><ins>+class State;
</ins><span class="cx"> struct OSRExit;
</span><span class="cx"> 
</span><span class="cx"> // This is an object that stores some interesting data about an OSR exit. It's expected that you will
</span><span class="lines">@@ -55,7 +56,7 @@
</span><span class="cx">     CCallHelpers::Label label;
</span><span class="cx"> 
</span><span class="cx">     // This emits the exit thunk and populates 'label'.
</span><del>-    void emitExitThunk(CCallHelpers&amp;);
</del><ins>+    void emitExitThunk(State&amp;, CCallHelpers&amp;);
</ins><span class="cx"> };
</span><span class="cx"> 
</span><span class="cx"> } } // namespace JSC::FTL
</span></span></pre>
</div>
</div>

</body>
</html>