<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[194039] trunk/Source/JavaScriptCore</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }
#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/194039">194039</a></dd>
<dt>Author</dt> <dd>fpizlo@apple.com</dd>
<dt>Date</dt> <dd>2015-12-14 11:13:31 -0800 (Mon, 14 Dec 2015)</dd>
</dl>

<h3>Log Message</h3>
<pre>B3-&gt;Air compare-branch fusion should fuse even if the result of the comparison is used more than once
https://bugs.webkit.org/show_bug.cgi?id=152198

Reviewed by Benjamin Poulain.

If we have a comparison operation that is branched on from multiple places, then we were
previously executing the comparison to get a boolean result in a register and then we were
testing/branching on that register in multiple places. This is actually less efficient than
just fusing the compare/branch multiple times, even though this means that the comparison
executes multiple times. This would only be bad if the comparison fused loads multiple times,
since duplicating loads is both wrong and inefficient. So, this adds the notion of sharing to
compare/branch fusion. If a compare is shared by multiple branches, then we refuse to fuse
the load.

To write the test, I needed to zero-extend an 8-bit value to 32 bits. In the process of thinking about how to
do this, I realized that we needed lowerings for SExt8/SExt16. And I realized that the
lowerings for the other extension operations were not fully fleshed out; for example they
were incapable of load fusion. This patch fixes this and also adds some smart strength
reductions for BitAnd(@x, 0xff/0xffff/0xffffffff) - all of which should be lowered to a zero
extension.

This is a big win on asm.js code. It's not enough to bridge the gap to LLVM, but it's a huge
step in that direction.

* assembler/MacroAssemblerX86Common.h:
(JSC::MacroAssemblerX86Common::load8SignedExtendTo32):
(JSC::MacroAssemblerX86Common::zeroExtend8To32):
(JSC::MacroAssemblerX86Common::signExtend8To32):
(JSC::MacroAssemblerX86Common::load16):
(JSC::MacroAssemblerX86Common::load16SignedExtendTo32):
(JSC::MacroAssemblerX86Common::zeroExtend16To32):
(JSC::MacroAssemblerX86Common::signExtend16To32):
(JSC::MacroAssemblerX86Common::store32WithAddressOffsetPatch):
* assembler/X86Assembler.h:
(JSC::X86Assembler::movzbl_rr):
(JSC::X86Assembler::movsbl_rr):
(JSC::X86Assembler::movzwl_rr):
(JSC::X86Assembler::movswl_rr):
(JSC::X86Assembler::cmovl_rr):
* b3/B3LowerToAir.cpp:
(JSC::B3::Air::LowerToAir::createGenericCompare):
(JSC::B3::Air::LowerToAir::lower):
* b3/B3ReduceStrength.cpp:
* b3/air/AirOpcode.opcodes:
* b3/testb3.cpp:
(JSC::B3::testCheckMegaCombo):
(JSC::B3::testCheckTwoMegaCombos):
(JSC::B3::testCheckTwoNonRedundantMegaCombos):
(JSC::B3::testCheckAddImm):
(JSC::B3::testTruncSExt32):
(JSC::B3::testSExt8):
(JSC::B3::testSExt8Fold):
(JSC::B3::testSExt8SExt8):
(JSC::B3::testSExt8SExt16):
(JSC::B3::testSExt8BitAnd):
(JSC::B3::testBitAndSExt8):
(JSC::B3::testSExt16):
(JSC::B3::testSExt16Fold):
(JSC::B3::testSExt16SExt16):
(JSC::B3::testSExt16SExt8):
(JSC::B3::testSExt16BitAnd):
(JSC::B3::testBitAndSExt16):
(JSC::B3::testSExt32BitAnd):
(JSC::B3::testBitAndSExt32):
(JSC::B3::testBasicSelect):
(JSC::B3::run):</pre>
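<p>The scenario described above can be reproduced with the testb3.cpp helpers added by this patch. The sketch below is illustrative only: it assumes that harness (Procedure, compile(), invoke(), CHECK()) and that invoke() forwards a single pointer argument into GPRInfo::argumentGPR0 the same way the multi-argument calls in the new tests do. One LessThan over a Load8S feeds both a Branch and a Return, so with this change the compare should be fused into the branch even though its result is also materialized for the Return, while the Load8S is left unfused because the comparison is shared.</p>
<pre>
// Illustrative sketch only, not part of this patch.
Procedure proc;
BasicBlock* root = proc.addBlock();
BasicBlock* thenCase = proc.addBlock();
BasicBlock* elseCase = proc.addBlock();

Value* ptr = root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0);
Value* predicate = root-&gt;appendNew&lt;Value&gt;(
    proc, LessThan, Origin(),
    root-&gt;appendNew&lt;MemoryValue&gt;(proc, Load8S, Origin(), ptr),
    root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 42));

// First use of the comparison: a fused compare/branch.
root-&gt;appendNew&lt;ControlValue&gt;(
    proc, Branch, Origin(), predicate,
    FrequentedBlock(thenCase), FrequentedBlock(elseCase));

// Second use: the boolean result itself is returned, so it still needs a register.
thenCase-&gt;appendNew&lt;ControlValue&gt;(proc, Return, Origin(), predicate);
elseCase-&gt;appendNew&lt;ControlValue&gt;(
    proc, Return, Origin(),
    elseCase-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 0));

auto code = compile(proc);
int8_t byte = 41;
CHECK(invoke&lt;int&gt;(*code, &amp;byte) == 1);
byte = 42;
CHECK(invoke&lt;int&gt;(*code, &amp;byte) == 0);
</pre>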

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commonh">trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreassemblerX86Assemblerh">trunk/Source/JavaScriptCore/assembler/X86Assembler.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3LowerToAircpp">trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3B3ReduceStrengthcpp">trunk/Source/JavaScriptCore/b3/B3ReduceStrength.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3airAirOpcodeopcodes">trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes</a></li>
<li><a href="#trunkSourceJavaScriptCoreb3testb3cpp">trunk/Source/JavaScriptCore/b3/testb3.cpp</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (194038 => 194039)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog        2015-12-14 18:07:19 UTC (rev 194038)
+++ trunk/Source/JavaScriptCore/ChangeLog        2015-12-14 19:13:31 UTC (rev 194039)
</span><span class="lines">@@ -1,3 +1,72 @@
</span><ins>+2015-12-14  Filip Pizlo  &lt;fpizlo@apple.com&gt;
+
+        B3-&gt;Air compare-branch fusion should fuse even if the result of the comparison is used more than once
+        https://bugs.webkit.org/show_bug.cgi?id=152198
+
+        Reviewed by Benjamin Poulain.
+
+        If we have a comparison operation that is branched on from multiple places, then we were
+        previously executing the comparison to get a boolean result in a register and then we were
+        testing/branching on that register in multiple places. This is actually less efficient than
+        just fusing the compare/branch multiple times, even though this means that the comparison
+        executes multiple times. This would only be bad if the comparison fused loads multiple times,
+        since duplicating loads is both wrong and inefficient. So, this adds the notion of sharing to
+        compare/branch fusion. If a compare is shared by multiple branches, then we refuse to fuse
+        the load.
+
+        To write the test, I needed to zero-extend an 8-bit value to 32 bits. In the process of thinking about how to
+        do this, I realized that we needed lowerings for SExt8/SExt16. And I realized that the
+        lowerings for the other extension operations were not fully fleshed out; for example they
+        were incapable of load fusion. This patch fixes this and also adds some smart strength
+        reductions for BitAnd(@x, 0xff/0xffff/0xffffffff) - all of which should be lowered to a zero
+        extension.
+
+        This is a big win on asm.js code. It's not enough to bridge the gap to LLVM, but it's a huge
+        step in that direction.
+
+        * assembler/MacroAssemblerX86Common.h:
+        (JSC::MacroAssemblerX86Common::load8SignedExtendTo32):
+        (JSC::MacroAssemblerX86Common::zeroExtend8To32):
+        (JSC::MacroAssemblerX86Common::signExtend8To32):
+        (JSC::MacroAssemblerX86Common::load16):
+        (JSC::MacroAssemblerX86Common::load16SignedExtendTo32):
+        (JSC::MacroAssemblerX86Common::zeroExtend16To32):
+        (JSC::MacroAssemblerX86Common::signExtend16To32):
+        (JSC::MacroAssemblerX86Common::store32WithAddressOffsetPatch):
+        * assembler/X86Assembler.h:
+        (JSC::X86Assembler::movzbl_rr):
+        (JSC::X86Assembler::movsbl_rr):
+        (JSC::X86Assembler::movzwl_rr):
+        (JSC::X86Assembler::movswl_rr):
+        (JSC::X86Assembler::cmovl_rr):
+        * b3/B3LowerToAir.cpp:
+        (JSC::B3::Air::LowerToAir::createGenericCompare):
+        (JSC::B3::Air::LowerToAir::lower):
+        * b3/B3ReduceStrength.cpp:
+        * b3/air/AirOpcode.opcodes:
+        * b3/testb3.cpp:
+        (JSC::B3::testCheckMegaCombo):
+        (JSC::B3::testCheckTwoMegaCombos):
+        (JSC::B3::testCheckTwoNonRedundantMegaCombos):
+        (JSC::B3::testCheckAddImm):
+        (JSC::B3::testTruncSExt32):
+        (JSC::B3::testSExt8):
+        (JSC::B3::testSExt8Fold):
+        (JSC::B3::testSExt8SExt8):
+        (JSC::B3::testSExt8SExt16):
+        (JSC::B3::testSExt8BitAnd):
+        (JSC::B3::testBitAndSExt8):
+        (JSC::B3::testSExt16):
+        (JSC::B3::testSExt16Fold):
+        (JSC::B3::testSExt16SExt16):
+        (JSC::B3::testSExt16SExt8):
+        (JSC::B3::testSExt16BitAnd):
+        (JSC::B3::testBitAndSExt16):
+        (JSC::B3::testSExt32BitAnd):
+        (JSC::B3::testBitAndSExt32):
+        (JSC::B3::testBasicSelect):
+        (JSC::B3::run):
+
</ins><span class="cx"> 2015-12-14  Chris Dumez  &lt;cdumez@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Roll out r193974 and follow-up fixes as it caused JSC crashes
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerMacroAssemblerX86Commonh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h (194038 => 194039)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h        2015-12-14 18:07:19 UTC (rev 194038)
+++ trunk/Source/JavaScriptCore/assembler/MacroAssemblerX86Common.h        2015-12-14 19:13:31 UTC (rev 194039)
</span><span class="lines">@@ -644,7 +644,17 @@
</span><span class="cx">     {
</span><span class="cx">         m_assembler.movsbl_mr(address.offset, address.base, dest);
</span><span class="cx">     }
</span><ins>+
+    void zeroExtend8To32(RegisterID src, RegisterID dest)
+    {
+        m_assembler.movzbl_rr(src, dest);
+    }
</ins><span class="cx">     
</span><ins>+    void signExtend8To32(RegisterID src, RegisterID dest)
+    {
+        m_assembler.movsbl_rr(src, dest);
+    }
+    
</ins><span class="cx">     void load16(BaseIndex address, RegisterID dest)
</span><span class="cx">     {
</span><span class="cx">         m_assembler.movzwl_mr(address.offset, address.base, address.index, address.scale, dest);
</span><span class="lines">@@ -665,6 +675,16 @@
</span><span class="cx">         m_assembler.movswl_mr(address.offset, address.base, dest);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void zeroExtend16To32(RegisterID src, RegisterID dest)
+    {
+        m_assembler.movzwl_rr(src, dest);
+    }
+    
+    void signExtend16To32(RegisterID src, RegisterID dest)
+    {
+        m_assembler.movswl_rr(src, dest);
+    }
+    
</ins><span class="cx">     DataLabel32 store32WithAddressOffsetPatch(RegisterID src, Address address)
</span><span class="cx">     {
</span><span class="cx">         padBeforePatch();
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreassemblerX86Assemblerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/assembler/X86Assembler.h (194038 => 194039)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/assembler/X86Assembler.h        2015-12-14 18:07:19 UTC (rev 194038)
+++ trunk/Source/JavaScriptCore/assembler/X86Assembler.h        2015-12-14 19:13:31 UTC (rev 194039)
</span><span class="lines">@@ -1719,6 +1719,21 @@
</span><span class="cx">         m_formatter.twoByteOp8(OP2_MOVZX_GvEb, dst, src);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void movsbl_rr(RegisterID src, RegisterID dst)
+    {
+        m_formatter.twoByteOp8(OP2_MOVSX_GvEb, dst, src);
+    }
+
+    void movzwl_rr(RegisterID src, RegisterID dst)
+    {
+        m_formatter.twoByteOp8(OP2_MOVZX_GvEw, dst, src);
+    }
+
+    void movswl_rr(RegisterID src, RegisterID dst)
+    {
+        m_formatter.twoByteOp8(OP2_MOVSX_GvEw, dst, src);
+    }
+
</ins><span class="cx">     void cmovl_rr(Condition cond, RegisterID src, RegisterID dst)
</span><span class="cx">     {
</span><span class="cx">         m_formatter.twoByteOp(cmovcc(cond), dst, src);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3B3LowerToAircpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp (194038 => 194039)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp        2015-12-14 18:07:19 UTC (rev 194038)
+++ trunk/Source/JavaScriptCore/b3/B3LowerToAir.cpp        2015-12-14 19:13:31 UTC (rev 194039)
</span><span class="lines">@@ -873,20 +873,144 @@
</span><span class="cx">         const CompareFloatFunctor&amp; compareFloat, // Signature: (Arg doubleCond, Arg, Arg) -&gt; Inst
</span><span class="cx">         bool inverted = false)
</span><span class="cx">     {
</span><del>-        // Chew through any negations. It's not strictly necessary for this to be a loop, but we like
-        // to follow the rule that the instruction selector reduces strength whenever it doesn't
-        // require making things more complicated.
</del><ins>+        // NOTE: This is totally happy to match comparisons that have already been computed elsewhere
+        // since on most architectures, the cost of branching on a previously computed comparison
+        // result is almost always higher than just doing another fused compare/branch. The only time
+        // it could be worse is if we have a binary comparison and both operands are variables (not
+        // constants), and we encounter register pressure. Even in this case, duplicating the compare
+        // so that we can fuse it to the branch will be more efficient most of the time, since
+        // register pressure is not *that* common. For this reason, this algorithm will always
+        // duplicate the comparison.
+        //
+        // However, we cannot duplicate loads. The canBeInternal() on a load will assume that we
+        // already validated canBeInternal() on all of the values that got us to the load. So, even
+        // if we are sharing a value, we still need to call canBeInternal() for the purpose of
+        // tracking whether we are still in good shape to fuse loads.
+        //
+        // We could even have a chain of compare values that we fuse, and any member of the chain
+        // could be shared. Once any of them are shared, then the shared one's transitive children
+        // cannot be locked (i.e. commitInternal()). But if none of them are shared, then we want to
+        // lock all of them because that's a prerequisite to fusing the loads so that the loads don't
+        // get duplicated. For example, we might have: 
+        //
+        //     @tmp1 = LessThan(@a, @b)
+        //     @tmp2 = Equal(@tmp1, 0)
+        //     Branch(@tmp2)
+        //
+        // If either @a or @b are loads, then we want to have locked @tmp1 and @tmp2 so that they
+        // don't emit the loads a second time. But if we had another use of @tmp2, then we cannot
+        // lock @tmp1 (or @a or @b) because then we'll get into trouble when the other values that
+        // try to share @tmp1 with us try to do their lowering.
+        //
+        // There's one more wrinkle. If we don't lock an internal value, then this internal value may
+        // have already separately locked its children. So, if we're not locking a value then we need
+        // to make sure that its children aren't locked. We encapsulate this in two ways:
+        //
+        // canCommitInternal: This variable tells us if the values that we've fused so far are
+        // locked. This means that we're not sharing any of them with anyone. This permits us to fuse
+        // loads. If it's false, then we cannot fuse loads and we also need to ensure that the
+        // children of any values we try to fuse-by-sharing are not already locked. You don't have to
+        // worry about the children locking thing if you use prepareToFuse() before trying to fuse a
+        // sharable value. But, you do need to guard any load fusion by checking if canCommitInternal
+        // is true.
+        //
+        // FusionResult prepareToFuse(value): Call this when you think that you would like to fuse
+        // some value and that value is not a load. It will automatically handle the shared-or-locked
+        // issues and it will clear canCommitInternal if necessary. This will return CannotFuse
+        // (which acts like false) if the value cannot be locked and its children are locked. That's
+        // rare, but you just need to make sure that you do smart things when this happens (i.e. just
+        // use the value rather than trying to fuse it). After you call prepareToFuse(), you can
+        // still change your mind about whether you will actually fuse the value. If you do fuse it,
+        // you need to call commitFusion(value, fusionResult).
+        //
+        // commitFusion(value, fusionResult): Handles calling commitInternal(value) if fusionResult
+        // is FuseAndCommit.
+        
+        bool canCommitInternal = true;
+
+        enum FusionResult {
+            CannotFuse,
+            FuseAndCommit,
+            Fuse
+        };
+        auto prepareToFuse = [&amp;] (Value* value) -&gt; FusionResult {
+            if (value == m_value) {
+                // It's not actually internal. It's the root value. We're good to go.
+                return Fuse;
+            }
+
+            if (canCommitInternal &amp;&amp; canBeInternal(value)) {
+                // We are the only users of this value. This also means that the value's children
+                // could not have been locked, since we have now proved that m_value dominates value
+                // in the data flow graph. To only other way to value is from a user of m_value. If
+                // value's children are shared with others, then they could not have been locked
+                // because their use count is greater than 1. If they are only used from value, then
+                // in order for value's children to be locked, value would also have to be locked,
+                // and we just proved that it wasn't.
+                return FuseAndCommit;
+            }
+
+            // We're going to try to share value with others. It's possible that some other basic
+            // block had already emitted code for value and then matched over its children and then
+            // locked them, in which case we just want to use value instead of duplicating it. So, we
+            // validate the children. Note that this only arises in linear chains like:
+            //
+            //     BB#1:
+            //         @1 = Foo(...)
+            //         @2 = Bar(@1)
+            //         Jump(#2)
+            //     BB#2:
+            //         @3 = Baz(@2)
+            //
+            // Notice how we could start by generating code for BB#1 and then decide to lock @1 when
+            // generating code for @2, if we have some way of fusing Bar and Foo into a single
+            // instruction. This is legal, since indeed @1 only has one user. But because @2 now
+            // has a tmp (i.e. @2 is pinned), canBeInternal(@2) will return false, which brings us
+            // here. In that case, we cannot match over @2 because then we'd hit a hazard if we end
+            // up deciding not to fuse Foo into the fused Baz/Bar.
+            //
+            // Happily, the only places where this kind of child validation happens are in rules
+            // that admit sharing, like this one and effectiveAddress().
+            //
+            // N.B. We could probably avoid the need to do value locking if we committed to a well
+            // chosen code generation order. For example, if we guaranteed that all of the users of
+            // a value get generated before that value, then there's no way for the lowering of @3 to
+            // see @1 locked. But we don't want to do that, since this is a greedy instruction
+            // selector and so we want to be able to play with order.
+            for (Value* child : value-&gt;children()) {
+                if (m_locked.contains(child))
+                    return CannotFuse;
+            }
+
+            // It's safe to share value, but since we're sharing, it means that we aren't locking it.
+            // If we don't lock it, then fusing loads is off limits and all of value's children will
+            // have to go through the sharing path as well.
+            canCommitInternal = false;
+            
+            return Fuse;
+        };
+
+        auto commitFusion = [&amp;] (Value* value, FusionResult result) {
+            if (result == FuseAndCommit)
+                commitInternal(value);
+        };
+        
+        // Chew through any inversions. This loop isn't necessary for comparisons and branches, but
+        // we do need at least one iteration of it for Check.
</ins><span class="cx">         for (;;) {
</span><del>-            if (!canBeInternal(value) &amp;&amp; value != m_value)
-                break;
</del><span class="cx">             bool shouldInvert =
</span><span class="cx">                 (value-&gt;opcode() == BitXor &amp;&amp; value-&gt;child(1)-&gt;hasInt() &amp;&amp; (value-&gt;child(1)-&gt;asInt() &amp; 1) &amp;&amp; value-&gt;child(0)-&gt;returnsBool())
</span><span class="cx">                 || (value-&gt;opcode() == Equal &amp;&amp; value-&gt;child(1)-&gt;isInt(0));
</span><span class="cx">             if (!shouldInvert)
</span><span class="cx">                 break;
</span><ins>+
+            FusionResult fusionResult = prepareToFuse(value);
+            if (fusionResult == CannotFuse)
+                break;
+            commitFusion(value, fusionResult);
+            
</ins><span class="cx">             value = value-&gt;child(0);
</span><span class="cx">             inverted = !inverted;
</span><del>-            commitInternal(value);
</del><span class="cx">         }
</span><span class="cx"> 
</span><span class="cx">         auto createRelCond = [&amp;] (
</span><span class="lines">@@ -931,51 +1055,55 @@
</span><span class="cx">                     return Inst();
</span><span class="cx">                 };
</span><span class="cx"> 
</span><del>-                // First handle compares that involve fewer bits than B3's type system supports.
-                // This is pretty important. For example, we want this to be a single instruction:
-                //
-                //     @1 = Load8S(...)
-                //     @2 = Const32(...)
-                //     @3 = LessThan(@1, @2)
-                //     Branch(@3)
</del><ins>+                Arg::Width width = Arg::widthForB3Type(value-&gt;child(0)-&gt;type());
</ins><span class="cx">                 
</span><del>-                if (relCond.isSignedCond()) {
-                    if (Inst result = tryCompareLoadImm(Arg::Width8, Load8S, Arg::Signed))
-                        return result;
-                }
</del><ins>+                if (canCommitInternal) {
+                    // First handle compares that involve fewer bits than B3's type system supports.
+                    // This is pretty important. For example, we want this to be a single
+                    // instruction:
+                    //
+                    //     @1 = Load8S(...)
+                    //     @2 = Const32(...)
+                    //     @3 = LessThan(@1, @2)
+                    //     Branch(@3)
</ins><span class="cx">                 
</span><del>-                if (relCond.isUnsignedCond()) {
-                    if (Inst result = tryCompareLoadImm(Arg::Width8, Load8Z, Arg::Unsigned))
-                        return result;
-                }
</del><ins>+                    if (relCond.isSignedCond()) {
+                        if (Inst result = tryCompareLoadImm(Arg::Width8, Load8S, Arg::Signed))
+                            return result;
+                    }
+                
+                    if (relCond.isUnsignedCond()) {
+                        if (Inst result = tryCompareLoadImm(Arg::Width8, Load8Z, Arg::Unsigned))
+                            return result;
+                    }
</ins><span class="cx"> 
</span><del>-                if (relCond.isSignedCond()) {
-                    if (Inst result = tryCompareLoadImm(Arg::Width16, Load16S, Arg::Signed))
-                        return result;
-                }
</del><ins>+                    if (relCond.isSignedCond()) {
+                        if (Inst result = tryCompareLoadImm(Arg::Width16, Load16S, Arg::Signed))
+                            return result;
+                    }
</ins><span class="cx">                 
</span><del>-                if (relCond.isUnsignedCond()) {
-                    if (Inst result = tryCompareLoadImm(Arg::Width16, Load16Z, Arg::Unsigned))
-                        return result;
-                }
</del><ins>+                    if (relCond.isUnsignedCond()) {
+                        if (Inst result = tryCompareLoadImm(Arg::Width16, Load16Z, Arg::Unsigned))
+                            return result;
+                    }
</ins><span class="cx"> 
</span><del>-                // Now handle compares that involve a load and an immediate.
</del><ins>+                    // Now handle compares that involve a load and an immediate.
</ins><span class="cx"> 
</span><del>-                Arg::Width width = Arg::widthForB3Type(value-&gt;child(0)-&gt;type());
-                if (Inst result = tryCompareLoadImm(width, Load, Arg::Signed))
-                    return result;
</del><ins>+                    if (Inst result = tryCompareLoadImm(width, Load, Arg::Signed))
+                        return result;
</ins><span class="cx"> 
</span><del>-                // Now handle compares that involve a load. It's not obvious that it's better to
-                // handle this before the immediate cases or not. Probably doesn't matter.
</del><ins>+                    // Now handle compares that involve a load. It's not obvious that it's better to
+                    // handle this before the immediate cases or not. Probably doesn't matter.
</ins><span class="cx"> 
</span><del>-                if (Inst result = tryCompare(width, loadPromise(left), tmpPromise(right))) {
-                    commitInternal(left);
-                    return result;
-                }
</del><ins>+                    if (Inst result = tryCompare(width, loadPromise(left), tmpPromise(right))) {
+                        commitInternal(left);
+                        return result;
+                    }
</ins><span class="cx">                 
</span><del>-                if (Inst result = tryCompare(width, tmpPromise(left), loadPromise(right))) {
-                    commitInternal(right);
-                    return result;
</del><ins>+                    if (Inst result = tryCompare(width, tmpPromise(left), loadPromise(right))) {
+                        commitInternal(right);
+                        return result;
+                    }
</ins><span class="cx">                 }
</span><span class="cx"> 
</span><span class="cx">                 // Now handle compares that involve an immediate and a tmp.
</span><span class="lines">@@ -1062,41 +1190,43 @@
</span><span class="cx">                     return Inst();
</span><span class="cx">                 };
</span><span class="cx"> 
</span><del>-                // First handle test's that involve fewer bits than B3's type system supports.
</del><ins>+                if (canCommitInternal) {
+                    // First handle test's that involve fewer bits than B3's type system supports.
</ins><span class="cx"> 
</span><del>-                if (Inst result = tryTestLoadImm(Arg::Width8, Load8Z))
-                    return result;
</del><ins>+                    if (Inst result = tryTestLoadImm(Arg::Width8, Load8Z))
+                        return result;
+                    
+                    if (Inst result = tryTestLoadImm(Arg::Width8, Load8S))
+                        return result;
+                    
+                    if (Inst result = tryTestLoadImm(Arg::Width16, Load16Z))
+                        return result;
+                    
+                    if (Inst result = tryTestLoadImm(Arg::Width16, Load16S))
+                        return result;
</ins><span class="cx"> 
</span><del>-                if (Inst result = tryTestLoadImm(Arg::Width8, Load8S))
-                    return result;
-
-                if (Inst result = tryTestLoadImm(Arg::Width16, Load16Z))
-                    return result;
-
-                if (Inst result = tryTestLoadImm(Arg::Width16, Load16S))
-                    return result;
-
-                // Now handle test's that involve a load and an immediate. Note that immediates are
-                // 32-bit, and we want zero-extension. Hence, the immediate form is compiled as a
-                // 32-bit test. Note that this spits on the grave of inferior endians, such as the
-                // big one.
-                
-                if (Inst result = tryTestLoadImm(Arg::Width32, Load))
-                    return result;
-
-                // Now handle test's that involve a load.
-
-                Arg::Width width = Arg::widthForB3Type(value-&gt;child(0)-&gt;type());
-                if (Inst result = tryTest(width, loadPromise(left), tmpPromise(right))) {
-                    commitInternal(left);
-                    return result;
</del><ins>+                    // Now handle test's that involve a load and an immediate. Note that immediates
+                    // are 32-bit, and we want zero-extension. Hence, the immediate form is compiled
+                    // as a 32-bit test. Note that this spits on the grave of inferior endians, such
+                    // as the big one.
+                    
+                    if (Inst result = tryTestLoadImm(Arg::Width32, Load))
+                        return result;
+                    
+                    // Now handle test's that involve a load.
+                    
+                    Arg::Width width = Arg::widthForB3Type(value-&gt;child(0)-&gt;type());
+                    if (Inst result = tryTest(width, loadPromise(left), tmpPromise(right))) {
+                        commitInternal(left);
+                        return result;
+                    }
+                    
+                    if (Inst result = tryTest(width, tmpPromise(left), loadPromise(right))) {
+                        commitInternal(right);
+                        return result;
+                    }
</ins><span class="cx">                 }
</span><span class="cx"> 
</span><del>-                if (Inst result = tryTest(width, tmpPromise(left), loadPromise(right))) {
-                    commitInternal(right);
-                    return result;
-                }
-
</del><span class="cx">                 // Now handle test's that involve an immediate and a tmp.
</span><span class="cx"> 
</span><span class="cx">                 if (leftImm &amp;&amp; leftImm.isRepresentableAs&lt;uint32_t&gt;()) {
</span><span class="lines">@@ -1117,9 +1247,9 @@
</span><span class="cx">             }
</span><span class="cx">         };
</span><span class="cx"> 
</span><del>-        if (canBeInternal(value) || value == m_value) {
</del><ins>+        if (FusionResult fusionResult = prepareToFuse(value)) {
</ins><span class="cx">             if (Inst result = attemptFused()) {
</span><del>-                commitInternal(value);
</del><ins>+                commitFusion(value, fusionResult);
</ins><span class="cx">                 return result;
</span><span class="cx">             }
</span><span class="cx">         }
</span><span class="lines">@@ -1463,6 +1593,21 @@
</span><span class="cx">         }
</span><span class="cx"> 
</span><span class="cx">         case BitAnd: {
</span><ins>+            if (m_value-&gt;child(1)-&gt;isInt(0xff)) {
+                appendUnOp&lt;ZeroExtend8To32, ZeroExtend8To32&gt;(m_value-&gt;child(0));
+                return;
+            }
+            
+            if (m_value-&gt;child(1)-&gt;isInt(0xffff)) {
+                appendUnOp&lt;ZeroExtend16To32, ZeroExtend16To32&gt;(m_value-&gt;child(0));
+                return;
+            }
+
+            if (m_value-&gt;child(1)-&gt;isInt(0xffffffff)) {
+                appendUnOp&lt;Move32, Move32&gt;(m_value-&gt;child(0));
+                return;
+            }
+            
</ins><span class="cx">             appendBinOp&lt;And32, And64, AndDouble, AndFloat, Commutative&gt;(
</span><span class="cx">                 m_value-&gt;child(0), m_value-&gt;child(1));
</span><span class="cx">             return;
</span><span class="lines">@@ -1572,18 +1717,31 @@
</span><span class="cx">             return;
</span><span class="cx">         }
</span><span class="cx"> 
</span><ins>+        case SExt8: {
+            appendUnOp&lt;SignExtend8To32, Air::Oops&gt;(m_value-&gt;child(0));
+            return;
+        }
+
+        case SExt16: {
+            appendUnOp&lt;SignExtend16To32, Air::Oops&gt;(m_value-&gt;child(0));
+            return;
+        }
+
</ins><span class="cx">         case ZExt32: {
</span><span class="cx">             if (highBitsAreZero(m_value-&gt;child(0))) {
</span><span class="cx">                 ASSERT(tmp(m_value-&gt;child(0)) == tmp(m_value));
</span><span class="cx">                 return;
</span><span class="cx">             }
</span><span class="cx"> 
</span><del>-            append(Move32, tmp(m_value-&gt;child(0)), tmp(m_value));
</del><ins>+            appendUnOp&lt;Move32, Air::Oops&gt;(m_value-&gt;child(0));
</ins><span class="cx">             return;
</span><span class="cx">         }
</span><span class="cx"> 
</span><span class="cx">         case SExt32: {
</span><del>-            append(SignExtend32ToPtr, tmp(m_value-&gt;child(0)), tmp(m_value));
</del><ins>+            // FIXME: We should have support for movsbq/movswq
+            // https://bugs.webkit.org/show_bug.cgi?id=152232
+            
+            appendUnOp&lt;SignExtend32ToPtr, Air::Oops&gt;(m_value-&gt;child(0));
</ins><span class="cx">             return;
</span><span class="cx">         }
</span><span class="cx"> 
</span></span></pre></div>
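<p>For reference, the new BitAnd special cases in lower() can be exercised with a procedure like the one below. This is a sketch against the testb3.cpp helpers from this patch (compileAndRun(), CHECK()), not part of the change itself; with the special case for an 0xff mask, the BitAnd should now be selected as ZeroExtend8To32 (movzbl on x86) rather than an And32 against an immediate.</p>
<pre>
// Illustrative sketch only, not part of this patch.
Procedure proc;
BasicBlock* root = proc.addBlock();
root-&gt;appendNew&lt;ControlValue&gt;(
    proc, Return, Origin(),
    root-&gt;appendNew&lt;Value&gt;(
        proc, BitAnd, Origin(),
        root-&gt;appendNew&lt;Value&gt;(
            proc, Trunc, Origin(),
            root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0)),
        root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 0xff)));

int32_t value = -42;
CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == (value &amp; 0xff));
</pre>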
<a id="trunkSourceJavaScriptCoreb3B3ReduceStrengthcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/B3ReduceStrength.cpp (194038 => 194039)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/B3ReduceStrength.cpp        2015-12-14 18:07:19 UTC (rev 194038)
+++ trunk/Source/JavaScriptCore/b3/B3ReduceStrength.cpp        2015-12-14 19:13:31 UTC (rev 194039)
</span><span class="lines">@@ -367,6 +367,32 @@
</span><span class="cx">                 m_changed = true;
</span><span class="cx">                 break;
</span><span class="cx">             }
</span><ins>+
+            // Turn this: BitAnd(SExt8(value), mask) where (mask &amp; 0xffffff00) == 0
+            // Into this: BitAnd(value, mask)
+            if (m_value-&gt;child(0)-&gt;opcode() == SExt8 &amp;&amp; m_value-&gt;child(1)-&gt;hasInt32()
+                &amp;&amp; !(m_value-&gt;child(1)-&gt;asInt32() &amp; 0xffffff00)) {
+                m_value-&gt;child(0) = m_value-&gt;child(0)-&gt;child(0);
+                m_changed = true;
+            }
+
+            // Turn this: BitAnd(SExt16(value), mask) where (mask &amp; 0xffff0000) == 0
+            // Into this: BitAnd(value, mask)
+            if (m_value-&gt;child(0)-&gt;opcode() == SExt16 &amp;&amp; m_value-&gt;child(1)-&gt;hasInt32()
+                &amp;&amp; !(m_value-&gt;child(1)-&gt;asInt32() &amp; 0xffff0000)) {
+                m_value-&gt;child(0) = m_value-&gt;child(0)-&gt;child(0);
+                m_changed = true;
+            }
+
+            // Turn this: BitAnd(SExt32(value), mask) where (mask &amp; 0xffffffff00000000) == 0
+            // Into this: BitAnd(ZExt32(value), mask)
+            if (m_value-&gt;child(0)-&gt;opcode() == SExt32 &amp;&amp; m_value-&gt;child(1)-&gt;hasInt32()
+                &amp;&amp; !(m_value-&gt;child(1)-&gt;asInt32() &amp; 0xffffffff00000000llu)) {
+                m_value-&gt;child(0) = m_insertionSet.insert&lt;Value&gt;(
+                    m_index, ZExt32, m_value-&gt;origin(),
+                    m_value-&gt;child(0)-&gt;child(0), m_value-&gt;child(0)-&gt;child(1));
+                m_changed = true;
+            }
</ins><span class="cx">             break;
</span><span class="cx"> 
</span><span class="cx">         case BitOr:
</span><span class="lines">@@ -564,6 +590,97 @@
</span><span class="cx">             }
</span><span class="cx">             break;
</span><span class="cx"> 
</span><ins>+        case SExt8:
+            // Turn this: SExt8(constant)
+            // Into this: static_cast&lt;int8_t&gt;(constant)
+            if (m_value-&gt;child(0)-&gt;hasInt32()) {
+                int32_t result = static_cast&lt;int8_t&gt;(m_value-&gt;child(0)-&gt;asInt32());
+                replaceWithNewValue(m_proc.addIntConstant(m_value, result));
+                break;
+            }
+
+            // Turn this: SExt8(SExt8(value))
+            //   or this: SExt8(SExt16(value))
+            // Into this: SExt8(value)
+            if (m_value-&gt;child(0)-&gt;opcode() == SExt8 || m_value-&gt;child(0)-&gt;opcode() == SExt16) {
+                m_value-&gt;child(0) = m_value-&gt;child(0)-&gt;child(0);
+                m_changed = true;
+            }
+
+            if (m_value-&gt;child(0)-&gt;opcode() == BitAnd &amp;&amp; m_value-&gt;child(0)-&gt;child(1)-&gt;hasInt32()) {
+                Value* input = m_value-&gt;child(0)-&gt;child(0);
+                int32_t mask = m_value-&gt;child(0)-&gt;child(1)-&gt;asInt32();
+                
+                // Turn this: SExt8(BitAnd(input, mask)) where (mask &amp; 0xff) == 0xff
+                // Into this: SExt8(input)
+                if ((mask &amp; 0xff) == 0xff) {
+                    m_value-&gt;child(0) = input;
+                    m_changed = true;
+                    break;
+                }
+                
+                // Turn this: SExt8(BitAnd(input, mask)) where (mask &amp; 0x80) == 0
+                // Into this: BitAnd(input, mask &amp; 0x7f)
+                if (!(mask &amp; 0x80)) {
+                    replaceWithNewValue(
+                        m_proc.add&lt;Value&gt;(
+                            BitAnd, m_value-&gt;origin(), input,
+                            m_insertionSet.insert&lt;Const32Value&gt;(
+                                m_index, m_value-&gt;origin(), mask &amp; 0x7f)));
+                    break;
+                }
+            }
+            break;
+
+        case SExt16:
+            // Turn this: SExt16(constant)
+            // Into this: static_cast&lt;int16_t&gt;(constant)
+            if (m_value-&gt;child(0)-&gt;hasInt32()) {
+                int32_t result = static_cast&lt;int16_t&gt;(m_value-&gt;child(0)-&gt;asInt32());
+                replaceWithNewValue(m_proc.addIntConstant(m_value, result));
+                break;
+            }
+
+            // Turn this: SExt16(SExt16(value))
+            // Into this: SExt16(value)
+            if (m_value-&gt;child(0)-&gt;opcode() == SExt16) {
+                m_value-&gt;child(0) = m_value-&gt;child(0)-&gt;child(0);
+                m_changed = true;
+            }
+
+            // Turn this: SExt16(SExt8(value))
+            // Into this: SExt8(value)
+            if (m_value-&gt;child(0)-&gt;opcode() == SExt8) {
+                m_value-&gt;replaceWithIdentity(m_value-&gt;child(0));
+                m_changed = true;
+                break;
+            }
+
+            if (m_value-&gt;child(0)-&gt;opcode() == BitAnd &amp;&amp; m_value-&gt;child(0)-&gt;child(1)-&gt;hasInt32()) {
+                Value* input = m_value-&gt;child(0)-&gt;child(0);
+                int32_t mask = m_value-&gt;child(0)-&gt;child(1)-&gt;asInt32();
+                
+                // Turn this: SExt16(BitAnd(input, mask)) where (mask &amp; 0xffff) == 0xffff
+                // Into this: SExt16(input)
+                if ((mask &amp; 0xffff) == 0xffff) {
+                    m_value-&gt;child(0) = input;
+                    m_changed = true;
+                    break;
+                }
+                
+                // Turn this: SExt16(BitAnd(input, mask)) where (mask &amp; 0x8000) == 0
+                // Into this: BitAnd(input, mask &amp; 0x7fff)
+                if (!(mask &amp; 0x8000)) {
+                    replaceWithNewValue(
+                        m_proc.add&lt;Value&gt;(
+                            BitAnd, m_value-&gt;origin(), input,
+                            m_insertionSet.insert&lt;Const32Value&gt;(
+                                m_index, m_value-&gt;origin(), mask &amp; 0x7fff)));
+                    break;
+                }
+            }
+            break;
+
</ins><span class="cx">         case SExt32:
</span><span class="cx">             // Turn this: SExt32(constant)
</span><span class="cx">             // Into this: static_cast&lt;int64_t&gt;(constant)
</span><span class="lines">@@ -571,6 +688,16 @@
</span><span class="cx">                 replaceWithNewValue(m_proc.addIntConstant(m_value, m_value-&gt;child(0)-&gt;asInt32()));
</span><span class="cx">                 break;
</span><span class="cx">             }
</span><ins>+
+            // Turn this: SExt32(BitAnd(input, mask)) where (mask &amp; 0x80000000) == 0
+            // Into this: ZExt32(BitAnd(input, mask))
+            if (m_value-&gt;child(0)-&gt;opcode() == BitAnd &amp;&amp; m_value-&gt;child(0)-&gt;child(1)-&gt;hasInt32()
+                &amp;&amp; !(m_value-&gt;child(0)-&gt;child(1)-&gt;asInt32() &amp; 0x80000000)) {
+                replaceWithNewValue(
+                    m_proc.add&lt;Value&gt;(
+                        ZExt32, m_value-&gt;origin(), m_value-&gt;child(0)));
+                break;
+            }
</ins><span class="cx">             break;
</span><span class="cx"> 
</span><span class="cx">         case ZExt32:
</span></span></pre></div>
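<p>The masking arithmetic behind the new SExt8/BitAnd reduction can be checked in isolation: if the mask clears bit 7, the masked byte is already non-negative, so sign-extending it is a no-op and SExt8(BitAnd(@x, mask)) is equivalent to BitAnd(@x, mask &amp; 0x7f). The standalone program below (not part of the patch; the 0x5a mask is an arbitrary choice) verifies that equivalence. The SExt16 case is analogous with bit 15 and 0x7fff.</p>
<pre>
// Standalone sanity check of the reduction's reasoning; not part of the patch.
#include &lt;cassert&gt;
#include &lt;cstdint&gt;

static int32_t sext8(int32_t x) { return static_cast&lt;int8_t&gt;(x); }

int main()
{
    const int32_t mask = 0x5a; // arbitrary mask with (mask &amp; 0x80) == 0
    for (int32_t x = -1000; x &lt;= 1000; ++x)
        assert(sext8(x &amp; mask) == (x &amp; (mask &amp; 0x7f)));
    return 0;
}
</pre>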
<a id="trunkSourceJavaScriptCoreb3airAirOpcodeopcodes"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes (194038 => 194039)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes        2015-12-14 18:07:19 UTC (rev 194038)
+++ trunk/Source/JavaScriptCore/b3/air/AirOpcode.opcodes        2015-12-14 19:13:31 UTC (rev 194039)
</span><span class="lines">@@ -288,6 +288,26 @@
</span><span class="cx"> SignExtend32ToPtr U:G, D:G
</span><span class="cx">     Tmp, Tmp
</span><span class="cx"> 
</span><ins>+ZeroExtend8To32 U:G, D:G
+    Tmp, Tmp
+    Addr, Tmp as load8
+    Index, Tmp as load8
+
+SignExtend8To32 U:G, D:G
+    Tmp, Tmp
+    Addr, Tmp as load8SignedExtendTo32
+    Index, Tmp as load8SignedExtendTo32
+
+ZeroExtend16To32 U:G, D:G
+    Tmp, Tmp
+    Addr, Tmp as load16
+    Index, Tmp as load16
+
+SignExtend16To32 U:G, D:G
+    Tmp, Tmp
+    Addr, Tmp as load16SignedExtendTo32
+    Index, Tmp as load16SignedExtendTo32
+
</ins><span class="cx"> MoveFloat U:F, D:F
</span><span class="cx">     Tmp, Tmp as moveDouble
</span><span class="cx">     Addr, Tmp as loadFloat
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreb3testb3cpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/b3/testb3.cpp (194038 => 194039)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/b3/testb3.cpp        2015-12-14 18:07:19 UTC (rev 194038)
+++ trunk/Source/JavaScriptCore/b3/testb3.cpp        2015-12-14 19:13:31 UTC (rev 194039)
</span><span class="lines">@@ -71,8 +71,16 @@
</span><span class="cx"> 
</span><span class="cx"> namespace {
</span><span class="cx"> 
</span><ins>+StaticLock crashLock;
+
</ins><span class="cx"> // Nothing fancy for now; we just use the existing WTF assertion machinery.
</span><del>-#define CHECK(x) RELEASE_ASSERT(x)
</del><ins>+#define CHECK(x) do {                                                   \
+        if (!!(x))                                                      \
+            break;                                                      \
+        crashLock.lock();                                               \
+        WTFReportAssertionFailure(__FILE__, __LINE__, WTF_PRETTY_FUNCTION, #x); \
+        CRASH();                                                        \
+    } while (false)
</ins><span class="cx"> 
</span><span class="cx"> VM* vm;
</span><span class="cx"> 
</span><span class="lines">@@ -5693,6 +5701,163 @@
</span><span class="cx">     CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1) == 42);
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+void testCheckTwoMegaCombos()
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    Value* base = root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0);
+    Value* index = root-&gt;appendNew&lt;Value&gt;(
+        proc, ZExt32, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, Trunc, Origin(),
+            root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR1)));
+
+    Value* ptr = root-&gt;appendNew&lt;Value&gt;(
+        proc, Add, Origin(), base,
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, Shl, Origin(), index,
+            root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 1)));
+
+    Value* predicate = root-&gt;appendNew&lt;Value&gt;(
+        proc, LessThan, Origin(),
+        root-&gt;appendNew&lt;MemoryValue&gt;(proc, Load8S, Origin(), ptr),
+        root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 42));
+    
+    CheckValue* check = root-&gt;appendNew&lt;CheckValue&gt;(proc, Check, Origin(), predicate);
+    check-&gt;setGenerator(
+        [&amp;] (CCallHelpers&amp; jit, const StackmapGenerationParams&amp; params) {
+            AllowMacroScratchRegisterUsage allowScratch(jit);
+            CHECK(params.size() == 1);
+
+            // This should always work because a function this simple should never have callee
+            // saves.
+            jit.move(CCallHelpers::TrustedImm32(42), GPRInfo::returnValueGPR);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    CheckValue* check2 = root-&gt;appendNew&lt;CheckValue&gt;(proc, Check, Origin(), predicate);
+    check2-&gt;setGenerator(
+        [&amp;] (CCallHelpers&amp; jit, const StackmapGenerationParams&amp; params) {
+            AllowMacroScratchRegisterUsage allowScratch(jit);
+            CHECK(params.size() == 1);
+
+            // This should always work because a function this simple should never have callee
+            // saves.
+            jit.move(CCallHelpers::TrustedImm32(43), GPRInfo::returnValueGPR);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(), root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 0));
+
+    auto code = compile(proc);
+
+    int8_t value;
+    value = 42;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1) == 0);
+    value = 127;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1) == 0);
+    value = 41;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1) == 42);
+    value = 0;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1) == 42);
+    value = -1;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1) == 42);
+}
+
+void testCheckTwoNonRedundantMegaCombos()
+{
+    Procedure proc;
+    
+    BasicBlock* root = proc.addBlock();
+    BasicBlock* thenCase = proc.addBlock();
+    BasicBlock* elseCase = proc.addBlock();
+    
+    Value* base = root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0);
+    Value* index = root-&gt;appendNew&lt;Value&gt;(
+        proc, ZExt32, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, Trunc, Origin(),
+            root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR1)));
+    Value* branchPredicate = root-&gt;appendNew&lt;Value&gt;(
+        proc, BitAnd, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, Trunc, Origin(),
+            root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR2)),
+        root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 0xff));
+
+    Value* ptr = root-&gt;appendNew&lt;Value&gt;(
+        proc, Add, Origin(), base,
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, Shl, Origin(), index,
+            root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 1)));
+
+    Value* checkPredicate = root-&gt;appendNew&lt;Value&gt;(
+        proc, LessThan, Origin(),
+        root-&gt;appendNew&lt;MemoryValue&gt;(proc, Load8S, Origin(), ptr),
+        root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 42));
+
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Branch, Origin(), branchPredicate,
+        FrequentedBlock(thenCase), FrequentedBlock(elseCase));
+    
+    CheckValue* check = thenCase-&gt;appendNew&lt;CheckValue&gt;(proc, Check, Origin(), checkPredicate);
+    check-&gt;setGenerator(
+        [&amp;] (CCallHelpers&amp; jit, const StackmapGenerationParams&amp; params) {
+            AllowMacroScratchRegisterUsage allowScratch(jit);
+            CHECK(params.size() == 1);
+
+            // This should always work because a function this simple should never have callee
+            // saves.
+            jit.move(CCallHelpers::TrustedImm32(42), GPRInfo::returnValueGPR);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    thenCase-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(), thenCase-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 43));
+
+    CheckValue* check2 = elseCase-&gt;appendNew&lt;CheckValue&gt;(proc, Check, Origin(), checkPredicate);
+    check2-&gt;setGenerator(
+        [&amp;] (CCallHelpers&amp; jit, const StackmapGenerationParams&amp; params) {
+            AllowMacroScratchRegisterUsage allowScratch(jit);
+            CHECK(params.size() == 1);
+
+            // This should always work because a function this simple should never have callee
+            // saves.
+            jit.move(CCallHelpers::TrustedImm32(44), GPRInfo::returnValueGPR);
+            jit.emitFunctionEpilogue();
+            jit.ret();
+        });
+    elseCase-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(), elseCase-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), 45));
+
+    auto code = compile(proc);
+
+    int8_t value;
+
+    value = 42;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, true) == 43);
+    value = 127;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, true) == 43);
+    value = 41;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, true) == 42);
+    value = 0;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, true) == 42);
+    value = -1;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, true) == 42);
+
+    value = 42;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, false) == 45);
+    value = 127;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, false) == 45);
+    value = 41;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, false) == 44);
+    value = 0;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, false) == 44);
+    value = -1;
+    CHECK(invoke&lt;int&gt;(*code, &amp;value - 2, 1, false) == 44);
+}
+
</ins><span class="cx"> void testCheckAddImm()
</span><span class="cx"> {
</span><span class="cx">     Procedure proc;
</span><span class="lines">@@ -7270,6 +7435,238 @@
</span><span class="cx">     CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == value);
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+void testSExt8(int32_t value)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt8, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, Trunc, Origin(),
+                root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0))));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == static_cast&lt;int32_t&gt;(static_cast&lt;int8_t&gt;(value)));
+}
+
+void testSExt8Fold(int32_t value)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt8, Origin(),
+            root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), value)));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc) == static_cast&lt;int32_t&gt;(static_cast&lt;int8_t&gt;(value)));
+}
+
+void testSExt8SExt8(int32_t value)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt8, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, SExt8, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0)))));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == static_cast&lt;int32_t&gt;(static_cast&lt;int8_t&gt;(value)));
+}
+
+void testSExt8SExt16(int32_t value)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt8, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, SExt16, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0)))));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == static_cast&lt;int32_t&gt;(static_cast&lt;int8_t&gt;(value)));
+}
+
+void testSExt8BitAnd(int32_t value, int32_t mask)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt8, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, BitAnd, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0)),
+                root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), mask))));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == static_cast&lt;int32_t&gt;(static_cast&lt;int8_t&gt;(value &amp; mask)));
+}
+
+void testBitAndSExt8(int32_t value, int32_t mask)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, BitAnd, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, SExt8, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0))),
+            root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), mask)));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == (static_cast&lt;int32_t&gt;(static_cast&lt;int8_t&gt;(value)) &amp; mask));
+}
+
+void testSExt16(int32_t value)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt16, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, Trunc, Origin(),
+                root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0))));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == static_cast&lt;int32_t&gt;(static_cast&lt;int16_t&gt;(value)));
+}
+
+void testSExt16Fold(int32_t value)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt16, Origin(),
+            root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), value)));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc) == static_cast&lt;int32_t&gt;(static_cast&lt;int16_t&gt;(value)));
+}
+
+void testSExt16SExt16(int32_t value)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt16, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, SExt16, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0)))));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == static_cast&lt;int32_t&gt;(static_cast&lt;int16_t&gt;(value)));
+}
+
+void testSExt16SExt8(int32_t value)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt16, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, SExt8, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0)))));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == static_cast&lt;int32_t&gt;(static_cast&lt;int8_t&gt;(value)));
+}
+
+void testSExt16BitAnd(int32_t value, int32_t mask)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt16, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, BitAnd, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0)),
+                root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), mask))));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == static_cast&lt;int32_t&gt;(static_cast&lt;int16_t&gt;(value &amp; mask)));
+}
+
+void testBitAndSExt16(int32_t value, int32_t mask)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, BitAnd, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, SExt16, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0))),
+            root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), mask)));
+
+    CHECK(compileAndRun&lt;int32_t&gt;(proc, value) == (static_cast&lt;int32_t&gt;(static_cast&lt;int16_t&gt;(value)) &amp; mask));
+}
+
+void testSExt32BitAnd(int32_t value, int32_t mask)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, SExt32, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, BitAnd, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0)),
+                root-&gt;appendNew&lt;Const32Value&gt;(proc, Origin(), mask))));
+
+    CHECK(compileAndRun&lt;int64_t&gt;(proc, value) == static_cast&lt;int64_t&gt;(value &amp; mask));
+}
+
+void testBitAndSExt32(int32_t value, int64_t mask)
+{
+    Procedure proc;
+    BasicBlock* root = proc.addBlock();
+    root-&gt;appendNew&lt;ControlValue&gt;(
+        proc, Return, Origin(),
+        root-&gt;appendNew&lt;Value&gt;(
+            proc, BitAnd, Origin(),
+            root-&gt;appendNew&lt;Value&gt;(
+                proc, SExt32, Origin(),
+                root-&gt;appendNew&lt;Value&gt;(
+                    proc, Trunc, Origin(),
+                    root-&gt;appendNew&lt;ArgumentRegValue&gt;(proc, Origin(), GPRInfo::argumentGPR0))),
+            root-&gt;appendNew&lt;Const64Value&gt;(proc, Origin(), mask)));
+
+    CHECK(compileAndRun&lt;int64_t&gt;(proc, value) == (static_cast&lt;int64_t&gt;(value) &amp; mask));
+}
+
</ins><span class="cx"> void testBasicSelect()
</span><span class="cx"> {
</span><span class="cx">     Procedure proc;
</span><span class="lines">@@ -7937,6 +8334,12 @@
</span><span class="cx">     RUN(testBitAndArgImm(43, 0));
</span><span class="cx">     RUN(testBitAndArgImm(10, 3));
</span><span class="cx">     RUN(testBitAndArgImm(42, 0xffffffffffffffff));
</span><ins>+    RUN(testBitAndArgImm(42, 0xff));
+    RUN(testBitAndArgImm(300, 0xff));
+    RUN(testBitAndArgImm(-300, 0xff));
+    RUN(testBitAndArgImm(42, 0xffff));
+    RUN(testBitAndArgImm(40000, 0xffff));
+    RUN(testBitAndArgImm(-40000, 0xffff));
</ins><span class="cx">     RUN(testBitAndImmArg(43, 43));
</span><span class="cx">     RUN(testBitAndImmArg(43, 0));
</span><span class="cx">     RUN(testBitAndImmArg(10, 3));
</span><span class="lines">@@ -7967,6 +8370,12 @@
</span><span class="cx">     RUN(testBitAndImmArg32(43, 0));
</span><span class="cx">     RUN(testBitAndImmArg32(10, 3));
</span><span class="cx">     RUN(testBitAndImmArg32(42, 0xffffffff));
</span><ins>+    RUN(testBitAndImmArg32(42, 0xff));
+    RUN(testBitAndImmArg32(300, 0xff));
+    RUN(testBitAndImmArg32(-300, 0xff));
+    RUN(testBitAndImmArg32(42, 0xffff));
+    RUN(testBitAndImmArg32(40000, 0xffff));
+    RUN(testBitAndImmArg32(-40000, 0xffff));
</ins><span class="cx">     RUN(testBitAndBitAndArgImmImm32(2, 7, 3));
</span><span class="cx">     RUN(testBitAndBitAndArgImmImm32(1, 6, 6));
</span><span class="cx">     RUN(testBitAndBitAndArgImmImm32(0xffff, 24, 7));
</span><span class="lines">@@ -8346,6 +8755,8 @@
</span><span class="cx">     RUN(testSimpleCheck());
</span><span class="cx">     RUN(testCheckLessThan());
</span><span class="cx">     RUN(testCheckMegaCombo());
</span><ins>+    RUN(testCheckTwoMegaCombos());
+    RUN(testCheckTwoNonRedundantMegaCombos());
</ins><span class="cx">     RUN(testCheckAddImm());
</span><span class="cx">     RUN(testCheckAddImmCommute());
</span><span class="cx">     RUN(testCheckAddImmSomeRegister());
</span><span class="lines">@@ -8545,6 +8956,201 @@
</span><span class="cx">     RUN(testTruncSExt32(1000000000ll));
</span><span class="cx">     RUN(testTruncSExt32(-1000000000ll));
</span><span class="cx"> 
</span><ins>+    RUN(testSExt8(0));
+    RUN(testSExt8(1));
+    RUN(testSExt8(42));
+    RUN(testSExt8(-1));
+    RUN(testSExt8(0xff));
+    RUN(testSExt8(0x100));
+    RUN(testSExt8Fold(0));
+    RUN(testSExt8Fold(1));
+    RUN(testSExt8Fold(42));
+    RUN(testSExt8Fold(-1));
+    RUN(testSExt8Fold(0xff));
+    RUN(testSExt8Fold(0x100));
+    RUN(testSExt8SExt8(0));
+    RUN(testSExt8SExt8(1));
+    RUN(testSExt8SExt8(42));
+    RUN(testSExt8SExt8(-1));
+    RUN(testSExt8SExt8(0xff));
+    RUN(testSExt8SExt8(0x100));
+    RUN(testSExt8SExt16(0));
+    RUN(testSExt8SExt16(1));
+    RUN(testSExt8SExt16(42));
+    RUN(testSExt8SExt16(-1));
+    RUN(testSExt8SExt16(0xff));
+    RUN(testSExt8SExt16(0x100));
+    RUN(testSExt8SExt16(0xffff));
+    RUN(testSExt8SExt16(0x10000));
+    RUN(testSExt8BitAnd(0, 0));
+    RUN(testSExt8BitAnd(1, 0));
+    RUN(testSExt8BitAnd(42, 0));
+    RUN(testSExt8BitAnd(-1, 0));
+    RUN(testSExt8BitAnd(0xff, 0));
+    RUN(testSExt8BitAnd(0x100, 0));
+    RUN(testSExt8BitAnd(0xffff, 0));
+    RUN(testSExt8BitAnd(0x10000, 0));
+    RUN(testSExt8BitAnd(0, 0xf));
+    RUN(testSExt8BitAnd(1, 0xf));
+    RUN(testSExt8BitAnd(42, 0xf));
+    RUN(testSExt8BitAnd(-1, 0xf));
+    RUN(testSExt8BitAnd(0xff, 0xf));
+    RUN(testSExt8BitAnd(0x100, 0xf));
+    RUN(testSExt8BitAnd(0xffff, 0xf));
+    RUN(testSExt8BitAnd(0x10000, 0xf));
+    RUN(testSExt8BitAnd(0, 0xff));
+    RUN(testSExt8BitAnd(1, 0xff));
+    RUN(testSExt8BitAnd(42, 0xff));
+    RUN(testSExt8BitAnd(-1, 0xff));
+    RUN(testSExt8BitAnd(0xff, 0xff));
+    RUN(testSExt8BitAnd(0x100, 0xff));
+    RUN(testSExt8BitAnd(0xffff, 0xff));
+    RUN(testSExt8BitAnd(0x10000, 0xff));
+    RUN(testSExt8BitAnd(0, 0x80));
+    RUN(testSExt8BitAnd(1, 0x80));
+    RUN(testSExt8BitAnd(42, 0x80));
+    RUN(testSExt8BitAnd(-1, 0x80));
+    RUN(testSExt8BitAnd(0xff, 0x80));
+    RUN(testSExt8BitAnd(0x100, 0x80));
+    RUN(testSExt8BitAnd(0xffff, 0x80));
+    RUN(testSExt8BitAnd(0x10000, 0x80));
+    RUN(testBitAndSExt8(0, 0xf));
+    RUN(testBitAndSExt8(1, 0xf));
+    RUN(testBitAndSExt8(42, 0xf));
+    RUN(testBitAndSExt8(-1, 0xf));
+    RUN(testBitAndSExt8(0xff, 0xf));
+    RUN(testBitAndSExt8(0x100, 0xf));
+    RUN(testBitAndSExt8(0xffff, 0xf));
+    RUN(testBitAndSExt8(0x10000, 0xf));
+    RUN(testBitAndSExt8(0, 0xff));
+    RUN(testBitAndSExt8(1, 0xff));
+    RUN(testBitAndSExt8(42, 0xff));
+    RUN(testBitAndSExt8(-1, 0xff));
+    RUN(testBitAndSExt8(0xff, 0xff));
+    RUN(testBitAndSExt8(0x100, 0xff));
+    RUN(testBitAndSExt8(0xffff, 0xff));
+    RUN(testBitAndSExt8(0x10000, 0xff));
+    RUN(testBitAndSExt8(0, 0xfff));
+    RUN(testBitAndSExt8(1, 0xfff));
+    RUN(testBitAndSExt8(42, 0xfff));
+    RUN(testBitAndSExt8(-1, 0xfff));
+    RUN(testBitAndSExt8(0xff, 0xfff));
+    RUN(testBitAndSExt8(0x100, 0xfff));
+    RUN(testBitAndSExt8(0xffff, 0xfff));
+    RUN(testBitAndSExt8(0x10000, 0xfff));
+
+    RUN(testSExt16(0));
+    RUN(testSExt16(1));
+    RUN(testSExt16(42));
+    RUN(testSExt16(-1));
+    RUN(testSExt16(0xffff));
+    RUN(testSExt16(0x10000));
+    RUN(testSExt16Fold(0));
+    RUN(testSExt16Fold(1));
+    RUN(testSExt16Fold(42));
+    RUN(testSExt16Fold(-1));
+    RUN(testSExt16Fold(0xffff));
+    RUN(testSExt16Fold(0x10000));
+    RUN(testSExt16SExt8(0));
+    RUN(testSExt16SExt8(1));
+    RUN(testSExt16SExt8(42));
+    RUN(testSExt16SExt8(-1));
+    RUN(testSExt16SExt8(0xffff));
+    RUN(testSExt16SExt8(0x10000));
+    RUN(testSExt16SExt16(0));
+    RUN(testSExt16SExt16(1));
+    RUN(testSExt16SExt16(42));
+    RUN(testSExt16SExt16(-1));
+    RUN(testSExt16SExt16(0xffff));
+    RUN(testSExt16SExt16(0x10000));
+    RUN(testSExt16SExt16(0xffffff));
+    RUN(testSExt16SExt16(0x1000000));
+    RUN(testSExt16BitAnd(0, 0));
+    RUN(testSExt16BitAnd(1, 0));
+    RUN(testSExt16BitAnd(42, 0));
+    RUN(testSExt16BitAnd(-1, 0));
+    RUN(testSExt16BitAnd(0xffff, 0));
+    RUN(testSExt16BitAnd(0x10000, 0));
+    RUN(testSExt16BitAnd(0xffffff, 0));
+    RUN(testSExt16BitAnd(0x1000000, 0));
+    RUN(testSExt16BitAnd(0, 0xf));
+    RUN(testSExt16BitAnd(1, 0xf));
+    RUN(testSExt16BitAnd(42, 0xf));
+    RUN(testSExt16BitAnd(-1, 0xf));
+    RUN(testSExt16BitAnd(0xffff, 0xf));
+    RUN(testSExt16BitAnd(0x10000, 0xf));
+    RUN(testSExt16BitAnd(0xffffff, 0xf));
+    RUN(testSExt16BitAnd(0x1000000, 0xf));
+    RUN(testSExt16BitAnd(0, 0xffff));
+    RUN(testSExt16BitAnd(1, 0xffff));
+    RUN(testSExt16BitAnd(42, 0xffff));
+    RUN(testSExt16BitAnd(-1, 0xffff));
+    RUN(testSExt16BitAnd(0xffff, 0xffff));
+    RUN(testSExt16BitAnd(0x10000, 0xffff));
+    RUN(testSExt16BitAnd(0xffffff, 0xffff));
+    RUN(testSExt16BitAnd(0x1000000, 0xffff));
+    RUN(testSExt16BitAnd(0, 0x8000));
+    RUN(testSExt16BitAnd(1, 0x8000));
+    RUN(testSExt16BitAnd(42, 0x8000));
+    RUN(testSExt16BitAnd(-1, 0x8000));
+    RUN(testSExt16BitAnd(0xffff, 0x8000));
+    RUN(testSExt16BitAnd(0x10000, 0x8000));
+    RUN(testSExt16BitAnd(0xffffff, 0x8000));
+    RUN(testSExt16BitAnd(0x1000000, 0x8000));
+    RUN(testBitAndSExt16(0, 0xf));
+    RUN(testBitAndSExt16(1, 0xf));
+    RUN(testBitAndSExt16(42, 0xf));
+    RUN(testBitAndSExt16(-1, 0xf));
+    RUN(testBitAndSExt16(0xffff, 0xf));
+    RUN(testBitAndSExt16(0x10000, 0xf));
+    RUN(testBitAndSExt16(0xffffff, 0xf));
+    RUN(testBitAndSExt16(0x1000000, 0xf));
+    RUN(testBitAndSExt16(0, 0xffff));
+    RUN(testBitAndSExt16(1, 0xffff));
+    RUN(testBitAndSExt16(42, 0xffff));
+    RUN(testBitAndSExt16(-1, 0xffff));
+    RUN(testBitAndSExt16(0xffff, 0xffff));
+    RUN(testBitAndSExt16(0x10000, 0xffff));
+    RUN(testBitAndSExt16(0xffffff, 0xffff));
+    RUN(testBitAndSExt16(0x1000000, 0xffff));
+    RUN(testBitAndSExt16(0, 0xfffff));
+    RUN(testBitAndSExt16(1, 0xfffff));
+    RUN(testBitAndSExt16(42, 0xfffff));
+    RUN(testBitAndSExt16(-1, 0xfffff));
+    RUN(testBitAndSExt16(0xffff, 0xfffff));
+    RUN(testBitAndSExt16(0x10000, 0xfffff));
+    RUN(testBitAndSExt16(0xffffff, 0xfffff));
+    RUN(testBitAndSExt16(0x1000000, 0xfffff));
+
+    RUN(testSExt32BitAnd(0, 0));
+    RUN(testSExt32BitAnd(1, 0));
+    RUN(testSExt32BitAnd(42, 0));
+    RUN(testSExt32BitAnd(-1, 0));
+    RUN(testSExt32BitAnd(0x80000000, 0));
+    RUN(testSExt32BitAnd(0, 0xf));
+    RUN(testSExt32BitAnd(1, 0xf));
+    RUN(testSExt32BitAnd(42, 0xf));
+    RUN(testSExt32BitAnd(-1, 0xf));
+    RUN(testSExt32BitAnd(0x80000000, 0xf));
+    RUN(testSExt32BitAnd(0, 0x80000000));
+    RUN(testSExt32BitAnd(1, 0x80000000));
+    RUN(testSExt32BitAnd(42, 0x80000000));
+    RUN(testSExt32BitAnd(-1, 0x80000000));
+    RUN(testSExt32BitAnd(0x80000000, 0x80000000));
+    RUN(testBitAndSExt32(0, 0xf));
+    RUN(testBitAndSExt32(1, 0xf));
+    RUN(testBitAndSExt32(42, 0xf));
+    RUN(testBitAndSExt32(-1, 0xf));
+    RUN(testBitAndSExt32(0xffff, 0xf));
+    RUN(testBitAndSExt32(0x10000, 0xf));
+    RUN(testBitAndSExt32(0xffffff, 0xf));
+    RUN(testBitAndSExt32(0x1000000, 0xf));
+    RUN(testBitAndSExt32(0, 0xffff00000000llu));
+    RUN(testBitAndSExt32(1, 0xffff00000000llu));
+    RUN(testBitAndSExt32(42, 0xffff00000000llu));
+    RUN(testBitAndSExt32(-1, 0xffff00000000llu));
+    RUN(testBitAndSExt32(0x80000000, 0xffff00000000llu));
+
</ins><span class="cx">     RUN(testBasicSelect());
</span><span class="cx">     RUN(testSelectTest());
</span><span class="cx">     RUN(testSelectCompareDouble());
</span><span class="lines">@@ -8587,6 +9193,7 @@
</span><span class="cx"> 
</span><span class="cx">     for (ThreadIdentifier thread : threads)
</span><span class="cx">         waitForThreadCompletion(thread);
</span><ins>+    crashLock.lock();
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> } // anonymous namespace
</span></span></pre>
</div>
</div>

</body>
</html>