<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[243642] trunk/Source/JavaScriptCore</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/243642">243642</a></dd>
<dt>Author</dt> <dd>msaboff@apple.com</dd>
<dt>Date</dt> <dd>2019-03-28 23:05:55 -0700 (Thu, 28 Mar 2019)</dd>
</dl>

<h3>Log Message</h3>
<pre>[YARR] Precompute BMP / non-BMP status when constructing character classes
https://bugs.webkit.org/show_bug.cgi?id=196296

Reviewed by Keith Miller.

Changed CharacterClass::m_hasNonBMPCharacters into a character width bit field which
indicates whether the class includes characters from the BMP range, the non-BMP range,
or both.  This allows the recognizing code to eliminate checks for the width of a
matched character when the class has only one width.  The character width is needed to
determine whether we advance by 1 or 2 characters.  Also, the pre-computed width of
character classes that contain either all BMP or all non-BMP characters allows the parser to
use fixed widths for terms using those character classes.  Changed both the code gen
scripts and Yarr compiler to compute this bit field during the construction of
character classes.

For JIT'ed code of character classes that contain either all BMP or all non-BMP
characters, we can eliminate the generic check we were doing to compute how much
to advance after successfully matching a character in the class.

        Generic isBMP check      BMP only            non-BMP only
        --------------           --------------      --------------
        inc %r9d                 inc %r9d            add $0x2, %r9d
        cmp $0x10000, %eax
        jl isBMP
        cmp %edx, %esi
        jz atEndOfString
        inc %r9d
        inc %esi
 isBMP:

For character classes that contained non-BMP characters, we were always generating
the code in the left column.  The middle column is the code we generate for character
classes that contain only BMP characters.  The right column is the code we now
generate if the character class has only non-BMP characters.  In the fixed-width cases,
we can eliminate both the isBMP check and the atEndOfString check.  The
atEndOfString check is eliminated since we know how many characters this character
class requires and that check can be factored out to the beginning of the current
alternative.  For character classes that contain both BMP and non-BMP characters,
we still generate the generic left column.

This change is a ~8% perf progression on UniPoker and a ~2% improvement on RexBench
as a whole.

* runtime/RegExp.cpp:
(JSC::RegExp::matchCompareWithInterpreter):
* runtime/RegExpInlines.h:
(JSC::RegExp::matchInline):
* yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::checkCharacterClassDontAdvanceInputForNonBMP):
(JSC::Yarr::Interpreter::matchCharacterClass):
* yarr/YarrJIT.cpp:
(JSC::Yarr::YarrGenerator::optimizeAlternative):
(JSC::Yarr::YarrGenerator::matchCharacterClass):
(JSC::Yarr::YarrGenerator::advanceIndexAfterCharacterClassTermMatch):
(JSC::Yarr::YarrGenerator::tryReadUnicodeCharImpl):
(JSC::Yarr::YarrGenerator::generateCharacterClassOnce):
(JSC::Yarr::YarrGenerator::generateCharacterClassFixed):
(JSC::Yarr::YarrGenerator::generateCharacterClassGreedy):
(JSC::Yarr::YarrGenerator::backtrackCharacterClassGreedy):
(JSC::Yarr::YarrGenerator::generateCharacterClassNonGreedy):
(JSC::Yarr::YarrGenerator::backtrackCharacterClassNonGreedy):
(JSC::Yarr::YarrGenerator::generateEnter):
(JSC::Yarr::YarrGenerator::YarrGenerator):
(JSC::Yarr::YarrGenerator::compile):
* yarr/YarrPattern.cpp:
(JSC::Yarr::CharacterClassConstructor::CharacterClassConstructor):
(JSC::Yarr::CharacterClassConstructor::reset):
(JSC::Yarr::CharacterClassConstructor::charClass):
(JSC::Yarr::CharacterClassConstructor::addSorted):
(JSC::Yarr::CharacterClassConstructor::addSortedRange):
(JSC::Yarr::CharacterClassConstructor::hasNonBMPCharacters):
(JSC::Yarr::CharacterClassConstructor::characterWidths):
(JSC::Yarr::PatternTerm::dump):
(JSC::Yarr::anycharCreate):
* yarr/YarrPattern.h:
(JSC::Yarr::operator|):
(JSC::Yarr::operator&):
(JSC::Yarr::operator|=):
(JSC::Yarr::CharacterClass::CharacterClass):
(JSC::Yarr::CharacterClass::hasNonBMPCharacters):
(JSC::Yarr::CharacterClass::hasOneCharacterSize):
(JSC::Yarr::CharacterClass::hasOnlyNonBMPCharacters):
(JSC::Yarr::PatternTerm::invert const):
(JSC::Yarr::PatternTerm::invert): Deleted.
* yarr/create_regex_tables:
* yarr/generateYarrUnicodePropertyTables.py:</pre>
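<p>The width bit field described in the log message can be sketched roughly as follows. This is an illustration only, not the code from this changeset: the enum name <code>CharacterWidths</code>, its enumerators, and the helpers <code>widthsOf</code> and <code>advanceAfterMatch</code> are hypothetical names chosen for the example, and the real logic lives in JIT-generated code rather than in a C++ helper.</p>
<pre>
```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical width bit field: one bit per character range a class can contain.
enum class CharacterWidths : uint8_t {
    None = 0,
    BMP = 1 << 0,     // class contains at least one code point below 0x10000
    NonBMP = 1 << 1,  // class contains at least one code point at or above 0x10000
    Both = BMP | NonBMP,
};

constexpr CharacterWidths operator|(CharacterWidths a, CharacterWidths b)
{
    return static_cast<CharacterWidths>(static_cast<uint8_t>(a) | static_cast<uint8_t>(b));
}

constexpr CharacterWidths operator&(CharacterWidths a, CharacterWidths b)
{
    return static_cast<CharacterWidths>(static_cast<uint8_t>(a) & static_cast<uint8_t>(b));
}

inline CharacterWidths& operator|=(CharacterWidths& a, CharacterWidths b)
{
    a = a | b;
    return a;
}

// Accumulate the width bits while the class is being built (assumed helper).
inline CharacterWidths widthsOf(std::initializer_list<char32_t> members)
{
    CharacterWidths widths = CharacterWidths::None;
    for (char32_t ch : members)
        widths |= ch < 0x10000 ? CharacterWidths::BMP : CharacterWidths::NonBMP;
    return widths;
}

// True when every member of the class occupies the same number of UTF-16 code units.
constexpr bool hasOneCharacterSize(CharacterWidths widths)
{
    return widths == CharacterWidths::BMP || widths == CharacterWidths::NonBMP;
}

// Number of UTF-16 code units to advance after a successful match.
inline unsigned advanceAfterMatch(CharacterWidths widths, char32_t matched)
{
    if (hasOneCharacterSize(widths))
        return widths == CharacterWidths::NonBMP ? 2 : 1; // fixed width, no per-character test
    return matched < 0x10000 ? 1 : 2;                     // generic isBMP check
}
```
</pre>
<p>With a single-width class the advance amount is known when the pattern is compiled, which corresponds to the middle and right columns of the table above; only a mixed class needs the per-character comparison from the left column.</p>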

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreruntimeRegExpcpp">trunk/Source/JavaScriptCore/runtime/RegExp.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreruntimeRegExpInlinesh">trunk/Source/JavaScriptCore/runtime/RegExpInlines.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrYarrInterpretercpp">trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrYarrJITcpp">trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrYarrPatterncpp">trunk/Source/JavaScriptCore/yarr/YarrPattern.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrYarrPatternh">trunk/Source/JavaScriptCore/yarr/YarrPattern.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrcreate_regex_tables">trunk/Source/JavaScriptCore/yarr/create_regex_tables</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrgenerateYarrUnicodePropertyTablespy">trunk/Source/JavaScriptCore/yarr/generateYarrUnicodePropertyTables.py</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog    2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/ChangeLog       2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -1,3 +1,92 @@
</span><ins>+2019-03-28  Michael Saboff  <msaboff@apple.com>
+
+        [YARR] Precompute BMP / non-BMP status when constructing character classes
+        https://bugs.webkit.org/show_bug.cgi?id=196296
+
+        Reviewed by Keith Miller.
+
+        Changed CharacterClass::m_hasNonBMPCharacters into a character width bit field which
+        indicates whether the class includes characters from the BMP range, the non-BMP range,
+        or both.  This allows the recognizing code to eliminate checks for the width of a
+        matched character when the class has only one width.  The character width is needed to
+        determine whether we advance by 1 or 2 characters.  Also, the pre-computed width of
+        character classes that contain either all BMP or all non-BMP characters allows the parser to
+        use fixed widths for terms using those character classes.  Changed both the code gen
+        scripts and Yarr compiler to compute this bit field during the construction of
+        character classes.
+
+        For JIT'ed code of character classes that contain either all BMP or all non-BMP
+        characters, we can eliminate the generic check we were doing to compute how much
+        to advance after successfully matching a character in the class.
+
+                Generic isBMP check      BMP only            non-BMP only
+                --------------           --------------      --------------
+                inc %r9d                 inc %r9d            add $0x2, %r9d
+                cmp $0x10000, %eax
+                jl isBMP
+                cmp %edx, %esi
+                jz atEndOfString
+                inc %r9d
+                inc %esi
+         isBMP:
+
+        For character classes that contained non-BMP characters, we were always generating
+        the code in the left column.  The middle column is the code we generate for character
+        classes that contain only BMP characters.  The right column is the code we now
+        generate if the character class has only non-BMP characters.  In the fixed-width cases,
+        we can eliminate both the isBMP check and the atEndOfString check.  The
+        atEndOfString check is eliminated since we know how many characters this character
+        class requires and that check can be factored out to the beginning of the current
+        alternative.  For character classes that contain both BMP and non-BMP characters,
+        we still generate the generic left column.
+
+        This change is a ~8% perf progression on UniPoker and a ~2% improvement on RexBench
+        as a whole.
+
+        * runtime/RegExp.cpp:
+        (JSC::RegExp::matchCompareWithInterpreter):
+        * runtime/RegExpInlines.h:
+        (JSC::RegExp::matchInline):
+        * yarr/YarrInterpreter.cpp:
+        (JSC::Yarr::Interpreter::checkCharacterClassDontAdvanceInputForNonBMP):
+        (JSC::Yarr::Interpreter::matchCharacterClass):
+        * yarr/YarrJIT.cpp:
+        (JSC::Yarr::YarrGenerator::optimizeAlternative):
+        (JSC::Yarr::YarrGenerator::matchCharacterClass):
+        (JSC::Yarr::YarrGenerator::advanceIndexAfterCharacterClassTermMatch):
+        (JSC::Yarr::YarrGenerator::tryReadUnicodeCharImpl):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassOnce):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassFixed):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassGreedy):
+        (JSC::Yarr::YarrGenerator::backtrackCharacterClassGreedy):
+        (JSC::Yarr::YarrGenerator::generateCharacterClassNonGreedy):
+        (JSC::Yarr::YarrGenerator::backtrackCharacterClassNonGreedy):
+        (JSC::Yarr::YarrGenerator::generateEnter):
+        (JSC::Yarr::YarrGenerator::YarrGenerator):
+        (JSC::Yarr::YarrGenerator::compile):
+        * yarr/YarrPattern.cpp:
+        (JSC::Yarr::CharacterClassConstructor::CharacterClassConstructor):
+        (JSC::Yarr::CharacterClassConstructor::reset):
+        (JSC::Yarr::CharacterClassConstructor::charClass):
+        (JSC::Yarr::CharacterClassConstructor::addSorted):
+        (JSC::Yarr::CharacterClassConstructor::addSortedRange):
+        (JSC::Yarr::CharacterClassConstructor::hasNonBMPCharacters):
+        (JSC::Yarr::CharacterClassConstructor::characterWidths):
+        (JSC::Yarr::PatternTerm::dump):
+        (JSC::Yarr::anycharCreate):
+        * yarr/YarrPattern.h:
+        (JSC::Yarr::operator|):
+        (JSC::Yarr::operator&):
+        (JSC::Yarr::operator|=):
+        (JSC::Yarr::CharacterClass::CharacterClass):
+        (JSC::Yarr::CharacterClass::hasNonBMPCharacters):
+        (JSC::Yarr::CharacterClass::hasOneCharacterSize):
+        (JSC::Yarr::CharacterClass::hasOnlyNonBMPCharacters):
+        (JSC::Yarr::PatternTerm::invert const):
+        (JSC::Yarr::PatternTerm::invert): Deleted.
+        * yarr/create_regex_tables:
+        * yarr/generateYarrUnicodePropertyTables.py:
+
</ins><span class="cx"> 2019-03-28  Saam Barati  <sbarati@apple.com>
</span><span class="cx"> 
</span><span class="cx">         BackwardsGraph needs to consider back edges as the backward's root successor
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreruntimeRegExpcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/runtime/RegExp.cpp (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/runtime/RegExp.cpp   2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/runtime/RegExp.cpp      2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -385,7 +385,7 @@
</span><span class="cx">     for (unsigned j = 0, i = 0; i < m_numSubpatterns + 1; j += 2, i++)
</span><span class="cx">         interpreterOffsetVector[j] = -1;
</span><span class="cx"> 
</span><del>-    interpreterResult = Yarr::interpret(m_regExpBytecode.get(), s, startOffset, interpreterOffsetVector);
</del><ins>+    interpreterResult = Yarr::interpret(m_regExpBytecode.get(), s, startOffset, reinterpret_cast<unsigned*>(interpreterOffsetVector));
</ins><span class="cx"> 
</span><span class="cx">     if (jitResult != interpreterResult)
</span><span class="cx">         differences++;
</span><span class="lines">@@ -402,7 +402,7 @@
</span><span class="cx">         dataLogF((segmentLen < 150) ? "\"%s\"\n" : "\"%148s...\"\n", s.utf8().data() + startOffset);
</span><span class="cx"> 
</span><span class="cx">         if (jitResult != interpreterResult) {
</span><del>-            dataLogF("    JIT result = %d, blah interpreted result = %d\n", jitResult, interpreterResult);
</del><ins>+            dataLogF("    JIT result = %d, interpreted result = %d\n", jitResult, interpreterResult);
</ins><span class="cx">             differences--;
</span><span class="cx">         } else {
</span><span class="cx">             dataLogF("    Correct result = %d\n", jitResult);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreruntimeRegExpInlinesh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/runtime/RegExpInlines.h (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/runtime/RegExpInlines.h      2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/runtime/RegExpInlines.h 2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -181,7 +181,10 @@
</span><span class="cx">         }
</span><span class="cx"> 
</span><span class="cx"> #if ENABLE(YARR_JIT_DEBUG)
</span><del>-        matchCompareWithInterpreter(s, startOffset, offsetVector, result);
</del><ins>+        if (m_state == JITCode) {
+            byteCodeCompileIfNecessary(&vm);
+            matchCompareWithInterpreter(s, startOffset, offsetVector, result);
+        }
</ins><span class="cx"> #endif
</span><span class="cx">     } else
</span><span class="cx"> #endif
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreyarrYarrInterpretercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp     2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp        2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -428,6 +428,12 @@
</span><span class="cx">         bool match = testCharacterClass(characterClass, input.readChecked(negativeInputOffset));
</span><span class="cx">         return invert ? !match : match;
</span><span class="cx">     }
</span><ins>+    
+    bool checkCharacterClassDontAdvanceInputForNonBMP(CharacterClass* characterClass, unsigned negativeInputOffset)
+    {
+        int readCharacter = characterClass->hasOnlyNonBMPCharacters() ? input.readSurrogatePairChecked(negativeInputOffset) :  input.readChecked(negativeInputOffset);
+        return testCharacterClass(characterClass, readCharacter);
+    }
</ins><span class="cx"> 
</span><span class="cx">     bool tryConsumeBackReference(int matchBegin, int matchEnd, unsigned negativeInputOffset)
</span><span class="cx">     {
</span><span class="lines">@@ -558,12 +564,21 @@
</span><span class="cx">         switch (term.atom.quantityType) {
</span><span class="cx">         case QuantifierFixedCount: {
</span><span class="cx">             if (unicode) {
</span><ins>+                CharacterClass* charClass = term.atom.characterClass;
</ins><span class="cx">                 backTrack->begin = input.getPos();
</span><span class="cx">                 unsigned matchAmount = 0;
</span><span class="cx">                 for (matchAmount = 0; matchAmount < term.atom.quantityMaxCount; ++matchAmount) {
</span><del>-                    if (!checkCharacterClass(term.atom.characterClass, term.invert(), term.inputPosition - matchAmount)) {
-                        input.setPos(backTrack->begin);
-                        return false;
</del><ins>+                    if (term.invert()) {
+                        if (!checkCharacterClass(charClass, term.invert(), term.inputPosition - matchAmount)) {
+                            input.setPos(backTrack->begin);
+                            return false;
+                        }
+                    } else {
+                        unsigned matchOffset = matchAmount * (charClass->hasOnlyNonBMPCharacters() ? 2 : 1);
+                        if (!checkCharacterClassDontAdvanceInputForNonBMP(charClass, term.inputPosition - matchOffset)) {
+                            input.setPos(backTrack->begin);
+                            return false;
+                        }
</ins><span class="cx">                     }
</span><span class="cx">                 }
</span><span class="cx"> 
</span></span></pre></div>
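<p>The fixed-count change to the interpreter above scales the per-iteration offset by 2 when the class holds only non-BMP characters. A minimal standalone sketch of that idea follows, assuming a UTF-16 <code>std::u16string</code> input and forward offsets from a start position (the interpreter itself uses negative offsets from an already-advanced input position). The names <code>matchFixedCount</code>, <code>isEmoticon</code>, and <code>isLowerAscii</code> are hypothetical and do not appear in the changeset.</p>
<pre>
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical class predicates used only for this example.
inline bool isEmoticon(char32_t ch) { return ch >= 0x1F600 && ch <= 0x1F64F; }
inline bool isLowerAscii(char32_t ch) { return ch >= U'a' && ch <= U'z'; }

// Match exactly `count` characters of a single-width class starting at `start`.
// `onlyNonBMP` plays the role of hasOnlyNonBMPCharacters() in the diff above.
inline bool matchFixedCount(const std::u16string& input, std::size_t start, unsigned count,
                            bool onlyNonBMP, bool (*inClass)(char32_t))
{
    const std::size_t unitsPerChar = onlyNonBMP ? 2 : 1;

    // Because the width is fixed, the length check is done once up front
    // instead of once per matched character.
    if (start > input.size() || input.size() - start < count * unitsPerChar)
        return false;

    for (unsigned i = 0; i < count; ++i) {
        const std::size_t pos = start + i * unitsPerChar; // offset scaled by the fixed width
        char32_t ch = input[pos];
        if (onlyNonBMP) {
            const char16_t lead = input[pos];
            const char16_t trail = input[pos + 1];
            // Both halves must carry the surrogate tags (0xd800 lead, 0xdc00 trail).
            if ((lead & 0xfc00) != 0xd800 || (trail & 0xfc00) != 0xdc00)
                return false;
            ch = 0x10000 + ((static_cast<char32_t>(lead) - 0xd800) << 10) + (trail - 0xdc00);
        }
        if (!inClass(ch))
            return false;
    }
    return true;
}
```
</pre>
<p>As in the diff, this shortcut only applies to non-inverted classes: an inverted class can match characters of either width, so it still needs the generic per-character path.</p>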
<a id="trunkSourceJavaScriptCoreyarrYarrJITcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp     2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp        2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -72,13 +72,14 @@
</span><span class="cx">     static const RegisterID regUnicodeInputAndTrail = ARM64Registers::x10;
</span><span class="cx">     static const RegisterID initialStart = ARM64Registers::x11;
</span><span class="cx">     static const RegisterID supplementaryPlanesBase = ARM64Registers::x12;
</span><del>-    static const RegisterID surrogateTagMask = ARM64Registers::x13;
-    static const RegisterID leadingSurrogateTag = ARM64Registers::x14;
-    static const RegisterID trailingSurrogateTag = ARM64Registers::x15;
</del><ins>+    static const RegisterID leadingSurrogateTag = ARM64Registers::x13;
+    static const RegisterID trailingSurrogateTag = ARM64Registers::x14;
+    static const RegisterID endOfStringAddress = ARM64Registers::x15;
</ins><span class="cx"> 
</span><span class="cx">     static const RegisterID returnRegister = ARM64Registers::x0;
</span><span class="cx">     static const RegisterID returnRegister2 = ARM64Registers::x1;
</span><span class="cx"> 
</span><ins>+    const TrustedImm32 surrogateTagMask = TrustedImm32(0xfffffc00);
</ins><span class="cx"> #define HAVE_INITIAL_START_REG
</span><span class="cx"> #define JIT_UNICODE_EXPRESSIONS
</span><span class="cx"> #elif CPU(MIPS)
</span><span class="lines">@@ -143,12 +144,13 @@
</span><span class="cx"> #endif
</span><span class="cx">     static const RegisterID regUnicodeInputAndTrail = X86Registers::r13;
</span><span class="cx">     static const RegisterID leadingSurrogateTag = X86Registers::r14;
</span><del>-    static const RegisterID trailingSurrogateTag = X86Registers::r15;
</del><ins>+    static const RegisterID endOfStringAddress = X86Registers::r15;
</ins><span class="cx"> 
</span><span class="cx">     static const RegisterID returnRegister = X86Registers::eax;
</span><span class="cx">     static const RegisterID returnRegister2 = X86Registers::edx;
</span><span class="cx"> 
</span><span class="cx">     const TrustedImm32 supplementaryPlanesBase = TrustedImm32(0x10000);
</span><ins>+    const TrustedImm32 trailingSurrogateTag = TrustedImm32(0xdc00);
</ins><span class="cx">     const TrustedImm32 surrogateTagMask = TrustedImm32(0xfffffc00);
</span><span class="cx"> #define HAVE_INITIAL_START_REG
</span><span class="cx"> #define JIT_UNICODE_EXPRESSIONS
</span><span class="lines">@@ -319,7 +321,7 @@
</span><span class="cx">             // We can move BMP only character classes after fixed character terms.
</span><span class="cx">             if ((term.type == PatternTerm::TypeCharacterClass)
</span><span class="cx">                 && (term.quantityType == QuantifierFixedCount)
</span><del>-                && (!m_decodeSurrogatePairs || (!term.characterClass->m_hasNonBMPCharacters && !term.m_invert))
</del><ins>+                && (!m_decodeSurrogatePairs || (term.characterClass->hasOneCharacterSize() && !term.m_invert))
</ins><span class="cx">                 && (nextTerm.type == PatternTerm::TypePatternCharacter)
</span><span class="cx">                 && (nextTerm.quantityType == QuantifierFixedCount)) {
</span><span class="cx">                 PatternTerm termCopy = term;
</span><span class="lines">@@ -383,6 +385,7 @@
</span><span class="cx">             matchDest.append(branchTest8(charClass->m_tableInverted ? Zero : NonZero, tableEntry));
</span><span class="cx">             return;
</span><span class="cx">         }
</span><ins>+
</ins><span class="cx">         JumpList unicodeFail;
</span><span class="cx">         if (charClass->m_matchesUnicode.size() || charClass->m_rangesUnicode.size()) {
</span><span class="cx">             JumpList isAscii;
</span><span class="lines">@@ -448,6 +451,23 @@
</span><span class="cx">             unicodeFail.link(this);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+#ifdef JIT_UNICODE_EXPRESSIONS
+    void advanceIndexAfterCharacterClassTermMatch(const PatternTerm* term, JumpList& failures, const RegisterID character)
+    {
+        ASSERT(term->type == PatternTerm::TypeCharacterClass);
+
+        if (term->characterClass->hasOneCharacterSize() && !term->invert())
+            add32(TrustedImm32(term->characterClass->hasNonBMPCharacters() ? 2 : 1), index);
+        else {
+            add32(TrustedImm32(1), index);
+            failures.append(atEndOfInput());
+            Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
+            add32(TrustedImm32(1), index);
+            isBMPChar.link(this);
+        }
+    }
+#endif
+
</ins><span class="cx">     // Jumps if input not available; will have (incorrectly) incremented already!
</span><span class="cx">     Jump jumpIfNoAvailableInput(unsigned countToCheck = 0)
</span><span class="cx">     {
</span><span class="lines">@@ -520,12 +540,12 @@
</span><span class="cx">         ASSERT(m_charSize == Char16);
</span><span class="cx"> 
</span><span class="cx">         JumpList notUnicode;
</span><ins>+
</ins><span class="cx">         load16Unaligned(regUnicodeInputAndTrail, resultReg);
</span><span class="cx">         and32(surrogateTagMask, resultReg, regT2);
</span><span class="cx">         notUnicode.append(branch32(NotEqual, regT2, leadingSurrogateTag));
</span><span class="cx">         addPtr(TrustedImm32(2), regUnicodeInputAndTrail);
</span><del>-        getEffectiveAddress(BaseIndex(input, length, TimesTwo), regT2);
-        notUnicode.append(branch32(AboveOrEqual, regUnicodeInputAndTrail, regT2));
</del><ins>+        notUnicode.append(branchPtr(AboveOrEqual, regUnicodeInputAndTrail, endOfStringAddress));
</ins><span class="cx">         load16Unaligned(Address(regUnicodeInputAndTrail), regUnicodeInputAndTrail);
</span><span class="cx">         and32(surrogateTagMask, regUnicodeInputAndTrail, regT2);
</span><span class="cx">         notUnicode.append(branch32(NotEqual, regT2, trailingSurrogateTag));
</span><span class="lines">@@ -1734,7 +1754,7 @@
</span><span class="cx">             }
</span><span class="cx">         }
</span><span class="cx"> #ifdef JIT_UNICODE_EXPRESSIONS
</span><del>-        if (m_decodeSurrogatePairs) {
</del><ins>+        if (m_decodeSurrogatePairs && (!term->characterClass->hasOneCharacterSize() || term->invert())) {
</ins><span class="cx">             Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
</span><span class="cx">             add32(TrustedImm32(1), index);
</span><span class="cx">             isBMPChar.link(this);
</span><span class="lines">@@ -1768,11 +1788,18 @@
</span><span class="cx">             op.m_jumps.append(jumpIfNoAvailableInput());
</span><span class="cx"> 
</span><span class="cx">         move(index, countRegister);
</span><del>-        sub32(Imm32(term->quantityMaxCount.unsafeGet()), countRegister);
</del><span class="cx"> 
</span><ins>+        Checked<unsigned> scaledMaxCount = term->quantityMaxCount;
+
+#ifdef JIT_UNICODE_EXPRESSIONS
+        if (m_decodeSurrogatePairs && term->characterClass->hasOnlyNonBMPCharacters() && !term->invert())
+            scaledMaxCount *= 2;
+#endif
+        sub32(Imm32(scaledMaxCount.unsafeGet()), countRegister);
+
</ins><span class="cx">         Label loop(this);
</span><span class="cx">         JumpList matchDest;
</span><del>-        readCharacter(m_checkedOffset - term->inputPosition - term->quantityMaxCount, character, countRegister);
</del><ins>+        readCharacter(m_checkedOffset - term->inputPosition - scaledMaxCount, character, countRegister);
</ins><span class="cx">         // If we are matching the "any character" builtin class we only need to read the
</span><span class="cx">         // character and don't need to match as it will always succeed.
</span><span class="cx">         if (term->invert() || !term->characterClass->m_anyCharacter) {
</span><span class="lines">@@ -1786,16 +1813,21 @@
</span><span class="cx">             }
</span><span class="cx">         }
</span><span class="cx"> 
</span><del>-        add32(TrustedImm32(1), countRegister);
</del><span class="cx"> #ifdef JIT_UNICODE_EXPRESSIONS
</span><span class="cx">         if (m_decodeSurrogatePairs) {
</span><del>-            Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
-            op.m_jumps.append(atEndOfInput());
</del><ins>+            if (term->characterClass->hasOneCharacterSize() && !term->invert())
+                add32(TrustedImm32(term->characterClass->hasNonBMPCharacters() ? 2 : 1), countRegister);
+            else {
+                add32(TrustedImm32(1), countRegister);
+                Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
+                op.m_jumps.append(atEndOfInput());
+                add32(TrustedImm32(1), countRegister);
+                add32(TrustedImm32(1), index);
+                isBMPChar.link(this);
+            }
+        } else
+#endif
</ins><span class="cx">             add32(TrustedImm32(1), countRegister);
</span><del>-            add32(TrustedImm32(1), index);
-            isBMPChar.link(this);
-        }
-#endif
</del><span class="cx">         branch32(NotEqual, countRegister, index).linkTo(loop, this);
</span><span class="cx">     }
</span><span class="cx">     void backtrackCharacterClassFixed(size_t opIndex)
</span><span class="lines">@@ -1811,7 +1843,7 @@
</span><span class="cx">         const RegisterID character = regT0;
</span><span class="cx">         const RegisterID countRegister = regT1;
</span><span class="cx"> 
</span><del>-        if (m_decodeSurrogatePairs)
</del><ins>+        if (m_decodeSurrogatePairs && (!term->characterClass->hasOneCharacterSize() || term->invert()))
</ins><span class="cx">             storeToFrame(index, term->frameLocation + BackTrackInfoCharacterClass::beginIndex());
</span><span class="cx">         move(TrustedImm32(0), countRegister);
</span><span class="cx"> 
</span><span class="lines">@@ -1825,8 +1857,8 @@
</span><span class="cx">         } else {
</span><span class="cx">             JumpList matchDest;
</span><span class="cx">             readCharacter(m_checkedOffset - term->inputPosition, character);
</span><del>-            // If we are matching the "any character" builtin class we only need to read the
-            // character and don't need to match as it will always succeed.
</del><ins>+            // If we are matching the "any character" builtin class for non-unicode patterns,
+            // we only need to read the character and don't need to match as it will always succeed.
</ins><span class="cx">             if (!term->characterClass->m_anyCharacter) {
</span><span class="cx">                 matchCharacterClass(character, matchDest, term->characterClass);
</span><span class="cx">                 failures.append(jump());
</span><span class="lines">@@ -1834,15 +1866,12 @@
</span><span class="cx">             matchDest.link(this);
</span><span class="cx">         }
</span><span class="cx"> 
</span><del>-        add32(TrustedImm32(1), index);
</del><span class="cx"> #ifdef JIT_UNICODE_EXPRESSIONS
</span><del>-        if (m_decodeSurrogatePairs) {
-            failures.append(atEndOfInput());
-            Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
</del><ins>+        if (m_decodeSurrogatePairs)
+            advanceIndexAfterCharacterClassTermMatch(term, failures, character);
+        else
+#endif
</ins><span class="cx">             add32(TrustedImm32(1), index);
</span><del>-            isBMPChar.link(this);
-        }
-#endif
</del><span class="cx">         add32(TrustedImm32(1), countRegister);
</span><span class="cx"> 
</span><span class="cx">         if (term->quantityMaxCount != quantifyInfinite) {
</span><span class="lines">@@ -1868,14 +1897,17 @@
</span><span class="cx">         loadFromFrame(term->frameLocation + BackTrackInfoCharacterClass::matchAmountIndex(), countRegister);
</span><span class="cx">         m_backtrackingState.append(branchTest32(Zero, countRegister));
</span><span class="cx">         sub32(TrustedImm32(1), countRegister);
</span><ins>+        storeToFrame(countRegister, term->frameLocation + BackTrackInfoCharacterClass::matchAmountIndex());
+
</ins><span class="cx">         if (!m_decodeSurrogatePairs)
</span><span class="cx">             sub32(TrustedImm32(1), index);
</span><ins>+        else if (term->characterClass->hasOneCharacterSize() && !term->invert())
+            sub32(TrustedImm32(term->characterClass->hasNonBMPCharacters() ? 2 : 1), index);
</ins><span class="cx">         else {
</span><ins>+            // Rematch one less
</ins><span class="cx">             const RegisterID character = regT0;
</span><span class="cx"> 
</span><span class="cx">             loadFromFrame(term->frameLocation + BackTrackInfoCharacterClass::beginIndex(), index);
</span><del>-            // Rematch one less
-            storeToFrame(countRegister, term->frameLocation + BackTrackInfoCharacterClass::matchAmountIndex());
</del><span class="cx"> 
</span><span class="cx">             Label rematchLoop(this);
</span><span class="cx">             readCharacter(m_checkedOffset - term->inputPosition, character);
</span><span class="lines">@@ -1905,9 +1937,11 @@
</span><span class="cx"> 
</span><span class="cx">         move(TrustedImm32(0), countRegister);
</span><span class="cx">         op.m_reentry = label();
</span><del>-        if (m_decodeSurrogatePairs)
-            storeToFrame(index, term->frameLocation + BackTrackInfoCharacterClass::beginIndex());
-        storeToFrame(countRegister, term->frameLocation + BackTrackInfoCharacterClass::matchAmountIndex());
</del><ins>+        if (m_decodeSurrogatePairs) {
+            if (!term->characterClass->hasOneCharacterSize() || term->invert())
+                storeToFrame(index, term->frameLocation + BackTrackInfoCharacterClass::beginIndex());
+            storeToFrame(countRegister, term->frameLocation + BackTrackInfoCharacterClass::matchAmountIndex());
+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void backtrackCharacterClassNonGreedy(size_t opIndex)
</span><span class="lines">@@ -1922,9 +1956,11 @@
</span><span class="cx"> 
</span><span class="cx">         m_backtrackingState.link(this);
</span><span class="cx"> 
</span><del>-        if (m_decodeSurrogatePairs)
-            loadFromFrame(term->frameLocation + BackTrackInfoCharacterClass::beginIndex(), index);
-        loadFromFrame(term->frameLocation + BackTrackInfoCharacterClass::matchAmountIndex(), countRegister);
</del><ins>+        if (m_decodeSurrogatePairs) {
+            if (!term->characterClass->hasOneCharacterSize() || term->invert())
+                loadFromFrame(term->frameLocation + BackTrackInfoCharacterClass::beginIndex(), index);
+            loadFromFrame(term->frameLocation + BackTrackInfoCharacterClass::matchAmountIndex(), countRegister);
+        }
</ins><span class="cx"> 
</span><span class="cx">         nonGreedyFailures.append(atEndOfInput());
</span><span class="cx">         nonGreedyFailures.append(branch32(Equal, countRegister, Imm32(term->quantityMaxCount.unsafeGet())));
</span><span class="lines">@@ -1931,8 +1967,8 @@
</span><span class="cx"> 
</span><span class="cx">         JumpList matchDest;
</span><span class="cx">         readCharacter(m_checkedOffset - term->inputPosition, character);
</span><del>-        // If we are matching the "any character" builtin class we only need to read the
-        // character and don't need to match as it will always succeed.
</del><ins>+        // If we are matching the "any character" builtin class for non-unicode patterns,
+        // we only need to read the character and don't need to match as it will always succeed.
</ins><span class="cx">         if (term->invert() || !term->characterClass->m_anyCharacter) {
</span><span class="cx">             matchCharacterClass(character, matchDest, term->characterClass);
</span><span class="cx"> 
</span><span class="lines">@@ -1944,15 +1980,12 @@
</span><span class="cx">             }
</span><span class="cx">         }
</span><span class="cx"> 
</span><del>-        add32(TrustedImm32(1), index);
</del><span class="cx"> #ifdef JIT_UNICODE_EXPRESSIONS
</span><del>-        if (m_decodeSurrogatePairs) {
-            nonGreedyFailures.append(atEndOfInput());
-            Jump isBMPChar = branch32(LessThan, character, supplementaryPlanesBase);
</del><ins>+        if (m_decodeSurrogatePairs)
+            advanceIndexAfterCharacterClassTermMatch(term, nonGreedyFailures, character);
+        else
+#endif
</ins><span class="cx">             add32(TrustedImm32(1), index);
</span><del>-            isBMPChar.link(this);
-        }
-#endif
</del><span class="cx">         add32(TrustedImm32(1), countRegister);
</span><span class="cx"> 
</span><span class="cx">         jump(op.m_reentry);
</span><span class="lines">@@ -3700,7 +3733,6 @@
</span><span class="cx">             push(X86Registers::r15);
</span><span class="cx"> 
</span><span class="cx">             move(TrustedImm32(0xd800), leadingSurrogateTag);
</span><del>-            move(TrustedImm32(0xdc00), trailingSurrogateTag);
</del><span class="cx">         }
</span><span class="cx">         // The ABI doesn't guarantee the upper bits are zero on unsigned arguments, so clear them ourselves.
</span><span class="cx">         zeroExtend32ToPtr(index, index);
</span><span class="lines">@@ -3734,7 +3766,6 @@
</span><span class="cx">         if (m_decodeSurrogatePairs) {
</span><span class="cx">             pushPair(framePointerRegister, linkRegister);
</span><span class="cx">             move(TrustedImm32(0x10000), supplementaryPlanesBase);
</span><del>-            move(TrustedImm32(0xfffffc00), surrogateTagMask);
</del><span class="cx">             move(TrustedImm32(0xd800), leadingSurrogateTag);
</span><span class="cx">             move(TrustedImm32(0xdc00), trailingSurrogateTag);
</span><span class="cx">         }
</span><span class="lines">@@ -3815,6 +3846,7 @@
</span><span class="cx">         , m_charSize(charSize)
</span><span class="cx">         , m_decodeSurrogatePairs(m_charSize == Char16 && m_pattern.unicode())
</span><span class="cx">         , m_unicodeIgnoreCase(m_pattern.unicode() && m_pattern.ignoreCase())
</span><ins>+        , m_fixedSizedAlternative(false)
</ins><span class="cx">         , m_canonicalMode(m_pattern.unicode() ? CanonicalMode::Unicode : CanonicalMode::UCS2)
</span><span class="cx"> #if ENABLE(YARR_JIT_ALL_PARENS_EXPRESSIONS)
</span><span class="cx">         , m_containsNestedSubpatterns(false)
</span><span class="lines">@@ -3869,6 +3901,11 @@
</span><span class="cx">         generateFailReturn();
</span><span class="cx">         hasInput.link(this);
</span><span class="cx"> 
</span><ins>+#ifdef JIT_UNICODE_EXPRESSIONS
+        if (m_decodeSurrogatePairs)
+            getEffectiveAddress(BaseIndex(input, length, TimesTwo), endOfStringAddress);
+#endif
+
</ins><span class="cx"> #if ENABLE(YARR_JIT_ALL_PARENS_EXPRESSIONS)
</span><span class="cx">         if (m_containsNestedSubpatterns)
</span><span class="cx">             move(TrustedImm32(matchLimit), remainingMatchCount);
</span><span class="lines">@@ -4163,6 +4200,7 @@
</span><span class="cx"> 
</span><span class="cx">     bool m_decodeSurrogatePairs;
</span><span class="cx">     bool m_unicodeIgnoreCase;
</span><ins>+    bool m_fixedSizedAlternative;
</ins><span class="cx">     CanonicalMode m_canonicalMode;
</span><span class="cx"> #if ENABLE(YARR_JIT_ALL_PARENS_EXPRESSIONS)
</span><span class="cx">     bool m_containsNestedSubpatterns;
</span></span></pre></div>
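<p>The greedy character-class backtracking change in YarrJIT.cpp above picks one of three ways to give back a character. The following is a minimal sketch of that decision only, not the JIT code itself: <code>BacktrackStrategy</code> and <code>greedyBacktrackStrategy</code> are hypothetical names introduced for illustration, and the four booleans stand in for <code>m_decodeSurrogatePairs</code>, <code>hasOneCharacterSize()</code>, <code>hasNonBMPCharacters()</code> and <code>invert()</code> from the diff.</p>

```cpp
#include <cassert>

// Hypothetical enum (not in the patch) naming the three backtracking paths
// visible in the greedy character-class backtrack hunk of the diff.
enum class BacktrackStrategy {
    StepBackOneCodeUnit,   // sub32(1, index)
    StepBackTwoCodeUnits,  // sub32(2, index): every match was a surrogate pair
    RescanFromBeginIndex   // reload beginIndex and rematch one fewer character
};

// Hypothetical helper mirroring the if / else-if / else chain in the diff.
inline BacktrackStrategy greedyBacktrackStrategy(bool decodeSurrogatePairs,
                                                 bool hasOneCharacterSize,
                                                 bool hasNonBMPCharacters,
                                                 bool inverted)
{
    // Without surrogate-pair decoding every character is one code unit.
    if (!decodeSurrogatePairs)
        return BacktrackStrategy::StepBackOneCodeUnit;

    // A non-inverted class whose members all share one width lets the index
    // move back by a constant, so no rescan is needed.
    if (hasOneCharacterSize && !inverted)
        return hasNonBMPCharacters ? BacktrackStrategy::StepBackTwoCodeUnits
                                   : BacktrackStrategy::StepBackOneCodeUnit;

    // Mixed widths (or an inverted class): the width of the last matched
    // character is unknown, so matching restarts from the saved begin index.
    return BacktrackStrategy::RescanFromBeginIndex;
}
```

<p>Under these assumptions, a unicode pattern whose class holds only BMP characters steps back one code unit, while the same class inverted falls through to the rescan path.</p>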
<a id="trunkSourceJavaScriptCoreyarrYarrPatterncpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/YarrPattern.cpp (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/YarrPattern.cpp 2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/yarr/YarrPattern.cpp    2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -45,8 +45,8 @@
</span><span class="cx"> public:
</span><span class="cx">     CharacterClassConstructor(bool isCaseInsensitive, CanonicalMode canonicalMode)
</span><span class="cx">         : m_isCaseInsensitive(isCaseInsensitive)
</span><del>-        , m_hasNonBMPCharacters(false)
</del><span class="cx">         , m_anyCharacter(false)
</span><ins>+        , m_characterWidths(CharacterClassWidths::Unknown)
</ins><span class="cx">         , m_canonicalMode(canonicalMode)
</span><span class="cx">     {
</span><span class="cx">     }
</span><span class="lines">@@ -57,8 +57,8 @@
</span><span class="cx">         m_ranges.clear();
</span><span class="cx">         m_matchesUnicode.clear();
</span><span class="cx">         m_rangesUnicode.clear();
</span><del>-        m_hasNonBMPCharacters = false;
</del><span class="cx">         m_anyCharacter = false;
</span><ins>+        m_characterWidths = CharacterClassWidths::Unknown;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void append(const CharacterClass* other)
</span><span class="lines">@@ -246,11 +246,11 @@
</span><span class="cx">         characterClass->m_ranges.swap(m_ranges);
</span><span class="cx">         characterClass->m_matchesUnicode.swap(m_matchesUnicode);
</span><span class="cx">         characterClass->m_rangesUnicode.swap(m_rangesUnicode);
</span><del>-        characterClass->m_hasNonBMPCharacters = hasNonBMPCharacters();
</del><span class="cx">         characterClass->m_anyCharacter = anyCharacter();
</span><ins>+        characterClass->m_characterWidths = characterWidths();
</ins><span class="cx"> 
</span><del>-        m_hasNonBMPCharacters = false;
</del><span class="cx">         m_anyCharacter = false;
</span><ins>+        m_characterWidths = CharacterClassWidths::Unknown;
</ins><span class="cx"> 
</span><span class="cx">         return characterClass;
</span><span class="cx">     }
</span><span class="lines">@@ -266,8 +266,7 @@
</span><span class="cx">         unsigned pos = 0;
</span><span class="cx">         unsigned range = matches.size();
</span><span class="cx"> 
</span><del>-        if (!U_IS_BMP(ch))
-            m_hasNonBMPCharacters = true;
</del><ins>+        m_characterWidths |= (U_IS_BMP(ch) ? CharacterClassWidths::HasBMPChars : CharacterClassWidths::HasNonBMPChars);
</ins><span class="cx"> 
</span><span class="cx">         // binary chop, find position to insert char.
</span><span class="cx">         while (range) {
</span><span class="lines">@@ -316,8 +315,10 @@
</span><span class="cx">     {
</span><span class="cx">         size_t end = ranges.size();
</span><span class="cx"> 
</span><ins>+        if (U_IS_BMP(lo))
+            m_characterWidths |= CharacterClassWidths::HasBMPChars;
</ins><span class="cx">         if (!U_IS_BMP(hi))
</span><del>-            m_hasNonBMPCharacters = true;
</del><ins>+            m_characterWidths |= CharacterClassWidths::HasNonBMPChars;
</ins><span class="cx"> 
</span><span class="cx">         // Simple linear scan - I doubt there are that many ranges anyway...
</span><span class="cx">         // feel free to fix this with something faster (eg binary chop).
</span><span class="lines">@@ -408,9 +409,14 @@
</span><span class="cx"> 
</span><span class="cx">     bool hasNonBMPCharacters()
</span><span class="cx">     {
</span><del>-        return m_hasNonBMPCharacters;
</del><ins>+        return m_characterWidths & CharacterClassWidths::HasNonBMPChars;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    CharacterClassWidths characterWidths()
+    {
+        return m_characterWidths;
+    }
+
</ins><span class="cx">     bool anyCharacter()
</span><span class="cx">     {
</span><span class="cx">         return m_anyCharacter;
</span><span class="lines">@@ -417,8 +423,9 @@
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     bool m_isCaseInsensitive : 1;
</span><del>-    bool m_hasNonBMPCharacters : 1;
</del><span class="cx">     bool m_anyCharacter : 1;
</span><ins>+    CharacterClassWidths m_characterWidths;
+    
</ins><span class="cx">     CanonicalMode m_canonicalMode;
</span><span class="cx"> 
</span><span class="cx">     Vector<UChar32> m_matches;
</span><span class="lines">@@ -836,8 +843,16 @@
</span><span class="cx">                 } else if (m_pattern.unicode()) {
</span><span class="cx">                     term.frameLocation = currentCallFrameSize;
</span><span class="cx">                     currentCallFrameSize += YarrStackSpaceForBackTrackInfoCharacterClass;
</span><del>-                    currentInputPosition += term.quantityMaxCount;
-                    alternative->m_hasFixedSize = false;
</del><ins>+                    if (term.characterClass->hasOneCharacterSize() && !term.invert()) {
+                        Checked<unsigned, RecordOverflow> tempCount = term.quantityMaxCount;
+                        tempCount *= term.characterClass->hasNonBMPCharacters() ? 2 : 1;
+                        if (tempCount.hasOverflowed())
+                            return ErrorCode::OffsetTooLarge;
+                        currentInputPosition += tempCount;
+                    } else {
+                        currentInputPosition += term.quantityMaxCount;
+                        alternative->m_hasFixedSize = false;
+                    }
</ins><span class="cx">                 } else
</span><span class="cx">                     currentInputPosition += term.quantityMaxCount;
</span><span class="cx">                 break;
</span><span class="lines">@@ -1318,6 +1333,7 @@
</span><span class="cx">         break;
</span><span class="cx">     case TypeCharacterClass:
</span><span class="cx">         out.print("character class ");
</span><ins>+        out.printf("inputPosition %u ", inputPosition);
</ins><span class="cx">         dumpCharacterClass(out, thisPattern, characterClass);
</span><span class="cx">         dumpQuantifier(out);
</span><span class="cx">         if (quantityType != QuantifierFixedCount || thisPattern->unicode())
</span><span class="lines">@@ -1461,7 +1477,7 @@
</span><span class="cx">     auto characterClass = std::make_unique<CharacterClass>();
</span><span class="cx">     characterClass->m_ranges.append(CharacterRange(0x00, 0x7f));
</span><span class="cx">     characterClass->m_rangesUnicode.append(CharacterRange(0x0080, 0x10ffff));
</span><del>-    characterClass->m_hasNonBMPCharacters = true;
</del><ins>+    characterClass->m_characterWidths = CharacterClassWidths::HasBothBMPAndNonBMP;
</ins><span class="cx">     characterClass->m_anyCharacter = true;
</span><span class="cx">     return characterClass;
</span><span class="cx"> }
</span></span></pre></div>
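<p>The YarrPattern.cpp hunk above scales <code>quantityMaxCount</code> by the code-unit width of the class and returns <code>ErrorCode::OffsetTooLarge</code> if the multiplication overflows. A rough standalone sketch of that arithmetic follows. It assumes C++17 and uses <code>std::optional</code> as a stand-in for WTF's <code>Checked&lt;unsigned, RecordOverflow&gt;</code>; <code>fixedWidthInputAdvance</code> is a hypothetical name, not a function in the patch.</p>

```cpp
#include <cassert>
#include <limits>
#include <optional>

// Hypothetical helper: how many UTF-16 code units a fixed-width character
// class term consumes at most. Returns std::nullopt where the patch would
// report ErrorCode::OffsetTooLarge.
inline std::optional<unsigned> fixedWidthInputAdvance(unsigned quantityMaxCount,
                                                      bool hasNonBMPCharacters)
{
    // Non-BMP characters occupy a surrogate pair, i.e. two code units.
    const unsigned codeUnitsPerCharacter = hasNonBMPCharacters ? 2 : 1;

    // Detect overflow before multiplying, since unsigned arithmetic wraps.
    if (quantityMaxCount > std::numeric_limits<unsigned>::max() / codeUnitsPerCharacter)
        return std::nullopt;

    return quantityMaxCount * codeUnitsPerCharacter;
}
```

<p>This only models the multiplication; in the patch the result is then added to <code>currentInputPosition</code>, and the alternative keeps its fixed size because the branch does not clear <code>m_hasFixedSize</code>.</p>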
<a id="trunkSourceJavaScriptCoreyarrYarrPatternh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/YarrPattern.h (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/YarrPattern.h   2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/yarr/YarrPattern.h      2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -52,6 +52,29 @@
</span><span class="cx">     }
</span><span class="cx"> };
</span><span class="cx"> 
</span><ins>+enum struct CharacterClassWidths : unsigned char {
+    Unknown = 0x0,
+    HasBMPChars = 0x1,
+    HasNonBMPChars = 0x2,
+    HasBothBMPAndNonBMP = HasBMPChars | HasNonBMPChars
+};
+
+inline CharacterClassWidths operator|(CharacterClassWidths lhs, CharacterClassWidths rhs)
+{
+    return static_cast<CharacterClassWidths>(static_cast<unsigned>(lhs) | static_cast<unsigned>(rhs));
+}
+
+inline bool operator&(CharacterClassWidths lhs, CharacterClassWidths rhs)
+{
+    return static_cast<unsigned>(lhs) & static_cast<unsigned>(rhs);
+}
+
+inline CharacterClassWidths& operator|=(CharacterClassWidths& lhs, CharacterClassWidths rhs)
+{
+    lhs = lhs | rhs;
+    return lhs;
+}
+
</ins><span class="cx"> struct CharacterClass {
</span><span class="cx">     WTF_MAKE_FAST_ALLOCATED;
</span><span class="cx"> public:
</span><span class="lines">@@ -60,29 +83,34 @@
</span><span class="cx">     // specified matches and ranges)
</span><span class="cx">     CharacterClass()
</span><span class="cx">         : m_table(0)
</span><del>-        , m_hasNonBMPCharacters(false)
</del><ins>+        , m_characterWidths(CharacterClassWidths::Unknown)
</ins><span class="cx">         , m_anyCharacter(false)
</span><span class="cx">     {
</span><span class="cx">     }
</span><span class="cx">     CharacterClass(const char* table, bool inverted)
</span><span class="cx">         : m_table(table)
</span><ins>+        , m_characterWidths(CharacterClassWidths::Unknown)
</ins><span class="cx">         , m_tableInverted(inverted)
</span><del>-        , m_hasNonBMPCharacters(false)
</del><span class="cx">         , m_anyCharacter(false)
</span><span class="cx">     {
</span><span class="cx">     }
</span><del>-    CharacterClass(std::initializer_list<UChar32> matches, std::initializer_list<CharacterRange> ranges, std::initializer_list<UChar32> matchesUnicode, std::initializer_list<CharacterRange> rangesUnicode)
</del><ins>+    CharacterClass(std::initializer_list<UChar32> matches, std::initializer_list<CharacterRange> ranges, std::initializer_list<UChar32> matchesUnicode, std::initializer_list<CharacterRange> rangesUnicode, CharacterClassWidths widths)
</ins><span class="cx">         : m_matches(matches)
</span><span class="cx">         , m_ranges(ranges)
</span><span class="cx">         , m_matchesUnicode(matchesUnicode)
</span><span class="cx">         , m_rangesUnicode(rangesUnicode)
</span><span class="cx">         , m_table(0)
</span><ins>+        , m_characterWidths(widths)
</ins><span class="cx">         , m_tableInverted(false)
</span><del>-        , m_hasNonBMPCharacters(false)
</del><span class="cx">         , m_anyCharacter(false)
</span><span class="cx">     {
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    bool hasNonBMPCharacters() { return m_characterWidths & CharacterClassWidths::HasNonBMPChars; }
+
+    bool hasOneCharacterSize() { return m_characterWidths == CharacterClassWidths::HasBMPChars || m_characterWidths == CharacterClassWidths::HasNonBMPChars; }
+    bool hasOnlyNonBMPCharacters() { return m_characterWidths == CharacterClassWidths::HasNonBMPChars; }
+    
</ins><span class="cx">     Vector<UChar32> m_matches;
</span><span class="cx">     Vector<CharacterRange> m_ranges;
</span><span class="cx">     Vector<UChar32> m_matchesUnicode;
</span><span class="lines">@@ -89,8 +117,8 @@
</span><span class="cx">     Vector<CharacterRange> m_rangesUnicode;
</span><span class="cx"> 
</span><span class="cx">     const char* m_table;
</span><ins>+    CharacterClassWidths m_characterWidths;
</ins><span class="cx">     bool m_tableInverted : 1;
</span><del>-    bool m_hasNonBMPCharacters : 1;
</del><span class="cx">     bool m_anyCharacter : 1;
</span><span class="cx"> };
</span><span class="cx"> 
</span><span class="lines">@@ -220,7 +248,7 @@
</span><span class="cx">         return PatternTerm(TypeAssertionWordBoundary, invert);
</span><span class="cx">     }
</span><span class="cx">     
</span><del>-    bool invert()
</del><ins>+    bool invert() const
</ins><span class="cx">     {
</span><span class="cx">         return m_invert;
</span><span class="cx">     }
</span></span></pre></div>
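<p>To see how the <code>CharacterClassWidths</code> flags from YarrPattern.h above accumulate, here is a small standalone sketch. The enum and its operators are copied from the diff so the example compiles on its own; <code>widthOf</code>, <code>widthsOf</code> and the free function <code>hasOneCharacterSize</code> are hypothetical helpers added for illustration (the patch does the accumulation inside <code>CharacterClassConstructor</code> and exposes <code>hasOneCharacterSize()</code> as a member of <code>CharacterClass</code>).</p>

```cpp
#include <cassert>
#include <initializer_list>

// Copied from the patch so the sketch is self-contained.
enum struct CharacterClassWidths : unsigned char {
    Unknown = 0x0,
    HasBMPChars = 0x1,
    HasNonBMPChars = 0x2,
    HasBothBMPAndNonBMP = HasBMPChars | HasNonBMPChars
};

inline CharacterClassWidths operator|(CharacterClassWidths lhs, CharacterClassWidths rhs)
{
    return static_cast<CharacterClassWidths>(static_cast<unsigned>(lhs) | static_cast<unsigned>(rhs));
}

inline CharacterClassWidths& operator|=(CharacterClassWidths& lhs, CharacterClassWidths rhs)
{
    lhs = lhs | rhs;
    return lhs;
}

// Hypothetical helper: width flag for a single code point (BMP is U+0000..U+FFFF).
inline CharacterClassWidths widthOf(char32_t ch)
{
    return ch <= 0xFFFF ? CharacterClassWidths::HasBMPChars : CharacterClassWidths::HasNonBMPChars;
}

// Hypothetical helper: OR together the widths of every code point in a class.
inline CharacterClassWidths widthsOf(std::initializer_list<char32_t> chars)
{
    CharacterClassWidths widths = CharacterClassWidths::Unknown;
    for (char32_t ch : chars)
        widths |= widthOf(ch);
    return widths;
}

// Free-function version of the member added to CharacterClass in the patch.
inline bool hasOneCharacterSize(CharacterClassWidths widths)
{
    return widths == CharacterClassWidths::HasBMPChars || widths == CharacterClassWidths::HasNonBMPChars;
}
```

<p>With these definitions, a class containing only <code>a</code> and <code>b</code> has one character size, as does one containing only U+1F600 and U+1F601; mixing the two groups yields <code>HasBothBMPAndNonBMP</code>, and an empty class stays <code>Unknown</code>, so neither qualifies for the fixed-width paths.</p>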
<a id="trunkSourceJavaScriptCoreyarrcreate_regex_tables"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/create_regex_tables (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/create_regex_tables     2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/yarr/create_regex_tables        2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -100,8 +100,13 @@
</span><span class="cx">             function += ("    auto characterClass = std::make_unique<CharacterClass>(_%sData, false);\n" % (name))
</span><span class="cx">     else:
</span><span class="cx">         function += ("    auto characterClass = std::make_unique<CharacterClass>();\n")
</span><ins>+    hasBMPCharacters = False
</ins><span class="cx">     hasNonBMPCharacters = False
</span><span class="cx">     for (min, max) in ranges:
</span><ins>+        if min < 0x10000:
+            hasBMPCharacters = True
+        if max >= 0x10000:
+            hasNonBMPCharacters = True
</ins><span class="cx">         if (min == max):
</span><span class="cx">             if (min > 127):
</span><span class="cx">                 function += ("    characterClass->m_matchesUnicode.append(0x%04x);\n" % min)
</span><span class="lines">@@ -112,9 +117,7 @@
</span><span class="cx">             function += ("    characterClass->m_rangesUnicode.append(CharacterRange(0x%04x, 0x%04x));\n" % (min, max))
</span><span class="cx">         else:
</span><span class="cx">             function += ("    characterClass->m_ranges.append(CharacterRange(0x%02x, 0x%02x));\n" % (min, max))
</span><del>-        if max >= 0x10000:
-            hasNonBMPCharacters = True
-    function += ("    characterClass->m_hasNonBMPCharacters = %s;\n" % ("true" if hasNonBMPCharacters else "false"))
</del><ins>+    function += ("    characterClass->m_characterWidths = CharacterClassWidths::%s;\n" % (("Unknown", "HasBMPChars", "HasNonBMPChars", "HasBothBMPAndNonBMP")[int(hasNonBMPCharacters) * 2 + int(hasBMPCharacters)]))
</ins><span class="cx">     function += ("    return characterClass;\n")
</span><span class="cx">     function += ("}\n\n")
</span><span class="cx">     functions += function
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreyarrgenerateYarrUnicodePropertyTablespy"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/generateYarrUnicodePropertyTables.py (243641 => 243642)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/generateYarrUnicodePropertyTables.py    2019-03-29 05:18:47 UTC (rev 243641)
+++ trunk/Source/JavaScriptCore/yarr/generateYarrUnicodePropertyTables.py       2019-03-29 06:05:55 UTC (rev 243642)
</span><span class="lines">@@ -35,7 +35,7 @@
</span><span class="cx"> from hasher import stringHash
</span><span class="cx"> 
</span><span class="cx"> header = """/*
</span><del>-* Copyright (C) 2017-2018 Apple Inc. All rights reserved.
</del><ins>+* Copyright (C) 2017-2019 Apple Inc. All rights reserved.
</ins><span class="cx"> *
</span><span class="cx"> * Redistribution and use in source and binary forms, with or without
</span><span class="cx"> * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -225,6 +225,7 @@
</span><span class="cx">         self.name = name
</span><span class="cx">         self.aliases = []
</span><span class="cx">         self.index = len(PropertyData.allPropertyData)
</span><ins>+        self.hasBMPCharacters = False
</ins><span class="cx">         self.hasNonBMPCharacters = False
</span><span class="cx">         self.matches = []
</span><span class="cx">         self.ranges = []
</span><span class="lines">@@ -249,7 +250,9 @@
</span><span class="cx">         return "createCharacterClass{}".format(self.index)
</span><span class="cx"> 
</span><span class="cx">     def addMatch(self, codePoint):
</span><del>-        if codePoint > MaxBMP:
</del><ins>+        if codePoint <= MaxBMP:
+            self.hasBMPCharacters = True
+        else:
</ins><span class="cx">             self.hasNonBMPCharacters = True
</span><span class="cx">         if codePoint <= lastASCIICodePoint:
</span><span class="cx">             if (len(self.matches) and self.matches[-1] > codePoint) or (len(self.ranges) and self.ranges[-1][1] > codePoint):
</span><span class="lines">@@ -281,6 +284,8 @@
</span><span class="cx">                 self.unicodeMatches.append(codePoint)
</span><span class="cx"> 
</span><span class="cx">     def addRange(self, lowCodePoint, highCodePoint):
</span><ins>+        if lowCodePoint <= MaxBMP:
+            self.hasBMPCharacters = True
</ins><span class="cx">         if highCodePoint > MaxBMP:
</span><span class="cx">             self.hasNonBMPCharacters = True
</span><span class="cx">         if highCodePoint <= lastASCIICodePoint:
</span><span class="lines">@@ -536,9 +541,9 @@
</span><span class="cx">         file.write("),\n")
</span><span class="cx">         file.write("        std::initializer_list<CharacterRange>(")
</span><span class="cx">         self.dumpMatchData(file, 4, self.unicodeRanges, lambda file, range: (file.write("{{{0:0=#6x}, {1:0=#6x}}}".format(range[0], range[1]))))
</span><del>-        file.write("));\n")
</del><ins>+        file.write("),\n")
</ins><span class="cx"> 
</span><del>-        file.write("    characterClass->m_hasNonBMPCharacters = {};\n".format(("false", "true")[self.hasNonBMPCharacters]))
</del><ins>+        file.write("        CharacterClassWidths::{});\n".format(("Unknown", "HasBMPChars", "HasNonBMPChars", "HasBothBMPAndNonBMP")[int(self.hasNonBMPCharacters) * 2 + int(self.hasBMPCharacters)]))
</ins><span class="cx">         file.write("    return characterClass;\n}\n\n")
</span><span class="cx"> 
</span><span class="cx">     @classmethod
</span></span></pre>
</div>
</div>

</body>
</html>