<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[280570] trunk</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/280570">280570</a></dd>
<dt>Author</dt> <dd>ysuzuki@apple.com</dd>
<dt>Date</dt> <dd>2021-08-02 16:43:16 -0700 (Mon, 02 Aug 2021)</dd>
</dl>

<h3>Log Message</h3>
<pre>[JSC] Yarr BoyerMoore search should support character-class
https://bugs.webkit.org/show_bug.cgi?id=228613

Reviewed by Saam Barati.

JSTests:

* stress/regexp-bm-search-character-non-fixed-size.js: Added.
(shouldBe):
* stress/regexp-bm-search-many-candidate-zero-length.js: Added.
(shouldBe):
(regexp.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.0.1.2.3.4.5.6.7.8.9.t.v.n.r):
* stress/regexp-bm-search-non-fixed-size.js: Added.
(shouldBe):

Source/JavaScriptCore:

This patch adds character-class support to the BoyerMoore lookahead search in Yarr.
Currently, only fixed-size character classes are supported. Support for repeated (quantified) character classes can be added in the future.

To apply character-class support to the RegExps used by jQuery, we also allow non-fixed-size disjunctions.
For example, the disjunction in /aaaa.*|bbbb/ is not fixed-size, but we can still search for the prefix (aaaa|bbbb)
because that part is fixed-size and the minimum size of the disjunction is known to be 4.

In addition, instead of giving up on BoyerMoore search when we find an unsupported term, we shorten the BoyerMoore search
length to exclude that term, so that BoyerMoore search can still be used. For /aaaa|bbbb|ccc(d|e|f)/,
we previously gave up on encountering `(d|e|f)`. Now we shorten the length
from 4 to 3 and construct the search pattern from `aaa|bbb|ccc`.

This patch improves jquery-todomvc-regexp by 20%.

                                      ToT                     Patched

    jquery-todomvc-regexp      545.3561+-0.6968     ^    451.6117+-0.4613        ^ definitely 1.2076x faster

This improves Speedometer2/jQuery-TodoMVC by 2%.

    ----------------------------------------------------------------------------------------------------------------------------------
    |               subtest                |     ms      |     ms      |  b / a   | pValue (significance using False Discovery Rate) |
    ----------------------------------------------------------------------------------------------------------------------------------
    | Elm-TodoMVC                          |123.470833   |123.550000   |1.000641  | 0.841600                                         |
    | VueJS-TodoMVC                        |26.883333    |26.950000    |1.002480  | 0.846732                                         |
    | EmberJS-TodoMVC                      |127.708333   |127.754167   |1.000359  | 0.934206                                         |
    | BackboneJS-TodoMVC                   |50.545833    |50.445833    |0.998022  | 0.679610                                         |
    | Preact-TodoMVC                       |20.879167    |20.791667    |0.995809  | 0.796541                                         |
    | AngularJS-TodoMVC                    |137.479167   |137.275000   |0.998515  | 0.729817                                         |
    | Vanilla-ES2015-TodoMVC               |69.079167    |68.912500    |0.997587  | 0.524325                                         |
    | Inferno-TodoMVC                      |65.604167    |66.120833    |1.007876  | 0.145549                                         |
    | Flight-TodoMVC                       |77.029167    |76.708333    |0.995835  | 0.518562                                         |
    | Angular2-TypeScript-TodoMVC          |40.516667    |40.812500    |1.007302  | 0.513386                                         |
    | VanillaJS-TodoMVC                    |54.762500    |54.895833    |1.002435  | 0.647381                                         |
    | jQuery-TodoMVC                       |255.950000   |250.425000   |0.978414  | 0.000000 (significant)                           |
    | EmberJS-Debug-TodoMVC                |341.745833   |342.804167   |1.003097  | 0.219937                                         |
    | React-TodoMVC                        |88.854167    |88.700000    |0.998265  | 0.568405                                         |
    | React-Redux-TodoMVC                  |151.266667   |150.804167   |0.996942  | 0.256403                                         |
    | Vanilla-ES2015-Babel-Webpack-TodoMVC |65.783333    |65.645833    |0.997910  | 0.437464                                         |
    ----------------------------------------------------------------------------------------------------------------------------------
    a mean = 246.52898
    b mean = 246.85128
    pValue = 0.3927330278
    (Bigger means are better.)
    1.001 times better
    Results ARE NOT significant

* yarr/YarrJIT.cpp:
(JSC::Yarr::BoyerMooreInfo::shortenLength):
(JSC::Yarr::BoyerMooreInfo::setAll):
(JSC::Yarr::BoyerMooreInfo::addCharacters):
(JSC::Yarr::BoyerMooreInfo::addRanges):
* yarr/YarrJIT.h:
(JSC::Yarr::BoyerMooreBitmap::add):
(JSC::Yarr::BoyerMooreBitmap::addCharacters):
(JSC::Yarr::BoyerMooreBitmap::addRanges):
(JSC::Yarr::BoyerMooreBitmap::setAll):
(JSC::Yarr::BoyerMooreBitmap::isAllSet const):</pre>
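<p>The character-class support described in the log message can be pictured with a small model. The sketch below is not the actual <code>BoyerMooreBitmap</code>: the names <code>CandidateMap</code>, <code>MAP_SIZE</code> and <code>mayMatch</code> are hypothetical, and the 128-slot size is an assumption inferred from the JIT code, which indexes two 64-bit words using only the low bits of the character. Characters that share low bits collide, which would only cause extra candidate positions, since the regular matcher still verifies each one.</p>

```javascript
// Hypothetical model of a per-position candidate map (not the real BoyerMooreBitmap).
const MAP_SIZE = 128; // assumption: two 64-bit words, indexed by the low 7 bits

class CandidateMap {
    constructor() {
        this.bits = new Array(MAP_SIZE).fill(false);
        this.count = 0;
    }

    // Record one candidate character, folded into the low 7 bits.
    add(character) {
        const slot = character & (MAP_SIZE - 1);
        if (!this.bits[slot]) {
            this.bits[slot] = true;
            ++this.count;
        }
    }

    // Record an inclusive range such as [0-9]; stop early once every slot is set.
    addRange(begin, end) {
        for (let c = begin; c <= end && !this.isAllSet(); ++c)
            this.add(c);
    }

    // Used when the class cannot be enumerated cheaply (inverted, any-character, table-based).
    setAll() {
        this.bits.fill(true);
        this.count = MAP_SIZE;
    }

    isAllSet() {
        return this.count === MAP_SIZE;
    }

    // True when the character may start a match at this position.
    mayMatch(character) {
        return this.bits[character & (MAP_SIZE - 1)];
    }
}

// [0-9] contributes ten candidates at its position.
const digitMap = new CandidateMap();
digitMap.addRange(0x30, 0x39);

// An inverted or any-character class marks the whole position as "anything".
const anyMap = new CandidateMap();
anyMap.setAll();
```

<p>In this model a position whose map is fully set carries no filtering power, which is presumably why the real code exposes an <code>isAllSet</code> query.</p>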

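<p>The two prefix rules from the log message (use the fixed-size prefix of a non-fixed-size disjunction, and shorten the search length instead of giving up on an unsupported term) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the code in <code>YarrJIT.cpp</code>: <code>collectPrefixCandidates</code> is a hypothetical helper, each alternative is modelled as an array of single characters, and <code>null</code> stands for any term the search cannot handle (a group, <code>.*</code>, and so on).</p>

```javascript
// Hypothetical model: an alternative is an array of terms, where a term is a
// one-character string (supported) or null (unsupported, e.g. a group or `.*`).
function collectPrefixCandidates(alternatives, maxLength) {
    // Shrink the search length to the shortest run of supported leading terms.
    let length = maxLength;
    for (const terms of alternatives) {
        let supported = 0;
        while (supported < terms.length && supported < length && terms[supported] !== null)
            ++supported;
        length = Math.min(length, supported);
    }

    // Build one candidate set per position of the (possibly shortened) prefix.
    const candidates = [];
    for (let i = 0; i < length; ++i)
        candidates.push(new Set());
    for (const terms of alternatives) {
        for (let i = 0; i < length; ++i)
            candidates[i].add(terms[i]);
    }
    return candidates;
}

// /aaaa|bbbb|ccc(d|e|f)/: the group is unsupported, so the length shrinks from 4 to 3.
const shortened = collectPrefixCandidates([
    ['a', 'a', 'a', 'a'],
    ['b', 'b', 'b', 'b'],
    ['c', 'c', 'c', null],
], 4);

// /aaaa.*|bbbb/: not fixed-size, but the minimum size is 4 and the first 4 terms are usable.
const nonFixed = collectPrefixCandidates([
    ['a', 'a', 'a', 'a', null],
    ['b', 'b', 'b', 'b'],
], 4);

console.log(shortened.length, nonFixed.length);
```

<p>Here <code>shortened</code> holds three positions, each with the candidates a, b and c, and <code>nonFixed</code> holds four positions, each with a and b.</p>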
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkJSTestsChangeLog">trunk/JSTests/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrYarrJITcpp">trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrYarrJITh">trunk/Source/JavaScriptCore/yarr/YarrJIT.h</a></li>
</ul>

<h3>Added Paths</h3>
<ul>
<li><a href="#trunkJSTestsstressregexpbmsearchcharacternonfixedsizejs">trunk/JSTests/stress/regexp-bm-search-character-non-fixed-size.js</a></li>
<li><a href="#trunkJSTestsstressregexpbmsearchmanycandidatezerolengthjs">trunk/JSTests/stress/regexp-bm-search-many-candidate-zero-length.js</a></li>
<li><a href="#trunkJSTestsstressregexpbmsearchnonfixedsizejs">trunk/JSTests/stress/regexp-bm-search-non-fixed-size.js</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkJSTestsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/JSTests/ChangeLog (280569 => 280570)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/JSTests/ChangeLog  2021-08-02 23:39:10 UTC (rev 280569)
+++ trunk/JSTests/ChangeLog     2021-08-02 23:43:16 UTC (rev 280570)
</span><span class="lines">@@ -1,5 +1,20 @@
</span><span class="cx"> 2021-08-02  Yusuke Suzuki  <ysuzuki@apple.com>
</span><span class="cx"> 
</span><ins>+        [JSC] Yarr BoyerMoore search should support character-class
+        https://bugs.webkit.org/show_bug.cgi?id=228613
+
+        Reviewed by Saam Barati.
+
+        * stress/regexp-bm-search-character-non-fixed-size.js: Added.
+        (shouldBe):
+        * stress/regexp-bm-search-many-candidate-zero-length.js: Added.
+        (shouldBe):
+        (regexp.a.b.c.d.e.f.g.h.i.j.k.l.m.n.o.p.q.r.s.t.u.v.w.x.y.z.0.1.2.3.4.5.6.7.8.9.t.v.n.r):
+        * stress/regexp-bm-search-non-fixed-size.js: Added.
+        (shouldBe):
+
+2021-08-02  Yusuke Suzuki  <ysuzuki@apple.com>
+
</ins><span class="cx">         [JSC] Update test262
</span><span class="cx">         https://bugs.webkit.org/show_bug.cgi?id=228709
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkJSTestsstressregexpbmsearchcharacternonfixedsizejs"></a>
<div class="addfile"><h4>Added: trunk/JSTests/stress/regexp-bm-search-character-non-fixed-size.js (0 => 280570)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/JSTests/stress/regexp-bm-search-character-non-fixed-size.js                                (rev 0)
+++ trunk/JSTests/stress/regexp-bm-search-character-non-fixed-size.js   2021-08-02 23:43:16 UTC (rev 280570)
</span><span class="lines">@@ -0,0 +1,14 @@
</span><ins>+function shouldBe(actual, expected) {
+    if (actual !== expected)
+        throw new Error('bad value: ' + actual);
+}
+
+let regexp = /ssssss.*ss/;
+let regexpFail = /oooooo.*oo/;
+
+for (var i = 0; i < 1e2; ++i) {
+    let matched = `aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbcccccccpppppppppppptttttttttttt<<<<<<<<<<<<<<ddddddddddddddddddddddddddddddddddddddddjjjjjjjjjjssssss src="hey.js" ssHey`.match(regexp);
+    shouldBe(matched[0], `ssssss src="hey.js" ss`);
+    let notMatched = `aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbcccccccpppppppppppptttttttttttt<<<<<<<<<<<<<<ddddddddddddddddddddddddddddddddddddddddjjjjjjjjjjssssss src="hey.js" ssHey`.match(regexpFail);
+    shouldBe(notMatched, null);
+}
</ins></span></pre></div>
<a id="trunkJSTestsstressregexpbmsearchmanycandidatezerolengthjs"></a>
<div class="addfile"><h4>Added: trunk/JSTests/stress/regexp-bm-search-many-candidate-zero-length.js (0 => 280570)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/JSTests/stress/regexp-bm-search-many-candidate-zero-length.js                              (rev 0)
+++ trunk/JSTests/stress/regexp-bm-search-many-candidate-zero-length.js 2021-08-02 23:43:16 UTC (rev 280570)
</span><span class="lines">@@ -0,0 +1,14 @@
</span><ins>+function shouldBe(actual, expected) {
+    if (actual !== expected)
+        throw new Error('bad value: ' + actual);
+}
+
+var regexp = /a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z|0|1|2|3|4|5|6|7|8|9| |\t|\v|\n|\r|\$|\^|\&|\*|\(|\)/
+
+for (var i = 0; i < 1e2; ++i) {
+    shouldBe(regexp.test(`%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%`), false);
+    shouldBe(regexp.test(`テスト`), false);
+    shouldBe(regexp.test(`testing`), true);
+    shouldBe(RegExp.leftContext, ``);
+    shouldBe(RegExp.rightContext, `esting`);
+}
</ins></span></pre></div>
<a id="trunkJSTestsstressregexpbmsearchnonfixedsizejs"></a>
<div class="addfile"><h4>Added: trunk/JSTests/stress/regexp-bm-search-non-fixed-size.js (0 => 280570)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/JSTests/stress/regexp-bm-search-non-fixed-size.js                          (rev 0)
+++ trunk/JSTests/stress/regexp-bm-search-non-fixed-size.js     2021-08-02 23:43:16 UTC (rev 280570)
</span><span class="lines">@@ -0,0 +1,14 @@
</span><ins>+function shouldBe(actual, expected) {
+    if (actual !== expected)
+        throw new Error('bad value: ' + actual);
+}
+
+let regexp = /<script.*\/>/i;
+let regexpFail = /<scripp.*\/>/i;
+
+for (var i = 0; i < 1e2; ++i) {
+    let matched = `aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbcccccccpppppppppppptttttttttttt<<<<<<<<<<<<<<ddddddddddddddddddddddddddddddddddddddddjjjjjjjjjj<script src="hey.js" />Hey`.match(regexp);
+    shouldBe(matched[0], `<script src="hey.js" />`);
+    let notMatched = `aaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbcccccccpppppppppppptttttttttttt<<<<<<<<<<<<<<ddddddddddddddddddddddddddddddddddddddddjjjjjjjjjj<script src="hey.js" />Hey`.match(regexpFail);
+    shouldBe(notMatched, null);
+}
</ins></span></pre></div>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (280569 => 280570)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog    2021-08-02 23:39:10 UTC (rev 280569)
+++ trunk/Source/JavaScriptCore/ChangeLog       2021-08-02 23:43:16 UTC (rev 280570)
</span><span class="lines">@@ -1,3 +1,69 @@
</span><ins>+2021-08-02  Yusuke Suzuki  <ysuzuki@apple.com>
+
+        [JSC] Yarr BoyerMoore search should support character-class
+        https://bugs.webkit.org/show_bug.cgi?id=228613
+
+        Reviewed by Saam Barati.
+
+        This patch adds character-class support for BoyerMoore lookahead search in Yarr.
+        Currently, we only support fixed-sized character-class. We can extend it for repeat cases in the future.
+
+        To apply this character-class thing to jQuery's RegExp, we also allow non-fixed-sized disjunction.
+        For example, /aaaa.*|bbbb/'s disjunction is not fixed-sized. But still we can use (aaaa|bbbb) prefix since
+        this part is fixed-sized and we know minimum-size of this disjunction is 4.
+
+        Plus, instead of giving up BoyerMoore search when we found non-supported terms, we shorten BoyerMoore search
+        length not to include this term so that we can still have a chance to leverage BoyerMoore search. In the case
+        of /aaaa|bbbb|ccc(d|e|f)/, we previously gave up since it finds `(d|e|f)`. But now, instead we shorten the length
+        from 4 to 3, and construct search pattern with `aaa|bbb|ccc`.
+
+        This patch improves jquery-todomvc-regexp by 20%.
+
+                                              ToT                     Patched
+
+            jquery-todomvc-regexp      545.3561+-0.6968     ^    451.6117+-0.4613        ^ definitely 1.2076x faster
+
+        This improves Speedometer2/jQuery-TodoMVC by 2%.
+
+            ----------------------------------------------------------------------------------------------------------------------------------
+            |               subtest                |     ms      |     ms      |  b / a   | pValue (significance using False Discovery Rate) |
+            ----------------------------------------------------------------------------------------------------------------------------------
+            | Elm-TodoMVC                          |123.470833   |123.550000   |1.000641  | 0.841600                                         |
+            | VueJS-TodoMVC                        |26.883333    |26.950000    |1.002480  | 0.846732                                         |
+            | EmberJS-TodoMVC                      |127.708333   |127.754167   |1.000359  | 0.934206                                         |
+            | BackboneJS-TodoMVC                   |50.545833    |50.445833    |0.998022  | 0.679610                                         |
+            | Preact-TodoMVC                       |20.879167    |20.791667    |0.995809  | 0.796541                                         |
+            | AngularJS-TodoMVC                    |137.479167   |137.275000   |0.998515  | 0.729817                                         |
+            | Vanilla-ES2015-TodoMVC               |69.079167    |68.912500    |0.997587  | 0.524325                                         |
+            | Inferno-TodoMVC                      |65.604167    |66.120833    |1.007876  | 0.145549                                         |
+            | Flight-TodoMVC                       |77.029167    |76.708333    |0.995835  | 0.518562                                         |
+            | Angular2-TypeScript-TodoMVC          |40.516667    |40.812500    |1.007302  | 0.513386                                         |
+            | VanillaJS-TodoMVC                    |54.762500    |54.895833    |1.002435  | 0.647381                                         |
+            | jQuery-TodoMVC                       |255.950000   |250.425000   |0.978414  | 0.000000 (significant)                           |
+            | EmberJS-Debug-TodoMVC                |341.745833   |342.804167   |1.003097  | 0.219937                                         |
+            | React-TodoMVC                        |88.854167    |88.700000    |0.998265  | 0.568405                                         |
+            | React-Redux-TodoMVC                  |151.266667   |150.804167   |0.996942  | 0.256403                                         |
+            | Vanilla-ES2015-Babel-Webpack-TodoMVC |65.783333    |65.645833    |0.997910  | 0.437464                                         |
+            ----------------------------------------------------------------------------------------------------------------------------------
+            a mean = 246.52898
+            b mean = 246.85128
+            pValue = 0.3927330278
+            (Bigger means are better.)
+            1.001 times better
+            Results ARE NOT significant
+
+        * yarr/YarrJIT.cpp:
+        (JSC::Yarr::BoyerMooreInfo::shortenLength):
+        (JSC::Yarr::BoyerMooreInfo::setAll):
+        (JSC::Yarr::BoyerMooreInfo::addCharacters):
+        (JSC::Yarr::BoyerMooreInfo::addRanges):
+        * yarr/YarrJIT.h:
+        (JSC::Yarr::BoyerMooreBitmap::add):
+        (JSC::Yarr::BoyerMooreBitmap::addCharacters):
+        (JSC::Yarr::BoyerMooreBitmap::addRanges):
+        (JSC::Yarr::BoyerMooreBitmap::setAll):
+        (JSC::Yarr::BoyerMooreBitmap::isAllSet const):
+
</ins><span class="cx"> 2021-08-02  Stephan Szabo  <stephan.szabo@sony.com>
</span><span class="cx"> 
</span><span class="cx">         [PlayStation] Make C files in testapi compile with a C standard rather than C++ one
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreyarrYarrJITcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp (280569 => 280570)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp     2021-08-02 23:39:10 UTC (rev 280569)
+++ trunk/Source/JavaScriptCore/yarr/YarrJIT.cpp        2021-08-02 23:43:16 UTC (rev 280570)
</span><span class="lines">@@ -63,6 +63,11 @@
</span><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     unsigned length() const { return m_characters.size(); }
</span><ins>+    void shortenLength(unsigned length)
+    {
+        ASSERT(length <= this->length());
+        m_characters.shrink(length);
+    }
</ins><span class="cx"> 
</span><span class="cx">     void set(unsigned index, UChar32 character)
</span><span class="cx">     {
</span><span class="lines">@@ -69,6 +74,21 @@
</span><span class="cx">         m_characters[index].add(character);
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void setAll(unsigned index)
+    {
+        m_characters[index].setAll();
+    }
+
+    void addCharacters(unsigned index, const Vector<UChar32>& characters)
+    {
+        m_characters[index].addCharacters(characters);
+    }
+
+    void addRanges(unsigned index, const Vector<CharacterRange>& range)
+    {
+        m_characters[index].addRanges(range);
+    }
+
</ins><span class="cx">     static UniqueRef<BoyerMooreInfo> create(unsigned length)
</span><span class="cx">     {
</span><span class="cx">         return makeUniqueRef<BoyerMooreInfo>(length);
</span><span class="lines">@@ -124,6 +144,7 @@
</span><span class="cx">     unsigned begin = 0;
</span><span class="cx">     unsigned end = 0;
</span><span class="cx">     constexpr unsigned maxCandidatesPerCharacter = 32;
</span><ins>+    static_assert(maxCandidatesPerCharacter < BoyerMooreBitmap::mapSize);
</ins><span class="cx">     for (unsigned limit = 4; limit < maxCandidatesPerCharacter; limit *= 2) {
</span><span class="cx">         auto [newPoint, newBegin, newEnd] = findBestCharacterSequence(limit);
</span><span class="cx">         if (newPoint > biggestPoint) {
</span><span class="lines">@@ -2381,6 +2402,12 @@
</span><span class="cx">                         auto [map, isMaskEffective] = op.m_bmInfo->createCandidateBitmap(beginIndex, endIndex);
</span><span class="cx">                         unsigned mapCount = map.count();
</span><span class="cx">                         // If candidate characters are <= 2, checking each is better than using vector.
</span><ins>+                        JumpList outOfLengthFailure;
+                        JumpList matched;
+                        dataLogLnIf(YarrJITInternal::verbose, "BM Bitmap is ", map);
+                        // Patterns like /[]/ have zero candidates. Since this is rare, we do nothing for now.
+                        if (!mapCount)
+                            break;
</ins><span class="cx">                         if (mapCount <= 2) {
</span><span class="cx">                             UChar32 character1 = map.findBit(0, true);
</span><span class="cx">                             ASSERT(character1 != BoyerMooreBitmap::Map::size());
</span><span class="lines">@@ -2391,7 +2418,6 @@
</span><span class="cx">                             }
</span><span class="cx">                             dataLogLnIf(Options::verboseRegExpCompilation(), "Found 1-or-2 characters lookahead character:(0x", hex(character1), "),character2:(", hex(character2), "),isMaskEffective:(", isMaskEffective,"),range:[", beginIndex, ", ", endIndex, ")");
</span><span class="cx"> 
</span><del>-                            JumpList matched;
</del><span class="cx">                             auto loopHead = label();
</span><span class="cx">                             readCharacter(m_checkedOffset - endIndex + 1, regT0);
</span><span class="cx">                             if (isMaskEffective)
</span><span class="lines">@@ -2399,9 +2425,8 @@
</span><span class="cx">                             matched.append(branch32(Equal, regT0, TrustedImm32(character1)));
</span><span class="cx">                             if (mapCount == 2)
</span><span class="cx">                                 matched.append(branch32(Equal, regT0, TrustedImm32(character2)));
</span><del>-                            op.m_jumps.append(jumpIfNoAvailableInput(endIndex - beginIndex));
</del><ins>+                            outOfLengthFailure.append(jumpIfNoAvailableInput(endIndex - beginIndex));
</ins><span class="cx">                             jump().linkTo(loopHead, this);
</span><del>-                            matched.link(this);
</del><span class="cx">                         } else {
</span><span class="cx">                             const auto* pointer = getBoyerMooreBitmap(map);
</span><span class="cx">                             dataLogLnIf(Options::verboseRegExpCompilation(), "Found bitmap lookahead count:(", mapCount, "),range:[", beginIndex, ", ", endIndex, ")");
</span><span class="lines">@@ -2416,7 +2441,7 @@
</span><span class="cx">                             extractUnsignedBitfield32(regT0, TrustedImm32(6), TrustedImm32(1), regT2); // Extract 1 bit for index.
</span><span class="cx">                             load64(BaseIndex(regT1, regT2, TimesEight), regT2);
</span><span class="cx">                             urshift64(regT0, regT2); // We can ignore upper bits and only lower 6bits are effective.
</span><del>-                            auto matched = branchTest64(NonZero, regT2, TrustedImm32(1));
</del><ins>+                            matched.append(branchTest64(NonZero, regT2, TrustedImm32(1)));
</ins><span class="cx"> #elif CPU(X86_64)
</span><span class="cx">                             static_assert(sizeof(BoyerMooreBitmap::Map::WordType) == sizeof(uint64_t));
</span><span class="cx">                             static_assert(1 << 6 == 64);
</span><span class="lines">@@ -2425,7 +2450,7 @@
</span><span class="cx">                             urshift32(TrustedImm32(6), regT2);
</span><span class="cx">                             and32(TrustedImm32(1), regT2);
</span><span class="cx">                             load64(BaseIndex(regT1, regT2, TimesEight), regT2);
</span><del>-                            auto matched = branchTestBit64(NonZero, regT2, regT0); // We can ignore upper bits since modulo-64 is performed.
</del><ins>+                            matched.append(branchTestBit64(NonZero, regT2, regT0)); // We can ignore upper bits since modulo-64 is performed.
</ins><span class="cx"> #else
</span><span class="cx">                             static_assert(sizeof(BoyerMooreBitmap::Map::WordType) == sizeof(uint32_t));
</span><span class="cx">                             static_assert(1 << 5 == 32);
</span><span class="lines">@@ -2435,15 +2460,16 @@
</span><span class="cx">                             and32(TrustedImm32(0b11), regT2);
</span><span class="cx">                             load32(BaseIndex(regT1, regT2, TimesFour), regT2);
</span><span class="cx">                             urshift32(regT0, regT2); // We can ignore upper bits and only lower 5bits are effective.
</span><del>-                            auto matched = branchTest32(NonZero, regT2, TrustedImm32(1));
</del><ins>+                            matched.append(branchTest32(NonZero, regT2, TrustedImm32(1)));
</ins><span class="cx"> #endif
</span><del>-                            op.m_jumps.append(jumpIfNoAvailableInput(endIndex - beginIndex));
</del><ins>+                            outOfLengthFailure.append(jumpIfNoAvailableInput(endIndex - beginIndex));
</ins><span class="cx">                             jump().linkTo(loopHead, this);
</span><del>-                            matched.link(this);
</del><span class="cx">                         }
</span><span class="cx"> 
</span><span class="cx">                         // If the pattern size is not fixed, then store the start index for use if we match.
</span><ins>+                        // This is used for adjusting match-start when we failed to find the start with BoyerMoore search.
</ins><span class="cx">                         if (!m_pattern.m_body->m_hasFixedSize) {
</span><ins>+                            outOfLengthFailure.link(this);
</ins><span class="cx">                             if (alternative->m_minimumSize) {
</span><span class="cx">                                 move(index, regT0);
</span><span class="cx">                                 sub32(Imm32(alternative->m_minimumSize), regT0);
</span><span class="lines">@@ -2450,6 +2476,21 @@
</span><span class="cx">                                 setMatchStart(regT0);
</span><span class="cx">                             } else
</span><span class="cx">                                 setMatchStart(index);
</span><ins>+                            op.m_jumps.append(jump());
+                        } else
+                            op.m_jumps.append(outOfLengthFailure);
+
+                        matched.link(this);
+                        // If the pattern size is not fixed, then store the start index for use if we match.
+                        // This is used for adjusting match-start when we start pattern matching with the updated index
+                        // by BoyerMoore search.
+                        if (!m_pattern.m_body->m_hasFixedSize) {
+                            if (alternative->m_minimumSize) {
+                                move(index, regT0);
+                                sub32(Imm32(alternative->m_minimumSize), regT0);
+                                setMatchStart(regT0);
+                            } else
+                                setMatchStart(index);
</ins><span class="cx">                         }
</span><span class="cx">                     }
</span><span class="cx">                 }
</span><span class="lines">@@ -3842,7 +3883,7 @@
</span><span class="cx">         // it fails when the body alternatives fail to match with the current offset.
</span><span class="cx">         // FIXME: Support unicode flag.
</span><span class="cx">         // https://bugs.webkit.org/show_bug.cgi?id=228611
</span><del>-        if (disjunction->m_minimumSize && disjunction->m_hasFixedSize && !m_pattern.sticky() && !m_pattern.unicode()) {
</del><ins>+        if (disjunction->m_minimumSize && !m_pattern.sticky() && !m_pattern.unicode()) {
</ins><span class="cx">             auto bmInfo = BoyerMooreInfo::create(std::min<unsigned>(disjunction->m_minimumSize, BoyerMooreInfo::maxLength));
</span><span class="cx">             if (collectBoyerMooreInfo(disjunction, currentAlternativeIndex, bmInfo.get())) {
</span><span class="cx">                 m_ops.last().m_bmInfo = bmInfo.ptr();
</span><span class="lines">@@ -3890,13 +3931,10 @@
</span><span class="cx">         // We first collect possible characters for each character position. Then, apply heuristics to extract good character sequence from
</span><span class="cx">         // that and construct fast searching with long stride.
</span><span class="cx"> 
</span><del>-        ASSERT(disjunction->m_hasFixedSize); // We only support fixed-sized lookahead for BoyerMoore search.
</del><span class="cx">         ASSERT(disjunction->m_minimumSize);
</span><span class="cx"> 
</span><span class="cx">         // FIXME: Support nested disjunctions (e.g. /(?:abc|def|g(?:hi|jk))/).
</span><span class="cx">         // https://bugs.webkit.org/show_bug.cgi?id=228614
</span><del>-        // FIXME: Support character-class (e.g. /[\d]test/).
-        // https://bugs.webkit.org/show_bug.cgi?id=228613
</del><span class="cx">         // FIXME: Support non-fixed-sized lookahead (e.g. /.*abc/ and extract "abc" sequence).
</span><span class="cx">         // https://bugs.webkit.org/show_bug.cgi?id=228612
</span><span class="cx">         auto& alternatives = disjunction->m_alternatives;
</span><span class="lines">@@ -3909,20 +3947,46 @@
</span><span class="cx">                 case PatternTerm::Type::AssertionBOL:
</span><span class="cx">                 case PatternTerm::Type::AssertionEOL:
</span><span class="cx">                 case PatternTerm::Type::AssertionWordBoundary:
</span><del>-                case PatternTerm::Type::CharacterClass:
</del><span class="cx">                 case PatternTerm::Type::BackReference:
</span><span class="cx">                 case PatternTerm::Type::ForwardReference:
</span><span class="cx">                 case PatternTerm::Type::ParenthesesSubpattern:
</span><span class="cx">                 case PatternTerm::Type::ParentheticalAssertion:
</span><span class="cx">                 case PatternTerm::Type::DotStarEnclosure:
</span><del>-                    return false;
</del><ins>+                    break;
+                case PatternTerm::Type::CharacterClass: {
+                    if (term.quantityType != QuantifierType::FixedCount || term.quantityMaxCount != 1)
+                        break;
+                    if (term.inputPosition != index)
+                        break;
+                    auto& characterClass = *term.characterClass;
+                    if (term.invert() || characterClass.m_anyCharacter) {
+                        bmInfo.setAll(cursor);
+                        ++cursor;
+                        continue;
+                    }
+                    if (characterClass.m_table) {
+                        bmInfo.setAll(cursor);
+                        ++cursor;
+                        continue;
+                    }
+                    if (!characterClass.m_rangesUnicode.isEmpty())
+                        bmInfo.addRanges(cursor, characterClass.m_rangesUnicode);
+                    if (!characterClass.m_matchesUnicode.isEmpty())
+                        bmInfo.addCharacters(cursor, characterClass.m_matchesUnicode);
+                    if (!characterClass.m_ranges.isEmpty())
+                        bmInfo.addRanges(cursor, characterClass.m_ranges);
+                    if (!characterClass.m_matches.isEmpty())
+                        bmInfo.addCharacters(cursor, characterClass.m_matches);
+                    ++cursor;
+                    continue;
+                }
</ins><span class="cx">                 case PatternTerm::Type::PatternCharacter: {
</span><span class="cx">                     if (term.quantityType != QuantifierType::FixedCount || term.quantityMaxCount != 1)
</span><del>-                        return false;
</del><ins>+                        break;
</ins><span class="cx">                     if (term.inputPosition != index)
</span><del>-                        return false;
</del><ins>+                        break;
</ins><span class="cx">                     if (U16_LENGTH(term.patternCharacter) != 1 && m_decodeSurrogatePairs)
</span><del>-                        return false;
</del><ins>+                        break;
</ins><span class="cx">                     // For case-insensitive compares, non-ascii characters that have different
</span><span class="cx">                     // upper & lower case representations are already converted to a character class.
</span><span class="cx">                     ASSERT(!m_pattern.ignoreCase() || isASCIIAlpha(term.patternCharacter) || isCanonicallyUnique(term.patternCharacter, m_canonicalMode));
</span><span class="lines">@@ -3932,13 +3996,15 @@
</span><span class="cx">                     } else
</span><span class="cx">                         bmInfo.set(cursor, term.patternCharacter);
</span><span class="cx">                     ++cursor;
</span><del>-                    break;
</del><ins>+                    continue;
</ins><span class="cx">                 }
</span><span class="cx">                 }
</span><ins>+                dataLogLnIf(YarrJITInternal::verbose, "Shortening to ", cursor);
+                bmInfo.shortenLength(cursor);
+                break;
</ins><span class="cx">             }
</span><span class="cx">         }
</span><del>-        dataLogLnIf(YarrJITInternal::verbose, "Characters collected");
-        return true;
</del><ins>+        return bmInfo.length();
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     const BoyerMooreBitmap::Map::WordType* getBoyerMooreBitmap(const BoyerMooreBitmap::Map& map)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreyarrYarrJITh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/YarrJIT.h (280569 => 280570)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/YarrJIT.h       2021-08-02 23:39:10 UTC (rev 280569)
+++ trunk/Source/JavaScriptCore/yarr/YarrJIT.h  2021-08-02 23:43:16 UTC (rev 280570)
</span><span class="lines">@@ -61,6 +61,7 @@
</span><span class="cx"> };
</span><span class="cx"> 
</span><span class="cx"> class BoyerMooreBitmap {
</span><ins>+    WTF_MAKE_NONCOPYABLE(BoyerMooreBitmap);
</ins><span class="cx">     WTF_MAKE_FAST_ALLOCATED(BoyerMooreBitmap);
</span><span class="cx"> public:
</span><span class="cx">     static constexpr unsigned mapSize = 128;
</span><span class="lines">@@ -75,6 +76,8 @@
</span><span class="cx"> 
</span><span class="cx">     void add(UChar32 character)
</span><span class="cx">     {
</span><ins>+        if (isAllSet())
+            return;
</ins><span class="cx">         unsigned position = character & mapMask;
</span><span class="cx">         if (position != static_cast<unsigned>(character))
</span><span class="cx">             m_isMaskEffective = true;
</span><span class="lines">@@ -84,6 +87,43 @@
</span><span class="cx">         }
</span><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void addCharacters(const Vector<UChar32>& characters)
+    {
+        if (isAllSet())
+            return;
+        if (characters.size() >= mapSize) {
+            setAll();
+            return;
+        }
+        for (UChar character : characters)
+            add(character);
+    }
+
+    void addRanges(const Vector<CharacterRange>& ranges)
+    {
+        if (ranges.size() >= mapSize) {
+            setAll();
+            return;
+        }
+        for (CharacterRange range : ranges) {
+            if (isAllSet())
+                return;
+            if (static_cast<unsigned>(range.end - range.begin + 1) >= mapSize) {
+                setAll();
+                return;
+            }
+            for (UChar32 character = range.begin; character <= range.end; ++character)
+                add(character);
+        }
+    }
+
+    void setAll()
+    {
+        m_count = mapSize;
+    }
+
+    bool isAllSet() const { return m_count == mapSize; }
+
</ins><span class="cx"> private:
</span><span class="cx">     Map m_map { };
</span><span class="cx">     unsigned m_count { 0 };
</span></span></pre>
</div>
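<p>The saturating behaviour that this revision adds to BoyerMooreBitmap (add, addRanges, setAll, isAllSet) can be sketched in isolation as below. This is a minimal illustration, not the WebKit implementation. The names CandidateBitmap, Range, count and mayContain are hypothetical, and std::bitset stands in for the Map type, whose definition is not shown in this diff. Unlike setAll in the diff, which only updates the counter, the sketch also sets every bit so that the illustrative mayContain helper stays consistent.</p>

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the CharacterRange type used in the diff.
// Assumes begin <= end; end is inclusive.
struct Range {
    char32_t begin;
    char32_t end;
};

// Simplified model of a 128-entry candidate-character bitmap that
// saturates ("all set") once it can no longer filter anything out.
class CandidateBitmap {
public:
    static constexpr unsigned mapSize = 128;
    static constexpr unsigned mapMask = mapSize - 1;

    void add(char32_t character)
    {
        if (isAllSet())
            return; // Already saturated, nothing more to record.
        unsigned position = character & mapMask; // Fold into 0..127.
        if (!m_map.test(position)) {
            m_map.set(position);
            ++m_count;
        }
    }

    void addRanges(const std::vector<Range>& ranges)
    {
        for (const Range& range : ranges) {
            if (isAllSet())
                return;
            // A range spanning at least mapSize code points touches every slot.
            if (range.end - range.begin >= mapSize - 1) {
                setAll();
                return;
            }
            for (char32_t c = range.begin; c <= range.end; ++c)
                add(c);
        }
    }

    void setAll()
    {
        m_map.set(); // Sketch-only: keeps mayContain() consistent.
        m_count = mapSize;
    }

    bool isAllSet() const { return m_count == mapSize; }
    unsigned count() const { return m_count; }

    // Illustrative query: could this character appear at this position?
    bool mayContain(char32_t character) const { return m_map.test(character & mapMask); }

private:
    std::bitset<mapSize> m_map;
    unsigned m_count { 0 };
};
```

<p>Under these assumptions, adding the range '0' to '9' leaves ten slots set, while adding a range of 128 or more code points saturates the bitmap immediately, mirroring the early returns in the diff's addRanges.</p>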
</div>

</body>
</html>