<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[198624] trunk</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/198624">198624</a></dd>
<dt>Author</dt> <dd>msaboff@apple.com</dd>
<dt>Date</dt> <dd>2016-03-24 07:19:37 -0700 (Thu, 24 Mar 2016)</dd>
</dl>

<h3>Log Message</h3>
<pre>[ES6] Greedy unicode RegExp's don't properly backtrack past non BMP characters
https://bugs.webkit.org/show_bug.cgi?id=155829

Reviewed by Saam Barati.

Source/JavaScriptCore:

When we back up while matching part of a Unicode pattern, we can't just back up one
code unit, because a non-BMP character occupies two (a surrogate pair).
Instead we need to save our start position before trying to match a character and
restore that position if the match fails.  This was done in other places, but wasn't
done for all greedy types.

Fixed matchGlobal() to properly handle advancing past non-BMP characters.

* runtime/RegExpObject.cpp:
(JSC::RegExpObject::matchGlobal):
* runtime/RegExpObjectInlines.h:
(JSC::RegExpObject::advanceStringUnicode):
* yarr/YarrInterpreter.cpp:
(JSC::Yarr::Interpreter::matchCharacterClass):
(JSC::Yarr::Interpreter::matchDisjunction):

LayoutTests:

Added new test cases.

* js/regexp-unicode-expected.txt:
* js/script-tests/regexp-unicode.js:</pre>
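<p>The save-and-restore idea in the log message can be sketched in JavaScript, the language of the layout tests. This is a simplified model and not the Yarr interpreter: the Input class, matchGreedy() and isDigit() are hypothetical names introduced only for illustration, and the model assumes that reading a character advances the cursor by one or two UTF-16 code units.</p>
<pre>
```javascript
// Hypothetical model of a greedy quantifier over code points; not the WebKit source.
class Input {
  constructor(s) { this.s = s; this.pos = 0; }
  getPos() { return this.pos; }
  setPos(p) { this.pos = p; }
  atEnd() { return this.pos >= this.s.length; }
  // Reads one code point and advances past it (one or two UTF-16 code units).
  readCodePoint() {
    const cp = this.s.codePointAt(this.pos);
    this.pos += String.fromCodePoint(cp).length;
    return cp;
  }
}

// Matches up to maxCount characters accepted by the predicate, greedily.
function matchGreedy(input, maxCount, accepts) {
  let position = input.getPos(); // Saved before each attempt.
  let matchAmount = 0;
  while (maxCount > matchAmount) {
    if (input.atEnd()) break;
    if (!accepts(input.readCodePoint())) {
      // Restore the saved position. Stepping back by a single code unit
      // could leave the cursor in the middle of a surrogate pair.
      input.setPos(position);
      break;
    }
    ++matchAmount;
    position = input.getPos();
  }
  return matchAmount;
}

// ASCII digits only; a stand-in for a character class such as \d.
const isDigit = (cp) => !(0x30 > cp || cp > 0x39);

const input = new Input("123\u{10400}");
const count = matchGreedy(input, Infinity, isDigit);
// The cursor now sits just before the non-BMP character.
```
</pre>
<p>In this model a failed attempt on U+10400 has already consumed two code units, so only restoring the saved position returns the cursor to a character boundary. That mirrors the new test cases, where a quantified term that does not match a non-BMP character yields an empty match.</p>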

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsChangeLog">trunk/LayoutTests/ChangeLog</a></li>
<li><a href="#trunkLayoutTestsjsregexpunicodeexpectedtxt">trunk/LayoutTests/js/regexp-unicode-expected.txt</a></li>
<li><a href="#trunkLayoutTestsjsscripttestsregexpunicodejs">trunk/LayoutTests/js/script-tests/regexp-unicode.js</a></li>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreruntimeRegExpObjectcpp">trunk/Source/JavaScriptCore/runtime/RegExpObject.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreruntimeRegExpObjectInlinesh">trunk/Source/JavaScriptCore/runtime/RegExpObjectInlines.h</a></li>
<li><a href="#trunkSourceJavaScriptCoreyarrYarrInterpretercpp">trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkLayoutTestsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/ChangeLog (198623 => 198624)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/ChangeLog        2016-03-24 13:27:30 UTC (rev 198623)
+++ trunk/LayoutTests/ChangeLog        2016-03-24 14:19:37 UTC (rev 198624)
</span><span class="lines">@@ -1,3 +1,15 @@
</span><ins>+2016-03-24  Michael Saboff  &lt;msaboff@apple.com&gt;
+
+        [ES6] Greedy unicode RegExp's don't properly backtrack past non BMP characters
+        https://bugs.webkit.org/show_bug.cgi?id=155829
+
+        Reviewed by Saam Barati.
+
+        Added new test cases.
+
+        * js/regexp-unicode-expected.txt:
+        * js/script-tests/regexp-unicode.js:
+
</ins><span class="cx"> 2016-03-24  Gyuyoung Kim  &lt;gyuyoung.kim@webkit.org&gt;
</span><span class="cx"> 
</span><span class="cx">         Unreviewed EFL gardening.
</span></span></pre></div>
<a id="trunkLayoutTestsjsregexpunicodeexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/js/regexp-unicode-expected.txt (198623 => 198624)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/regexp-unicode-expected.txt        2016-03-24 13:27:30 UTC (rev 198623)
+++ trunk/LayoutTests/js/regexp-unicode-expected.txt        2016-03-24 14:19:37 UTC (rev 198624)
</span><span class="lines">@@ -77,6 +77,11 @@
</span><span class="cx"> PASS &quot;ab𐐨𐐨𐐨c𐨁&quot;.match(/abc|ab𐐀*cd|ab𐐀+c𐨁d|ab𐐀+c𐨁/iu)[0] is &quot;ab𐐨𐐨𐐨c𐨁&quot;
</span><span class="cx"> PASS &quot;ab𐐨𐐨𐐨&quot;.match(/abc|ab𐐨*./u)[0] is &quot;ab𐐨𐐨𐐨&quot;
</span><span class="cx"> PASS &quot;ab𐐨𐐨𐐨&quot;.match(/abc|ab𐐀*./iu)[0] is &quot;ab𐐨𐐨𐐨&quot;
</span><ins>+PASS &quot;𐐀&quot;.match(/a*/u)[0].length is 0
+PASS &quot;𐐀&quot;.match(/a*/ui)[0].length is 0
+PASS &quot;𐐀&quot;.match(/\d*/u)[0].length is 0
+PASS &quot;123𐐀&quot;.match(/\d*/u)[0] is &quot;123&quot;
+PASS &quot;12X3𐐀4&quot;.match(/\d{0,1}/ug) is [&quot;1&quot;, &quot;2&quot;, &quot;&quot;, &quot;3&quot;, &quot;&quot;, &quot;4&quot;, &quot;&quot;]
</ins><span class="cx"> PASS match3[0] is &quot;a𐐐𐐐b&quot;
</span><span class="cx"> PASS match3[1] is undefined.
</span><span class="cx"> PASS match3[2] is &quot;a𐐐𐐐b&quot;
</span></span></pre></div>
<a id="trunkLayoutTestsjsscripttestsregexpunicodejs"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/js/script-tests/regexp-unicode.js (198623 => 198624)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/script-tests/regexp-unicode.js        2016-03-24 13:27:30 UTC (rev 198623)
+++ trunk/LayoutTests/js/script-tests/regexp-unicode.js        2016-03-24 14:19:37 UTC (rev 198624)
</span><span class="lines">@@ -113,6 +113,11 @@
</span><span class="cx"> shouldBe('&quot;ab\u{10428}\u{10428}\u{10428}c\u{10a01}&quot;.match(/abc|ab\u{10400}*cd|ab\u{10400}+c\u{10a01}d|ab\u{10400}+c\u{10a01}/iu)[0]', '&quot;ab\u{10428}\u{10428}\u{10428}c\u{10a01}&quot;');
</span><span class="cx"> shouldBe('&quot;ab\u{10428}\u{10428}\u{10428}&quot;.match(/abc|ab\u{10428}*./u)[0]', '&quot;ab\u{10428}\u{10428}\u{10428}&quot;');
</span><span class="cx"> shouldBe('&quot;ab\u{10428}\u{10428}\u{10428}&quot;.match(/abc|ab\u{10400}*./iu)[0]', '&quot;ab\u{10428}\u{10428}\u{10428}&quot;');
</span><ins>+shouldBe('&quot;\u{10400}&quot;.match(/a*/u)[0].length', '0');
+shouldBe('&quot;\u{10400}&quot;.match(/a*/ui)[0].length', '0');
+shouldBe('&quot;\u{10400}&quot;.match(/\\d*/u)[0].length', '0');
+shouldBe('&quot;123\u{10400}&quot;.match(/\\d*/u)[0]', '&quot;123&quot;');
+shouldBe('&quot;12X3\u{10400}4&quot;.match(/\\d{0,1}/ug)', '[&quot;1&quot;, &quot;2&quot;, &quot;&quot;, &quot;3&quot;, &quot;&quot;, &quot;4&quot;, &quot;&quot;]');
</ins><span class="cx"> 
</span><span class="cx"> var re3 = new RegExp(&quot;(a\u{10410}*bc)|(a\u{10410}*b)&quot;, &quot;u&quot;);
</span><span class="cx"> var match3 = &quot;a\u{10410}\u{10410}b&quot;.match(re3);
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (198623 => 198624)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog        2016-03-24 13:27:30 UTC (rev 198623)
+++ trunk/Source/JavaScriptCore/ChangeLog        2016-03-24 14:19:37 UTC (rev 198624)
</span><span class="lines">@@ -1,3 +1,25 @@
</span><ins>+2016-03-24  Michael Saboff  &lt;msaboff@apple.com&gt;
+
+        [ES6] Greedy unicode RegExp's don't properly backtrack past non BMP characters
+        https://bugs.webkit.org/show_bug.cgi?id=155829
+
+        Reviewed by Saam Barati.
+
+        When we backup when matching part of a unicode pattern, we can't just backup one character.
+        Instead we need to save our start position before trying to match a character and
+        restore the position if the match fails.  This was done in other places, but wasn't
+        done for all greedy types.
+
+        Fixed matchGlobal() to properly handle advancing past non BMP characters.
+
+        * runtime/RegExpObject.cpp:
+        (JSC::RegExpObject::matchGlobal):
+        * runtime/RegExpObjectInlines.h:
+        (JSC::RegExpObject::advanceStringUnicode):
+        * yarr/YarrInterpreter.cpp:
+        (JSC::Yarr::Interpreter::matchCharacterClass):
+        (JSC::Yarr::Interpreter::matchDisjunction):
+
</ins><span class="cx"> 2016-03-24  Benjamin Poulain  &lt;bpoulain@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         [JSC] In some cases, the integer range optimization phase never converges
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreruntimeRegExpObjectcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/runtime/RegExpObject.cpp (198623 => 198624)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/runtime/RegExpObject.cpp        2016-03-24 13:27:30 UTC (rev 198623)
+++ trunk/Source/JavaScriptCore/runtime/RegExpObject.cpp        2016-03-24 14:19:37 UTC (rev 198624)
</span><span class="lines">@@ -191,6 +191,7 @@
</span><span class="cx">     const size_t maximumReasonableMatchSize = 1000000000;
</span><span class="cx"> 
</span><span class="cx">     if (regExp-&gt;unicode()) {
</span><ins>+        unsigned stringLength = s.length();
</ins><span class="cx">         while (result) {
</span><span class="cx">             if (list.size() &gt; maximumReasonableMatchSize) {
</span><span class="cx">                 throwOutOfMemoryError(exec);
</span><span class="lines">@@ -201,7 +202,7 @@
</span><span class="cx">             size_t length = end - result.start;
</span><span class="cx">             list.append(jsSubstring(exec, s, result.start, length));
</span><span class="cx">             if (!length)
</span><del>-                end = advanceStringUnicode(s, length, end);
</del><ins>+                end = advanceStringUnicode(s, stringLength, end);
</ins><span class="cx">             result = regExpConstructor-&gt;performMatch(*vm, regExp, string, s, end);
</span><span class="cx">         }
</span><span class="cx">     } else {
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreruntimeRegExpObjectInlinesh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/runtime/RegExpObjectInlines.h (198623 => 198624)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/runtime/RegExpObjectInlines.h        2016-03-24 13:27:30 UTC (rev 198623)
+++ trunk/Source/JavaScriptCore/runtime/RegExpObjectInlines.h        2016-03-24 14:19:37 UTC (rev 198624)
</span><span class="lines">@@ -117,7 +117,7 @@
</span><span class="cx">     if (first &lt; 0xD800 || first &gt; 0xDBFF)
</span><span class="cx">         return currentIndex + 1;
</span><span class="cx"> 
</span><del>-    UChar second = s[currentIndex];
</del><ins>+    UChar second = s[currentIndex + 1];
</ins><span class="cx">     if (second &lt; 0xDC00 || second &gt; 0xDFFF)
</span><span class="cx">         return currentIndex + 1;
</span><span class="cx"> 
</span></span></pre></div>
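<p>For illustration, the corrected surrogate check can be modelled in JavaScript. This is a sketch, not the WebKit function: only the two range checks and the read of the following code unit appear in the diff above, so the bounds check at the top and the final step of two code units are assumptions, and inRange() is a hypothetical helper.</p>
<pre>
```javascript
// Hypothetical JavaScript model of advanceStringUnicode(); not the WebKit source.
// The bounds check and the final "+ 2" are assumptions; the diff does not show them.
const inRange = (value, low, high) => !(low > value || value > high);

function advanceStringUnicode(s, length, currentIndex) {
  // Assumed: with no room left for a second code unit, step by one.
  if (currentIndex + 1 >= length) return currentIndex + 1;

  const first = s.charCodeAt(currentIndex);
  if (!inRange(first, 0xD800, 0xDBFF)) return currentIndex + 1;

  // The fix: inspect the code unit after the lead surrogate, not the lead itself.
  const second = s.charCodeAt(currentIndex + 1);
  if (!inRange(second, 0xDC00, 0xDFFF)) return currentIndex + 1;

  return currentIndex + 2; // Assumed: skip both halves of the pair.
}

const s = "\u{10400}4";
// Index of the code unit that follows the surrogate pair.
const next = advanceStringUnicode(s, s.length, 0);
```
</pre>
<p>Under the assumed bounds check, passing the length of an empty match (zero) instead of the string length would always step by a single code unit, which is consistent with the matchGlobal() change in this revision.</p>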
<a id="trunkSourceJavaScriptCoreyarrYarrInterpretercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp (198623 => 198624)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp        2016-03-24 13:27:30 UTC (rev 198623)
+++ trunk/Source/JavaScriptCore/yarr/YarrInterpreter.cpp        2016-03-24 14:19:37 UTC (rev 198624)
</span><span class="lines">@@ -507,14 +507,16 @@
</span><span class="cx">         }
</span><span class="cx"> 
</span><span class="cx">         case QuantifierGreedy: {
</span><del>-            backTrack-&gt;begin = input.getPos();
</del><ins>+            unsigned position = input.getPos();
+            backTrack-&gt;begin = position;
</ins><span class="cx">             unsigned matchAmount = 0;
</span><span class="cx">             while ((matchAmount &lt; term.atom.quantityCount) &amp;&amp; input.checkInput(1)) {
</span><span class="cx">                 if (!checkCharacterClass(term.atom.characterClass, term.invert(), term.inputPosition + 1)) {
</span><del>-                    input.uncheckInput(1);
</del><ins>+                    input.setPos(position);
</ins><span class="cx">                     break;
</span><span class="cx">                 }
</span><span class="cx">                 ++matchAmount;
</span><ins>+                position = input.getPos();
</ins><span class="cx">             }
</span><span class="cx">             backTrack-&gt;matchAmount = matchAmount;
</span><span class="cx"> 
</span><span class="lines">@@ -1242,12 +1244,14 @@
</span><span class="cx">         case ByteTerm::TypePatternCharacterGreedy: {
</span><span class="cx">             BackTrackInfoPatternCharacter* backTrack = reinterpret_cast&lt;BackTrackInfoPatternCharacter*&gt;(context-&gt;frame + currentTerm().frameLocation);
</span><span class="cx">             unsigned matchAmount = 0;
</span><ins>+            unsigned position = input.getPos(); // May need to back out reading a surrogate pair.
</ins><span class="cx">             while ((matchAmount &lt; currentTerm().atom.quantityCount) &amp;&amp; input.checkInput(1)) {
</span><span class="cx">                 if (!checkCharacter(currentTerm().atom.patternCharacter, currentTerm().inputPosition + 1)) {
</span><del>-                    input.uncheckInput(1);
</del><ins>+                    input.setPos(position);
</ins><span class="cx">                     break;
</span><span class="cx">                 }
</span><span class="cx">                 ++matchAmount;
</span><ins>+                position = input.getPos();
</ins><span class="cx">             }
</span><span class="cx">             backTrack-&gt;matchAmount = matchAmount;
</span><span class="cx"> 
</span></span></pre>
</div>
</div>

</body>
</html>