<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[181107] trunk</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/181107">181107</a></dd>
<dt>Author</dt> <dd>benjamin@webkit.org</dd>
<dt>Date</dt> <dd>2015-03-05 15:47:21 -0800 (Thu, 05 Mar 2015)</dd>
</dl>

<h3>Log Message</h3>
<pre>Add basic support for character sets to the URL Filter parser
https://bugs.webkit.org/show_bug.cgi?id=142257

Patch by Benjamin Poulain &lt;bpoulain@apple.com&gt; on 2015-03-05
Reviewed by Alex Christensen.

Source/WebCore:

This patch is a first step toward making the URL filter parser a bit
more powerful: it adds support for character set atom.

I did not attempt to integrate that into the prefix tree in this patch,
instead, the GraphBuilder gets a two modes of generating the NFA:
PrefixTree and DirectGeneration.

As long as we only see trivial atoms, we use the PrefixTree generation
to minimize the number of nodes we need. As soon as we start a character
class, we switch to DirectGeneration and we generate the NFA from the input
without merging with previously seen patterns.

To differentiate between Trivial atoms and CharacterSet, we also gain
an AtomType state.

The character set themself are very simple: each character is represented by
a bit in a 16bytes bit vector.

* contentextensions/URLFilterParser.cpp:
(WebCore::ContentExtensions::quantifyTrivialAtom):
(WebCore::ContentExtensions::GraphBuilder::atomPatternCharacter):
(WebCore::ContentExtensions::GraphBuilder::atomBuiltInCharacterClass):
(WebCore::ContentExtensions::GraphBuilder::quantifyAtom):
(WebCore::ContentExtensions::GraphBuilder::atomBackReference):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBegin):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassAtom):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassRange):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassEnd):
(WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBuiltIn):
(WebCore::ContentExtensions::GraphBuilder::sinkAtom):
(WebCore::ContentExtensions::GraphBuilder::generateTransition):
(WebCore::ContentExtensions::GraphBuilder::sinkTrivialAtom):
(WebCore::ContentExtensions::GraphBuilder::sinkCharacterSet):
(WebCore::ContentExtensions::GraphBuilder::sinkPendingAtomIfNecessary):

LayoutTests:

* http/tests/usercontentfilter/character-set-basic-support-expected.txt: Added.
* http/tests/usercontentfilter/character-set-basic-support.html: Added.
* http/tests/usercontentfilter/character-set-basic-support.html.json: Added.
* http/tests/usercontentfilter/resources/url-blocking-test.js: Added.</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsChangeLog">trunk/LayoutTests/ChangeLog</a></li>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsURLFilterParsercpp">trunk/Source/WebCore/contentextensions/URLFilterParser.cpp</a></li>
</ul>

<h3>Added Paths</h3>
<ul>
<li><a href="#trunkLayoutTestshttptestsusercontentfiltercharactersetbasicsupportexpectedtxt">trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support-expected.txt</a></li>
<li><a href="#trunkLayoutTestshttptestsusercontentfiltercharactersetbasicsupporthtml">trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html</a></li>
<li><a href="#trunkLayoutTestshttptestsusercontentfiltercharactersetbasicsupporthtmljson">trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html.json</a></li>
<li>trunk/LayoutTests/http/tests/usercontentfilter/resources/</li>
<li><a href="#trunkLayoutTestshttptestsusercontentfilterresourcesurlblockingtestjs">trunk/LayoutTests/http/tests/usercontentfilter/resources/url-blocking-test.js</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkLayoutTestsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/ChangeLog (181106 => 181107)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/ChangeLog        2015-03-05 23:37:31 UTC (rev 181106)
+++ trunk/LayoutTests/ChangeLog        2015-03-05 23:47:21 UTC (rev 181107)
</span><span class="lines">@@ -1,3 +1,15 @@
</span><ins>+2015-03-05  Benjamin Poulain  &lt;bpoulain@apple.com&gt;
+
+        Add basic support for character sets to the URL Filter parser
+        https://bugs.webkit.org/show_bug.cgi?id=142257
+
+        Reviewed by Alex Christensen.
+
+        * http/tests/usercontentfilter/character-set-basic-support-expected.txt: Added.
+        * http/tests/usercontentfilter/character-set-basic-support.html: Added.
+        * http/tests/usercontentfilter/character-set-basic-support.html.json: Added.
+        * http/tests/usercontentfilter/resources/url-blocking-test.js: Added.
+
</ins><span class="cx"> 2015-03-05  Chris Dumez  &lt;cdumez@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Regression(r173761): ASSERTION FAILED: !is8Bit() in StringImpl::characters16()
</span></span></pre></div>
<a id="trunkLayoutTestshttptestsusercontentfiltercharactersetbasicsupportexpectedtxt"></a>
<div class="addfile"><h4>Added: trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support-expected.txt (0 => 181107)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support-expected.txt                                (rev 0)
+++ trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support-expected.txt        2015-03-05 23:47:21 UTC (rev 181107)
</span><span class="lines">@@ -0,0 +1,23 @@
</span><ins>+Test basic cases of filter using a character set (e.g. [a-z]).
+
+On success, you will see a series of &quot;PASS&quot; messages, followed by &quot;TEST COMPLETE&quot;.
+
+
+URL: http://127.0.0.1:8000/resources/square100.png is not blocked.
+PASS isBlocked is false
+URL: http://127.0.0.1:8000/resources/square100.png?casesEnsitive is not blocked.
+PASS isBlocked is false
+URL: http://127.0.0.1:8000/resources/square100.png?casesenSitive is not blocked.
+PASS isBlocked is false
+URL: http://127.0.0.1:8000/resources/square100.png?BaSiC-TeSt-cAsE is blocked.
+PASS isBlocked is true
+URL: http://127.0.0.1:8000/resources/square100.png?any-scheme-matcher is blocked.
+PASS isBlocked is true
+URL: http://127.0.0.1:8000/resources/square100.png?casesensitive is blocked.
+PASS isBlocked is true
+URL: http://127.0.0.1:8000/resources/square100.png?caseSeNsitive is blocked.
+PASS isBlocked is true
+PASS successfullyParsed is true
+
+TEST COMPLETE
+
</ins></span></pre></div>
<a id="trunkLayoutTestshttptestsusercontentfiltercharactersetbasicsupporthtml"></a>
<div class="addfile"><h4>Added: trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html (0 => 181107)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html                                (rev 0)
+++ trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html        2015-03-05 23:47:21 UTC (rev 181107)
</span><span class="lines">@@ -0,0 +1,23 @@
</span><ins>+&lt;script src=&quot;/resources/js-test-pre.js&quot;&gt;&lt;/script&gt;
+&lt;script src=&quot;resources/url-blocking-test.js&quot;&gt;&lt;/script&gt;
+&lt;script&gt;
+window.jsTestIsAsync = true;
+
+description(&quot;Test basic cases of filter using a character set (e.g. [a-z]).&quot;);
+
+validTestSet = [
+    &quot;http://127.0.0.1:8000/resources/square100.png&quot;,
+    &quot;http://127.0.0.1:8000/resources/square100.png?casesEnsitive&quot;,
+    &quot;http://127.0.0.1:8000/resources/square100.png?casesenSitive&quot;,
+];
+
+blockedTestSet = [
+    &quot;http://127.0.0.1:8000/resources/square100.png?BaSiC-TeSt-cAsE&quot;,
+    &quot;http://127.0.0.1:8000/resources/square100.png?any-scheme-matcher&quot;,
+    &quot;http://127.0.0.1:8000/resources/square100.png?casesensitive&quot;,
+    &quot;http://127.0.0.1:8000/resources/square100.png?caseSeNsitive&quot;,
+];
+
+runBlockingTest(validTestSet, blockedTestSet);
+&lt;/script&gt;
+&lt;script src=&quot;/resources/js-test-post.js&quot;&gt;&lt;/script&gt;
</ins><span class="cx">\ No newline at end of file
</span></span></pre></div>
<a id="trunkLayoutTestshttptestsusercontentfiltercharactersetbasicsupporthtmljson"></a>
<div class="addfile"><h4>Added: trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html.json (0 => 181107)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html.json                                (rev 0)
+++ trunk/LayoutTests/http/tests/usercontentfilter/character-set-basic-support.html.json        2015-03-05 23:47:21 UTC (rev 181107)
</span><span class="lines">@@ -0,0 +1,27 @@
</span><ins>+[
+    {
+        &quot;action&quot;: {
+            &quot;type&quot;: &quot;block&quot;
+        },
+        &quot;trigger&quot;: {
+            &quot;url-filter&quot;: &quot;.*\\?[b][a][s][i][c]-test-case&quot;
+        }
+    },
+    {
+        &quot;action&quot;: {
+            &quot;type&quot;: &quot;block&quot;
+        },
+        &quot;trigger&quot;: {
+            &quot;url-filter&quot;: &quot;[a-zA-Z][a-zA-Z0-9+-.]*://127\\.0\\.0\\.1:8000/resources/square100\\.png\\?any-scheme-matcher&quot;
+        }
+    },
+    {
+        &quot;action&quot;: {
+            &quot;type&quot;: &quot;block&quot;
+        },
+        &quot;trigger&quot;: {
+            &quot;url-filter&quot;: &quot;.*case[Ss][e][nN][s]itive&quot;,
+            &quot;url-filter-is-case-sensitive&quot;: true
+        }
+    }
+]
</ins></span></pre></div>
<a id="trunkLayoutTestshttptestsusercontentfilterresourcesurlblockingtestjs"></a>
<div class="addfile"><h4>Added: trunk/LayoutTests/http/tests/usercontentfilter/resources/url-blocking-test.js (0 => 181107)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/http/tests/usercontentfilter/resources/url-blocking-test.js                                (rev 0)
+++ trunk/LayoutTests/http/tests/usercontentfilter/resources/url-blocking-test.js        2015-03-05 23:47:21 UTC (rev 181107)
</span><span class="lines">@@ -0,0 +1,62 @@
</span><ins>+function runBlockingTest(validTestSet, blockedTestSet) {
+    function getUrlIterator(validTestSet, blockedTestSet) {
+        var validTestSetIndex = 0;
+        var blockedTestSetIndex = 0;
+        return function() {
+            if (validTestSetIndex &lt; validTestSet.length)
+                return { url:validTestSet[validTestSetIndex++], expectBlock:false};
+            if (blockedTestSetIndex &lt; blockedTestSet.length)
+                return { url:blockedTestSet[blockedTestSetIndex++], expectBlock:true};
+            return;
+        }
+    }
+
+    function tryLoadingURL(testCase, testFunction) {
+        var request = new XMLHttpRequest;
+        request.open(&quot;GET&quot;, testCase.url, true);
+        request.testCase = testCase;
+        request.timeout = 50;
+
+        var timeoutId = setTimeout( function() { testFunction(&quot;timeout&quot;, request); }, 50);
+
+        request.addEventListener(&quot;readystatechange&quot;, function() { testFunction(&quot;readystatechange&quot;, request, timeoutId); });
+        request.addEventListener(&quot;error&quot;, function() { testFunction(&quot;error&quot;, request, timeoutId); });
+        request.addEventListener(&quot;timeout&quot;, function() { testFunction(&quot;timeout&quot;, request, timeoutId); });
+        request.send();
+    }
+
+    function testFunction(eventType, request, timeoutId)
+    {
+        isBlocked = true;
+        if (eventType === &quot;error&quot; || eventType === &quot;timeout&quot;)
+            debug(&quot;URL: &quot; + request.testCase.url + &quot; is blocked.&quot;);
+        else if (eventType == &quot;readystatechange&quot;) {
+            if (request.readyState == XMLHttpRequest.HEADERS_RECEIVED) {
+                isBlocked = false;
+                debug(&quot;URL: &quot; + request.testCase.url + &quot; is not blocked.&quot;);
+            } else
+                return;
+        }
+
+        if (request.testCase.expectBlock)
+            shouldBeTrue(&quot;isBlocked&quot;);
+        else
+            shouldBeFalse(&quot;isBlocked&quot;);
+
+        if (timeoutId !== undefined)
+            clearTimeout(timeoutId);
+
+        runNextTest();
+    }
+
+    var urlIterator = getUrlIterator(validTestSet, blockedTestSet);
+    function runNextTest() {
+        nextCase = urlIterator();
+        if (nextCase)
+            tryLoadingURL(nextCase, testFunction);
+        else
+            finishJSTest();
+    }
+
+    runNextTest();
+}
</ins><span class="cx">\ No newline at end of file
</span></span></pre></div>
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (181106 => 181107)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog        2015-03-05 23:37:31 UTC (rev 181106)
+++ trunk/Source/WebCore/ChangeLog        2015-03-05 23:47:21 UTC (rev 181107)
</span><span class="lines">@@ -1,3 +1,45 @@
</span><ins>+2015-03-05  Benjamin Poulain  &lt;bpoulain@apple.com&gt;
+
+        Add basic support for character sets to the URL Filter parser
+        https://bugs.webkit.org/show_bug.cgi?id=142257
+
+        Reviewed by Alex Christensen.
+
+        This patch is a first step toward making the URL filter parser a bit
+        more powerful: it adds support for character set atom.
+
+        I did not attempt to integrate that into the prefix tree in this patch,
+        instead, the GraphBuilder gets a two modes of generating the NFA:
+        PrefixTree and DirectGeneration.
+
+        As long as we only see trivial atoms, we use the PrefixTree generation
+        to minimize the number of nodes we need. As soon as we start a character
+        class, we switch to DirectGeneration and we generate the NFA from the input
+        without merging with previously seen patterns.
+
+        To differentiate between Trivial atoms and CharacterSet, we also gain
+        an AtomType state.
+
+        The character set themself are very simple: each character is represented by
+        a bit in a 16bytes bit vector.
+
+        * contentextensions/URLFilterParser.cpp:
+        (WebCore::ContentExtensions::quantifyTrivialAtom):
+        (WebCore::ContentExtensions::GraphBuilder::atomPatternCharacter):
+        (WebCore::ContentExtensions::GraphBuilder::atomBuiltInCharacterClass):
+        (WebCore::ContentExtensions::GraphBuilder::quantifyAtom):
+        (WebCore::ContentExtensions::GraphBuilder::atomBackReference):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBegin):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassAtom):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassRange):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassEnd):
+        (WebCore::ContentExtensions::GraphBuilder::atomCharacterClassBuiltIn):
+        (WebCore::ContentExtensions::GraphBuilder::sinkAtom):
+        (WebCore::ContentExtensions::GraphBuilder::generateTransition):
+        (WebCore::ContentExtensions::GraphBuilder::sinkTrivialAtom):
+        (WebCore::ContentExtensions::GraphBuilder::sinkCharacterSet):
+        (WebCore::ContentExtensions::GraphBuilder::sinkPendingAtomIfNecessary):
+
</ins><span class="cx"> 2015-03-05  Roger Fong  &lt;roger_fong@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Minor touchups to inline media control icons.
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsURLFilterParsercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/URLFilterParser.cpp (181106 => 181107)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/URLFilterParser.cpp        2015-03-05 23:37:31 UTC (rev 181106)
+++ trunk/Source/WebCore/contentextensions/URLFilterParser.cpp        2015-03-05 23:47:21 UTC (rev 181107)
</span><span class="lines">@@ -30,6 +30,7 @@
</span><span class="cx"> 
</span><span class="cx"> #include &quot;NFA.h&quot;
</span><span class="cx"> #include &lt;JavaScriptCore/YarrParser.h&gt;
</span><ins>+#include &lt;wtf/BitVector.h&gt;
</ins><span class="cx"> 
</span><span class="cx"> namespace WebCore {
</span><span class="cx"> 
</span><span class="lines">@@ -50,30 +51,47 @@
</span><span class="cx">     return static_cast&lt;uint16_t&gt;(toASCIILower(character)) | caseInsensitiveMask;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-enum class TrivialAtomQuantifier : uint16_t {
</del><ins>+enum class AtomQuantifier : uint16_t {
+    One = 0,
</ins><span class="cx">     ZeroOrOne = 0x1000,
</span><del>-    ZeroToMany = 0x2000,
-    OneToMany = 0x4000
</del><ins>+    ZeroOrMore = 0x2000,
+    OneOrMore = 0x4000
</ins><span class="cx"> };
</span><span class="cx"> 
</span><del>-static void quantifyTrivialAtom(TrivialAtom&amp; trivialAtom, TrivialAtomQuantifier quantifier)
</del><ins>+static void quantifyTrivialAtom(TrivialAtom&amp; trivialAtom, AtomQuantifier quantifier)
</ins><span class="cx"> {
</span><span class="cx">     ASSERT(trivialAtom &amp; (hasNonCharacterMask | characterMask));
</span><span class="cx">     ASSERT(!(trivialAtom &amp; 0xf000));
</span><span class="cx">     trivialAtom |= static_cast&lt;uint16_t&gt;(quantifier);
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+static AtomQuantifier trivialAtomQuantifier(TrivialAtom trivialAtom)
+{
+    switch (trivialAtom &amp; 0xf000) {
+    case static_cast&lt;unsigned&gt;(AtomQuantifier::One):
+        return AtomQuantifier::One;
+    case static_cast&lt;unsigned&gt;(AtomQuantifier::ZeroOrOne):
+        return AtomQuantifier::ZeroOrOne;
+    case static_cast&lt;unsigned&gt;(AtomQuantifier::ZeroOrMore):
+        return AtomQuantifier::ZeroOrMore;
+    case static_cast&lt;unsigned&gt;(AtomQuantifier::OneOrMore):
+        return AtomQuantifier::OneOrMore;
+    }
+    ASSERT_NOT_REACHED();
+    return AtomQuantifier::One;
+}
+
</ins><span class="cx"> static TrivialAtom trivialAtomForNewlineClassIDBuiltin()
</span><span class="cx"> {
</span><span class="cx">     return hasNonCharacterMask | newlineClassIDBuiltinMask;
</span><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> class GraphBuilder {
</span><del>-private:
</del><span class="cx">     struct BoundedSubGraph {
</span><span class="cx">         unsigned start;
</span><span class="cx">         unsigned end;
</span><span class="cx">     };
</span><ins>+
</ins><span class="cx"> public:
</span><span class="cx">     GraphBuilder(NFA&amp; nfa, PrefixTreeEntry&amp; prefixTreeRoot, bool patternIsCaseSensitive, uint64_t patternId)
</span><span class="cx">         : m_nfa(nfa)
</span><span class="lines">@@ -104,21 +122,21 @@
</span><span class="cx"> 
</span><span class="cx">     void atomPatternCharacter(UChar character)
</span><span class="cx">     {
</span><ins>+        if (hasError())
+            return;
+
</ins><span class="cx">         if (!isASCII(character)) {
</span><span class="cx">             fail(ASCIILiteral(&quot;Only ASCII characters are supported in pattern.&quot;));
</span><span class="cx">             return;
</span><span class="cx">         }
</span><span class="cx"> 
</span><del>-        if (hasError())
-            return;
-
</del><span class="cx">         sinkPendingAtomIfNecessary();
</span><ins>+        ASSERT(m_floatingAtomType == FloatingAtomType::Invalid);
+        ASSERT(!m_pendingTrivialAtom);
</ins><span class="cx"> 
</span><span class="cx">         char asciiChararacter = static_cast&lt;char&gt;(character);
</span><del>-        m_hasValidAtom = true;
-
-        ASSERT(m_lastPrefixTreeEntry);
</del><span class="cx">         m_pendingTrivialAtom = trivialAtomFromASCIICharacter(asciiChararacter, m_patternIsCaseSensitive);
</span><ins>+        m_floatingAtomType = FloatingAtomType::Trivial;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void atomBuiltInCharacterClass(JSC::Yarr::BuiltInCharacterClassID builtInCharacterClassID, bool inverted)
</span><span class="lines">@@ -127,11 +145,12 @@
</span><span class="cx">             return;
</span><span class="cx"> 
</span><span class="cx">         sinkPendingAtomIfNecessary();
</span><ins>+        ASSERT(m_floatingAtomType == FloatingAtomType::Invalid);
+        ASSERT(!m_pendingTrivialAtom);
</ins><span class="cx"> 
</span><span class="cx">         if (builtInCharacterClassID == JSC::Yarr::NewlineClassID &amp;&amp; inverted) {
</span><del>-            m_hasValidAtom = true;
-            ASSERT(m_lastPrefixTreeEntry);
</del><span class="cx">             m_pendingTrivialAtom = trivialAtomForNewlineClassIDBuiltin();
</span><ins>+            m_floatingAtomType = FloatingAtomType::Trivial;
</ins><span class="cx">         } else
</span><span class="cx">             fail(ASCIILiteral(&quot;Character class is not supported.&quot;));
</span><span class="cx">     }
</span><span class="lines">@@ -141,34 +160,41 @@
</span><span class="cx">         if (hasError())
</span><span class="cx">             return;
</span><span class="cx"> 
</span><del>-        ASSERT(m_hasValidAtom);
-        if (!m_hasValidAtom) {
</del><ins>+        switch (m_floatingAtomType) {
+        case FloatingAtomType::Invalid:
</ins><span class="cx">             fail(ASCIILiteral(&quot;Quantifier without corresponding atom to quantify.&quot;));
</span><del>-            return;
</del><ins>+            break;
+
+        case FloatingAtomType::Trivial:
+            if (!minimum &amp;&amp; maximum == 1)
+                quantifyTrivialAtom(m_pendingTrivialAtom, AtomQuantifier::ZeroOrOne);
+            else if (!minimum &amp;&amp; maximum == JSC::Yarr::quantifyInfinite)
+                quantifyTrivialAtom(m_pendingTrivialAtom, AtomQuantifier::ZeroOrMore);
+            else if (minimum == 1 &amp;&amp; maximum == JSC::Yarr::quantifyInfinite)
+                quantifyTrivialAtom(m_pendingTrivialAtom, AtomQuantifier::OneOrMore);
+            else
+                fail(ASCIILiteral(&quot;Arbitrary atom repetitions are not supported.&quot;));
+            break;
+        case FloatingAtomType::CharacterSet: {
+            ASSERT(m_characterSetQuantifier == AtomQuantifier::One);
+            if (!minimum &amp;&amp; maximum == 1)
+                m_characterSetQuantifier = AtomQuantifier::ZeroOrOne;
+            else if (!minimum &amp;&amp; maximum == JSC::Yarr::quantifyInfinite)
+                m_characterSetQuantifier = AtomQuantifier::ZeroOrMore;
+            else if (minimum == 1 &amp;&amp; maximum == JSC::Yarr::quantifyInfinite)
+                m_characterSetQuantifier = AtomQuantifier::OneOrMore;
+            else
+                fail(ASCIILiteral(&quot;Arbitrary character set repetitions are not supported.&quot;));
+            break;
</ins><span class="cx">         }
</span><del>-
-        ASSERT(m_lastPrefixTreeEntry);
-        if (!minimum &amp;&amp; maximum == 1)
-            quantifyTrivialAtom(m_pendingTrivialAtom, TrivialAtomQuantifier::ZeroOrOne);
-        else if (!minimum &amp;&amp; maximum == JSC::Yarr::quantifyInfinite)
-            quantifyTrivialAtom(m_pendingTrivialAtom, TrivialAtomQuantifier::ZeroToMany);
-        else if (minimum == 1 &amp;&amp; maximum == JSC::Yarr::quantifyInfinite)
-            quantifyTrivialAtom(m_pendingTrivialAtom, TrivialAtomQuantifier::OneToMany);
-        else
-            fail(ASCIILiteral(&quot;Arbitrary atom repetitions are not supported.&quot;));
</del><ins>+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><del>-    NO_RETURN_DUE_TO_ASSERT void atomBackReference(unsigned)
</del><ins>+    void atomBackReference(unsigned)
</ins><span class="cx">     {
</span><span class="cx">         fail(ASCIILiteral(&quot;Patterns cannot contain backreferences.&quot;));
</span><del>-        ASSERT_NOT_REACHED();
</del><span class="cx">     }
</span><span class="cx"> 
</span><del>-    void atomCharacterClassAtom(UChar)
-    {
-        fail(ASCIILiteral(&quot;Character class atoms are not supported yet.&quot;));
-    }
-
</del><span class="cx">     void assertionBOL()
</span><span class="cx">     {
</span><span class="cx">         fail(ASCIILiteral(&quot;Line boundary assertions are not supported yet.&quot;));
</span><span class="lines">@@ -184,26 +210,62 @@
</span><span class="cx">         fail(ASCIILiteral(&quot;Word boundaries assertions are not supported yet.&quot;));
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    void atomCharacterClassBegin(bool = false)
</del><ins>+    void atomCharacterClassBegin(bool inverted = false)
</ins><span class="cx">     {
</span><del>-        fail(ASCIILiteral(&quot;Character class atoms are not supported yet.&quot;));
</del><ins>+        if (hasError())
+            return;
+
+        sinkPendingAtomIfNecessary();
+
+        ASSERT_WITH_MESSAGE(!m_pendingCharacterSet.bitCount(), &quot;We should not have nested character classes.&quot;);
+        ASSERT(m_floatingAtomType == FloatingAtomType::Invalid);
+
+        m_buildMode = BuildMode::DirectGeneration;
+        m_lastPrefixTreeEntry = nullptr;
+
+        m_isInvertedCharacterSet = inverted;
+        m_floatingAtomType = FloatingAtomType::CharacterSet;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><del>-    void atomCharacterClassRange(UChar, UChar)
</del><ins>+    void atomCharacterClassAtom(UChar character)
</ins><span class="cx">     {
</span><del>-        fail(ASCIILiteral(&quot;Character class ranges are not supported yet.&quot;));
</del><ins>+        if (hasError())
+            return;
+
+        ASSERT(m_floatingAtomType == FloatingAtomType::CharacterSet);
+
+        if (!isASCII(character)) {
+            fail(ASCIILiteral(&quot;Non ASCII Character in a character set.&quot;));
+            return;
+        }
+        m_pendingCharacterSet.set(character);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><del>-    void atomCharacterClassBuiltIn(JSC::Yarr::BuiltInCharacterClassID, bool)
</del><ins>+    void atomCharacterClassRange(UChar a, UChar b)
</ins><span class="cx">     {
</span><del>-        fail(ASCIILiteral(&quot;Buildins character class atoms are not supported yet.&quot;));
</del><ins>+        if (hasError())
+            return;
+
+        ASSERT(m_floatingAtomType == FloatingAtomType::CharacterSet);
+
+        if (!a || !b || !isASCII(a) || !isASCII(b)) {
+            fail(ASCIILiteral(&quot;Non ASCII Character in a character range of a character set.&quot;));
+            return;
+        }
+        for (unsigned i = a; i &lt;= b; ++i)
+            m_pendingCharacterSet.set(i);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     void atomCharacterClassEnd()
</span><span class="cx">     {
</span><del>-        fail(ASCIILiteral(&quot;Character class are not supported yet.&quot;));
</del><ins>+        // Nothing to do here. The character set atom may have a quantifier, we sink the atom lazily.
</ins><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    void atomCharacterClassBuiltIn(JSC::Yarr::BuiltInCharacterClassID, bool)
+    {
+        fail(ASCIILiteral(&quot;Builtins character class atoms are not supported yet.&quot;));
+    }
+
</ins><span class="cx">     void atomParenthesesSubpatternBegin(bool = true)
</span><span class="cx">     {
</span><span class="cx">         fail(ASCIILiteral(&quot;Groups are not supported yet.&quot;));
</span><span class="lines">@@ -241,38 +303,28 @@
</span><span class="cx">         m_errorMessage = errorMessage;
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    void generateTransition(TrivialAtom trivialAtom, unsigned source, unsigned target)
</del><ins>+    BoundedSubGraph sinkAtom(std::function&lt;void(unsigned, unsigned)&gt; transitionFunction, AtomQuantifier quantifier, unsigned start)
</ins><span class="cx">     {
</span><del>-        if (trivialAtom &amp; hasNonCharacterMask) {
-            ASSERT(trivialAtom &amp; newlineClassIDBuiltinMask);
-            m_nfa.addTransitionsOnAnyCharacter(source, target);
-        } else {
-            if (trivialAtom &amp; caseInsensitiveMask) {
-                char character = static_cast&lt;char&gt;(trivialAtom &amp; characterMask);
-                m_nfa.addTransition(source, target, character);
-                m_nfa.addTransition(source, target, toASCIIUpper(character));
-            } else
-                m_nfa.addTransition(source, target, static_cast&lt;char&gt;(trivialAtom &amp; characterMask));
</del><ins>+        switch (quantifier) {
+        case AtomQuantifier::One: {
+            unsigned newEnd = m_nfa.createNode();
+            m_nfa.addRuleId(newEnd, m_patternId);
+            transitionFunction(start, newEnd);
+            return { start, newEnd };
</ins><span class="cx">         }
</span><del>-    }
-
-    BoundedSubGraph sinkTrivialAtom(TrivialAtom trivialAtom, unsigned start)
-    {
-        if (trivialAtom &amp; static_cast&lt;uint16_t&gt;(TrivialAtomQuantifier::ZeroOrOne)) {
</del><ins>+        case AtomQuantifier::ZeroOrOne: {
</ins><span class="cx">             unsigned newEnd = m_nfa.createNode();
</span><span class="cx">             m_nfa.addRuleId(newEnd, m_patternId);
</span><del>-            generateTransition(trivialAtom, start, newEnd);
-            m_nfa.addEpsilonTransition(start, newEnd);
</del><ins>+            transitionFunction(start, newEnd);
</ins><span class="cx">             return { start, newEnd };
</span><span class="cx">         }
</span><del>-
-        if (trivialAtom &amp; static_cast&lt;uint16_t&gt;(TrivialAtomQuantifier::ZeroToMany)) {
</del><ins>+        case AtomQuantifier::ZeroOrMore: {
</ins><span class="cx">             unsigned repeatStart = m_nfa.createNode();
</span><span class="cx">             m_nfa.addRuleId(repeatStart, m_patternId);
</span><span class="cx">             unsigned repeatEnd = m_nfa.createNode();
</span><span class="cx">             m_nfa.addRuleId(repeatEnd, m_patternId);
</span><span class="cx"> 
</span><del>-            generateTransition(trivialAtom, repeatStart, repeatEnd);
</del><ins>+            transitionFunction(repeatStart, repeatEnd);
</ins><span class="cx">             m_nfa.addEpsilonTransition(repeatEnd, repeatStart);
</span><span class="cx"> 
</span><span class="cx">             m_nfa.addEpsilonTransition(start, repeatStart);
</span><span class="lines">@@ -283,14 +335,13 @@
</span><span class="cx">             m_nfa.addEpsilonTransition(start, kleenEnd);
</span><span class="cx">             return { start, kleenEnd };
</span><span class="cx">         }
</span><del>-
-        if (trivialAtom &amp; static_cast&lt;uint16_t&gt;(TrivialAtomQuantifier::OneToMany)) {
</del><ins>+        case AtomQuantifier::OneOrMore: {
</ins><span class="cx">             unsigned repeatStart = m_nfa.createNode();
</span><span class="cx">             m_nfa.addRuleId(repeatStart, m_patternId);
</span><span class="cx">             unsigned repeatEnd = m_nfa.createNode();
</span><span class="cx">             m_nfa.addRuleId(repeatEnd, m_patternId);
</span><span class="cx"> 
</span><del>-            generateTransition(trivialAtom, repeatStart, repeatEnd);
</del><ins>+            transitionFunction(repeatStart, repeatEnd);
</ins><span class="cx">             m_nfa.addEpsilonTransition(repeatEnd, repeatStart);
</span><span class="cx"> 
</span><span class="cx">             m_nfa.addEpsilonTransition(start, repeatStart);
</span><span class="lines">@@ -300,45 +351,133 @@
</span><span class="cx">             m_nfa.addEpsilonTransition(repeatEnd, afterRepeat);
</span><span class="cx">             return { start, afterRepeat };
</span><span class="cx">         }
</span><ins>+        }
+    }
</ins><span class="cx"> 
</span><del>-        unsigned newEnd = m_nfa.createNode();
-        m_nfa.addRuleId(newEnd, m_patternId);
-        generateTransition(trivialAtom, start, newEnd);
-        return { start, newEnd };
</del><ins>+    void generateTransition(TrivialAtom trivialAtom, unsigned source, unsigned target)
+    {
+        if (trivialAtom &amp; hasNonCharacterMask) {
+            ASSERT(trivialAtom &amp; newlineClassIDBuiltinMask);
+            m_nfa.addTransitionsOnAnyCharacter(source, target);
+        } else {
+            if (trivialAtom &amp; caseInsensitiveMask) {
+                char character = static_cast&lt;char&gt;(trivialAtom &amp; characterMask);
+                m_nfa.addTransition(source, target, character);
+                m_nfa.addTransition(source, target, toASCIIUpper(character));
+            } else
+                m_nfa.addTransition(source, target, static_cast&lt;char&gt;(trivialAtom &amp; characterMask));
+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    BoundedSubGraph sinkTrivialAtom(TrivialAtom trivialAtom, unsigned start)
+    {
+        auto transitionFunction = [this, trivialAtom](unsigned source, unsigned target)
+        {
+            generateTransition(trivialAtom, source, target);
+        };
+        return sinkAtom(transitionFunction, trivialAtomQuantifier(trivialAtom), start);
+    }
+
+    void generateTransition(const BitVector&amp; characterSet, bool isInverted, unsigned source, unsigned target)
+    {
+        ASSERT(characterSet.bitCount());
+        if (!isInverted) {
+            for (const auto&amp; characterIterator : characterSet.setBits()) {
+                char character = static_cast&lt;char&gt;(characterIterator);
+                if (!m_patternIsCaseSensitive &amp;&amp; isASCIIAlpha(character)) {
+                    m_nfa.addTransition(source, target, toASCIIUpper(character));
+                    m_nfa.addTransition(source, target, toASCIILower(character));
+                } else
+                    m_nfa.addTransition(source, target, character);
+            }
+        } else {
+            for (unsigned i = 1; i &lt; characterSet.size(); ++i) {
+                if (characterSet.get(i))
+                    continue;
+                char character = static_cast&lt;char&gt;(i);
+
+                if (!m_patternIsCaseSensitive &amp;&amp; (characterSet.get(toASCIIUpper(character)) || characterSet.get(toASCIILower(character))))
+                    continue;
+
+                m_nfa.addTransition(source, target, character);
+            }
+        }
+    }
+
+    BoundedSubGraph sinkCharacterSet(const BitVector&amp; characterSet, bool isInverted, unsigned start)
+    {
+        auto transitionFunction = [this, &amp;characterSet, isInverted](unsigned source, unsigned target)
+        {
+            generateTransition(characterSet, isInverted, source, target);
+        };
+        return sinkAtom(transitionFunction, m_characterSetQuantifier, start);
+    }
+
</ins><span class="cx">     void sinkPendingAtomIfNecessary()
</span><span class="cx">     {
</span><del>-        ASSERT(m_lastPrefixTreeEntry);
-
-        if (!m_hasValidAtom)
</del><ins>+        if (m_floatingAtomType == FloatingAtomType::Invalid)
</ins><span class="cx">             return;
</span><span class="cx"> 
</span><del>-        ASSERT(m_pendingTrivialAtom);
</del><ins>+        switch (m_buildMode) {
+        case BuildMode::PrefixTree: {
+            ASSERT(m_lastPrefixTreeEntry);
+            ASSERT_WITH_MESSAGE(m_floatingAtomType == FloatingAtomType::Trivial, &quot;Only trivial atoms are handled with a prefix tree.&quot;);
</ins><span class="cx"> 
</span><del>-        auto nextEntry = m_lastPrefixTreeEntry-&gt;nextPattern.find(m_pendingTrivialAtom);
-        if (nextEntry != m_lastPrefixTreeEntry-&gt;nextPattern.end()) {
-            m_lastPrefixTreeEntry = nextEntry-&gt;value.get();
-            m_nfa.addRuleId(m_lastPrefixTreeEntry-&gt;nfaNode, m_patternId);
-        } else {
-            std::unique_ptr&lt;PrefixTreeEntry&gt; nextPrefixTreeEntry = std::make_unique&lt;PrefixTreeEntry&gt;();
</del><ins>+            auto nextEntry = m_lastPrefixTreeEntry-&gt;nextPattern.find(m_pendingTrivialAtom);
+            if (nextEntry != m_lastPrefixTreeEntry-&gt;nextPattern.end()) {
+                m_lastPrefixTreeEntry = nextEntry-&gt;value.get();
+                m_nfa.addRuleId(m_lastPrefixTreeEntry-&gt;nfaNode, m_patternId);
+            } else {
+                std::unique_ptr&lt;PrefixTreeEntry&gt; nextPrefixTreeEntry = std::make_unique&lt;PrefixTreeEntry&gt;();
</ins><span class="cx"> 
</span><del>-            BoundedSubGraph newSubGraph = sinkTrivialAtom(m_pendingTrivialAtom, m_lastPrefixTreeEntry-&gt;nfaNode);
-            nextPrefixTreeEntry-&gt;nfaNode = newSubGraph.end;
</del><ins>+                BoundedSubGraph newSubGraph = sinkTrivialAtom(m_pendingTrivialAtom, m_lastPrefixTreeEntry-&gt;nfaNode);
+                nextPrefixTreeEntry-&gt;nfaNode = newSubGraph.end;
</ins><span class="cx"> 
</span><del>-            auto addResult = m_lastPrefixTreeEntry-&gt;nextPattern.set(m_pendingTrivialAtom, WTF::move(nextPrefixTreeEntry));
-            ASSERT(addResult.isNewEntry);
</del><ins>+                auto addResult = m_lastPrefixTreeEntry-&gt;nextPattern.set(m_pendingTrivialAtom, WTF::move(nextPrefixTreeEntry));
+                ASSERT(addResult.isNewEntry);
</ins><span class="cx"> 
</span><del>-            m_newPrefixSubtreeRoot = m_lastPrefixTreeEntry;
-            m_newPrefixStaringPoint = m_pendingTrivialAtom;
</del><ins>+                if (!m_newPrefixSubtreeRoot) {
+                    m_newPrefixSubtreeRoot = m_lastPrefixTreeEntry;
+                    m_newPrefixStaringPoint = m_pendingTrivialAtom;
+                }
</ins><span class="cx"> 
</span><del>-            m_lastPrefixTreeEntry = addResult.iterator-&gt;value.get();
</del><ins>+                m_lastPrefixTreeEntry = addResult.iterator-&gt;value.get();
+            }
+            m_activeGroup.end = m_lastPrefixTreeEntry-&gt;nfaNode;
+            ASSERT(m_lastPrefixTreeEntry);
+            break;
</ins><span class="cx">         }
</span><del>-        ASSERT(m_lastPrefixTreeEntry);
</del><ins>+        case BuildMode::DirectGeneration: {
+            ASSERT(!m_lastPrefixTreeEntry);
</ins><span class="cx"> 
</span><del>-        m_activeGroup.end = m_lastPrefixTreeEntry-&gt;nfaNode;
</del><ins>+            switch (m_floatingAtomType) {
+            case FloatingAtomType::Invalid:
+                ASSERT_NOT_REACHED();
+                break;
+            case FloatingAtomType::Trivial: {
+                BoundedSubGraph newSubGraph = sinkTrivialAtom(m_pendingTrivialAtom, m_activeGroup.end);
+                m_activeGroup.end = newSubGraph.end;
+                break;
+            }
+            case FloatingAtomType::CharacterSet:
+                if (!m_pendingCharacterSet.bitCount()) {
+                    fail(ASCIILiteral(&quot;Empty character set.&quot;));
+                    return;
+                }
+                BoundedSubGraph newSubGraph = sinkCharacterSet(m_pendingCharacterSet, m_isInvertedCharacterSet, m_activeGroup.end);
+                m_activeGroup.end = newSubGraph.end;
+
+                m_isInvertedCharacterSet = false;
+                m_characterSetQuantifier = AtomQuantifier::One;
+                m_pendingCharacterSet.clearAll();
+                break;
+            }
+            break;
+            }
+        }
+
</ins><span class="cx">         m_pendingTrivialAtom = 0;
</span><del>-        m_hasValidAtom = false;
</del><ins>+        m_floatingAtomType = FloatingAtomType::Invalid;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     NFA&amp; m_nfa;
</span><span class="lines">@@ -348,9 +487,24 @@
</span><span class="cx">     BoundedSubGraph m_activeGroup;
</span><span class="cx"> 
</span><span class="cx">     PrefixTreeEntry* m_lastPrefixTreeEntry;
</span><del>-    bool m_hasValidAtom = false;
</del><ins>+    enum class FloatingAtomType {
+        Invalid,
+        Trivial,
+        CharacterSet
+    };
+    FloatingAtomType m_floatingAtomType { FloatingAtomType::Invalid };
</ins><span class="cx">     TrivialAtom m_pendingTrivialAtom = 0;
</span><span class="cx"> 
</span><ins>+    bool m_isInvertedCharacterSet { false };
+    BitVector m_pendingCharacterSet { 128 };
+    AtomQuantifier m_characterSetQuantifier { AtomQuantifier::One };
+
+    enum class BuildMode {
+        PrefixTree,
+        DirectGeneration
+    };
+    BuildMode m_buildMode { BuildMode::PrefixTree };
+
</ins><span class="cx">     PrefixTreeEntry* m_newPrefixSubtreeRoot = nullptr;
</span><span class="cx">     TrivialAtom m_newPrefixStaringPoint = 0;
</span><span class="cx"> 
</span></span></pre>
</div>
</div>

</body>
</html>