<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[287086] trunk</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/287086">287086</a></dd>
<dt>Author</dt> <dd>commit-queue@webkit.org</dd>
<dt>Date</dt> <dd>2021-12-15 10:49:18 -0800 (Wed, 15 Dec 2021)</dd>
</dl>

<h3>Log Message</h3>
<pre>Avoid unnecessary allocation and UTF-8 conversion when calling DFABytecodeInterpreter::interpret
https://bugs.webkit.org/show_bug.cgi?id=234351

Patch by Alex Christensen <achristensen@webkit.org> on 2021-12-15
Reviewed by Tim Hatcher.

Source/WebCore:

A valid URL, the only input into DFABytecodeInterpreter::interpret, contains only ASCII characters.
In the overwhelming majority of cases, we have an 8-bit string.  There is no need to allocate, copy, and convert it.
In the rare case that we somehow get a UTF-16 encoded ASCII string, just do what we did before and UTF-8 encode it.
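
Editorial sketch of the fast path described above, not the WebKit code itself. It assumes a
hypothetical UrlString stand-in for WTF::String, and uses std::string_view where the patch uses
Span<const char>. The names UrlString, narrowASCII and urlBytes are illustrative only.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <variant>

// Hypothetical stand-in for WTF::String: holds either 8-bit or 16-bit characters.
struct UrlString {
    std::variant<std::string, std::u16string> storage;
    bool is8Bit() const { return std::holds_alternative<std::string>(storage); }
};

// Narrow an ASCII-only UTF-16 string. For ASCII input, the UTF-8 encoding is
// exactly one byte per code unit, so a plain narrowing cast is enough here.
inline std::string narrowASCII(const std::u16string& wide)
{
    std::string out;
    out.reserve(wide.size());
    for (char16_t unit : wide) {
        assert(unit < 0x80); // The matcher assumes ASCII input.
        out.push_back(static_cast<char>(unit));
    }
    return out;
}

// Returns a view of the URL bytes. On the common 8-bit path nothing is
// allocated or copied; `scratch` only receives data on the rare 16-bit path
// and must outlive the returned view.
inline std::string_view urlBytes(const UrlString& url, std::string& scratch)
{
    if (url.is8Bit())
        return std::get<std::string>(url.storage);
    scratch = narrowASCII(std::get<std::u16string>(url.storage));
    return scratch;
}
```

The caller owns `scratch`, mirroring how the patch keeps a CString local alive next to the Span.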

Regular expressions can match the end of the string, which we currently implement by checking for the
null character. To keep that working without a null-terminated buffer, reading a character now checks
whether we are at the end of the string and returns the null character if we are.
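
Editorial sketch of the end-of-string handling described above, under the assumption that the URL
contains no embedded null characters. The helper names characterOrNull and matchesLiteralThenEnd
are hypothetical; the toy matcher stands in for a compiled pattern of the form "literal$" and is
not the bytecode interpreter.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns the character at `index`, or an imaginary '\0' when `index` is at
// or past the end of the view. The view itself need not be null-terminated.
inline char characterOrNull(std::string_view url, std::size_t index)
{
    return index < url.size() ? url[index] : '\0';
}

// Toy matcher: starting at `start`, consume `literal` and then require the
// end of the string, expressed as one more edge that reads the imaginary null.
inline bool matchesLiteralThenEnd(std::string_view url, std::size_t start, std::string_view literal)
{
    if (start > url.size())
        return false; // Already past the imaginary terminator.

    std::size_t index = start;
    for (char expected : literal) {
        if (characterOrNull(url, index) != expected)
            return false;
        ++index; // One edge in the automaton per consumed character.
    }

    if (characterOrNull(url, index) != '\0')
        return false;
    ++index; // Consuming the imaginary null may move index to size() + 1.
    assert(index <= url.size() + 1);
    return true;
}
```

For example, matchesLiteralThenEnd("a.js", 1, ".js") succeeds, while the same call on "a.json"
fails because the character after ".js" is not the end of the string.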

Covered by many API tests.

* contentextensions/ContentExtension.cpp:
(WebCore::ContentExtensions::ContentExtension::populateTopURLActionCacheIfNeeded const):
(WebCore::ContentExtensions::ContentExtension::populateFrameURLActionCacheIfNeeded const):
* contentextensions/ContentExtensionsBackend.cpp:
(WebCore::ContentExtensions::ContentExtensionsBackend::actionsFromContentRuleList const):
(WebCore::ContentExtensions::ContentExtensionsBackend::actionsForResourceLoad const):
* contentextensions/ContentExtensionsBackend.h:
* contentextensions/DFABytecodeInterpreter.cpp:
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpetJumpTable):
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpret):
* contentextensions/DFABytecodeInterpreter.h:

Tools:

* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
(TestWebKitAPI::TEST_F):</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsContentExtensioncpp">trunk/Source/WebCore/contentextensions/ContentExtension.cpp</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsContentExtensionsBackendcpp">trunk/Source/WebCore/contentextensions/ContentExtensionsBackend.cpp</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsContentExtensionsBackendh">trunk/Source/WebCore/contentextensions/ContentExtensionsBackend.h</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsDFABytecodeInterpretercpp">trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsDFABytecodeInterpreterh">trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.h</a></li>
<li><a href="#trunkToolsChangeLog">trunk/Tools/ChangeLog</a></li>
<li><a href="#trunkToolsTestWebKitAPITestsWebCoreContentExtensionscpp">trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (287085 => 287086)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog   2021-12-15 18:32:26 UTC (rev 287085)
+++ trunk/Source/WebCore/ChangeLog      2021-12-15 18:49:18 UTC (rev 287086)
</span><span class="lines">@@ -1,3 +1,32 @@
</span><ins>+2021-12-15  Alex Christensen  <achristensen@webkit.org>
+
+        Avoid unnecessary allocation and UTF-8 conversion when calling DFABytecodeInterpreter::interpret
+        https://bugs.webkit.org/show_bug.cgi?id=234351
+
+        Reviewed by Tim Hatcher.
+
+        A valid URL, the only input into DFABytecodeInterpreter::interpret, contains only ASCII characters.
+        In the overwhelming majority of cases, we have an 8-bit string.  There is no need to allocate, copy, and convert it.
+        In the rare case that we somehow get a UTF-16 encoded ASCII string, just do what we did before and UTF-8 encode it.
+
+        Regular expressions allow matching the end of the string, which we currently implement by checking for the
+        null character, so I had to keep the parts that read the null character at the end of a string by checking
+        to see if we are at the end of the string when reading a character and returning the null character if we are.
+
+        Covered by many API tests.
+
+        * contentextensions/ContentExtension.cpp:
+        (WebCore::ContentExtensions::ContentExtension::populateTopURLActionCacheIfNeeded const):
+        (WebCore::ContentExtensions::ContentExtension::populateFrameURLActionCacheIfNeeded const):
+        * contentextensions/ContentExtensionsBackend.cpp:
+        (WebCore::ContentExtensions::ContentExtensionsBackend::actionsFromContentRuleList const):
+        (WebCore::ContentExtensions::ContentExtensionsBackend::actionsForResourceLoad const):
+        * contentextensions/ContentExtensionsBackend.h:
+        * contentextensions/DFABytecodeInterpreter.cpp:
+        (WebCore::ContentExtensions::DFABytecodeInterpreter::interpetJumpTable):
+        (WebCore::ContentExtensions::DFABytecodeInterpreter::interpret):
+        * contentextensions/DFABytecodeInterpreter.h:
+
</ins><span class="cx"> 2021-12-15  Alan Bujtas  <zalan@apple.com>
</span><span class="cx"> 
</span><span class="cx">         [LFC][IFC] Use the physical margin/border/padding values for inline boxes when generating the display content
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsContentExtensioncpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/ContentExtension.cpp (287085 => 287086)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/ContentExtension.cpp      2021-12-15 18:32:26 UTC (rev 287085)
+++ trunk/Source/WebCore/contentextensions/ContentExtension.cpp 2021-12-15 18:49:18 UTC (rev 287086)
</span><span class="lines">@@ -114,7 +114,7 @@
</span><span class="cx">         return;
</span><span class="cx"> 
</span><span class="cx">     DFABytecodeInterpreter interpreter(m_compiledExtension->topURLFiltersBytecode());
</span><del>-    auto topURLActions = interpreter.interpret(topURL.string().utf8(), AllResourceFlags);
</del><ins>+    auto topURLActions = interpreter.interpret(topURL.string(), AllResourceFlags);
</ins><span class="cx"> 
</span><span class="cx">     m_cachedTopURLActions.clear();
</span><span class="cx">     for (uint64_t action : topURLActions)
</span><span class="lines">@@ -131,7 +131,7 @@
</span><span class="cx">         return;
</span><span class="cx"> 
</span><span class="cx">     DFABytecodeInterpreter interpreter(m_compiledExtension->frameURLFiltersBytecode());
</span><del>-    auto frameURLActions = interpreter.interpret(frameURL.string().utf8(), AllResourceFlags);
</del><ins>+    auto frameURLActions = interpreter.interpret(frameURL.string(), AllResourceFlags);
</ins><span class="cx"> 
</span><span class="cx">     m_cachedFrameURLActions.clear();
</span><span class="cx">     for (uint64_t action : frameURLActions)
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsContentExtensionsBackendcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/ContentExtensionsBackend.cpp (287085 => 287086)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/ContentExtensionsBackend.cpp      2021-12-15 18:32:26 UTC (rev 287085)
+++ trunk/Source/WebCore/contentextensions/ContentExtensionsBackend.cpp 2021-12-15 18:49:18 UTC (rev 287086)
</span><span class="lines">@@ -96,7 +96,7 @@
</span><span class="cx">     m_contentExtensions.clear();
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-auto ContentExtensionsBackend::actionsFromContentRuleList(const ContentExtension& contentExtension, const CString& urlString, const ResourceLoadInfo& resourceLoadInfo, ResourceFlags flags) const -> ActionsFromContentRuleList
</del><ins>+auto ContentExtensionsBackend::actionsFromContentRuleList(const ContentExtension& contentExtension, const String& urlString, const ResourceLoadInfo& resourceLoadInfo, ResourceFlags flags) const -> ActionsFromContentRuleList
</ins><span class="cx"> {
</span><span class="cx">     ActionsFromContentRuleList actionsStruct;
</span><span class="cx">     actionsStruct.contentRuleListIdentifier = contentExtension.identifier();
</span><span class="lines">@@ -162,9 +162,6 @@
</span><span class="cx"> 
</span><span class="cx">     const String& urlString = resourceLoadInfo.resourceURL.string();
</span><span class="cx">     ASSERT_WITH_MESSAGE(urlString.isAllASCII(), "A decoded URL should only contain ASCII characters. The matching algorithm assumes the input is ASCII.");
</span><del>-    // FIXME: UTF-8 conversion should only be necessary with UTF-16 String based URLs, which are rare.
-    // DFABytecodeInterpreter::interpret should take a Span<char> instead of a CString and we can avoid this allocation almost all of the time.
-    const auto urlCString = urlString.utf8();
</del><span class="cx"> 
</span><span class="cx">     Vector<ActionsFromContentRuleList> actionsVector;
</span><span class="cx">     actionsVector.reserveInitialCapacity(m_contentExtensions.size());
</span><span class="lines">@@ -171,7 +168,7 @@
</span><span class="cx">     ASSERT(!(resourceLoadInfo.getResourceFlags() & ActionConditionMask));
</span><span class="cx">     const ResourceFlags flags = resourceLoadInfo.getResourceFlags() | ActionConditionMask;
</span><span class="cx">     for (auto& contentExtension : m_contentExtensions.values())
</span><del>-        actionsVector.uncheckedAppend(actionsFromContentRuleList(contentExtension.get(), urlCString, resourceLoadInfo, flags));
</del><ins>+        actionsVector.uncheckedAppend(actionsFromContentRuleList(contentExtension.get(), urlString, resourceLoadInfo, flags));
</ins><span class="cx"> #if CONTENT_EXTENSIONS_PERFORMANCE_REPORTING
</span><span class="cx">     MonotonicTime addedTimeEnd = MonotonicTime::now();
</span><span class="cx">     dataLogF("Time added: %f microseconds %s \n", (addedTimeEnd - addedTimeStart).microseconds(), resourceLoadInfo.resourceURL.string().utf8().data());
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsContentExtensionsBackendh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/ContentExtensionsBackend.h (287085 => 287086)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/ContentExtensionsBackend.h        2021-12-15 18:32:26 UTC (rev 287085)
+++ trunk/Source/WebCore/contentextensions/ContentExtensionsBackend.h   2021-12-15 18:49:18 UTC (rev 287086)
</span><span class="lines">@@ -80,7 +80,7 @@
</span><span class="cx">     WEBCORE_EXPORT static bool shouldBeMadeSecure(const URL&);
</span><span class="cx"> 
</span><span class="cx"> private:
</span><del>-    ActionsFromContentRuleList actionsFromContentRuleList(const ContentExtension&, const CString& urlString, const ResourceLoadInfo&, ResourceFlags) const;
</del><ins>+    ActionsFromContentRuleList actionsFromContentRuleList(const ContentExtension&, const String& urlString, const ResourceLoadInfo&, ResourceFlags) const;
</ins><span class="cx"> 
</span><span class="cx">     HashMap<String, Ref<ContentExtension>> m_contentExtensions;
</span><span class="cx"> };
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsDFABytecodeInterpretercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp (287085 => 287086)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp        2021-12-15 18:32:26 UTC (rev 287085)
+++ trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp   2021-12-15 18:49:18 UTC (rev 287086)
</span><span class="lines">@@ -203,11 +203,12 @@
</span><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> template<bool caseSensitive>
</span><del>-inline void DFABytecodeInterpreter::interpetJumpTable(const char* url, uint32_t& urlIndex, uint32_t& programCounter, bool& urlIndexIsAfterEndOfString)
</del><ins>+inline void DFABytecodeInterpreter::interpetJumpTable(Span<const char> url, uint32_t& urlIndex, uint32_t& programCounter)
</ins><span class="cx"> {
</span><span class="cx">     DFABytecodeJumpSize jumpSize = getJumpSize(m_bytecode, programCounter);
</span><span class="cx"> 
</span><del>-    char character = caseSensitive ? url[urlIndex] : toASCIILower(url[urlIndex]);
</del><ins>+    char c = urlIndex < url.size() ? url[urlIndex] : 0;
+    char character = caseSensitive ? c : toASCIILower(c);
</ins><span class="cx">     uint8_t firstCharacter = getBits<uint8_t>(m_bytecode, programCounter + sizeof(DFABytecodeInstruction));
</span><span class="cx">     uint8_t lastCharacter = getBits<uint8_t>(m_bytecode, programCounter + sizeof(DFABytecodeInstruction) + sizeof(uint8_t));
</span><span class="cx">     if (character >= firstCharacter && character <= lastCharacter) {
</span><span class="lines">@@ -214,8 +215,6 @@
</span><span class="cx">         uint32_t startOffset = programCounter + sizeof(DFABytecodeInstruction) + 2 * sizeof(uint8_t);
</span><span class="cx">         uint32_t jumpLocation = startOffset + (character - firstCharacter) * jumpSizeInBytes(jumpSize);
</span><span class="cx">         programCounter += getJumpDistance(m_bytecode, jumpLocation, jumpSize);
</span><del>-        if (!character)
-            urlIndexIsAfterEndOfString = true;
</del><span class="cx">         urlIndex++; // This represents an edge in the DFA.
</span><span class="cx">     } else
</span><span class="cx">         programCounter += sizeof(DFABytecodeInstruction) + 2 * sizeof(uint8_t) + jumpSizeInBytes(jumpSize) * (lastCharacter - firstCharacter + 1);
</span><span class="lines">@@ -243,10 +242,17 @@
</span><span class="cx">     return actions;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-auto DFABytecodeInterpreter::interpret(const CString& urlCString, ResourceFlags flags) -> Actions
</del><ins>+auto DFABytecodeInterpreter::interpret(const String& urlString, ResourceFlags flags) -> Actions
</ins><span class="cx"> {
</span><del>-    const char* url = urlCString.data();
-    ASSERT(url);
</del><ins>+    CString urlCString;
+    Span<const char> url;
+    if (LIKELY(urlString.is8Bit()))
+        url = { reinterpret_cast<const char*>(urlString.characters8()), urlString.length() };
+    else {
+        urlCString = urlString.utf8();
+        url = { urlCString.data(), urlCString.length() };
+    }
+    ASSERT(url.data());
</ins><span class="cx">     
</span><span class="cx">     Actions actions;
</span><span class="cx">     
</span><span class="lines">@@ -281,7 +287,6 @@
</span><span class="cx">         // Interpret the bytecode from this DFA.
</span><span class="cx">         // This should always terminate if interpreting correctly compiled bytecode.
</span><span class="cx">         uint32_t urlIndex = 0;
</span><del>-        bool urlIndexIsAfterEndOfString = false;
</del><span class="cx">         while (true) {
</span><span class="cx">             ASSERT(programCounter <= m_bytecode.size());
</span><span class="cx">             switch (getInstruction(m_bytecode, programCounter)) {
</span><span class="lines">@@ -290,17 +295,15 @@
</span><span class="cx">                 goto nextDFA;
</span><span class="cx">                     
</span><span class="cx">             case DFABytecodeInstruction::CheckValueCaseSensitive: {
</span><del>-                if (urlIndexIsAfterEndOfString)
</del><ins>+                if (urlIndex > url.size())
</ins><span class="cx">                     goto nextDFA;
</span><span class="cx"> 
</span><span class="cx">                 // Check to see if the next character in the url is the value stored with the bytecode.
</span><del>-                char character = url[urlIndex];
</del><ins>+                char character = urlIndex < url.size() ? url[urlIndex] : 0;
</ins><span class="cx">                 DFABytecodeJumpSize jumpSize = getJumpSize(m_bytecode, programCounter);
</span><span class="cx">                 if (character == getBits<uint8_t>(m_bytecode, programCounter + sizeof(DFABytecodeInstruction))) {
</span><span class="cx">                     uint32_t jumpLocation = programCounter + sizeof(DFABytecodeInstruction) + sizeof(uint8_t);
</span><span class="cx">                     programCounter += getJumpDistance(m_bytecode, jumpLocation, jumpSize);
</span><del>-                    if (!character)
-                        urlIndexIsAfterEndOfString = true;
</del><span class="cx">                     urlIndex++; // This represents an edge in the DFA.
</span><span class="cx">                 } else
</span><span class="cx">                     programCounter += sizeof(DFABytecodeInstruction) + sizeof(uint8_t) + jumpSizeInBytes(jumpSize);
</span><span class="lines">@@ -308,17 +311,15 @@
</span><span class="cx">             }
</span><span class="cx"> 
</span><span class="cx">             case DFABytecodeInstruction::CheckValueCaseInsensitive: {
</span><del>-                if (urlIndexIsAfterEndOfString)
</del><ins>+                if (urlIndex > url.size())
</ins><span class="cx">                     goto nextDFA;
</span><span class="cx"> 
</span><span class="cx">                 // Check to see if the next character in the url is the value stored with the bytecode.
</span><del>-                char character = toASCIILower(url[urlIndex]);
</del><ins>+                char character = urlIndex < url.size() ? toASCIILower(url[urlIndex]) : 0;
</ins><span class="cx">                 DFABytecodeJumpSize jumpSize = getJumpSize(m_bytecode, programCounter);
</span><span class="cx">                 if (character == getBits<uint8_t>(m_bytecode, programCounter + sizeof(DFABytecodeInstruction))) {
</span><span class="cx">                     uint32_t jumpLocation = programCounter + sizeof(DFABytecodeInstruction) + sizeof(uint8_t);
</span><span class="cx">                     programCounter += getJumpDistance(m_bytecode, jumpLocation, jumpSize);
</span><del>-                    if (!character)
-                        urlIndexIsAfterEndOfString = true;
</del><span class="cx">                     urlIndex++; // This represents an edge in the DFA.
</span><span class="cx">                 } else
</span><span class="cx">                     programCounter += sizeof(DFABytecodeInstruction) + sizeof(uint8_t) + jumpSizeInBytes(jumpSize);
</span><span class="lines">@@ -326,30 +327,28 @@
</span><span class="cx">             }
</span><span class="cx"> 
</span><span class="cx">             case DFABytecodeInstruction::JumpTableCaseInsensitive:
</span><del>-                if (urlIndexIsAfterEndOfString)
</del><ins>+                if (urlIndex > url.size())
</ins><span class="cx">                     goto nextDFA;
</span><span class="cx"> 
</span><del>-                interpetJumpTable<false>(url, urlIndex, programCounter, urlIndexIsAfterEndOfString);
</del><ins>+                interpetJumpTable<false>(url, urlIndex, programCounter);
</ins><span class="cx">                 break;
</span><span class="cx">             case DFABytecodeInstruction::JumpTableCaseSensitive:
</span><del>-                if (urlIndexIsAfterEndOfString)
</del><ins>+                if (urlIndex > url.size())
</ins><span class="cx">                     goto nextDFA;
</span><span class="cx"> 
</span><del>-                interpetJumpTable<true>(url, urlIndex, programCounter, urlIndexIsAfterEndOfString);
</del><ins>+                interpetJumpTable<true>(url, urlIndex, programCounter);
</ins><span class="cx">                 break;
</span><span class="cx">                     
</span><span class="cx">             case DFABytecodeInstruction::CheckValueRangeCaseSensitive: {
</span><del>-                if (urlIndexIsAfterEndOfString)
</del><ins>+                if (urlIndex > url.size())
</ins><span class="cx">                     goto nextDFA;
</span><span class="cx">                 
</span><del>-                char character = url[urlIndex];
</del><ins>+                char character = urlIndex < url.size() ? url[urlIndex] : 0;
</ins><span class="cx">                 DFABytecodeJumpSize jumpSize = getJumpSize(m_bytecode, programCounter);
</span><span class="cx">                 if (character >= getBits<uint8_t>(m_bytecode, programCounter + sizeof(DFABytecodeInstruction))
</span><span class="cx">                     && character <= getBits<uint8_t>(m_bytecode, programCounter + sizeof(DFABytecodeInstruction) + sizeof(uint8_t))) {
</span><span class="cx">                     uint32_t jumpLocation = programCounter + sizeof(DFABytecodeInstruction) + 2 * sizeof(uint8_t);
</span><span class="cx">                     programCounter += getJumpDistance(m_bytecode, jumpLocation, jumpSize);
</span><del>-                    if (!character)
-                        urlIndexIsAfterEndOfString = true;
</del><span class="cx">                     urlIndex++; // This represents an edge in the DFA.
</span><span class="cx">                 } else
</span><span class="cx">                     programCounter += sizeof(DFABytecodeInstruction) + 2 * sizeof(uint8_t) + jumpSizeInBytes(jumpSize);
</span><span class="lines">@@ -357,17 +356,15 @@
</span><span class="cx">             }
</span><span class="cx"> 
</span><span class="cx">             case DFABytecodeInstruction::CheckValueRangeCaseInsensitive: {
</span><del>-                if (urlIndexIsAfterEndOfString)
</del><ins>+                if (urlIndex > url.size())
</ins><span class="cx">                     goto nextDFA;
</span><span class="cx">                 
</span><del>-                char character = toASCIILower(url[urlIndex]);
</del><ins>+                char character = urlIndex < url.size() ? toASCIILower(url[urlIndex]) : 0;
</ins><span class="cx">                 DFABytecodeJumpSize jumpSize = getJumpSize(m_bytecode, programCounter);
</span><span class="cx">                 if (character >= getBits<uint8_t>(m_bytecode, programCounter + sizeof(DFABytecodeInstruction))
</span><span class="cx">                     && character <= getBits<uint8_t>(m_bytecode, programCounter + sizeof(DFABytecodeInstruction) + sizeof(uint8_t))) {
</span><span class="cx">                     uint32_t jumpLocation = programCounter + sizeof(DFABytecodeInstruction) + 2 * sizeof(uint8_t);
</span><span class="cx">                     programCounter += getJumpDistance(m_bytecode, jumpLocation, jumpSize);
</span><del>-                    if (!character)
-                        urlIndexIsAfterEndOfString = true;
</del><span class="cx">                     urlIndex++; // This represents an edge in the DFA.
</span><span class="cx">                 } else
</span><span class="cx">                     programCounter += sizeof(DFABytecodeInstruction) + 2 * sizeof(uint8_t) + jumpSizeInBytes(jumpSize);
</span><span class="lines">@@ -375,7 +372,7 @@
</span><span class="cx">             }
</span><span class="cx"> 
</span><span class="cx">             case DFABytecodeInstruction::Jump: {
</span><del>-                if (!url[urlIndex] || urlIndexIsAfterEndOfString)
</del><ins>+                if (urlIndex >= url.size())
</ins><span class="cx">                     goto nextDFA;
</span><span class="cx">                 
</span><span class="cx">                 uint32_t jumpLocation = programCounter + sizeof(DFABytecodeInstruction);
</span><span class="lines">@@ -396,8 +393,8 @@
</span><span class="cx">             default:
</span><span class="cx">                 RELEASE_ASSERT_NOT_REACHED(); // Invalid bytecode.
</span><span class="cx">             }
</span><del>-            // We should always terminate before or at a null character at the end of a String.
-            ASSERT(urlIndex <= urlCString.length() || (urlIndexIsAfterEndOfString && urlIndex <= urlCString.length() + 1));
</del><ins>+            // We should always terminate before or at an imaginary null character at the end of a String.
+            ASSERT(urlIndex <= url.size() + 1);
</ins><span class="cx">         }
</span><span class="cx">         RELEASE_ASSERT_NOT_REACHED(); // The while loop can only be exited using goto nextDFA.
</span><span class="cx">         nextDFA:
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsDFABytecodeInterpreterh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.h (287085 => 287086)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.h  2021-12-15 18:32:26 UTC (rev 287085)
+++ trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.h     2021-12-15 18:49:18 UTC (rev 287086)
</span><span class="lines">@@ -42,7 +42,7 @@
</span><span class="cx"> 
</span><span class="cx">     using Actions = HashSet<uint64_t, DefaultHash<uint64_t>, WTF::UnsignedWithZeroKeyHashTraits<uint64_t>>;
</span><span class="cx"> 
</span><del>-    WEBCORE_EXPORT Actions interpret(const CString&, ResourceFlags);
</del><ins>+    WEBCORE_EXPORT Actions interpret(const String&, ResourceFlags);
</ins><span class="cx">     Actions actionsMatchingEverything();
</span><span class="cx"> 
</span><span class="cx"> private:
</span><span class="lines">@@ -50,7 +50,7 @@
</span><span class="cx">     void interpretTestFlagsAndAppendAction(unsigned& programCounter, ResourceFlags, Actions&);
</span><span class="cx"> 
</span><span class="cx">     template<bool caseSensitive>
</span><del>-    void interpetJumpTable(const char* url, uint32_t& urlIndex, uint32_t& programCounter, bool& urlIndexIsAfterEndOfString);
</del><ins>+    void interpetJumpTable(Span<const char> url, uint32_t& urlIndex, uint32_t& programCounter);
</ins><span class="cx"> 
</span><span class="cx">     const Span<const uint8_t> m_bytecode;
</span><span class="cx"> };
</span></span></pre></div>
<a id="trunkToolsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Tools/ChangeLog (287085 => 287086)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Tools/ChangeLog    2021-12-15 18:32:26 UTC (rev 287085)
+++ trunk/Tools/ChangeLog       2021-12-15 18:49:18 UTC (rev 287086)
</span><span class="lines">@@ -1,3 +1,13 @@
</span><ins>+2021-12-15  Alex Christensen  <achristensen@webkit.org>
+
+        Avoid unnecessary allocation and UTF-8 conversion when calling DFABytecodeInterpreter::interpret
+        https://bugs.webkit.org/show_bug.cgi?id=234351
+
+        Reviewed by Tim Hatcher.
+
+        * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
+        (TestWebKitAPI::TEST_F):
+
</ins><span class="cx"> 2021-12-15  Carlos Garcia Campos  <cgarcia@igalia.com>
</span><span class="cx"> 
</span><span class="cx">         [GTK][a11y] Add support for loading events when building with ATSPI
</span></span></pre></div>
<a id="trunkToolsTestWebKitAPITestsWebCoreContentExtensionscpp"></a>
<div class="modfile"><h4>Modified: trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp (287085 => 287086)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp    2021-12-15 18:32:26 UTC (rev 287085)
+++ trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp       2021-12-15 18:49:18 UTC (rev 287086)
</span><span class="lines">@@ -1316,7 +1316,7 @@
</span><span class="cx">                         pattern.append('x');
</span><span class="cx">                         break;
</span><span class="cx">                     }
</span><del>-                    auto matches = interpreter.interpret(pattern.toString().utf8(), 0);
</del><ins>+                    auto matches = interpreter.interpret(pattern.toString(), 0);
</ins><span class="cx">                     switch ((c1 + c2 + c3 + c4) % 4) {
</span><span class="cx">                     case 0:
</span><span class="cx">                         compareContents(matches, { });
</span></span></pre>
</div>
</div>

</body>
</html>