<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[181663] trunk</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/181663">181663</a></dd>
<dt>Author</dt> <dd>benjamin@webkit.org</dd>
<dt>Date</dt> <dd>2015-03-17 13:47:42 -0700 (Tue, 17 Mar 2015)</dd>
</dl>
<h3>Log Message</h3>
<pre>Compile character ranges targeting the same state as range check in the bytecode
https://bugs.webkit.org/show_bug.cgi?id=142759
Patch by Benjamin Poulain <bpoulain@apple.com> on 2015-03-17
Reviewed by Alex Christensen.
Source/WebCore:
Previously, character ranges would be compiled as many individual character checks.
For example, a transition on "[a-z]" would do 26 character checks + jump, which leads
to enormous matchines.
With this patch, we find the ranges at lowering time and generate a single instruction
for them: "CheckValueRange". This helps making the machine denser when the input
use character sets.
The second part of this patch goes further in the case where the transitions out of
a state cover the entire alphabet. In that case, we create a fallback transition
on the fly and remove all the ranges made useless.
That case is common when ranges are used with inverse character set (e.g. [^a]+a).
* contentextensions/DFABytecode.h:
(WebCore::ContentExtensions::instructionSizeWithArguments):
* contentextensions/DFABytecodeCompiler.cpp:
(WebCore::ContentExtensions::DFABytecodeCompiler::emitCheckValueRange):
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
(WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions):
(WebCore::ContentExtensions::DFABytecodeCompiler::compileCheckForRange):
* contentextensions/DFABytecodeCompiler.h:
Extend the compiler to detect ranges and lower them as CheckValueRange.
* contentextensions/DFABytecodeInterpreter.cpp:
(WebCore::ContentExtensions::DFABytecodeInterpreter::interpret):
Range checks in the interpreter.
* contentextensions/NFA.cpp:
(WebCore::ContentExtensions::NFA::setFinal):
This assertion does not make sense with the current codebase. Actions are "compressed",
it is possible to have two patterns with the same action.
* contentextensions/NFAToDFA.cpp:
(WebCore::ContentExtensions::simplifyTransitions):
A very simple DFA optimization function: it only reduce the strength of ranges.
(WebCore::ContentExtensions::NFAToDFA::convert):
Tools:
* TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
(TestWebKitAPI::TEST_F):</pre>
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsDFABytecodeh">trunk/Source/WebCore/contentextensions/DFABytecode.h</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsDFABytecodeCompilercpp">trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.cpp</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsDFABytecodeCompilerh">trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.h</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsDFABytecodeInterpretercpp">trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsNFAcpp">trunk/Source/WebCore/contentextensions/NFA.cpp</a></li>
<li><a href="#trunkSourceWebCorecontentextensionsNFAToDFAcpp">trunk/Source/WebCore/contentextensions/NFAToDFA.cpp</a></li>
<li><a href="#trunkToolsChangeLog">trunk/Tools/ChangeLog</a></li>
<li><a href="#trunkToolsTestWebKitAPITestsWebCoreContentExtensionscpp">trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Source/WebCore/ChangeLog        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -1,3 +1,48 @@
</span><ins>+2015-03-17 Benjamin Poulain <bpoulain@apple.com>
+
+ Compile character ranges targeting the same state as range check in the bytecode
+ https://bugs.webkit.org/show_bug.cgi?id=142759
+
+ Reviewed by Alex Christensen.
+
+ Previously, character ranges would be compiled as many individual character checks.
+ For example, a transition on "[a-z]" would do 26 character checks + jump, which leads
+ to enormous matchines.
+
+ With this patch, we find the ranges at lowering time and generate a single instruction
+ for them: "CheckValueRange". This helps making the machine denser when the input
+ use character sets.
+
+ The second part of this patch goes further in the case where the transitions out of
+ a state cover the entire alphabet. In that case, we create a fallback transition
+ on the fly and remove all the ranges made useless.
+ That case is common when ranges are used with inverse character set (e.g. [^a]+a).
+
+ * contentextensions/DFABytecode.h:
+ (WebCore::ContentExtensions::instructionSizeWithArguments):
+ * contentextensions/DFABytecodeCompiler.cpp:
+ (WebCore::ContentExtensions::DFABytecodeCompiler::emitCheckValueRange):
+ (WebCore::ContentExtensions::DFABytecodeCompiler::compileNode):
+ (WebCore::ContentExtensions::DFABytecodeCompiler::compileNodeTransitions):
+ (WebCore::ContentExtensions::DFABytecodeCompiler::compileCheckForRange):
+ * contentextensions/DFABytecodeCompiler.h:
+ Extend the compiler to detect ranges and lower them as CheckValueRange.
+
+ * contentextensions/DFABytecodeInterpreter.cpp:
+ (WebCore::ContentExtensions::DFABytecodeInterpreter::interpret):
+ Range checks in the interpreter.
+
+ * contentextensions/NFA.cpp:
+ (WebCore::ContentExtensions::NFA::setFinal):
+ This assertion does not make sense with the current codebase. Actions are "compressed",
+ it is possible to have two patterns with the same action.
+
+ * contentextensions/NFAToDFA.cpp:
+ (WebCore::ContentExtensions::simplifyTransitions):
+ A very simple DFA optimization function: it only reduce the strength of ranges.
+
+ (WebCore::ContentExtensions::NFAToDFA::convert):
+
</ins><span class="cx"> 2015-03-17 Jer Noble <jer.noble@apple.com>
</span><span class="cx">
</span><span class="cx"> REGRESSION (r181423): Crash @ generatedcontent.org at com.apple.WebCore: WebCore::MediaPlayer::maximumDurationToCacheMediaTime const + 4
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsDFABytecodeh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/DFABytecode.h (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/DFABytecode.h        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Source/WebCore/contentextensions/DFABytecode.h        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -41,6 +41,12 @@
</span><span class="cx"> // The index to jump to if the values are equal (4 bytes).
</span><span class="cx"> CheckValue,
</span><span class="cx">
</span><ins>+ // Jump to an offset if the input value is within a certain range.
+ // The lower value (1 byte).
+ // The higher value (1 byte).
+ // The index to jump to if the value is in the range (4 bytes).
+ CheckValueRange,
+
</ins><span class="cx"> // AppendAction has one argument:
</span><span class="cx"> // The action to append (4 bytes).
</span><span class="cx"> AppendAction,
</span><span class="lines">@@ -63,6 +69,8 @@
</span><span class="cx"> switch (instruction) {
</span><span class="cx"> case DFABytecodeInstruction::CheckValue:
</span><span class="cx"> return sizeof(DFABytecodeInstruction) + sizeof(uint8_t) + sizeof(unsigned);
</span><ins>+ case DFABytecodeInstruction::CheckValueRange:
+ return sizeof(DFABytecodeInstruction) + sizeof(uint8_t) + sizeof(uint8_t) + sizeof(unsigned);
</ins><span class="cx"> case DFABytecodeInstruction::AppendAction:
</span><span class="cx"> return sizeof(DFABytecodeInstruction) + sizeof(unsigned);
</span><span class="cx"> case DFABytecodeInstruction::TestFlagsAndAppendAction:
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsDFABytecodeCompilercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.cpp (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.cpp        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.cpp        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -30,6 +30,7 @@
</span><span class="cx">
</span><span class="cx"> #include "ContentExtensionRule.h"
</span><span class="cx"> #include "DFA.h"
</span><ins>+#include "DFANode.h"
</ins><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><span class="lines">@@ -76,6 +77,18 @@
</span><span class="cx"> append<unsigned>(m_bytecode, 0); // This value will be set when linking.
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+void DFABytecodeCompiler::emitCheckValueRange(uint8_t lowValue, uint8_t highValue, unsigned destinationNodeIndex)
+{
+ ASSERT_WITH_MESSAGE(lowValue != highValue, "A single value check should be emitted for single values.");
+ ASSERT_WITH_MESSAGE(lowValue < highValue, "The instruction semantic impose lowValue is smaller than highValue.");
+
+ append<DFABytecodeInstruction>(m_bytecode, DFABytecodeInstruction::CheckValueRange);
+ append<uint8_t>(m_bytecode, lowValue);
+ append<uint8_t>(m_bytecode, highValue);
+ m_linkRecords.append(std::make_pair(m_bytecode.size(), destinationNodeIndex));
+ append<unsigned>(m_bytecode, 0);
+}
+
</ins><span class="cx"> void DFABytecodeCompiler::emitTerminate()
</span><span class="cx"> {
</span><span class="cx"> append<DFABytecodeInstruction>(m_bytecode, DFABytecodeInstruction::Terminate);
</span><span class="lines">@@ -95,16 +108,63 @@
</span><span class="cx"> else
</span><span class="cx"> emitAppendAction(static_cast<unsigned>(action));
</span><span class="cx"> }
</span><del>-
- for (const auto& transition : node.transitions)
- emitCheckValue(transition.key, transition.value);
-
</del><ins>+ compileNodeTransitions(node);
+}
+
+void DFABytecodeCompiler::compileNodeTransitions(const DFANode& node)
+{
+ bool hasRangeMin = false;
+ uint16_t rangeMin;
+ unsigned rangeDestination = 0;
+
+ for (unsigned char i = 0; i < 128; ++i) {
+ auto transitionIterator = node.transitions.find(i);
+ if (transitionIterator == node.transitions.end()) {
+ if (hasRangeMin) {
+ ASSERT_WITH_MESSAGE(!(node.hasFallbackTransition && node.fallbackTransition == rangeDestination), "Individual transitions to the fallback transitions should have been eliminated by the optimizer.");
+
+ unsigned char lastHighValue = i - 1;
+ compileCheckForRange(rangeMin, lastHighValue, rangeDestination);
+ hasRangeMin = false;
+ }
+ continue;
+ }
+
+ if (!hasRangeMin) {
+ hasRangeMin = true;
+ rangeMin = transitionIterator->key;
+ rangeDestination = transitionIterator->value;
+ } else {
+ if (transitionIterator->value == rangeDestination)
+ continue;
+
+ unsigned char lastHighValue = i - 1;
+ compileCheckForRange(rangeMin, lastHighValue, rangeDestination);
+ rangeMin = i;
+ rangeDestination = transitionIterator->value;
+ }
+ }
+ if (hasRangeMin)
+ compileCheckForRange(rangeMin, 127, rangeDestination);
+
</ins><span class="cx"> if (node.hasFallbackTransition)
</span><span class="cx"> emitJump(node.fallbackTransition);
</span><span class="cx"> else
</span><span class="cx"> emitTerminate();
</span><span class="cx"> }
</span><del>-
</del><ins>+
+void DFABytecodeCompiler::compileCheckForRange(uint16_t lowValue, uint16_t highValue, unsigned destinationNodeIndex)
+{
+ ASSERT_WITH_MESSAGE(lowValue < 128, "The DFA engine only supports the ASCII alphabet.");
+ ASSERT_WITH_MESSAGE(highValue < 128, "The DFA engine only supports the ASCII alphabet.");
+ ASSERT(lowValue <= highValue);
+
+ if (lowValue == highValue)
+ emitCheckValue(lowValue, destinationNodeIndex);
+ else
+ emitCheckValueRange(lowValue, highValue, destinationNodeIndex);
+}
+
</ins><span class="cx"> void DFABytecodeCompiler::compile()
</span><span class="cx"> {
</span><span class="cx"> ASSERT(!m_bytecode.size());
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsDFABytecodeCompilerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.h (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.h        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Source/WebCore/contentextensions/DFABytecodeCompiler.h        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -29,7 +29,6 @@
</span><span class="cx"> #if ENABLE(CONTENT_EXTENSIONS)
</span><span class="cx">
</span><span class="cx"> #include "DFABytecode.h"
</span><del>-#include "DFANode.h"
</del><span class="cx"> #include <wtf/Vector.h>
</span><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="lines">@@ -37,6 +36,7 @@
</span><span class="cx"> namespace ContentExtensions {
</span><span class="cx">
</span><span class="cx"> class DFA;
</span><ins>+class DFANode;
</ins><span class="cx">
</span><span class="cx"> class DFABytecodeCompiler {
</span><span class="cx"> public:
</span><span class="lines">@@ -50,11 +50,14 @@
</span><span class="cx">
</span><span class="cx"> private:
</span><span class="cx"> void compileNode(unsigned);
</span><ins>+ void compileNodeTransitions(const DFANode&);
+ void compileCheckForRange(uint16_t lowValue, uint16_t highValue, unsigned destinationNodeIndex);
</ins><span class="cx">
</span><span class="cx"> void emitAppendAction(unsigned);
</span><span class="cx"> void emitTestFlagsAndAppendAction(uint16_t flags, unsigned);
</span><span class="cx"> void emitJump(unsigned destinationNodeIndex);
</span><span class="cx"> void emitCheckValue(uint8_t value, unsigned destinationNodeIndex);
</span><ins>+ void emitCheckValueRange(uint8_t lowValue, uint8_t highValue, unsigned destinationNodeIndex);
</ins><span class="cx"> void emitTerminate();
</span><span class="cx">
</span><span class="cx"> Vector<DFABytecode>& m_bytecode;
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsDFABytecodeInterpretercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Source/WebCore/contentextensions/DFABytecodeInterpreter.cpp        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -73,6 +73,22 @@
</span><span class="cx"> programCounter += instructionSizeWithArguments(DFABytecodeInstruction::CheckValue);
</span><span class="cx"> break;
</span><span class="cx">
</span><ins>+ case DFABytecodeInstruction::CheckValueRange: {
+ if (urlIndexIsAfterEndOfString)
+ return actions;
+
+ char character = url[urlIndex];
+ if (character >= getBits<uint8_t>(m_bytecode, m_bytecodeLength, programCounter + sizeof(DFABytecode))
+ && character <= getBits<uint8_t>(m_bytecode, m_bytecodeLength, programCounter + sizeof(DFABytecode) + sizeof(uint8_t))) {
+ programCounter = getBits<unsigned>(m_bytecode, m_bytecodeLength, programCounter + sizeof(DFABytecode) + sizeof(uint8_t) + sizeof(uint8_t));
+ if (!character)
+ urlIndexIsAfterEndOfString = true;
+ urlIndex++; // This represents an edge in the DFA.
+ } else
+ programCounter += instructionSizeWithArguments(DFABytecodeInstruction::CheckValueRange);
+ break;
+ }
+
</ins><span class="cx"> case DFABytecodeInstruction::Jump:
</span><span class="cx"> if (!url[urlIndex] || urlIndexIsAfterEndOfString)
</span><span class="cx"> return actions;
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsNFAcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/NFA.cpp (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/NFA.cpp        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Source/WebCore/contentextensions/NFA.cpp        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -83,8 +83,8 @@
</span><span class="cx">
</span><span class="cx"> void NFA::setFinal(unsigned node, uint64_t ruleId)
</span><span class="cx"> {
</span><del>- ASSERT(!m_nodes[node].finalRuleIds.contains(ruleId));
- m_nodes[node].finalRuleIds.append(ruleId);
</del><ins>+ if (!m_nodes[node].finalRuleIds.contains(ruleId))
+ m_nodes[node].finalRuleIds.append(ruleId);
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> unsigned NFA::graphSize() const
</span></span></pre></div>
<a id="trunkSourceWebCorecontentextensionsNFAToDFAcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/contentextensions/NFAToDFA.cpp (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/contentextensions/NFAToDFA.cpp        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Source/WebCore/contentextensions/NFAToDFA.cpp        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -338,6 +338,49 @@
</span><span class="cx"> return uniqueNodeIdAddResult.iterator->impl()->m_dfaNodeId;
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+static void simplifyTransitions(Vector<DFANode>& dfaGraph)
+{
+ for (DFANode& dfaNode : dfaGraph) {
+ if (!dfaNode.hasFallbackTransition
+ && ((dfaNode.transitions.size() == 126 && !dfaNode.transitions.contains(0))
+ || (dfaNode.transitions.size() == 127 && dfaNode.transitions.contains(0)))) {
+ unsigned bestTarget = std::numeric_limits<unsigned>::max();
+ unsigned bestTargetScore = 0;
+ HashMap<unsigned, unsigned, DefaultHash<unsigned>::Hash, WTF::UnsignedWithZeroKeyHashTraits<unsigned>> targetHistogram;
+ for (const auto transition : dfaNode.transitions) {
+ if (!transition.key)
+ continue;
+
+ unsigned transitionTarget = transition.value;
+ auto addResult = targetHistogram.add(transitionTarget, 1);
+ if (!addResult.isNewEntry)
+ addResult.iterator->value++;
+
+ if (addResult.iterator->value > bestTargetScore) {
+ bestTargetScore = addResult.iterator->value;
+ bestTarget = transitionTarget;
+ }
+ }
+ ASSERT_WITH_MESSAGE(bestTargetScore, "There should be at least one valid target since having transitions is a precondition to enter this path.");
+
+ dfaNode.hasFallbackTransition = true;
+ dfaNode.fallbackTransition = bestTarget;
+ }
+
+ if (dfaNode.hasFallbackTransition) {
+ Vector<uint16_t, 128> keys;
+ DFANodeTransitions& transitions = dfaNode.transitions;
+ copyKeysToVector(transitions, keys);
+
+ for (uint16_t key : keys) {
+ auto transitionIterator = transitions.find(key);
+ if (transitionIterator->value == dfaNode.fallbackTransition)
+ transitions.remove(transitionIterator);
+ }
+ }
+ }
+}
+
</ins><span class="cx"> DFA NFAToDFA::convert(NFA& nfa)
</span><span class="cx"> {
</span><span class="cx"> Vector<NFANode>& nfaGraph = nfa.m_nodes;
</span><span class="lines">@@ -387,6 +430,7 @@
</span><span class="cx"> }
</span><span class="cx"> } while (!unprocessedNodes.isEmpty());
</span><span class="cx">
</span><ins>+ simplifyTransitions(dfaGraph);
</ins><span class="cx"> return DFA(WTF::move(dfaGraph), 0);
</span><span class="cx"> }
</span><span class="cx">
</span></span></pre></div>
<a id="trunkToolsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Tools/ChangeLog (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Tools/ChangeLog        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Tools/ChangeLog        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -1,3 +1,13 @@
</span><ins>+2015-03-17 Benjamin Poulain <bpoulain@apple.com>
+
+ Compile character ranges targeting the same state as range check in the bytecode
+ https://bugs.webkit.org/show_bug.cgi?id=142759
+
+ Reviewed by Alex Christensen.
+
+ * TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp:
+ (TestWebKitAPI::TEST_F):
+
</ins><span class="cx"> 2015-03-17 Youenn Fablet <youenn.fablet@crf.canon.fr>
</span><span class="cx">
</span><span class="cx"> W3C test parser and converter should use test importer host
</span></span></pre></div>
<a id="trunkToolsTestWebKitAPITestsWebCoreContentExtensionscpp"></a>
<div class="modfile"><h4>Modified: trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp (181662 => 181663)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp        2015-03-17 20:28:33 UTC (rev 181662)
+++ trunk/Tools/TestWebKitAPI/Tests/WebCore/ContentExtensions.cpp        2015-03-17 20:47:42 UTC (rev 181663)
</span><span class="lines">@@ -122,6 +122,56 @@
</span><span class="cx"> testRequest(backend, mainDocumentRequest("http://webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+TEST_F(ContentExtensionTest, RangeBasic)
+{
+ const char* rangeBasicFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\".*w[0-9]c\", \"url-filter-is-case-sensitive\":true}},{\"action\":{\"type\":\"block-cookies\"},\"trigger\":{\"url-filter\":\".*[A-H][a-z]cko\", \"url-filter-is-case-sensitive\":true}}]";
+ auto extensionData = ContentExtensions::compileRuleList(rangeBasicFilter);
+ auto extension = InMemoryCompiledContentExtension::create(WTF::move(extensionData));
+
+ ContentExtensions::ContentExtensionsBackend backend;
+ backend.addContentExtension("PatternNestedGroupsFilter", extension);
+
+ testRequest(backend, mainDocumentRequest("http://w3c.org"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("w2c://whatwg.org/"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/w0c"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/wac"), { });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/wAc"), { });
+
+ // Note: URL parsing and canonicalization lowercase the scheme and hostname.
+ testRequest(backend, mainDocumentRequest("Aacko://webkit.org"), { });
+ testRequest(backend, mainDocumentRequest("aacko://webkit.org"), { });
+ testRequest(backend, mainDocumentRequest("http://gCcko.org/"), { });
+ testRequest(backend, mainDocumentRequest("http://gccko.org/"), { });
+
+ testRequest(backend, mainDocumentRequest("http://webkit.org/Gecko"), { ContentExtensions::ActionType::BlockCookies });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/gecko"), { });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/GEcko"), { });
+}
+
+TEST_F(ContentExtensionTest, RangeExclusionGeneratingUniversalTransition)
+{
+ // Transition of the type ([^X]X) effictively transition on every input.
+ const char* rangeExclusionGeneratingUniversalTransitionFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\".*[^a]+afoobar\"}}]";
+ auto extensionData = ContentExtensions::compileRuleList(rangeExclusionGeneratingUniversalTransitionFilter);
+ auto extension = InMemoryCompiledContentExtension::create(WTF::move(extensionData));
+
+ ContentExtensions::ContentExtensionsBackend backend;
+ backend.addContentExtension("PatternNestedGroupsFilter", extension);
+
+ testRequest(backend, mainDocumentRequest("http://w3c.org"), { });
+
+ testRequest(backend, mainDocumentRequest("http://w3c.org/foobafoobar"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://w3c.org/foobarfoobar"), { });
+ testRequest(backend, mainDocumentRequest("http://w3c.org/FOOBAFOOBAR"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://w3c.org/FOOBARFOOBAR"), { });
+
+ // The character before the "a" prefix cannot be another "a".
+ testRequest(backend, mainDocumentRequest("http://w3c.org/aafoobar"), { });
+ testRequest(backend, mainDocumentRequest("http://w3c.org/Aafoobar"), { });
+ testRequest(backend, mainDocumentRequest("http://w3c.org/aAfoobar"), { });
+ testRequest(backend, mainDocumentRequest("http://w3c.org/AAfoobar"), { });
+}
+
</ins><span class="cx"> const char* patternsStartingWithGroupFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\"(http://whatwg\\\\.org/)?webkit\134\134.org\"}}]";
</span><span class="cx">
</span><span class="cx"> TEST_F(ContentExtensionTest, PatternStartingWithGroup)
</span><span class="lines">@@ -219,6 +269,26 @@
</span><span class="cx"> testRequest(backend, mainDocumentRequest("http://webkit.org/foobarfoo"), { });
</span><span class="cx"> testRequest(backend, mainDocumentRequest("http://webkit.org/foobarf"), { });
</span><span class="cx"> }
</span><ins>+
+TEST_F(ContentExtensionTest, EndOfLineAssertionWithInvertedCharacterSet)
+{
+ const char* endOfLineAssertionWithInvertedCharacterSetFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\".*[^y]$\"}}]";
+ auto extensionData = ContentExtensions::compileRuleList(endOfLineAssertionWithInvertedCharacterSetFilter);
+ auto extension = InMemoryCompiledContentExtension::create(WTF::move(extensionData));
+
+ ContentExtensions::ContentExtensionsBackend backend;
+ backend.addContentExtension("EndOfLineAssertion", extension);
+
+ testRequest(backend, mainDocumentRequest("http://webkit.org/"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/a"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/foobar"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/Ya"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/yFoobar"), { ContentExtensions::ActionType::BlockLoad });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/y"), { });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/Y"), { });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/foobary"), { });
+ testRequest(backend, mainDocumentRequest("http://webkit.org/foobarY"), { });
+}
</ins><span class="cx">
</span><span class="cx"> const char* loadTypeFilter = "[{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\".*webkit.org\",\"load-type\":[\"third-party\"]}},"
</span><span class="cx"> "{\"action\":{\"type\":\"block\"},\"trigger\":{\"url-filter\":\".*whatwg.org\",\"load-type\":[\"first-party\"]}},"
</span></span></pre>
</div>
</div>
</body>
</html>