<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[178173] trunk/Source</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/178173">178173</a></dd>
<dt>Author</dt> <dd>commit-queue@webkit.org</dd>
<dt>Date</dt> <dd>2015-01-09 09:44:37 -0800 (Fri, 09 Jan 2015)</dd>
</dl>
<h3>Log Message</h3>
<pre>Unreviewed, rolling out <a href="http://trac.webkit.org/projects/webkit/changeset/178154">r178154</a>, <a href="http://trac.webkit.org/projects/webkit/changeset/178163">r178163</a>, and <a href="http://trac.webkit.org/projects/webkit/changeset/178164">r178164</a>.
https://bugs.webkit.org/show_bug.cgi?id=140292
Still multiple assertion failures on tests (Requested by ap on
#webkit).
Reverted changesets:
"Modernize and streamline HTMLTokenizer"
https://bugs.webkit.org/show_bug.cgi?id=140166
http://trac.webkit.org/changeset/178154
"Unreviewed speculative buildfix after <a href="http://trac.webkit.org/projects/webkit/changeset/178154">r178154</a>."
http://trac.webkit.org/changeset/178163
"One more unreviewed speculative buildfix after <a href="http://trac.webkit.org/projects/webkit/changeset/178154">r178154</a>."
http://trac.webkit.org/changeset/178164</pre>
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceWTFChangeLog">trunk/Source/WTF/ChangeLog</a></li>
<li><a href="#trunkSourceWTFwtfForwardh">trunk/Source/WTF/wtf/Forward.h</a></li>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCorehtmlparserAtomicHTMLTokenh">trunk/Source/WebCore/html/parser/AtomicHTMLToken.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLDocumentParsercpp">trunk/Source/WebCore/html/parser/HTMLDocumentParser.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLDocumentParserh">trunk/Source/WebCore/html/parser/HTMLDocumentParser.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLEntityParsercpp">trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLInputStreamh">trunk/Source/WebCore/html/parser/HTMLInputStream.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLMetaCharsetParsercpp">trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLMetaCharsetParserh">trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLPreloadScannercpp">trunk/Source/WebCore/html/parser/HTMLPreloadScanner.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLPreloadScannerh">trunk/Source/WebCore/html/parser/HTMLPreloadScanner.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLResourcePreloadercpp">trunk/Source/WebCore/html/parser/HTMLResourcePreloader.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLResourcePreloaderh">trunk/Source/WebCore/html/parser/HTMLResourcePreloader.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLSourceTrackercpp">trunk/Source/WebCore/html/parser/HTMLSourceTracker.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLSourceTrackerh">trunk/Source/WebCore/html/parser/HTMLSourceTracker.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLTokenh">trunk/Source/WebCore/html/parser/HTMLToken.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLTokenizercpp">trunk/Source/WebCore/html/parser/HTMLTokenizer.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLTokenizerh">trunk/Source/WebCore/html/parser/HTMLTokenizer.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserHTMLTreeBuildercpp">trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserInputStreamPreprocessorh">trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h</a></li>
<li><a href="#trunkSourceWebCorehtmlparserTextDocumentParsercpp">trunk/Source/WebCore/html/parser/TextDocumentParser.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserXSSAuditorcpp">trunk/Source/WebCore/html/parser/XSSAuditor.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserXSSAuditorh">trunk/Source/WebCore/html/parser/XSSAuditor.h</a></li>
<li><a href="#trunkSourceWebCorehtmltrackWebVTTTokenizercpp">trunk/Source/WebCore/html/track/WebVTTTokenizer.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmltrackWebVTTTokenizerh">trunk/Source/WebCore/html/track/WebVTTTokenizer.h</a></li>
<li><a href="#trunkSourceWebCoreplatformtextSegmentedStringcpp">trunk/Source/WebCore/platform/text/SegmentedString.cpp</a></li>
<li><a href="#trunkSourceWebCoreplatformtextSegmentedStringh">trunk/Source/WebCore/platform/text/SegmentedString.h</a></li>
<li><a href="#trunkSourceWebCorexmlparserCharacterReferenceParserInlinesh">trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h</a></li>
<li><a href="#trunkSourceWebCorexmlparserMarkupTokenizerInlinesh">trunk/Source/WebCore/xml/parser/MarkupTokenizerInlines.h</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceWTFChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/ChangeLog (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/ChangeLog        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WTF/ChangeLog        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,3 +1,23 @@
</span><ins>+2015-01-09 Commit Queue &lt;commit-queue@webkit.org&gt;
+
+ Unreviewed, rolling out r178154, r178163, and r178164.
+ https://bugs.webkit.org/show_bug.cgi?id=140292
+
+ Still multiple assertion failures on tests (Requested by ap on
+ #webkit).
+
+ Reverted changesets:
+
+ "Modernize and streamline HTMLTokenizer"
+ https://bugs.webkit.org/show_bug.cgi?id=140166
+ http://trac.webkit.org/changeset/178154
+
+ "Unreviewed speculative buildfix after r178154."
+ http://trac.webkit.org/changeset/178163
+
+ "One more unreviewed speculative buildfix after r178154."
+ http://trac.webkit.org/changeset/178164
+
</ins><span class="cx"> 2015-01-08 Darin Adler &lt;darin@apple.com&gt;
</span><span class="cx">
</span><span class="cx"> Modernize and streamline HTMLTokenizer
</span></span></pre></div>
<a id="trunkSourceWTFwtfForwardh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/Forward.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/Forward.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WTF/wtf/Forward.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -30,6 +30,7 @@
</span><span class="cx"> template&lt;typename T&gt; class NeverDestroyed;
</span><span class="cx"> template&lt;typename T&gt; class OwnPtr;
</span><span class="cx"> template&lt;typename T&gt; class PassOwnPtr;
</span><ins>+template&lt;typename T&gt; class PassRef;
</ins><span class="cx"> template&lt;typename T&gt; class PassRefPtr;
</span><span class="cx"> template&lt;typename T&gt; class RefPtr;
</span><span class="cx"> template&lt;typename T&gt; class Ref;
</span><span class="lines">@@ -44,13 +45,11 @@
</span><span class="cx"> class Decoder;
</span><span class="cx"> class Encoder;
</span><span class="cx"> class FunctionDispatcher;
</span><del>-class OrdinalNumber;
</del><span class="cx"> class PrintStream;
</span><span class="cx"> class String;
</span><span class="cx"> class StringBuilder;
</span><span class="cx"> class StringImpl;
</span><span class="cx"> class StringView;
</span><del>-class TextPosition;
</del><span class="cx">
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -64,9 +63,9 @@
</span><span class="cx"> using WTF::FunctionDispatcher;
</span><span class="cx"> using WTF::LazyNeverDestroyed;
</span><span class="cx"> using WTF::NeverDestroyed;
</span><del>-using WTF::OrdinalNumber;
</del><span class="cx"> using WTF::OwnPtr;
</span><span class="cx"> using WTF::PassOwnPtr;
</span><ins>+using WTF::PassRef;
</ins><span class="cx"> using WTF::PassRefPtr;
</span><span class="cx"> using WTF::PrintStream;
</span><span class="cx"> using WTF::Ref;
</span><span class="lines">@@ -76,7 +75,6 @@
</span><span class="cx"> using WTF::StringBuilder;
</span><span class="cx"> using WTF::StringImpl;
</span><span class="cx"> using WTF::StringView;
</span><del>-using WTF::TextPosition;
</del><span class="cx"> using WTF::Vector;
</span><span class="cx">
</span><span class="cx"> #endif // WTF_Forward_h
</span></span></pre></div>
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/ChangeLog        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,3 +1,23 @@
</span><ins>+2015-01-09 Commit Queue &lt;commit-queue@webkit.org&gt;
+
+ Unreviewed, rolling out r178154, r178163, and r178164.
+ https://bugs.webkit.org/show_bug.cgi?id=140292
+
+ Still multiple assertion failures on tests (Requested by ap on
+ #webkit).
+
+ Reverted changesets:
+
+ "Modernize and streamline HTMLTokenizer"
+ https://bugs.webkit.org/show_bug.cgi?id=140166
+ http://trac.webkit.org/changeset/178154
+
+ "Unreviewed speculative buildfix after r178154."
+ http://trac.webkit.org/changeset/178163
+
+ "One more unreviewed speculative buildfix after r178154."
+ http://trac.webkit.org/changeset/178164
+
</ins><span class="cx"> 2015-01-09 Bartlomiej Gajda &lt;b.gajda@samsung.com&gt;
</span><span class="cx">
</span><span class="cx"> [MSE] Implement Append Window support.
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserAtomicHTMLTokenh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/AtomicHTMLToken.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/AtomicHTMLToken.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/AtomicHTMLToken.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -191,6 +191,11 @@
</span><span class="cx"> if (attribute.name.isEmpty())
</span><span class="cx"> continue;
</span><span class="cx">
</span><ins>+ ASSERT(attribute.nameRange.start);
+ ASSERT(attribute.nameRange.end);
+ ASSERT(attribute.valueRange.start);
+ ASSERT(attribute.valueRange.end);
+
</ins><span class="cx"> QualifiedName name(nullAtom, AtomicString(attribute.name), nullAtom);
</span><span class="cx">
</span><span class="cx"> // FIXME: This is N^2 for the number of attributes.
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLDocumentParsercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLDocumentParser.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLDocumentParser.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLDocumentParser.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -39,6 +39,28 @@
</span><span class="cx">
</span><span class="cx"> using namespace HTMLNames;
</span><span class="cx">
</span><ins>+// This is a direct transcription of step 4 from:
+// https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
+static HTMLTokenizer::State tokenizerStateForContextElement(Element& contextElement, bool reportErrors, const HTMLParserOptions& options)
+{
+ const QualifiedName& contextTag = contextElement.tagQName();
+
+ if (contextTag.matches(titleTag) || contextTag.matches(textareaTag))
+ return HTMLTokenizer::RCDATAState;
+ if (contextTag.matches(styleTag)
+ || contextTag.matches(xmpTag)
+ || contextTag.matches(iframeTag)
+ || (contextTag.matches(noembedTag) && options.pluginsEnabled)
+ || (contextTag.matches(noscriptTag) && options.scriptEnabled)
+ || contextTag.matches(noframesTag))
+ return reportErrors ? HTMLTokenizer::RAWTEXTState : HTMLTokenizer::PLAINTEXTState;
+ if (contextTag.matches(scriptTag))
+ return reportErrors ? HTMLTokenizer::ScriptDataState : HTMLTokenizer::PLAINTEXTState;
+ if (contextTag.matches(plaintextTag))
+ return HTMLTokenizer::PLAINTEXTState;
+ return HTMLTokenizer::DataState;
+}
+
</ins><span class="cx"> HTMLDocumentParser::HTMLDocumentParser(HTMLDocument& document)
</span><span class="cx"> : ScriptableDocumentParser(document)
</span><span class="cx"> , m_options(document)
</span><span class="lines">@@ -63,9 +85,8 @@
</span><span class="cx"> , m_treeBuilder(std::make_unique&lt;HTMLTreeBuilder&gt;(*this, fragment, contextElement, parserContentPolicy(), m_options))
</span><span class="cx"> , m_xssAuditorDelegate(fragment.document())
</span><span class="cx"> {
</span><del>- // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
- if (contextElement.isHTMLElement())
- m_tokenizer.updateStateFor(contextElement.tagQName().localName());
</del><ins>+ bool reportErrors = false; // For now document fragment parsing never reports errors.
+ m_tokenizer.setState(tokenizerStateForContextElement(contextElement, reportErrors, m_options));
</ins><span class="cx"> m_xssAuditor.initForFragment();
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -258,22 +279,22 @@
</span><span class="cx">
</span><span class="cx"> while (canTakeNextToken(mode, session) && !session.needsYield) {
</span><span class="cx"> if (!isParsingFragment())
</span><del>- m_sourceTracker.startToken(m_input.current(), m_tokenizer);
</del><ins>+ m_sourceTracker.start(m_input.current(), &m_tokenizer, m_token);
</ins><span class="cx">
</span><del>- auto token = m_tokenizer.nextToken(m_input.current());
- if (!token)
</del><ins>+ if (!m_tokenizer.nextToken(m_input.current(), m_token))
</ins><span class="cx"> break;
</span><span class="cx">
</span><span class="cx"> if (!isParsingFragment()) {
</span><del>- m_sourceTracker.endToken(m_input.current(), m_tokenizer);
</del><ins>+ m_sourceTracker.end(m_input.current(), &m_tokenizer, m_token);
</ins><span class="cx">
</span><span class="cx"> // We do not XSS filter innerHTML, which means we (intentionally) fail
</span><span class="cx"> // http/tests/security/xssAuditor/dom-write-innerHTML.html
</span><del>- if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(*token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
</del><ins>+ if (auto xssInfo = m_xssAuditor.filterToken(FilterTokenRequest(m_token, m_sourceTracker, m_tokenizer.shouldAllowCDATA())))
</ins><span class="cx"> m_xssAuditorDelegate.didBlockScript(*xssInfo);
</span><span class="cx"> }
</span><span class="cx">
</span><del>- constructTreeFromHTMLToken(token);
</del><ins>+ constructTreeFromHTMLToken(m_token);
+ ASSERT(m_token.type() == HTMLToken::Uninitialized);
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> // Ensure we haven't been totally deref'ed after pumping. Any caller of this
</span><span class="lines">@@ -287,20 +308,20 @@
</span><span class="cx"> m_parserScheduler->scheduleForResume();
</span><span class="cx">
</span><span class="cx"> if (isWaitingForScripts()) {
</span><del>- ASSERT(m_tokenizer.isInDataState());
</del><ins>+ ASSERT(m_tokenizer.state() == HTMLTokenizer::DataState);
</ins><span class="cx"> if (!m_preloadScanner) {
</span><span class="cx"> m_preloadScanner = std::make_unique&lt;HTMLPreloadScanner&gt;(m_options, document()->url(), document()->deviceScaleFactor());
</span><span class="cx"> m_preloadScanner->appendToEnd(m_input.current());
</span><span class="cx"> }
</span><del>- m_preloadScanner->scan(*m_preloader, *document());
</del><ins>+ m_preloadScanner->scan(m_preloader.get(), *document());
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> InspectorInstrumentation::didWriteHTML(cookie, m_input.current().currentLine().zeroBasedInt());
</span><span class="cx"> }
</span><span class="cx">
</span><del>-void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr& rawToken)
</del><ins>+void HTMLDocumentParser::constructTreeFromHTMLToken(HTMLToken& rawToken)
</ins><span class="cx"> {
</span><del>- AtomicHTMLToken token(*rawToken);
</del><ins>+ AtomicHTMLToken token(rawToken);
</ins><span class="cx">
</span><span class="cx"> // We clear the rawToken in case constructTreeFromAtomicToken
</span><span class="cx"> // synchronously re-enters the parser. We don't clear the token immedately
</span><span class="lines">@@ -312,13 +333,15 @@
</span><span class="cx"> // FIXME: Stop clearing the rawToken once we start running the parser off
</span><span class="cx"> // the main thread or once we stop allowing synchronous JavaScript
</span><span class="cx"> // execution from parseAttribute.
</span><del>- if (rawToken->type() != HTMLToken::Character) {
- // Clearing the TokenPtr makes sure we don't clear the HTMLToken a second time
- // later when the TokenPtr is destroyed.
</del><ins>+ if (rawToken.type() != HTMLToken::Character)
</ins><span class="cx"> rawToken.clear();
</span><del>- }
</del><span class="cx">
</span><span class="cx"> m_treeBuilder->constructTree(token);
</span><ins>+
+ if (rawToken.type() != HTMLToken::Uninitialized) {
+ ASSERT(rawToken.type() == HTMLToken::Character);
+ rawToken.clear();
+ }
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> bool HTMLDocumentParser::hasInsertionPoint()
</span><span class="lines">@@ -350,7 +373,7 @@
</span><span class="cx"> if (!m_insertionPreloadScanner)
</span><span class="cx"> m_insertionPreloadScanner = std::make_unique&lt;HTMLPreloadScanner&gt;(m_options, document()->url(), document()->deviceScaleFactor());
</span><span class="cx"> m_insertionPreloadScanner->appendToEnd(source);
</span><del>- m_insertionPreloadScanner->scan(*m_preloader, *document());
</del><ins>+ m_insertionPreloadScanner->scan(m_preloader.get(), *document());
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> endIfDelayed();
</span><span class="lines">@@ -375,7 +398,7 @@
</span><span class="cx"> } else {
</span><span class="cx"> m_preloadScanner->appendToEnd(source);
</span><span class="cx"> if (isWaitingForScripts())
</span><del>- m_preloadScanner->scan(*m_preloader, *document());
</del><ins>+ m_preloadScanner->scan(m_preloader.get(), *document());
</ins><span class="cx"> }
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -510,7 +533,7 @@
</span><span class="cx"> {
</span><span class="cx"> ASSERT(m_preloadScanner);
</span><span class="cx"> m_preloadScanner->appendToEnd(m_input.current());
</span><del>- m_preloadScanner->scan(*m_preloader, *document());
</del><ins>+ m_preloadScanner->scan(m_preloader.get(), *document());
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> void HTMLDocumentParser::notifyFinished(CachedResource* cachedResource)
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLDocumentParserh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLDocumentParser.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLDocumentParser.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLDocumentParser.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -103,7 +103,7 @@
</span><span class="cx"> bool canTakeNextToken(SynchronousMode, PumpSession&);
</span><span class="cx"> void pumpTokenizer(SynchronousMode);
</span><span class="cx"> void pumpTokenizerIfPossible(SynchronousMode);
</span><del>- void constructTreeFromHTMLToken(HTMLTokenizer::TokenPtr&);
</del><ins>+ void constructTreeFromHTMLToken(HTMLToken&);
</ins><span class="cx">
</span><span class="cx"> void runScriptsForPausedTreeBuilder();
</span><span class="cx"> void resumeParsingAfterScriptExecution();
</span><span class="lines">@@ -121,6 +121,7 @@
</span><span class="cx"> HTMLParserOptions m_options;
</span><span class="cx"> HTMLInputStream m_input;
</span><span class="cx">
</span><ins>+ HTMLToken m_token;
</ins><span class="cx"> HTMLTokenizer m_tokenizer;
</span><span class="cx"> std::unique_ptr&lt;HTMLScriptRunner&gt; m_scriptRunner;
</span><span class="cx"> std::unique_ptr&lt;HTMLTreeBuilder&gt; m_treeBuilder;
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLEntityParsercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLEntityParser.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -60,9 +60,9 @@
</span><span class="cx"> return windowsLatin1ExtensionArray[value - 0x80];
</span><span class="cx"> }
</span><span class="cx">
</span><del>- static bool acceptMalformed() { return true; }
</del><ins>+ inline static bool acceptMalformed() { return true; }
</ins><span class="cx">
</span><del>- static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
</del><ins>+ inline static bool consumeNamedEntity(SegmentedString& source, StringBuilder& decodedEntity, bool& notEnoughCharacters, UChar additionalAllowedCharacter, UChar& cc)
</ins><span class="cx"> {
</span><span class="cx"> StringBuilder consumedCharacters;
</span><span class="cx"> HTMLEntitySearch entitySearch;
</span><span class="lines">@@ -72,7 +72,7 @@
</span><span class="cx"> if (!entitySearch.isEntityPrefix())
</span><span class="cx"> break;
</span><span class="cx"> consumedCharacters.append(cc);
</span><del>- source.advance();
</del><ins>+ source.advanceAndASSERT(cc);
</ins><span class="cx"> }
</span><span class="cx"> notEnoughCharacters = source.isEmpty();
</span><span class="cx"> if (notEnoughCharacters) {
</span><span class="lines">@@ -97,7 +97,7 @@
</span><span class="cx"> cc = source.currentChar();
</span><span class="cx"> ASSERT_UNUSED(reference, cc == *reference++);
</span><span class="cx"> consumedCharacters.append(cc);
</span><del>- source.advance();
</del><ins>+ source.advanceAndASSERT(cc);
</ins><span class="cx"> ASSERT(!source.isEmpty());
</span><span class="cx"> }
</span><span class="cx"> cc = source.currentChar();
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLInputStreamh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLInputStream.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLInputStream.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLInputStream.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -28,7 +28,6 @@
</span><span class="cx">
</span><span class="cx"> #include "InputStreamPreprocessor.h"
</span><span class="cx"> #include "SegmentedString.h"
</span><del>-#include &lt;wtf/text/TextPosition.h&gt;
</del><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLMetaCharsetParsercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,6 +1,5 @@
</span><span class="cx"> /*
</span><span class="cx"> * Copyright (C) 2010 Google Inc. All Rights Reserved.
</span><del>- * Copyright (C) 2015 Apple Inc. All Rights Reserved.
</del><span class="cx"> *
</span><span class="cx"> * Redistribution and use in source and binary forms, with or without
</span><span class="cx"> * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -29,26 +28,41 @@
</span><span class="cx">
</span><span class="cx"> #include "HTMLNames.h"
</span><span class="cx"> #include "HTMLParserIdioms.h"
</span><ins>+#include "HTMLTokenizer.h"
+#include "TextCodec.h"
</ins><span class="cx"> #include "TextEncodingRegistry.h"
</span><span class="cx">
</span><ins>+using namespace WTF;
+
</ins><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><span class="cx"> using namespace HTMLNames;
</span><span class="cx">
</span><span class="cx"> HTMLMetaCharsetParser::HTMLMetaCharsetParser()
</span><del>- : m_codec(newTextCodec(Latin1Encoding()))
</del><ins>+ : m_tokenizer(std::make_unique&lt;HTMLTokenizer&gt;(HTMLParserOptions()))
+ , m_assumedCodec(newTextCodec(Latin1Encoding()))
+ , m_inHeadSection(true)
+ , m_doneChecking(false)
</ins><span class="cx"> {
</span><span class="cx"> }
</span><span class="cx">
</span><del>-static StringView extractCharset(const String& value)
</del><ins>+HTMLMetaCharsetParser::~HTMLMetaCharsetParser()
</ins><span class="cx"> {
</span><ins>+}
+
+static const char charsetString[] = "charset";
+static const size_t charsetLength = sizeof("charset") - 1;
+
+String HTMLMetaCharsetParser::extractCharset(const String& value)
+{
+ size_t pos = 0;
</ins><span class="cx"> unsigned length = value.length();
</span><del>- for (size_t pos = 0; pos < length; ) {
- pos = value.find("charset", pos, false);
</del><ins>+
+ while (pos < length) {
+ pos = value.find(charsetString, pos, false);
</ins><span class="cx"> if (pos == notFound)
</span><span class="cx"> break;
</span><span class="cx">
</span><del>- static const size_t charsetLength = sizeof("charset") - 1;
</del><span class="cx"> pos += charsetLength;
</span><span class="cx">
</span><span class="cx"> // Skip whitespace.
</span><span class="lines">@@ -63,10 +77,12 @@
</span><span class="cx"> while (pos < length && value[pos] <= ' ')
</span><span class="cx"> ++pos;
</span><span class="cx">
</span><del>- UChar quoteMark = 0;
- if (pos < length && (value[pos] == '"' || value[pos] == '\''))
- quoteMark = value[pos++];
-
</del><ins>+ char quoteMark = 0;
+ if (pos < length && (value[pos] == '"' || value[pos] == '\'')) {
+ quoteMark = static_cast&lt;char&gt;(value[pos++]);
+ ASSERT(!(quoteMark & 0x80));
+ }
+
</ins><span class="cx"> if (pos == length)
</span><span class="cx"> break;
</span><span class="cx">
</span><span class="lines">@@ -77,17 +93,19 @@
</span><span class="cx"> if (quoteMark && (end == length))
</span><span class="cx"> break; // Close quote not found.
</span><span class="cx">
</span><del>- return StringView(value).substring(pos, end - pos);
</del><ins>+ return value.substring(pos, end - pos);
</ins><span class="cx"> }
</span><del>- return StringView();
</del><ins>+
+ return "";
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-bool HTMLMetaCharsetParser::processMeta(HTMLToken& token)
</del><ins>+bool HTMLMetaCharsetParser::processMeta()
</ins><span class="cx"> {
</span><ins>+ const HTMLToken::AttributeList& tokenAttributes = m_token.attributes();
</ins><span class="cx"> AttributeList attributes;
</span><del>- for (auto& attribute : token.attributes()) {
- String attributeName = StringImpl::create8BitIfPossible(attribute.name);
- String attributeValue = StringImpl::create8BitIfPossible(attribute.value);
</del><ins>+ for (HTMLToken::AttributeList::const_iterator iter = tokenAttributes.begin(); iter != tokenAttributes.end(); ++iter) {
+ String attributeName = StringImpl::create8BitIfPossible(iter->name);
+ String attributeValue = StringImpl::create8BitIfPossible(iter->value);
</ins><span class="cx"> attributes.append(std::make_pair(attributeName, attributeValue));
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -98,12 +116,12 @@
</span><span class="cx"> TextEncoding HTMLMetaCharsetParser::encodingFromMetaAttributes(const AttributeList& attributes)
</span><span class="cx"> {
</span><span class="cx"> bool gotPragma = false;
</span><del>- enum { None, Charset, Pragma } mode = None;
- StringView charset;
</del><ins>+ Mode mode = None;
+ String charset;
</ins><span class="cx">
</span><del>- for (auto& attribute : attributes) {
- const String& attributeName = attribute.first;
- const String& attributeValue = attribute.second;
</del><ins>+ for (AttributeList::const_iterator iter = attributes.begin(); iter != attributes.end(); ++iter) {
+ const AtomicString& attributeName = iter->first;
+ const String& attributeValue = iter->second;
</ins><span class="cx">
</span><span class="cx"> if (attributeName == http_equivAttr) {
</span><span class="cx"> if (equalIgnoringCase(attributeValue, "content-type"))
</span><span class="lines">@@ -121,11 +139,13 @@
</span><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> if (mode == Charset || (mode == Pragma && gotPragma))
</span><del>- return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset.toStringWithoutCopying()));
</del><ins>+ return TextEncoding(stripLeadingAndTrailingHTMLSpaces(charset));
</ins><span class="cx">
</span><span class="cx"> return TextEncoding();
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+static const int bytesToCheckUnconditionally = 1024; // That many input bytes will be checked for meta charset even if <head> section is over.
+
</ins><span class="cx"> bool HTMLMetaCharsetParser::checkForMetaCharset(const char* data, size_t length)
</span><span class="cx"> {
</span><span class="cx"> if (m_doneChecking)
</span><span class="lines">@@ -136,32 +156,30 @@
</span><span class="cx"> // We still don't have an encoding, and are in the head.
</span><span class="cx"> // The following tags are allowed in <head>:
</span><span class="cx"> // SCRIPT|STYLE|META|LINK|OBJECT|TITLE|BASE
</span><del>- //
</del><ins>+
</ins><span class="cx"> // We stop scanning when a tag that is not permitted in <head>
</span><span class="cx"> // is seen, rather than when </head> is seen, because that more closely
</span><span class="cx"> // matches behavior in other browsers; more details in
</span><span class="cx"> // <http://bugs.webkit.org/show_bug.cgi?id=3590>.
</span><del>- //
</del><ins>+
</ins><span class="cx"> // Additionally, we ignore things that looks like tags in <title>, <script>
</span><span class="cx"> // and <noscript>; see <http://bugs.webkit.org/show_bug.cgi?id=4560>,
</span><span class="cx"> // <http://bugs.webkit.org/show_bug.cgi?id=12165> and
</span><span class="cx"> // <http://bugs.webkit.org/show_bug.cgi?id=12389>.
</span><del>- //
</del><ins>+
</ins><span class="cx"> // Since many sites have charset declarations after <body> or other tags
</span><span class="cx"> // that are disallowed in <head>, we don't bail out until we've checked at
</span><span class="cx"> // least bytesToCheckUnconditionally bytes of input.
</span><span class="cx">
</span><del>- static const int bytesToCheckUnconditionally = 1024;
</del><ins>+ m_input.append(SegmentedString(m_assumedCodec->decode(data, length)));
</ins><span class="cx">
</span><del>- m_input.append(SegmentedString(m_codec->decode(data, length)));
-
- while (auto token = m_tokenizer.nextToken(m_input)) {
- bool isEnd = token->type() == HTMLToken::EndTag;
- if (isEnd || token->type() == HTMLToken::StartTag) {
- AtomicString tagName(token->name());
- if (!isEnd) {
- m_tokenizer.updateStateFor(tagName);
- if (tagName == metaTag && processMeta(*token)) {
</del><ins>+ while (m_tokenizer->nextToken(m_input, m_token)) {
+ bool end = m_token.type() == HTMLToken::EndTag;
+ if (end || m_token.type() == HTMLToken::StartTag) {
+ AtomicString tagName(m_token.name());
+ if (!end) {
+ m_tokenizer->updateStateFor(tagName);
+ if (tagName == metaTag && processMeta()) {
</ins><span class="cx"> m_doneChecking = true;
</span><span class="cx"> return true;
</span><span class="cx"> }
</span><span class="lines">@@ -171,8 +189,7 @@
</span><span class="cx"> && tagName != styleTag && tagName != linkTag
</span><span class="cx"> && tagName != metaTag && tagName != objectTag
</span><span class="cx"> && tagName != titleTag && tagName != baseTag
</span><del>- && (isEnd || tagName != htmlTag)
- && (isEnd || tagName != headTag)) {
</del><ins>+ && (end || tagName != htmlTag) && (end || tagName != headTag)) {
</ins><span class="cx"> m_inHeadSection = false;
</span><span class="cx"> }
</span><span class="cx"> }
</span><span class="lines">@@ -181,6 +198,8 @@
</span><span class="cx"> m_doneChecking = true;
</span><span class="cx"> return true;
</span><span class="cx"> }
</span><ins>+
+ m_token.clear();
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> return false;
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLMetaCharsetParserh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLMetaCharsetParser.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -26,36 +26,49 @@
</span><span class="cx"> #ifndef HTMLMetaCharsetParser_h
</span><span class="cx"> #define HTMLMetaCharsetParser_h
</span><span class="cx">
</span><del>-#include "HTMLTokenizer.h"
</del><ins>+#include "HTMLToken.h"
</ins><span class="cx"> #include "SegmentedString.h"
</span><span class="cx"> #include "TextEncoding.h"
</span><ins>+#include <wtf/Noncopyable.h>
</ins><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><ins>+class HTMLTokenizer;
</ins><span class="cx"> class TextCodec;
</span><span class="cx">
</span><span class="cx"> class HTMLMetaCharsetParser {
</span><span class="cx"> WTF_MAKE_NONCOPYABLE(HTMLMetaCharsetParser); WTF_MAKE_FAST_ALLOCATED;
</span><span class="cx"> public:
</span><span class="cx"> HTMLMetaCharsetParser();
</span><ins>+ ~HTMLMetaCharsetParser();
</ins><span class="cx">
</span><span class="cx"> // Returns true if done checking, regardless of whether an encoding is found.
</span><span class="cx"> bool checkForMetaCharset(const char*, size_t);
</span><span class="cx">
</span><span class="cx"> const TextEncoding& encoding() { return m_encoding; }
</span><span class="cx">
</span><ins>+ typedef Vector<std::pair<String, String>> AttributeList;
</ins><span class="cx"> // The returned encoding might not be valid.
</span><del>- typedef Vector<std::pair<String, String>> AttributeList;
- static TextEncoding encodingFromMetaAttributes(const AttributeList&);
</del><ins>+ static TextEncoding encodingFromMetaAttributes(const AttributeList&
+);
</ins><span class="cx">
</span><span class="cx"> private:
</span><del>- bool processMeta(HTMLToken&);
</del><ins>+ bool processMeta();
+ static String extractCharset(const String&);
</ins><span class="cx">
</span><del>- HTMLTokenizer m_tokenizer;
- const std::unique_ptr<TextCodec> m_codec;
</del><ins>+ enum Mode {
+ None,
+ Charset,
+ Pragma,
+ };
+
+ std::unique_ptr<HTMLTokenizer> m_tokenizer;
+ std::unique_ptr<TextCodec> m_assumedCodec;
</ins><span class="cx"> SegmentedString m_input;
</span><del>- bool m_inHeadSection { true };
- bool m_doneChecking { false };
</del><ins>+ HTMLToken m_token;
+ bool m_inHeadSection;
+
+ bool m_doneChecking;
</ins><span class="cx"> TextEncoding m_encoding;
</span><span class="cx"> };
</span><span class="cx">
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLPreloadScannercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLPreloadScanner.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLPreloadScanner.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLPreloadScanner.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -242,10 +242,42 @@
</span><span class="cx">
</span><span class="cx"> TokenPreloadScanner::TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor)
</span><span class="cx"> : m_documentURL(documentURL)
</span><ins>+ , m_inStyle(false)
</ins><span class="cx"> , m_deviceScaleFactor(deviceScaleFactor)
</span><ins>+#if ENABLE(TEMPLATE_ELEMENT)
+ , m_templateCount(0)
+#endif
</ins><span class="cx"> {
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+TokenPreloadScanner::~TokenPreloadScanner()
+{
+}
+
+TokenPreloadScannerCheckpoint TokenPreloadScanner::createCheckpoint()
+{
+ TokenPreloadScannerCheckpoint checkpoint = m_checkpoints.size();
+ m_checkpoints.append(Checkpoint(m_predictedBaseElementURL, m_inStyle
+#if ENABLE(TEMPLATE_ELEMENT)
+ , m_templateCount
+#endif
+ ));
+ return checkpoint;
+}
+
+void TokenPreloadScanner::rewindTo(TokenPreloadScannerCheckpoint checkpointIndex)
+{
+ ASSERT(checkpointIndex < m_checkpoints.size()); // If this ASSERT fires, checkpointIndex is invalid.
+ const Checkpoint& checkpoint = m_checkpoints[checkpointIndex];
+ m_predictedBaseElementURL = checkpoint.predictedBaseElementURL;
+ m_inStyle = checkpoint.inStyle;
+#if ENABLE(TEMPLATE_ELEMENT)
+ m_templateCount = checkpoint.templateCount;
+#endif
+ m_cssScanner.reset();
+ m_checkpoints.clear();
+}
+
</ins><span class="cx"> void TokenPreloadScanner::scan(const HTMLToken& token, Vector<std::unique_ptr<PreloadRequest>>& requests, Document& document)
</span><span class="cx"> {
</span><span class="cx"> switch (token.type()) {
</span><span class="lines">@@ -317,16 +349,20 @@
</span><span class="cx">
</span><span class="cx"> HTMLPreloadScanner::HTMLPreloadScanner(const HTMLParserOptions& options, const URL& documentURL, float deviceScaleFactor)
</span><span class="cx"> : m_scanner(documentURL, deviceScaleFactor)
</span><del>- , m_tokenizer(options)
</del><ins>+ , m_tokenizer(std::make_unique<HTMLTokenizer>(options))
</ins><span class="cx"> {
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+HTMLPreloadScanner::~HTMLPreloadScanner()
+{
+}
+
</ins><span class="cx"> void HTMLPreloadScanner::appendToEnd(const SegmentedString& source)
</span><span class="cx"> {
</span><span class="cx"> m_source.append(source);
</span><span class="cx"> }
</span><span class="cx">
</span><del>-void HTMLPreloadScanner::scan(HTMLResourcePreloader& preloader, Document& document)
</del><ins>+void HTMLPreloadScanner::scan(HTMLResourcePreloader* preloader, Document& document)
</ins><span class="cx"> {
</span><span class="cx"> ASSERT(isMainThread()); // HTMLTokenizer::updateStateFor only works on the main thread.
</span><span class="cx">
</span><span class="lines">@@ -338,13 +374,14 @@
</span><span class="cx">
</span><span class="cx"> PreloadRequestStream requests;
</span><span class="cx">
</span><del>- while (auto token = m_tokenizer.nextToken(m_source)) {
- if (token->type() == HTMLToken::StartTag)
- m_tokenizer.updateStateFor(AtomicString(token->name()));
- m_scanner.scan(*token, requests, document);
</del><ins>+ while (m_tokenizer->nextToken(m_source, m_token)) {
+ if (m_token.type() == HTMLToken::StartTag)
+ m_tokenizer->updateStateFor(AtomicString(m_token.name()));
+ m_scanner.scan(m_token, requests, document);
+ m_token.clear();
</ins><span class="cx"> }
</span><span class="cx">
</span><del>- preloader.preload(WTF::move(requests));
</del><ins>+ preloader->preload(WTF::move(requests));
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> }
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLPreloadScannerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLPreloadScanner.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLPreloadScanner.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLPreloadScanner.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -28,20 +28,40 @@
</span><span class="cx"> #define HTMLPreloadScanner_h
</span><span class="cx">
</span><span class="cx"> #include "CSSPreloadScanner.h"
</span><del>-#include "HTMLTokenizer.h"
</del><ins>+#include "HTMLToken.h"
</ins><span class="cx"> #include "SegmentedString.h"
</span><ins>+#include <wtf/Vector.h>
</ins><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><ins>+typedef size_t TokenPreloadScannerCheckpoint;
+
+class HTMLParserOptions;
+class HTMLTokenizer;
+class SegmentedString;
+class Frame;
+
</ins><span class="cx"> class TokenPreloadScanner {
</span><del>- WTF_MAKE_NONCOPYABLE(TokenPreloadScanner);
</del><ins>+ WTF_MAKE_NONCOPYABLE(TokenPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
</ins><span class="cx"> public:
</span><span class="cx"> explicit TokenPreloadScanner(const URL& documentURL, float deviceScaleFactor = 1.0);
</span><ins>+ ~TokenPreloadScanner();
</ins><span class="cx">
</span><del>- void scan(const HTMLToken&, PreloadRequestStream&, Document&);
</del><ins>+ void scan(const HTMLToken&, PreloadRequestStream& requests, Document&);
</ins><span class="cx">
</span><span class="cx"> void setPredictedBaseElementURL(const URL& url) { m_predictedBaseElementURL = url; }
</span><span class="cx">
</span><ins>+ // A TokenPreloadScannerCheckpoint is valid until the next call to rewindTo,
+ // at which point all outstanding checkpoints are invalidated.
+ TokenPreloadScannerCheckpoint createCheckpoint();
+ void rewindTo(TokenPreloadScannerCheckpoint);
+
+ bool isSafeToSendToAnotherThread()
+ {
+ return m_documentURL.isSafeToSendToAnotherThread()
+ && m_predictedBaseElementURL.isSafeToSendToAnotherThread();
+ }
+
</ins><span class="cx"> private:
</span><span class="cx"> enum class TagId {
</span><span class="cx"> // These tags are scanned by the StartTagScanner.
</span><span class="lines">@@ -65,29 +85,54 @@
</span><span class="cx">
</span><span class="cx"> void updatePredictedBaseURL(const HTMLToken&);
</span><span class="cx">
</span><ins>+ struct Checkpoint {
+ Checkpoint(const URL& predictedBaseElementURL, bool inStyle
+#if ENABLE(TEMPLATE_ELEMENT)
+ , size_t templateCount
+#endif
+ )
+ : predictedBaseElementURL(predictedBaseElementURL)
+ , inStyle(inStyle)
+#if ENABLE(TEMPLATE_ELEMENT)
+ , templateCount(templateCount)
+#endif
+ {
+ }
+
+ URL predictedBaseElementURL;
+ bool inStyle;
+#if ENABLE(TEMPLATE_ELEMENT)
+ size_t templateCount;
+#endif
+ };
+
</ins><span class="cx"> CSSPreloadScanner m_cssScanner;
</span><span class="cx"> const URL m_documentURL;
</span><del>- const float m_deviceScaleFactor { 1 };
</del><ins>+ URL m_predictedBaseElementURL;
+ bool m_inStyle;
+ float m_deviceScaleFactor;
</ins><span class="cx">
</span><del>- URL m_predictedBaseElementURL;
- bool m_inStyle { false };
</del><span class="cx"> #if ENABLE(TEMPLATE_ELEMENT)
</span><del>- unsigned m_templateCount { 0 };
</del><ins>+ size_t m_templateCount;
</ins><span class="cx"> #endif
</span><ins>+
+ Vector<Checkpoint> m_checkpoints;
</ins><span class="cx"> };
</span><span class="cx">
</span><span class="cx"> class HTMLPreloadScanner {
</span><del>- WTF_MAKE_FAST_ALLOCATED;
</del><ins>+ WTF_MAKE_NONCOPYABLE(HTMLPreloadScanner); WTF_MAKE_FAST_ALLOCATED;
</ins><span class="cx"> public:
</span><span class="cx"> HTMLPreloadScanner(const HTMLParserOptions&, const URL& documentURL, float deviceScaleFactor = 1.0);
</span><ins>+ ~HTMLPreloadScanner();
</ins><span class="cx">
</span><span class="cx"> void appendToEnd(const SegmentedString&);
</span><del>- void scan(HTMLResourcePreloader&, Document&);
</del><ins>+ void scan(HTMLResourcePreloader*, Document&);
</ins><span class="cx">
</span><span class="cx"> private:
</span><span class="cx"> TokenPreloadScanner m_scanner;
</span><span class="cx"> SegmentedString m_source;
</span><del>- HTMLTokenizer m_tokenizer;
</del><ins>+ HTMLToken m_token;
+ std::unique_ptr<HTMLTokenizer> m_tokenizer;
</ins><span class="cx"> };
</span><span class="cx">
</span><span class="cx"> }
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLResourcePreloadercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLResourcePreloader.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLResourcePreloader.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLResourcePreloader.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -35,6 +35,15 @@
</span><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><ins>+bool PreloadRequest::isSafeToSendToAnotherThread() const
+{
+ return m_initiator.isSafeToSendToAnotherThread()
+ && m_charset.isSafeToSendToAnotherThread()
+ && m_resourceURL.isSafeToSendToAnotherThread()
+ && m_mediaAttribute.isSafeToSendToAnotherThread()
+ && m_baseURL.isSafeToSendToAnotherThread();
+}
+
</ins><span class="cx"> URL PreloadRequest::completeURL(Document& document)
</span><span class="cx"> {
</span><span class="cx"> return document.completeURL(m_resourceURL, m_baseURL.isEmpty() ? document.url() : m_baseURL);
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLResourcePreloaderh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLResourcePreloader.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLResourcePreloader.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLResourcePreloader.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -35,14 +35,16 @@
</span><span class="cx"> public:
</span><span class="cx"> PreloadRequest(const String& initiator, const String& resourceURL, const URL& baseURL, CachedResource::Type resourceType, const String& mediaAttribute)
</span><span class="cx"> : m_initiator(initiator)
</span><del>- , m_resourceURL(resourceURL)
</del><ins>+ , m_resourceURL(resourceURL.isolatedCopy())
</ins><span class="cx"> , m_baseURL(baseURL.copy())
</span><span class="cx"> , m_resourceType(resourceType)
</span><del>- , m_mediaAttribute(mediaAttribute)
</del><ins>+ , m_mediaAttribute(mediaAttribute.isolatedCopy())
</ins><span class="cx"> , m_crossOriginModeAllowsCookies(false)
</span><span class="cx"> {
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+ bool isSafeToSendToAnotherThread() const;
+
</ins><span class="cx"> CachedResourceRequest resourceRequest(Document&);
</span><span class="cx">
</span><span class="cx"> const String& charset() const { return m_charset; }
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLSourceTrackercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLSourceTracker.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLSourceTracker.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLSourceTracker.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,6 +1,5 @@
</span><span class="cx"> /*
</span><span class="cx"> * Copyright (C) 2010 Adam Barth. All Rights Reserved.
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><span class="cx"> *
</span><span class="cx"> * Redistribution and use in source and binary forms, with or without
</span><span class="cx"> * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -26,7 +25,6 @@
</span><span class="cx">
</span><span class="cx"> #include "config.h"
</span><span class="cx"> #include "HTMLSourceTracker.h"
</span><del>-
</del><span class="cx"> #include "HTMLTokenizer.h"
</span><span class="cx"> #include <wtf/text/StringBuilder.h>
</span><span class="cx">
</span><span class="lines">@@ -36,41 +34,36 @@
</span><span class="cx"> {
</span><span class="cx"> }
</span><span class="cx">
</span><del>-void HTMLSourceTracker::startToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
</del><ins>+void HTMLSourceTracker::start(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
</ins><span class="cx"> {
</span><del>- if (!m_started) {
- if (tokenizer.numberOfBufferedCharacters())
- m_previousSource = tokenizer.bufferedCharacters();
- else
- m_previousSource.clear();
- m_started = true;
</del><ins>+ if (token.type() == HTMLToken::Uninitialized) {
+ m_previousSource.clear();
+ if (tokenizer->numberOfBufferedCharacters())
+ m_previousSource = tokenizer->bufferedCharacters();
</ins><span class="cx"> } else
</span><span class="cx"> m_previousSource.append(m_currentSource);
</span><span class="cx">
</span><span class="cx"> m_currentSource = currentInput;
</span><del>- m_tokenStart = m_currentSource.numberOfCharactersConsumed() - m_previousSource.length();
</del><ins>+ token.setBaseOffset(m_currentSource.numberOfCharactersConsumed() - m_previousSource.length());
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-void HTMLSourceTracker::endToken(SegmentedString& currentInput, HTMLTokenizer& tokenizer)
</del><ins>+void HTMLSourceTracker::end(SegmentedString& currentInput, HTMLTokenizer* tokenizer, HTMLToken& token)
</ins><span class="cx"> {
</span><del>- ASSERT(m_started);
- m_started = false;
-
- m_tokenEnd = currentInput.numberOfCharactersConsumed() - tokenizer.numberOfBufferedCharacters();
</del><span class="cx"> m_cachedSourceForToken = String();
</span><ins>+
+ // FIXME: This work should really be done by the HTMLTokenizer.
+ token.setEndOffset(currentInput.numberOfCharactersConsumed() - tokenizer->numberOfBufferedCharacters());
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-String HTMLSourceTracker::source(const HTMLToken& token)
</del><ins>+String HTMLSourceTracker::sourceForToken(const HTMLToken& token)
</ins><span class="cx"> {
</span><del>- ASSERT(!m_started);
-
</del><span class="cx"> if (token.type() == HTMLToken::EndOfFile)
</span><span class="cx"> return String(); // Hides the null character we use to mark the end of file.
</span><span class="cx">
</span><span class="cx"> if (!m_cachedSourceForToken.isEmpty())
</span><span class="cx"> return m_cachedSourceForToken;
</span><span class="cx">
</span><del>- unsigned length = m_tokenEnd - m_tokenStart;
</del><ins>+ unsigned length = token.length();
</ins><span class="cx">
</span><span class="cx"> StringBuilder source;
</span><span class="cx"> source.reserveCapacity(length);
</span><span class="lines">@@ -90,9 +83,4 @@
</span><span class="cx"> return m_cachedSourceForToken;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-String HTMLSourceTracker::source(const HTMLToken& token, unsigned attributeStart, unsigned attributeEnd)
-{
- return source(token).substring(attributeStart - m_tokenStart, attributeEnd - attributeStart);
</del><span class="cx"> }
</span><del>-
-}
</del></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLSourceTrackerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLSourceTracker.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLSourceTracker.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLSourceTracker.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,6 +1,5 @@
</span><span class="cx"> /*
</span><span class="cx"> * Copyright (C) 2010 Adam Barth. All Rights Reserved.
</span><del>- * Copyright (C) 2015 Apple Inc. All rights reserved.
</del><span class="cx"> *
</span><span class="cx"> * Redistribution and use in source and binary forms, with or without
</span><span class="cx"> * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -27,11 +26,11 @@
</span><span class="cx"> #ifndef HTMLSourceTracker_h
</span><span class="cx"> #define HTMLSourceTracker_h
</span><span class="cx">
</span><ins>+#include "HTMLToken.h"
</ins><span class="cx"> #include "SegmentedString.h"
</span><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><del>-class HTMLToken;
</del><span class="cx"> class HTMLTokenizer;
</span><span class="cx">
</span><span class="cx"> class HTMLSourceTracker {
</span><span class="lines">@@ -39,18 +38,15 @@
</span><span class="cx"> public:
</span><span class="cx"> HTMLSourceTracker();
</span><span class="cx">
</span><del>- void startToken(SegmentedString&, HTMLTokenizer&);
- void endToken(SegmentedString&, HTMLTokenizer&);
</del><ins>+ // FIXME: Once we move "end" into HTMLTokenizer, rename "start" to
+ // something that makes it obvious that this method can be called multiple
+ // times.
+ void start(SegmentedString&, HTMLTokenizer*, HTMLToken&);
+ void end(SegmentedString&, HTMLTokenizer*, HTMLToken&);
</ins><span class="cx">
</span><del>- String source(const HTMLToken&);
- String source(const HTMLToken&, unsigned attributeStart, unsigned attributeEnd);
</del><ins>+ String sourceForToken(const HTMLToken&);
</ins><span class="cx">
</span><span class="cx"> private:
</span><del>- bool m_started { false };
-
- unsigned m_tokenStart;
- unsigned m_tokenEnd;
-
</del><span class="cx"> SegmentedString m_previousSource;
</span><span class="cx"> SegmentedString m_currentSource;
</span><span class="cx">
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLTokenh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLToken.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLToken.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLToken.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -53,12 +53,15 @@
</span><span class="cx"> };
</span><span class="cx">
</span><span class="cx"> struct Attribute {
</span><ins>+ struct Range {
+ unsigned start;
+ unsigned end;
+ };
+
+ Range nameRange;
+ Range valueRange;
</ins><span class="cx"> Vector<UChar, 32> name;
</span><span class="cx"> Vector<UChar, 32> value;
</span><del>-
- // Used by HTMLSourceTracker.
- unsigned startOffset;
- unsigned endOffset;
</del><span class="cx"> };
</span><span class="cx">
</span><span class="cx"> typedef Vector<Attribute, 10> AttributeList;
</span><span class="lines">@@ -70,6 +73,11 @@
</span><span class="cx">
</span><span class="cx"> Type type() const;
</span><span class="cx">
</span><ins>+ // Used by HTMLSourceTracker.
+ void setBaseOffset(unsigned); // Base for attribute offsets, and the end of token offset.
+ void setEndOffset(unsigned);
+ unsigned length() const;
+
</ins><span class="cx"> // EndOfFile
</span><span class="cx">
</span><span class="cx"> void makeEndOfFile();
</span><span class="lines">@@ -105,10 +113,15 @@
</span><span class="cx"> void beginEndTag(LChar);
</span><span class="cx"> void beginEndTag(const Vector<LChar, 32>&);
</span><span class="cx">
</span><del>- void beginAttribute(unsigned offset);
</del><ins>+ void addNewAttribute();
+
+ void beginAttributeName(unsigned offset);
</ins><span class="cx"> void appendToAttributeName(UChar);
</span><ins>+ void endAttributeName(unsigned offset);
+
+ void beginAttributeValue(unsigned offset);
</ins><span class="cx"> void appendToAttributeValue(UChar);
</span><del>- void endAttribute(unsigned offset);
</del><ins>+ void endAttributeValue(unsigned offset);
</ins><span class="cx">
</span><span class="cx"> void setSelfClosing();
</span><span class="cx">
</span><span class="lines">@@ -141,6 +154,9 @@
</span><span class="cx"> private:
</span><span class="cx"> Type m_type;
</span><span class="cx">
</span><ins>+ unsigned m_baseOffset;
+ unsigned m_length;
+
</ins><span class="cx"> DataVector m_data;
</span><span class="cx"> UChar m_data8BitCheck;
</span><span class="cx">
</span><span class="lines">@@ -156,9 +172,8 @@
</span><span class="cx"> const HTMLToken::Attribute* findAttribute(const Vector<HTMLToken::Attribute>&, StringView name);
</span><span class="cx">
</span><span class="cx"> inline HTMLToken::HTMLToken()
</span><del>- : m_type(Uninitialized)
- , m_data8BitCheck(0)
</del><span class="cx"> {
</span><ins>+ clear();
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> inline void HTMLToken::clear()
</span><span class="lines">@@ -166,6 +181,9 @@
</span><span class="cx"> m_type = Uninitialized;
</span><span class="cx"> m_data.clear();
</span><span class="cx"> m_data8BitCheck = 0;
</span><ins>+
+ m_length = 0;
+ m_baseOffset = 0;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> inline HTMLToken::Type HTMLToken::type() const
</span><span class="lines">@@ -179,6 +197,21 @@
</span><span class="cx"> m_type = EndOfFile;
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+inline unsigned HTMLToken::length() const
+{
+ return m_length;
+}
+
+inline void HTMLToken::setBaseOffset(unsigned offset)
+{
+ m_baseOffset = offset;
+}
+
+inline void HTMLToken::setEndOffset(unsigned endOffset)
+{
+ m_length = endOffset - m_baseOffset;
+}
+
</ins><span class="cx"> inline const HTMLToken::DataVector& HTMLToken::name() const
</span><span class="cx"> {
</span><span class="cx"> ASSERT(m_type == StartTag || m_type == EndTag || m_type == DOCTYPE);
</span><span class="lines">@@ -267,12 +300,9 @@
</span><span class="cx"> ASSERT(m_type == Uninitialized);
</span><span class="cx"> m_type = StartTag;
</span><span class="cx"> m_selfClosing = false;
</span><ins>+ m_currentAttribute = nullptr;
</ins><span class="cx"> m_attributes.clear();
</span><span class="cx">
</span><del>-#if !ASSERT_DISABLED
- m_currentAttribute = nullptr;
-#endif
-
</del><span class="cx"> m_data.append(character);
</span><span class="cx"> m_data8BitCheck = character;
</span><span class="cx"> }
</span><span class="lines">@@ -282,12 +312,9 @@
</span><span class="cx"> ASSERT(m_type == Uninitialized);
</span><span class="cx"> m_type = EndTag;
</span><span class="cx"> m_selfClosing = false;
</span><ins>+ m_currentAttribute = nullptr;
</ins><span class="cx"> m_attributes.clear();
</span><span class="cx">
</span><del>-#if !ASSERT_DISABLED
- m_currentAttribute = nullptr;
-#endif
-
</del><span class="cx"> m_data.append(character);
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -296,41 +323,64 @@
</span><span class="cx"> ASSERT(m_type == Uninitialized);
</span><span class="cx"> m_type = EndTag;
</span><span class="cx"> m_selfClosing = false;
</span><ins>+ m_currentAttribute = nullptr;
</ins><span class="cx"> m_attributes.clear();
</span><span class="cx">
</span><ins>+ m_data.appendVector(characters);
+}
+
+inline void HTMLToken::addNewAttribute()
+{
+ ASSERT(m_type == StartTag || m_type == EndTag);
+ m_attributes.grow(m_attributes.size() + 1);
+ m_currentAttribute = &m_attributes.last();
+
</ins><span class="cx"> #if !ASSERT_DISABLED
</span><del>- m_currentAttribute = nullptr;
</del><ins>+ m_currentAttribute->nameRange.start = 0;
+ m_currentAttribute->nameRange.end = 0;
+ m_currentAttribute->valueRange.start = 0;
+ m_currentAttribute->valueRange.end = 0;
</ins><span class="cx"> #endif
</span><ins>+}
</ins><span class="cx">
</span><del>- m_data.appendVector(characters);
</del><ins>+inline void HTMLToken::beginAttributeName(unsigned offset)
+{
+ ASSERT(offset);
+ ASSERT(!m_currentAttribute->nameRange.start);
+ m_currentAttribute->nameRange.start = offset - m_baseOffset;
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-inline void HTMLToken::beginAttribute(unsigned offset)
</del><ins>+inline void HTMLToken::endAttributeName(unsigned offset)
</ins><span class="cx"> {
</span><del>- ASSERT(m_type == StartTag || m_type == EndTag);
</del><span class="cx"> ASSERT(offset);
</span><ins>+ ASSERT(m_currentAttribute->nameRange.start);
+ ASSERT(!m_currentAttribute->nameRange.end);
</ins><span class="cx">
</span><del>- m_attributes.grow(m_attributes.size() + 1);
- m_currentAttribute = &m_attributes.last();
</del><ins>+ unsigned adjustedOffset = offset - m_baseOffset;
+ m_currentAttribute->nameRange.end = adjustedOffset;
</ins><span class="cx">
</span><del>- m_currentAttribute->startOffset = offset;
</del><ins>+ // FIXME: Is this intentional? Why point the value at the end of the name?
+ m_currentAttribute->valueRange.start = adjustedOffset;
+ m_currentAttribute->valueRange.end = adjustedOffset;
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-inline void HTMLToken::endAttribute(unsigned offset)
</del><ins>+inline void HTMLToken::beginAttributeValue(unsigned offset)
</ins><span class="cx"> {
</span><span class="cx"> ASSERT(offset);
</span><del>- ASSERT(m_currentAttribute);
- m_currentAttribute->endOffset = offset;
-#if !ASSERT_DISABLED
- m_currentAttribute = nullptr;
-#endif
</del><ins>+ m_currentAttribute->valueRange.start = offset - m_baseOffset;
</ins><span class="cx"> }
</span><span class="cx">
</span><ins>+inline void HTMLToken::endAttributeValue(unsigned offset)
+{
+ ASSERT(offset);
+ m_currentAttribute->valueRange.end = offset - m_baseOffset;
+}
+
</ins><span class="cx"> inline void HTMLToken::appendToAttributeName(UChar character)
</span><span class="cx"> {
</span><span class="cx"> ASSERT(character);
</span><span class="cx"> ASSERT(m_type == StartTag || m_type == EndTag);
</span><del>- ASSERT(m_currentAttribute);
</del><ins>+ ASSERT(m_currentAttribute->nameRange.start);
</ins><span class="cx"> m_currentAttribute->name.append(character);
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -338,7 +388,7 @@
</span><span class="cx"> {
</span><span class="cx"> ASSERT(character);
</span><span class="cx"> ASSERT(m_type == StartTag || m_type == EndTag);
</span><del>- ASSERT(m_currentAttribute);
</del><ins>+ ASSERT(m_currentAttribute->valueRange.start);
</ins><span class="cx"> m_currentAttribute->value.append(character);
</span><span class="cx"> }
</span><span class="cx">
</span></span></pre></div>
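<p>The attribute range bookkeeping restored in the HTMLToken.h hunk above can be sketched in isolation. This is a minimal illustration, not WebKit code: <code>TokenSketch</code>, <code>Range</code>, <code>Attribute</code>, and <code>tokenizeExample</code> are hypothetical stand-ins, and the offsets passed in are assumed values chosen for the example. It shows only what the diff itself shows: each offset is stored relative to a base offset, and ending the name also initialises an empty value range at that position.</p>

```cpp
#include <cassert>

// Hypothetical stand-ins for the types used in the diff; not WebKit's classes.
struct Range {
    unsigned start = 0;
    unsigned end = 0;
};

struct Attribute {
    Range nameRange;
    Range valueRange;
};

class TokenSketch {
public:
    explicit TokenSketch(unsigned baseOffset)
        : m_baseOffset(baseOffset)
    {
    }

    void beginAttributeName(unsigned offset)
    {
        assert(offset);
        assert(!m_attribute.nameRange.start);
        m_attribute.nameRange.start = offset - m_baseOffset;
    }

    void endAttributeName(unsigned offset)
    {
        assert(offset);
        assert(m_attribute.nameRange.start);
        assert(!m_attribute.nameRange.end);

        unsigned adjustedOffset = offset - m_baseOffset;
        m_attribute.nameRange.end = adjustedOffset;

        // As in the diff: the value range starts out empty, positioned at the
        // end of the name, until beginAttributeValue() moves it.
        m_attribute.valueRange.start = adjustedOffset;
        m_attribute.valueRange.end = adjustedOffset;
    }

    void beginAttributeValue(unsigned offset)
    {
        assert(offset);
        m_attribute.valueRange.start = offset - m_baseOffset;
    }

    void endAttributeValue(unsigned offset)
    {
        assert(offset);
        m_attribute.valueRange.end = offset - m_baseOffset;
    }

    const Attribute& attribute() const { return m_attribute; }

private:
    unsigned m_baseOffset;
    Attribute m_attribute;
};

// Assumed scenario: a token whose '<' sits at absolute offset 10, with an
// attribute name spanning [13, 17) and a value spanning [19, 20).
inline Attribute tokenizeExample()
{
    TokenSketch token(10);
    token.beginAttributeName(13);
    token.endAttributeName(17);
    token.beginAttributeValue(19);
    token.endAttributeValue(20);
    return token.attribute();
}
```

<p>In this sketch a stored start of zero doubles as "not set yet", which is why the assertions test <code>nameRange.start</code> for truthiness. That only works if a name never begins exactly at the base offset, which is an assumption of the sketch rather than something the diff states.</p>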
<a id="trunkSourceWebCorehtmlparserHTMLTokenizercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLTokenizer.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLTokenizer.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLTokenizer.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
</del><ins>+ * Copyright (C) 2008 Apple Inc. All Rights Reserved.
</ins><span class="cx"> * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
</span><span class="cx"> * Copyright (C) 2010 Google, Inc. All Rights Reserved.
</span><span class="cx"> *
</span><span class="lines">@@ -29,9 +29,12 @@
</span><span class="cx"> #include "HTMLTokenizer.h"
</span><span class="cx">
</span><span class="cx"> #include "HTMLEntityParser.h"
</span><del>-#include "HTMLNames.h"
</del><ins>+#include "HTMLTreeBuilder.h"
</ins><span class="cx"> #include "MarkupTokenizerInlines.h"
</span><ins>+#include "NotImplemented.h"
</ins><span class="cx"> #include <wtf/ASCIICType.h>
</span><ins>+#include <wtf/CurrentTime.h>
+#include <wtf/text/CString.h>
</ins><span class="cx">
</span><span class="cx"> using namespace WTF;
</span><span class="cx">
</span><span class="lines">@@ -39,97 +42,66 @@
</span><span class="cx">
</span><span class="cx"> using namespace HTMLNames;
</span><span class="cx">
</span><del>-static inline LChar convertASCIIAlphaToLower(UChar character)
</del><ins>+static inline UChar toLowerCase(UChar cc)
</ins><span class="cx"> {
</span><del>- ASSERT(isASCIIAlpha(character));
- return toASCIILowerUnchecked(character);
</del><ins>+ ASSERT(isASCIIUpper(cc));
+ const int lowerCaseOffset = 0x20;
+ return cc + lowerCaseOffset;
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const char* string)
</del><ins>+static inline bool vectorEqualsString(const Vector<LChar, 32>& vector, const String& string)
</ins><span class="cx"> {
</span><del>- unsigned size = vector.size();
- for (unsigned i = 0; i < size; ++i) {
- if (!string[i] || vector[i] != string[i])
- return false;
- }
- return !string[size];
</del><ins>+ if (vector.size() != string.length())
+ return false;
+
+ if (!string.length())
+ return true;
+
+ return equal(string.impl(), vector.data(), vector.size());
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-inline bool HTMLTokenizer::inEndTagBufferingState() const
</del><ins>+static inline bool isEndTagBufferingState(HTMLTokenizer::State state)
</ins><span class="cx"> {
</span><del>- switch (m_state) {
- case RCDATAEndTagOpenState:
- case RCDATAEndTagNameState:
- case RAWTEXTEndTagOpenState:
- case RAWTEXTEndTagNameState:
- case ScriptDataEndTagOpenState:
- case ScriptDataEndTagNameState:
- case ScriptDataEscapedEndTagOpenState:
- case ScriptDataEscapedEndTagNameState:
</del><ins>+ switch (state) {
+ case HTMLTokenizer::RCDATAEndTagOpenState:
+ case HTMLTokenizer::RCDATAEndTagNameState:
+ case HTMLTokenizer::RAWTEXTEndTagOpenState:
+ case HTMLTokenizer::RAWTEXTEndTagNameState:
+ case HTMLTokenizer::ScriptDataEndTagOpenState:
+ case HTMLTokenizer::ScriptDataEndTagNameState:
+ case HTMLTokenizer::ScriptDataEscapedEndTagOpenState:
+ case HTMLTokenizer::ScriptDataEscapedEndTagNameState:
</ins><span class="cx"> return true;
</span><span class="cx"> default:
</span><span class="cx"> return false;
</span><span class="cx"> }
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+#define HTML_BEGIN_STATE(stateName) BEGIN_STATE(HTMLTokenizer, stateName)
+#define HTML_RECONSUME_IN(stateName) RECONSUME_IN(HTMLTokenizer, stateName)
+#define HTML_ADVANCE_TO(stateName) ADVANCE_TO(HTMLTokenizer, stateName)
+#define HTML_SWITCH_TO(stateName) SWITCH_TO(HTMLTokenizer, stateName)
+
</ins><span class="cx"> HTMLTokenizer::HTMLTokenizer(const HTMLParserOptions& options)
</span><del>- : m_preprocessor(*this)
</del><ins>+ : m_inputStreamPreprocessor(this)
</ins><span class="cx"> , m_options(options)
</span><span class="cx"> {
</span><ins>+ reset();
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-inline void HTMLTokenizer::bufferASCIICharacter(UChar character)
</del><ins>+HTMLTokenizer::~HTMLTokenizer()
</ins><span class="cx"> {
</span><del>- ASSERT(character != kEndOfFileMarker);
- ASSERT(isASCII(character));
- LChar narrowedCharacter = character;
- m_token.appendToCharacter(narrowedCharacter);
</del><span class="cx"> }
</span><span class="cx">
</span><del>-inline void HTMLTokenizer::bufferCharacter(UChar character)
</del><ins>+void HTMLTokenizer::reset()
</ins><span class="cx"> {
</span><del>- ASSERT(character != kEndOfFileMarker);
- m_token.appendToCharacter(character);
</del><ins>+ m_state = HTMLTokenizer::DataState;
+ m_token = 0;
+ m_forceNullCharacterReplacement = false;
+ m_shouldAllowCDATA = false;
+ m_additionalAllowedCharacter = '\0';
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-inline bool HTMLTokenizer::emitAndResumeInDataState(SegmentedString& source)
-{
- saveEndTagNameIfNeeded();
- m_state = DataState;
- source.advanceAndUpdateLineNumber();
- return true;
-}
-
-inline bool HTMLTokenizer::emitAndReconsumeInDataState()
-{
- saveEndTagNameIfNeeded();
- m_state = DataState;
- return true;
-}
-
-inline bool HTMLTokenizer::emitEndOfFile(SegmentedString& source)
-{
- m_state = DataState;
- if (haveBufferedCharacterToken())
- return true;
- source.advance();
- m_token.clear();
- m_token.makeEndOfFile();
- return true;
-}
-
-inline void HTMLTokenizer::saveEndTagNameIfNeeded()
-{
- ASSERT(m_token.type() != HTMLToken::Uninitialized);
- if (m_token.type() == HTMLToken::StartTag)
- m_appropriateEndTagName = m_token.name();
-}
-
-inline bool HTMLTokenizer::haveBufferedCharacterToken() const
-{
- return m_token.type() == HTMLToken::Character;
-}
-
</del><span class="cx"> inline bool HTMLTokenizer::processEntity(SegmentedString& source)
</span><span class="cx"> {
</span><span class="cx"> bool notEnoughCharacters = false;
</span><span class="lines">@@ -147,1246 +119,1426 @@
</span><span class="cx"> return true;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-void HTMLTokenizer::flushBufferedEndTag()
</del><ins>+bool HTMLTokenizer::flushBufferedEndTag(SegmentedString& source)
</ins><span class="cx"> {
</span><del>- m_token.beginEndTag(m_bufferedEndTagName);
</del><ins>+ ASSERT(m_token->type() == HTMLToken::Character || m_token->type() == HTMLToken::Uninitialized);
+ source.advanceAndUpdateLineNumber();
+ if (m_token->type() == HTMLToken::Character)
+ return true;
+ m_token->beginEndTag(m_bufferedEndTagName);
</ins><span class="cx"> m_bufferedEndTagName.clear();
</span><span class="cx"> m_appropriateEndTagName.clear();
</span><span class="cx"> m_temporaryBuffer.clear();
</span><ins>+ return false;
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-bool HTMLTokenizer::commitToPartialEndTag(SegmentedString& source, UChar character, State state)
-{
- ASSERT(source.currentChar() == character);
- appendToTemporaryBuffer(character);
- source.advanceAndUpdateLineNumber();
</del><ins>+#define FLUSH_AND_ADVANCE_TO(stateName) \
+ do { \
+ m_state = HTMLTokenizer::stateName; \
+ if (flushBufferedEndTag(source)) \
+ return true; \
+ if (source.isEmpty() \
+ || !m_inputStreamPreprocessor.peek(source)) \
+ return haveBufferedCharacterToken(); \
+ cc = m_inputStreamPreprocessor.nextInputCharacter(); \
+ goto stateName; \
+ } while (false)
</ins><span class="cx">
</span><del>- if (haveBufferedCharacterToken()) {
- // Emit the buffered character token.
- // The next call to processToken will flush the buffered end tag and continue parsing it.
- m_state = state;
- return true;
- }
-
- flushBufferedEndTag();
- return false;
-}
-
-bool HTMLTokenizer::commitToCompleteEndTag(SegmentedString& source)
</del><ins>+bool HTMLTokenizer::flushEmitAndResumeIn(SegmentedString& source, HTMLTokenizer::State state)
</ins><span class="cx"> {
</span><del>- ASSERT(source.currentChar() == '>');
- appendToTemporaryBuffer('>');
- source.advance();
-
- m_state = DataState;
-
- if (haveBufferedCharacterToken()) {
- // Emit the character token we already have.
- // The next call to processToken will flush the buffered end tag and emit it.
- return true;
- }
-
- flushBufferedEndTag();
</del><ins>+ m_state = state;
+ flushBufferedEndTag(source);
</ins><span class="cx"> return true;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-bool HTMLTokenizer::processToken(SegmentedString& source)
</del><ins>+bool HTMLTokenizer::nextToken(SegmentedString& source, HTMLToken& token)
</ins><span class="cx"> {
</span><del>- if (!m_bufferedEndTagName.isEmpty() && !inEndTagBufferingState()) {
- // We are back here after emitting a character token that came just before an end tag.
- // To continue parsing the end tag we need to move the buffered tag name into the token.
- flushBufferedEndTag();
</del><ins>+ // If we have a token in progress, then we're supposed to be called back
+ // with the same token so we can finish it.
+ ASSERT(!m_token || m_token == &token || token.type() == HTMLToken::Uninitialized);
+ m_token = &token;
</ins><span class="cx">
</span><del>- // If we are in the data state, the end tag is already complete and we should emit it
- // now, otherwise, we want to resume parsing the partial end tag.
- if (m_state == DataState)
</del><ins>+ if (!m_bufferedEndTagName.isEmpty() && !isEndTagBufferingState(m_state)) {
+ // FIXME: This should call flushBufferedEndTag().
+ // We started an end tag during our last iteration.
+ m_token->beginEndTag(m_bufferedEndTagName);
+ m_bufferedEndTagName.clear();
+ m_appropriateEndTagName.clear();
+ m_temporaryBuffer.clear();
+ if (m_state == HTMLTokenizer::DataState) {
+ // We're back in the data state, so we must be done with the tag.
</ins><span class="cx"> return true;
</span><ins>+ }
</ins><span class="cx"> }
</span><span class="cx">
</span><del>- if (!m_preprocessor.peek(source, isNullCharacterSkippingState(m_state)))
</del><ins>+ if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source))
</ins><span class="cx"> return haveBufferedCharacterToken();
</span><del>- UChar character = m_preprocessor.nextInputCharacter();
</del><ins>+ UChar cc = m_inputStreamPreprocessor.nextInputCharacter();
</ins><span class="cx">
</span><del>- // https://html.spec.whatwg.org/#tokenization
</del><ins>+ // Source: http://www.whatwg.org/specs/web-apps/current-work/#tokenisation0
</ins><span class="cx"> switch (m_state) {
</span><del>-
- BEGIN_STATE(DataState)
- if (character == '&')
- ADVANCE_TO(CharacterReferenceInDataState);
- if (character == '<') {
- if (haveBufferedCharacterToken())
- RETURN_IN_CURRENT_STATE(true);
- ADVANCE_TO(TagOpenState);
</del><ins>+ HTML_BEGIN_STATE(DataState) {
+ if (cc == '&')
+ HTML_ADVANCE_TO(CharacterReferenceInDataState);
+ else if (cc == '<') {
+ if (m_token->type() == HTMLToken::Character) {
+ // We have a bunch of character tokens queued up that we
+ // are emitting lazily here.
+ return true;
+ }
+ HTML_ADVANCE_TO(TagOpenState);
+ } else if (cc == kEndOfFileMarker)
+ return emitEndOfFile(source);
+ else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(DataState);
</ins><span class="cx"> }
</span><del>- if (character == kEndOfFileMarker)
- return emitEndOfFile(source);
- bufferCharacter(character);
- ADVANCE_TO(DataState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CharacterReferenceInDataState)
</del><ins>+ HTML_BEGIN_STATE(CharacterReferenceInDataState) {
</ins><span class="cx"> if (!processEntity(source))
</span><del>- RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
- SWITCH_TO(DataState);
</del><ins>+ return haveBufferedCharacterToken();
+ HTML_SWITCH_TO(DataState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(RCDATAState)
- if (character == '&')
- ADVANCE_TO(CharacterReferenceInRCDATAState);
- if (character == '<')
- ADVANCE_TO(RCDATALessThanSignState);
- if (character == kEndOfFileMarker)
- RECONSUME_IN(DataState);
- bufferCharacter(character);
- ADVANCE_TO(RCDATAState);
</del><ins>+ HTML_BEGIN_STATE(RCDATAState) {
+ if (cc == '&')
+ HTML_ADVANCE_TO(CharacterReferenceInRCDATAState);
+ else if (cc == '<')
+ HTML_ADVANCE_TO(RCDATALessThanSignState);
+ else if (cc == kEndOfFileMarker)
+ return emitEndOfFile(source);
+ else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(RCDATAState);
+ }
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CharacterReferenceInRCDATAState)
</del><ins>+ HTML_BEGIN_STATE(CharacterReferenceInRCDATAState) {
</ins><span class="cx"> if (!processEntity(source))
</span><del>- RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
- SWITCH_TO(RCDATAState);
</del><ins>+ return haveBufferedCharacterToken();
+ HTML_SWITCH_TO(RCDATAState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(RAWTEXTState)
- if (character == '<')
- ADVANCE_TO(RAWTEXTLessThanSignState);
- if (character == kEndOfFileMarker)
- RECONSUME_IN(DataState);
- bufferCharacter(character);
- ADVANCE_TO(RAWTEXTState);
</del><ins>+ HTML_BEGIN_STATE(RAWTEXTState) {
+ if (cc == '<')
+ HTML_ADVANCE_TO(RAWTEXTLessThanSignState);
+ else if (cc == kEndOfFileMarker)
+ return emitEndOfFile(source);
+ else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(RAWTEXTState);
+ }
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataState)
- if (character == '<')
- ADVANCE_TO(ScriptDataLessThanSignState);
- if (character == kEndOfFileMarker)
- RECONSUME_IN(DataState);
- bufferCharacter(character);
- ADVANCE_TO(ScriptDataState);
</del><ins>+ HTML_BEGIN_STATE(ScriptDataState) {
+ if (cc == '<')
+ HTML_ADVANCE_TO(ScriptDataLessThanSignState);
+ else if (cc == kEndOfFileMarker)
+ return emitEndOfFile(source);
+ else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(ScriptDataState);
+ }
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(PLAINTEXTState)
- if (character == kEndOfFileMarker)
- RECONSUME_IN(DataState);
- bufferCharacter(character);
- ADVANCE_TO(PLAINTEXTState);
</del><ins>+ HTML_BEGIN_STATE(PLAINTEXTState) {
+ if (cc == kEndOfFileMarker)
+ return emitEndOfFile(source);
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(PLAINTEXTState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(TagOpenState)
- if (character == '!')
- ADVANCE_TO(MarkupDeclarationOpenState);
- if (character == '/')
- ADVANCE_TO(EndTagOpenState);
- if (isASCIIAlpha(character)) {
- m_token.beginStartTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(TagNameState);
- }
- if (character == '?') {
</del><ins>+ HTML_BEGIN_STATE(TagOpenState) {
+ if (cc == '!')
+ HTML_ADVANCE_TO(MarkupDeclarationOpenState);
+ else if (cc == '/')
+ HTML_ADVANCE_TO(EndTagOpenState);
+ else if (isASCIIUpper(cc)) {
+ m_token->beginStartTag(toLowerCase(cc));
+ HTML_ADVANCE_TO(TagNameState);
+ } else if (isASCIILower(cc)) {
+ m_token->beginStartTag(cc);
+ HTML_ADVANCE_TO(TagNameState);
+ } else if (cc == '?') {
</ins><span class="cx"> parseError();
</span><span class="cx"> // The spec consumes the current character before switching
</span><span class="cx"> // to the bogus comment state, but it's easier to implement
</span><span class="cx"> // if we reconsume the current character.
</span><del>- RECONSUME_IN(BogusCommentState);
</del><ins>+ HTML_RECONSUME_IN(BogusCommentState);
+ } else {
+ parseError();
+ bufferASCIICharacter('<');
+ HTML_RECONSUME_IN(DataState);
</ins><span class="cx"> }
</span><del>- parseError();
- bufferASCIICharacter('<');
- RECONSUME_IN(DataState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(EndTagOpenState)
- if (isASCIIAlpha(character)) {
- m_token.beginEndTag(convertASCIIAlphaToLower(character));
</del><ins>+ HTML_BEGIN_STATE(EndTagOpenState) {
+ if (isASCIIUpper(cc)) {
+ m_token->beginEndTag(static_cast<LChar>(toLowerCase(cc)));
</ins><span class="cx"> m_appropriateEndTagName.clear();
</span><del>- ADVANCE_TO(TagNameState);
- }
- if (character == '>') {
</del><ins>+ HTML_ADVANCE_TO(TagNameState);
+ } else if (isASCIILower(cc)) {
+ m_token->beginEndTag(static_cast<LChar>(cc));
+ m_appropriateEndTagName.clear();
+ HTML_ADVANCE_TO(TagNameState);
+ } else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- ADVANCE_TO(DataState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><span class="cx"> bufferASCIICharacter('<');
</span><span class="cx"> bufferASCIICharacter('/');
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ parseError();
+ HTML_RECONSUME_IN(BogusCommentState);
</ins><span class="cx"> }
</span><del>- parseError();
- RECONSUME_IN(BogusCommentState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(TagNameState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeAttributeNameState);
- if (character == '/')
- ADVANCE_TO(SelfClosingStartTagState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (m_options.usePreHTML5ParserQuirks && character == '<')
- return emitAndReconsumeInDataState();
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(TagNameState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeAttributeNameState);
+ else if (cc == '/')
+ HTML_ADVANCE_TO(SelfClosingStartTagState);
+ else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (m_options.usePreHTML5ParserQuirks && cc == '<')
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ else if (isASCIIUpper(cc)) {
+ m_token->appendToName(toLowerCase(cc));
+ HTML_ADVANCE_TO(TagNameState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ m_token->appendToName(cc);
+ HTML_ADVANCE_TO(TagNameState);
</ins><span class="cx"> }
</span><del>- m_token.appendToName(toASCIILower(character));
- ADVANCE_TO(TagNameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(RCDATALessThanSignState)
- if (character == '/') {
</del><ins>+ HTML_BEGIN_STATE(RCDATALessThanSignState) {
+ if (cc == '/') {
</ins><span class="cx"> m_temporaryBuffer.clear();
</span><span class="cx"> ASSERT(m_bufferedEndTagName.isEmpty());
</span><del>- ADVANCE_TO(RCDATAEndTagOpenState);
</del><ins>+ HTML_ADVANCE_TO(RCDATAEndTagOpenState);
+ } else {
+ bufferASCIICharacter('<');
+ HTML_RECONSUME_IN(RCDATAState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- RECONSUME_IN(RCDATAState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(RCDATAEndTagOpenState)
- if (isASCIIAlpha(character)) {
- appendToTemporaryBuffer(character);
- appendToPossibleEndTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(RCDATAEndTagNameState);
</del><ins>+ HTML_BEGIN_STATE(RCDATAEndTagOpenState) {
+ if (isASCIIUpper(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
+ HTML_ADVANCE_TO(RCDATAEndTagNameState);
+ } else if (isASCIILower(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(RCDATAEndTagNameState);
+ } else {
+ bufferASCIICharacter('<');
+ bufferASCIICharacter('/');
+ HTML_RECONSUME_IN(RCDATAState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- bufferASCIICharacter('/');
- RECONSUME_IN(RCDATAState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(RCDATAEndTagNameState)
- if (isASCIIAlpha(character)) {
- appendToTemporaryBuffer(character);
- appendToPossibleEndTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(RCDATAEndTagNameState);
- }
- if (isTokenizerWhitespace(character)) {
- if (isAppropriateEndTag()) {
- if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
- return true;
- SWITCH_TO(BeforeAttributeNameState);
</del><ins>+ HTML_BEGIN_STATE(RCDATAEndTagNameState) {
+ if (isASCIIUpper(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
+ HTML_ADVANCE_TO(RCDATAEndTagNameState);
+ } else if (isASCIILower(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(RCDATAEndTagNameState);
+ } else {
+ if (isTokenizerWhitespace(cc)) {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
+ }
+ } else if (cc == '/') {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
+ }
+ } else if (cc == '>') {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
+ }
</ins><span class="cx"> }
</span><del>- } else if (character == '/') {
- if (isAppropriateEndTag()) {
- if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
- return true;
- SWITCH_TO(SelfClosingStartTagState);
- }
- } else if (character == '>') {
- if (isAppropriateEndTag())
- return commitToCompleteEndTag(source);
</del><ins>+ bufferASCIICharacter('<');
+ bufferASCIICharacter('/');
+ m_token->appendToCharacter(m_temporaryBuffer);
+ m_bufferedEndTagName.clear();
+ m_temporaryBuffer.clear();
+ HTML_RECONSUME_IN(RCDATAState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- bufferASCIICharacter('/');
- m_token.appendToCharacter(m_temporaryBuffer);
- m_bufferedEndTagName.clear();
- m_temporaryBuffer.clear();
- RECONSUME_IN(RCDATAState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(RAWTEXTLessThanSignState)
- if (character == '/') {
</del><ins>+ HTML_BEGIN_STATE(RAWTEXTLessThanSignState) {
+ if (cc == '/') {
</ins><span class="cx"> m_temporaryBuffer.clear();
</span><span class="cx"> ASSERT(m_bufferedEndTagName.isEmpty());
</span><del>- ADVANCE_TO(RAWTEXTEndTagOpenState);
</del><ins>+ HTML_ADVANCE_TO(RAWTEXTEndTagOpenState);
+ } else {
+ bufferASCIICharacter('<');
+ HTML_RECONSUME_IN(RAWTEXTState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- RECONSUME_IN(RAWTEXTState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(RAWTEXTEndTagOpenState)
- if (isASCIIAlpha(character)) {
- appendToTemporaryBuffer(character);
- appendToPossibleEndTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(RAWTEXTEndTagNameState);
</del><ins>+ HTML_BEGIN_STATE(RAWTEXTEndTagOpenState) {
+ if (isASCIIUpper(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
+ HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
+ } else if (isASCIILower(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
+ } else {
+ bufferASCIICharacter('<');
+ bufferASCIICharacter('/');
+ HTML_RECONSUME_IN(RAWTEXTState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- bufferASCIICharacter('/');
- RECONSUME_IN(RAWTEXTState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(RAWTEXTEndTagNameState)
- if (isASCIIAlpha(character)) {
- appendToTemporaryBuffer(character);
- appendToPossibleEndTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(RAWTEXTEndTagNameState);
- }
- if (isTokenizerWhitespace(character)) {
- if (isAppropriateEndTag()) {
- if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
- return true;
- SWITCH_TO(BeforeAttributeNameState);
</del><ins>+ HTML_BEGIN_STATE(RAWTEXTEndTagNameState) {
+ if (isASCIIUpper(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
+ HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
+ } else if (isASCIILower(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(RAWTEXTEndTagNameState);
+ } else {
+ if (isTokenizerWhitespace(cc)) {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
+ }
+ } else if (cc == '/') {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
+ }
+ } else if (cc == '>') {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
+ }
</ins><span class="cx"> }
</span><del>- } else if (character == '/') {
- if (isAppropriateEndTag()) {
- if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
- return true;
- SWITCH_TO(SelfClosingStartTagState);
- }
- } else if (character == '>') {
- if (isAppropriateEndTag())
- return commitToCompleteEndTag(source);
</del><ins>+ bufferASCIICharacter('<');
+ bufferASCIICharacter('/');
+ m_token->appendToCharacter(m_temporaryBuffer);
+ m_bufferedEndTagName.clear();
+ m_temporaryBuffer.clear();
+ HTML_RECONSUME_IN(RAWTEXTState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- bufferASCIICharacter('/');
- m_token.appendToCharacter(m_temporaryBuffer);
- m_bufferedEndTagName.clear();
- m_temporaryBuffer.clear();
- RECONSUME_IN(RAWTEXTState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataLessThanSignState)
- if (character == '/') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataLessThanSignState) {
+ if (cc == '/') {
</ins><span class="cx"> m_temporaryBuffer.clear();
</span><span class="cx"> ASSERT(m_bufferedEndTagName.isEmpty());
</span><del>- ADVANCE_TO(ScriptDataEndTagOpenState);
- }
- if (character == '!') {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEndTagOpenState);
+ } else if (cc == '!') {
</ins><span class="cx"> bufferASCIICharacter('<');
</span><span class="cx"> bufferASCIICharacter('!');
</span><del>- ADVANCE_TO(ScriptDataEscapeStartState);
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEscapeStartState);
+ } else {
+ bufferASCIICharacter('<');
+ HTML_RECONSUME_IN(ScriptDataState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- RECONSUME_IN(ScriptDataState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEndTagOpenState)
- if (isASCIIAlpha(character)) {
- appendToTemporaryBuffer(character);
- appendToPossibleEndTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(ScriptDataEndTagNameState);
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEndTagOpenState) {
+ if (isASCIIUpper(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
+ HTML_ADVANCE_TO(ScriptDataEndTagNameState);
+ } else if (isASCIILower(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(ScriptDataEndTagNameState);
+ } else {
+ bufferASCIICharacter('<');
+ bufferASCIICharacter('/');
+ HTML_RECONSUME_IN(ScriptDataState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- bufferASCIICharacter('/');
- RECONSUME_IN(ScriptDataState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEndTagNameState)
- if (isASCIIAlpha(character)) {
- appendToTemporaryBuffer(character);
- appendToPossibleEndTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(ScriptDataEndTagNameState);
- }
- if (isTokenizerWhitespace(character)) {
- if (isAppropriateEndTag()) {
- if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
- return true;
- SWITCH_TO(BeforeAttributeNameState);
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEndTagNameState) {
+ if (isASCIIUpper(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
+ HTML_ADVANCE_TO(ScriptDataEndTagNameState);
+ } else if (isASCIILower(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(ScriptDataEndTagNameState);
+ } else {
+ if (isTokenizerWhitespace(cc)) {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
+ }
+ } else if (cc == '/') {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
+ }
+ } else if (cc == '>') {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
+ }
</ins><span class="cx"> }
</span><del>- } else if (character == '/') {
- if (isAppropriateEndTag()) {
- if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
- return true;
- SWITCH_TO(SelfClosingStartTagState);
- }
- } else if (character == '>') {
- if (isAppropriateEndTag())
- return commitToCompleteEndTag(source);
</del><ins>+ bufferASCIICharacter('<');
+ bufferASCIICharacter('/');
+ m_token->appendToCharacter(m_temporaryBuffer);
+ m_bufferedEndTagName.clear();
+ m_temporaryBuffer.clear();
+ HTML_RECONSUME_IN(ScriptDataState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- bufferASCIICharacter('/');
- m_token.appendToCharacter(m_temporaryBuffer);
- m_bufferedEndTagName.clear();
- m_temporaryBuffer.clear();
- RECONSUME_IN(ScriptDataState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEscapeStartState)
- if (character == '-') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEscapeStartState) {
+ if (cc == '-') {
</ins><span class="cx"> bufferASCIICharacter('-');
</span><del>- ADVANCE_TO(ScriptDataEscapeStartDashState);
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEscapeStartDashState);
</ins><span class="cx"> } else
</span><del>- RECONSUME_IN(ScriptDataState);
</del><ins>+ HTML_RECONSUME_IN(ScriptDataState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEscapeStartDashState)
- if (character == '-') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEscapeStartDashState) {
+ if (cc == '-') {
</ins><span class="cx"> bufferASCIICharacter('-');
</span><del>- ADVANCE_TO(ScriptDataEscapedDashDashState);
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
</ins><span class="cx"> } else
</span><del>- RECONSUME_IN(ScriptDataState);
</del><ins>+ HTML_RECONSUME_IN(ScriptDataState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEscapedState)
- if (character == '-') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEscapedState) {
+ if (cc == '-') {
</ins><span class="cx"> bufferASCIICharacter('-');
</span><del>- ADVANCE_TO(ScriptDataEscapedDashState);
- }
- if (character == '<')
- ADVANCE_TO(ScriptDataEscapedLessThanSignState);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEscapedDashState);
+ } else if (cc == '<')
+ HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(ScriptDataEscapedState);
</ins><span class="cx"> }
</span><del>- bufferCharacter(character);
- ADVANCE_TO(ScriptDataEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEscapedDashState)
- if (character == '-') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEscapedDashState) {
+ if (cc == '-') {
</ins><span class="cx"> bufferASCIICharacter('-');
</span><del>- ADVANCE_TO(ScriptDataEscapedDashDashState);
- }
- if (character == '<')
- ADVANCE_TO(ScriptDataEscapedLessThanSignState);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
+ } else if (cc == '<')
+ HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(ScriptDataEscapedState);
</ins><span class="cx"> }
</span><del>- bufferCharacter(character);
- ADVANCE_TO(ScriptDataEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEscapedDashDashState)
- if (character == '-') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEscapedDashDashState) {
+ if (cc == '-') {
</ins><span class="cx"> bufferASCIICharacter('-');
</span><del>- ADVANCE_TO(ScriptDataEscapedDashDashState);
- }
- if (character == '<')
- ADVANCE_TO(ScriptDataEscapedLessThanSignState);
- if (character == '>') {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEscapedDashDashState);
+ } else if (cc == '<')
+ HTML_ADVANCE_TO(ScriptDataEscapedLessThanSignState);
+ else if (cc == '>') {
</ins><span class="cx"> bufferASCIICharacter('>');
</span><del>- ADVANCE_TO(ScriptDataState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(ScriptDataEscapedState);
</ins><span class="cx"> }
</span><del>- bufferCharacter(character);
- ADVANCE_TO(ScriptDataEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEscapedLessThanSignState)
- if (character == '/') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEscapedLessThanSignState) {
+ if (cc == '/') {
</ins><span class="cx"> m_temporaryBuffer.clear();
</span><span class="cx"> ASSERT(m_bufferedEndTagName.isEmpty());
</span><del>- ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
- }
- if (isASCIIAlpha(character)) {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEscapedEndTagOpenState);
+ } else if (isASCIIUpper(cc)) {
</ins><span class="cx"> bufferASCIICharacter('<');
</span><del>- bufferASCIICharacter(character);
</del><ins>+ bufferASCIICharacter(cc);
</ins><span class="cx"> m_temporaryBuffer.clear();
</span><del>- appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
- ADVANCE_TO(ScriptDataDoubleEscapeStartState);
</del><ins>+ m_temporaryBuffer.append(toLowerCase(cc));
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
+ } else if (isASCIILower(cc)) {
+ bufferASCIICharacter('<');
+ bufferASCIICharacter(cc);
+ m_temporaryBuffer.clear();
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
+ } else {
+ bufferASCIICharacter('<');
+ HTML_RECONSUME_IN(ScriptDataEscapedState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- RECONSUME_IN(ScriptDataEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEscapedEndTagOpenState)
- if (isASCIIAlpha(character)) {
- appendToTemporaryBuffer(character);
- appendToPossibleEndTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(ScriptDataEscapedEndTagNameState);
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEscapedEndTagOpenState) {
+ if (isASCIIUpper(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
+ HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
+ } else if (isASCIILower(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
+ } else {
+ bufferASCIICharacter('<');
+ bufferASCIICharacter('/');
+ HTML_RECONSUME_IN(ScriptDataEscapedState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- bufferASCIICharacter('/');
- RECONSUME_IN(ScriptDataEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataEscapedEndTagNameState)
- if (isASCIIAlpha(character)) {
- appendToTemporaryBuffer(character);
- appendToPossibleEndTag(convertASCIIAlphaToLower(character));
- ADVANCE_TO(ScriptDataEscapedEndTagNameState);
- }
- if (isTokenizerWhitespace(character)) {
- if (isAppropriateEndTag()) {
- if (commitToPartialEndTag(source, character, BeforeAttributeNameState))
- return true;
- SWITCH_TO(BeforeAttributeNameState);
</del><ins>+ HTML_BEGIN_STATE(ScriptDataEscapedEndTagNameState) {
+ if (isASCIIUpper(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(toLowerCase(cc)));
+ HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
+ } else if (isASCIILower(cc)) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ addToPossibleEndTag(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(ScriptDataEscapedEndTagNameState);
+ } else {
+ if (isTokenizerWhitespace(cc)) {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ FLUSH_AND_ADVANCE_TO(BeforeAttributeNameState);
+ }
+ } else if (cc == '/') {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ FLUSH_AND_ADVANCE_TO(SelfClosingStartTagState);
+ }
+ } else if (cc == '>') {
+ if (isAppropriateEndTag()) {
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ return flushEmitAndResumeIn(source, HTMLTokenizer::DataState);
+ }
</ins><span class="cx"> }
</span><del>- } else if (character == '/') {
- if (isAppropriateEndTag()) {
- if (commitToPartialEndTag(source, '/', SelfClosingStartTagState))
- return true;
- SWITCH_TO(SelfClosingStartTagState);
- }
- } else if (character == '>') {
- if (isAppropriateEndTag())
- return commitToCompleteEndTag(source);
</del><ins>+ bufferASCIICharacter('<');
+ bufferASCIICharacter('/');
+ m_token->appendToCharacter(m_temporaryBuffer);
+ m_bufferedEndTagName.clear();
+ m_temporaryBuffer.clear();
+ HTML_RECONSUME_IN(ScriptDataEscapedState);
</ins><span class="cx"> }
</span><del>- bufferASCIICharacter('<');
- bufferASCIICharacter('/');
- m_token.appendToCharacter(m_temporaryBuffer);
- m_bufferedEndTagName.clear();
- m_temporaryBuffer.clear();
- RECONSUME_IN(ScriptDataEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataDoubleEscapeStartState)
- if (isTokenizerWhitespace(character) || character == '/' || character == '>') {
- bufferASCIICharacter(character);
- if (temporaryBufferIs("script"))
- ADVANCE_TO(ScriptDataDoubleEscapedState);
</del><ins>+ HTML_BEGIN_STATE(ScriptDataDoubleEscapeStartState) {
+ if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') {
+ bufferASCIICharacter(cc);
+ if (temporaryBufferIs(scriptTag.localName()))
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
</ins><span class="cx"> else
</span><del>- ADVANCE_TO(ScriptDataEscapedState);
- }
- if (isASCIIAlpha(character)) {
- bufferASCIICharacter(character);
- appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
- ADVANCE_TO(ScriptDataDoubleEscapeStartState);
- }
- RECONSUME_IN(ScriptDataEscapedState);
</del><ins>+ HTML_ADVANCE_TO(ScriptDataEscapedState);
+ } else if (isASCIIUpper(cc)) {
+ bufferASCIICharacter(cc);
+ m_temporaryBuffer.append(toLowerCase(cc));
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
+ } else if (isASCIILower(cc)) {
+ bufferASCIICharacter(cc);
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapeStartState);
+ } else
+ HTML_RECONSUME_IN(ScriptDataEscapedState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataDoubleEscapedState)
- if (character == '-') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataDoubleEscapedState) {
+ if (cc == '-') {
</ins><span class="cx"> bufferASCIICharacter('-');
</span><del>- ADVANCE_TO(ScriptDataDoubleEscapedDashState);
- }
- if (character == '<') {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashState);
+ } else if (cc == '<') {
</ins><span class="cx"> bufferASCIICharacter('<');
</span><del>- ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
</ins><span class="cx"> }
</span><del>- bufferCharacter(character);
- ADVANCE_TO(ScriptDataDoubleEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataDoubleEscapedDashState)
- if (character == '-') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashState) {
+ if (cc == '-') {
</ins><span class="cx"> bufferASCIICharacter('-');
</span><del>- ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
- }
- if (character == '<') {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
+ } else if (cc == '<') {
</ins><span class="cx"> bufferASCIICharacter('<');
</span><del>- ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
</ins><span class="cx"> }
</span><del>- bufferCharacter(character);
- ADVANCE_TO(ScriptDataDoubleEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataDoubleEscapedDashDashState)
- if (character == '-') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataDoubleEscapedDashDashState) {
+ if (cc == '-') {
</ins><span class="cx"> bufferASCIICharacter('-');
</span><del>- ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
- }
- if (character == '<') {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedDashDashState);
+ } else if (cc == '<') {
</ins><span class="cx"> bufferASCIICharacter('<');
</span><del>- ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
- }
- if (character == '>') {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedLessThanSignState);
+ } else if (cc == '>') {
</ins><span class="cx"> bufferASCIICharacter('>');
</span><del>- ADVANCE_TO(ScriptDataState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(ScriptDataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
</ins><span class="cx"> }
</span><del>- bufferCharacter(character);
- ADVANCE_TO(ScriptDataDoubleEscapedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState)
- if (character == '/') {
</del><ins>+ HTML_BEGIN_STATE(ScriptDataDoubleEscapedLessThanSignState) {
+ if (cc == '/') {
</ins><span class="cx"> bufferASCIICharacter('/');
</span><span class="cx"> m_temporaryBuffer.clear();
</span><del>- ADVANCE_TO(ScriptDataDoubleEscapeEndState);
- }
- RECONSUME_IN(ScriptDataDoubleEscapedState);
</del><ins>+ HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
+ } else
+ HTML_RECONSUME_IN(ScriptDataDoubleEscapedState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ScriptDataDoubleEscapeEndState)
- if (isTokenizerWhitespace(character) || character == '/' || character == '>') {
- bufferASCIICharacter(character);
- if (temporaryBufferIs("script"))
- ADVANCE_TO(ScriptDataEscapedState);
</del><ins>+ HTML_BEGIN_STATE(ScriptDataDoubleEscapeEndState) {
+ if (isTokenizerWhitespace(cc) || cc == '/' || cc == '>') {
+ bufferASCIICharacter(cc);
+ if (temporaryBufferIs(scriptTag.localName()))
+ HTML_ADVANCE_TO(ScriptDataEscapedState);
</ins><span class="cx"> else
</span><del>- ADVANCE_TO(ScriptDataDoubleEscapedState);
- }
- if (isASCIIAlpha(character)) {
- bufferASCIICharacter(character);
- appendToTemporaryBuffer(convertASCIIAlphaToLower(character));
- ADVANCE_TO(ScriptDataDoubleEscapeEndState);
- }
- RECONSUME_IN(ScriptDataDoubleEscapedState);
</del><ins>+ HTML_ADVANCE_TO(ScriptDataDoubleEscapedState);
+ } else if (isASCIIUpper(cc)) {
+ bufferASCIICharacter(cc);
+ m_temporaryBuffer.append(toLowerCase(cc));
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
+ } else if (isASCIILower(cc)) {
+ bufferASCIICharacter(cc);
+ m_temporaryBuffer.append(static_cast<LChar>(cc));
+ HTML_ADVANCE_TO(ScriptDataDoubleEscapeEndState);
+ } else
+ HTML_RECONSUME_IN(ScriptDataDoubleEscapedState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(BeforeAttributeNameState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeAttributeNameState);
- if (character == '/')
- ADVANCE_TO(SelfClosingStartTagState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (m_options.usePreHTML5ParserQuirks && character == '<')
- return emitAndReconsumeInDataState();
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(BeforeAttributeNameState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeAttributeNameState);
+ else if (cc == '/')
+ HTML_ADVANCE_TO(SelfClosingStartTagState);
+ else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (m_options.usePreHTML5ParserQuirks && cc == '<')
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ else if (isASCIIUpper(cc)) {
+ m_token->addNewAttribute();
+ m_token->beginAttributeName(source.numberOfCharactersConsumed());
+ m_token->appendToAttributeName(toLowerCase(cc));
+ HTML_ADVANCE_TO(AttributeNameState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ if (cc == '"' || cc == '\'' || cc == '<' || cc == '=')
+ parseError();
+ m_token->addNewAttribute();
+ m_token->beginAttributeName(source.numberOfCharactersConsumed());
+ m_token->appendToAttributeName(cc);
+ HTML_ADVANCE_TO(AttributeNameState);
</ins><span class="cx"> }
</span><del>- if (character == '"' || character == '\'' || character == '<' || character == '=')
- parseError();
- m_token.beginAttribute(source.numberOfCharactersConsumed());
- m_token.appendToAttributeName(toASCIILower(character));
- ADVANCE_TO(AttributeNameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AttributeNameState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(AfterAttributeNameState);
- if (character == '/')
- ADVANCE_TO(SelfClosingStartTagState);
- if (character == '=')
- ADVANCE_TO(BeforeAttributeValueState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (m_options.usePreHTML5ParserQuirks && character == '<')
- return emitAndReconsumeInDataState();
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(AttributeNameState) {
+ if (isTokenizerWhitespace(cc)) {
+ m_token->endAttributeName(source.numberOfCharactersConsumed());
+ HTML_ADVANCE_TO(AfterAttributeNameState);
+ } else if (cc == '/') {
+ m_token->endAttributeName(source.numberOfCharactersConsumed());
+ HTML_ADVANCE_TO(SelfClosingStartTagState);
+ } else if (cc == '=') {
+ m_token->endAttributeName(source.numberOfCharactersConsumed());
+ HTML_ADVANCE_TO(BeforeAttributeValueState);
+ } else if (cc == '>') {
+ m_token->endAttributeName(source.numberOfCharactersConsumed());
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (m_options.usePreHTML5ParserQuirks && cc == '<') {
+ m_token->endAttributeName(source.numberOfCharactersConsumed());
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else if (isASCIIUpper(cc)) {
+ m_token->appendToAttributeName(toLowerCase(cc));
+ HTML_ADVANCE_TO(AttributeNameState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ m_token->endAttributeName(source.numberOfCharactersConsumed());
+ HTML_RECONSUME_IN(DataState);
+ } else {
+ if (cc == '"' || cc == '\'' || cc == '<' || cc == '=')
+ parseError();
+ m_token->appendToAttributeName(cc);
+ HTML_ADVANCE_TO(AttributeNameState);
</ins><span class="cx"> }
</span><del>- if (character == '"' || character == '\'' || character == '<' || character == '=')
- parseError();
- m_token.appendToAttributeName(toASCIILower(character));
- ADVANCE_TO(AttributeNameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AfterAttributeNameState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(AfterAttributeNameState);
- if (character == '/')
- ADVANCE_TO(SelfClosingStartTagState);
- if (character == '=')
- ADVANCE_TO(BeforeAttributeValueState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (m_options.usePreHTML5ParserQuirks && character == '<')
- return emitAndReconsumeInDataState();
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(AfterAttributeNameState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(AfterAttributeNameState);
+ else if (cc == '/')
+ HTML_ADVANCE_TO(SelfClosingStartTagState);
+ else if (cc == '=')
+ HTML_ADVANCE_TO(BeforeAttributeValueState);
+ else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (m_options.usePreHTML5ParserQuirks && cc == '<')
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ else if (isASCIIUpper(cc)) {
+ m_token->addNewAttribute();
+ m_token->beginAttributeName(source.numberOfCharactersConsumed());
+ m_token->appendToAttributeName(toLowerCase(cc));
+ HTML_ADVANCE_TO(AttributeNameState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ if (cc == '"' || cc == '\'' || cc == '<')
+ parseError();
+ m_token->addNewAttribute();
+ m_token->beginAttributeName(source.numberOfCharactersConsumed());
+ m_token->appendToAttributeName(cc);
+ HTML_ADVANCE_TO(AttributeNameState);
</ins><span class="cx"> }
</span><del>- if (character == '"' || character == '\'' || character == '<')
- parseError();
- m_token.beginAttribute(source.numberOfCharactersConsumed());
- m_token.appendToAttributeName(toASCIILower(character));
- ADVANCE_TO(AttributeNameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(BeforeAttributeValueState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeAttributeValueState);
- if (character == '"')
- ADVANCE_TO(AttributeValueDoubleQuotedState);
- if (character == '&')
- RECONSUME_IN(AttributeValueUnquotedState);
- if (character == '\'')
- ADVANCE_TO(AttributeValueSingleQuotedState);
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(BeforeAttributeValueState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeAttributeValueState);
+ else if (cc == '"') {
+ m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1);
+ HTML_ADVANCE_TO(AttributeValueDoubleQuotedState);
+ } else if (cc == '&') {
+ m_token->beginAttributeValue(source.numberOfCharactersConsumed());
+ HTML_RECONSUME_IN(AttributeValueUnquotedState);
+ } else if (cc == '\'') {
+ m_token->beginAttributeValue(source.numberOfCharactersConsumed() + 1);
+ HTML_ADVANCE_TO(AttributeValueSingleQuotedState);
+ } else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ if (cc == '<' || cc == '=' || cc == '`')
+ parseError();
+ m_token->beginAttributeValue(source.numberOfCharactersConsumed());
+ m_token->appendToAttributeValue(cc);
+ HTML_ADVANCE_TO(AttributeValueUnquotedState);
</ins><span class="cx"> }
</span><del>- if (character == '<' || character == '=' || character == '`')
- parseError();
- m_token.appendToAttributeValue(character);
- ADVANCE_TO(AttributeValueUnquotedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AttributeValueDoubleQuotedState)
- if (character == '"') {
- m_token.endAttribute(source.numberOfCharactersConsumed());
- ADVANCE_TO(AfterAttributeValueQuotedState);
- }
- if (character == '&') {
</del><ins>+ HTML_BEGIN_STATE(AttributeValueDoubleQuotedState) {
+ if (cc == '"') {
+ m_token->endAttributeValue(source.numberOfCharactersConsumed());
+ HTML_ADVANCE_TO(AfterAttributeValueQuotedState);
+ } else if (cc == '&') {
</ins><span class="cx"> m_additionalAllowedCharacter = '"';
</span><del>- ADVANCE_TO(CharacterReferenceInAttributeValueState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.endAttribute(source.numberOfCharactersConsumed());
- RECONSUME_IN(DataState);
</del><ins>+ m_token->endAttributeValue(source.numberOfCharactersConsumed());
+ HTML_RECONSUME_IN(DataState);
+ } else {
+ m_token->appendToAttributeValue(cc);
+ HTML_ADVANCE_TO(AttributeValueDoubleQuotedState);
</ins><span class="cx"> }
</span><del>- m_token.appendToAttributeValue(character);
- ADVANCE_TO(AttributeValueDoubleQuotedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AttributeValueSingleQuotedState)
- if (character == '\'') {
- m_token.endAttribute(source.numberOfCharactersConsumed());
- ADVANCE_TO(AfterAttributeValueQuotedState);
- }
- if (character == '&') {
</del><ins>+ HTML_BEGIN_STATE(AttributeValueSingleQuotedState) {
+ if (cc == '\'') {
+ m_token->endAttributeValue(source.numberOfCharactersConsumed());
+ HTML_ADVANCE_TO(AfterAttributeValueQuotedState);
+ } else if (cc == '&') {
</ins><span class="cx"> m_additionalAllowedCharacter = '\'';
</span><del>- ADVANCE_TO(CharacterReferenceInAttributeValueState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.endAttribute(source.numberOfCharactersConsumed());
- RECONSUME_IN(DataState);
</del><ins>+ m_token->endAttributeValue(source.numberOfCharactersConsumed());
+ HTML_RECONSUME_IN(DataState);
+ } else {
+ m_token->appendToAttributeValue(cc);
+ HTML_ADVANCE_TO(AttributeValueSingleQuotedState);
</ins><span class="cx"> }
</span><del>- m_token.appendToAttributeValue(character);
- ADVANCE_TO(AttributeValueSingleQuotedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AttributeValueUnquotedState)
- if (isTokenizerWhitespace(character)) {
- m_token.endAttribute(source.numberOfCharactersConsumed());
- ADVANCE_TO(BeforeAttributeNameState);
- }
- if (character == '&') {
</del><ins>+ HTML_BEGIN_STATE(AttributeValueUnquotedState) {
+ if (isTokenizerWhitespace(cc)) {
+ m_token->endAttributeValue(source.numberOfCharactersConsumed());
+ HTML_ADVANCE_TO(BeforeAttributeNameState);
+ } else if (cc == '&') {
</ins><span class="cx"> m_additionalAllowedCharacter = '>';
</span><del>- ADVANCE_TO(CharacterReferenceInAttributeValueState);
- }
- if (character == '>') {
- m_token.endAttribute(source.numberOfCharactersConsumed());
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_ADVANCE_TO(CharacterReferenceInAttributeValueState);
+ } else if (cc == '>') {
+ m_token->endAttributeValue(source.numberOfCharactersConsumed());
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.endAttribute(source.numberOfCharactersConsumed());
- RECONSUME_IN(DataState);
</del><ins>+ m_token->endAttributeValue(source.numberOfCharactersConsumed());
+ HTML_RECONSUME_IN(DataState);
+ } else {
+ if (cc == '"' || cc == '\'' || cc == '<' || cc == '=' || cc == '`')
+ parseError();
+ m_token->appendToAttributeValue(cc);
+ HTML_ADVANCE_TO(AttributeValueUnquotedState);
</ins><span class="cx"> }
</span><del>- if (character == '"' || character == '\'' || character == '<' || character == '=' || character == '`')
- parseError();
- m_token.appendToAttributeValue(character);
- ADVANCE_TO(AttributeValueUnquotedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CharacterReferenceInAttributeValueState)
</del><ins>+ HTML_BEGIN_STATE(CharacterReferenceInAttributeValueState) {
</ins><span class="cx"> bool notEnoughCharacters = false;
</span><span class="cx"> StringBuilder decodedEntity;
</span><span class="cx"> bool success = consumeHTMLEntity(source, decodedEntity, notEnoughCharacters, m_additionalAllowedCharacter);
</span><span class="cx"> if (notEnoughCharacters)
</span><del>- RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
</del><ins>+ return haveBufferedCharacterToken();
</ins><span class="cx"> if (!success) {
</span><span class="cx"> ASSERT(decodedEntity.isEmpty());
</span><del>- m_token.appendToAttributeValue('&');
</del><ins>+ m_token->appendToAttributeValue('&');
</ins><span class="cx"> } else {
</span><span class="cx"> for (unsigned i = 0; i < decodedEntity.length(); ++i)
</span><del>- m_token.appendToAttributeValue(decodedEntity[i]);
</del><ins>+ m_token->appendToAttributeValue(decodedEntity[i]);
</ins><span class="cx"> }
</span><span class="cx"> // We're supposed to switch back to the attribute value state that
</span><span class="cx"> // we were in when we were switched into this state. Rather than
</span><span class="cx"> // keeping track of this explictly, we observe that the previous
</span><span class="cx"> // state can be determined by m_additionalAllowedCharacter.
</span><span class="cx"> if (m_additionalAllowedCharacter == '"')
</span><del>- SWITCH_TO(AttributeValueDoubleQuotedState);
- if (m_additionalAllowedCharacter == '\'')
- SWITCH_TO(AttributeValueSingleQuotedState);
- ASSERT(m_additionalAllowedCharacter == '>');
- SWITCH_TO(AttributeValueUnquotedState);
</del><ins>+ HTML_SWITCH_TO(AttributeValueDoubleQuotedState);
+ else if (m_additionalAllowedCharacter == '\'')
+ HTML_SWITCH_TO(AttributeValueSingleQuotedState);
+ else if (m_additionalAllowedCharacter == '>')
+ HTML_SWITCH_TO(AttributeValueUnquotedState);
+ else
+ ASSERT_NOT_REACHED();
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AfterAttributeValueQuotedState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeAttributeNameState);
- if (character == '/')
- ADVANCE_TO(SelfClosingStartTagState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (m_options.usePreHTML5ParserQuirks && character == '<')
- return emitAndReconsumeInDataState();
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(AfterAttributeValueQuotedState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeAttributeNameState);
+ else if (cc == '/')
+ HTML_ADVANCE_TO(SelfClosingStartTagState);
+ else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (m_options.usePreHTML5ParserQuirks && cc == '<')
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ parseError();
+ HTML_RECONSUME_IN(BeforeAttributeNameState);
</ins><span class="cx"> }
</span><del>- parseError();
- RECONSUME_IN(BeforeAttributeNameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(SelfClosingStartTagState)
- if (character == '>') {
- m_token.setSelfClosing();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(SelfClosingStartTagState) {
+ if (cc == '>') {
+ m_token->setSelfClosing();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- RECONSUME_IN(DataState);
</del><ins>+ HTML_RECONSUME_IN(DataState);
+ } else {
+ parseError();
+ HTML_RECONSUME_IN(BeforeAttributeNameState);
</ins><span class="cx"> }
</span><del>- parseError();
- RECONSUME_IN(BeforeAttributeNameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(BogusCommentState)
- m_token.beginComment();
- RECONSUME_IN(ContinueBogusCommentState);
</del><ins>+ HTML_BEGIN_STATE(BogusCommentState) {
+ m_token->beginComment();
+ HTML_RECONSUME_IN(ContinueBogusCommentState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(ContinueBogusCommentState)
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == kEndOfFileMarker)
- return emitAndReconsumeInDataState();
- m_token.appendToComment(character);
- ADVANCE_TO(ContinueBogusCommentState);
</del><ins>+ HTML_BEGIN_STATE(ContinueBogusCommentState) {
+ if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == kEndOfFileMarker)
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ else {
+ m_token->appendToComment(cc);
+ HTML_ADVANCE_TO(ContinueBogusCommentState);
+ }
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(MarkupDeclarationOpenState)
- if (character == '-') {
- auto result = source.advancePast("--");
</del><ins>+ HTML_BEGIN_STATE(MarkupDeclarationOpenState) {
+ DEPRECATED_DEFINE_STATIC_LOCAL(String, dashDashString, (ASCIILiteral("--")));
+ DEPRECATED_DEFINE_STATIC_LOCAL(String, doctypeString, (ASCIILiteral("doctype")));
+ DEPRECATED_DEFINE_STATIC_LOCAL(String, cdataString, (ASCIILiteral("[CDATA[")));
+ if (cc == '-') {
+ SegmentedString::LookAheadResult result = source.lookAhead(dashDashString);
</ins><span class="cx"> if (result == SegmentedString::DidMatch) {
</span><del>- m_token.beginComment();
- SWITCH_TO(CommentStartState);
- }
- if (result == SegmentedString::NotEnoughCharacters)
- RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
- } else if (isASCIIAlphaCaselessEqual(character, 'd')) {
- auto result = source.advancePastIgnoringCase("doctype");
- if (result == SegmentedString::DidMatch)
- SWITCH_TO(DOCTYPEState);
- if (result == SegmentedString::NotEnoughCharacters)
- RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
- } else if (character == '[' && shouldAllowCDATA()) {
- auto result = source.advancePast("[CDATA[");
- if (result == SegmentedString::DidMatch)
- SWITCH_TO(CDATASectionState);
- if (result == SegmentedString::NotEnoughCharacters)
- RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
</del><ins>+ source.advanceAndASSERT('-');
+ source.advanceAndASSERT('-');
+ m_token->beginComment();
+ HTML_SWITCH_TO(CommentStartState);
+ } else if (result == SegmentedString::NotEnoughCharacters)
+ return haveBufferedCharacterToken();
+ } else if (cc == 'D' || cc == 'd') {
+ SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(doctypeString);
+ if (result == SegmentedString::DidMatch) {
+ advanceStringAndASSERTIgnoringCase(source, "doctype");
+ HTML_SWITCH_TO(DOCTYPEState);
+ } else if (result == SegmentedString::NotEnoughCharacters)
+ return haveBufferedCharacterToken();
+ } else if (cc == '[' && shouldAllowCDATA()) {
+ SegmentedString::LookAheadResult result = source.lookAhead(cdataString);
+ if (result == SegmentedString::DidMatch) {
+ advanceStringAndASSERT(source, "[CDATA[");
+ HTML_SWITCH_TO(CDATASectionState);
+ } else if (result == SegmentedString::NotEnoughCharacters)
+ return haveBufferedCharacterToken();
</ins><span class="cx"> }
</span><span class="cx"> parseError();
</span><del>- RECONSUME_IN(BogusCommentState);
</del><ins>+ HTML_RECONSUME_IN(BogusCommentState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CommentStartState)
- if (character == '-')
- ADVANCE_TO(CommentStartDashState);
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(CommentStartState) {
+ if (cc == '-')
+ HTML_ADVANCE_TO(CommentStartDashState);
+ else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- return emitAndReconsumeInDataState();
</del><ins>+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToComment(cc);
+ HTML_ADVANCE_TO(CommentState);
</ins><span class="cx"> }
</span><del>- m_token.appendToComment(character);
- ADVANCE_TO(CommentState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CommentStartDashState)
- if (character == '-')
- ADVANCE_TO(CommentEndState);
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(CommentStartDashState) {
+ if (cc == '-')
+ HTML_ADVANCE_TO(CommentEndState);
+ else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- return emitAndReconsumeInDataState();
</del><ins>+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToComment('-');
+ m_token->appendToComment(cc);
+ HTML_ADVANCE_TO(CommentState);
</ins><span class="cx"> }
</span><del>- m_token.appendToComment('-');
- m_token.appendToComment(character);
- ADVANCE_TO(CommentState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CommentState)
- if (character == '-')
- ADVANCE_TO(CommentEndDashState);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(CommentState) {
+ if (cc == '-')
+ HTML_ADVANCE_TO(CommentEndDashState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- return emitAndReconsumeInDataState();
</del><ins>+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToComment(cc);
+ HTML_ADVANCE_TO(CommentState);
</ins><span class="cx"> }
</span><del>- m_token.appendToComment(character);
- ADVANCE_TO(CommentState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CommentEndDashState)
- if (character == '-')
- ADVANCE_TO(CommentEndState);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(CommentEndDashState) {
+ if (cc == '-')
+ HTML_ADVANCE_TO(CommentEndState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- return emitAndReconsumeInDataState();
</del><ins>+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToComment('-');
+ m_token->appendToComment(cc);
+ HTML_ADVANCE_TO(CommentState);
</ins><span class="cx"> }
</span><del>- m_token.appendToComment('-');
- m_token.appendToComment(character);
- ADVANCE_TO(CommentState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CommentEndState)
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == '!') {
</del><ins>+ HTML_BEGIN_STATE(CommentEndState) {
+ if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == '!') {
</ins><span class="cx"> parseError();
</span><del>- ADVANCE_TO(CommentEndBangState);
- }
- if (character == '-') {
</del><ins>+ HTML_ADVANCE_TO(CommentEndBangState);
+ } else if (cc == '-') {
</ins><span class="cx"> parseError();
</span><del>- m_token.appendToComment('-');
- ADVANCE_TO(CommentEndState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->appendToComment('-');
+ HTML_ADVANCE_TO(CommentEndState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- return emitAndReconsumeInDataState();
</del><ins>+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ m_token->appendToComment('-');
+ m_token->appendToComment('-');
+ m_token->appendToComment(cc);
+ HTML_ADVANCE_TO(CommentState);
</ins><span class="cx"> }
</span><del>- parseError();
- m_token.appendToComment('-');
- m_token.appendToComment('-');
- m_token.appendToComment(character);
- ADVANCE_TO(CommentState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CommentEndBangState)
- if (character == '-') {
- m_token.appendToComment('-');
- m_token.appendToComment('-');
- m_token.appendToComment('!');
- ADVANCE_TO(CommentEndDashState);
- }
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(CommentEndBangState) {
+ if (cc == '-') {
+ m_token->appendToComment('-');
+ m_token->appendToComment('-');
+ m_token->appendToComment('!');
+ HTML_ADVANCE_TO(CommentEndDashState);
+ } else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- return emitAndReconsumeInDataState();
</del><ins>+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToComment('-');
+ m_token->appendToComment('-');
+ m_token->appendToComment('!');
+ m_token->appendToComment(cc);
+ HTML_ADVANCE_TO(CommentState);
</ins><span class="cx"> }
</span><del>- m_token.appendToComment('-');
- m_token.appendToComment('-');
- m_token.appendToComment('!');
- m_token.appendToComment(character);
- ADVANCE_TO(CommentState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(DOCTYPEState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeDOCTYPENameState);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(DOCTYPEState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeDOCTYPENameState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.beginDOCTYPE();
- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->beginDOCTYPE();
+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ HTML_RECONSUME_IN(BeforeDOCTYPENameState);
</ins><span class="cx"> }
</span><del>- parseError();
- RECONSUME_IN(BeforeDOCTYPENameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(BeforeDOCTYPENameState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeDOCTYPENameState);
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(BeforeDOCTYPENameState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeDOCTYPENameState);
+ else if (isASCIIUpper(cc)) {
+ m_token->beginDOCTYPE(toLowerCase(cc));
+ HTML_ADVANCE_TO(DOCTYPENameState);
+ } else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.beginDOCTYPE();
- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->beginDOCTYPE();
+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.beginDOCTYPE();
- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->beginDOCTYPE();
+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->beginDOCTYPE(cc);
+ HTML_ADVANCE_TO(DOCTYPENameState);
</ins><span class="cx"> }
</span><del>- m_token.beginDOCTYPE(toASCIILower(character));
- ADVANCE_TO(DOCTYPENameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(DOCTYPENameState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(AfterDOCTYPENameState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(DOCTYPENameState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(AfterDOCTYPENameState);
+ else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (isASCIIUpper(cc)) {
+ m_token->appendToName(toLowerCase(cc));
+ HTML_ADVANCE_TO(DOCTYPENameState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToName(cc);
+ HTML_ADVANCE_TO(DOCTYPENameState);
</ins><span class="cx"> }
</span><del>- m_token.appendToName(toASCIILower(character));
- ADVANCE_TO(DOCTYPENameState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AfterDOCTYPENameState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(AfterDOCTYPENameState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(AfterDOCTYPENameState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(AfterDOCTYPENameState);
+ if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ DEPRECATED_DEFINE_STATIC_LOCAL(String, publicString, (ASCIILiteral("public")));
+ DEPRECATED_DEFINE_STATIC_LOCAL(String, systemString, (ASCIILiteral("system")));
+ if (cc == 'P' || cc == 'p') {
+ SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(publicString);
+ if (result == SegmentedString::DidMatch) {
+ advanceStringAndASSERTIgnoringCase(source, "public");
+ HTML_SWITCH_TO(AfterDOCTYPEPublicKeywordState);
+ } else if (result == SegmentedString::NotEnoughCharacters)
+ return haveBufferedCharacterToken();
+ } else if (cc == 'S' || cc == 's') {
+ SegmentedString::LookAheadResult result = source.lookAheadIgnoringCase(systemString);
+ if (result == SegmentedString::DidMatch) {
+ advanceStringAndASSERTIgnoringCase(source, "system");
+ HTML_SWITCH_TO(AfterDOCTYPESystemKeywordState);
+ } else if (result == SegmentedString::NotEnoughCharacters)
+ return haveBufferedCharacterToken();
+ }
+ parseError();
+ m_token->setForceQuirks();
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
</ins><span class="cx"> }
</span><del>- if (isASCIIAlphaCaselessEqual(character, 'p')) {
- auto result = source.advancePastIgnoringCase("public");
- if (result == SegmentedString::DidMatch)
- SWITCH_TO(AfterDOCTYPEPublicKeywordState);
- if (result == SegmentedString::NotEnoughCharacters)
- RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
- } else if (isASCIIAlphaCaselessEqual(character, 's')) {
- auto result = source.advancePastIgnoringCase("system");
- if (result == SegmentedString::DidMatch)
- SWITCH_TO(AfterDOCTYPESystemKeywordState);
- if (result == SegmentedString::NotEnoughCharacters)
- RETURN_IN_CURRENT_STATE(haveBufferedCharacterToken());
- }
- parseError();
- m_token.setForceQuirks();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AfterDOCTYPEPublicKeywordState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
- if (character == '"') {
</del><ins>+ HTML_BEGIN_STATE(AfterDOCTYPEPublicKeywordState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
+ else if (cc == '"') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setPublicIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
- }
- if (character == '\'') {
</del><ins>+ m_token->setPublicIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+ } else if (cc == '\'') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setPublicIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
- }
- if (character == '>') {
</del><ins>+ m_token->setPublicIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+ } else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ m_token->setForceQuirks();
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
</ins><span class="cx"> }
</span><del>- parseError();
- m_token.setForceQuirks();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
- if (character == '"') {
- m_token.setPublicIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
- }
- if (character == '\'') {
- m_token.setPublicIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
- }
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(BeforeDOCTYPEPublicIdentifierState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeDOCTYPEPublicIdentifierState);
+ else if (cc == '"') {
+ m_token->setPublicIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
+ } else if (cc == '\'') {
+ m_token->setPublicIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
+ } else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ m_token->setForceQuirks();
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
</ins><span class="cx"> }
</span><del>- parseError();
- m_token.setForceQuirks();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState)
- if (character == '"')
- ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(DOCTYPEPublicIdentifierDoubleQuotedState) {
+ if (cc == '"')
+ HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
+ else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToPublicIdentifier(cc);
+ HTML_ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
</ins><span class="cx"> }
</span><del>- m_token.appendToPublicIdentifier(character);
- ADVANCE_TO(DOCTYPEPublicIdentifierDoubleQuotedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState)
- if (character == '\'')
- ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(DOCTYPEPublicIdentifierSingleQuotedState) {
+ if (cc == '\'')
+ HTML_ADVANCE_TO(AfterDOCTYPEPublicIdentifierState);
+ else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToPublicIdentifier(cc);
+ HTML_ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
</ins><span class="cx"> }
</span><del>- m_token.appendToPublicIdentifier(character);
- ADVANCE_TO(DOCTYPEPublicIdentifierSingleQuotedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AfterDOCTYPEPublicIdentifierState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == '"') {
</del><ins>+ HTML_BEGIN_STATE(AfterDOCTYPEPublicIdentifierState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
+ else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == '"') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setSystemIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
- }
- if (character == '\'') {
</del><ins>+ m_token->setSystemIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+ } else if (cc == '\'') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setSystemIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setSystemIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ m_token->setForceQuirks();
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
</ins><span class="cx"> }
</span><del>- parseError();
- m_token.setForceQuirks();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == '"') {
- m_token.setSystemIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
- }
- if (character == '\'') {
- m_token.setSystemIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(BetweenDOCTYPEPublicAndSystemIdentifiersState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BetweenDOCTYPEPublicAndSystemIdentifiersState);
+ else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == '"') {
+ m_token->setSystemIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+ } else if (cc == '\'') {
+ m_token->setSystemIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ m_token->setForceQuirks();
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
</ins><span class="cx"> }
</span><del>- parseError();
- m_token.setForceQuirks();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AfterDOCTYPESystemKeywordState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
- if (character == '"') {
</del><ins>+ HTML_BEGIN_STATE(AfterDOCTYPESystemKeywordState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
+ else if (cc == '"') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setSystemIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
- }
- if (character == '\'') {
</del><ins>+ m_token->setSystemIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+ } else if (cc == '\'') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setSystemIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
- }
- if (character == '>') {
</del><ins>+ m_token->setSystemIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+ } else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ m_token->setForceQuirks();
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
</ins><span class="cx"> }
</span><del>- parseError();
- m_token.setForceQuirks();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(BeforeDOCTYPESystemIdentifierState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
- if (character == '"') {
- m_token.setSystemIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
- }
- if (character == '\'') {
- m_token.setSystemIdentifierToEmptyString();
- ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
- }
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(BeforeDOCTYPESystemIdentifierState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(BeforeDOCTYPESystemIdentifierState);
+ if (cc == '"') {
+ m_token->setSystemIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
+ } else if (cc == '\'') {
+ m_token->setSystemIdentifierToEmptyString();
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
+ } else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ m_token->setForceQuirks();
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
</ins><span class="cx"> }
</span><del>- parseError();
- m_token.setForceQuirks();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState)
- if (character == '"')
- ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(DOCTYPESystemIdentifierDoubleQuotedState) {
+ if (cc == '"')
+ HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+ else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToSystemIdentifier(cc);
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
</ins><span class="cx"> }
</span><del>- m_token.appendToSystemIdentifier(character);
- ADVANCE_TO(DOCTYPESystemIdentifierDoubleQuotedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState)
- if (character == '\'')
- ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
- if (character == '>') {
</del><ins>+ HTML_BEGIN_STATE(DOCTYPESystemIdentifierSingleQuotedState) {
+ if (cc == '\'')
+ HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+ else if (cc == '>') {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndResumeInDataState(source);
- }
- if (character == kEndOfFileMarker) {
</del><ins>+ m_token->setForceQuirks();
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ } else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ m_token->appendToSystemIdentifier(cc);
+ HTML_ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
</ins><span class="cx"> }
</span><del>- m_token.appendToSystemIdentifier(character);
- ADVANCE_TO(DOCTYPESystemIdentifierSingleQuotedState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(AfterDOCTYPESystemIdentifierState)
- if (isTokenizerWhitespace(character))
- ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == kEndOfFileMarker) {
</del><ins>+ HTML_BEGIN_STATE(AfterDOCTYPESystemIdentifierState) {
+ if (isTokenizerWhitespace(cc))
+ HTML_ADVANCE_TO(AfterDOCTYPESystemIdentifierState);
+ else if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == kEndOfFileMarker) {
</ins><span class="cx"> parseError();
</span><del>- m_token.setForceQuirks();
- return emitAndReconsumeInDataState();
</del><ins>+ m_token->setForceQuirks();
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ } else {
+ parseError();
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
</ins><span class="cx"> }
</span><del>- parseError();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(BogusDOCTYPEState)
- if (character == '>')
- return emitAndResumeInDataState(source);
- if (character == kEndOfFileMarker)
- return emitAndReconsumeInDataState();
- ADVANCE_TO(BogusDOCTYPEState);
</del><ins>+ HTML_BEGIN_STATE(BogusDOCTYPEState) {
+ if (cc == '>')
+ return emitAndResumeIn(source, HTMLTokenizer::DataState);
+ else if (cc == kEndOfFileMarker)
+ return emitAndReconsumeIn(source, HTMLTokenizer::DataState);
+ HTML_ADVANCE_TO(BogusDOCTYPEState);
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CDATASectionState)
- if (character == ']')
- ADVANCE_TO(CDATASectionRightSquareBracketState);
- if (character == kEndOfFileMarker)
- RECONSUME_IN(DataState);
- bufferCharacter(character);
- ADVANCE_TO(CDATASectionState);
</del><ins>+ HTML_BEGIN_STATE(CDATASectionState) {
+ if (cc == ']')
+ HTML_ADVANCE_TO(CDATASectionRightSquareBracketState);
+ else if (cc == kEndOfFileMarker)
+ HTML_RECONSUME_IN(DataState);
+ else {
+ bufferCharacter(cc);
+ HTML_ADVANCE_TO(CDATASectionState);
+ }
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><del>- BEGIN_STATE(CDATASectionRightSquareBracketState)
- if (character == ']')
- ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
- bufferASCIICharacter(']');
- RECONSUME_IN(CDATASectionState);
- END_STATE()
</del><ins>+ HTML_BEGIN_STATE(CDATASectionRightSquareBracketState) {
+ if (cc == ']')
+ HTML_ADVANCE_TO(CDATASectionDoubleRightSquareBracketState);
+ else {
+ bufferASCIICharacter(']');
+ HTML_RECONSUME_IN(CDATASectionState);
+ }
+ }
</ins><span class="cx">
</span><del>- BEGIN_STATE(CDATASectionDoubleRightSquareBracketState)
- if (character == '>')
- ADVANCE_TO(DataState);
- bufferASCIICharacter(']');
- bufferASCIICharacter(']');
- RECONSUME_IN(CDATASectionState);
</del><ins>+ HTML_BEGIN_STATE(CDATASectionDoubleRightSquareBracketState) {
+ if (cc == '>')
+ HTML_ADVANCE_TO(DataState);
+ else {
+ bufferASCIICharacter(']');
+ bufferASCIICharacter(']');
+ HTML_RECONSUME_IN(CDATASectionState);
+ }
+ }
</ins><span class="cx"> END_STATE()
</span><span class="cx">
</span><span class="cx"> }
</span><span class="lines">@@ -1409,45 +1561,39 @@
</span><span class="cx"> void HTMLTokenizer::updateStateFor(const AtomicString& tagName)
</span><span class="cx"> {
</span><span class="cx"> if (tagName == textareaTag || tagName == titleTag)
</span><del>- m_state = RCDATAState;
</del><ins>+ setState(HTMLTokenizer::RCDATAState);
</ins><span class="cx"> else if (tagName == plaintextTag)
</span><del>- m_state = PLAINTEXTState;
</del><ins>+ setState(HTMLTokenizer::PLAINTEXTState);
</ins><span class="cx"> else if (tagName == scriptTag)
</span><del>- m_state = ScriptDataState;
</del><ins>+ setState(HTMLTokenizer::ScriptDataState);
</ins><span class="cx"> else if (tagName == styleTag
</span><span class="cx"> || tagName == iframeTag
</span><span class="cx"> || tagName == xmpTag
</span><span class="cx"> || (tagName == noembedTag && m_options.pluginsEnabled)
</span><span class="cx"> || tagName == noframesTag
</span><span class="cx"> || (tagName == noscriptTag && m_options.scriptEnabled))
</span><del>- m_state = RAWTEXTState;
</del><ins>+ setState(HTMLTokenizer::RAWTEXTState);
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-inline void HTMLTokenizer::appendToTemporaryBuffer(UChar character)
</del><ins>+inline bool HTMLTokenizer::temporaryBufferIs(const String& expectedString)
</ins><span class="cx"> {
</span><del>- ASSERT(isASCII(character));
- m_temporaryBuffer.append(character);
-}
-
-inline bool HTMLTokenizer::temporaryBufferIs(const char* expectedString)
-{
</del><span class="cx"> return vectorEqualsString(m_temporaryBuffer, expectedString);
</span><span class="cx"> }
</span><span class="cx">
</span><del>-inline void HTMLTokenizer::appendToPossibleEndTag(UChar character)
</del><ins>+inline void HTMLTokenizer::addToPossibleEndTag(LChar cc)
</ins><span class="cx"> {
</span><del>- ASSERT(isASCII(character));
- m_bufferedEndTagName.append(character);
</del><ins>+ ASSERT(isEndTagBufferingState(m_state));
+ m_bufferedEndTagName.append(cc);
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-inline bool HTMLTokenizer::isAppropriateEndTag() const
</del><ins>+inline bool HTMLTokenizer::isAppropriateEndTag()
</ins><span class="cx"> {
</span><span class="cx"> if (m_bufferedEndTagName.size() != m_appropriateEndTagName.size())
</span><span class="cx"> return false;
</span><span class="cx">
</span><del>- unsigned size = m_bufferedEndTagName.size();
</del><ins>+ size_t numCharacters = m_bufferedEndTagName.size();
</ins><span class="cx">
</span><del>- for (unsigned i = 0; i < size; i++) {
</del><ins>+ for (size_t i = 0; i < numCharacters; i++) {
</ins><span class="cx"> if (m_bufferedEndTagName[i] != m_appropriateEndTagName[i])
</span><span class="cx"> return false;
</span><span class="cx"> }
</span><span class="lines">@@ -1457,6 +1603,7 @@
</span><span class="cx">
</span><span class="cx"> inline void HTMLTokenizer::parseError()
</span><span class="cx"> {
</span><ins>+ notImplemented();
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> }
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLTokenizerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLTokenizer.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLTokenizer.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLTokenizer.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
</del><ins>+ * Copyright (C) 2008 Apple Inc. All Rights Reserved.
</ins><span class="cx"> * Copyright (C) 2010 Google, Inc. All Rights Reserved.
</span><span class="cx"> *
</span><span class="cx"> * Redistribution and use in source and binary forms, with or without
</span><span class="lines">@@ -30,54 +30,19 @@
</span><span class="cx"> #include "HTMLParserOptions.h"
</span><span class="cx"> #include "HTMLToken.h"
</span><span class="cx"> #include "InputStreamPreprocessor.h"
</span><ins>+#include "SegmentedString.h"
</ins><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><del>-class SegmentedString;
-
</del><span class="cx"> class HTMLTokenizer {
</span><ins>+ WTF_MAKE_NONCOPYABLE(HTMLTokenizer);
+ WTF_MAKE_FAST_ALLOCATED;
</ins><span class="cx"> public:
</span><del>- explicit HTMLTokenizer(const HTMLParserOptions& = HTMLParserOptions());
</del><ins>+ explicit HTMLTokenizer(const HTMLParserOptions&);
+ ~HTMLTokenizer();
</ins><span class="cx">
</span><del>- // If we can't parse a whole token, this returns null.
- class TokenPtr;
- TokenPtr nextToken(SegmentedString&);
</del><ins>+ void reset();
</ins><span class="cx">
</span><del>- // Returns a copy of any characters buffered internally by the tokenizer.
- // The tokenizer buffers characters when searching for the </script> token that terminates a script element.
- String bufferedCharacters() const;
- size_t numberOfBufferedCharacters() const;
-
- // Updates the tokenizer's state according to the given tag name. This is an approximation of how the tree
- // builder would update the tokenizer's state. This method is useful for approximating HTML tokenization.
- // To get exactly the correct tokenization, you need the real tree builder.
- //
- // The main failures in the approximation are as follows:
- //
- // * The first set of character tokens emitted for a <pre> element might contain an extra leading newline.
- // * The replacement of U+0000 with U+FFFD will not be sensitive to the tree builder's insertion mode.
- // * CDATA sections in foreign content will be tokenized as bogus comments instead of as character tokens.
- //
- // This approximation is also the algorithm called for when parsing an HTML fragment.
- // https://html.spec.whatwg.org/multipage/syntax.html#parsing-html-fragments
- void updateStateFor(const AtomicString& tagName);
-
- void setForceNullCharacterReplacement(bool);
-
- bool shouldAllowCDATA() const;
- void setShouldAllowCDATA(bool);
-
- bool isInDataState() const;
-
- void setDataState();
- void setPLAINTEXTState();
- void setRAWTEXTState();
- void setRCDATAState();
- void setScriptDataState();
-
- bool neverSkipNullCharacters() const;
-
-private:
</del><span class="cx"> enum State {
</span><span class="cx"> DataState,
</span><span class="cx"> CharacterReferenceInDataState,
</span><span class="lines">@@ -123,7 +88,10 @@
</span><span class="cx"> AfterAttributeValueQuotedState,
</span><span class="cx"> SelfClosingStartTagState,
</span><span class="cx"> BogusCommentState,
</span><del>- ContinueBogusCommentState, // Not in the HTML spec, used internally to track whether we started the bogus comment token.
</del><ins>+ // The ContinueBogusCommentState is not in the HTML5 spec, but we use
+ // it internally to keep track of whether we've started the bogus
+ // comment token yet.
+ ContinueBogusCommentState,
</ins><span class="cx"> MarkupDeclarationOpenState,
</span><span class="cx"> CommentStartState,
</span><span class="cx"> CommentStartDashState,
</span><span class="lines">@@ -153,198 +121,156 @@
</span><span class="cx"> CDATASectionDoubleRightSquareBracketState,
</span><span class="cx"> };
</span><span class="cx">
</span><del>- bool processToken(SegmentedString&);
- bool processEntity(SegmentedString&);
</del><ins>+ // This function returns true if it emits a token. Otherwise, callers
+ // must provide the same (in progress) token on the next call (unless
+ // they call reset() first).
+ bool nextToken(SegmentedString&, HTMLToken&);
</ins><span class="cx">
</span><del>- void parseError();
</del><ins>+ // Returns a copy of any characters buffered internally by the tokenizer.
+ // The tokenizer buffers characters when searching for the </script> token
+ // that terminates a script element.
+ String bufferedCharacters() const;
</ins><span class="cx">
</span><del>- void bufferASCIICharacter(UChar);
- void bufferCharacter(UChar);
</del><ins>+ size_t numberOfBufferedCharacters() const
+ {
+ // Notice that we add 2 to the length of the m_temporaryBuffer to
+ // account for the "</" characters, which are effectively buffered in
+ // the tokenizer's state machine.
+ return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0;
+ }
</ins><span class="cx">
</span><del>- bool emitAndResumeInDataState(SegmentedString&);
- bool emitAndReconsumeInDataState();
- bool emitEndOfFile(SegmentedString&);
</del><ins>+ // Updates the tokenizer's state according to the given tag name. This is
+ // an approximation of how the tree builder would update the tokenizer's
+ // state. This method is useful for approximating HTML tokenization. To
+ // get exactly the correct tokenization, you need the real tree builder.
+ //
+ // The main failures in the approximation are as follows:
+ //
+ // * The first set of character tokens emitted for a <pre> element might
+ // contain an extra leading newline.
+ // * The replacement of U+0000 with U+FFFD will not be sensitive to the
+ // tree builder's insertion mode.
+ // * CDATA sections in foreign content will be tokenized as bogus comments
+ // instead of as character tokens.
+ //
+ void updateStateFor(const AtomicString& tagName);
</ins><span class="cx">
</span><del>- // Return true if we will emit a character token before dealing with the buffered end tag.
- void flushBufferedEndTag();
- bool commitToPartialEndTag(SegmentedString&, UChar, State);
- bool commitToCompleteEndTag(SegmentedString&);
</del><ins>+ bool forceNullCharacterReplacement() const { return m_forceNullCharacterReplacement; }
+ void setForceNullCharacterReplacement(bool value) { m_forceNullCharacterReplacement = value; }
</ins><span class="cx">
</span><del>- void appendToTemporaryBuffer(UChar);
- bool temporaryBufferIs(const char*);
</del><ins>+ bool shouldAllowCDATA() const { return m_shouldAllowCDATA; }
+ void setShouldAllowCDATA(bool value) { m_shouldAllowCDATA = value; }
</ins><span class="cx">
</span><del>- // Sometimes we speculatively consume input characters and we don't know whether they represent
- // end tags or RCDATA, etc. These functions help manage this state.
- bool inEndTagBufferingState() const;
- void appendToPossibleEndTag(UChar);
- void saveEndTagNameIfNeeded();
- bool isAppropriateEndTag() const;
</del><ins>+ State state() const { return m_state; }
+ void setState(State state) { m_state = state; }
</ins><span class="cx">
</span><del>- bool haveBufferedCharacterToken() const;
</del><ins>+ inline bool shouldSkipNullCharacters() const
+ {
+ return !m_forceNullCharacterReplacement
+ && (m_state == HTMLTokenizer::DataState
+ || m_state == HTMLTokenizer::RCDATAState
+ || m_state == HTMLTokenizer::RAWTEXTState);
+ }
</ins><span class="cx">
</span><del>- static bool isNullCharacterSkippingState(State);
-
- State m_state { DataState };
- bool m_forceNullCharacterReplacement { false };
- bool m_shouldAllowCDATA { false };
-
- mutable HTMLToken m_token;
-
- // https://html.spec.whatwg.org/#additional-allowed-character
- UChar m_additionalAllowedCharacter { 0 };
-
- // https://html.spec.whatwg.org/#preprocessing-the-input-stream
- InputStreamPreprocessor<HTMLTokenizer> m_preprocessor;
-
- Vector<UChar, 32> m_appropriateEndTagName;
-
- // https://html.spec.whatwg.org/#temporary-buffer
- Vector<LChar, 32> m_temporaryBuffer;
-
- // We occasionally want to emit both a character token and an end tag
- // token (e.g., when lexing script). We buffer the name of the end tag
- // token here so we remember it next time we re-enter the tokenizer.
- Vector<LChar, 32> m_bufferedEndTagName;
-
- const HTMLParserOptions m_options;
-};
-
-class HTMLTokenizer::TokenPtr {
-public:
- TokenPtr();
- ~TokenPtr();
-
- TokenPtr(TokenPtr&&);
- TokenPtr& operator=(TokenPtr&&) = delete;
-
- void clear();
-
- operator bool() const;
-
- HTMLToken& operator*() const;
- HTMLToken* operator->() const;
-
</del><span class="cx"> private:
</span><del>- friend class HTMLTokenizer;
- explicit TokenPtr(HTMLToken*);
</del><ins>+ inline bool processEntity(SegmentedString&);
</ins><span class="cx">
</span><del>- HTMLToken* m_token { nullptr };
-};
</del><ins>+ inline void parseError();
</ins><span class="cx">
</span><del>-inline HTMLTokenizer::TokenPtr::TokenPtr()
-{
-}
</del><ins>+ void bufferASCIICharacter(UChar character)
+ {
+ ASSERT(character != kEndOfFileMarker);
+ ASSERT(isASCII(character));
+ m_token->appendToCharacter(static_cast<LChar>(character));
+ }
</ins><span class="cx">
</span><del>-inline HTMLTokenizer::TokenPtr::TokenPtr(HTMLToken* token)
- : m_token(token)
-{
-}
</del><ins>+ void bufferCharacter(UChar character)
+ {
+ ASSERT(character != kEndOfFileMarker);
+ m_token->appendToCharacter(character);
+ }
+ void bufferCharacter(char) = delete;
+ void bufferCharacter(LChar) = delete;
</ins><span class="cx">
</span><del>-inline HTMLTokenizer::TokenPtr::~TokenPtr()
-{
- if (m_token)
- m_token->clear();
-}
</del><ins>+ inline bool emitAndResumeIn(SegmentedString& source, State state)
+ {
+ saveEndTagNameIfNeeded();
+ m_state = state;
+ source.advanceAndUpdateLineNumber();
+ return true;
+ }
+
+ inline bool emitAndReconsumeIn(SegmentedString&, State state)
+ {
+ saveEndTagNameIfNeeded();
+ m_state = state;
+ return true;
+ }
</ins><span class="cx">
</span><del>-inline HTMLTokenizer::TokenPtr::TokenPtr(TokenPtr&& other)
- : m_token(other.m_token)
-{
- other.m_token = nullptr;
-}
-
-inline void HTMLTokenizer::TokenPtr::clear()
-{
- if (m_token) {
</del><ins>+ inline bool emitEndOfFile(SegmentedString& source)
+ {
+ if (haveBufferedCharacterToken())
+ return true;
+ m_state = HTMLTokenizer::DataState;
+ source.advanceAndUpdateLineNumber();
</ins><span class="cx"> m_token->clear();
</span><del>- m_token = nullptr;
</del><ins>+ m_token->makeEndOfFile();
+ return true;
</ins><span class="cx"> }
</span><del>-}
</del><span class="cx">
</span><del>-inline HTMLTokenizer::TokenPtr::operator bool() const
-{
- return m_token;
-}
</del><ins>+ inline bool flushEmitAndResumeIn(SegmentedString&, State);
</ins><span class="cx">
</span><del>-inline HTMLToken& HTMLTokenizer::TokenPtr::operator*() const
-{
- ASSERT(m_token);
- return *m_token;
-}
</del><ins>+ // Return whether we need to emit a character token before dealing with
+ // the buffered end tag.
+ inline bool flushBufferedEndTag(SegmentedString&);
+ inline bool temporaryBufferIs(const String&);
</ins><span class="cx">
</span><del>-inline HTMLToken* HTMLTokenizer::TokenPtr::operator->() const
-{
- ASSERT(m_token);
- return m_token;
-}
</del><ins>+ // Sometimes we speculatively consume input characters and we don't
+ // know whether they represent end tags or RCDATA, etc. These
+ // functions help manage this state.
+ inline void addToPossibleEndTag(LChar cc);
</ins><span class="cx">
</span><del>-inline HTMLTokenizer::TokenPtr HTMLTokenizer::nextToken(SegmentedString& source)
-{
- return TokenPtr(processToken(source) ? &m_token : nullptr);
-}
</del><ins>+ inline void saveEndTagNameIfNeeded()
+ {
+ ASSERT(m_token->type() != HTMLToken::Uninitialized);
+ if (m_token->type() == HTMLToken::StartTag)
+ m_appropriateEndTagName = m_token->name();
+ }
+ inline bool isAppropriateEndTag();
</ins><span class="cx">
</span><del>-inline size_t HTMLTokenizer::numberOfBufferedCharacters() const
-{
- // Notice that we add 2 to the length of the m_temporaryBuffer to
- // account for the "</" characters, which are effectively buffered in
- // the tokenizer's state machine.
- return m_temporaryBuffer.size() ? m_temporaryBuffer.size() + 2 : 0;
-}
</del><span class="cx">
</span><del>-inline void HTMLTokenizer::setForceNullCharacterReplacement(bool value)
-{
- m_forceNullCharacterReplacement = value;
-}
</del><ins>+ inline bool haveBufferedCharacterToken()
+ {
+ return m_token->type() == HTMLToken::Character;
+ }
</ins><span class="cx">
</span><del>-inline bool HTMLTokenizer::shouldAllowCDATA() const
-{
- return m_shouldAllowCDATA;
-}
</del><ins>+ State m_state;
+ bool m_forceNullCharacterReplacement;
+ bool m_shouldAllowCDATA;
</ins><span class="cx">
</span><del>-inline void HTMLTokenizer::setShouldAllowCDATA(bool value)
-{
- m_shouldAllowCDATA = value;
-}
</del><ins>+ // m_token is owned by the caller. If nextToken is not on the stack,
+ // this member might be pointing to unallocated memory.
+ HTMLToken* m_token;
</ins><span class="cx">
</span><del>-inline bool HTMLTokenizer::isInDataState() const
-{
- return m_state == DataState;
-}
</del><ins>+ // http://www.whatwg.org/specs/web-apps/current-work/#additional-allowed-character
+ UChar m_additionalAllowedCharacter;
</ins><span class="cx">
</span><del>-inline void HTMLTokenizer::setDataState()
-{
- m_state = DataState;
-}
</del><ins>+ // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
+ InputStreamPreprocessor<HTMLTokenizer> m_inputStreamPreprocessor;
</ins><span class="cx">
</span><del>-inline void HTMLTokenizer::setPLAINTEXTState()
-{
- m_state = PLAINTEXTState;
-}
</del><ins>+ Vector<UChar, 32> m_appropriateEndTagName;
</ins><span class="cx">
</span><del>-inline void HTMLTokenizer::setRAWTEXTState()
-{
- m_state = RAWTEXTState;
-}
</del><ins>+ // http://www.whatwg.org/specs/web-apps/current-work/#temporary-buffer
+ Vector<LChar, 32> m_temporaryBuffer;
</ins><span class="cx">
</span><del>-inline void HTMLTokenizer::setRCDATAState()
-{
- m_state = RCDATAState;
-}
</del><ins>+ // We occasionally want to emit both a character token and an end tag
+ // token (e.g., when lexing script). We buffer the name of the end tag
+ // token here so we remember it next time we re-enter the tokenizer.
+ Vector<LChar, 32> m_bufferedEndTagName;
</ins><span class="cx">
</span><del>-inline void HTMLTokenizer::setScriptDataState()
-{
- m_state = ScriptDataState;
-}
</del><ins>+ HTMLParserOptions m_options;
+};
</ins><span class="cx">
</span><del>-inline bool HTMLTokenizer::isNullCharacterSkippingState(State state)
-{
- return state == DataState || state == RCDATAState || state == RAWTEXTState;
</del><span class="cx"> }
</span><span class="cx">
</span><del>-inline bool HTMLTokenizer::neverSkipNullCharacters() const
-{
- return m_forceNullCharacterReplacement;
-}
-
-}
-
</del><span class="cx"> #endif
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserHTMLTreeBuildercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/HTMLTreeBuilder.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -695,7 +695,7 @@
</span><span class="cx"> if (token.name() == plaintextTag) {
</span><span class="cx"> processFakePEndTagIfPInButtonScope();
</span><span class="cx"> m_tree.insertHTMLElement(&token);
</span><del>- m_parser.tokenizer().setPLAINTEXTState();
</del><ins>+ m_parser.tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
</ins><span class="cx"> return;
</span><span class="cx"> }
</span><span class="cx"> if (token.name() == buttonTag) {
</span><span class="lines">@@ -799,7 +799,7 @@
</span><span class="cx"> if (token.name() == textareaTag) {
</span><span class="cx"> m_tree.insertHTMLElement(&token);
</span><span class="cx"> m_shouldSkipLeadingNewline = true;
</span><del>- m_parser.tokenizer().setRCDATAState();
</del><ins>+ m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
</ins><span class="cx"> m_originalInsertionMode = m_insertionMode;
</span><span class="cx"> m_framesetOk = false;
</span><span class="cx"> m_insertionMode = InsertionMode::Text;
</span><span class="lines">@@ -2137,8 +2137,8 @@
</span><span class="cx"> // self-closing script tag was encountered and pre-HTML5 parser
</span><span class="cx"> // quirks are enabled. We must set the tokenizer's state to
</span><span class="cx"> // DataState explicitly if the tokenizer didn't have a chance to.
</span><del>- ASSERT(m_parser.tokenizer().isInDataState() || m_options.usePreHTML5ParserQuirks);
- m_parser.tokenizer().setDataState();
</del><ins>+ ASSERT(m_parser.tokenizer().state() == HTMLTokenizer::DataState || m_options.usePreHTML5ParserQuirks);
+ m_parser.tokenizer().setState(HTMLTokenizer::DataState);
</ins><span class="cx"> return;
</span><span class="cx"> }
</span><span class="cx"> m_tree.openElements().pop();
</span><span class="lines">@@ -2739,7 +2739,7 @@
</span><span class="cx"> {
</span><span class="cx"> ASSERT(token.type() == HTMLToken::StartTag);
</span><span class="cx"> m_tree.insertHTMLElement(&token);
</span><del>- m_parser.tokenizer().setRCDATAState();
</del><ins>+ m_parser.tokenizer().setState(HTMLTokenizer::RCDATAState);
</ins><span class="cx"> m_originalInsertionMode = m_insertionMode;
</span><span class="cx"> m_insertionMode = InsertionMode::Text;
</span><span class="cx"> }
</span><span class="lines">@@ -2748,7 +2748,7 @@
</span><span class="cx"> {
</span><span class="cx"> ASSERT(token.type() == HTMLToken::StartTag);
</span><span class="cx"> m_tree.insertHTMLElement(&token);
</span><del>- m_parser.tokenizer().setRAWTEXTState();
</del><ins>+ m_parser.tokenizer().setState(HTMLTokenizer::RAWTEXTState);
</ins><span class="cx"> m_originalInsertionMode = m_insertionMode;
</span><span class="cx"> m_insertionMode = InsertionMode::Text;
</span><span class="cx"> }
</span><span class="lines">@@ -2757,7 +2757,7 @@
</span><span class="cx"> {
</span><span class="cx"> ASSERT(token.type() == HTMLToken::StartTag);
</span><span class="cx"> m_tree.insertScriptElement(&token);
</span><del>- m_parser.tokenizer().setScriptDataState();
</del><ins>+ m_parser.tokenizer().setState(HTMLTokenizer::ScriptDataState);
</ins><span class="cx"> m_originalInsertionMode = m_insertionMode;
</span><span class="cx">
</span><span class="cx"> TextPosition position = m_parser.textPosition();
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserInputStreamPreprocessorh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/InputStreamPreprocessor.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -40,7 +40,7 @@
</span><span class="cx"> class InputStreamPreprocessor {
</span><span class="cx"> WTF_MAKE_NONCOPYABLE(InputStreamPreprocessor);
</span><span class="cx"> public:
</span><del>- explicit InputStreamPreprocessor(Tokenizer& tokenizer)
</del><ins>+ InputStreamPreprocessor(Tokenizer* tokenizer)
</ins><span class="cx"> : m_tokenizer(tokenizer)
</span><span class="cx"> {
</span><span class="cx"> reset();
</span><span class="lines">@@ -51,11 +51,8 @@
</span><span class="cx"> // Returns whether we succeeded in peeking at the next character.
</span><span class="cx"> // The only way we can fail to peek is if there are no more
</span><span class="cx"> // characters in |source| (after collapsing \r\n, etc).
</span><del>- ALWAYS_INLINE bool peek(SegmentedString& source, bool skipNullCharacters = false)
</del><ins>+ ALWAYS_INLINE bool peek(SegmentedString& source)
</ins><span class="cx"> {
</span><del>- if (source.isEmpty())
- return false;
-
</del><span class="cx"> m_nextInputCharacter = source.currentChar();
</span><span class="cx">
</span><span class="cx"> // Every branch in this function is expensive, so we have a
</span><span class="lines">@@ -67,14 +64,16 @@
</span><span class="cx"> m_skipNextNewLine = false;
</span><span class="cx"> return true;
</span><span class="cx"> }
</span><del>- return processNextInputCharacter(source, skipNullCharacters);
</del><ins>+ return processNextInputCharacter(source);
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> // Returns whether there are more characters in |source| after advancing.
</span><del>- ALWAYS_INLINE bool advance(SegmentedString& source, bool skipNullCharacters = false)
</del><ins>+ ALWAYS_INLINE bool advance(SegmentedString& source)
</ins><span class="cx"> {
</span><span class="cx"> source.advanceAndUpdateLineNumber();
</span><del>- return peek(source, skipNullCharacters);
</del><ins>+ if (source.isEmpty())
+ return false;
+ return peek(source);
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> bool skipNextNewLine() const { return m_skipNextNewLine; }
</span><span class="lines">@@ -86,7 +85,7 @@
</span><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> private:
</span><del>- bool processNextInputCharacter(SegmentedString& source, bool skipNullCharacters)
</del><ins>+ bool processNextInputCharacter(SegmentedString& source)
</ins><span class="cx"> {
</span><span class="cx"> ProcessAgain:
</span><span class="cx"> ASSERT(m_nextInputCharacter == source.currentChar());
</span><span class="lines">@@ -108,7 +107,7 @@
</span><span class="cx"> // by the replacement character. We suspect this is a problem with the spec as doing
</span><span class="cx"> // that filtering breaks surrogate pair handling and causes us not to match Minefield.
</span><span class="cx"> if (m_nextInputCharacter == '\0' && !shouldTreatNullAsEndOfFileMarker(source)) {
</span><del>- if (skipNullCharacters && !m_tokenizer.neverSkipNullCharacters()) {
</del><ins>+ if (m_tokenizer->shouldSkipNullCharacters()) {
</ins><span class="cx"> source.advancePastNonNewline();
</span><span class="cx"> if (source.isEmpty())
</span><span class="cx"> return false;
</span><span class="lines">@@ -126,7 +125,7 @@
</span><span class="cx"> return source.isClosed() && source.length() == 1;
</span><span class="cx"> }
</span><span class="cx">
</span><del>- Tokenizer& m_tokenizer;
</del><ins>+ Tokenizer* m_tokenizer;
</ins><span class="cx">
</span><span class="cx"> // http://www.whatwg.org/specs/web-apps/current-work/#next-input-character
</span><span class="cx"> UChar m_nextInputCharacter;
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserTextDocumentParsercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/TextDocumentParser.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/TextDocumentParser.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/TextDocumentParser.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -61,7 +61,7 @@
</span><span class="cx">
</span><span class="cx"> // Although Text Documents expose a "pre" element in their DOM, they
</span><span class="cx"> // act like a <plaintext> tag, so we have to force plaintext mode.
</span><del>- tokenizer().setPLAINTEXTState();
</del><ins>+ tokenizer().setState(HTMLTokenizer::PLAINTEXTState);
</ins><span class="cx">
</span><span class="cx"> m_haveInsertedFakePreElement = true;
</span><span class="cx"> }
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserXSSAuditorcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/XSSAuditor.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/XSSAuditor.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/XSSAuditor.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -566,7 +566,7 @@
</span><span class="cx"> String XSSAuditor::decodedSnippetForName(const FilterTokenRequest& request)
</span><span class="cx"> {
</span><span class="cx"> // Grab a fixed number of characters equal to the length of the token's name plus one (to account for the "<").
</span><del>- return fullyDecodeString(request.sourceTracker.source(request.token), m_encoding).substring(0, request.token.name().size() + 1);
</del><ins>+ return fullyDecodeString(request.sourceTracker.sourceForToken(request.token), m_encoding).substring(0, request.token.name().size() + 1);
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest& request, const HTMLToken::Attribute& attribute, AttributeKind treatment)
</span><span class="lines">@@ -575,9 +575,9 @@
</span><span class="cx"> // for an input of |name="value"|, the snippet is |name="value|. For an
</span><span class="cx"> // unquoted input of |name=value |, the snippet is |name=value|.
</span><span class="cx"> // FIXME: We should grab one character before the name also.
</span><del>- unsigned start = attribute.startOffset;
- unsigned end = attribute.endOffset;
- String decodedSnippet = fullyDecodeString(request.sourceTracker.source(request.token, start, end), m_encoding);
</del><ins>+ unsigned start = attribute.nameRange.start;
+ unsigned end = attribute.valueRange.end;
+ String decodedSnippet = fullyDecodeString(request.sourceTracker.sourceForToken(request.token).substring(start, end - start), m_encoding);
</ins><span class="cx"> decodedSnippet.truncate(kMaximumFragmentLengthTarget);
</span><span class="cx"> if (treatment == SrcLikeAttribute) {
</span><span class="cx"> int slashCount = 0;
</span><span class="lines">@@ -630,7 +630,7 @@
</span><span class="cx">
</span><span class="cx"> String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest& request)
</span><span class="cx"> {
</span><del>- String string = request.sourceTracker.source(request.token);
</del><ins>+ String string = request.sourceTracker.sourceForToken(request.token);
</ins><span class="cx"> size_t startPosition = 0;
</span><span class="cx"> size_t endPosition = string.length();
</span><span class="cx"> size_t foundPosition = notFound;
</span><span class="lines">@@ -737,4 +737,12 @@
</span><span class="cx"> return (m_documentURL.host() == resourceURL.host() && resourceURL.query().isEmpty());
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+bool XSSAuditor::isSafeToSendToAnotherThread() const
+{
+ return m_documentURL.isSafeToSendToAnotherThread()
+ && m_decodedURL.isSafeToSendToAnotherThread()
+ && m_decodedHTTPBody.isSafeToSendToAnotherThread()
+ && m_cachedDecodedSnippet.isSafeToSendToAnotherThread();
+}
+
</ins><span class="cx"> } // namespace WebCore
</span></span></pre></div>
<a id="trunkSourceWebCorehtmlparserXSSAuditorh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/XSSAuditor.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/XSSAuditor.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/parser/XSSAuditor.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -61,6 +61,7 @@
</span><span class="cx"> void initForFragment();
</span><span class="cx">
</span><span class="cx"> std::unique_ptr<XSSInfo> filterToken(const FilterTokenRequest&);
</span><ins>+ bool isSafeToSendToAnotherThread() const;
</ins><span class="cx">
</span><span class="cx"> private:
</span><span class="cx"> static const size_t kMaximumFragmentLengthTarget = 100;
</span></span></pre></div>
<a id="trunkSourceWebCorehtmltrackWebVTTTokenizercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/track/WebVTTTokenizer.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/track/WebVTTTokenizer.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/track/WebVTTTokenizer.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,6 +1,6 @@
</span><span class="cx"> /*
</span><span class="cx"> * Copyright (C) 2011, 2013 Google Inc. All rights reserved.
</span><del>- * Copyright (C) 2014-2015 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2014 Apple Inc. All rights reserved.
</ins><span class="cx"> *
</span><span class="cx"> * Redistribution and use in source and binary forms, with or without
</span><span class="cx"> * modification, are permitted provided that the following conditions are
</span><span class="lines">@@ -41,15 +41,19 @@
</span><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><del>-#define WEBVTT_ADVANCE_TO(stateName) \
- do { \
- ASSERT(!m_input.isEmpty()); \
- m_preprocessor.advance(m_input); \
- character = m_preprocessor.nextInputCharacter(); \
- goto stateName; \
</del><ins>+#define WEBVTT_BEGIN_STATE(stateName) case stateName: stateName:
+#define WEBVTT_ADVANCE_TO(stateName) \
+ do { \
+ state = stateName; \
+ ASSERT(!m_input.isEmpty()); \
+ m_inputStreamPreprocessor.advance(m_input); \
+ cc = m_inputStreamPreprocessor.nextInputCharacter(); \
+ goto stateName; \
</ins><span class="cx"> } while (false)
</span><ins>+
</ins><span class="cx">
</span><del>-template<unsigned charactersCount> ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
</del><ins>+template<unsigned charactersCount>
+ALWAYS_INLINE bool equalLiteral(const StringBuilder& s, const char (&characters)[charactersCount])
</ins><span class="cx"> {
</span><span class="cx"> return WTF::equal(s, reinterpret_cast<const LChar*>(characters), charactersCount - 1);
</span><span class="cx"> }
</span><span class="lines">@@ -75,7 +79,7 @@
</span><span class="cx">
</span><span class="cx"> WebVTTTokenizer::WebVTTTokenizer(const String& input)
</span><span class="cx"> : m_input(input)
</span><del>- , m_preprocessor(*this)
</del><ins>+ , m_inputStreamPreprocessor(this)
</ins><span class="cx"> {
</span><span class="cx"> // Append an EOF marker and close the input "stream".
</span><span class="cx"> ASSERT(!m_input.isClosed());
</span><span class="lines">@@ -85,12 +89,12 @@
</span><span class="cx">
</span><span class="cx"> bool WebVTTTokenizer::nextToken(WebVTTToken& token)
</span><span class="cx"> {
</span><del>- if (m_input.isEmpty() || !m_preprocessor.peek(m_input))
</del><ins>+ if (m_input.isEmpty() || !m_inputStreamPreprocessor.peek(m_input))
</ins><span class="cx"> return false;
</span><span class="cx">
</span><del>- UChar character = m_preprocessor.nextInputCharacter();
- if (character == kEndOfFileMarker) {
- m_preprocessor.advance(m_input);
</del><ins>+ UChar cc = m_inputStreamPreprocessor.nextInputCharacter();
+ if (cc == kEndOfFileMarker) {
+ m_inputStreamPreprocessor.advance(m_input);
</ins><span class="cx"> return false;
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -98,134 +102,169 @@
</span><span class="cx"> StringBuilder result;
</span><span class="cx"> StringBuilder classes;
</span><span class="cx">
</span><del>-// 4.8.10.13.4 WebVTT cue text tokenizer
-DataState:
- if (character == '&') {
- buffer.append('&');
- WEBVTT_ADVANCE_TO(EscapeState);
- } else if (character == '<') {
- if (result.isEmpty())
- WEBVTT_ADVANCE_TO(TagState);
</del><ins>+ enum {
+ DataState,
+ EscapeState,
+ TagState,
+ StartTagState,
+ StartTagClassState,
+ StartTagAnnotationState,
+ EndTagState,
+ TimestampTagState,
+ } state = DataState;
+
+ // 4.8.10.13.4 WebVTT cue text tokenizer
+ switch (state) {
+ WEBVTT_BEGIN_STATE(DataState) {
+ if (cc == '&') {
+ buffer.append(static_cast<LChar>(cc));
+ WEBVTT_ADVANCE_TO(EscapeState);
+ } else if (cc == '<') {
+ if (result.isEmpty())
+ WEBVTT_ADVANCE_TO(TagState);
+ else {
+ // We don't want to advance input or perform a state transition - just return a (new) token.
+ // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)
+ return emitToken(token, WebVTTToken::StringToken(result.toString()));
+ }
+ } else if (cc == kEndOfFileMarker)
+ return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
</ins><span class="cx"> else {
</span><del>- // We don't want to advance input or perform a state transition - just return a (new) token.
- // (On the next call to nextToken we will see '<' again, but take the other branch in this if instead.)
</del><ins>+ result.append(cc);
+ WEBVTT_ADVANCE_TO(DataState);
+ }
+ }
+ END_STATE()
+
+ WEBVTT_BEGIN_STATE(EscapeState) {
+ if (cc == ';') {
+ if (equalLiteral(buffer, "&amp"))
+ result.append('&');
+ else if (equalLiteral(buffer, "&lt"))
+ result.append('<');
+ else if (equalLiteral(buffer, "&gt"))
+ result.append('>');
+ else if (equalLiteral(buffer, "&lrm"))
+ result.append(leftToRightMark);
+ else if (equalLiteral(buffer, "&rlm"))
+ result.append(rightToLeftMark);
+ else if (equalLiteral(buffer, "&nbsp"))
+ result.append(noBreakSpace);
+ else {
+ buffer.append(static_cast<LChar>(cc));
+ result.append(buffer);
+ }
+ buffer.clear();
+ WEBVTT_ADVANCE_TO(DataState);
+ } else if (isASCIIAlphanumeric(cc)) {
+ buffer.append(static_cast<LChar>(cc));
+ WEBVTT_ADVANCE_TO(EscapeState);
+ } else if (cc == '<') {
+ result.append(buffer);
</ins><span class="cx"> return emitToken(token, WebVTTToken::StringToken(result.toString()));
</span><ins>+ } else if (cc == kEndOfFileMarker) {
+ result.append(buffer);
+ return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
+ } else {
+ result.append(buffer);
+ buffer.clear();
+
+ if (cc == '&') {
+ buffer.append(static_cast<LChar>(cc));
+ WEBVTT_ADVANCE_TO(EscapeState);
+ }
+ result.append(cc);
+ WEBVTT_ADVANCE_TO(DataState);
</ins><span class="cx"> }
</span><del>- } else if (character == kEndOfFileMarker)
- return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
- else {
- result.append(character);
- WEBVTT_ADVANCE_TO(DataState);
</del><span class="cx"> }
</span><ins>+ END_STATE()
</ins><span class="cx">
</span><del>-EscapeState:
- if (character == ';') {
- if (equalLiteral(buffer, "&amp"))
- result.append('&');
- else if (equalLiteral(buffer, "&lt"))
- result.append('<');
- else if (equalLiteral(buffer, "&gt"))
- result.append('>');
- else if (equalLiteral(buffer, "&lrm"))
- result.append(leftToRightMark);
- else if (equalLiteral(buffer, "&rlm"))
- result.append(rightToLeftMark);
- else if (equalLiteral(buffer, "&nbsp"))
- result.append(noBreakSpace);
</del><ins>+ WEBVTT_BEGIN_STATE(TagState) {
+ if (isTokenizerWhitespace(cc)) {
+ ASSERT(result.isEmpty());
+ WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+ } else if (cc == '.') {
+ ASSERT(result.isEmpty());
+ WEBVTT_ADVANCE_TO(StartTagClassState);
+ } else if (cc == '/') {
+ WEBVTT_ADVANCE_TO(EndTagState);
+ } else if (WTF::isASCIIDigit(cc)) {
+ result.append(cc);
+ WEBVTT_ADVANCE_TO(TimestampTagState);
+ } else if (cc == '>' || cc == kEndOfFileMarker) {
+ ASSERT(result.isEmpty());
+ return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
+ } else {
+ result.append(cc);
+ WEBVTT_ADVANCE_TO(StartTagState);
+ }
+ }
+ END_STATE()
+
+ WEBVTT_BEGIN_STATE(StartTagState) {
+ if (isTokenizerWhitespace(cc))
+ WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+ else if (cc == '.')
+ WEBVTT_ADVANCE_TO(StartTagClassState);
+ else if (cc == '>' || cc == kEndOfFileMarker)
+ return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
</ins><span class="cx"> else {
</span><del>- buffer.append(character);
- result.append(buffer);
</del><ins>+ result.append(cc);
+ WEBVTT_ADVANCE_TO(StartTagState);
</ins><span class="cx"> }
</span><del>- buffer.clear();
- WEBVTT_ADVANCE_TO(DataState);
- } else if (isASCIIAlphanumeric(character)) {
- buffer.append(character);
- WEBVTT_ADVANCE_TO(EscapeState);
- } else if (character == '<') {
- result.append(buffer);
- return emitToken(token, WebVTTToken::StringToken(result.toString()));
- } else if (character == kEndOfFileMarker) {
- result.append(buffer);
- return advanceAndEmitToken(m_input, token, WebVTTToken::StringToken(result.toString()));
- } else {
- result.append(buffer);
- buffer.clear();
</del><ins>+ }
+ END_STATE()
</ins><span class="cx">
</span><del>- if (character == '&') {
- buffer.append('&');
- WEBVTT_ADVANCE_TO(EscapeState);
</del><ins>+ WEBVTT_BEGIN_STATE(StartTagClassState) {
+ if (isTokenizerWhitespace(cc)) {
+ addNewClass(classes, buffer);
+ buffer.clear();
+ WEBVTT_ADVANCE_TO(StartTagAnnotationState);
+ } else if (cc == '.') {
+ addNewClass(classes, buffer);
+ buffer.clear();
+ WEBVTT_ADVANCE_TO(StartTagClassState);
+ } else if (cc == '>' || cc == kEndOfFileMarker) {
+ addNewClass(classes, buffer);
+ buffer.clear();
+ return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString()));
+ } else {
+ buffer.append(cc);
+ WEBVTT_ADVANCE_TO(StartTagClassState);
</ins><span class="cx"> }
</span><del>- result.append(character);
- WEBVTT_ADVANCE_TO(DataState);
</del><ins>+
</ins><span class="cx"> }
</span><ins>+ END_STATE()
</ins><span class="cx">
</span><del>-TagState:
- if (isTokenizerWhitespace(character)) {
- ASSERT(result.isEmpty());
</del><ins>+ WEBVTT_BEGIN_STATE(StartTagAnnotationState) {
+ if (cc == '>' || cc == kEndOfFileMarker) {
+ return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString()));
+ }
+ buffer.append(cc);
</ins><span class="cx"> WEBVTT_ADVANCE_TO(StartTagAnnotationState);
</span><del>- } else if (character == '.') {
- ASSERT(result.isEmpty());
- WEBVTT_ADVANCE_TO(StartTagClassState);
- } else if (character == '/') {
</del><ins>+ }
+ END_STATE()
+
+ WEBVTT_BEGIN_STATE(EndTagState) {
+ if (cc == '>' || cc == kEndOfFileMarker)
+ return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString()));
+ result.append(cc);
</ins><span class="cx"> WEBVTT_ADVANCE_TO(EndTagState);
</span><del>- } else if (WTF::isASCIIDigit(character)) {
- result.append(character);
- WEBVTT_ADVANCE_TO(TimestampTagState);
- } else if (character == '>' || character == kEndOfFileMarker) {
- ASSERT(result.isEmpty());
- return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
- } else {
- result.append(character);
- WEBVTT_ADVANCE_TO(StartTagState);
</del><span class="cx"> }
</span><ins>+ END_STATE()
</ins><span class="cx">
</span><del>-StartTagState:
- if (isTokenizerWhitespace(character))
- WEBVTT_ADVANCE_TO(StartTagAnnotationState);
- else if (character == '.')
- WEBVTT_ADVANCE_TO(StartTagClassState);
- else if (character == '>' || character == kEndOfFileMarker)
- return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString()));
- else {
- result.append(character);
- WEBVTT_ADVANCE_TO(StartTagState);
</del><ins>+ WEBVTT_BEGIN_STATE(TimestampTagState) {
+ if (cc == '>' || cc == kEndOfFileMarker)
+ return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString()));
+ result.append(cc);
+ WEBVTT_ADVANCE_TO(TimestampTagState);
</ins><span class="cx"> }
</span><ins>+ END_STATE()
</ins><span class="cx">
</span><del>-StartTagClassState:
- if (isTokenizerWhitespace(character)) {
- addNewClass(classes, buffer);
- buffer.clear();
- WEBVTT_ADVANCE_TO(StartTagAnnotationState);
- } else if (character == '.') {
- addNewClass(classes, buffer);
- buffer.clear();
- WEBVTT_ADVANCE_TO(StartTagClassState);
- } else if (character == '>' || character == kEndOfFileMarker) {
- addNewClass(classes, buffer);
- buffer.clear();
- return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString()));
- } else {
- buffer.append(character);
- WEBVTT_ADVANCE_TO(StartTagClassState);
</del><span class="cx"> }
</span><span class="cx">
</span><del>-StartTagAnnotationState:
- if (character == '>' || character == kEndOfFileMarker)
- return advanceAndEmitToken(m_input, token, WebVTTToken::StartTag(result.toString(), classes.toAtomicString(), buffer.toAtomicString()));
- buffer.append(character);
- WEBVTT_ADVANCE_TO(StartTagAnnotationState);
-
-EndTagState:
- if (character == '>' || character == kEndOfFileMarker)
- return advanceAndEmitToken(m_input, token, WebVTTToken::EndTag(result.toString()));
- result.append(character);
- WEBVTT_ADVANCE_TO(EndTagState);
-
-TimestampTagState:
- if (character == '>' || character == kEndOfFileMarker)
- return advanceAndEmitToken(m_input, token, WebVTTToken::TimestampTag(result.toString()));
- result.append(character);
- WEBVTT_ADVANCE_TO(TimestampTagState);
</del><ins>+ ASSERT_NOT_REACHED();
+ return false;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> }
</span></span></pre></div>
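<p>The restored WEBVTT_BEGIN_STATE / WEBVTT_ADVANCE_TO macros pair a <code>case</code> label with a <code>goto</code> label of the same name, so each transition both records the state and jumps straight to its handler. The following is a minimal standalone sketch of that pattern, not WebKit code: the macro names, <code>Token</code>, <code>kEnd</code>, and the two-state grammar are all hypothetical, and a plain <code>std::string</code> stands in for the preprocessed input stream.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the macros in the diff above (not WebKit's definitions).
#define SKETCH_BEGIN_STATE(stateName) case stateName: stateName:
#define SKETCH_ADVANCE_TO(stateName)                     \
    do {                                                 \
        state = stateName;                               \
        ++pos;                                           \
        cc = pos < input.size() ? input[pos] : kEnd;     \
        goto stateName;                                  \
    } while (false)

struct Token {
    bool isTag;
    std::string text;
};

// Assumed end-of-input marker for this sketch.
static const char kEnd = '\0';

// Reads one token starting at pos; returns false once the input is exhausted.
bool nextToken(const std::string& input, std::size_t& pos, Token& token)
{
    if (pos >= input.size())
        return false;

    char cc = input[pos];
    std::string result;
    enum { DataState, TagState } state = DataState;

    switch (state) {
    SKETCH_BEGIN_STATE(DataState) {
        if (cc == '<') {
            if (result.empty())
                SKETCH_ADVANCE_TO(TagState);
            // Leave '<' unconsumed; the next call sees it again and starts the tag.
            token = { false, result };
            return true;
        }
        if (cc == kEnd) {
            token = { false, result };
            return true;
        }
        result += cc;
        SKETCH_ADVANCE_TO(DataState);
    }
    SKETCH_BEGIN_STATE(TagState) {
        if (cc == '>' || cc == kEnd) {
            if (cc == '>')
                ++pos; // Consume the closing '>'.
            token = { true, result };
            return true;
        }
        result += cc;
        SKETCH_ADVANCE_TO(TagState);
    }
    }
    return false; // Not reached: every state either returns or jumps.
}
```

<p>Calling <code>nextToken</code> repeatedly on the same string and position yields alternating text and tag tokens. As in the diff, text that precedes a <code>&lt;</code> is emitted without consuming the <code>&lt;</code>, so the following call takes the tag branch.</p>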
<a id="trunkSourceWebCorehtmltrackWebVTTTokenizerh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/track/WebVTTTokenizer.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/track/WebVTTTokenizer.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/html/track/WebVTTTokenizer.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -40,15 +40,19 @@
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><span class="cx"> class WebVTTTokenizer {
</span><ins>+ WTF_MAKE_NONCOPYABLE(WebVTTTokenizer);
</ins><span class="cx"> public:
</span><span class="cx"> explicit WebVTTTokenizer(const String&);
</span><ins>+
</ins><span class="cx"> bool nextToken(WebVTTToken&);
</span><span class="cx">
</span><del>- static bool neverSkipNullCharacters() { return false; }
</del><ins>+ inline bool shouldSkipNullCharacters() const { return true; }
</ins><span class="cx">
</span><span class="cx"> private:
</span><span class="cx"> SegmentedString m_input;
</span><del>- InputStreamPreprocessor<WebVTTTokenizer> m_preprocessor;
</del><ins>+
+ // http://www.whatwg.org/specs/web-apps/current-work/#preprocessing-the-input-stream
+ InputStreamPreprocessor<WebVTTTokenizer> m_inputStreamPreprocessor;
</ins><span class="cx"> };
</span><span class="cx">
</span><span class="cx"> }
</span></span></pre></div>
<a id="trunkSourceWebCoreplatformtextSegmentedStringcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/platform/text/SegmentedString.cpp (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/platform/text/SegmentedString.cpp        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/platform/text/SegmentedString.cpp        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -20,8 +20,6 @@
</span><span class="cx"> #include "config.h"
</span><span class="cx"> #include "SegmentedString.h"
</span><span class="cx">
</span><del>-#include <wtf/text/TextPosition.h>
-
</del><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><span class="cx"> SegmentedString::SegmentedString(const SegmentedString& other)
</span><span class="lines">@@ -46,7 +44,7 @@
</span><span class="cx"> m_currentChar = m_currentString.m_length ? m_currentString.getCurrentChar() : 0;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-SegmentedString& SegmentedString::operator=(const SegmentedString& other)
</del><ins>+const SegmentedString& SegmentedString::operator=(const SegmentedString& other)
</ins><span class="cx"> {
</span><span class="cx"> m_pushedChar1 = other.m_pushedChar1;
</span><span class="cx"> m_pushedChar2 = other.m_pushedChar2;
</span><span class="lines">@@ -132,14 +130,14 @@
</span><span class="cx"> m_empty = false;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-void SegmentedString::pushBack(const SegmentedSubstring& s)
</del><ins>+void SegmentedString::prepend(const SegmentedSubstring& s)
</ins><span class="cx"> {
</span><del>- ASSERT(!m_pushedChar1);
</del><ins>+ ASSERT(!escaped());
</ins><span class="cx"> ASSERT(!s.numberOfCharactersConsumed());
</span><span class="cx"> if (!s.m_length)
</span><span class="cx"> return;
</span><span class="cx">
</span><del>- // FIXME: We're assuming that the characters were originally consumed by
</del><ins>+ // FIXME: We're assuming that the prepend were originally consumed by
</ins><span class="cx"> // this SegmentedString. We're also ASSERTing that s is a fresh
</span><span class="cx"> // SegmentedSubstring. These assumptions are sufficient for our
</span><span class="cx"> // current use, but we might need to handle the more elaborate
</span><span class="lines">@@ -168,7 +166,7 @@
</span><span class="cx"> void SegmentedString::append(const SegmentedString& s)
</span><span class="cx"> {
</span><span class="cx"> ASSERT(!m_closed);
</span><del>- ASSERT(!s.m_pushedChar1);
</del><ins>+ ASSERT(!s.escaped());
</ins><span class="cx"> append(s.m_currentString);
</span><span class="cx"> if (s.isComposite()) {
</span><span class="cx"> Deque<SegmentedSubstring>::const_iterator it = s.m_substrings.begin();
</span><span class="lines">@@ -179,17 +177,17 @@
</span><span class="cx"> m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
</span><span class="cx"> }
</span><span class="cx">
</span><del>-void SegmentedString::pushBack(const SegmentedString& s)
</del><ins>+void SegmentedString::prepend(const SegmentedString& s)
</ins><span class="cx"> {
</span><del>- ASSERT(!m_pushedChar1);
- ASSERT(!s.m_pushedChar1);
</del><ins>+ ASSERT(!escaped());
+ ASSERT(!s.escaped());
</ins><span class="cx"> if (s.isComposite()) {
</span><span class="cx"> Deque<SegmentedSubstring>::const_reverse_iterator it = s.m_substrings.rbegin();
</span><span class="cx"> Deque<SegmentedSubstring>::const_reverse_iterator e = s.m_substrings.rend();
</span><span class="cx"> for (; it != e; ++it)
</span><del>- pushBack(*it);
</del><ins>+ prepend(*it);
</ins><span class="cx"> }
</span><del>- pushBack(s.m_currentString);
</del><ins>+ prepend(s.m_currentString);
</ins><span class="cx"> m_currentChar = m_pushedChar1 ? m_pushedChar1 : (m_currentString.m_length ? m_currentString.getCurrentChar() : 0);
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -230,12 +228,12 @@
</span><span class="cx"> return result.toString();
</span><span class="cx"> }
</span><span class="cx">
</span><del>-void SegmentedString::advancePastNonNewlines(unsigned count, UChar* consumedCharacters)
</del><ins>+void SegmentedString::advance(unsigned count, UChar* consumedCharacters)
</ins><span class="cx"> {
</span><span class="cx"> ASSERT_WITH_SECURITY_IMPLICATION(count <= length());
</span><span class="cx"> for (unsigned i = 0; i < count; ++i) {
</span><span class="cx"> consumedCharacters[i] = currentChar();
</span><del>- advancePastNonNewline();
</del><ins>+ advance();
</ins><span class="cx"> }
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -355,7 +353,8 @@
</span><span class="cx">
</span><span class="cx"> OrdinalNumber SegmentedString::currentColumn() const
</span><span class="cx"> {
</span><del>- return OrdinalNumber::fromZeroBasedInt(numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine);
</del><ins>+ int zeroBasedColumn = numberOfCharactersConsumed() - m_numberOfCharactersConsumedPriorToCurrentLine;
+ return OrdinalNumber::fromZeroBasedInt(zeroBasedColumn);
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> void SegmentedString::setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength)
</span><span class="lines">@@ -364,18 +363,4 @@
</span><span class="cx"> m_numberOfCharactersConsumedPriorToCurrentLine = numberOfCharactersConsumed() + prologLength - columnAftreProlog.zeroBasedInt();
</span><span class="cx"> }
</span><span class="cx">
</span><del>-SegmentedString::AdvancePastResult SegmentedString::advancePastSlowCase(const char* literal, bool caseSensitive)
-{
- unsigned length = strlen(literal);
- if (length > this->length())
- return NotEnoughCharacters;
- UChar* consumedCharacters;
- String consumedString = String::createUninitialized(length, consumedCharacters);
- advancePastNonNewlines(length, consumedCharacters);
- if (consumedString.startsWith(literal, caseSensitive))
- return DidMatch;
- pushBack(SegmentedString(consumedString));
- return DidNotMatch;
</del><span class="cx"> }
</span><del>-
-}
</del></span></pre></div>
<a id="trunkSourceWebCoreplatformtextSegmentedStringh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/platform/text/SegmentedString.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/platform/text/SegmentedString.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/platform/text/SegmentedString.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- Copyright (C) 2004-2008, 2015 Apple Inc. All rights reserved.
</del><ins>+ Copyright (C) 2004, 2005, 2006, 2007, 2008 Apple Inc. All rights reserved.
</ins><span class="cx">
</span><span class="cx"> This library is free software; you can redistribute it and/or
</span><span class="cx"> modify it under the terms of the GNU Library General Public
</span><span class="lines">@@ -22,6 +22,8 @@
</span><span class="cx">
</span><span class="cx"> #include <wtf/Deque.h>
</span><span class="cx"> #include <wtf/text/StringBuilder.h>
</span><ins>+#include <wtf/text/TextPosition.h>
+#include <wtf/text/WTFString.h>
</ins><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><span class="lines">@@ -168,14 +170,16 @@
</span><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> SegmentedString(const SegmentedString&);
</span><del>- SegmentedString& operator=(const SegmentedString&);
</del><span class="cx">
</span><ins>+ const SegmentedString& operator=(const SegmentedString&);
+
</ins><span class="cx"> void clear();
</span><span class="cx"> void close();
</span><span class="cx">
</span><span class="cx"> void append(const SegmentedString&);
</span><del>- void pushBack(const SegmentedString&);
</del><ins>+ void prepend(const SegmentedString&);
</ins><span class="cx">
</span><ins>+ bool excludeLineNumbers() const { return m_currentString.excludeLineNumbers(); }
</ins><span class="cx"> void setExcludeLineNumbers();
</span><span class="cx">
</span><span class="cx"> void push(UChar c)
</span><span class="lines">@@ -195,10 +199,15 @@
</span><span class="cx">
</span><span class="cx"> bool isClosed() const { return m_closed; }
</span><span class="cx">
</span><del>- enum AdvancePastResult { DidNotMatch, DidMatch, NotEnoughCharacters };
- template<unsigned length> AdvancePastResult advancePast(const char (&literal)[length]) { return advancePast(literal, length - 1, true); }
- template<unsigned length> AdvancePastResult advancePastIgnoringCase(const char (&literal)[length]) { return advancePast(literal, length - 1, false); }
</del><ins>+ enum LookAheadResult {
+ DidNotMatch,
+ DidMatch,
+ NotEnoughCharacters,
+ };
</ins><span class="cx">
</span><ins>+ LookAheadResult lookAhead(const String& string) { return lookAheadInline(string, true); }
+ LookAheadResult lookAheadIgnoringCase(const String& string) { return lookAheadInline(string, false); }
+
</ins><span class="cx"> void advance()
</span><span class="cx"> {
</span><span class="cx"> if (m_fastPathFlags & Use8BitAdvance) {
</span><span class="lines">@@ -217,7 +226,7 @@
</span><span class="cx"> (this->*m_advanceFunc)();
</span><span class="cx"> }
</span><span class="cx">
</span><del>- void advanceAndUpdateLineNumber()
</del><ins>+ inline void advanceAndUpdateLineNumber()
</ins><span class="cx"> {
</span><span class="cx"> if (m_fastPathFlags & Use8BitAdvance) {
</span><span class="cx"> ASSERT(!m_pushedChar1);
</span><span class="lines">@@ -244,6 +253,18 @@
</span><span class="cx"> (this->*m_advanceAndUpdateLineNumberFunc)();
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+ void advanceAndASSERT(UChar expectedCharacter)
+ {
+ ASSERT_UNUSED(expectedCharacter, currentChar() == expectedCharacter);
+ advance();
+ }
+
+ void advanceAndASSERTIgnoringCase(UChar expectedCharacter)
+ {
+ ASSERT_UNUSED(expectedCharacter, u_foldCase(currentChar(), U_FOLD_CASE_DEFAULT) == u_foldCase(expectedCharacter, U_FOLD_CASE_DEFAULT));
+ advance();
+ }
+
</ins><span class="cx"> void advancePastNonNewline()
</span><span class="cx"> {
</span><span class="cx"> ASSERT(currentChar() != '\n');
</span><span class="lines">@@ -265,6 +286,12 @@
</span><span class="cx"> advanceAndUpdateLineNumberSlowCase();
</span><span class="cx"> }
</span><span class="cx">
</span><ins>+ // Writes the consumed characters into consumedCharacters, which must
+ // have space for at least |count| characters.
+ void advance(unsigned count, UChar* consumedCharacters);
+
+ bool escaped() const { return m_pushedChar1; }
+
</ins><span class="cx"> int numberOfCharactersConsumed() const
</span><span class="cx"> {
</span><span class="cx"> int numberOfPushedCharacters = 0;
</span><span class="lines">@@ -280,12 +307,12 @@
</span><span class="cx">
</span><span class="cx"> UChar currentChar() const { return m_currentChar; }
</span><span class="cx">
</span><ins>+ // The method is moderately slow, comparing to currentLine method.
</ins><span class="cx"> OrdinalNumber currentColumn() const;
</span><span class="cx"> OrdinalNumber currentLine() const;
</span><del>-
- // Sets value of line/column variables. Column is specified indirectly by a parameter columnAfterProlog
</del><ins>+ // Sets value of line/column variables. Column is specified indirectly by a parameter columnAftreProlog
</ins><span class="cx"> // which is a value of column that we should get after a prolog (first prologLength characters) has been consumed.
</span><del>- void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAfterProlog, int prologLength);
</del><ins>+ void setCurrentPosition(OrdinalNumber line, OrdinalNumber columnAftreProlog, int prologLength);
</ins><span class="cx">
</span><span class="cx"> private:
</span><span class="cx"> enum FastPathFlags {
</span><span class="lines">@@ -295,7 +322,7 @@
</span><span class="cx"> };
</span><span class="cx">
</span><span class="cx"> void append(const SegmentedSubstring&);
</span><del>- void pushBack(const SegmentedSubstring&);
</del><ins>+ void prepend(const SegmentedSubstring&);
</ins><span class="cx">
</span><span class="cx"> void advance8();
</span><span class="cx"> void advance16();
</span><span class="lines">@@ -347,13 +374,32 @@
</span><span class="cx"> updateSlowCaseFunctionPointers();
</span><span class="cx"> }
</span><span class="cx">
</span><del>- // Writes consumed characters into consumedCharacters, which must have space for at least |count| characters.
- void advancePastNonNewlines(unsigned count);
- void advancePastNonNewlines(unsigned count, UChar* consumedCharacters);
</del><ins>+ inline LookAheadResult lookAheadInline(const String& string, bool caseSensitive)
+ {
+ if (!m_pushedChar1 && string.length() <= static_cast<unsigned>(m_currentString.m_length)) {
+ String currentSubstring = m_currentString.currentSubString(string.length());
+ if (currentSubstring.startsWith(string, caseSensitive))
+ return DidMatch;
+ return DidNotMatch;
+ }
+ return lookAheadSlowCase(string, caseSensitive);
+ }
+
+ LookAheadResult lookAheadSlowCase(const String& string, bool caseSensitive)
+ {
+ unsigned count = string.length();
+ if (count > length())
+ return NotEnoughCharacters;
+ UChar* consumedCharacters;
+ String consumedString = String::createUninitialized(count, consumedCharacters);
+ advance(count, consumedCharacters);
+ LookAheadResult result = DidNotMatch;
+ if (consumedString.startsWith(string, caseSensitive))
+ result = DidMatch;
+ prepend(SegmentedString(consumedString));
+ return result;
+ }
</ins><span class="cx">
</span><del>- AdvancePastResult advancePast(const char* literal, unsigned length, bool caseSensitive);
- AdvancePastResult advancePastSlowCase(const char* literal, bool caseSensitive);
-
</del><span class="cx"> bool isComposite() const { return !m_substrings.isEmpty(); }
</span><span class="cx">
</span><span class="cx"> UChar m_pushedChar1;
</span><span class="lines">@@ -371,27 +417,6 @@
</span><span class="cx"> void (SegmentedString::*m_advanceAndUpdateLineNumberFunc)();
</span><span class="cx"> };
</span><span class="cx">
</span><del>-inline void SegmentedString::advancePastNonNewlines(unsigned count)
-{
- for (unsigned i = 0; i < count; ++i)
- advancePastNonNewline();
</del><span class="cx"> }
</span><span class="cx">
</span><del>-inline SegmentedString::AdvancePastResult SegmentedString::advancePast(const char* literal, unsigned length, bool caseSensitive)
-{
- ASSERT(strlen(literal) == length);
- ASSERT(!strchr(literal, '\n'));
- if (!m_pushedChar1) {
- if (length <= static_cast<unsigned>(m_currentString.m_length)) {
- if (!m_currentString.currentSubString(length).startsWith(literal, caseSensitive))
- return DidNotMatch;
- advancePastNonNewlines(length);
- return DidMatch;
- }
- }
- return advancePastSlowCase(literal, caseSensitive);
-}
-
-}
-
</del><span class="cx"> #endif
</span></span></pre></div>
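<p>The key behavioral difference in this header is that the restored <code>lookAhead</code> leaves the stream untouched, whereas the removed <code>advancePast</code> consumed the literal on a match. The slow path in the diff does this by consuming <code>count</code> characters and then prepending them again whether or not they matched. The following is a rough sketch of that consume-then-restore idea, assuming a simple <code>std::deque&lt;char&gt;</code> buffer; <code>SketchStream</code> and its members are hypothetical and omit pushed characters, segments, and line tracking.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

enum LookAheadResult { DidNotMatch, DidMatch, NotEnoughCharacters };

// Hypothetical simplified stream; not the real SegmentedString.
class SketchStream {
public:
    explicit SketchStream(const std::string& s)
        : m_chars(s.begin(), s.end())
    {
    }

    std::size_t length() const { return m_chars.size(); }

    // Consumes count characters from the front and returns them.
    std::string advance(std::size_t count)
    {
        assert(count <= length());
        std::string consumed(m_chars.begin(), m_chars.begin() + count);
        m_chars.erase(m_chars.begin(), m_chars.begin() + count);
        return consumed;
    }

    // Puts previously consumed characters back at the front.
    void prepend(const std::string& s)
    {
        m_chars.insert(m_chars.begin(), s.begin(), s.end());
    }

    // Case-sensitive peek: reports whether the stream starts with expected,
    // and always restores whatever it consumed while checking.
    LookAheadResult lookAhead(const std::string& expected)
    {
        if (expected.size() > length())
            return NotEnoughCharacters;
        std::string consumed = advance(expected.size());
        LookAheadResult result = consumed == expected ? DidMatch : DidNotMatch;
        prepend(consumed);
        return result;
    }

private:
    std::deque<char> m_chars;
};
```

<p>Because the characters are restored unconditionally, a caller that gets <code>DidMatch</code> still has to advance past the literal itself, which is presumably what the restored <code>advanceAndASSERT</code> helpers are used for.</p>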
<a id="trunkSourceWebCorexmlparserCharacterReferenceParserInlinesh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/xml/parser/CharacterReferenceParserInlines.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -31,6 +31,11 @@
</span><span class="cx">
</span><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><ins>+inline bool isHexDigit(UChar cc)
+{
+ return (cc >= '0' && cc <= '9') || (cc >= 'a' && cc <= 'f') || (cc >= 'A' && cc <= 'F');
+}
+
</ins><span class="cx"> inline void unconsumeCharacters(SegmentedString& source, const StringBuilder& consumedCharacters)
</span><span class="cx"> {
</span><span class="cx"> if (consumedCharacters.length() == 1)
</span><span class="lines">@@ -39,7 +44,7 @@
</span><span class="cx"> source.push(consumedCharacters[0]);
</span><span class="cx"> source.push(consumedCharacters[1]);
</span><span class="cx"> } else
</span><del>- source.pushBack(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
</del><ins>+ source.prepend(SegmentedString(consumedCharacters.toStringPreserveCapacity()));
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> template <typename ParserFunctions>
</span><span class="lines">@@ -49,7 +54,7 @@
</span><span class="cx"> ASSERT(!notEnoughCharacters);
</span><span class="cx"> ASSERT(decodedCharacter.isEmpty());
</span><span class="cx">
</span><del>- enum {
</del><ins>+ enum EntityState {
</ins><span class="cx"> Initial,
</span><span class="cx"> Number,
</span><span class="cx"> MaybeHexLowerCaseX,
</span><span class="lines">@@ -57,97 +62,111 @@
</span><span class="cx"> Hex,
</span><span class="cx"> Decimal,
</span><span class="cx"> Named
</span><del>- } state = Initial;
</del><ins>+ };
+ EntityState entityState = Initial;
</ins><span class="cx"> UChar32 result = 0;
</span><ins>+ bool overflow = false;
+ const UChar32 highestValidCharacter = 0x10FFFF;
</ins><span class="cx"> StringBuilder consumedCharacters;
</span><span class="cx">
</span><span class="cx"> while (!source.isEmpty()) {
</span><del>- UChar character = source.currentChar();
- switch (state) {
- case Initial:
- if (character == '\x09' || character == '\x0A' || character == '\x0C' || character == ' ' || character == '<' || character == '&')
</del><ins>+ UChar cc = source.currentChar();
+ switch (entityState) {
+ case Initial: {
+ if (cc == '\x09' || cc == '\x0A' || cc == '\x0C' || cc == ' ' || cc == '<' || cc == '&')
</ins><span class="cx"> return false;
</span><del>- if (additionalAllowedCharacter && character == additionalAllowedCharacter)
</del><ins>+ if (additionalAllowedCharacter && cc == additionalAllowedCharacter)
</ins><span class="cx"> return false;
</span><del>- if (character == '#') {
- state = Number;
</del><ins>+ if (cc == '#') {
+ entityState = Number;
</ins><span class="cx"> break;
</span><span class="cx"> }
</span><del>- if (isASCIIAlpha(character)) {
- state = Named;
- goto Named;
</del><ins>+ if ((cc >= 'a' && cc <= 'z') || (cc >= 'A' && cc <= 'Z')) {
+ entityState = Named;
+ continue;
</ins><span class="cx"> }
</span><span class="cx"> return false;
</span><del>- case Number:
- if (character == 'x') {
- state = MaybeHexLowerCaseX;
</del><ins>+ }
+ case Number: {
+ if (cc == 'x') {
+ entityState = MaybeHexLowerCaseX;
</ins><span class="cx"> break;
</span><span class="cx"> }
</span><del>- if (character == 'X') {
- state = MaybeHexUpperCaseX;
</del><ins>+ if (cc == 'X') {
+ entityState = MaybeHexUpperCaseX;
</ins><span class="cx"> break;
</span><span class="cx"> }
</span><del>- if (isASCIIDigit(character)) {
- state = Decimal;
- goto Decimal;
</del><ins>+ if (cc >= '0' && cc <= '9') {
+ entityState = Decimal;
+ continue;
</ins><span class="cx"> }
</span><span class="cx"> source.push('#');
</span><span class="cx"> return false;
</span><del>- case MaybeHexLowerCaseX:
- if (isASCIIHexDigit(character)) {
- state = Hex;
- goto Hex;
</del><ins>+ }
+ case MaybeHexLowerCaseX: {
+ if (isHexDigit(cc)) {
+ entityState = Hex;
+ continue;
</ins><span class="cx"> }
</span><span class="cx"> source.push('#');
</span><span class="cx"> source.push('x');
</span><span class="cx"> return false;
</span><del>- case MaybeHexUpperCaseX:
- if (isASCIIHexDigit(character)) {
- state = Hex;
- goto Hex;
</del><ins>+ }
+ case MaybeHexUpperCaseX: {
+ if (isHexDigit(cc)) {
+ entityState = Hex;
+ continue;
</ins><span class="cx"> }
</span><span class="cx"> source.push('#');
</span><span class="cx"> source.push('X');
</span><span class="cx"> return false;
</span><del>- case Hex:
- Hex:
- if (isASCIIHexDigit(character)) {
- result = result * 16 + toASCIIHexValue(character);
- break;
- }
- if (character == ';') {
- source.advance();
- decodedCharacter.append(ParserFunctions::legalEntityFor(result));
</del><ins>+ }
+ case Hex: {
+ if (cc >= '0' && cc <= '9')
+ result = result * 16 + cc - '0';
+ else if (cc >= 'a' && cc <= 'f')
+ result = result * 16 + 10 + cc - 'a';
+ else if (cc >= 'A' && cc <= 'F')
+ result = result * 16 + 10 + cc - 'A';
+ else if (cc == ';') {
+ source.advanceAndASSERT(cc);
+ decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
</ins><span class="cx"> return true;
</span><del>- }
- if (ParserFunctions::acceptMalformed()) {
- decodedCharacter.append(ParserFunctions::legalEntityFor(result));
</del><ins>+ } else if (ParserFunctions::acceptMalformed()) {
+ decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
</ins><span class="cx"> return true;
</span><ins>+ } else {
+ unconsumeCharacters(source, consumedCharacters);
+ return false;
</ins><span class="cx"> }
</span><del>- unconsumeCharacters(source, consumedCharacters);
- return false;
- case Decimal:
- Decimal:
- if (isASCIIDigit(character)) {
- // FIXME: What about overflow?
- result = result * 10 + character - '0';
- break;
- }
- if (character == ';') {
- source.advance();
- decodedCharacter.append(ParserFunctions::legalEntityFor(result));
</del><ins>+ if (result > highestValidCharacter)
+ overflow = true;
+ break;
+ }
+ case Decimal: {
+ if (cc >= '0' && cc <= '9')
+ result = result * 10 + cc - '0';
+ else if (cc == ';') {
+ source.advanceAndASSERT(cc);
+ decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
</ins><span class="cx"> return true;
</span><ins>+ } else if (ParserFunctions::acceptMalformed()) {
+ decodedCharacter.append(ParserFunctions::legalEntityFor(overflow ? 0 : result));
+ return true;
+ } else {
+ unconsumeCharacters(source, consumedCharacters);
+ return false;
</ins><span class="cx"> }
</span><del>- if (ParserFunctions::acceptMalformed())
- decodedCharacter.append(ParserFunctions::legalEntityFor(result));
- unconsumeCharacters(source, consumedCharacters);
- return false;
- case Named:
- Named:
- return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, character);
</del><ins>+ if (result > highestValidCharacter)
+ overflow = true;
+ break;
</ins><span class="cx"> }
</span><del>- consumedCharacters.append(character);
- source.advance();
</del><ins>+ case Named: {
+ return ParserFunctions::consumeNamedEntity(source, decodedCharacter, notEnoughCharacters, additionalAllowedCharacter, cc);
+ }
+ }
+ consumedCharacters.append(cc);
+ source.advanceAndASSERT(cc);
</ins><span class="cx"> }
</span><span class="cx"> ASSERT(source.isEmpty());
</span><span class="cx"> notEnoughCharacters = true;
</span></span></pre></div>
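<p>The Hex and Decimal cases in the diff above accumulate a numeric character reference digit by digit and set an overflow flag once the value exceeds 0x10FFFF, then pass 0 to legalEntityFor in place of the out-of-range value. The following is a minimal stand-alone sketch of that idea, not WebKit code: decodeNumericReference and its std::optional return type are hypothetical names chosen for illustration. Unlike the diff, the sketch stops accumulating once the flag is set, so the accumulator cannot wrap on very long digit runs.</p>
<pre>
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper for illustration only; it is not part of WebKit.
// `text` is whatever follows the leading ampersand and number sign,
// for example "65;" or "x41;".
// Returns the decoded code point, 0 when the value exceeds U+10FFFF,
// or std::nullopt when the reference is malformed or unterminated.
inline std::optional<char32_t> decodeNumericReference(std::string_view text)
{
    constexpr std::uint32_t highestValidCharacter = 0x10FFFF;
    std::uint32_t result = 0;
    bool overflow = false;
    bool sawDigit = false;
    unsigned base = 10;
    std::size_t i = 0;

    // An 'x' or 'X' prefix selects hexadecimal, as in the MaybeHex states.
    if (i < text.size() && (text[i] == 'x' || text[i] == 'X')) {
        base = 16;
        ++i;
    }

    for (; i < text.size(); ++i) {
        const char cc = text[i];
        unsigned digit = 0;
        if (cc >= '0' && cc <= '9')
            digit = cc - '0';
        else if (base == 16 && cc >= 'a' && cc <= 'f')
            digit = 10 + cc - 'a';
        else if (base == 16 && cc >= 'A' && cc <= 'F')
            digit = 10 + cc - 'A';
        else if (cc == ';' && sawDigit)
            return static_cast<char32_t>(overflow ? 0 : result);
        else
            return std::nullopt;

        sawDigit = true;
        // Stop accumulating after the first out-of-range value so that
        // `result` can never wrap around.
        if (!overflow) {
            result = result * base + digit;
            if (result > highestValidCharacter)
                overflow = true;
        }
    }
    return std::nullopt; // Input ended before the terminating ';'.
}
```
</pre>
<p>A caller would then map the 0 result to whatever replacement character its parser uses, which is the role legalEntityFor plays in the diff.</p>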
<a id="trunkSourceWebCorexmlparserMarkupTokenizerInlinesh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/xml/parser/MarkupTokenizerInlines.h (178172 => 178173)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/xml/parser/MarkupTokenizerInlines.h        2015-01-09 17:16:15 UTC (rev 178172)
+++ trunk/Source/WebCore/xml/parser/MarkupTokenizerInlines.h        2015-01-09 17:44:37 UTC (rev 178173)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2008, 2015 Apple Inc. All Rights Reserved.
</del><ins>+ * Copyright (C) 2008 Apple Inc. All Rights Reserved.
</ins><span class="cx"> * Copyright (C) 2009 Torch Mobile, Inc. http://www.torchmobile.com/
</span><span class="cx"> * Copyright (C) 2010 Google, Inc. All Rights Reserved.
</span><span class="cx"> *
</span><span class="lines">@@ -30,61 +30,64 @@
</span><span class="cx">
</span><span class="cx"> #include "SegmentedString.h"
</span><span class="cx">
</span><del>-#if COMPILER(MSVC)
-// Disable the "unreachable code" warning so we can compile the ASSERT_NOT_REACHED in the END_STATE macro.
-#pragma warning(disable: 4702)
-#endif
-
</del><span class="cx"> namespace WebCore {
</span><span class="cx">
</span><del>-inline bool isTokenizerWhitespace(UChar character)
</del><ins>+inline bool isTokenizerWhitespace(UChar cc)
</ins><span class="cx"> {
</span><del>- return character == ' ' || character == '\x0A' || character == '\x09' || character == '\x0C';
</del><ins>+ return cc == ' ' || cc == '\x0A' || cc == '\x09' || cc == '\x0C';
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-#define BEGIN_STATE(stateName) \
- case stateName: \
- stateName: { \
- const auto currentState = stateName; \
- UNUSED_PARAM(currentState);
</del><ins>+inline void advanceStringAndASSERTIgnoringCase(SegmentedString& source, const char* expectedCharacters)
+{
+ while (*expectedCharacters)
+ source.advanceAndASSERTIgnoringCase(*expectedCharacters++);
+}
</ins><span class="cx">
</span><del>-#define END_STATE() \
- ASSERT_NOT_REACHED(); \
- break; \
- }
</del><ins>+inline void advanceStringAndASSERT(SegmentedString& source, const char* expectedCharacters)
+{
+ while (*expectedCharacters)
+ source.advanceAndASSERT(*expectedCharacters++);
+}
</ins><span class="cx">
</span><del>-#define RETURN_IN_CURRENT_STATE(expression) \
- do { \
- m_state = currentState; \
- return expression; \
- } while (false)
</del><ins>+#if COMPILER(MSVC)
+// We need to disable the "unreachable code" warning because we want to assert
+// that some code points aren't reached in the state machine.
+#pragma warning(disable: 4702)
+#endif
</ins><span class="cx">
</span><del>-// We use this macro when the HTML spec says "reconsume the current input character in the <mumble> state."
-#define RECONSUME_IN(newState) \
- do { \
- goto newState; \
</del><ins>+#define BEGIN_STATE(prefix, stateName) case prefix::stateName: stateName:
+#define END_STATE() ASSERT_NOT_REACHED(); break;
+
+// We use this macro when the HTML5 spec says "reconsume the current input
+// character in the <mumble> state."
+#define RECONSUME_IN(prefix, stateName) \
+ do { \
+ m_state = prefix::stateName; \
+ goto stateName; \
</ins><span class="cx"> } while (false)
</span><span class="cx">
</span><del>-// We use this macro when the HTML spec says "consume the next input character ... and switch to the <mumble> state."
-#define ADVANCE_TO(newState) \
- do { \
- if (!m_preprocessor.advance(source, isNullCharacterSkippingState(newState))) { \
- m_state = newState; \
- return haveBufferedCharacterToken(); \
- } \
- character = m_preprocessor.nextInputCharacter(); \
- goto newState; \
</del><ins>+// We use this macro when the HTML5 spec says "consume the next input
+// character ... and switch to the <mumble> state."
+#define ADVANCE_TO(prefix, stateName) \
+ do { \
+ m_state = prefix::stateName; \
+ if (!m_inputStreamPreprocessor.advance(source)) \
+ return haveBufferedCharacterToken(); \
+ cc = m_inputStreamPreprocessor.nextInputCharacter(); \
+ goto stateName; \
</ins><span class="cx"> } while (false)
</span><span class="cx">
</span><del>-// For more complex cases, caller consumes the characters first and then uses this macro.
-#define SWITCH_TO(newState) \
- do { \
- if (!m_preprocessor.peek(source, isNullCharacterSkippingState(newState))) { \
- m_state = newState; \
- return haveBufferedCharacterToken(); \
- } \
- character = m_preprocessor.nextInputCharacter(); \
- goto newState; \
</del><ins>+// Sometimes there's more complicated logic in the spec that separates when
+// we consume the next input character and when we switch to a particular
+// state. We handle those cases by advancing the source directly and using
+// this macro to switch to the indicated state.
+#define SWITCH_TO(prefix, stateName) \
+ do { \
+ m_state = prefix::stateName; \
+ if (source.isEmpty() || !m_inputStreamPreprocessor.peek(source)) \
+ return haveBufferedCharacterToken(); \
+ cc = m_inputStreamPreprocessor.nextInputCharacter(); \
+ goto stateName; \
</ins><span class="cx"> } while (false)
</span><span class="cx">
</span><span class="cx"> }
</span></span></pre>
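<p>The restored BEGIN_STATE, ADVANCE_TO and RECONSUME_IN macros pair each case label with a goto label of the same name, so a state transition can jump straight to its target without going back through the switch. The toy tokenizer below sketches that pattern under simplifying assumptions: ToyState, countTagOpenings and the local variables input, position, state and cc are hypothetical, and the input is a plain string rather than a SegmentedString fed through an input stream preprocessor.</p>
<pre>
```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical toy tokenizer for illustration; the macro names mirror
// the diff, but none of this is WebKit code.
enum class ToyState { Data, TagOpen, TagName };

// Each state is reachable both through the switch and through goto.
#define BEGIN_STATE(prefix, stateName) case prefix::stateName: stateName:
#define END_STATE() assert(false); break;

// Consume the next character, then jump to the named state.
#define ADVANCE_TO(prefix, stateName)     \
    do {                                  \
        state = prefix::stateName;        \
        if (++position >= input.size())   \
            return tagCount;              \
        cc = input[position];             \
        goto stateName;                   \
    } while (false)

// Jump to the named state without consuming anything.
#define RECONSUME_IN(prefix, stateName)   \
    do {                                  \
        state = prefix::stateName;        \
        goto stateName;                   \
    } while (false)

// Counts occurrences of '<' immediately followed by an ASCII letter.
inline int countTagOpenings(std::string_view input)
{
    int tagCount = 0;
    std::size_t position = 0;
    // A resumable tokenizer would keep this in a member so it can pick up
    // where it stopped; here it only selects the entry point.
    ToyState state = ToyState::Data;
    if (input.empty())
        return tagCount;
    char cc = input[position];

    switch (state) {
    BEGIN_STATE(ToyState, Data) {
        if (cc == '<')
            ADVANCE_TO(ToyState, TagOpen);
        ADVANCE_TO(ToyState, Data);
    }
    END_STATE()

    BEGIN_STATE(ToyState, TagOpen) {
        if ((cc >= 'a' && cc <= 'z') || (cc >= 'A' && cc <= 'Z')) {
            ++tagCount;
            ADVANCE_TO(ToyState, TagName);
        }
        // Not a tag after all: treat this same character as ordinary data.
        RECONSUME_IN(ToyState, Data);
    }
    END_STATE()

    BEGIN_STATE(ToyState, TagName) {
        if (cc == '>')
            ADVANCE_TO(ToyState, Data);
        ADVANCE_TO(ToyState, TagName);
    }
    END_STATE()
    }
    return tagCount;
}
```
</pre>
<p>Because every path through a state ends in a goto or a return, the END_STATE assertion is never reached, which is why the header disables the MSVC "unreachable code" warning.</p>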
</div>
</div>
</body>
</html>