<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[195074] trunk/Source/WebCore</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }
#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/195074">195074</a></dd>
<dt>Author</dt> <dd>dbates@webkit.org</dd>
<dt>Date</dt> <dd>2016-01-14 13:40:13 -0800 (Thu, 14 Jan 2016)</dd>
</dl>

<h3>Log Message</h3>
<pre>[XSS Auditor] Extract attribute truncation logic and formalize string canonicalization
https://bugs.webkit.org/show_bug.cgi?id=152874

Reviewed by Brent Fulgham.

Derived from Blink patch (by Tom Sepez &lt;tsepez@chromium.org&gt;):
&lt;https://src.chromium.org/viewvc/blink?revision=176339&amp;view=revision&gt;

Extract the src-like and script-like attribute truncation logic into independent functions
towards making it more straightforward to re-purpose this logic. Additionally, formalize the
concept of string canonicalization as a member function that consolidates the process of
decoding URL escape sequences, truncating the decoded string (if applicable), and removing
characters that are considered noise.

* html/parser/XSSAuditor.cpp:
(WebCore::truncateForSrcLikeAttribute): Extracted from XSSAuditor::decodedSnippetForAttribute().
(WebCore::truncateForScriptLikeAttribute): Ditto.
(WebCore::XSSAuditor::init): Write in terms of XSSAuditor::canonicalize().
(WebCore::XSSAuditor::filterCharacterToken): Updated to make use of formalized canonicalization methods.
(WebCore::XSSAuditor::filterScriptToken): Ditto.
(WebCore::XSSAuditor::filterObjectToken): Ditto.
(WebCore::XSSAuditor::filterParamToken): Ditto.
(WebCore::XSSAuditor::filterEmbedToken): Ditto.
(WebCore::XSSAuditor::filterAppletToken): Ditto.
(WebCore::XSSAuditor::filterFrameToken): Ditto.
(WebCore::XSSAuditor::filterInputToken): Ditto.
(WebCore::XSSAuditor::filterButtonToken): Ditto.
(WebCore::XSSAuditor::eraseDangerousAttributesIfInjected): Ditto.
(WebCore::XSSAuditor::eraseAttributeIfInjected): Updated code to use early return style and avoid an unnecessary string
comparison when we know that a src attribute was injected.
(WebCore::XSSAuditor::canonicalizedSnippetForTagName): Renamed; formerly known as XSSAuditor::decodedSnippetForName(). Updated
to make use of XSSAuditor::canonicalize().
(WebCore::XSSAuditor::snippetFromAttribute): Renamed; formerly known as XSSAuditor::decodedSnippetForAttribute(). Moved
truncation logic from here to WebCore::truncateFor{Script, Src}LikeAttribute.
(WebCore::XSSAuditor::canonicalize): Added.
(WebCore::XSSAuditor::canonicalizedSnippetForJavaScript): Added.
(WebCore::canonicalize): Deleted.
(WebCore::XSSAuditor::decodedSnippetForName): Deleted.
(WebCore::XSSAuditor::decodedSnippetForAttribute): Deleted.
(WebCore::XSSAuditor::decodedSnippetForJavaScript): Deleted.
* html/parser/XSSAuditor.h: Define enum class for the various attribute truncation styles.</pre>
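<p>The canonicalization described above can be sketched as follows. This is a simplified, hypothetical illustration of XSSAuditor::canonicalize() with TruncationStyle::None (the form the diff below uses for the document URL and the HTTP body), not WebKit code. It works in place on a byte buffer instead of a WTF::String. The helper names are made up for this sketch. Its decoder only undoes %XX escapes and repeats the pass until the string stops shrinking, whereas the real fullyDecodeString() also takes the document's text encoding (m_encoding) into account. The noise characters are the ones tested by isNonCanonicalCharacter() in the diff below.</p>
<pre>
```cpp
// Hypothetical, simplified sketch of canonicalization without truncation:
// decode %XX escapes until the string stops shrinking, then drop the
// characters that isNonCanonicalCharacter() treats as noise.

// Value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= 'a')
        c -= 'a' - 'A'; // fold lowercase letters to uppercase
    if (c >= 'A')
        return c > 'F' ? -1 : c - 'A' + 10;
    if (c >= '0')
        return c > '9' ? -1 : c - '0';
    return -1;
}

// Decoded byte of a %XX escape starting at s[i], or -1 if none starts there.
static int escapedByteAt(const char* s, int length, int i)
{
    if (s[i] != '%' || i + 2 >= length)
        return -1;
    int high = hexValue(s[i + 1]);
    int low = hexValue(s[i + 2]);
    if (high == -1 || low == -1)
        return -1;
    return high * 16 + low;
}

// One left-to-right pass of %XX decoding, in place; returns the new length.
static int decodePercentEscapesOnce(char* s, int length)
{
    int out = 0;
    for (int i = 0; i != length; ++out) {
        int decoded = escapedByteAt(s, length, i);
        if (decoded == -1) {
            s[out] = s[i];
            i += 1;
        } else {
            s[out] = char(decoded);
            i += 3;
        }
    }
    return out;
}

// Repeat decoding so that multiply encoded input (for example %253C) is fully decoded.
static int fullyDecode(char* s, int length)
{
    int previousLength;
    do {
        previousLength = length;
        length = decodePercentEscapesOnce(s, length);
    } while (previousLength > length);
    return length;
}

// Same test as isNonCanonicalCharacter() in XSSAuditor.cpp, applied to single bytes.
static bool isNonCanonicalCharacter(unsigned char c)
{
    return c == '\\' || c == '0' || c == '\0' || c == '/' || c >= 127;
}

// Canonicalizes the first `length` bytes of s in place and returns the new length.
int canonicalizeWithoutTruncation(char* s, int length)
{
    length = fullyDecode(s, length);
    int out = 0;
    for (int i = 0; i != length; ++i) {
        if (!isNonCanonicalCharacter(s[i]))
            s[out++] = s[i];
    }
    return out;
}
```
</pre>
<p>With this sketch, the doubly encoded input %253Cscript%253E/x canonicalizes to &lt;script&gt;x. Two decoding passes recover the angle brackets, and the slash is then dropped as noise.</p>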

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCorehtmlparserXSSAuditorcpp">trunk/Source/WebCore/html/parser/XSSAuditor.cpp</a></li>
<li><a href="#trunkSourceWebCorehtmlparserXSSAuditorh">trunk/Source/WebCore/html/parser/XSSAuditor.h</a></li>
</ul>

</div>
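<p>The src-like rule that this change moves into truncateForSrcLikeAttribute() (see the XSSAuditor.cpp diff below) can be tried out in isolation. What follows is a hypothetical port to a plain byte buffer, not WebKit code. The function name srcLikeTruncatedLength is made up. It returns the length to cut the snippet to instead of calling String::truncate(). It also omits the kMaximumFragmentLengthTarget cap that canonicalize() applies before choosing a truncation style.</p>
<pre>
```cpp
// Hypothetical byte-buffer port of the src-like truncation rule: stop at the
// first '?' or '#', at the third slash, or at the first slash or less-than
// sign once a comma (the start of a data: URL payload) has been seen.
// Returns the length the snippet should be truncated to.
int srcLikeTruncatedLength(const char* s, int length)
{
    const char lessThan = 0x3C; // the less-than sign
    int slashCount = 0;
    bool commaSeen = false;
    for (int i = 0; i != length; ++i) {
        char c = s[i];
        if (c == '?' || c == '#')
            return i;
        if (c == '/' || c == '\\') {
            // As in the original, slashes are only counted before a comma.
            if (commaSeen || ++slashCount > 2)
                return i;
        }
        if (c == lessThan) {
            if (commaSeen)
                return i;
        }
        if (c == ',')
            commaSeen = true;
    }
    return length; // nothing to truncate
}
```
</pre>
<p>For the snippet src="http://evil.example/a.js?x, the sketch keeps only src="http://evil.example. Everything from the third slash onward may come from the page itself, so it is not treated as part of the reflected vector.</p>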
<div id="patch">
<h3>Diff</h3>
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (195073 => 195074)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog        2016-01-14 21:37:49 UTC (rev 195073)
+++ trunk/Source/WebCore/ChangeLog        2016-01-14 21:40:13 UTC (rev 195074)
</span><span class="lines">@@ -1,5 +1,49 @@
</span><span class="cx"> 2016-01-14  Daniel Bates  &lt;dabates@apple.com&gt;
</span><span class="cx"> 
</span><ins>+        [XSS Auditor] Extract attribute truncation logic and formalize string canonicalization
+        https://bugs.webkit.org/show_bug.cgi?id=152874
+
+        Reviewed by Brent Fulgham.
+
+        Derived from Blink patch (by Tom Sepez &lt;tsepez@chromium.org&gt;):
+        &lt;https://src.chromium.org/viewvc/blink?revision=176339&amp;view=revision&gt;
+
+        Extract the src-like and script-like attribute truncation logic into independent functions
+        towards making it more straightforward to re-purpose this logic. Additionally, formalize the
+        concept of string canonicalization as a member function that consolidates the process of
+        decoding URL escape sequences, truncating the decoded string (if applicable), and removing
+        characters that are considered noise.
+
+        * html/parser/XSSAuditor.cpp:
+        (WebCore::truncateForSrcLikeAttribute): Extracted from XSSAuditor::decodedSnippetForAttribute().
+        (WebCore::truncateForScriptLikeAttribute): Ditto.
+        (WebCore::XSSAuditor::init): Write in terms of XSSAuditor::canonicalize().
+        (WebCore::XSSAuditor::filterCharacterToken): Updated to make use of formalized canonicalization methods.
+        (WebCore::XSSAuditor::filterScriptToken): Ditto.
+        (WebCore::XSSAuditor::filterObjectToken): Ditto.
+        (WebCore::XSSAuditor::filterParamToken): Ditto.
+        (WebCore::XSSAuditor::filterEmbedToken): Ditto.
+        (WebCore::XSSAuditor::filterAppletToken): Ditto.
+        (WebCore::XSSAuditor::filterFrameToken): Ditto.
+        (WebCore::XSSAuditor::filterInputToken): Ditto.
+        (WebCore::XSSAuditor::filterButtonToken): Ditto.
+        (WebCore::XSSAuditor::eraseDangerousAttributesIfInjected): Ditto.
+        (WebCore::XSSAuditor::eraseAttributeIfInjected): Updated code to use early return style and avoid an unnecessary string
+        comparison when we know that a src attribute was injected.
+        (WebCore::XSSAuditor::canonicalizedSnippetForTagName): Renamed; formerly known as XSSAuditor::decodedSnippetForName(). Updated
+        to make use of XSSAuditor::canonicalize().
+        (WebCore::XSSAuditor::snippetFromAttribute): Renamed; formerly known as XSSAuditor::decodedSnippetForAttribute(). Moved
+        truncation logic from here to WebCore::truncateFor{Script, Src}LikeAttribute.
+        (WebCore::XSSAuditor::canonicalize): Added.
+        (WebCore::XSSAuditor::canonicalizedSnippetForJavaScript): Added.
+        (WebCore::canonicalize): Deleted.
+        (WebCore::XSSAuditor::decodedSnippetForName): Deleted.
+        (WebCore::XSSAuditor::decodedSnippetForAttribute): Deleted.
+        (WebCore::XSSAuditor::decodedSnippetForJavaScript): Deleted.
+        * html/parser/XSSAuditor.h: Define enum class for the various attribute truncation styles.
+
+2016-01-14  Daniel Bates  &lt;dabates@apple.com&gt;
+
</ins><span class="cx">         [XSS Auditor] Partial bypass when web server collapses path components
</span><span class="cx">         https://bugs.webkit.org/show_bug.cgi?id=152872
</span><span class="cx"> 
</span></span></pre></div>
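<p>Likewise, the script-like rule in truncateForScriptLikeAttribute() (in the XSSAuditor.cpp diff that follows) finds the first equals sign and skips any whitespace after it. It then steps over an opening quote, if there is one, and stops at the first terminating character. The sketch below is a hypothetical byte-buffer port, and its helper names are made up. isTerminatingCharacter() is not part of this diff, so the terminator set used here (ampersand, slash, less-than sign, and either quote) is an assumption drawn from the function's comment. The real set may differ.</p>
<pre>
```cpp
// Hypothetical byte-buffer port of the script-like truncation rule.
// ASSUMPTION: the terminator set is inferred from the comment in
// truncateForScriptLikeAttribute(); WebKit's isTerminatingCharacter() may differ.

typedef bool (*CharacterPredicate)(char);

static bool isEqualsSign(char c) { return c == '='; }

static bool isHTMLQuote(char c) { return c == '"' || c == '\''; }

static bool isNotHTMLSpace(char c)
{
    return !(c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f');
}

static bool isAssumedTerminatingCharacter(char c)
{
    // 0x26 is the ampersand and 0x3C is the less-than sign.
    return c == 0x26 || c == '/' || c == 0x3C || isHTMLQuote(c);
}

// Index of the first character at or after `from` matching `predicate`,
// or `length` if there is none (standing in for String::find() returning notFound).
static int findFrom(const char* s, int length, int from, CharacterPredicate predicate)
{
    for (int i = from; i != length; ++i) {
        if (predicate(s[i]))
            return i;
    }
    return length;
}

// Returns the length the snippet should be truncated to.
int scriptLikeTruncatedLength(const char* s, int length)
{
    int position = findFrom(s, length, 0, isEqualsSign);
    if (position == length)
        return length;
    position = findFrom(s, length, position + 1, isNotHTMLSpace);
    if (position == length)
        return length;
    // A quote right after the equals sign opens the value, so skip over it.
    int start = isHTMLQuote(s[position]) ? position + 1 : position;
    return findFrom(s, length, start, isAssumedTerminatingCharacter);
}
```
</pre>
<p>For onclick="alert(1)//rest, the sketch keeps onclick="alert(1). The // that could comment out trailing page content is excluded from the snippet that is matched against the request.</p>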
<a id="trunkSourceWebCorehtmlparserXSSAuditorcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/XSSAuditor.cpp (195073 => 195074)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/XSSAuditor.cpp        2016-01-14 21:37:49 UTC (rev 195073)
+++ trunk/Source/WebCore/html/parser/XSSAuditor.cpp        2016-01-14 21:40:13 UTC (rev 195074)
</span><span class="lines">@@ -63,11 +63,6 @@
</span><span class="cx">     return (c == '\\' || c == '0' || c == '\0' || c == '/' || c &gt;= 127);
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-static String canonicalize(const String&amp; string)
-{
-    return string.removeCharacters(&amp;isNonCanonicalCharacter);
-}
-
</del><span class="cx"> static bool isRequiredForInjection(UChar c)
</span><span class="cx"> {
</span><span class="cx">     return (c == '\'' || c == '&quot;' || c == '&lt;' || c == '&gt;');
</span><span class="lines">@@ -180,6 +175,57 @@
</span><span class="cx">     return workingString;
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+static void truncateForSrcLikeAttribute(String&amp; decodedSnippet)
+{
+    // In HTTP URLs, characters following the first ?, #, or third slash may come from
+    // the page itself and can be merely ignored by an attacker's server when a remote
+    // script or script-like resource is requested. In DATA URLS, the payload starts at
+    // the first comma, and the the first /*, //, or &lt;!-- may introduce a comment. Characters
+    // following this may come from the page itself and may be ignored when the script is
+    // executed. For simplicity, we don't differentiate based on URL scheme, and stop at
+    // the first # or ?, the third slash, or the first slash or &lt; once a comma is seen.
+    int slashCount = 0;
+    bool commaSeen = false;
+    for (size_t currentLength = 0; currentLength &lt; decodedSnippet.length(); ++currentLength) {
+        UChar currentChar = decodedSnippet[currentLength];
+        if (currentChar == '?'
+            || currentChar == '#'
+            || ((currentChar == '/' || currentChar == '\\') &amp;&amp; (commaSeen || ++slashCount &gt; 2))
+            || (currentChar == '&lt;' &amp;&amp; commaSeen)) {
+            decodedSnippet.truncate(currentLength);
+            return;
+        }
+        if (currentChar == ',')
+            commaSeen = true;
+    }
+}
+
+static void truncateForScriptLikeAttribute(String&amp; decodedSnippet)
+{
+    // Beware of trailing characters which came from the page itself, not the
+    // injected vector. Excluding the terminating character covers common cases
+    // where the page immediately ends the attribute, but doesn't cover more
+    // complex cases where there is other page data following the injection.
+    // Generally, these won't parse as JavaScript, so the injected vector
+    // typically excludes them from consideration via a single-line comment or
+    // by enclosing them in a string literal terminated later by the page's own
+    // closing punctuation. Since the snippet has not been parsed, the vector
+    // may also try to introduce these via entities. As a result, we'd like to
+    // stop before the first &quot;//&quot;, the first &lt;!--, the first entity, or the first
+    // quote not immediately following the first equals sign (taking whitespace
+    // into consideration). To keep things simpler, we don't try to distinguish
+    // between entity-introducing ampersands vs. other uses, nor do we bother to
+    // check for a second slash for a comment, nor do we bother to check for
+    // !-- following a less-than sign. We stop instead on any ampersand
+    // slash, or less-than sign.
+    size_t position = 0;
+    if ((position = decodedSnippet.find('=')) != notFound
+        &amp;&amp; (position = decodedSnippet.find(isNotHTMLSpace, position + 1)) != notFound
+        &amp;&amp; (position = decodedSnippet.find(isTerminatingCharacter, isHTMLQuote(decodedSnippet[position]) ? position + 1 : position)) != notFound) {
+        decodedSnippet.truncate(position);
+    }
+}
+
</ins><span class="cx"> static ContentSecurityPolicy::ReflectedXSSDisposition combineXSSProtectionHeaderAndCSP(ContentSecurityPolicy::ReflectedXSSDisposition xssProtection, ContentSecurityPolicy::ReflectedXSSDisposition reflectedXSS)
</span><span class="cx"> {
</span><span class="cx">     ContentSecurityPolicy::ReflectedXSSDisposition result = std::max(xssProtection, reflectedXSS);
</span><span class="lines">@@ -269,7 +315,7 @@
</span><span class="cx">     if (document-&gt;decoder())
</span><span class="cx">         m_encoding = document-&gt;decoder()-&gt;encoding();
</span><span class="cx"> 
</span><del>-    m_decodedURL = canonicalize(fullyDecodeString(m_documentURL.string(), m_encoding));
</del><ins>+    m_decodedURL = canonicalize(m_documentURL.string(), TruncationStyle::None);
</ins><span class="cx">     if (m_decodedURL.find(isRequiredForInjection) == notFound)
</span><span class="cx">         m_decodedURL = String();
</span><span class="cx"> 
</span><span class="lines">@@ -307,7 +353,7 @@
</span><span class="cx">         if (httpBody &amp;&amp; !httpBody-&gt;isEmpty()) {
</span><span class="cx">             httpBodyAsString = httpBody-&gt;flattenToString();
</span><span class="cx">             if (!httpBodyAsString.isEmpty()) {
</span><del>-                m_decodedHTTPBody = canonicalize(fullyDecodeString(httpBodyAsString, m_encoding));
</del><ins>+                m_decodedHTTPBody = canonicalize(httpBodyAsString, TruncationStyle::None);
</ins><span class="cx">                 if (m_decodedHTTPBody.find(isRequiredForInjection) == notFound)
</span><span class="cx">                     m_decodedHTTPBody = String();
</span><span class="cx">                 if (m_decodedHTTPBody.length() &gt;= minimumLengthForSuffixTree)
</span><span class="lines">@@ -389,7 +435,7 @@
</span><span class="cx"> bool XSSAuditor::filterCharacterToken(const FilterTokenRequest&amp; request)
</span><span class="cx"> {
</span><span class="cx">     ASSERT(m_scriptTagNestingLevel);
</span><del>-    if (m_wasScriptTagFoundInRequest &amp;&amp; isContainedInRequest(decodedSnippetForJavaScript(request))) {
</del><ins>+    if (m_wasScriptTagFoundInRequest &amp;&amp; isContainedInRequest(canonicalizedSnippetForJavaScript(request))) {
</ins><span class="cx">         request.token.clear();
</span><span class="cx">         LChar space = ' ';
</span><span class="cx">         request.token.appendToCharacter(space); // Technically, character tokens can't be empty.
</span><span class="lines">@@ -403,12 +449,12 @@
</span><span class="cx">     ASSERT(request.token.type() == HTMLToken::StartTag);
</span><span class="cx">     ASSERT(hasName(request.token, scriptTag));
</span><span class="cx"> 
</span><del>-    m_wasScriptTagFoundInRequest = isContainedInRequest(decodedSnippetForName(request));
</del><ins>+    m_wasScriptTagFoundInRequest = isContainedInRequest(canonicalizedSnippetForTagName(request));
</ins><span class="cx"> 
</span><span class="cx">     bool didBlockScript = false;
</span><span class="cx">     if (m_wasScriptTagFoundInRequest) {
</span><del>-        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, blankURL().string(), SrcLikeAttribute);
-        didBlockScript |= eraseAttributeIfInjected(request, XLinkNames::hrefAttr, blankURL().string(), SrcLikeAttribute);
</del><ins>+        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
+        didBlockScript |= eraseAttributeIfInjected(request, XLinkNames::hrefAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     return didBlockScript;
</span><span class="lines">@@ -420,8 +466,8 @@
</span><span class="cx">     ASSERT(hasName(request.token, objectTag));
</span><span class="cx"> 
</span><span class="cx">     bool didBlockScript = false;
</span><del>-    if (isContainedInRequest(decodedSnippetForName(request))) {
-        didBlockScript |= eraseAttributeIfInjected(request, dataAttr, blankURL().string(), SrcLikeAttribute);
</del><ins>+    if (isContainedInRequest(canonicalizedSnippetForTagName(request))) {
+        didBlockScript |= eraseAttributeIfInjected(request, dataAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
</ins><span class="cx">         didBlockScript |= eraseAttributeIfInjected(request, typeAttr);
</span><span class="cx">         didBlockScript |= eraseAttributeIfInjected(request, classidAttr);
</span><span class="cx">     }
</span><span class="lines">@@ -441,7 +487,7 @@
</span><span class="cx">     if (!HTMLParamElement::isURLParameter(String(nameAttribute.value)))
</span><span class="cx">         return false;
</span><span class="cx"> 
</span><del>-    return eraseAttributeIfInjected(request, valueAttr, blankURL().string(), SrcLikeAttribute);
</del><ins>+    return eraseAttributeIfInjected(request, valueAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> bool XSSAuditor::filterEmbedToken(const FilterTokenRequest&amp; request)
</span><span class="lines">@@ -450,9 +496,9 @@
</span><span class="cx">     ASSERT(hasName(request.token, embedTag));
</span><span class="cx"> 
</span><span class="cx">     bool didBlockScript = false;
</span><del>-    if (isContainedInRequest(decodedSnippetForName(request))) {
-        didBlockScript |= eraseAttributeIfInjected(request, codeAttr, String(), SrcLikeAttribute);
-        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, blankURL().string(), SrcLikeAttribute);
</del><ins>+    if (isContainedInRequest(canonicalizedSnippetForTagName(request))) {
+        didBlockScript |= eraseAttributeIfInjected(request, codeAttr, String(), TruncationStyle::SrcLikeAttribute);
+        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
</ins><span class="cx">         didBlockScript |= eraseAttributeIfInjected(request, typeAttr);
</span><span class="cx">     }
</span><span class="cx">     return didBlockScript;
</span><span class="lines">@@ -464,8 +510,8 @@
</span><span class="cx">     ASSERT(hasName(request.token, appletTag));
</span><span class="cx"> 
</span><span class="cx">     bool didBlockScript = false;
</span><del>-    if (isContainedInRequest(decodedSnippetForName(request))) {
-        didBlockScript |= eraseAttributeIfInjected(request, codeAttr, String(), SrcLikeAttribute);
</del><ins>+    if (isContainedInRequest(canonicalizedSnippetForTagName(request))) {
+        didBlockScript |= eraseAttributeIfInjected(request, codeAttr, String(), TruncationStyle::SrcLikeAttribute);
</ins><span class="cx">         didBlockScript |= eraseAttributeIfInjected(request, objectAttr);
</span><span class="cx">     }
</span><span class="cx">     return didBlockScript;
</span><span class="lines">@@ -476,9 +522,9 @@
</span><span class="cx">     ASSERT(request.token.type() == HTMLToken::StartTag);
</span><span class="cx">     ASSERT(hasName(request.token, iframeTag) || hasName(request.token, frameTag));
</span><span class="cx"> 
</span><del>-    bool didBlockScript = eraseAttributeIfInjected(request, srcdocAttr, String(), ScriptLikeAttribute);
-    if (isContainedInRequest(decodedSnippetForName(request)))
-        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, String(), SrcLikeAttribute);
</del><ins>+    bool didBlockScript = eraseAttributeIfInjected(request, srcdocAttr, String(), TruncationStyle::ScriptLikeAttribute);
+    if (isContainedInRequest(canonicalizedSnippetForTagName(request)))
+        didBlockScript |= eraseAttributeIfInjected(request, srcAttr, String(), TruncationStyle::SrcLikeAttribute);
</ins><span class="cx"> 
</span><span class="cx">     return didBlockScript;
</span><span class="cx"> }
</span><span class="lines">@@ -512,7 +558,7 @@
</span><span class="cx">     ASSERT(request.token.type() == HTMLToken::StartTag);
</span><span class="cx">     ASSERT(hasName(request.token, inputTag));
</span><span class="cx"> 
</span><del>-    return eraseAttributeIfInjected(request, formactionAttr, blankURL().string(), SrcLikeAttribute);
</del><ins>+    return eraseAttributeIfInjected(request, formactionAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> bool XSSAuditor::filterButtonToken(const FilterTokenRequest&amp; request)
</span><span class="lines">@@ -520,7 +566,7 @@
</span><span class="cx">     ASSERT(request.token.type() == HTMLToken::StartTag);
</span><span class="cx">     ASSERT(hasName(request.token, buttonTag));
</span><span class="cx"> 
</span><del>-    return eraseAttributeIfInjected(request, formactionAttr, blankURL().string(), SrcLikeAttribute);
</del><ins>+    return eraseAttributeIfInjected(request, formactionAttr, blankURL().string(), TruncationStyle::SrcLikeAttribute);
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> bool XSSAuditor::eraseDangerousAttributesIfInjected(const FilterTokenRequest&amp; request)
</span><span class="lines">@@ -536,7 +582,7 @@
</span><span class="cx">         bool valueContainsJavaScriptURL = (!isInlineEventHandler &amp;&amp; protocolIsJavaScript(strippedValue)) || (isSemicolonSeparatedAttribute(attribute) &amp;&amp; semicolonSeparatedValueContainsJavaScriptURL(strippedValue));
</span><span class="cx">         if (!isInlineEventHandler &amp;&amp; !valueContainsJavaScriptURL)
</span><span class="cx">             continue;
</span><del>-        if (!isContainedInRequest(decodedSnippetForAttribute(request, attribute, ScriptLikeAttribute)))
</del><ins>+        if (!isContainedInRequest(canonicalize(snippetFromAttribute(request, attribute), TruncationStyle::ScriptLikeAttribute)))
</ins><span class="cx">             continue;
</span><span class="cx">         request.token.eraseValueOfAttribute(i);
</span><span class="cx">         if (valueContainsJavaScriptURL)
</span><span class="lines">@@ -546,94 +592,59 @@
</span><span class="cx">     return didBlockScript;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-bool XSSAuditor::eraseAttributeIfInjected(const FilterTokenRequest&amp; request, const QualifiedName&amp; attributeName, const String&amp; replacementValue, AttributeKind treatment)
</del><ins>+bool XSSAuditor::eraseAttributeIfInjected(const FilterTokenRequest&amp; request, const QualifiedName&amp; attributeName, const String&amp; replacementValue, TruncationStyle truncationStyle)
</ins><span class="cx"> {
</span><span class="cx">     size_t indexOfAttribute = 0;
</span><del>-    if (findAttributeWithName(request.token, attributeName, indexOfAttribute)) {
-        const HTMLToken::Attribute&amp; attribute = request.token.attributes().at(indexOfAttribute);
-        if (isContainedInRequest(decodedSnippetForAttribute(request, attribute, treatment))) {
-            if (threadSafeMatch(attributeName, srcAttr) &amp;&amp; isLikelySafeResource(String(attribute.value)))
-                return false;
-            if (threadSafeMatch(attributeName, http_equivAttr) &amp;&amp; !isDangerousHTTPEquiv(String(attribute.value)))
-                return false;
-            request.token.eraseValueOfAttribute(indexOfAttribute);
-            if (!replacementValue.isEmpty())
-                request.token.appendToAttributeValue(indexOfAttribute, replacementValue);
-            return true;
-        }
</del><ins>+    if (!findAttributeWithName(request.token, attributeName, indexOfAttribute))
+        return false;
+
+    const HTMLToken::Attribute&amp; attribute = request.token.attributes().at(indexOfAttribute);
+    if (!isContainedInRequest(canonicalize(snippetFromAttribute(request, attribute), truncationStyle)))
+        return false;
+
+    if (threadSafeMatch(attributeName, srcAttr)) {
+        if (isLikelySafeResource(String(attribute.value)))
+            return false;
+    } else if (threadSafeMatch(attributeName, http_equivAttr)) {
+        if (!isDangerousHTTPEquiv(String(attribute.value)))
+            return false;
</ins><span class="cx">     }
</span><del>-    return false;
</del><ins>+
+    request.token.eraseValueOfAttribute(indexOfAttribute);
+    if (!replacementValue.isEmpty())
+        request.token.appendToAttributeValue(indexOfAttribute, replacementValue);
+    return true;
</ins><span class="cx"> }
</span><span class="cx"> 
</span><del>-String XSSAuditor::decodedSnippetForName(const FilterTokenRequest&amp; request)
</del><ins>+String XSSAuditor::canonicalizedSnippetForTagName(const FilterTokenRequest&amp; request)
</ins><span class="cx"> {
</span><span class="cx">     // Grab a fixed number of characters equal to the length of the token's name plus one (to account for the &quot;&lt;&quot;).
</span><del>-    return canonicalize(fullyDecodeString(request.sourceTracker.source(request.token), m_encoding).substring(0, request.token.name().size() + 1));
</del><ins>+    return canonicalize(request.sourceTracker.source(request.token).substring(0, request.token.name().size() + 1), TruncationStyle::None);
</ins><span class="cx"> }
</span><span class="cx"> 
</span><del>-String XSSAuditor::decodedSnippetForAttribute(const FilterTokenRequest&amp; request, const HTMLToken::Attribute&amp; attribute, AttributeKind treatment)
</del><ins>+String XSSAuditor::snippetFromAttribute(const FilterTokenRequest&amp; request, const HTMLToken::Attribute&amp; attribute)
</ins><span class="cx"> {
</span><span class="cx">     // The range doesn't include the character which terminates the value. So,
</span><span class="cx">     // for an input of |name=&quot;value&quot;|, the snippet is |name=&quot;value|. For an
</span><span class="cx">     // unquoted input of |name=value |, the snippet is |name=value|.
</span><span class="cx">     // FIXME: We should grab one character before the name also.
</span><del>-    unsigned start = attribute.startOffset;
-    unsigned end = attribute.endOffset;
</del><ins>+    return request.sourceTracker.source(request.token, attribute.startOffset, attribute.endOffset);
+}
</ins><span class="cx"> 
</span><del>-    // We defer canonicalizing the decoded string here to preserve embedded slashes (if any) that
-    // may lead us to truncate the string.
-    String decodedSnippet = fullyDecodeString(request.sourceTracker.source(request.token, start, end), m_encoding);
-    decodedSnippet.truncate(kMaximumFragmentLengthTarget);
-    if (treatment == SrcLikeAttribute) {
-        int slashCount = 0;
-        bool commaSeen = false;
-        // In HTTP URLs, characters following the first ?, #, or third slash may come from 
-        // the page itself and can be merely ignored by an attacker's server when a remote
-        // script or script-like resource is requested. In DATA URLS, the payload starts at
-        // the first comma, and the the first /*, //, or &lt;!-- may introduce a comment. Characters
-        // following this may come from the page itself and may be ignored when the script is
-        // executed. For simplicity, we don't differentiate based on URL scheme, and stop at
-        // the first # or ?, the third slash, or the first slash or &lt; once a comma is seen.
-        for (size_t currentLength = 0; currentLength &lt; decodedSnippet.length(); ++currentLength) {
-            UChar currentChar = decodedSnippet[currentLength];
-            if (currentChar == '?'
-                || currentChar == '#'
-                || ((currentChar == '/' || currentChar == '\\') &amp;&amp; (commaSeen || ++slashCount &gt; 2))
-                || (currentChar == '&lt;' &amp;&amp; commaSeen)) {
-                decodedSnippet.truncate(currentLength);
-                break;
-            }
-            if (currentChar == ',')
-                commaSeen = true;
-        }
-    } else if (treatment == ScriptLikeAttribute) {
-        // Beware of trailing characters which came from the page itself, not the 
-        // injected vector. Excluding the terminating character covers common cases
-        // where the page immediately ends the attribute, but doesn't cover more
-        // complex cases where there is other page data following the injection. 
-        // Generally, these won't parse as javascript, so the injected vector
-        // typically excludes them from consideration via a single-line comment or
-        // by enclosing them in a string literal terminated later by the page's own
-        // closing punctuation. Since the snippet has not been parsed, the vector
-        // may also try to introduce these via entities. As a result, we'd like to
-        // stop before the first &quot;//&quot;, the first &lt;!--, the first entity, or the first
-        // quote not immediately following the first equals sign (taking whitespace
-        // into consideration). To keep things simpler, we don't try to distinguish
-        // between entity-introducing amperands vs. other uses, nor do we bother to
-        // check for a second slash for a comment, nor do we bother to check for
-        // !-- following a less-than sign. We stop instead on any ampersand
-        // slash, or less-than sign.
-        size_t position = 0;
-        if ((position = decodedSnippet.find('=')) != notFound
-            &amp;&amp; (position = decodedSnippet.find(isNotHTMLSpace, position + 1)) != notFound
-            &amp;&amp; (position = decodedSnippet.find(isTerminatingCharacter, isHTMLQuote(decodedSnippet[position]) ? position + 1 : position)) != notFound) {
-            decodedSnippet.truncate(position);
-        }
</del><ins>+String XSSAuditor::canonicalize(const String&amp; snippet, TruncationStyle truncationStyle)
+{
+    String decodedSnippet = fullyDecodeString(snippet, m_encoding);
+    if (truncationStyle != TruncationStyle::None) {
+        decodedSnippet.truncate(kMaximumFragmentLengthTarget);
+        if (truncationStyle == TruncationStyle::SrcLikeAttribute)
+            truncateForSrcLikeAttribute(decodedSnippet);
+        else if (truncationStyle == TruncationStyle::ScriptLikeAttribute)
+            truncateForScriptLikeAttribute(decodedSnippet);
</ins><span class="cx">     }
</span><del>-    return canonicalize(decodedSnippet);
</del><ins>+    return decodedSnippet.removeCharacters(&amp;isNonCanonicalCharacter);
</ins><span class="cx"> }
</span><span class="cx"> 
</span><del>-String XSSAuditor::decodedSnippetForJavaScript(const FilterTokenRequest&amp; request)
</del><ins>+String XSSAuditor::canonicalizedSnippetForJavaScript(const FilterTokenRequest&amp; request)
</ins><span class="cx"> {
</span><span class="cx">     String string = request.sourceTracker.source(request.token);
</span><span class="cx">     size_t startPosition = 0;
</span><span class="lines">@@ -687,7 +698,6 @@
</span><span class="cx">                 foundPosition = lastNonSpacePosition;
</span><span class="cx">                 break;
</span><span class="cx">             }
</span><del>-
</del><span class="cx">             if (foundPosition &gt; startPosition + kMaximumFragmentLengthTarget) {
</span><span class="cx">                 // After hitting the length target, we can only stop at a point where we know we are
</span><span class="cx">                 // not in the middle of a %-escape sequence. For the sake of simplicity, approximate
</span><span class="lines">@@ -701,7 +711,7 @@
</span><span class="cx">                 lastNonSpacePosition = foundPosition;
</span><span class="cx">         }
</span><span class="cx"> 
</span><del>-        result = canonicalize(fullyDecodeString(string.substring(startPosition, foundPosition - startPosition), m_encoding));
</del><ins>+        result = canonicalize(string.substring(startPosition, foundPosition - startPosition), TruncationStyle::None);
</ins><span class="cx">         startPosition = foundPosition + 1;
</span><span class="cx">     }
</span><span class="cx">     return result;
</span></span></pre></div>
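<p>The comment deleted in the first hunk above describes how a script-like attribute snippet is cut short. After this revision, that rule presumably lives in <code>truncateForScriptLikeAttribute</code>, whose body is not part of this diff. Below is a minimal standalone sketch of the same rule over <code>std::string</code>. It assumes ASCII input and a terminator set of ampersand, slash, less-than sign, and quotes, as the comment suggests. The predicates reuse WebCore's names but are hypothetical stand-ins, not the WebCore implementations.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for WebCore's isHTMLSpace (HTML whitespace characters).
static bool isHTMLSpace(char c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r';
}

// Hypothetical stand-in for WebCore's isHTMLQuote.
static bool isHTMLQuote(char c)
{
    return c == '"' || c == '\'';
}

// Assumed terminator set, taken from the comment: any ampersand, slash,
// less-than sign, or quote. The real predicate may include more characters.
static bool isTerminatingCharacter(char c)
{
    return c == '&' || c == '/' || c == '<' || isHTMLQuote(c);
}

// Sketch: find the first '=', skip whitespace after it, treat one opening
// quote there as attribute syntax, then cut the snippet before the first
// terminating character. If any step finds nothing, the snippet is unchanged.
void truncateForScriptLikeAttribute(std::string& snippet)
{
    std::string::size_type position = snippet.find('=');
    if (position == std::string::npos)
        return;

    // Advance to the first non-space character after '='.
    ++position;
    while (position < snippet.size() && isHTMLSpace(snippet[position]))
        ++position;
    if (position == snippet.size())
        return;

    // A quote right after '=' opens the value; it is not a terminator.
    if (isHTMLQuote(snippet[position]))
        ++position;

    for (; position < snippet.size(); ++position) {
        if (isTerminatingCharacter(snippet[position])) {
            snippet.resize(position);
            return;
        }
    }
}
```

<p>For example, under these assumptions <code>onclick="alert(1)//comment</code> is cut to <code>onclick="alert(1)</code>. Stopping on any slash rather than only on <code>//</code> matches the simplification the comment describes. It may drop some legitimate text, but it keeps the matching logic small.</p>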
<a id="trunkSourceWebCorehtmlparserXSSAuditorh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/html/parser/XSSAuditor.h (195073 => 195074)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/html/parser/XSSAuditor.h        2016-01-14 21:37:49 UTC (rev 195073)
+++ trunk/Source/WebCore/html/parser/XSSAuditor.h        2016-01-14 21:40:13 UTC (rev 195074)
</span><span class="lines">@@ -70,7 +70,8 @@
</span><span class="cx">         Initialized
</span><span class="cx">     };
</span><span class="cx"> 
</span><del>-    enum AttributeKind {
</del><ins>+    enum class TruncationStyle {
+        None,
</ins><span class="cx">         NormalAttribute,
</span><span class="cx">         SrcLikeAttribute,
</span><span class="cx">         ScriptLikeAttribute
</span><span class="lines">@@ -92,12 +93,12 @@
</span><span class="cx">     bool filterButtonToken(const FilterTokenRequest&amp;);
</span><span class="cx"> 
</span><span class="cx">     bool eraseDangerousAttributesIfInjected(const FilterTokenRequest&amp;);
</span><del>-    bool eraseAttributeIfInjected(const FilterTokenRequest&amp;, const QualifiedName&amp;, const String&amp; replacementValue = String(), AttributeKind treatment = NormalAttribute);
</del><ins>+    bool eraseAttributeIfInjected(const FilterTokenRequest&amp;, const QualifiedName&amp;, const String&amp; replacementValue = String(), TruncationStyle = TruncationStyle::NormalAttribute);
</ins><span class="cx"> 
</span><del>-    String decodedSnippetForToken(const HTMLToken&amp;);
-    String decodedSnippetForName(const FilterTokenRequest&amp;);
-    String decodedSnippetForAttribute(const FilterTokenRequest&amp;, const HTMLToken::Attribute&amp;, AttributeKind treatment = NormalAttribute);
-    String decodedSnippetForJavaScript(const FilterTokenRequest&amp;);
</del><ins>+    String canonicalizedSnippetForTagName(const FilterTokenRequest&amp;);
+    String canonicalizedSnippetForJavaScript(const FilterTokenRequest&amp;);
+    String snippetFromAttribute(const FilterTokenRequest&amp;, const HTMLToken::Attribute&amp;);
+    String canonicalize(const String&amp;, TruncationStyle);
</ins><span class="cx"> 
</span><span class="cx">     bool isContainedInRequest(const String&amp;);
</span><span class="cx">     bool isLikelySafeResource(const String&amp; url);
</span></span></pre>
</div>
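<p>The header change replaces <code>AttributeKind</code> with <code>enum class TruncationStyle</code> and adds a <code>None</code> enumerator. This lets a single <code>canonicalize(const String&amp;, TruncationStyle)</code> serve both attribute snippets and the JavaScript path, which passes <code>TruncationStyle::None</code>. Because scoped enumerators must be qualified, the default argument becomes <code>TruncationStyle::NormalAttribute</code>. The sketch below illustrates that dispatch with <code>std::string</code> under several assumptions:</p>
<ul>
<li>the input is already fully decoded;</li>
<li><code>kMaximumFragmentLengthTarget</code> is given a hypothetical value of 100;</li>
<li>the non-canonical character set is assumed;</li>
<li>the src-like and script-like truncation helpers are omitted.</li>
</ul>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Enumerators as listed in the new XSSAuditor.h.
enum class TruncationStyle {
    None,
    NormalAttribute,
    SrcLikeAttribute,
    ScriptLikeAttribute
};

// Hypothetical cap; the real kMaximumFragmentLengthTarget value is not shown in this diff.
constexpr std::size_t kMaximumFragmentLengthTarget = 100;

// Assumed set of "non-canonical" characters, for illustration only.
static bool isNonCanonicalCharacter(char c)
{
    unsigned char u = static_cast<unsigned char>(c);
    return c == '\\' || c == '0' || c == '\0' || c == '/' || c == '?' || u >= 127;
}

// Sketch of a unified canonicalize(): the style decides whether the
// (already decoded) snippet is length-capped, then non-canonical
// characters are stripped in every case.
std::string canonicalize(std::string snippet, TruncationStyle style)
{
    switch (style) {
    case TruncationStyle::None:
        // The JavaScript caller enforces its own length limit while scanning.
        break;
    case TruncationStyle::NormalAttribute:
    case TruncationStyle::SrcLikeAttribute:
    case TruncationStyle::ScriptLikeAttribute:
        if (snippet.size() > kMaximumFragmentLengthTarget)
            snippet.resize(kMaximumFragmentLengthTarget);
        // Style-specific truncation for SrcLike/ScriptLike is omitted in this sketch.
        break;
    }
    snippet.erase(std::remove_if(snippet.begin(), snippet.end(), isNonCanonicalCharacter), snippet.end());
    return snippet;
}
```

<p>Using a <code>switch</code> over a scoped enum lets the compiler warn when a new enumerator is added but not handled. Under this sketch, a 150-character snippet keeps its full length with <code>TruncationStyle::None</code> but is capped at 100 characters with <code>TruncationStyle::NormalAttribute</code>.</p>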
</div>

</body>
</html>