<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[244827] trunk</title>
</head>
<body>
<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; }
#msg dl a { font-weight: bold}
#msg dl a:link { color:#fc3; }
#msg dl a:active { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/244827">244827</a></dd>
<dt>Author</dt> <dd>sroberts@apple.com</dd>
<dt>Date</dt> <dd>2019-05-01 10:13:58 -0700 (Wed, 01 May 2019)</dd>
</dl>
<h3>Log Message</h3>
<pre>Unreviewed, rolling out <a href="http://trac.webkit.org/projects/webkit/changeset/244821">r244821</a>.
LayoutTests/imported/w3c:
Causing
Reverted changeset:
"WebKit has too much of its own UTF-8 code and should rely
more on ICU's UTF-8 support"
https://bugs.webkit.org/show_bug.cgi?id=195535
https://trac.webkit.org/changeset/244821
Source/JavaScriptCore:
Causing
Reverted changeset:
"WebKit has too much of its own UTF-8 code and should rely
more on ICU's UTF-8 support"
https://bugs.webkit.org/show_bug.cgi?id=195535
https://trac.webkit.org/changeset/244821
Source/WebCore:
Causing
Reverted changeset:
"WebKit has too much of its own UTF-8 code and should rely
more on ICU's UTF-8 support"
https://bugs.webkit.org/show_bug.cgi?id=195535
https://trac.webkit.org/changeset/244821
Source/WebKit:
Causing
Reverted changeset:
"WebKit has too much of its own UTF-8 code and should rely
more on ICU's UTF-8 support"
https://bugs.webkit.org/show_bug.cgi?id=195535
https://trac.webkit.org/changeset/244821
Source/WTF:
Causing
Reverted changeset:
"WebKit has too much of its own UTF-8 code and should rely
more on ICU's UTF-8 support"
https://bugs.webkit.org/show_bug.cgi?id=195535
https://trac.webkit.org/changeset/244821
LayoutTests:
Causing 4 Test262 failures on JSC Release and Debug
Reverted changeset:
"WebKit has too much of its own UTF-8 code and should rely
more on ICU's UTF-8 support"
https://bugs.webkit.org/show_bug.cgi?id=195535
https://trac.webkit.org/changeset/244821</pre>
<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsChangeLog">trunk/LayoutTests/ChangeLog</a></li>
<li><a href="#trunkLayoutTestscss3escapedomapiexpectedtxt">trunk/LayoutTests/css3/escape-dom-api-expected.txt</a></li>
<li><a href="#trunkLayoutTestsfasttextdanglingsurrogatesexpectedtxt">trunk/LayoutTests/fast/text/dangling-surrogates-expected.txt</a></li>
<li><a href="#trunkLayoutTestsimportedw3cChangeLog">trunk/LayoutTests/imported/w3c/ChangeLog</a></li>
<li><a href="#trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderutf16surrogatesexpectedtxt">trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt</a></li>
<li><a href="#trunkLayoutTestsjsdomwebidltypemappingexpectedtxt">trunk/LayoutTests/js/dom/webidl-type-mapping-expected.txt</a></li>
<li><a href="#trunkLayoutTestsjsinvalidutf8insyntaxerrorexpectedtxt">trunk/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt</a></li>
<li><a href="#trunkSourceJavaScriptCoreAPIJSClassRefcpp">trunk/Source/JavaScriptCore/API/JSClassRef.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreAPIJSStringRefcpp">trunk/Source/JavaScriptCore/API/JSStringRef.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreruntimeJSGlobalObjectFunctionscpp">trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCorewasmWasmParserh">trunk/Source/JavaScriptCore/wasm/WasmParser.h</a></li>
<li><a href="#trunkSourceWTFChangeLog">trunk/Source/WTF/ChangeLog</a></li>
<li><a href="#trunkSourceWTFwtftextAtomicStringcpp">trunk/Source/WTF/wtf/text/AtomicString.cpp</a></li>
<li><a href="#trunkSourceWTFwtftextAtomicStringImplcpp">trunk/Source/WTF/wtf/text/AtomicStringImpl.cpp</a></li>
<li><a href="#trunkSourceWTFwtftextAtomicStringImplh">trunk/Source/WTF/wtf/text/AtomicStringImpl.h</a></li>
<li><a href="#trunkSourceWTFwtftextStringImplcpp">trunk/Source/WTF/wtf/text/StringImpl.cpp</a></li>
<li><a href="#trunkSourceWTFwtftextStringViewcpp">trunk/Source/WTF/wtf/text/StringView.cpp</a></li>
<li><a href="#trunkSourceWTFwtftextWTFStringcpp">trunk/Source/WTF/wtf/text/WTFString.cpp</a></li>
<li><a href="#trunkSourceWTFwtfunicodeUTF8Conversioncpp">trunk/Source/WTF/wtf/unicode/UTF8Conversion.cpp</a></li>
<li><a href="#trunkSourceWTFwtfunicodeUTF8Conversionh">trunk/Source/WTF/wtf/unicode/UTF8Conversion.h</a></li>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCoreplatformSharedBuffercpp">trunk/Source/WebCore/platform/SharedBuffer.cpp</a></li>
<li><a href="#trunkSourceWebCorexmlXSLTProcessorLibxsltcpp">trunk/Source/WebCore/xml/XSLTProcessorLibxslt.cpp</a></li>
<li><a href="#trunkSourceWebCorexmlparserXMLDocumentParserLibxml2cpp">trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp</a></li>
<li><a href="#trunkSourceWebKitChangeLog">trunk/Source/WebKit/ChangeLog</a></li>
<li><a href="#trunkSourceWebKitSharedAPIAPIStringh">trunk/Source/WebKit/Shared/API/APIString.h</a></li>
<li><a href="#trunkSourceWebKitSharedAPIcWKStringcpp">trunk/Source/WebKit/Shared/API/c/WKString.cpp</a></li>
</ul>
<h3>Removed Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsjsinvalidutf8insyntaxerrorhtml">trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html</a></li>
</ul>
</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkLayoutTestsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/ChangeLog (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/ChangeLog 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/LayoutTests/ChangeLog 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,3 +1,16 @@
</span><ins>+2019-05-01 Shawn Roberts &lt;sroberts@apple.com&gt;
+
+ Unreviewed, rolling out r244821.
+
+ Causing 4 Test262 failures on JSC Release and Debug
+
+ Reverted changeset:
+
+ "WebKit has too much of its own UTF-8 code and should rely
+ more on ICU's UTF-8 support"
+ https://bugs.webkit.org/show_bug.cgi?id=195535
+ https://trac.webkit.org/changeset/244821
+
</ins><span class="cx"> 2019-05-01 Youenn Fablet &lt;youenn@apple.com&gt;
</span><span class="cx">
</span><span class="cx"> Reject/throw when calling AudioContext methods on a stopped AudioContext
</span></span></pre></div>
<a id="trunkLayoutTestscss3escapedomapiexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/css3/escape-dom-api-expected.txt (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/css3/escape-dom-api-expected.txt 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/LayoutTests/css3/escape-dom-api-expected.txt 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -4,14 +4,14 @@
</span><span class="cx">
</span><span class="cx">
</span><span class="cx"> PASS CSS.escape.length is 1
</span><del>-PASS CSS.escape('\0') is "�"
-PASS CSS.escape('a\0') is "a�"
-PASS CSS.escape('\0b') is "�b"
-PASS CSS.escape('a\0b') is "a�b"
-PASS CSS.escape('�') is "�"
-PASS CSS.escape('a�') is "a�"
-PASS CSS.escape('�b') is "�b"
-PASS CSS.escape('a�b') is "a�b"
</del><ins>+PASS CSS.escape('\0') is "�"
+PASS CSS.escape('a\0') is "a�"
+PASS CSS.escape('\0b') is "�b"
+PASS CSS.escape('a\0b') is "a�b"
+PASS CSS.escape('�') is "�"
+PASS CSS.escape('a�') is "a�"
+PASS CSS.escape('�b') is "�b"
+PASS CSS.escape('a�b') is "a�b"
</ins><span class="cx"> PASS CSS.escape() threw exception TypeError: Not enough arguments.
</span><span class="cx"> PASS CSS.escape(undefined) is "undefined"
</span><span class="cx"> PASS CSS.escape(true) is "true"
</span><span class="lines">@@ -53,16 +53,16 @@
</span><span class="cx"> PASS CSS.escape('-a') is "-a"
</span><span class="cx"> PASS CSS.escape('--') is "--"
</span><span class="cx"> PASS CSS.escape('--a') is "--a"
</span><del>-PASS CSS.escape('-_©') is "-_©"
-PASS CSS.escape('
') is "\\7f
"
-PASS CSS.escape(' ¡¢') is " ¡¢"
</del><ins>+PASS CSS.escape('Â-_©') is "Â-_©"
+PASS CSS.escape('ÂÂÂÂÂÂ
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ') is "\\7f ÂÂÂÂÂÂ
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ"
+PASS CSS.escape(' ¡¢') is " ¡¢"
</ins><span class="cx"> PASS CSS.escape('a0123456789b') is "a0123456789b"
</span><span class="cx"> PASS CSS.escape('abcdefghijklmnopqrstuvwxyz') is "abcdefghijklmnopqrstuvwxyz"
</span><span class="cx"> PASS CSS.escape('ABCDEFGHIJKLMNOPQRSTUVWXYZ') is "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
</span><span class="cx"> PASS CSS.escape(' !xy') is "\\ \\!xy"
</span><del>-PASS CSS.escape('𝌆') is "𝌆"
-PASS CSS.escape('�') is "\udf06"
-PASS CSS.escape('�') is "\ud834"
</del><ins>+PASS CSS.escape('ð') is "ð"
+PASS CSS.escape('í¼') is "\udf06"
+PASS CSS.escape('í ´') is "\ud834"
</ins><span class="cx"> PASS successfullyParsed is true
</span><span class="cx">
</span><span class="cx"> TEST COMPLETE
</span></span></pre></div>
<a id="trunkLayoutTestsfasttextdanglingsurrogatesexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/fast/text/dangling-surrogates-expected.txt (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/fast/text/dangling-surrogates-expected.txt 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/LayoutTests/fast/text/dangling-surrogates-expected.txt 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -3,8 +3,8 @@
</span><span class="cx"> On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
</span><span class="cx">
</span><span class="cx">
</span><del>-PASS danglingFirst is "�"
-PASS danglingSecond is "�"
</del><ins>+PASS danglingFirst is "í "
+PASS danglingSecond is "í°"
</ins><span class="cx"> PASS successfullyParsed is true
</span><span class="cx">
</span><span class="cx"> TEST COMPLETE
</span></span></pre></div>
<a id="trunkLayoutTestsimportedw3cChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/ChangeLog (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/ChangeLog 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/LayoutTests/imported/w3c/ChangeLog 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,3 +1,16 @@
</span><ins>+2019-05-01 Shawn Roberts &lt;sroberts@apple.com&gt;
+
+ Unreviewed, rolling out r244821.
+
+ Causing
+
+ Reverted changeset:
+
+ "WebKit has too much of its own UTF-8 code and should rely
+ more on ICU's UTF-8 support"
+ https://bugs.webkit.org/show_bug.cgi?id=195535
+ https://trac.webkit.org/changeset/244821
+
</ins><span class="cx"> 2019-05-01 Youenn Fablet &lt;youenn@apple.com&gt;
</span><span class="cx">
</span><span class="cx"> Kept alive loaders should use the redirected request in case of redirections
</span></span></pre></div>
<a id="trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderutf16surrogatesexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,21 +1,21 @@
</span><span class="cx">
</span><del>-FAIL utf-16le - lone surrogate lead assert_equals: expected "\ufffd" but got "�"
</del><ins>+FAIL utf-16le - lone surrogate lead assert_equals: expected "\ufffd" but got "í "
</ins><span class="cx"> FAIL utf-16le - lone surrogate lead (fatal flag set) assert_throws: function "function () {
</span><span class="cx"> new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx"> }" did not throw
</span><del>-FAIL utf-16le - lone surrogate trail assert_equals: expected "\ufffd" but got "�"
</del><ins>+FAIL utf-16le - lone surrogate trail assert_equals: expected "\ufffd" but got "í°"
</ins><span class="cx"> FAIL utf-16le - lone surrogate trail (fatal flag set) assert_throws: function "function () {
</span><span class="cx"> new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx"> }" did not throw
</span><del>-FAIL utf-16le - unmatched surrogate lead assert_equals: expected "\ufffd\0" but got "�\0"
</del><ins>+FAIL utf-16le - unmatched surrogate lead assert_equals: expected "\ufffd\0" but got "í \0"
</ins><span class="cx"> FAIL utf-16le - unmatched surrogate lead (fatal flag set) assert_throws: function "function () {
</span><span class="cx"> new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx"> }" did not throw
</span><del>-FAIL utf-16le - unmatched surrogate trail assert_equals: expected "\ufffd\0" but got "�\0"
</del><ins>+FAIL utf-16le - unmatched surrogate trail assert_equals: expected "\ufffd\0" but got "í°\0"
</ins><span class="cx"> FAIL utf-16le - unmatched surrogate trail (fatal flag set) assert_throws: function "function () {
</span><span class="cx"> new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx"> }" did not throw
</span><del>-FAIL utf-16le - swapped surrogate pair assert_equals: expected "\ufffd\ufffd" but got "��"
</del><ins>+FAIL utf-16le - swapped surrogate pair assert_equals: expected "\ufffd\ufffd" but got "í°í "
</ins><span class="cx"> FAIL utf-16le - swapped surrogate pair (fatal flag set) assert_throws: function "function () {
</span><span class="cx"> new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx"> }" did not throw
</span></span></pre></div>
<a id="trunkLayoutTestsjsdomwebidltypemappingexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/js/dom/webidl-type-mapping-expected.txt (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/dom/webidl-type-mapping-expected.txt 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/LayoutTests/js/dom/webidl-type-mapping-expected.txt 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1009,48 +1009,48 @@
</span><span class="cx">
</span><span class="cx"> converter.testUSVString = '!@#123ABCabc\x00\x80\xFF\r\n\t'
</span><span class="cx"> converter.testString = '!@#123ABCabc\x00\x80\xFF\r\n\t'
</span><del>-PASS converter.testUSVString is "!@#123ABCabc\u0000ÿ\r\n\t"
-PASS converter.testString is "!@#123ABCabc\u0000ÿ\r\n\t"
</del><ins>+PASS converter.testUSVString is "!@#123ABCabc\u0000Âÿ\r\n\t"
+PASS converter.testString is "!@#123ABCabc\u0000Âÿ\r\n\t"
</ins><span class="cx"> converter.testUSVString = '\u0100'
</span><span class="cx"> converter.testString = '\u0100'
</span><del>-PASS converter.testUSVString is "Ā"
-PASS converter.testString is "Ā"
</del><ins>+PASS converter.testUSVString is "Ä"
+PASS converter.testString is "Ä"
</ins><span class="cx"> PASS converter.testUSVString = {toString: function() { throw Error(); }} threw exception Error.
</span><span class="cx"> PASS converter.testString = {toString: function() { throw Error(); }} threw exception Error.
</span><del>-PASS converter.testUSVString is "Ā"
-PASS converter.testString is "Ā"
</del><ins>+PASS converter.testUSVString is "Ä"
+PASS converter.testString is "Ä"
</ins><span class="cx"> converter.testUSVString = "\ud800"
</span><span class="cx"> converter.testString = "\ud800"
</span><del>-PASS converter.testUSVString is "�"
</del><ins>+PASS converter.testUSVString is "�"
</ins><span class="cx"> PASS converter.testString is "\ud800"
</span><span class="cx"> converter.testUSVString = "\udc00"
</span><span class="cx"> converter.testString = "\udc00"
</span><del>-PASS converter.testUSVString is "�"
</del><ins>+PASS converter.testUSVString is "�"
</ins><span class="cx"> PASS converter.testString is "\udc00"
</span><span class="cx"> converter.testUSVString = "\ud800\u0000"
</span><span class="cx"> converter.testString = "\ud800\u0000"
</span><del>-PASS converter.testUSVString is "�\u0000"
</del><ins>+PASS converter.testUSVString is "�\u0000"
</ins><span class="cx"> PASS converter.testString is "\ud800\u0000"
</span><span class="cx"> converter.testUSVString = "\udc00\u0000"
</span><span class="cx"> converter.testString = "\udc00\u0000"
</span><del>-PASS converter.testUSVString is "�\u0000"
</del><ins>+PASS converter.testUSVString is "�\u0000"
</ins><span class="cx"> PASS converter.testString is "\udc00\u0000"
</span><span class="cx"> converter.testUSVString = "\udc00\ud800"
</span><span class="cx"> converter.testString = "\udc00\ud800"
</span><del>-PASS converter.testUSVString is "��"
</del><ins>+PASS converter.testUSVString is "��"
</ins><span class="cx"> PASS converter.testString is "\udc00\ud800"
</span><del>-converter.testUSVString = "𝄞"
-converter.testString = "𝄞"
-PASS converter.testUSVString is "𝄞"
-PASS converter.testString is "𝄞"
</del><ins>+converter.testUSVString = "ð"
+converter.testString = "ð"
+PASS converter.testUSVString is "ð"
+PASS converter.testString is "ð"
</ins><span class="cx"> converter.testByteString = '!@#123ABCabc\x00\x80\xFF\r\n\t'
</span><del>-PASS converter.testByteString is "!@#123ABCabc\u0000ÿ\r\n\t"
</del><ins>+PASS converter.testByteString is "!@#123ABCabc\u0000Âÿ\r\n\t"
</ins><span class="cx"> converter.testByteString = '\u00FF'
</span><del>-PASS converter.testByteString is "ÿ"
</del><ins>+PASS converter.testByteString is "ÿ"
</ins><span class="cx"> PASS converter.testByteString = '\u0100' threw exception TypeError: Type error.
</span><del>-PASS converter.testByteString is "ÿ"
</del><ins>+PASS converter.testByteString is "ÿ"
</ins><span class="cx"> PASS converter.testByteString = {toString: function() { throw Error(); }} threw exception Error.
</span><del>-PASS converter.testByteString is "ÿ"
</del><ins>+PASS converter.testByteString is "ÿ"
</ins><span class="cx"> converter.testUSVString = true
</span><span class="cx"> converter.testString = true
</span><span class="cx"> converter.testByteString = true
</span><span class="lines">@@ -1180,37 +1180,37 @@
</span><span class="cx"> PASS 'key2' in converter.testNodeRecord() is true
</span><span class="cx"> PASS converter.testNodeRecord()['key2'] is document.documentElement
</span><span class="cx"> PASS converter.setTestNodeRecord({ key: 'hello' }) threw exception TypeError: Type error.
</span><del>-converter.setTestLongRecord({'�': 1 })
-PASS converter.testLongRecord()['�'] is 1
-converter.setTestNodeRecord({'�': document })
-PASS converter.testNodeRecord()['�'] is document
-converter.setTestLongRecord({'�': 1 })
-PASS converter.testLongRecord()['�'] is 1
-converter.setTestNodeRecord({'�': document })
-PASS converter.testNodeRecord()['�'] is document
-converter.setTestLongRecord({'�': 1 })
-PASS converter.testLongRecord()['�\0'] is 1
-converter.setTestNodeRecord({'�': document })
-PASS converter.testNodeRecord()['�\0'] is document
-converter.setTestLongRecord({'�': 1 })
-PASS converter.testLongRecord()['�\0'] is 1
-converter.setTestNodeRecord({'�': document })
-PASS converter.testNodeRecord()['�\0'] is document
-converter.setTestLongRecord({'��': 1 })
-PASS converter.testLongRecord()['��'] is 1
-converter.setTestNodeRecord({'��': document })
-PASS converter.testNodeRecord()['��'] is document
-converter.setTestLongRecord({'𝄞': 1 })
-PASS converter.testLongRecord()['𝄞'] is 1
-converter.setTestNodeRecord({'𝄞': document })
-PASS converter.testNodeRecord()['𝄞'] is document
</del><ins>+converter.setTestLongRecord({'í ': 1 })
+PASS converter.testLongRecord()['í '] is 1
+converter.setTestNodeRecord({'í ': document })
+PASS converter.testNodeRecord()['�'] is document
+converter.setTestLongRecord({'í°': 1 })
+PASS converter.testLongRecord()['í°'] is 1
+converter.setTestNodeRecord({'í°': document })
+PASS converter.testNodeRecord()['�'] is document
+converter.setTestLongRecord({'í ': 1 })
+PASS converter.testLongRecord()['í \0'] is 1
+converter.setTestNodeRecord({'í ': document })
+PASS converter.testNodeRecord()['�\0'] is document
+converter.setTestLongRecord({'í°': 1 })
+PASS converter.testLongRecord()['í°\0'] is 1
+converter.setTestNodeRecord({'í°': document })
+PASS converter.testNodeRecord()['�\0'] is document
+converter.setTestLongRecord({'í°í ': 1 })
+PASS converter.testLongRecord()['í°í '] is 1
+converter.setTestNodeRecord({'í°í ': document })
+PASS converter.testNodeRecord()['��'] is document
+converter.setTestLongRecord({'ð': 1 })
+PASS converter.testLongRecord()['ð'] is 1
+converter.setTestNodeRecord({'ð': document })
+PASS converter.testNodeRecord()['ð'] is document
</ins><span class="cx"> converter.setTestSequenceRecord({ key: ['value', 'other value'] })
</span><span class="cx"> PASS converter.testSequenceRecord().hasOwnProperty('key') is true
</span><span class="cx"> PASS 'key' in converter.testSequenceRecord() is true
</span><span class="cx"> PASS converter.testSequenceRecord()['key'] is ['value', 'other value']
</span><del>-PASS converter.setTestSequenceRecord({ 'Ā': ['value'] }) threw exception TypeError: Type error.
-converter.setTestSequenceRecord({ 'ÿ': ['value'] })
-PASS converter.testSequenceRecord()['ÿ'] is ['value']
</del><ins>+PASS converter.setTestSequenceRecord({ 'Ä': ['value'] }) threw exception TypeError: Type error.
+converter.setTestSequenceRecord({ 'ÿ': ['value'] })
+PASS converter.testSequenceRecord()['ÿ'] is ['value']
</ins><span class="cx"> PASS converter.testImpureNaNUnrestrictedDouble is NaN
</span><span class="cx"> PASS converter.testImpureNaN2UnrestrictedDouble is NaN
</span><span class="cx"> PASS converter.testQuietNaNUnrestrictedDouble is NaN
</span></span></pre></div>
<a id="trunkLayoutTestsjsinvalidutf8insyntaxerrorexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -3,7 +3,7 @@
</span><span class="cx"> On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
</span><span class="cx">
</span><span class="cx">
</span><del>-PASS ({f("�")}) threw exception SyntaxError: Unexpected string literal "�". Expected a parameter pattern or a ')' in parameter list..
</del><ins>+PASS ({f("\x{DEAD}")}) threw exception SyntaxError: Unexpected string literal "íº". Expected a parameter pattern or a ')' in parameter list..
</ins><span class="cx"> PASS successfullyParsed is true
</span><span class="cx">
</span><span class="cx"> TEST COMPLETE
</span></span></pre></div>
<a id="trunkLayoutTestsjsinvalidutf8insyntaxerrorhtml"></a>
<div class="delfile"><h4>Deleted: trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,10 +0,0 @@
</span><del>-&lt;!DOCTYPE html&gt;
-&lt;html&gt;
-&lt;head&gt;
-&lt;meta charset="utf-8"&gt;
-&lt;script src="../resources/js-test.js"&gt;&lt;/script&gt;
-&lt;/head&gt;
-&lt;body&gt;
-&lt;script src="script-tests/invalid-utf8-in-syntax-error.js"&gt;&lt;/script&gt;
-&lt;/body&gt;
-&lt;/html&gt;
</del></span></pre></div>
<a id="trunkSourceJavaScriptCoreAPIJSClassRefcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/API/JSClassRef.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/API/JSClassRef.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/JavaScriptCore/API/JSClassRef.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -35,8 +35,10 @@
</span><span class="cx"> #include "ObjectPrototype.h"
</span><span class="cx"> #include "JSCInlines.h"
</span><span class="cx"> #include &lt;wtf/text/StringHash.h&gt;
</span><ins>+#include &lt;wtf/unicode/UTF8Conversion.h&gt;
</ins><span class="cx">
</span><span class="cx"> using namespace JSC;
</span><ins>+using namespace WTF::Unicode;
</ins><span class="cx">
</span><span class="cx"> const JSClassDefinition kJSClassDefinitionEmpty = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
</span><span class="cx">
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreAPIJSStringRefcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/API/JSStringRef.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/API/JSStringRef.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/JavaScriptCore/API/JSStringRef.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -49,7 +49,7 @@
</span><span class="cx"> UChar* p = buffer.data();
</span><span class="cx"> bool sourceIsAllASCII;
</span><span class="cx"> const LChar* stringStart = reinterpret_cast&lt;const LChar*&gt;(string);
</span><del>- if (convertUTF8ToUTF16(string, string + length, &amp;p, p + length, &amp;sourceIsAllASCII)) {
</del><ins>+ if (conversionOK == convertUTF8ToUTF16(&amp;string, string + length, &amp;p, p + length, &amp;sourceIsAllASCII)) {
</ins><span class="cx"> if (sourceIsAllASCII)
</span><span class="cx"> return &amp;OpaqueJSString::create(stringStart, length).leakRef();
</span><span class="cx"> return &amp;OpaqueJSString::create(buffer.data(), p - buffer.data()).leakRef();
</span><span class="lines">@@ -102,18 +102,20 @@
</span><span class="cx"> return 0;
</span><span class="cx">
</span><span class="cx"> char* destination = buffer;
</span><del>- bool failed = false;
</del><ins>+ ConversionResult result;
</ins><span class="cx"> if (string-&gt;is8Bit()) {
</span><span class="cx"> const LChar* source = string-&gt;characters8();
</span><del>- convertLatin1ToUTF8(&amp;source, source + string-&gt;length(), &amp;destination, destination + bufferSize - 1);
</del><ins>+ result = convertLatin1ToUTF8(&amp;source, source + string-&gt;length(), &amp;destination, destination + bufferSize - 1);
</ins><span class="cx"> } else {
</span><span class="cx"> const UChar* source = string-&gt;characters16();
</span><del>- ConversionResult result = convertUTF16ToUTF8(&amp;source, source + string-&gt;length(), &amp;destination, destination + bufferSize - 1);
- failed = result != ConversionOK &amp;&amp; result != TargetExhausted;
</del><ins>+ result = convertUTF16ToUTF8(&amp;source, source + string-&gt;length(), &amp;destination, destination + bufferSize - 1, true);
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> *destination++ = '\0';
</span><del>- return failed ? 0 : destination - buffer;
</del><ins>+ if (result != conversionOK &amp;&amp; result != targetExhausted)
+ return 0;
+
+ return destination - buffer;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> bool JSStringIsEqual(JSStringRef a, JSStringRef b)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/JavaScriptCore/ChangeLog 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,3 +1,16 @@
</span><ins>+2019-05-01 Shawn Roberts &lt;sroberts@apple.com&gt;
+
+ Unreviewed, rolling out r244821.
+
+ Causing
+
+ Reverted changeset:
+
+ "WebKit has too much of its own UTF-8 code and should rely
+ more on ICU's UTF-8 support"
+ https://bugs.webkit.org/show_bug.cgi?id=195535
+ https://trac.webkit.org/changeset/244821
+
</ins><span class="cx"> 2019-04-29 Darin Adler &lt;darin@apple.com&gt;
</span><span class="cx">
</span><span class="cx"> WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreruntimeJSGlobalObjectFunctionscpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -58,9 +58,12 @@
</span><span class="cx"> #include &lt;wtf/MathExtras.h&gt;
</span><span class="cx"> #include &lt;wtf/dtoa.h&gt;
</span><span class="cx"> #include &lt;wtf/text/StringBuilder.h&gt;
</span><ins>+#include &lt;wtf/unicode/UTF8Conversion.h&gt;
</ins><span class="cx">
</span><span class="cx"> namespace JSC {
</span><span class="cx">
</span><ins>+using namespace WTF::Unicode;
+
</ins><span class="cx"> const ASCIILiteral ObjectProtoCalledOnNullOrUndefinedError { "Object.prototype.__proto__ called on null or undefined"_s };
</span><span class="cx">
</span><span class="cx"> template&lt;unsigned charactersCount&gt;
</span><span class="lines">@@ -181,10 +184,10 @@
</span><span class="cx"> int charLen = 0;
</span><span class="cx"> if (k <= length - 3 && isASCIIHexDigit(p[1]) && isASCIIHexDigit(p[2])) {
</span><span class="cx"> const char b0 = Lexer<CharType>::convertHex(p[1], p[2]);
</span><del>- const int sequenceLen = 1 + U8_COUNT_TRAIL_BYTES(b0);
- if (k <= length - sequenceLen * 3) {
</del><ins>+ const int sequenceLen = UTF8SequenceLength(b0);
+ if (sequenceLen && k <= length - sequenceLen * 3) {
</ins><span class="cx"> charLen = sequenceLen * 3;
</span><del>- uint8_t sequence[U8_MAX_LENGTH];
</del><ins>+ char sequence[5];
</ins><span class="cx"> sequence[0] = b0;
</span><span class="cx"> for (int i = 1; i < sequenceLen; ++i) {
</span><span class="cx"> const CharType* q = p + i * 3;
</span><span class="lines">@@ -196,20 +199,16 @@
</span><span class="cx"> }
</span><span class="cx"> }
</span><span class="cx"> if (charLen != 0) {
</span><del>- UChar32 character;
- int32_t offset = 0;
- U8_NEXT(sequence, offset, sequenceLen, character);
- if (character < 0)
</del><ins>+ sequence[sequenceLen] = 0;
+ const int character = decodeUTF8Sequence(sequence);
+ if (character < 0 || character >= 0x110000)
</ins><span class="cx"> charLen = 0;
</span><del>- else if (!U_IS_BMP(character)) {
</del><ins>+ else if (character >= 0x10000) {
</ins><span class="cx"> // Convert to surrogate pair.
</span><del>- ASSERT(U_IS_SUPPLEMENTARY(character));
- builder.append(U16_LEAD(character));
- u = U16_TRAIL(character);
- } else {
- ASSERT(!U_IS_SURROGATE(character));
</del><ins>+ builder.append(static_cast<UChar>(0xD800 | ((character - 0x10000) >> 10)));
+ u = static_cast<UChar>(0xDC00 | ((character - 0x10000) & 0x3FF));
+ } else
</ins><span class="cx"> u = static_cast<UChar>(character);
</span><del>- }
</del><span class="cx"> }
</span><span class="cx"> }
</span><span class="cx"> }
</span></span></pre></div>
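<p>The restored lines in the hunk above build a surrogate pair by hand instead of using ICU's <code>U16_LEAD</code> and <code>U16_TRAIL</code>. The following is only a minimal standalone sketch of that arithmetic; <code>toSurrogatePair</code> is a hypothetical helper name, not a WebKit function, and it assumes the caller has already checked that the code point lies in 0x10000 to 0x10FFFF.</p>

```cpp
#include <cstdint>
#include <utility>

// Hypothetical helper (not part of WebKit): split a supplementary code point
// (assumed to be in 0x10000..0x10FFFF) into a UTF-16 lead/trail surrogate pair.
std::pair<char16_t, char16_t> toSurrogatePair(std::uint32_t codePoint)
{
    // Subtracting 0x10000 leaves a 20-bit value.
    const std::uint32_t offset = codePoint - 0x10000;
    // The high 10 bits go into the lead surrogate, the low 10 bits into the trail.
    const char16_t lead = static_cast<char16_t>(0xD800 | (offset >> 10));
    const char16_t trail = static_cast<char16_t>(0xDC00 | (offset & 0x3FF));
    return { lead, trail };
}
```

<p>For example, under these assumptions U+1F600 maps to the pair 0xD83D, 0xDE00.</p>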
<a id="trunkSourceJavaScriptCorewasmWasmParserh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/wasm/WasmParser.h (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/wasm/WasmParser.h 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/JavaScriptCore/wasm/WasmParser.h 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -162,7 +162,7 @@
</span><span class="cx">
</span><span class="cx"> UChar* bufferCurrent = bufferStart;
</span><span class="cx"> const char* stringCurrent = reinterpret_cast<const char*>(stringStart);
</span><del>- if (!WTF::Unicode::convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char *>(stringStart + stringLength), &bufferCurrent, bufferCurrent + buffer.size()))
</del><ins>+ if (WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, reinterpret_cast<const char *>(stringStart + stringLength), &bufferCurrent, bufferCurrent + buffer.size()) != WTF::Unicode::conversionOK)
</ins><span class="cx"> return false;
</span><span class="cx"> }
</span><span class="cx">
</span></span></pre></div>
<a id="trunkSourceWTFChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/ChangeLog (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/ChangeLog 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/ChangeLog 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,5 +1,18 @@
</span><span class="cx"> 2019-05-01 Shawn Roberts <sroberts@apple.com>
</span><span class="cx">
</span><ins>+ Unreviewed, rolling out r244821.
+
+ Causing
+
+ Reverted changeset:
+
+ "WebKit has too much of its own UTF-8 code and should rely
+ more on ICU's UTF-8 support"
+ https://bugs.webkit.org/show_bug.cgi?id=195535
+ https://trac.webkit.org/changeset/244821
+
+2019-05-01 Shawn Roberts <sroberts@apple.com>
+
</ins><span class="cx"> Unreviewed, rolling out r244822.
</span><span class="cx">
</span><span class="cx"> Causing 4 Test262 failures on JSC Release and Debug
</span></span></pre></div>
<a id="trunkSourceWTFwtftextAtomicStringcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/AtomicString.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/AtomicString.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/wtf/text/AtomicString.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -113,24 +113,19 @@
</span><span class="cx"> return numberToString(number, buffer);
</span><span class="cx"> }
</span><span class="cx">
</span><del>-AtomicString AtomicString::fromUTF8Internal(const char* start, const char* end)
</del><ins>+AtomicString AtomicString::fromUTF8Internal(const char* charactersStart, const char* charactersEnd)
</ins><span class="cx"> {
</span><del>- ASSERT(start);
-
- // Caller needs to handle empty string.
- ASSERT(!end || end > start);
- ASSERT(end || start[0]);
-
- return AtomicStringImpl::addUTF8(start, end ? end : start + std::strlen(start));
</del><ins>+ auto impl = AtomicStringImpl::addUTF8(charactersStart, charactersEnd);
+ if (!impl)
+ return nullAtom();
+ return impl.get();
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> #ifndef NDEBUG
</span><del>-
</del><span class="cx"> void AtomicString::show() const
</span><span class="cx"> {
</span><span class="cx"> m_string.show();
</span><span class="cx"> }
</span><del>-
</del><span class="cx"> #endif
</span><span class="cx">
</span><span class="cx"> WTF_EXPORT_PRIVATE LazyNeverDestroyed<AtomicString> nullAtomData;
</span></span></pre></div>
<a id="trunkSourceWTFwtftextAtomicStringImplcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/AtomicStringImpl.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/AtomicStringImpl.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/wtf/text/AtomicStringImpl.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -219,7 +219,7 @@
</span><span class="cx">
</span><span class="cx"> bool isAllASCII;
</span><span class="cx"> const char* source = buffer.characters;
</span><del>- if (!convertUTF8ToUTF16(source, source + buffer.length, &target, target + buffer.utf16Length, &isAllASCII))
</del><ins>+ if (convertUTF8ToUTF16(&source, source + buffer.length, &target, target + buffer.utf16Length, &isAllASCII) != conversionOK)
</ins><span class="cx"> ASSERT_NOT_REACHED();
</span><span class="cx">
</span><span class="cx"> if (isAllASCII)
</span></span></pre></div>
<a id="trunkSourceWTFwtftextAtomicStringImplh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/AtomicStringImpl.h (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/AtomicStringImpl.h 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/wtf/text/AtomicStringImpl.h 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -56,8 +56,7 @@
</span><span class="cx"> WTF_EXPORT_PRIVATE static Ref<AtomicStringImpl> addLiteral(const char* characters, unsigned length);
</span><span class="cx">
</span><span class="cx"> // Returns null if the input data contains an invalid UTF-8 sequence.
</span><del>- static RefPtr<AtomicStringImpl> addUTF8(const char* start, const char* end);
-
</del><ins>+ WTF_EXPORT_PRIVATE static RefPtr<AtomicStringImpl> addUTF8(const char* start, const char* end);
</ins><span class="cx"> #if USE(CF)
</span><span class="cx"> WTF_EXPORT_PRIVATE static RefPtr<AtomicStringImpl> add(CFStringRef);
</span><span class="cx"> #endif
</span></span></pre></div>
<a id="trunkSourceWTFwtftextStringImplcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/StringImpl.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/StringImpl.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/wtf/text/StringImpl.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1756,11 +1756,11 @@
</span><span class="cx"> char* bufferEnd = buffer + bufferSize;
</span><span class="cx"> while (characters < charactersEnd) {
</span><span class="cx"> // Use strict conversion to detect unpaired surrogates.
</span><del>- ConversionResult result = convertUTF16ToUTF8(&characters, charactersEnd, &buffer, bufferEnd);
- ASSERT(result != TargetExhausted);
</del><ins>+ ConversionResult result = convertUTF16ToUTF8(&characters, charactersEnd, &buffer, bufferEnd, true);
+ ASSERT(result != targetExhausted);
</ins><span class="cx"> // Conversion fails when there is an unpaired surrogate.
</span><span class="cx"> // Put replacement character (U+FFFD) instead of the unpaired surrogate.
</span><del>- if (result != ConversionOK) {
</del><ins>+ if (result != conversionOK) {
</ins><span class="cx"> ASSERT((0xD800 <= *characters && *characters <= 0xDFFF));
</span><span class="cx"> // There should be room left, since one UChar hasn't been converted.
</span><span class="cx"> ASSERT((buffer + 3) <= bufferEnd);
</span><span class="lines">@@ -1772,16 +1772,16 @@
</span><span class="cx"> bool strict = mode == StrictConversion;
</span><span class="cx"> const UChar* originalCharacters = characters;
</span><span class="cx"> ConversionResult result = convertUTF16ToUTF8(&characters, characters + length, &buffer, buffer + bufferSize, strict);
</span><del>- ASSERT(result != TargetExhausted); // (length * 3) should be sufficient for any conversion
</del><ins>+ ASSERT(result != targetExhausted); // (length * 3) should be sufficient for any conversion
</ins><span class="cx">
</span><span class="cx"> // Only produced from strict conversion.
</span><del>- if (result == SourceIllegal) {
</del><ins>+ if (result == sourceIllegal) {
</ins><span class="cx"> ASSERT(strict);
</span><span class="cx"> return UTF8ConversionError::IllegalSource;
</span><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> // Check for an unconverted high surrogate.
</span><del>- if (result == SourceExhausted) {
</del><ins>+ if (result == sourceExhausted) {
</ins><span class="cx"> if (strict)
</span><span class="cx"> return UTF8ConversionError::SourceExhausted;
</span><span class="cx"> // This should be one unpaired high surrogate. Treat it the same
</span><span class="lines">@@ -1809,8 +1809,8 @@
</span><span class="cx"> Vector<char, 1024> bufferVector(length * 3);
</span><span class="cx"> char* buffer = bufferVector.data();
</span><span class="cx"> const LChar* source = characters;
</span><del>- bool charactersFit = convertLatin1ToUTF8(&source, source + length, &buffer, buffer + bufferVector.size());
- ASSERT_UNUSED(charactersFit, charactersFit); // (length * 3) should be sufficient for any conversion
</del><ins>+ ConversionResult result = convertLatin1ToUTF8(&source, source + length, &buffer, buffer + bufferVector.size());
+ ASSERT_UNUSED(result, result != targetExhausted); // (length * 3) should be sufficient for any conversion
</ins><span class="cx"> return CString(bufferVector.data(), buffer - bufferVector.data());
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -1854,8 +1854,9 @@
</span><span class="cx">
</span><span class="cx"> if (is8Bit()) {
</span><span class="cx"> const LChar* characters = this->characters8() + offset;
</span><del>- bool charactersFit = convertLatin1ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size());
- ASSERT_UNUSED(charactersFit, charactersFit); // (length * 3) should be sufficient for any conversion
</del><ins>+
+ ConversionResult result = convertLatin1ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size());
+ ASSERT_UNUSED(result, result != targetExhausted); // (length * 3) should be sufficient for any conversion
</ins><span class="cx"> } else {
</span><span class="cx"> UTF8ConversionError error = utf8Impl(this->characters16() + offset, length, buffer, bufferVector.size(), mode);
</span><span class="cx"> if (error != UTF8ConversionError::None)
</span></span></pre></div>
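<p>The assertions above rely on a buffer of <code>length * 3</code> bytes always being large enough. For Latin-1 input the bound is loose: every Latin-1 character encodes to one or two UTF-8 bytes. The sketch below illustrates that with a hypothetical standalone function, <code>latin1ToUTF8</code>, which uses <code>std::string</code> rather than WTF types and is not the WebKit implementation.</p>

```cpp
#include <string>

// Hypothetical helper (not part of WebKit): convert Latin-1 bytes to UTF-8.
// Each input byte yields one output byte (below 0x80) or two (0x80..0xFF).
std::string latin1ToUTF8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size() * 2); // worst case: two bytes per character
    for (unsigned char ch : latin1) {
        if (ch < 0x80)
            utf8.push_back(static_cast<char>(ch));
        else {
            // Two-byte form: 110xxxxx 10xxxxxx
            utf8.push_back(static_cast<char>(0xC0 | (ch >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (ch & 0x3F)));
        }
    }
    return utf8;
}
```

<p>With this sketch the Latin-1 byte 0xE9 becomes the two bytes 0xC3 0xA9, and pure ASCII input is returned unchanged.</p>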
<a id="trunkSourceWTFwtftextStringViewcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/StringView.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/StringView.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/wtf/text/StringView.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -35,9 +35,12 @@
</span><span class="cx"> #include <wtf/NeverDestroyed.h>
</span><span class="cx"> #include <wtf/Optional.h>
</span><span class="cx"> #include <wtf/text/TextBreakIterator.h>
</span><ins>+#include <wtf/unicode/UTF8Conversion.h>
</ins><span class="cx">
</span><span class="cx"> namespace WTF {
</span><span class="cx">
</span><ins>+using namespace Unicode;
+
</ins><span class="cx"> bool StringView::containsIgnoringASCIICase(const StringView& matchString) const
</span><span class="cx"> {
</span><span class="cx"> return findIgnoringASCIICase(matchString) != notFound;
</span></span></pre></div>
<a id="trunkSourceWTFwtftextWTFStringcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/WTFString.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/WTFString.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/wtf/text/WTFString.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -859,7 +859,7 @@
</span><span class="cx">
</span><span class="cx"> UChar* bufferCurrent = bufferStart;
</span><span class="cx"> const char* stringCurrent = reinterpret_cast<const char*>(stringStart);
</span><del>- if (!convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char*>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size()))
</del><ins>+ if (convertUTF8ToUTF16(&stringCurrent, reinterpret_cast<const char *>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size()) != conversionOK)
</ins><span class="cx"> return String();
</span><span class="cx">
</span><span class="cx"> unsigned utf16Length = bufferCurrent - bufferStart;
</span></span></pre></div>
<a id="trunkSourceWTFwtfunicodeUTF8Conversioncpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/unicode/UTF8Conversion.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/unicode/UTF8Conversion.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/wtf/unicode/UTF8Conversion.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2007, 2010-2012, 2014, 2019 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2007, 2014 Apple Inc. All rights reserved.
</ins><span class="cx"> * Copyright (C) 2010 Patrick Gansterer <paroga@paroga.com>
</span><span class="cx"> *
</span><span class="cx"> * Redistribution and use in source and binary forms, with or without
</span><span class="lines">@@ -34,107 +34,389 @@
</span><span class="cx"> namespace WTF {
</span><span class="cx"> namespace Unicode {
</span><span class="cx">
</span><del>-bool convertLatin1ToUTF8(const LChar** sourceStart, const LChar* sourceEnd, char** targetStart, char* targetEnd)
</del><ins>+inline int inlineUTF8SequenceLengthNonASCII(char b0)
</ins><span class="cx"> {
</span><del>- const LChar* source;
</del><ins>+ if ((b0 & 0xC0) != 0xC0)
+ return 0;
+ if ((b0 & 0xE0) == 0xC0)
+ return 2;
+ if ((b0 & 0xF0) == 0xE0)
+ return 3;
+ if ((b0 & 0xF8) == 0xF0)
+ return 4;
+ return 0;
+}
+
+inline int inlineUTF8SequenceLength(char b0)
+{
+ return isASCII(b0) ? 1 : inlineUTF8SequenceLengthNonASCII(b0);
+}
+
+int UTF8SequenceLength(char b0)
+{
+ return isASCII(b0) ? 1 : inlineUTF8SequenceLengthNonASCII(b0);
+}
+
+int decodeUTF8Sequence(const char* sequence)
+{
+ // Handle 0-byte sequences (never valid).
+ const unsigned char b0 = sequence[0];
+ const int length = inlineUTF8SequenceLength(b0);
+ if (length == 0)
+ return -1;
+
+ // Handle 1-byte sequences (plain ASCII).
+ const unsigned char b1 = sequence[1];
+ if (length == 1) {
+ if (b1)
+ return -1;
+ return b0;
+ }
+
+ // Handle 2-byte sequences.
+ if ((b1 & 0xC0) != 0x80)
+ return -1;
+ const unsigned char b2 = sequence[2];
+ if (length == 2) {
+ if (b2)
+ return -1;
+ const int c = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
+ if (c < 0x80)
+ return -1;
+ return c;
+ }
+
+ // Handle 3-byte sequences.
+ if ((b2 & 0xC0) != 0x80)
+ return -1;
+ const unsigned char b3 = sequence[3];
+ if (length == 3) {
+ if (b3)
+ return -1;
+ const int c = ((b0 & 0xF) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
+ if (c < 0x800)
+ return -1;
+ // UTF-16 surrogates should never appear in UTF-8 data.
+ if (c >= 0xD800 && c <= 0xDFFF)
+ return -1;
+ return c;
+ }
+
+ // Handle 4-byte sequences.
+ if ((b3 & 0xC0) != 0x80)
+ return -1;
+ const unsigned char b4 = sequence[4];
+ if (length == 4) {
+ if (b4)
+ return -1;
+ const int c = ((b0 & 0x7) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
+ if (c < 0x10000 || c > 0x10FFFF)
+ return -1;
+ return c;
+ }
+
+ return -1;
+}
+
+// Once the bits are split out into bytes of UTF-8, this is a mask OR-ed
+// into the first byte, depending on how many bytes follow. There are
+// as many entries in this table as there are UTF-8 sequence types.
+// (I.e., one byte sequence, two byte... etc.). Remember that sequences
+// for *legal* UTF-8 will be 4 or fewer bytes total.
+static const unsigned char firstByteMark[7] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
+
+ConversionResult convertLatin1ToUTF8(
+ const LChar** sourceStart, const LChar* sourceEnd,
+ char** targetStart, char* targetEnd)
+{
+ ConversionResult result = conversionOK;
+ const LChar* source = *sourceStart;
</ins><span class="cx"> char* target = *targetStart;
</span><del>- unsigned i = 0;
- for (source = *sourceStart; source < sourceEnd; ++source) {
- UBool sawError = false;
- // Work around bug in either Windows compiler or old version of ICU, where passing a uint8_t to
- // U8_APPEND warns, by converting from uint8_t to a wider type.
- UChar32 character = *source;
- U8_APPEND(reinterpret_cast<uint8_t*>(target), i, targetEnd - *targetStart, character, sawError);
- if (sawError)
- return false;
</del><ins>+ while (source < sourceEnd) {
+ UChar32 ch;
+ unsigned short bytesToWrite = 0;
+ const UChar32 byteMask = 0xBF;
+ const UChar32 byteMark = 0x80;
+ const LChar* oldSource = source; // In case we have to back up because of target overflow.
+ ch = static_cast<unsigned short>(*source++);
+
+ // Figure out how many bytes the result will require
+ if (ch < (UChar32)0x80)
+ bytesToWrite = 1;
+ else
+ bytesToWrite = 2;
+
+ target += bytesToWrite;
+ if (target > targetEnd) {
+ source = oldSource; // Back up source pointer!
+ target -= bytesToWrite;
+ result = targetExhausted;
+ break;
+ }
+ switch (bytesToWrite) { // note: everything falls through.
+ case 2:
+ *--target = (char)((ch | byteMark) & byteMask);
+ ch >>= 6;
+ FALLTHROUGH;
+ case 1:
+ *--target = (char)(ch | firstByteMark[bytesToWrite]);
+ }
+ target += bytesToWrite;
</ins><span class="cx"> }
</span><span class="cx"> *sourceStart = source;
</span><del>- *targetStart = target + i;
- return true;
</del><ins>+ *targetStart = target;
+ return result;
</ins><span class="cx"> }
</span><span class="cx">
</span><del>-ConversionResult convertUTF16ToUTF8(const UChar** sourceStart, const UChar* sourceEnd, char** targetStart, char* targetEnd, bool strict)
</del><ins>+ConversionResult convertUTF16ToUTF8(
+ const UChar** sourceStart, const UChar* sourceEnd,
+ char** targetStart, char* targetEnd, bool strict)
</ins><span class="cx"> {
</span><del>- ConversionResult result = ConversionOK;
</del><ins>+ ConversionResult result = conversionOK;
</ins><span class="cx"> const UChar* source = *sourceStart;
</span><span class="cx"> char* target = *targetStart;
</span><del>- UBool sawError = false;
- unsigned i = 0;
</del><span class="cx"> while (source < sourceEnd) {
</span><span class="cx"> UChar32 ch;
</span><del>- int j = 0;
- U16_NEXT(source, j, sourceEnd - source, ch);
- if (U_IS_SURROGATE(ch)) {
- if (source + j == sourceEnd && U_IS_SURROGATE_LEAD(ch)) {
- result = SourceExhausted;
</del><ins>+ unsigned short bytesToWrite = 0;
+ const UChar32 byteMask = 0xBF;
+ const UChar32 byteMark = 0x80;
+ const UChar* oldSource = source; // In case we have to back up because of target overflow.
+ ch = static_cast<unsigned short>(*source++);
+ // If we have a surrogate pair, convert to UChar32 first.
+ if (ch >= 0xD800 && ch <= 0xDBFF) {
+ // If the 16 bits following the high surrogate are in the source buffer...
+ if (source < sourceEnd) {
+ UChar32 ch2 = static_cast<unsigned short>(*source);
+ // If it's a low surrogate, convert to UChar32.
+ if (ch2 >= 0xDC00 && ch2 <= 0xDFFF) {
+ ch = ((ch - 0xD800) << 10) + (ch2 - 0xDC00) + 0x0010000;
+ ++source;
+ } else if (strict) { // it's an unpaired high surrogate
+ --source; // return to the illegal value itself
+ result = sourceIllegal;
+ break;
+ }
+ } else { // We don't have the 16 bits following the high surrogate.
+ --source; // return to the high surrogate
+ result = sourceExhausted;
</ins><span class="cx"> break;
</span><span class="cx"> }
</span><del>- if (strict) {
- result = SourceIllegal;
</del><ins>+ } else if (strict) {
+ // UTF-16 surrogate values are illegal in UTF-32
+ if (ch >= 0xDC00 && ch <= 0xDFFF) {
+ --source; // return to the illegal value itself
+ result = sourceIllegal;
</ins><span class="cx"> break;
</span><span class="cx"> }
</span><ins>+ }
+ // Figure out how many bytes the result will require
+ if (ch < (UChar32)0x80) {
+ bytesToWrite = 1;
+ } else if (ch < (UChar32)0x800) {
+ bytesToWrite = 2;
+ } else if (ch < (UChar32)0x10000) {
+ bytesToWrite = 3;
+ } else if (ch < (UChar32)0x110000) {
+ bytesToWrite = 4;
+ } else {
+ bytesToWrite = 3;
</ins><span class="cx"> ch = replacementCharacter;
</span><span class="cx"> }
</span><del>- U8_APPEND(reinterpret_cast<uint8_t*>(target), i, targetEnd - target, ch, sawError);
- if (sawError) {
- result = TargetExhausted;
</del><ins>+
+ target += bytesToWrite;
+ if (target > targetEnd) {
+ source = oldSource; // Back up source pointer!
+ target -= bytesToWrite;
+ result = targetExhausted;
</ins><span class="cx"> break;
</span><span class="cx"> }
</span><del>- source += j;
</del><ins>+ switch (bytesToWrite) { // note: everything falls through.
+ case 4: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH;
+ case 3: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH;
+ case 2: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH;
+ case 1: *--target = (char)(ch | firstByteMark[bytesToWrite]);
+ }
+ target += bytesToWrite;
</ins><span class="cx"> }
</span><span class="cx"> *sourceStart = source;
</span><del>- *targetStart = target + i;
</del><ins>+ *targetStart = target;
</ins><span class="cx"> return result;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-bool convertUTF8ToUTF16(const char* source, const char* sourceEnd, UChar** targetStart, UChar* targetEnd, bool* sourceAllASCII)
</del><ins>+// This must be called with the length pre-determined by the first byte.
+// If presented with a length > 4, this returns false. The Unicode
+// definition of UTF-8 goes up to 4-byte sequences.
+static bool isLegalUTF8(const unsigned char* source, int length)
</ins><span class="cx"> {
</span><del>- RELEASE_ASSERT(sourceEnd - source <= std::numeric_limits<int>::max());
- UBool error = false;
</del><ins>+ unsigned char a;
+ const unsigned char* srcptr = source + length;
+ switch (length) {
+ default: return false;
+ // Everything else falls through when "true"...
+ case 4: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false; FALLTHROUGH;
+ case 3: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false; FALLTHROUGH;
+ case 2: if ((a = (*--srcptr)) > 0xBF) return false;
+
+ switch (*source) {
+ // no fall-through in this inner switch
+ case 0xE0: if (a < 0xA0) return false; break;
+ case 0xED: if (a > 0x9F) return false; break;
+ case 0xF0: if (a < 0x90) return false; break;
+ case 0xF4: if (a > 0x8F) return false; break;
+ default: if (a < 0x80) return false;
+ }
+ FALLTHROUGH;
+
+ case 1: if (*source >= 0x80 && *source < 0xC2) return false;
+ }
+ if (*source > 0xF4)
+ return false;
+ return true;
+}
+
+// Magic values subtracted from a buffer value during UTF8 conversion.
+// This table contains as many values as there might be trailing bytes
+// in a UTF-8 sequence.
+static const UChar32 offsetsFromUTF8[6] = { 0x00000000UL, 0x00003080UL, 0x000E2080UL, 0x03C82080UL, static_cast<UChar32>(0xFA082080UL), static_cast<UChar32>(0x82082080UL) };
+
+static inline UChar32 readUTF8Sequence(const char*& sequence, unsigned length)
+{
+ UChar32 character = 0;
+
+ // The cases all fall through.
+ switch (length) {
+ case 6: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
+ case 5: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
+ case 4: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
+ case 3: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
+ case 2: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
+ case 1: character += static_cast<unsigned char>(*sequence++);
+ }
+
+ return character - offsetsFromUTF8[length - 1];
+}
+
+ConversionResult convertUTF8ToUTF16(
+ const char** sourceStart, const char* sourceEnd,
+ UChar** targetStart, UChar* targetEnd, bool* sourceAllASCII, bool strict)
+{
+ ConversionResult result = conversionOK;
+ const char* source = *sourceStart;
</ins><span class="cx"> UChar* target = *targetStart;
</span><del>- UChar32 orAllData = 0;
- unsigned targetOffset = 0;
- for (int sourceOffset = 0; sourceOffset < sourceEnd - source; ) {
- UChar32 character;
- U8_NEXT(reinterpret_cast<const uint8_t*>(source), sourceOffset, sourceEnd - source, character);
- if (character < 0)
- return false;
- U16_APPEND(target, targetOffset, targetEnd - target, character, error);
- if (error)
- return false;
- orAllData |= character;
</del><ins>+ UChar orAllData = 0;
+ while (source < sourceEnd) {
+ int utf8SequenceLength = inlineUTF8SequenceLength(*source);
+ if (sourceEnd - source < utf8SequenceLength) {
+ result = sourceExhausted;
+ break;
+ }
+ // Do this check whether lenient or strict
+ if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(source), utf8SequenceLength)) {
+ result = sourceIllegal;
+ break;
+ }
+
+ UChar32 character = readUTF8Sequence(source, utf8SequenceLength);
+
+ if (target >= targetEnd) {
+ source -= utf8SequenceLength; // Back up source pointer!
+ result = targetExhausted;
+ break;
+ }
+
+ if (U_IS_BMP(character)) {
+ // UTF-16 surrogate values are illegal in UTF-32
+ if (U_IS_SURROGATE(character)) {
+ if (strict) {
+ source -= utf8SequenceLength; // return to the illegal value itself
+ result = sourceIllegal;
+ break;
+ } else {
+ *target++ = replacementCharacter;
+ orAllData |= replacementCharacter;
+ }
+ } else {
+ *target++ = character; // normal case
+ orAllData |= character;
+ }
+ } else if (U_IS_SUPPLEMENTARY(character)) {
+ // target is a character in range 0x10000 - 0x10FFFF
+ if (target + 1 >= targetEnd) {
+ source -= utf8SequenceLength; // Back up source pointer!
+ result = targetExhausted;
+ break;
+ }
+ *target++ = U16_LEAD(character);
+ *target++ = U16_TRAIL(character);
+ orAllData = 0xffff;
+ } else {
+ if (strict) {
+ source -= utf8SequenceLength; // return to the start
+ result = sourceIllegal;
+ break; // Bail out; shouldn't continue
+ } else {
+ *target++ = replacementCharacter;
+ orAllData |= replacementCharacter;
+ }
+ }
</ins><span class="cx"> }
</span><del>- *targetStart = target + targetOffset;
</del><ins>+ *sourceStart = source;
+ *targetStart = target;
+
</ins><span class="cx"> if (sourceAllASCII)
</span><del>- *sourceAllASCII = isASCII(orAllData);
- return true;
</del><ins>+ *sourceAllASCII = !(orAllData & ~0x7f);
+
+ return result;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length)
</span><span class="cx"> {
</span><ins>+ if (!data)
+ return 0;
+
</ins><span class="cx"> StringHasher stringHasher;
</span><ins>+ dataLength = 0;
</ins><span class="cx"> utf16Length = 0;
</span><span class="cx">
</span><del>- int inputOffset = 0;
- int inputLength = dataEnd - data;
- while (inputOffset < inputLength) {
- UChar32 character;
- U8_NEXT(reinterpret_cast<const uint8_t*>(data), inputOffset, inputLength, character);
- if (character < 0)
</del><ins>+ while (data < dataEnd || (!dataEnd && *data)) {
+ if (isASCII(*data)) {
+ stringHasher.addCharacter(*data++);
+ dataLength++;
+ utf16Length++;
+ continue;
+ }
+
+ int utf8SequenceLength = inlineUTF8SequenceLengthNonASCII(*data);
+ dataLength += utf8SequenceLength;
+
+ if (!dataEnd) {
+ for (int i = 1; i < utf8SequenceLength; ++i) {
+ if (!data[i])
+ return 0;
+ }
+ } else if (dataEnd - data < utf8SequenceLength)
</ins><span class="cx"> return 0;
</span><span class="cx">
</span><ins>+ if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(data), utf8SequenceLength))
+ return 0;
+
+ UChar32 character = readUTF8Sequence(data, utf8SequenceLength);
+ ASSERT(!isASCII(character));
+
</ins><span class="cx"> if (U_IS_BMP(character)) {
</span><del>- ASSERT(!U_IS_SURROGATE(character));
- stringHasher.addCharacter(character);
</del><ins>+ // UTF-16 surrogate values are illegal in UTF-32
+ if (U_IS_SURROGATE(character))
+ return 0;
+ stringHasher.addCharacter(static_cast<UChar>(character)); // normal case
</ins><span class="cx"> utf16Length++;
</span><del>- } else {
- ASSERT(U_IS_SUPPLEMENTARY(character));
- stringHasher.addCharacters(U16_LEAD(character), U16_TRAIL(character));
</del><ins>+ } else if (U_IS_SUPPLEMENTARY(character)) {
+ stringHasher.addCharacters(static_cast<UChar>(U16_LEAD(character)),
+ static_cast<UChar>(U16_TRAIL(character)));
</ins><span class="cx"> utf16Length += 2;
</span><del>- }
</del><ins>+ } else
+ return 0;
</ins><span class="cx"> }
</span><span class="cx">
</span><del>- dataLength = inputOffset;
</del><span class="cx"> return stringHasher.hashWithTop8BitsMasked();
</span><span class="cx"> }
</span><span class="cx">
</span><span class="lines">@@ -141,24 +423,36 @@
</span><span class="cx"> bool equalUTF16WithUTF8(const UChar* a, const char* b, const char* bEnd)
</span><span class="cx"> {
</span><span class="cx"> while (b < bEnd) {
</span><del>- int offset = 0;
- UChar32 character;
- U8_NEXT(reinterpret_cast<const uint8_t*>(b), offset, bEnd - b, character);
- if (character < 0)
</del><ins>+ if (isASCII(*a) || isASCII(*b)) {
+ if (*a++ != *b++)
+ return false;
+ continue;
+ }
+
+ int utf8SequenceLength = inlineUTF8SequenceLengthNonASCII(*b);
+
+ if (bEnd - b < utf8SequenceLength)
</ins><span class="cx"> return false;
</span><del>- b += offset;
</del><span class="cx">
</span><ins>+ if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(b), utf8SequenceLength))
+ return false;
+
+ UChar32 character = readUTF8Sequence(b, utf8SequenceLength);
+ ASSERT(!isASCII(character));
+
</ins><span class="cx"> if (U_IS_BMP(character)) {
</span><del>- ASSERT(!U_IS_SURROGATE(character));
</del><ins>+ // UTF-16 surrogate values are illegal in UTF-32
+ if (U_IS_SURROGATE(character))
+ return false;
</ins><span class="cx"> if (*a++ != character)
</span><span class="cx"> return false;
</span><del>- } else {
- ASSERT(U_IS_SUPPLEMENTARY(character));
</del><ins>+ } else if (U_IS_SUPPLEMENTARY(character)) {
</ins><span class="cx"> if (*a++ != U16_LEAD(character))
</span><span class="cx"> return false;
</span><span class="cx"> if (*a++ != U16_TRAIL(character))
</span><span class="cx"> return false;
</span><del>- }
</del><ins>+ } else
+ return false;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> return true;
</span></span></pre></div>
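<p>Several functions in the diff above first classify the lead byte to learn how long a UTF-8 sequence claims to be, and only then validate and decode it. The following is a minimal standalone sketch of that classification step. The name <code>utf8SequenceLengthFromLeadByte</code> is hypothetical, and the function takes an unsigned byte for clarity where the restored code takes a <code>char</code>.</p>

```cpp
#include <cstdint>

// Hypothetical helper (not part of WebKit): return the sequence length (1 to 4)
// announced by a UTF-8 lead byte, or 0 if the byte cannot start a sequence.
int utf8SequenceLengthFromLeadByte(std::uint8_t b0)
{
    if (b0 < 0x80)
        return 1; // 0xxxxxxx: ASCII
    if ((b0 & 0xE0) == 0xC0)
        return 2; // 110xxxxx
    if ((b0 & 0xF0) == 0xE0)
        return 3; // 1110xxxx
    if ((b0 & 0xF8) == 0xF0)
        return 4; // 11110xxx
    return 0;     // continuation byte (10xxxxxx) or 5/6-byte lead
}
```

<p>This step only inspects bit patterns. Lead bytes such as 0xC0 or 0xF5 still report a length here, so a separate legality check (the role <code>isLegalUTF8</code> plays in the diff) is needed to reject overlong or out-of-range sequences.</p>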
<a id="trunkSourceWTFwtfunicodeUTF8Conversionh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/unicode/UTF8Conversion.h (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/unicode/UTF8Conversion.h 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WTF/wtf/unicode/UTF8Conversion.h 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2007-2019 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2007 Apple Inc. All rights reserved.
</ins><span class="cx"> *
</span><span class="cx"> * Redistribution and use in source and binary forms, with or without
</span><span class="cx"> * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -31,28 +31,54 @@
</span><span class="cx"> namespace WTF {
</span><span class="cx"> namespace Unicode {
</span><span class="cx">
</span><del>-enum ConversionResult {
- ConversionOK, // conversion successful
- SourceExhausted, // partial character in source, but hit end
- TargetExhausted, // insufficient room in target for conversion
- SourceIllegal // source sequence is illegal/malformed
-};
</del><ins>+ // Given a first byte, gives the length of the UTF-8 sequence it begins.
+ // Returns 0 for bytes that are not legal starts of UTF-8 sequences.
+ // Only allows sequences of up to 4 bytes, since that works for all Unicode characters (U-00000000 to U-0010FFFF).
+ WTF_EXPORT_PRIVATE int UTF8SequenceLength(char);
</ins><span class="cx">
</span><del>-// Conversion functions are strict, except for convertUTF16ToUTF8, which takes
-// "strict" argument. When strict, both illegal sequences and unpaired surrogates
-// will cause an error. When not, illegal sequences and unpaired surrogates are
-// converted to the replacement character, except for an unpaired lead surrogate
-// at the end of the source, which will instead cause a SourceExhausted error.
</del><ins>+ // Takes a null-terminated C-style string with a UTF-8 sequence in it and converts it to a character.
+ // Only allows Unicode characters (U-00000000 to U-0010FFFF).
+ // Returns -1 if the sequence is not valid (including presence of extra bytes).
+ WTF_EXPORT_PRIVATE int decodeUTF8Sequence(const char*);
</ins><span class="cx">
</span><del>-WTF_EXPORT_PRIVATE bool convertUTF8ToUTF16(const char* sourceStart, const char* sourceEnd, UChar** targetStart, UChar* targetEnd, bool* isSourceAllASCII = nullptr);
-WTF_EXPORT_PRIVATE bool convertLatin1ToUTF8(const LChar** sourceStart, const LChar* sourceEnd, char** targetStart, char* targetEnd);
-WTF_EXPORT_PRIVATE ConversionResult convertUTF16ToUTF8(const UChar** sourceStart, const UChar* sourceEnd, char** targetStart, char* targetEnd, bool strict = true);
</del><ins>+ typedef enum {
+ conversionOK, // conversion successful
+ sourceExhausted, // partial character in source, but hit end
+ targetExhausted, // insuff. room in target for conversion
+ sourceIllegal // source sequence is illegal/malformed
+ } ConversionResult;
</ins><span class="cx">
</span><del>-WTF_EXPORT_PRIVATE unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length);
</del><ins>+ // These conversion functions take a "strict" argument. When this
+ // flag is set to strict, both irregular sequences and isolated surrogates
+ // will cause an error. When the flag is set to lenient, both irregular
+ // sequences and isolated surrogates are converted.
+ //
+ // Whether the flag is strict or lenient, all illegal sequences will cause
+ // an error return. This includes sequences such as: &lt;F4 90 80 80&gt;, &lt;C0 80&gt;,
+ // or &lt;A0&gt; in UTF-8, and values above 0x10FFFF in UTF-32. Conformant code
+ // must check for illegal sequences.
+ //
+ // When the flag is set to lenient, characters over 0x10FFFF are converted
+ // to the replacement character; otherwise (when the flag is set to strict)
+ // they constitute an error.
</ins><span class="cx">
</span><del>-// Callers of these functions must check that the lengths are the same; accordingly we omit an end argument for UTF-16 and Latin-1.
-bool equalUTF16WithUTF8(const UChar* stringInUTF16, const char* stringInUTF8, const char* stringInUTF8End);
-bool equalLatin1WithUTF8(const LChar* stringInLatin1, const char* stringInUTF8, const char* stringInUTF8End);
</del><ins>+ WTF_EXPORT_PRIVATE ConversionResult convertUTF8ToUTF16(
+ const char** sourceStart, const char* sourceEnd,
+ UChar** targetStart, UChar* targetEnd, bool* isSourceAllASCII = 0, bool strict = true);
</ins><span class="cx">
</span><ins>+ WTF_EXPORT_PRIVATE ConversionResult convertLatin1ToUTF8(
+ const LChar** sourceStart, const LChar* sourceEnd,
+ char** targetStart, char* targetEnd);
+
+ WTF_EXPORT_PRIVATE ConversionResult convertUTF16ToUTF8(
+ const UChar** sourceStart, const UChar* sourceEnd,
+ char** targetStart, char* targetEnd, bool strict = true);
+
+ WTF_EXPORT_PRIVATE unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length);
+
+ // The caller of these functions already knows that the lengths are the same, so we omit an end argument for UTF-16 and Latin-1.
+ bool equalUTF16WithUTF8(const UChar* stringInUTF16, const char* stringInUTF8, const char* stringInUTF8End);
+ bool equalLatin1WithUTF8(const LChar* stringInLatin1, const char* stringInUTF8, const char* stringInUTF8End);
+
</ins><span class="cx"> } // namespace Unicode
</span><span class="cx"> } // namespace WTF
</span></span></pre></div>
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WebCore/ChangeLog 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,5 +1,18 @@
</span><span class="cx"> 2019-05-01 Shawn Roberts &lt;sroberts@apple.com&gt;
</span><span class="cx">
</span><ins>+ Unreviewed, rolling out r244821.
+
+ Causing
+
+ Reverted changeset:
+
+ "WebKit has too much of its own UTF-8 code and should rely
+ more on ICU's UTF-8 support"
+ https://bugs.webkit.org/show_bug.cgi?id=195535
+ https://trac.webkit.org/changeset/244821
+
+2019-05-01 Shawn Roberts &lt;sroberts@apple.com&gt;
+
</ins><span class="cx"> Unreviewed, rolling out r244822.
</span><span class="cx">
</span><span class="cx"> Causing
</span></span></pre></div>
<a id="trunkSourceWebCoreplatformSharedBuffercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/platform/SharedBuffer.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/platform/SharedBuffer.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WebCore/platform/SharedBuffer.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -334,16 +334,17 @@
</span><span class="cx">
</span><span class="cx"> // Convert to runs of 8-bit characters.
</span><span class="cx"> char* p = buffer.data();
</span><ins>+ WTF::Unicode::ConversionResult result;
</ins><span class="cx"> if (length) {
</span><span class="cx"> if (string.is8Bit()) {
</span><span class="cx"> const LChar* d = string.characters8();
</span><del>- if (!WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size()))
- return nullptr;
</del><ins>+ result = WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size());
</ins><span class="cx"> } else {
</span><span class="cx"> const UChar* d = string.characters16();
</span><del>- if (WTF::Unicode::convertUTF16ToUTF8(&d, d + length, &p, p + buffer.size()) != WTF::Unicode::ConversionOK)
- return nullptr;
</del><ins>+ result = WTF::Unicode::convertUTF16ToUTF8(&d, d + length, &p, p + buffer.size(), true);
</ins><span class="cx"> }
</span><ins>+ if (result != WTF::Unicode::conversionOK)
+ return nullptr;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> buffer.shrink(p - buffer.data());
</span></span></pre></div>
<a id="trunkSourceWebCorexmlXSLTProcessorLibxsltcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/xml/XSLTProcessorLibxslt.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/xml/XSLTProcessorLibxslt.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WebCore/xml/XSLTProcessorLibxslt.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -48,6 +48,8 @@
</span><span class="cx"> #include &lt;libxslt/xslt.h&gt;
</span><span class="cx"> #include &lt;libxslt/xsltutils.h&gt;
</span><span class="cx"> #include &lt;wtf/Assertions.h&gt;
</span><ins>+#include &lt;wtf/text/StringBuffer.h&gt;
+#include &lt;wtf/unicode/UTF8Conversion.h&gt;
</ins><span class="cx">
</span><span class="cx"> #if OS(DARWIN) && !PLATFORM(GTK)
</span><span class="cx"> #include "SoftLinkLibxslt.h"
</span><span class="lines">@@ -157,41 +159,27 @@
</span><span class="cx"> globalCachedResourceLoader = cachedResourceLoader;
</span><span class="cx"> }
</span><span class="cx">
</span><del>-static int writeToStringBuilder(void* context, const char* buffer, int length)
</del><ins>+static int writeToStringBuilder(void* context, const char* buffer, int len)
</ins><span class="cx"> {
</span><span class="cx"> StringBuilder&amp; resultOutput = *static_cast&lt;StringBuilder*&gt;(context);
</span><span class="cx">
</span><del>- // FIXME: Consider ways to make this more efficient by moving it into a
- // StringBuilder::appendUTF8 function, and then optimizing to not need a
- // Vector&lt;UChar&gt; and possibly optimize cases that can produce 8-bit Latin-1
- // strings, but that would need to be sophisticated about not processing
- // trailing incomplete sequences and communicating that to the caller.
</del><ins>+ if (!len)
+ return 0;
</ins><span class="cx">
</span><del>- Vector&lt;UChar&gt; outputBuffer(length);
</del><ins>+ StringBuffer&lt;UChar&gt; stringBuffer(len);
+ UChar* bufferUChar = stringBuffer.characters();
+ UChar* bufferUCharEnd = bufferUChar + len;
</ins><span class="cx">
</span><del>- UBool error = false;
- int inputOffset = 0;
- int outputOffset = 0;
- while (inputOffset &lt; length) {
- UChar32 character;
- int nextInputOffset = inputOffset;
- U8_NEXT(reinterpret_cast&lt;const uint8_t*&gt;(buffer), nextInputOffset, length, character);
- if (character &lt; 0) {
- if (nextInputOffset == length)
- break;
- ASSERT_NOT_REACHED();
- return -1;
- }
- inputOffset = nextInputOffset;
- U16_APPEND(outputBuffer.data(), outputOffset, length, character, error);
- if (error) {
- ASSERT_NOT_REACHED();
- return -1;
- }
</del><ins>+ const char* stringCurrent = buffer;
+ WTF::Unicode::ConversionResult result = WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, buffer + len, &bufferUChar, bufferUCharEnd);
+ if (result != WTF::Unicode::conversionOK && result != WTF::Unicode::sourceExhausted) {
+ ASSERT_NOT_REACHED();
+ return -1;
</ins><span class="cx"> }
</span><span class="cx">
</span><del>- resultOutput.append(outputBuffer.data(), outputOffset);
- return inputOffset;
</del><ins>+ int utf16Length = bufferUChar - stringBuffer.characters();
+ resultOutput.append(stringBuffer.characters(), utf16Length);
+ return stringCurrent - buffer;
</ins><span class="cx"> }
</span><span class="cx">
</span><span class="cx"> static bool saveResultToString(xmlDocPtr resultDoc, xsltStylesheetPtr sheet, String& resultString)
</span></span></pre></div>
<a id="trunkSourceWebCorexmlparserXMLDocumentParserLibxml2cpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1153,8 +1153,8 @@
</span><span class="cx"> static size_t convertUTF16EntityToUTF8(const UChar* utf16Entity, size_t numberOfCodeUnits, char* target, size_t targetSize)
</span><span class="cx"> {
</span><span class="cx"> const char* originalTarget = target;
</span><del>- WTF::Unicode::ConversionResult conversionResult = WTF::Unicode::convertUTF16ToUTF8(&utf16Entity, utf16Entity + numberOfCodeUnits, &target, target + targetSize);
- if (conversionResult != WTF::Unicode::ConversionOK)
</del><ins>+ auto conversionResult = WTF::Unicode::convertUTF16ToUTF8(&utf16Entity, utf16Entity + numberOfCodeUnits, &target, target + targetSize);
+ if (conversionResult != WTF::Unicode::conversionOK)
</ins><span class="cx"> return 0;
</span><span class="cx">
</span><span class="cx"> // Even though we must pass the length, libxml expects the entity string to be null terminated.
</span></span></pre></div>
<a id="trunkSourceWebKitChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebKit/ChangeLog (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebKit/ChangeLog 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WebKit/ChangeLog 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -1,3 +1,16 @@
</span><ins>+2019-05-01 Shawn Roberts &lt;sroberts@apple.com&gt;
+
+ Unreviewed, rolling out r244821.
+
+ Causing
+
+ Reverted changeset:
+
+ "WebKit has too much of its own UTF-8 code and should rely
+ more on ICU's UTF-8 support"
+ https://bugs.webkit.org/show_bug.cgi?id=195535
+ https://trac.webkit.org/changeset/244821
+
</ins><span class="cx"> 2019-05-01 Youenn Fablet &lt;youenn@apple.com&gt;
</span><span class="cx">
</span><span class="cx"> Kept alive loaders should use the redirected request in case of redirections
</span></span></pre></div>
<a id="trunkSourceWebKitSharedAPIAPIStringh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebKit/Shared/API/APIString.h (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebKit/Shared/API/APIString.h 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WebKit/Shared/API/APIString.h 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -23,10 +23,14 @@
</span><span class="cx"> * THE POSSIBILITY OF SUCH DAMAGE.
</span><span class="cx"> */
</span><span class="cx">
</span><del>-#pragma once
</del><ins>+#ifndef APIString_h
+#define APIString_h
</ins><span class="cx">
</span><span class="cx"> #include "APIObject.h"
</span><ins>+#include &lt;wtf/Ref.h&gt;
</ins><span class="cx"> #include &lt;wtf/text/StringView.h&gt;
</span><ins>+#include &lt;wtf/text/WTFString.h&gt;
+#include &lt;wtf/unicode/UTF8Conversion.h&gt;
</ins><span class="cx">
</span><span class="cx"> namespace API {
</span><span class="cx">
</span><span class="lines">@@ -71,3 +75,5 @@
</span><span class="cx"> };
</span><span class="cx">
</span><span class="cx"> } // namespace WebKit
</span><ins>+
+#endif // APIString_h
</ins></span></pre></div>
<a id="trunkSourceWebKitSharedAPIcWKStringcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebKit/Shared/API/c/WKString.cpp (244826 => 244827)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebKit/Shared/API/c/WKString.cpp 2019-05-01 17:12:19 UTC (rev 244826)
+++ trunk/Source/WebKit/Shared/API/c/WKString.cpp 2019-05-01 17:13:58 UTC (rev 244827)
</span><span class="lines">@@ -30,7 +30,6 @@
</span><span class="cx"> #include "WKAPICast.h"
</span><span class="cx"> #include &lt;JavaScriptCore/InitializeThreading.h&gt;
</span><span class="cx"> #include &lt;JavaScriptCore/OpaqueJSString.h&gt;
</span><del>-#include &lt;wtf/unicode/UTF8Conversion.h&gt;
</del><span class="cx">
</span><span class="cx"> WKTypeID WKStringGetTypeID()
</span><span class="cx"> {
</span><span class="lines">@@ -79,18 +78,19 @@
</span><span class="cx"> auto stringView = WebKit::toImpl(stringRef)->stringView();
</span><span class="cx">
</span><span class="cx"> char* p = buffer;
</span><ins>+ WTF::Unicode::ConversionResult result;
</ins><span class="cx">
</span><span class="cx"> if (stringView.is8Bit()) {
</span><span class="cx"> const LChar* characters = stringView.characters8();
</span><del>- if (!WTF::Unicode::convertLatin1ToUTF8(&characters, characters + stringView.length(), &p, p + bufferSize - 1))
- return 0;
</del><ins>+ result = WTF::Unicode::convertLatin1ToUTF8(&characters, characters + stringView.length(), &p, p + bufferSize - 1);
</ins><span class="cx"> } else {
</span><span class="cx"> const UChar* characters = stringView.characters16();
</span><del>- WTF::Unicode::ConversionResult result = WTF::Unicode::convertUTF16ToUTF8(&characters, characters + stringView.length(), &p, p + bufferSize - 1, strict);
- if (result != WTF::Unicode::ConversionOK && result != WTF::Unicode::TargetExhausted)
- return 0;
</del><ins>+ result = WTF::Unicode::convertUTF16ToUTF8(&characters, characters + stringView.length(), &p, p + bufferSize - 1, strict);
</ins><span class="cx"> }
</span><span class="cx">
</span><ins>+ if (result != WTF::Unicode::conversionOK && result != WTF::Unicode::targetExhausted)
+ return 0;
+
</ins><span class="cx"> *p++ = '\0';
</span><span class="cx"> return p - buffer;
</span><span class="cx"> }
</span></span></pre>
</div>
</div>
</body>
</html>