<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[244821] trunk</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/244821">244821</a></dd>
<dt>Author</dt> <dd>darin@apple.com</dd>
<dt>Date</dt> <dd>2019-05-01 08:52:16 -0700 (Wed, 01 May 2019)</dd>
</dl>

<h3>Log Message</h3>
<pre>WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
https://bugs.webkit.org/show_bug.cgi?id=195535

Reviewed by Alexey Proskuryakov.

LayoutTests/imported/w3c:

* web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt:
Updated expected results to have the Unicode replacement character in cases where the
text contains unpaired surrogates. The tests are still doing the same operations, and
still getting the same results, but the text output no longer includes illegal UTF-8.

Source/JavaScriptCore:

* API/JSClassRef.cpp: Removed unneeded include of UTF8Conversion.h.

* API/JSStringRef.cpp:
(JSStringCreateWithUTF8CString): Updated for changes to convertUTF8ToUTF16.
(JSStringGetUTF8CString): Updated for changes to convertLatin1ToUTF8.
Removed unneeded "true" to get the strict version of convertUTF16ToUTF8,
since that is the default. Also updated for changes to ConversionResult.

* runtime/JSGlobalObjectFunctions.cpp:
(JSC::decode): Stop using UTF8SequenceLength, and instead use U8_COUNT_TRAIL_BYTES
and U8_MAX_LENGTH. Instead of decodeUTF8Sequence, use U8_NEXT. Also use U_IS_BMP,
U_IS_SUPPLEMENTARY, U16_LEAD, U16_TRAIL, and U_IS_SURROGATE instead of our own
equivalents, since these macros from ICU are correct and efficient.

* wasm/WasmParser.h:
(JSC::Wasm::Parser<SuccessType>::consumeUTF8String): Updated for changes to
convertUTF8ToUTF16.

Source/WebCore:

* platform/SharedBuffer.cpp:
(WebCore::utf8Buffer): Removed unnecessary "strict" argument to convertUTF16ToUTF8 since
that is the default behavior. Also updated for changes to return values.

* xml/XSLTProcessorLibxslt.cpp:
(WebCore::writeToStringBuilder): Removed unnecessary use of StringBuffer for a temporary
buffer for characters. Rewrote to use U8_NEXT and U16_APPEND directly.

* xml/parser/XMLDocumentParserLibxml2.cpp:
(WebCore::convertUTF16EntityToUTF8): Updated for changes to ConversionResult.

Source/WebKit:

* Shared/API/APIString.h: Removed unneeded includes and also switched to #pragma once.

* Shared/API/c/WKString.cpp: Moved include of UTF8Conversion.h here.
(WKStringGetUTF8CStringImpl): Updated for changes to return values.

Source/WTF:

* wtf/text/AtomicString.cpp:
(WTF::AtomicString::fromUTF8Internal): Added code to compute string length when the
end is nullptr; this behavior used to be implemented inside the
calculateStringHashAndLengthFromUTF8MaskingTop8Bits function.

* wtf/text/AtomicStringImpl.cpp:
(WTF::HashAndUTF8CharactersTranslator::translate): Updated for change to
convertUTF8ToUTF16.

* wtf/text/AtomicStringImpl.h: Took the WTF_EXPORT_PRIVATE off of the
AtomicStringImpl::addUTF8 function. This is used only inside a non-inlined function in
the AtomicString class and its behavior changed subtly in this patch; it's helpful
to document that it's not exported.

* wtf/text/StringImpl.cpp:
(WTF::StringImpl::utf8Impl): Don't pass "true" for strictness to convertUTF16ToUTF8
since strict is the default. Also updated for changes to ConversionResult.
(WTF::StringImpl::utf8ForCharacters): Updated for change to convertLatin1ToUTF8.
(WTF::StringImpl::tryGetUtf8ForRange const): Ditto.

* wtf/text/StringView.cpp: Removed unneeded include of UTF8Conversion.h.

* wtf/text/WTFString.cpp:
(WTF::String::fromUTF8): Updated for change to convertUTF8ToUTF16.

* wtf/unicode/UTF8Conversion.cpp:
(WTF::Unicode::inlineUTF8SequenceLengthNonASCII): Deleted.
(WTF::Unicode::inlineUTF8SequenceLength): Deleted.
(WTF::Unicode::UTF8SequenceLength): Deleted.
(WTF::Unicode::decodeUTF8Sequence): Deleted.
(WTF::Unicode::convertLatin1ToUTF8): Use U8_APPEND, enabling us to remove
almost everything in the function. Also changed return value to be a boolean
to indicate success since there is only one possible failure (target exhausted).
There is room for further simplification, since most callers have lengths rather
than end pointers for the source buffer, and all but one caller supplies a buffer
size known to be sufficient, so those don't need a return value, nor do they need
to pass an end of buffer pointer.
(WTF::Unicode::convertUTF16ToUTF8): Use U_IS_LEAD, U_IS_TRAIL,
U16_GET_SUPPLEMENTARY, U_IS_SURROGATE, and U8_APPEND. Also changed behavior
for non-strict mode so that unpaired surrogates will be turned into the
replacement character instead of invalid UTF-8 sequences, because U8_APPEND
won't create an invalid UTF-8 sequence, and because we don't need to do that
for any good reason at any call site.
(WTF::Unicode::isLegalUTF8): Deleted.
(WTF::Unicode::readUTF8Sequence): Deleted.
(WTF::Unicode::convertUTF8ToUTF16): Use U8_NEXT instead of
inlineUTF8SequenceLength, isLegalUTF8, and readUTF8Sequence. Use
U16_APPEND instead of lots of code that does the same thing. There is
room for further simplification since most callers don't need the "all ASCII"
feature and could probably pass the arguments in a more natural way.
(WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits):
Use U8_NEXT instead of isLegalUTF8, readUTF8Sequence, and various
error handling checks for things that are handled by U8_NEXT. Also removed
support for passing nullptr for end to specify a null-terminated string.
(WTF::Unicode::equalUTF16WithUTF8): Ditto.

* wtf/unicode/UTF8Conversion.h: Removed UTF8SequenceLength and
decodeUTF8Sequence. Changed the ConversionResult to match WebKit coding
style, with an eye toward perhaps removing it in the future. Changed
the convertUTF8ToUTF16 return value to a boolean and removed the "strict"
argument since no caller was passing false. Changed the convertLatin1ToUTF8
return value to a boolean. Tweaked comments.

LayoutTests:

* css3/escape-dom-api-expected.txt:
* fast/text/dangling-surrogates-expected.txt:
* js/dom/webidl-type-mapping-expected.txt:
* js/invalid-utf8-in-syntax-error-expected.txt:
Updated expected results to have the Unicode replacement character in cases where the
text contains unpaired surrogates. The tests are still doing the same operations, and
still getting the same results, but the text output no longer includes illegal UTF-8
because the WTF changes affect the code path that DumpRenderTree and WebKitTestRunner
use to produce the text output.

* js/invalid-utf8-in-syntax-error.html: Added. Before adding this, the test was
run, but unlike the rest of the tests in this directory, was only run as part of
run-javascriptcore-tests. There are two reasons for adding this. One is to be
consistent with the rest of the tests here and run a second time as part of the
broader WebKit tests. The second is that we can now use "--reset-results" to generate
new expected results, something that run-webkit-tests has but run-javascriptcore-tests
does not have.</pre>
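<p>The non-strict convertUTF16ToUTF8 behavior described in the log message (unpaired surrogates become the replacement character U+FFFD instead of invalid UTF-8) can be sketched without ICU. This is a minimal, hand-rolled illustration under stated assumptions, not the WebKit implementation: the names appendUTF8 and utf16ToUTF8Lenient are hypothetical, and the actual patch relies on ICU macros such as U8_APPEND rather than the manual bit manipulation shown here.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not WebKit or ICU code): appends one Unicode scalar
// value to a std::string as UTF-8. Surrogate code points are rejected.
static void appendUTF8(std::string& out, char32_t c)
{
    assert(c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF));
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Hypothetical sketch of lenient UTF-16 to UTF-8 conversion: a lead surrogate
// followed by a trail surrogate is combined into one supplementary code point;
// any other surrogate is replaced with U+FFFD.
std::string utf16ToUTF8Lenient(const std::u16string& source)
{
    std::string out;
    for (std::size_t i = 0; i < source.size(); ++i) {
        char32_t c = source[i];
        bool isLead = c >= 0xD800 && c <= 0xDBFF;
        bool isTrail = c >= 0xDC00 && c <= 0xDFFF;
        bool nextIsTrail = i + 1 < source.size()
            && source[i + 1] >= 0xDC00 && source[i + 1] <= 0xDFFF;
        if (isLead && nextIsTrail) {
            c = 0x10000 + ((c - 0xD800) << 10) + (source[i + 1] - 0xDC00);
            ++i; // Consume the trail surrogate as well.
        } else if (isLead || isTrail) {
            c = 0xFFFD; // Unpaired surrogate: emit the replacement character.
        }
        appendUTF8(out, c);
    }
    return out;
}
```

<p>With this sketch, a swapped surrogate pair such as U+DC00 U+D800 yields two replacement characters, which matches the "\ufffd\ufffd" pattern in the updated expected results.</p>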

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsChangeLog">trunk/LayoutTests/ChangeLog</a></li>
<li><a href="#trunkLayoutTestscss3escapedomapiexpectedtxt">trunk/LayoutTests/css3/escape-dom-api-expected.txt</a></li>
<li><a href="#trunkLayoutTestsfasttextdanglingsurrogatesexpectedtxt">trunk/LayoutTests/fast/text/dangling-surrogates-expected.txt</a></li>
<li><a href="#trunkLayoutTestsimportedw3cChangeLog">trunk/LayoutTests/imported/w3c/ChangeLog</a></li>
<li><a href="#trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderutf16surrogatesexpectedtxt">trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt</a></li>
<li><a href="#trunkLayoutTestsjsdomwebidltypemappingexpectedtxt">trunk/LayoutTests/js/dom/webidl-type-mapping-expected.txt</a></li>
<li><a href="#trunkLayoutTestsjsinvalidutf8insyntaxerrorexpectedtxt">trunk/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt</a></li>
<li><a href="#trunkSourceJavaScriptCoreAPIJSClassRefcpp">trunk/Source/JavaScriptCore/API/JSClassRef.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreAPIJSStringRefcpp">trunk/Source/JavaScriptCore/API/JSStringRef.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCoreChangeLog">trunk/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#trunkSourceJavaScriptCoreruntimeJSGlobalObjectFunctionscpp">trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp</a></li>
<li><a href="#trunkSourceJavaScriptCorewasmWasmParserh">trunk/Source/JavaScriptCore/wasm/WasmParser.h</a></li>
<li><a href="#trunkSourceWTFChangeLog">trunk/Source/WTF/ChangeLog</a></li>
<li><a href="#trunkSourceWTFwtftextAtomicStringcpp">trunk/Source/WTF/wtf/text/AtomicString.cpp</a></li>
<li><a href="#trunkSourceWTFwtftextAtomicStringImplcpp">trunk/Source/WTF/wtf/text/AtomicStringImpl.cpp</a></li>
<li><a href="#trunkSourceWTFwtftextAtomicStringImplh">trunk/Source/WTF/wtf/text/AtomicStringImpl.h</a></li>
<li><a href="#trunkSourceWTFwtftextStringImplcpp">trunk/Source/WTF/wtf/text/StringImpl.cpp</a></li>
<li><a href="#trunkSourceWTFwtftextStringViewcpp">trunk/Source/WTF/wtf/text/StringView.cpp</a></li>
<li><a href="#trunkSourceWTFwtftextWTFStringcpp">trunk/Source/WTF/wtf/text/WTFString.cpp</a></li>
<li><a href="#trunkSourceWTFwtfunicodeUTF8Conversioncpp">trunk/Source/WTF/wtf/unicode/UTF8Conversion.cpp</a></li>
<li><a href="#trunkSourceWTFwtfunicodeUTF8Conversionh">trunk/Source/WTF/wtf/unicode/UTF8Conversion.h</a></li>
<li><a href="#trunkSourceWebCoreChangeLog">trunk/Source/WebCore/ChangeLog</a></li>
<li><a href="#trunkSourceWebCoreplatformSharedBuffercpp">trunk/Source/WebCore/platform/SharedBuffer.cpp</a></li>
<li><a href="#trunkSourceWebCorexmlXSLTProcessorLibxsltcpp">trunk/Source/WebCore/xml/XSLTProcessorLibxslt.cpp</a></li>
<li><a href="#trunkSourceWebCorexmlparserXMLDocumentParserLibxml2cpp">trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp</a></li>
<li><a href="#trunkSourceWebKitChangeLog">trunk/Source/WebKit/ChangeLog</a></li>
<li><a href="#trunkSourceWebKitSharedAPIAPIStringh">trunk/Source/WebKit/Shared/API/APIString.h</a></li>
<li><a href="#trunkSourceWebKitSharedAPIcWKStringcpp">trunk/Source/WebKit/Shared/API/c/WKString.cpp</a></li>
</ul>

<h3>Added Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsjsinvalidutf8insyntaxerrorhtml">trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkLayoutTestsChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/ChangeLog (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/ChangeLog      2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/LayoutTests/ChangeLog 2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,3 +1,28 @@
</span><ins>+2019-04-29  Darin Adler  <darin@apple.com>
+
+        WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
+        https://bugs.webkit.org/show_bug.cgi?id=195535
+
+        Reviewed by Alexey Proskuryakov.
+
+        * css3/escape-dom-api-expected.txt:
+        * fast/text/dangling-surrogates-expected.txt:
+        * js/dom/webidl-type-mapping-expected.txt:
+        * js/invalid-utf8-in-syntax-error-expected.txt:
+        Updated expected results to have the Unicode replacement character in cases where the
+        text contains unpaired surrogates. The tests are still doing the same operations, and
+        still getting the same results, but the text output no longer includes illegal UTF-8
+        because the WTF changes affect the code path that DumpRenderTree and WebKitTestRunner
+        use to produce the text output.
+
+        * js/invalid-utf8-in-syntax-error.html: Added. Before adding this, the test was
+        run, but unlike the rest of the tests in this directory, was only run as part of
+        run-javascriptcore-tests. There are two reasons for adding this. One is to be
+        consistent with the rest of the tests here and run a second time as part of the
+        broader WebKit tests. The second is that we can now use "--reset-results" to generate
+        new expected results, something that run-webkit-tests has but run-javascriptcore-tests
+        does not have.
+
</ins><span class="cx"> 2019-04-30  Myles C. Maxfield  <mmaxfield@apple.com>
</span><span class="cx"> 
</span><span class="cx">         font-weight: 1000 is not parsed successfully
</span></span></pre></div>
<a id="trunkLayoutTestscss3escapedomapiexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/css3/escape-dom-api-expected.txt (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/css3/escape-dom-api-expected.txt       2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/LayoutTests/css3/escape-dom-api-expected.txt  2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -4,14 +4,14 @@
</span><span class="cx"> 
</span><span class="cx"> 
</span><span class="cx"> PASS CSS.escape.length is 1
</span><del>-PASS CSS.escape('\0') is "�"
-PASS CSS.escape('a\0') is "a�"
-PASS CSS.escape('\0b') is "�b"
-PASS CSS.escape('a\0b') is "a�b"
-PASS CSS.escape('�') is "�"
-PASS CSS.escape('a�') is "a�"
-PASS CSS.escape('�b') is "�b"
-PASS CSS.escape('a�b') is "a�b"
</del><ins>+PASS CSS.escape('\0') is "�"
+PASS CSS.escape('a\0') is "a�"
+PASS CSS.escape('\0b') is "�b"
+PASS CSS.escape('a\0b') is "a�b"
+PASS CSS.escape('�') is "�"
+PASS CSS.escape('a�') is "a�"
+PASS CSS.escape('�b') is "�b"
+PASS CSS.escape('a�b') is "a�b"
</ins><span class="cx"> PASS CSS.escape() threw exception TypeError: Not enough arguments.
</span><span class="cx"> PASS CSS.escape(undefined) is "undefined"
</span><span class="cx"> PASS CSS.escape(true) is "true"
</span><span class="lines">@@ -53,16 +53,16 @@
</span><span class="cx"> PASS CSS.escape('-a') is "-a"
</span><span class="cx"> PASS CSS.escape('--') is "--"
</span><span class="cx"> PASS CSS.escape('--a') is "--a"
</span><del>-PASS CSS.escape('€-_©') is "€-_©"
-PASS CSS.escape('€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ') is "\\7f Â€ÂÂ‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ"
-PASS CSS.escape(' ¡¢') is " ¡¢"
</del><ins>+PASS CSS.escape('€-_©') is "€-_©"
+PASS CSS.escape('€‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ') is "\\7f €‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ"
+PASS CSS.escape(' ¡¢') is " ¡¢"
</ins><span class="cx"> PASS CSS.escape('a0123456789b') is "a0123456789b"
</span><span class="cx"> PASS CSS.escape('abcdefghijklmnopqrstuvwxyz') is "abcdefghijklmnopqrstuvwxyz"
</span><span class="cx"> PASS CSS.escape('ABCDEFGHIJKLMNOPQRSTUVWXYZ') is "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
</span><span class="cx"> PASS CSS.escape(' !xy') is "\\ \\!xy"
</span><del>-PASS CSS.escape('𝌆') is "𝌆"
-PASS CSS.escape('í¼†') is "\udf06"
-PASS CSS.escape('í ´') is "\ud834"
</del><ins>+PASS CSS.escape('𝌆') is "𝌆"
+PASS CSS.escape('�') is "\udf06"
+PASS CSS.escape('�') is "\ud834"
</ins><span class="cx"> PASS successfullyParsed is true
</span><span class="cx"> 
</span><span class="cx"> TEST COMPLETE
</span></span></pre></div>
<a id="trunkLayoutTestsfasttextdanglingsurrogatesexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/fast/text/dangling-surrogates-expected.txt (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/fast/text/dangling-surrogates-expected.txt     2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/LayoutTests/fast/text/dangling-surrogates-expected.txt        2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -3,8 +3,8 @@
</span><span class="cx"> On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
</span><span class="cx"> 
</span><span class="cx"> 
</span><del>-PASS danglingFirst is "í ƒ"
-PASS danglingSecond is "í°"
</del><ins>+PASS danglingFirst is "�"
+PASS danglingSecond is "�"
</ins><span class="cx"> PASS successfullyParsed is true
</span><span class="cx"> 
</span><span class="cx"> TEST COMPLETE
</span></span></pre></div>
<a id="trunkLayoutTestsimportedw3cChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/ChangeLog (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/ChangeLog 2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/LayoutTests/imported/w3c/ChangeLog    2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,3 +1,15 @@
</span><ins>+2019-04-29  Darin Adler  <darin@apple.com>
+
+        WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
+        https://bugs.webkit.org/show_bug.cgi?id=195535
+
+        Reviewed by Alexey Proskuryakov.
+
+        * web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt:
+        Updated expected results to have the Unicode replacement character in cases where the
+        text contains unpaired surrogates. The tests are still doing the same operations, and
+        still getting the same results, but the text output no longer includes illegal UTF-8.
+
</ins><span class="cx"> 2019-04-30  Youenn Fablet  <youenn@apple.com>
</span><span class="cx"> 
</span><span class="cx">         [macOS WK1] ASSERTION FAILED: formData in WebCore::ResourceRequest::doUpdateResourceHTTPBody()
</span></span></pre></div>
<a id="trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderutf16surrogatesexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt     2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-utf16-surrogates-expected.txt        2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,21 +1,21 @@
</span><span class="cx"> 
</span><del>-FAIL utf-16le - lone surrogate lead assert_equals: expected "\ufffd" but got "í €"
</del><ins>+FAIL utf-16le - lone surrogate lead assert_equals: expected "\ufffd" but got "�"
</ins><span class="cx"> FAIL utf-16le - lone surrogate lead (fatal flag set) assert_throws: function "function () {
</span><span class="cx">             new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx">         }" did not throw
</span><del>-FAIL utf-16le - lone surrogate trail assert_equals: expected "\ufffd" but got "í°€"
</del><ins>+FAIL utf-16le - lone surrogate trail assert_equals: expected "\ufffd" but got "�"
</ins><span class="cx"> FAIL utf-16le - lone surrogate trail (fatal flag set) assert_throws: function "function () {
</span><span class="cx">             new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx">         }" did not throw
</span><del>-FAIL utf-16le - unmatched surrogate lead assert_equals: expected "\ufffd\0" but got "í €\0"
</del><ins>+FAIL utf-16le - unmatched surrogate lead assert_equals: expected "\ufffd\0" but got "�\0"
</ins><span class="cx"> FAIL utf-16le - unmatched surrogate lead (fatal flag set) assert_throws: function "function () {
</span><span class="cx">             new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx">         }" did not throw
</span><del>-FAIL utf-16le - unmatched surrogate trail assert_equals: expected "\ufffd\0" but got "í°€\0"
</del><ins>+FAIL utf-16le - unmatched surrogate trail assert_equals: expected "\ufffd\0" but got "�\0"
</ins><span class="cx"> FAIL utf-16le - unmatched surrogate trail (fatal flag set) assert_throws: function "function () {
</span><span class="cx">             new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx">         }" did not throw
</span><del>-FAIL utf-16le - swapped surrogate pair assert_equals: expected "\ufffd\ufffd" but got "í°€í €"
</del><ins>+FAIL utf-16le - swapped surrogate pair assert_equals: expected "\ufffd\ufffd" but got "��"
</ins><span class="cx"> FAIL utf-16le - swapped surrogate pair (fatal flag set) assert_throws: function "function () {
</span><span class="cx">             new TextDecoder(t.encoding, {fatal: true}).decode(new Uint8Array(t.input))
</span><span class="cx">         }" did not throw
</span></span></pre></div>
<a id="trunkLayoutTestsjsdomwebidltypemappingexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/js/dom/webidl-type-mapping-expected.txt (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/dom/webidl-type-mapping-expected.txt        2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/LayoutTests/js/dom/webidl-type-mapping-expected.txt   2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1009,48 +1009,48 @@
</span><span class="cx"> 
</span><span class="cx"> converter.testUSVString = '!@#123ABCabc\x00\x80\xFF\r\n\t'
</span><span class="cx"> converter.testString = '!@#123ABCabc\x00\x80\xFF\r\n\t'
</span><del>-PASS converter.testUSVString is "!@#123ABCabc\u0000€ÿ\r\n\t"
-PASS converter.testString is "!@#123ABCabc\u0000€ÿ\r\n\t"
</del><ins>+PASS converter.testUSVString is "!@#123ABCabc\u0000€ÿ\r\n\t"
+PASS converter.testString is "!@#123ABCabc\u0000€ÿ\r\n\t"
</ins><span class="cx"> converter.testUSVString = '\u0100'
</span><span class="cx"> converter.testString = '\u0100'
</span><del>-PASS converter.testUSVString is "Ā"
-PASS converter.testString is "Ā"
</del><ins>+PASS converter.testUSVString is "Ā"
+PASS converter.testString is "Ā"
</ins><span class="cx"> PASS converter.testUSVString = {toString: function() { throw Error(); }} threw exception Error.
</span><span class="cx"> PASS converter.testString = {toString: function() { throw Error(); }} threw exception Error.
</span><del>-PASS converter.testUSVString is "Ā"
-PASS converter.testString is "Ā"
</del><ins>+PASS converter.testUSVString is "Ā"
+PASS converter.testString is "Ā"
</ins><span class="cx"> converter.testUSVString = "\ud800"
</span><span class="cx"> converter.testString = "\ud800"
</span><del>-PASS converter.testUSVString is "�"
</del><ins>+PASS converter.testUSVString is "�"
</ins><span class="cx"> PASS converter.testString is "\ud800"
</span><span class="cx"> converter.testUSVString = "\udc00"
</span><span class="cx"> converter.testString = "\udc00"
</span><del>-PASS converter.testUSVString is "�"
</del><ins>+PASS converter.testUSVString is "�"
</ins><span class="cx"> PASS converter.testString is "\udc00"
</span><span class="cx"> converter.testUSVString = "\ud800\u0000"
</span><span class="cx"> converter.testString = "\ud800\u0000"
</span><del>-PASS converter.testUSVString is "�\u0000"
</del><ins>+PASS converter.testUSVString is "�\u0000"
</ins><span class="cx"> PASS converter.testString is "\ud800\u0000"
</span><span class="cx"> converter.testUSVString = "\udc00\u0000"
</span><span class="cx"> converter.testString = "\udc00\u0000"
</span><del>-PASS converter.testUSVString is "�\u0000"
</del><ins>+PASS converter.testUSVString is "�\u0000"
</ins><span class="cx"> PASS converter.testString is "\udc00\u0000"
</span><span class="cx"> converter.testUSVString = "\udc00\ud800"
</span><span class="cx"> converter.testString = "\udc00\ud800"
</span><del>-PASS converter.testUSVString is "��"
</del><ins>+PASS converter.testUSVString is "��"
</ins><span class="cx"> PASS converter.testString is "\udc00\ud800"
</span><del>-converter.testUSVString = "𝄞"
-converter.testString = "𝄞"
-PASS converter.testUSVString is "𝄞"
-PASS converter.testString is "𝄞"
</del><ins>+converter.testUSVString = "𝄞"
+converter.testString = "𝄞"
+PASS converter.testUSVString is "𝄞"
+PASS converter.testString is "𝄞"
</ins><span class="cx"> converter.testByteString = '!@#123ABCabc\x00\x80\xFF\r\n\t'
</span><del>-PASS converter.testByteString is "!@#123ABCabc\u0000€ÿ\r\n\t"
</del><ins>+PASS converter.testByteString is "!@#123ABCabc\u0000€ÿ\r\n\t"
</ins><span class="cx"> converter.testByteString = '\u00FF'
</span><del>-PASS converter.testByteString is "ÿ"
</del><ins>+PASS converter.testByteString is "ÿ"
</ins><span class="cx"> PASS converter.testByteString = '\u0100' threw exception TypeError: Type error.
</span><del>-PASS converter.testByteString is "ÿ"
</del><ins>+PASS converter.testByteString is "ÿ"
</ins><span class="cx"> PASS converter.testByteString = {toString: function() { throw Error(); }} threw exception Error.
</span><del>-PASS converter.testByteString is "ÿ"
</del><ins>+PASS converter.testByteString is "ÿ"
</ins><span class="cx"> converter.testUSVString = true
</span><span class="cx"> converter.testString = true
</span><span class="cx"> converter.testByteString = true
</span><span class="lines">@@ -1180,37 +1180,37 @@
</span><span class="cx"> PASS 'key2' in converter.testNodeRecord() is true
</span><span class="cx"> PASS converter.testNodeRecord()['key2'] is document.documentElement
</span><span class="cx"> PASS converter.setTestNodeRecord({ key: 'hello' }) threw exception TypeError: Type error.
</span><del>-converter.setTestLongRecord({'í €': 1 })
-PASS converter.testLongRecord()['í €'] is 1
-converter.setTestNodeRecord({'í €': document })
-PASS converter.testNodeRecord()['�'] is document
-converter.setTestLongRecord({'í°€': 1 })
-PASS converter.testLongRecord()['í°€'] is 1
-converter.setTestNodeRecord({'í°€': document })
-PASS converter.testNodeRecord()['�'] is document
-converter.setTestLongRecord({'í €': 1 })
-PASS converter.testLongRecord()['í €\0'] is 1
-converter.setTestNodeRecord({'í €': document })
-PASS converter.testNodeRecord()['�\0'] is document
-converter.setTestLongRecord({'í°€': 1 })
-PASS converter.testLongRecord()['í°€\0'] is 1
-converter.setTestNodeRecord({'í°€': document })
-PASS converter.testNodeRecord()['�\0'] is document
-converter.setTestLongRecord({'í°€í €': 1 })
-PASS converter.testLongRecord()['í°€í €'] is 1
-converter.setTestNodeRecord({'í°€í €': document })
-PASS converter.testNodeRecord()['��'] is document
-converter.setTestLongRecord({'𝄞': 1 })
-PASS converter.testLongRecord()['𝄞'] is 1
-converter.setTestNodeRecord({'𝄞': document })
-PASS converter.testNodeRecord()['𝄞'] is document
</del><ins>+converter.setTestLongRecord({'�': 1 })
+PASS converter.testLongRecord()['�'] is 1
+converter.setTestNodeRecord({'�': document })
+PASS converter.testNodeRecord()['�'] is document
+converter.setTestLongRecord({'�': 1 })
+PASS converter.testLongRecord()['�'] is 1
+converter.setTestNodeRecord({'�': document })
+PASS converter.testNodeRecord()['�'] is document
+converter.setTestLongRecord({'�': 1 })
+PASS converter.testLongRecord()['�\0'] is 1
+converter.setTestNodeRecord({'�': document })
+PASS converter.testNodeRecord()['�\0'] is document
+converter.setTestLongRecord({'�': 1 })
+PASS converter.testLongRecord()['�\0'] is 1
+converter.setTestNodeRecord({'�': document })
+PASS converter.testNodeRecord()['�\0'] is document
+converter.setTestLongRecord({'��': 1 })
+PASS converter.testLongRecord()['��'] is 1
+converter.setTestNodeRecord({'��': document })
+PASS converter.testNodeRecord()['��'] is document
+converter.setTestLongRecord({'𝄞': 1 })
+PASS converter.testLongRecord()['𝄞'] is 1
+converter.setTestNodeRecord({'𝄞': document })
+PASS converter.testNodeRecord()['𝄞'] is document
</ins><span class="cx"> converter.setTestSequenceRecord({ key: ['value', 'other value'] })
</span><span class="cx"> PASS converter.testSequenceRecord().hasOwnProperty('key') is true
</span><span class="cx"> PASS 'key' in converter.testSequenceRecord() is true
</span><span class="cx"> PASS converter.testSequenceRecord()['key'] is ['value', 'other value']
</span><del>-PASS converter.setTestSequenceRecord({ 'Ā': ['value'] }) threw exception TypeError: Type error.
-converter.setTestSequenceRecord({ 'ÿ': ['value'] })
-PASS converter.testSequenceRecord()['ÿ'] is ['value']
</del><ins>+PASS converter.setTestSequenceRecord({ 'Ā': ['value'] }) threw exception TypeError: Type error.
+converter.setTestSequenceRecord({ 'ÿ': ['value'] })
+PASS converter.testSequenceRecord()['ÿ'] is ['value']
</ins><span class="cx"> PASS converter.testImpureNaNUnrestrictedDouble is NaN
</span><span class="cx"> PASS converter.testImpureNaN2UnrestrictedDouble is NaN
</span><span class="cx"> PASS converter.testQuietNaNUnrestrictedDouble is NaN
</span></span></pre></div>
<a id="trunkLayoutTestsjsinvalidutf8insyntaxerrorexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt   2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/LayoutTests/js/invalid-utf8-in-syntax-error-expected.txt      2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -3,7 +3,7 @@
</span><span class="cx"> On success, you will see a series of "PASS" messages, followed by "TEST COMPLETE".
</span><span class="cx"> 
</span><span class="cx"> 
</span><del>-PASS ({f("\x{DEAD}")}) threw exception SyntaxError: Unexpected string literal "íº­". Expected a parameter pattern or a ')' in parameter list..
</del><ins>+PASS ({f("�")}) threw exception SyntaxError: Unexpected string literal "�". Expected a parameter pattern or a ')' in parameter list..
</ins><span class="cx"> PASS successfullyParsed is true
</span><span class="cx"> 
</span><span class="cx"> TEST COMPLETE
</span></span></pre></div>
<a id="trunkLayoutTestsjsinvalidutf8insyntaxerrorhtml"></a>
<div class="addfile"><h4>Added: trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html (0 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html                           (rev 0)
+++ trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html      2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -0,0 +1,10 @@
</span><ins>+<!DOCTYPE html>
+<html>
+<head>
+<meta charset="utf-8">
+<script src="../resources/js-test.js"></script>
+</head>
+<body>
+<script src="script-tests/invalid-utf8-in-syntax-error.js"></script>
+</body>
+</html>
</ins><span class="cx">Property changes on: trunk/LayoutTests/js/invalid-utf8-in-syntax-error.html
</span><span class="cx">___________________________________________________________________
</span></span></pre></div>
<a id="svneolstyle"></a>
<div class="addfile"><h4>Added: svn:eol-style</h4></div>
<ins>+native
</ins><span class="cx">\ No newline at end of property
</span><a id="svnmimetype"></a>
<div class="addfile"><h4>Added: svn:mime-type</h4></div>
<ins>+text/html
</ins><span class="cx">\ No newline at end of property
</span><a id="trunkSourceJavaScriptCoreAPIJSClassRefcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/API/JSClassRef.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/API/JSClassRef.cpp   2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/JavaScriptCore/API/JSClassRef.cpp      2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -35,10 +35,8 @@
</span><span class="cx"> #include "ObjectPrototype.h"
</span><span class="cx"> #include "JSCInlines.h"
</span><span class="cx"> #include <wtf/text/StringHash.h>
</span><del>-#include <wtf/unicode/UTF8Conversion.h>
</del><span class="cx"> 
</span><span class="cx"> using namespace JSC;
</span><del>-using namespace WTF::Unicode;
</del><span class="cx"> 
</span><span class="cx"> const JSClassDefinition kJSClassDefinitionEmpty = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 };
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreAPIJSStringRefcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/API/JSStringRef.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/API/JSStringRef.cpp  2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/JavaScriptCore/API/JSStringRef.cpp     2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -49,7 +49,7 @@
</span><span class="cx">         UChar* p = buffer.data();
</span><span class="cx">         bool sourceIsAllASCII;
</span><span class="cx">         const LChar* stringStart = reinterpret_cast<const LChar*>(string);
</span><del>-        if (conversionOK == convertUTF8ToUTF16(&string, string + length, &p, p + length, &sourceIsAllASCII)) {
</del><ins>+        if (convertUTF8ToUTF16(string, string + length, &p, p + length, &sourceIsAllASCII)) {
</ins><span class="cx">             if (sourceIsAllASCII)
</span><span class="cx">                 return &OpaqueJSString::create(stringStart, length).leakRef();
</span><span class="cx">             return &OpaqueJSString::create(buffer.data(), p - buffer.data()).leakRef();
</span><span class="lines">@@ -102,20 +102,18 @@
</span><span class="cx">         return 0;
</span><span class="cx"> 
</span><span class="cx">     char* destination = buffer;
</span><del>-    ConversionResult result;
</del><ins>+    bool failed = false;
</ins><span class="cx">     if (string->is8Bit()) {
</span><span class="cx">         const LChar* source = string->characters8();
</span><del>-        result = convertLatin1ToUTF8(&source, source + string->length(), &destination, destination + bufferSize - 1);
</del><ins>+        convertLatin1ToUTF8(&source, source + string->length(), &destination, destination + bufferSize - 1);
</ins><span class="cx">     } else {
</span><span class="cx">         const UChar* source = string->characters16();
</span><del>-        result = convertUTF16ToUTF8(&source, source + string->length(), &destination, destination + bufferSize - 1, true);
</del><ins>+        ConversionResult result = convertUTF16ToUTF8(&source, source + string->length(), &destination, destination + bufferSize - 1);
+        failed = result != ConversionOK && result != TargetExhausted;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     *destination++ = '\0';
</span><del>-    if (result != conversionOK && result != targetExhausted)
-        return 0;
-
-    return destination - buffer;
</del><ins>+    return failed ? 0 : destination - buffer;
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> bool JSStringIsEqual(JSStringRef a, JSStringRef b)
</span></span></pre></div>
<a id="trunkSourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/ChangeLog (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/ChangeLog    2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/JavaScriptCore/ChangeLog       2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,3 +1,28 @@
</span><ins>+2019-04-29  Darin Adler  <darin@apple.com>
+
+        WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
+        https://bugs.webkit.org/show_bug.cgi?id=195535
+
+        Reviewed by Alexey Proskuryakov.
+
+        * API/JSClassRef.cpp: Removed unneeded include of UTF8Conversion.h.
+
+        * API/JSStringRef.cpp:
+        (JSStringCreateWithUTF8CString): Updated for changes to convertUTF8ToUTF16.
+        (JSStringGetUTF8CString): Updated for changes to convertLatin1ToUTF8.
+        Removed unneeded "true" to get the strict version of convertUTF16ToUTF8,
+        since that is the default. Also updated for changes to ConversionResult.
+
+        * runtime/JSGlobalObjectFunctions.cpp:
+        (JSC::decode): Stop using UTF8SequenceLength, and instead use U8_COUNT_TRAIL_BYTES
+        and U8_MAX_LENGTH. Instead of decodeUTF8Sequence, use U8_NEXT. Also use U_IS_BMP,
+        U_IS_SUPPLEMENTARY, U16_LEAD, U16_TRAIL, and U_IS_SURROGATE instead of our own
+        equivalents, since these macros from ICU are correct and efficient.
+
+        * wasm/WasmParser.h:
+        (JSC::Wasm::Parser<SuccessType>::consumeUTF8String): Updated for changes to
+        convertUTF8ToUTF16.
+
</ins><span class="cx"> 2019-04-30  Commit Queue  <commit-queue@webkit.org>
</span><span class="cx"> 
</span><span class="cx">         Unreviewed, rolling out r244806.
</span></span></pre></div>
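<p>The JSC::decode entry above replaces UTF8SequenceLength with 1 + U8_COUNT_TRAIL_BYTES. As a rough illustration only, the following sketch shows how a trail-byte count can be derived from a lead byte without ICU. The name trailByteCount is hypothetical, and the ranges reflect the well-formed UTF-8 lead bytes (0xC2 through 0xF4); ICU's actual macro may treat invalid lead bytes differently depending on the ICU version.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ICU's U8_COUNT_TRAIL_BYTES (an assumption, not the ICU source).
// Returns how many continuation bytes should follow the given lead byte,
// or 0 for ASCII, continuation bytes, and bytes that can never start a sequence.
constexpr int trailByteCount(uint8_t leadByte)
{
    if (leadByte < 0xC2 || leadByte > 0xF4)
        return 0;
    return 1 + (leadByte >= 0xE0) + (leadByte >= 0xF0);
}

// Number of characters a percent-encoded sequence occupies ("%XX" per byte),
// matching the "sequenceLen * 3" arithmetic in the decode function.
constexpr int percentEncodedLength(uint8_t leadByte)
{
    return (1 + trailByteCount(leadByte)) * 3;
}

inline void checkLeadByte(uint8_t leadByte)
{
    // A UTF-8 sequence is never longer than four bytes.
    assert(trailByteCount(leadByte) <= 3);
}
```

<p>With this shape, an invalid lead byte yields a sequence length of 1, and the later decoding step is what rejects it, which is consistent with the diff dropping the "sequenceLen &amp;&amp;" check.</p>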
<a id="trunkSourceJavaScriptCoreruntimeJSGlobalObjectFunctionscpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp  2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/JavaScriptCore/runtime/JSGlobalObjectFunctions.cpp     2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -58,12 +58,9 @@
</span><span class="cx"> #include <wtf/MathExtras.h>
</span><span class="cx"> #include <wtf/dtoa.h>
</span><span class="cx"> #include <wtf/text/StringBuilder.h>
</span><del>-#include <wtf/unicode/UTF8Conversion.h>
</del><span class="cx"> 
</span><span class="cx"> namespace JSC {
</span><span class="cx"> 
</span><del>-using namespace WTF::Unicode;
-
</del><span class="cx"> const ASCIILiteral ObjectProtoCalledOnNullOrUndefinedError { "Object.prototype.__proto__ called on null or undefined"_s };
</span><span class="cx"> 
</span><span class="cx"> template<unsigned charactersCount>
</span><span class="lines">@@ -184,10 +181,10 @@
</span><span class="cx">             int charLen = 0;
</span><span class="cx">             if (k <= length - 3 && isASCIIHexDigit(p[1]) && isASCIIHexDigit(p[2])) {
</span><span class="cx">                 const char b0 = Lexer<CharType>::convertHex(p[1], p[2]);
</span><del>-                const int sequenceLen = UTF8SequenceLength(b0);
-                if (sequenceLen && k <= length - sequenceLen * 3) {
</del><ins>+                const int sequenceLen = 1 + U8_COUNT_TRAIL_BYTES(b0);
+                if (k <= length - sequenceLen * 3) {
</ins><span class="cx">                     charLen = sequenceLen * 3;
</span><del>-                    char sequence[5];
</del><ins>+                    uint8_t sequence[U8_MAX_LENGTH];
</ins><span class="cx">                     sequence[0] = b0;
</span><span class="cx">                     for (int i = 1; i < sequenceLen; ++i) {
</span><span class="cx">                         const CharType* q = p + i * 3;
</span><span class="lines">@@ -199,16 +196,20 @@
</span><span class="cx">                         }
</span><span class="cx">                     }
</span><span class="cx">                     if (charLen != 0) {
</span><del>-                        sequence[sequenceLen] = 0;
-                        const int character = decodeUTF8Sequence(sequence);
-                        if (character < 0 || character >= 0x110000)
</del><ins>+                        UChar32 character;
+                        int32_t offset = 0;
+                        U8_NEXT(sequence, offset, sequenceLen, character);
+                        if (character < 0)
</ins><span class="cx">                             charLen = 0;
</span><del>-                        else if (character >= 0x10000) {
</del><ins>+                        else if (!U_IS_BMP(character)) {
</ins><span class="cx">                             // Convert to surrogate pair.
</span><del>-                            builder.append(static_cast<UChar>(0xD800 | ((character - 0x10000) >> 10)));
-                            u = static_cast<UChar>(0xDC00 | ((character - 0x10000) & 0x3FF));
-                        } else
</del><ins>+                            ASSERT(U_IS_SUPPLEMENTARY(character));
+                            builder.append(U16_LEAD(character));
+                            u = U16_TRAIL(character);
+                        } else {
+                            ASSERT(!U_IS_SURROGATE(character));
</ins><span class="cx">                             u = static_cast<UChar>(character);
</span><ins>+                        }
</ins><span class="cx">                     }
</span><span class="cx">                 }
</span><span class="cx">             }
</span></span></pre></div>
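<p>The change above swaps hand-written surrogate arithmetic for U16_LEAD and U16_TRAIL. For readers without ICU at hand, here is a minimal sketch of the same computation in plain C++. The helpers isBMP, leadSurrogate, trailSurrogate, and appendCodePoint are hypothetical names for illustration; they follow the arithmetic of the removed lines rather than ICU's own macro definitions.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for U_IS_BMP, U16_LEAD, and U16_TRAIL.
constexpr bool isBMP(char32_t c) { return c <= 0xFFFF; }

constexpr char16_t leadSurrogate(char32_t c)
{
    return static_cast<char16_t>(0xD800 | ((c - 0x10000) >> 10));
}

constexpr char16_t trailSurrogate(char32_t c)
{
    return static_cast<char16_t>(0xDC00 | ((c - 0x10000) & 0x3FF));
}

// Appends one already-validated code point to a UTF-16 string.
inline void appendCodePoint(std::u16string& out, char32_t c)
{
    assert(c <= 0x10FFFF);
    if (isBMP(c)) {
        // Surrogate code points are not valid on their own.
        assert(c < 0xD800 || c > 0xDFFF);
        out.push_back(static_cast<char16_t>(c));
        return;
    }
    // Supplementary code point: emit a surrogate pair.
    out.push_back(leadSurrogate(c));
    out.push_back(trailSurrogate(c));
}
```

<p>For example, U+1D11E (the musical G clef used in the test expectations) becomes the pair 0xD834, 0xDD1E.</p>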
<a id="trunkSourceJavaScriptCorewasmWasmParserh"></a>
<div class="modfile"><h4>Modified: trunk/Source/JavaScriptCore/wasm/WasmParser.h (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/JavaScriptCore/wasm/WasmParser.h    2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/JavaScriptCore/wasm/WasmParser.h       2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -162,7 +162,7 @@
</span><span class="cx"> 
</span><span class="cx">         UChar* bufferCurrent = bufferStart;
</span><span class="cx">         const char* stringCurrent = reinterpret_cast<const char*>(stringStart);
</span><del>-        if (WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, reinterpret_cast<const char *>(stringStart + stringLength), &bufferCurrent, bufferCurrent + buffer.size()) != WTF::Unicode::conversionOK)
</del><ins>+        if (!WTF::Unicode::convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char *>(stringStart + stringLength), &bufferCurrent, bufferCurrent + buffer.size()))
</ins><span class="cx">             return false;
</span><span class="cx">     }
</span><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceWTFChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/ChangeLog (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/ChangeLog       2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/ChangeLog  2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,3 +1,73 @@
</span><ins>+2019-04-29  Darin Adler  <darin@apple.com>
+
+        WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
+        https://bugs.webkit.org/show_bug.cgi?id=195535
+
+        Reviewed by Alexey Proskuryakov.
+
+        * wtf/text/AtomicString.cpp:
+        (WTF::AtomicString::fromUTF8Internal): Added code to compute string length when the
+        end is nullptr; this behavior used to be implemented inside the
+        calculateStringHashAndLengthFromUTF8MaskingTop8Bits function.
+
+        * wtf/text/AtomicStringImpl.cpp:
+        (WTF::HashAndUTF8CharactersTranslator::translate): Updated for change to
+        convertUTF8ToUTF16.
+
+        * wtf/text/AtomicStringImpl.h: Took the WTF_EXPORT_PRIVATE off of the
+        AtomicStringImpl::addUTF8 function. This is used only inside a non-inlined function in
+        the AtomicString class and its behavior changed subtly in this patch; it's helpful
+        to document that it's not exported.
+
+        * wtf/text/StringImpl.cpp:
+        (WTF::StringImpl::utf8Impl): Don't pass "true" for strictness to convertUTF16ToUTF8
+        since strict is the default. Also updated for changes to ConversionResult.
+        (WTF::StringImpl::utf8ForCharacters): Updated for change to convertLatin1ToUTF8.
+        (WTF::StringImpl::tryGetUtf8ForRange const): Ditto.
+
+        * wtf/text/StringView.cpp: Removed unneeded include of UTF8Conversion.h.
+
+        * wtf/text/WTFString.cpp:
+        (WTF::String::fromUTF8): Updated for change to convertUTF8ToUTF16.
+
+        * wtf/unicode/UTF8Conversion.cpp:
+        (WTF::Unicode::inlineUTF8SequenceLengthNonASCII): Deleted.
+        (WTF::Unicode::inlineUTF8SequenceLength): Deleted.
+        (WTF::Unicode::UTF8SequenceLength): Deleted.
+        (WTF::Unicode::decodeUTF8Sequence): Deleted.
+        (WTF::Unicode::convertLatin1ToUTF8): Use U8_APPEND, enabling us to remove
+        almost everything in the function. Also changed return value to be a boolean
+        to indicate success since there is only one possible failure (target exhausted).
+        There is room for further simplification, since most callers have lengths rather
+        than end pointers for the source buffer, and all but one caller supplies a buffer
+        size known to be sufficient, so those don't need a return value, nor do they need
+        to pass an end of buffer pointer.
+        (WTF::Unicode::convertUTF16ToUTF8): Use U_IS_LEAD, U_IS_TRAIL,
+        U16_GET_SUPPLEMENTARY, U_IS_SURROGATE, and U8_APPEND. Also changed behavior
+        for non-strict mode so that unpaired surrogates will be turned into the
+        replacement character instead of invalid UTF-8 sequences, because U8_APPEND
+        won't create an invalid UTF-8 sequence, and because we don't need to do that
+        for any good reason at any call site.
+        (WTF::Unicode::isLegalUTF8): Deleted.
+        (WTF::Unicode::readUTF8Sequence): Deleted.
+        (WTF::Unicode::convertUTF8ToUTF16): Use U8_NEXT instead of
+        inlineUTF8SequenceLength, isLegalUTF8, and readUTF8Sequence. Use
+        U16_APPEND instead of lots of code that does the same thing. There is
+        room for further simplification since most callers don't need the "all ASCII"
+        feature and could probably pass the arguments in a more natural way.
+        (WTF::Unicode::calculateStringHashAndLengthFromUTF8MaskingTop8Bits):
+        Use U8_NEXT instead of isLegalUTF8, readUTF8Sequence, and various
+        error handling checks for things that are handled by U8_NEXT. Also removed
+        support for passing nullptr for end to specify a null-terminated string.
+        (WTF::Unicode::equalUTF16WithUTF8): Ditto.
+
+        * wtf/unicode/UTF8Conversion.h: Removed UTF8SequenceLength and
+        decodeUTF8Sequence. Changed the ConversionResult to match WebKit coding
+        style, with an eye toward perhaps removing it in the future. Changed
+        the convertUTF8ToUTF16 return value to a boolean and removed the "strict"
+        argument since no caller was passing false. Changed the convertLatin1ToUTF8
+        return value to a boolean. Tweaked comments.
+
</ins><span class="cx"> 2019-04-30  John Wilander  <wilander@apple.com>
</span><span class="cx"> 
</span><span class="cx">         Add logging of Ad Click Attribution errors and events to a dedicated channel
</span></span></pre></div>
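<p>The convertLatin1ToUTF8 entry above notes that the only possible failure is running out of target space, which is why a boolean return suffices. The sketch below illustrates that idea without ICU. It is not the WTF implementation: latin1ToUTF8 and its length-based parameters are hypothetical, chosen to match the ChangeLog's remark that most callers have lengths rather than end pointers.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: converts Latin-1 bytes to UTF-8.
// Returns false only when the target buffer is too small; every Latin-1
// byte is a valid code point, so there is no "illegal source" case.
inline bool latin1ToUTF8(const uint8_t* source, size_t sourceLength, char* target, size_t targetCapacity, size_t& written)
{
    assert(source && target);
    size_t i = 0;
    for (size_t s = 0; s < sourceLength; ++s) {
        uint8_t ch = source[s];
        if (ch < 0x80) {
            // ASCII maps to a single byte.
            if (i + 1 > targetCapacity)
                return false;
            target[i++] = static_cast<char>(ch);
        } else {
            // U+0080 through U+00FF always take exactly two bytes.
            if (i + 2 > targetCapacity)
                return false;
            target[i++] = static_cast<char>(0xC0 | (ch >> 6));
            target[i++] = static_cast<char>(0x80 | (ch & 0x3F));
        }
    }
    written = i;
    return true;
}
```

<p>Because each Latin-1 character needs at most two bytes, the (length * 3) buffers used by the StringImpl callers are always large enough, so those callers only assert on the result.</p>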
<a id="trunkSourceWTFwtftextAtomicStringcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/AtomicString.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/AtomicString.cpp       2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/wtf/text/AtomicString.cpp  2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -113,19 +113,24 @@
</span><span class="cx">     return numberToString(number, buffer);
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-AtomicString AtomicString::fromUTF8Internal(const char* charactersStart, const char* charactersEnd)
</del><ins>+AtomicString AtomicString::fromUTF8Internal(const char* start, const char* end)
</ins><span class="cx"> {
</span><del>-    auto impl = AtomicStringImpl::addUTF8(charactersStart, charactersEnd);
-    if (!impl)
-        return nullAtom();
-    return impl.get();
</del><ins>+    ASSERT(start);
+
+    // Caller needs to handle empty string.
+    ASSERT(!end || end > start);
+    ASSERT(end || start[0]);
+
+    return AtomicStringImpl::addUTF8(start, end ? end : start + std::strlen(start));
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> #ifndef NDEBUG
</span><ins>+
</ins><span class="cx"> void AtomicString::show() const
</span><span class="cx"> {
</span><span class="cx">     m_string.show();
</span><span class="cx"> }
</span><ins>+
</ins><span class="cx"> #endif
</span><span class="cx"> 
</span><span class="cx"> WTF_EXPORT_PRIVATE LazyNeverDestroyed<AtomicString> nullAtomData;
</span></span></pre></div>
<a id="trunkSourceWTFwtftextAtomicStringImplcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/AtomicStringImpl.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/AtomicStringImpl.cpp   2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/wtf/text/AtomicStringImpl.cpp      2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -219,7 +219,7 @@
</span><span class="cx"> 
</span><span class="cx">         bool isAllASCII;
</span><span class="cx">         const char* source = buffer.characters;
</span><del>-        if (convertUTF8ToUTF16(&source, source + buffer.length, &target, target + buffer.utf16Length, &isAllASCII) != conversionOK)
</del><ins>+        if (!convertUTF8ToUTF16(source, source + buffer.length, &target, target + buffer.utf16Length, &isAllASCII))
</ins><span class="cx">             ASSERT_NOT_REACHED();
</span><span class="cx"> 
</span><span class="cx">         if (isAllASCII)
</span></span></pre></div>
<a id="trunkSourceWTFwtftextAtomicStringImplh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/AtomicStringImpl.h (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/AtomicStringImpl.h     2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/wtf/text/AtomicStringImpl.h        2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -56,7 +56,8 @@
</span><span class="cx">     WTF_EXPORT_PRIVATE static Ref<AtomicStringImpl> addLiteral(const char* characters, unsigned length);
</span><span class="cx"> 
</span><span class="cx">     // Returns null if the input data contains an invalid UTF-8 sequence.
</span><del>-    WTF_EXPORT_PRIVATE static RefPtr<AtomicStringImpl> addUTF8(const char* start, const char* end);
</del><ins>+    static RefPtr<AtomicStringImpl> addUTF8(const char* start, const char* end);
+
</ins><span class="cx"> #if USE(CF)
</span><span class="cx">     WTF_EXPORT_PRIVATE static RefPtr<AtomicStringImpl> add(CFStringRef);
</span><span class="cx"> #endif
</span></span></pre></div>
<a id="trunkSourceWTFwtftextStringImplcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/StringImpl.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/StringImpl.cpp 2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/wtf/text/StringImpl.cpp    2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1756,11 +1756,11 @@
</span><span class="cx">         char* bufferEnd = buffer + bufferSize;
</span><span class="cx">         while (characters < charactersEnd) {
</span><span class="cx">             // Use strict conversion to detect unpaired surrogates.
</span><del>-            ConversionResult result = convertUTF16ToUTF8(&characters, charactersEnd, &buffer, bufferEnd, true);
-            ASSERT(result != targetExhausted);
</del><ins>+            ConversionResult result = convertUTF16ToUTF8(&characters, charactersEnd, &buffer, bufferEnd);
+            ASSERT(result != TargetExhausted);
</ins><span class="cx">             // Conversion fails when there is an unpaired surrogate.
</span><span class="cx">             // Put replacement character (U+FFFD) instead of the unpaired surrogate.
</span><del>-            if (result != conversionOK) {
</del><ins>+            if (result != ConversionOK) {
</ins><span class="cx">                 ASSERT((0xD800 <= *characters && *characters <= 0xDFFF));
</span><span class="cx">                 // There should be room left, since one UChar hasn't been converted.
</span><span class="cx">                 ASSERT((buffer + 3) <= bufferEnd);
</span><span class="lines">@@ -1772,16 +1772,16 @@
</span><span class="cx">         bool strict = mode == StrictConversion;
</span><span class="cx">         const UChar* originalCharacters = characters;
</span><span class="cx">         ConversionResult result = convertUTF16ToUTF8(&characters, characters + length, &buffer, buffer + bufferSize, strict);
</span><del>-        ASSERT(result != targetExhausted); // (length * 3) should be sufficient for any conversion
</del><ins>+        ASSERT(result != TargetExhausted); // (length * 3) should be sufficient for any conversion
</ins><span class="cx"> 
</span><span class="cx">         // Only produced from strict conversion.
</span><del>-        if (result == sourceIllegal) {
</del><ins>+        if (result == SourceIllegal) {
</ins><span class="cx">             ASSERT(strict);
</span><span class="cx">             return UTF8ConversionError::IllegalSource;
</span><span class="cx">         }
</span><span class="cx"> 
</span><span class="cx">         // Check for an unconverted high surrogate.
</span><del>-        if (result == sourceExhausted) {
</del><ins>+        if (result == SourceExhausted) {
</ins><span class="cx">             if (strict)
</span><span class="cx">                 return UTF8ConversionError::SourceExhausted;
</span><span class="cx">             // This should be one unpaired high surrogate. Treat it the same
</span><span class="lines">@@ -1809,8 +1809,8 @@
</span><span class="cx">     Vector<char, 1024> bufferVector(length * 3);
</span><span class="cx">     char* buffer = bufferVector.data();
</span><span class="cx">     const LChar* source = characters;
</span><del>-    ConversionResult result = convertLatin1ToUTF8(&source, source + length, &buffer, buffer + bufferVector.size());
-    ASSERT_UNUSED(result, result != targetExhausted); // (length * 3) should be sufficient for any conversion
</del><ins>+    bool charactersFit = convertLatin1ToUTF8(&source, source + length, &buffer, buffer + bufferVector.size());
+    ASSERT_UNUSED(charactersFit, charactersFit); // (length * 3) should be sufficient for any conversion
</ins><span class="cx">     return CString(bufferVector.data(), buffer - bufferVector.data());
</span><span class="cx"> }
</span><span class="cx"> 
</span><span class="lines">@@ -1854,9 +1854,8 @@
</span><span class="cx"> 
</span><span class="cx">     if (is8Bit()) {
</span><span class="cx">         const LChar* characters = this->characters8() + offset;
</span><del>-
-        ConversionResult result = convertLatin1ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size());
-        ASSERT_UNUSED(result, result != targetExhausted); // (length * 3) should be sufficient for any conversion
</del><ins>+        bool charactersFit = convertLatin1ToUTF8(&characters, characters + length, &buffer, buffer + bufferVector.size());
+        ASSERT_UNUSED(charactersFit, charactersFit); // (length * 3) should be sufficient for any conversion
</ins><span class="cx">     } else {
</span><span class="cx">         UTF8ConversionError error = utf8Impl(this->characters16() + offset, length, buffer, bufferVector.size(), mode);
</span><span class="cx">         if (error != UTF8ConversionError::None)
</span></span></pre></div>
<a id="trunkSourceWTFwtftextStringViewcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/StringView.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/StringView.cpp 2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/wtf/text/StringView.cpp    2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -35,12 +35,9 @@
</span><span class="cx"> #include <wtf/NeverDestroyed.h>
</span><span class="cx"> #include <wtf/Optional.h>
</span><span class="cx"> #include <wtf/text/TextBreakIterator.h>
</span><del>-#include <wtf/unicode/UTF8Conversion.h>
</del><span class="cx"> 
</span><span class="cx"> namespace WTF {
</span><span class="cx"> 
</span><del>-using namespace Unicode;
-
</del><span class="cx"> bool StringView::containsIgnoringASCIICase(const StringView& matchString) const
</span><span class="cx"> {
</span><span class="cx">     return findIgnoringASCIICase(matchString) != notFound;
</span></span></pre></div>
<a id="trunkSourceWTFwtftextWTFStringcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/text/WTFString.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/text/WTFString.cpp  2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/wtf/text/WTFString.cpp     2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -859,7 +859,7 @@
</span><span class="cx">  
</span><span class="cx">     UChar* bufferCurrent = bufferStart;
</span><span class="cx">     const char* stringCurrent = reinterpret_cast<const char*>(stringStart);
</span><del>-    if (convertUTF8ToUTF16(&stringCurrent, reinterpret_cast<const char *>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size()) != conversionOK)
</del><ins>+    if (!convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char*>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size()))
</ins><span class="cx">         return String();
</span><span class="cx"> 
</span><span class="cx">     unsigned utf16Length = bufferCurrent - bufferStart;
</span></span></pre></div>
<a id="trunkSourceWTFwtfunicodeUTF8Conversioncpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/unicode/UTF8Conversion.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/unicode/UTF8Conversion.cpp  2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/wtf/unicode/UTF8Conversion.cpp     2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2007, 2014 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2007, 2010-2012, 2014, 2019 Apple Inc. All rights reserved.
</ins><span class="cx">  * Copyright (C) 2010 Patrick Gansterer <paroga@paroga.com>
</span><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="lines">@@ -34,389 +34,107 @@
</span><span class="cx"> namespace WTF {
</span><span class="cx"> namespace Unicode {
</span><span class="cx"> 
</span><del>-inline int inlineUTF8SequenceLengthNonASCII(char b0)
</del><ins>+bool convertLatin1ToUTF8(const LChar** sourceStart, const LChar* sourceEnd, char** targetStart, char* targetEnd)
</ins><span class="cx"> {
</span><del>-    if ((b0 & 0xC0) != 0xC0)
-        return 0;
-    if ((b0 & 0xE0) == 0xC0)
-        return 2;
-    if ((b0 & 0xF0) == 0xE0)
-        return 3;
-    if ((b0 & 0xF8) == 0xF0)
-        return 4;
-    return 0;
-}
-
-inline int inlineUTF8SequenceLength(char b0)
-{
-    return isASCII(b0) ? 1 : inlineUTF8SequenceLengthNonASCII(b0);
-}
-
-int UTF8SequenceLength(char b0)
-{
-    return isASCII(b0) ? 1 : inlineUTF8SequenceLengthNonASCII(b0);
-}
-
-int decodeUTF8Sequence(const char* sequence)
-{
-    // Handle 0-byte sequences (never valid).
-    const unsigned char b0 = sequence[0];
-    const int length = inlineUTF8SequenceLength(b0);
-    if (length == 0)
-        return -1;
-
-    // Handle 1-byte sequences (plain ASCII).
-    const unsigned char b1 = sequence[1];
-    if (length == 1) {
-        if (b1)
-            return -1;
-        return b0;
-    }
-
-    // Handle 2-byte sequences.
-    if ((b1 & 0xC0) != 0x80)
-        return -1;
-    const unsigned char b2 = sequence[2];
-    if (length == 2) {
-        if (b2)
-            return -1;
-        const int c = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
-        if (c < 0x80)
-            return -1;
-        return c;
-    }
-
-    // Handle 3-byte sequences.
-    if ((b2 & 0xC0) != 0x80)
-        return -1;
-    const unsigned char b3 = sequence[3];
-    if (length == 3) {
-        if (b3)
-            return -1;
-        const int c = ((b0 & 0xF) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F);
-        if (c < 0x800)
-            return -1;
-        // UTF-16 surrogates should never appear in UTF-8 data.
-        if (c >= 0xD800 && c <= 0xDFFF)
-            return -1;
-        return c;
-    }
-
-    // Handle 4-byte sequences.
-    if ((b3 & 0xC0) != 0x80)
-        return -1;
-    const unsigned char b4 = sequence[4];
-    if (length == 4) {
-        if (b4)
-            return -1;
-        const int c = ((b0 & 0x7) << 18) | ((b1 & 0x3F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
-        if (c < 0x10000 || c > 0x10FFFF)
-            return -1;
-        return c;
-    }
-
-    return -1;
-}
-
-// Once the bits are split out into bytes of UTF-8, this is a mask OR-ed
-// into the first byte, depending on how many bytes follow.  There are
-// as many entries in this table as there are UTF-8 sequence types.
-// (I.e., one byte sequence, two byte... etc.). Remember that sequencs
-// for *legal* UTF-8 will be 4 or fewer bytes total.
-static const unsigned char firstByteMark[7] = { 0x00, 0x00, 0xC0, 0xE0, 0xF0, 0xF8, 0xFC };
-
-ConversionResult convertLatin1ToUTF8(
-    const LChar** sourceStart, const LChar* sourceEnd, 
-    char** targetStart, char* targetEnd)
-{
-    ConversionResult result = conversionOK;
-    const LChar* source = *sourceStart;
</del><ins>+    const LChar* source;
</ins><span class="cx">     char* target = *targetStart;
</span><del>-    while (source < sourceEnd) {
-        UChar32 ch;
-        unsigned short bytesToWrite = 0;
-        const UChar32 byteMask = 0xBF;
-        const UChar32 byteMark = 0x80; 
-        const LChar* oldSource = source; // In case we have to back up because of target overflow.
-        ch = static_cast<unsigned short>(*source++);
-
-        // Figure out how many bytes the result will require
-        if (ch < (UChar32)0x80)
-            bytesToWrite = 1;
-        else
-            bytesToWrite = 2;
-
-        target += bytesToWrite;
-        if (target > targetEnd) {
-            source = oldSource; // Back up source pointer!
-            target -= bytesToWrite;
-            result = targetExhausted;
-            break;
-        }
-        switch (bytesToWrite) { // note: everything falls through.
-        case 2:
-            *--target = (char)((ch | byteMark) & byteMask);
-            ch >>= 6;
-            FALLTHROUGH;
-        case 1:
-            *--target =  (char)(ch | firstByteMark[bytesToWrite]);
-        }
-        target += bytesToWrite;
</del><ins>+    unsigned i = 0;
+    for (source = *sourceStart; source < sourceEnd; ++source) {
+        UBool sawError = false;
+        // Work around bug in either Windows compiler or old version of ICU, where passing a uint8_t to
+        // U8_APPEND warns, by converting from uint8_t to a wider type.
+        UChar32 character = *source;
+        U8_APPEND(reinterpret_cast<uint8_t*>(target), i, targetEnd - *targetStart, character, sawError);
+        if (sawError)
+            return false;
</ins><span class="cx">     }
</span><span class="cx">     *sourceStart = source;
</span><del>-    *targetStart = target;
-    return result;
</del><ins>+    *targetStart = target + i;
+    return true;
</ins><span class="cx"> }
</span><span class="cx"> 
</span><del>-ConversionResult convertUTF16ToUTF8(
-    const UChar** sourceStart, const UChar* sourceEnd, 
-    char** targetStart, char* targetEnd, bool strict)
</del><ins>+ConversionResult convertUTF16ToUTF8(const UChar** sourceStart, const UChar* sourceEnd, char** targetStart, char* targetEnd, bool strict)
</ins><span class="cx"> {
</span><del>-    ConversionResult result = conversionOK;
</del><ins>+    ConversionResult result = ConversionOK;
</ins><span class="cx">     const UChar* source = *sourceStart;
</span><span class="cx">     char* target = *targetStart;
</span><ins>+    UBool sawError = false;
+    unsigned i = 0;
</ins><span class="cx">     while (source < sourceEnd) {
</span><span class="cx">         UChar32 ch;
</span><del>-        unsigned short bytesToWrite = 0;
-        const UChar32 byteMask = 0xBF;
-        const UChar32 byteMark = 0x80; 
-        const UChar* oldSource = source; // In case we have to back up because of target overflow.
-        ch = static_cast<unsigned short>(*source++);
-        // If we have a surrogate pair, convert to UChar32 first.
-        if (ch >= 0xD800 && ch <= 0xDBFF) {
-            // If the 16 bits following the high surrogate are in the source buffer...
-            if (source < sourceEnd) {
-                UChar32 ch2 = static_cast<unsigned short>(*source);
-                // If it's a low surrogate, convert to UChar32.
-                if (ch2 >= 0xDC00 && ch2 <= 0xDFFF) {
-                    ch = ((ch - 0xD800) << 10) + (ch2 - 0xDC00) + 0x0010000;
-                    ++source;
-                } else if (strict) { // it's an unpaired high surrogate
-                    --source; // return to the illegal value itself
-                    result = sourceIllegal;
-                    break;
-                }
-            } else { // We don't have the 16 bits following the high surrogate.
-                --source; // return to the high surrogate
-                result = sourceExhausted;
</del><ins>+        int j = 0;
+        U16_NEXT(source, j, sourceEnd - source, ch);
+        if (U_IS_SURROGATE(ch)) {
+            if (source + j == sourceEnd && U_IS_SURROGATE_LEAD(ch)) {
+                result = SourceExhausted;
</ins><span class="cx">                 break;
</span><span class="cx">             }
</span><del>-        } else if (strict) {
-            // UTF-16 surrogate values are illegal in UTF-32
-            if (ch >= 0xDC00 && ch <= 0xDFFF) {
-                --source; // return to the illegal value itself
-                result = sourceIllegal;
</del><ins>+            if (strict) {
+                result = SourceIllegal;
</ins><span class="cx">                 break;
</span><span class="cx">             }
</span><del>-        }
-        // Figure out how many bytes the result will require
-        if (ch < (UChar32)0x80) {
-            bytesToWrite = 1;
-        } else if (ch < (UChar32)0x800) {
-            bytesToWrite = 2;
-        } else if (ch < (UChar32)0x10000) {
-            bytesToWrite = 3;
-        } else if (ch < (UChar32)0x110000) {
-            bytesToWrite = 4;
-        } else {
-            bytesToWrite = 3;
</del><span class="cx">             ch = replacementCharacter;
</span><span class="cx">         }
</span><del>-
-        target += bytesToWrite;
-        if (target > targetEnd) {
-            source = oldSource; // Back up source pointer!
-            target -= bytesToWrite;
-            result = targetExhausted;
</del><ins>+        U8_APPEND(reinterpret_cast<uint8_t*>(target), i, targetEnd - target, ch, sawError);
+        if (sawError) {
+            result = TargetExhausted;
</ins><span class="cx">             break;
</span><span class="cx">         }
</span><del>-        switch (bytesToWrite) { // note: everything falls through.
-            case 4: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH;
-            case 3: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH;
-            case 2: *--target = (char)((ch | byteMark) & byteMask); ch >>= 6; FALLTHROUGH;
-            case 1: *--target =  (char)(ch | firstByteMark[bytesToWrite]);
-        }
-        target += bytesToWrite;
</del><ins>+        source += j;
</ins><span class="cx">     }
</span><span class="cx">     *sourceStart = source;
</span><del>-    *targetStart = target;
</del><ins>+    *targetStart = target + i;
</ins><span class="cx">     return result;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-// This must be called with the length pre-determined by the first byte.
-// If presented with a length > 4, this returns false.  The Unicode
-// definition of UTF-8 goes up to 4-byte sequences.
-static bool isLegalUTF8(const unsigned char* source, int length)
</del><ins>+bool convertUTF8ToUTF16(const char* source, const char* sourceEnd, UChar** targetStart, UChar* targetEnd, bool* sourceAllASCII)
</ins><span class="cx"> {
</span><del>-    unsigned char a;
-    const unsigned char* srcptr = source + length;
-    switch (length) {
-        default: return false;
-        // Everything else falls through when "true"...
-        case 4: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false; FALLTHROUGH;
-        case 3: if ((a = (*--srcptr)) < 0x80 || a > 0xBF) return false; FALLTHROUGH;
-        case 2: if ((a = (*--srcptr)) > 0xBF) return false;
-
-        switch (*source) {
-            // no fall-through in this inner switch
-            case 0xE0: if (a < 0xA0) return false; break;
-            case 0xED: if (a > 0x9F) return false; break;
-            case 0xF0: if (a < 0x90) return false; break;
-            case 0xF4: if (a > 0x8F) return false; break;
-            default:   if (a < 0x80) return false;
-        }
-        FALLTHROUGH;
-
-        case 1: if (*source >= 0x80 && *source < 0xC2) return false;
-    }
-    if (*source > 0xF4)
-        return false;
-    return true;
-}
-
-// Magic values subtracted from a buffer value during UTF8 conversion.
-// This table contains as many values as there might be trailing bytes
-// in a UTF-8 sequence.
-static const UChar32 offsetsFromUTF8[6] = { 0x00000000UL, 0x00003080UL, 0x000E2080UL, 0x03C82080UL, static_cast<UChar32>(0xFA082080UL), static_cast<UChar32>(0x82082080UL) };
-
-static inline UChar32 readUTF8Sequence(const char*& sequence, unsigned length)
-{
-    UChar32 character = 0;
-
-    // The cases all fall through.
-    switch (length) {
-        case 6: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
-        case 5: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
-        case 4: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
-        case 3: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
-        case 2: character += static_cast<unsigned char>(*sequence++); character <<= 6; FALLTHROUGH;
-        case 1: character += static_cast<unsigned char>(*sequence++);
-    }
-
-    return character - offsetsFromUTF8[length - 1];
-}
-
-ConversionResult convertUTF8ToUTF16(
-    const char** sourceStart, const char* sourceEnd, 
-    UChar** targetStart, UChar* targetEnd, bool* sourceAllASCII, bool strict)
-{
-    ConversionResult result = conversionOK;
-    const char* source = *sourceStart;
</del><ins>+    RELEASE_ASSERT(sourceEnd - source <= std::numeric_limits<int>::max());
+    UBool error = false;
</ins><span class="cx">     UChar* target = *targetStart;
</span><del>-    UChar orAllData = 0;
-    while (source < sourceEnd) {
-        int utf8SequenceLength = inlineUTF8SequenceLength(*source);
-        if (sourceEnd - source < utf8SequenceLength)  {
-            result = sourceExhausted;
-            break;
-        }
-        // Do this check whether lenient or strict
-        if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(source), utf8SequenceLength)) {
-            result = sourceIllegal;
-            break;
-        }
-
-        UChar32 character = readUTF8Sequence(source, utf8SequenceLength);
-
-        if (target >= targetEnd) {
-            source -= utf8SequenceLength; // Back up source pointer!
-            result = targetExhausted;
-            break;
-        }
-
-        if (U_IS_BMP(character)) {
-            // UTF-16 surrogate values are illegal in UTF-32
-            if (U_IS_SURROGATE(character)) {
-                if (strict) {
-                    source -= utf8SequenceLength; // return to the illegal value itself
-                    result = sourceIllegal;
-                    break;
-                } else {
-                    *target++ = replacementCharacter;
-                    orAllData |= replacementCharacter;
-                }
-            } else {
-                *target++ = character; // normal case
-                orAllData |= character;
-            }
-        } else if (U_IS_SUPPLEMENTARY(character)) {
-            // target is a character in range 0xFFFF - 0x10FFFF
-            if (target + 1 >= targetEnd) {
-                source -= utf8SequenceLength; // Back up source pointer!
-                result = targetExhausted;
-                break;
-            }
-            *target++ = U16_LEAD(character);
-            *target++ = U16_TRAIL(character);
-            orAllData = 0xffff;
-        } else {
-            if (strict) {
-                source -= utf8SequenceLength; // return to the start
-                result = sourceIllegal;
-                break; // Bail out; shouldn't continue
-            } else {
-                *target++ = replacementCharacter;
-                orAllData |= replacementCharacter;
-            }
-        }
</del><ins>+    UChar32 orAllData = 0;
+    unsigned targetOffset = 0;
+    for (int sourceOffset = 0; sourceOffset < sourceEnd - source; ) {
+        UChar32 character;
+        U8_NEXT(reinterpret_cast<const uint8_t*>(source), sourceOffset, sourceEnd - source, character);
+        if (character < 0)
+            return false;
+        U16_APPEND(target, targetOffset, targetEnd - target, character, error);
+        if (error)
+            return false;
+        orAllData |= character;
</ins><span class="cx">     }
</span><del>-    *sourceStart = source;
-    *targetStart = target;
-
</del><ins>+    *targetStart = target + targetOffset;
</ins><span class="cx">     if (sourceAllASCII)
</span><del>-        *sourceAllASCII = !(orAllData & ~0x7f);
-
-    return result;
</del><ins>+        *sourceAllASCII = isASCII(orAllData);
+    return true;
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length)
</span><span class="cx"> {
</span><del>-    if (!data)
-        return 0;
-
</del><span class="cx">     StringHasher stringHasher;
</span><del>-    dataLength = 0;
</del><span class="cx">     utf16Length = 0;
</span><span class="cx"> 
</span><del>-    while (data < dataEnd || (!dataEnd && *data)) {
-        if (isASCII(*data)) {
-            stringHasher.addCharacter(*data++);
-            dataLength++;
-            utf16Length++;
-            continue;
-        }
-
-        int utf8SequenceLength = inlineUTF8SequenceLengthNonASCII(*data);
-        dataLength += utf8SequenceLength;
-
-        if (!dataEnd) {
-            for (int i = 1; i < utf8SequenceLength; ++i) {
-                if (!data[i])
-                    return 0;
-            }
-        } else if (dataEnd - data < utf8SequenceLength)
</del><ins>+    int inputOffset = 0;
+    int inputLength = dataEnd - data;
+    while (inputOffset < inputLength) {
+        UChar32 character;
+        U8_NEXT(reinterpret_cast<const uint8_t*>(data), inputOffset, inputLength, character);
+        if (character < 0)
</ins><span class="cx">             return 0;
</span><span class="cx"> 
</span><del>-        if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(data), utf8SequenceLength))
-            return 0;
-
-        UChar32 character = readUTF8Sequence(data, utf8SequenceLength);
-        ASSERT(!isASCII(character));
-
</del><span class="cx">         if (U_IS_BMP(character)) {
</span><del>-            // UTF-16 surrogate values are illegal in UTF-32
-            if (U_IS_SURROGATE(character))
-                return 0;
-            stringHasher.addCharacter(static_cast<UChar>(character)); // normal case
</del><ins>+            ASSERT(!U_IS_SURROGATE(character));
+            stringHasher.addCharacter(character);
</ins><span class="cx">             utf16Length++;
</span><del>-        } else if (U_IS_SUPPLEMENTARY(character)) {
-            stringHasher.addCharacters(static_cast<UChar>(U16_LEAD(character)),
-                                       static_cast<UChar>(U16_TRAIL(character)));
</del><ins>+        } else {
+            ASSERT(U_IS_SUPPLEMENTARY(character));
+            stringHasher.addCharacters(U16_LEAD(character), U16_TRAIL(character));
</ins><span class="cx">             utf16Length += 2;
</span><del>-        } else
-            return 0;
</del><ins>+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><ins>+    dataLength = inputOffset;
</ins><span class="cx">     return stringHasher.hashWithTop8BitsMasked();
</span><span class="cx"> }
</span><span class="cx"> 
</span><span class="lines">@@ -423,36 +141,24 @@
</span><span class="cx"> bool equalUTF16WithUTF8(const UChar* a, const char* b, const char* bEnd)
</span><span class="cx"> {
</span><span class="cx">     while (b < bEnd) {
</span><del>-        if (isASCII(*a) || isASCII(*b)) {
-            if (*a++ != *b++)
-                return false;
-            continue;
-        }
-
-        int utf8SequenceLength = inlineUTF8SequenceLengthNonASCII(*b);
-
-        if (bEnd - b < utf8SequenceLength)
</del><ins>+        int offset = 0;
+        UChar32 character;
+        U8_NEXT(reinterpret_cast<const uint8_t*>(b), offset, bEnd - b, character);
+        if (character < 0)
</ins><span class="cx">             return false;
</span><ins>+        b += offset;
</ins><span class="cx"> 
</span><del>-        if (!isLegalUTF8(reinterpret_cast<const unsigned char*>(b), utf8SequenceLength))
-            return false;
-
-        UChar32 character = readUTF8Sequence(b, utf8SequenceLength);
-        ASSERT(!isASCII(character));
-
</del><span class="cx">         if (U_IS_BMP(character)) {
</span><del>-            // UTF-16 surrogate values are illegal in UTF-32
-            if (U_IS_SURROGATE(character))
-                return false;
</del><ins>+            ASSERT(!U_IS_SURROGATE(character));
</ins><span class="cx">             if (*a++ != character)
</span><span class="cx">                 return false;
</span><del>-        } else if (U_IS_SUPPLEMENTARY(character)) {
</del><ins>+        } else {
+            ASSERT(U_IS_SUPPLEMENTARY(character));
</ins><span class="cx">             if (*a++ != U16_LEAD(character))
</span><span class="cx">                 return false;
</span><span class="cx">             if (*a++ != U16_TRAIL(character))
</span><span class="cx">                 return false;
</span><del>-        } else
-            return false;
</del><ins>+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     return true;
</span></span></pre></div>
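<p>The new convertUTF8ToUTF16 above ORs every decoded character into orAllData and tests the accumulated value once at the end. A minimal standalone sketch of that idea follows; isAllASCIISketch is a hypothetical name used only for illustration and is not part of WTF.</p>

```cpp
#include <string>

// Hypothetical helper, not WTF code: reports whether every code point is ASCII.
// Instead of testing each character, OR them all together and test once:
// any bit above the low 7 survives the OR if at least one character set it.
static bool isAllASCIISketch(const std::u32string& characters)
{
    char32_t orAllData = 0;
    for (char32_t character : characters)
        orAllData |= character;
    return !(orAllData & ~static_cast<char32_t>(0x7F));
}
```

<p>The single test after the loop keeps the per-character work to one OR, which is presumably why the patch folds it into the conversion loop rather than making a second pass.</p>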
<a id="trunkSourceWTFwtfunicodeUTF8Conversionh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WTF/wtf/unicode/UTF8Conversion.h (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WTF/wtf/unicode/UTF8Conversion.h    2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WTF/wtf/unicode/UTF8Conversion.h       2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2007 Apple Inc.  All rights reserved.
</del><ins>+ * Copyright (C) 2007-2019 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -31,54 +31,28 @@
</span><span class="cx"> namespace WTF {
</span><span class="cx"> namespace Unicode {
</span><span class="cx"> 
</span><del>-    // Given a first byte, gives the length of the UTF-8 sequence it begins.
-    // Returns 0 for bytes that are not legal starts of UTF-8 sequences.
-    // Only allows sequences of up to 4 bytes, since that works for all Unicode characters (U-00000000 to U-0010FFFF).
-    WTF_EXPORT_PRIVATE int UTF8SequenceLength(char);
</del><ins>+enum ConversionResult {
+    ConversionOK, // conversion successful
+    SourceExhausted, // partial character in source, but hit end
+    TargetExhausted, // insufficient room in target for conversion
+    SourceIllegal // source sequence is illegal/malformed
+};
</ins><span class="cx"> 
</span><del>-    // Takes a null-terminated C-style string with a UTF-8 sequence in it and converts it to a character.
-    // Only allows Unicode characters (U-00000000 to U-0010FFFF).
-    // Returns -1 if the sequence is not valid (including presence of extra bytes).
-    WTF_EXPORT_PRIVATE int decodeUTF8Sequence(const char*);
</del><ins>+// Conversion functions are strict, except for convertUTF16ToUTF8, which takes
+// a "strict" argument. When strict, both illegal sequences and unpaired surrogates
+// will cause an error. When not, illegal sequences and unpaired surrogates are
+// converted to the replacement character, except for an unpaired lead surrogate
+// at the end of the source, which will instead cause a SourceExhausted error.
</ins><span class="cx"> 
</span><del>-    typedef enum {
-            conversionOK,       // conversion successful
-            sourceExhausted,    // partial character in source, but hit end
-            targetExhausted,    // insuff. room in target for conversion
-            sourceIllegal       // source sequence is illegal/malformed
-    } ConversionResult;
</del><ins>+WTF_EXPORT_PRIVATE bool convertUTF8ToUTF16(const char* sourceStart, const char* sourceEnd, UChar** targetStart, UChar* targetEnd, bool* isSourceAllASCII = nullptr);
+WTF_EXPORT_PRIVATE bool convertLatin1ToUTF8(const LChar** sourceStart, const LChar* sourceEnd, char** targetStart, char* targetEnd);
+WTF_EXPORT_PRIVATE ConversionResult convertUTF16ToUTF8(const UChar** sourceStart, const UChar* sourceEnd, char** targetStart, char* targetEnd, bool strict = true);
</ins><span class="cx"> 
</span><del>-    // These conversion functions take a "strict" argument. When this
-    // flag is set to strict, both irregular sequences and isolated surrogates
-    // will cause an error.  When the flag is set to lenient, both irregular
-    // sequences and isolated surrogates are converted.
-    // 
-    // Whether the flag is strict or lenient, all illegal sequences will cause
-    // an error return. This includes sequences such as: <F4 90 80 80>, <C0 80>,
-    // or <A0> in UTF-8, and values above 0x10FFFF in UTF-32. Conformant code
-    // must check for illegal sequences.
-    // 
-    // When the flag is set to lenient, characters over 0x10FFFF are converted
-    // to the replacement character; otherwise (when the flag is set to strict)
-    // they constitute an error.
</del><ins>+WTF_EXPORT_PRIVATE unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length);
</ins><span class="cx"> 
</span><del>-    WTF_EXPORT_PRIVATE ConversionResult convertUTF8ToUTF16(
-                    const char** sourceStart, const char* sourceEnd, 
-                    UChar** targetStart, UChar* targetEnd, bool* isSourceAllASCII = 0, bool strict = true);
</del><ins>+// Callers of these functions must check that the lengths are the same; accordingly we omit an end argument for UTF-16 and Latin-1.
+bool equalUTF16WithUTF8(const UChar* stringInUTF16, const char* stringInUTF8, const char* stringInUTF8End);
+bool equalLatin1WithUTF8(const LChar* stringInLatin1, const char* stringInUTF8, const char* stringInUTF8End);
</ins><span class="cx"> 
</span><del>-    WTF_EXPORT_PRIVATE ConversionResult convertLatin1ToUTF8(
-                    const LChar** sourceStart, const LChar* sourceEnd, 
-                    char** targetStart, char* targetEnd);
-
-    WTF_EXPORT_PRIVATE ConversionResult convertUTF16ToUTF8(
-                    const UChar** sourceStart, const UChar* sourceEnd, 
-                    char** targetStart, char* targetEnd, bool strict = true);
-
-    WTF_EXPORT_PRIVATE unsigned calculateStringHashAndLengthFromUTF8MaskingTop8Bits(const char* data, const char* dataEnd, unsigned& dataLength, unsigned& utf16Length);
-
-    // The caller of these functions already knows that the lengths are the same, so we omit an end argument for UTF-16 and Latin-1.
-    bool equalUTF16WithUTF8(const UChar* stringInUTF16, const char* stringInUTF8, const char* stringInUTF8End);
-    bool equalLatin1WithUTF8(const LChar* stringInLatin1, const char* stringInUTF8, const char* stringInUTF8End);
-
</del><span class="cx"> } // namespace Unicode
</span><span class="cx"> } // namespace WTF
</span></span></pre></div>
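<p>The header comment above describes three outcomes for an unpaired surrogate in convertUTF16ToUTF8: SourceExhausted for a lead surrogate at the very end of the source, SourceIllegal when strict, and the replacement character otherwise. The following is a rough standalone sketch of that decision order, assuming std::u16string input and std::string output instead of the pointer-pair interface. It does not use ICU, and Result, appendUTF8Sketch and utf16ToUTF8Sketch are hypothetical names, not WTF API.</p>

```cpp
#include <cstddef>
#include <string>

enum class Result { OK, SourceExhausted, SourceIllegal };

// Hypothetical helper: appends one Unicode scalar value as UTF-8.
static void appendUTF8Sketch(std::string& out, char32_t c)
{
    if (c < 0x80)
        out += static_cast<char>(c);
    else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
}

// Sketch of the strict/lenient behavior described in the header comment.
static Result utf16ToUTF8Sketch(const std::u16string& in, std::string& out, bool strict)
{
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        bool isLead = c >= 0xD800 && c <= 0xDBFF;
        bool isTrail = c >= 0xDC00 && c <= 0xDFFF;
        if (isLead && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Well-formed pair: combine into one supplementary code point.
            c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (isLead || isTrail) {
            // Unpaired surrogate. A lead at the very end may just be a split pair.
            if (isLead && i + 1 == in.size())
                return Result::SourceExhausted;
            if (strict)
                return Result::SourceIllegal;
            c = 0xFFFD; // replacement character
        }
        appendUTF8Sketch(out, c);
    }
    return Result::OK;
}
```

<p>Note that the end-of-source check comes before the strict check, so a trailing lead surrogate reports SourceExhausted in both modes, matching the wording of the comment.</p>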
<a id="trunkSourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/ChangeLog (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/ChangeLog   2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WebCore/ChangeLog      2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,3 +1,21 @@
</span><ins>+2019-04-29  Darin Adler  <darin@apple.com>
+
+        WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
+        https://bugs.webkit.org/show_bug.cgi?id=195535
+
+        Reviewed by Alexey Proskuryakov.
+
+        * platform/SharedBuffer.cpp:
+        (WebCore::utf8Buffer): Removed unnecessary "strict" argument to convertUTF16ToUTF8 since
+        that is the default behavior. Also updated for changes to return values.
+
+        * xml/XSLTProcessorLibxslt.cpp:
+        (WebCore::writeToStringBuilder): Removed unnecessary use of StringBuffer for a temporary
+        buffer for characters. Rewrote to use U8_NEXT and U16_APPEND directly.
+
+        * xml/parser/XMLDocumentParserLibxml2.cpp:
+        (WebCore::convertUTF16EntityToUTF8): Updated for changes to ConversionResult.
+
</ins><span class="cx"> 2019-04-30  John Wilander  <wilander@apple.com>
</span><span class="cx"> 
</span><span class="cx">         Add logging of Ad Click Attribution errors and events to a dedicated channel
</span></span></pre></div>
<a id="trunkSourceWebCoreplatformSharedBuffercpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/platform/SharedBuffer.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/platform/SharedBuffer.cpp   2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WebCore/platform/SharedBuffer.cpp      2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -334,17 +334,16 @@
</span><span class="cx"> 
</span><span class="cx">     // Convert to runs of 8-bit characters.
</span><span class="cx">     char* p = buffer.data();
</span><del>-    WTF::Unicode::ConversionResult result;
</del><span class="cx">     if (length) {
</span><span class="cx">         if (string.is8Bit()) {
</span><span class="cx">             const LChar* d = string.characters8();
</span><del>-            result = WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size());
</del><ins>+            if (!WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p + buffer.size()))
+                return nullptr;
</ins><span class="cx">         } else {
</span><span class="cx">             const UChar* d = string.characters16();
</span><del>-            result = WTF::Unicode::convertUTF16ToUTF8(&d, d + length, &p, p + buffer.size(), true);
</del><ins>+            if (WTF::Unicode::convertUTF16ToUTF8(&d, d + length, &p, p + buffer.size()) != WTF::Unicode::ConversionOK)
+                return nullptr;
</ins><span class="cx">         }
</span><del>-        if (result != WTF::Unicode::conversionOK)
-            return nullptr;
</del><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     buffer.shrink(p - buffer.data());
</span></span></pre></div>
<a id="trunkSourceWebCorexmlXSLTProcessorLibxsltcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/xml/XSLTProcessorLibxslt.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/xml/XSLTProcessorLibxslt.cpp        2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WebCore/xml/XSLTProcessorLibxslt.cpp   2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -48,8 +48,6 @@
</span><span class="cx"> #include <libxslt/xslt.h>
</span><span class="cx"> #include <libxslt/xsltutils.h>
</span><span class="cx"> #include <wtf/Assertions.h>
</span><del>-#include <wtf/text/StringBuffer.h>
-#include <wtf/unicode/UTF8Conversion.h>
</del><span class="cx"> 
</span><span class="cx"> #if OS(DARWIN) && !PLATFORM(GTK)
</span><span class="cx"> #include "SoftLinkLibxslt.h"
</span><span class="lines">@@ -159,27 +157,41 @@
</span><span class="cx">     globalCachedResourceLoader = cachedResourceLoader;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-static int writeToStringBuilder(void* context, const char* buffer, int len)
</del><ins>+static int writeToStringBuilder(void* context, const char* buffer, int length)
</ins><span class="cx"> {
</span><span class="cx">     StringBuilder& resultOutput = *static_cast<StringBuilder*>(context);
</span><span class="cx"> 
</span><del>-    if (!len)
-        return 0;
</del><ins>+    // FIXME: Consider ways to make this more efficient by moving it into a
+    // StringBuilder::appendUTF8 function, and then optimizing to not need a
+    // Vector<UChar> and possibly optimize cases that can produce 8-bit Latin-1
+    // strings, but that would need to be sophisticated about not processing
+    // trailing incomplete sequences and communicating that to the caller.
</ins><span class="cx"> 
</span><del>-    StringBuffer<UChar> stringBuffer(len);
-    UChar* bufferUChar = stringBuffer.characters();
-    UChar* bufferUCharEnd = bufferUChar + len;
</del><ins>+    Vector<UChar> outputBuffer(length);
</ins><span class="cx"> 
</span><del>-    const char* stringCurrent = buffer;
-    WTF::Unicode::ConversionResult result = WTF::Unicode::convertUTF8ToUTF16(&stringCurrent, buffer + len, &bufferUChar, bufferUCharEnd);
-    if (result != WTF::Unicode::conversionOK && result != WTF::Unicode::sourceExhausted) {
-        ASSERT_NOT_REACHED();
-        return -1;
</del><ins>+    UBool error = false;
+    int inputOffset = 0;
+    int outputOffset = 0;
+    while (inputOffset < length) {
+        UChar32 character;
+        int nextInputOffset = inputOffset;
+        U8_NEXT(reinterpret_cast<const uint8_t*>(buffer), nextInputOffset, length, character);
+        if (character < 0) {
+            if (nextInputOffset == length)
+                break;
+            ASSERT_NOT_REACHED();
+            return -1;
+        }
+        inputOffset = nextInputOffset;
+        U16_APPEND(outputBuffer.data(), outputOffset, length, character, error);
+        if (error) {
+            ASSERT_NOT_REACHED();
+            return -1;
+        }
</ins><span class="cx">     }
</span><span class="cx"> 
</span><del>-    int utf16Length = bufferUChar - stringBuffer.characters();
-    resultOutput.append(stringBuffer.characters(), utf16Length);
-    return stringCurrent - buffer;
</del><ins>+    resultOutput.append(outputBuffer.data(), outputOffset);
+    return inputOffset;
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> static bool saveResultToString(xmlDocPtr resultDoc, xsltStylesheetPtr sheet, String& resultString)
</span></span></pre></div>
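<p>The rewritten writeToStringBuilder above stops at a sequence that is cut off by the end of the buffer and returns the number of bytes it actually consumed, leaving the remainder to be presented again. The sketch below illustrates that contract without ICU, under the assumption that truncation and malformed input can be told apart explicitly. Step, decodeOneSketch and appendChunkSketch are hypothetical names, and the classification is simplified (a truncated prefix is reported as Truncated even if no continuation could make it valid).</p>

```cpp
#include <cstddef>
#include <string>

enum class Step { Decoded, Truncated, Invalid };

// Hypothetical helper: decodes one UTF-8 sequence starting at in[offset].
// On success, advances offset and stores the code point in out.
static Step decodeOneSketch(const std::string& in, std::size_t& offset, char32_t& out)
{
    unsigned char b0 = static_cast<unsigned char>(in[offset]);
    std::size_t length;
    char32_t c;
    char32_t minimum; // smallest value that is not an overlong encoding
    if (b0 < 0x80) { length = 1; c = b0; minimum = 0; }
    else if (b0 >= 0xC2 && b0 <= 0xDF) { length = 2; c = b0 & 0x1F; minimum = 0x80; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { length = 3; c = b0 & 0x0F; minimum = 0x800; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { length = 4; c = b0 & 0x07; minimum = 0x10000; }
    else return Step::Invalid;

    for (std::size_t i = 1; i < length; ++i) {
        if (offset + i == in.size())
            return Step::Truncated; // ran out of input mid-sequence
        unsigned char b = static_cast<unsigned char>(in[offset + i]);
        if ((b & 0xC0) != 0x80)
            return Step::Invalid;
        c = (c << 6) | (b & 0x3F);
    }
    if (c < minimum || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return Step::Invalid;
    offset += length;
    out = c;
    return Step::Decoded;
}

// Appends the complete sequences in chunk to out as UTF-16.
// Returns the number of bytes consumed, or -1 if the input is malformed.
static int appendChunkSketch(const std::string& chunk, std::u16string& out)
{
    std::size_t offset = 0;
    while (offset < chunk.size()) {
        char32_t c = 0;
        Step step = decodeOneSketch(chunk, offset, c);
        if (step == Step::Truncated)
            break; // leave the partial sequence for the next call
        if (step == Step::Invalid)
            return -1;
        if (c < 0x10000)
            out += static_cast<char16_t>(c);
        else {
            c -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (c >> 10));
            out += static_cast<char16_t>(0xDC00 + (c & 0x3FF));
        }
    }
    return static_cast<int>(offset);
}
```

<p>A caller that receives a return value smaller than the chunk length is expected to resend the unconsumed tail together with the next bytes, which is the behavior the libxslt write callback relies on in the patch.</p>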
<a id="trunkSourceWebCorexmlparserXMLDocumentParserLibxml2cpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp     2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp        2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1153,8 +1153,8 @@
</span><span class="cx"> static size_t convertUTF16EntityToUTF8(const UChar* utf16Entity, size_t numberOfCodeUnits, char* target, size_t targetSize)
</span><span class="cx"> {
</span><span class="cx">     const char* originalTarget = target;
</span><del>-    auto conversionResult = WTF::Unicode::convertUTF16ToUTF8(&utf16Entity, utf16Entity + numberOfCodeUnits, &target, target + targetSize);
-    if (conversionResult != WTF::Unicode::conversionOK)
</del><ins>+    WTF::Unicode::ConversionResult conversionResult = WTF::Unicode::convertUTF16ToUTF8(&utf16Entity, utf16Entity + numberOfCodeUnits, &target, target + targetSize);
+    if (conversionResult != WTF::Unicode::ConversionOK)
</ins><span class="cx">         return 0;
</span><span class="cx"> 
</span><span class="cx">     // Even though we must pass the length, libxml expects the entity string to be null terminated.
</span></span></pre></div>
<a id="trunkSourceWebKitChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebKit/ChangeLog (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebKit/ChangeLog    2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WebKit/ChangeLog       2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -1,3 +1,15 @@
</span><ins>+2019-04-29  Darin Adler  <darin@apple.com>
+
+        WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support
+        https://bugs.webkit.org/show_bug.cgi?id=195535
+
+        Reviewed by Alexey Proskuryakov.
+
+        * Shared/API/APIString.h: Removed unneeded includes and also switched to #pragma once.
+
+        * Shared/API/c/WKString.cpp: Moved include of UTF8Conversion.h here.
+        (WKStringGetUTF8CStringImpl): Updated for changes to return values.
+
</ins><span class="cx"> 2019-04-30  Chris Dumez  &lt;cdumez@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Regression(PSON) URL scheme handlers can no longer respond asynchronously
</span></span></pre></div>
<a id="trunkSourceWebKitSharedAPIAPIStringh"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebKit/Shared/API/APIString.h (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebKit/Shared/API/APIString.h       2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WebKit/Shared/API/APIString.h  2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -23,14 +23,10 @@
</span><span class="cx">  * THE POSSIBILITY OF SUCH DAMAGE.
</span><span class="cx">  */
</span><span class="cx"> 
</span><del>-#ifndef APIString_h
-#define APIString_h
</del><ins>+#pragma once
</ins><span class="cx"> 
</span><span class="cx"> #include "APIObject.h"
</span><del>-#include &lt;wtf/Ref.h&gt;
</del><span class="cx"> #include &lt;wtf/text/StringView.h&gt;
</span><del>-#include &lt;wtf/text/WTFString.h&gt;
-#include &lt;wtf/unicode/UTF8Conversion.h&gt;
</del><span class="cx"> 
</span><span class="cx"> namespace API {
</span><span class="cx"> 
</span><span class="lines">@@ -75,5 +71,3 @@
</span><span class="cx"> };
</span><span class="cx"> 
</span><span class="cx"> } // namespace WebKit
</span><del>-
-#endif // APIString_h
</del></span></pre></div>
<a id="trunkSourceWebKitSharedAPIcWKStringcpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebKit/Shared/API/c/WKString.cpp (244820 => 244821)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebKit/Shared/API/c/WKString.cpp    2019-05-01 06:30:51 UTC (rev 244820)
+++ trunk/Source/WebKit/Shared/API/c/WKString.cpp       2019-05-01 15:52:16 UTC (rev 244821)
</span><span class="lines">@@ -30,6 +30,7 @@
</span><span class="cx"> #include "WKAPICast.h"
</span><span class="cx"> #include &lt;JavaScriptCore/InitializeThreading.h&gt;
</span><span class="cx"> #include &lt;JavaScriptCore/OpaqueJSString.h&gt;
</span><ins>+#include &lt;wtf/unicode/UTF8Conversion.h&gt;
</ins><span class="cx"> 
</span><span class="cx"> WKTypeID WKStringGetTypeID()
</span><span class="cx"> {
</span><span class="lines">@@ -78,19 +79,18 @@
</span><span class="cx">     auto stringView = WebKit::toImpl(stringRef)->stringView();
</span><span class="cx"> 
</span><span class="cx">     char* p = buffer;
</span><del>-    WTF::Unicode::ConversionResult result;
</del><span class="cx"> 
</span><span class="cx">     if (stringView.is8Bit()) {
</span><span class="cx">         const LChar* characters = stringView.characters8();
</span><del>-        result = WTF::Unicode::convertLatin1ToUTF8(&amp;characters, characters + stringView.length(), &amp;p, p + bufferSize - 1);
</del><ins>+        if (!WTF::Unicode::convertLatin1ToUTF8(&amp;characters, characters + stringView.length(), &amp;p, p + bufferSize - 1))
+            return 0;
</ins><span class="cx">     } else {
</span><span class="cx">         const UChar* characters = stringView.characters16();
</span><del>-        result = WTF::Unicode::convertUTF16ToUTF8(&amp;characters, characters + stringView.length(), &amp;p, p + bufferSize - 1, strict);
</del><ins>+        WTF::Unicode::ConversionResult result = WTF::Unicode::convertUTF16ToUTF8(&amp;characters, characters + stringView.length(), &amp;p, p + bufferSize - 1, strict);
+        if (result != WTF::Unicode::ConversionOK &amp;&amp; result != WTF::Unicode::TargetExhausted)
+            return 0;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><del>-    if (result != WTF::Unicode::conversionOK &amp;&amp; result != WTF::Unicode::targetExhausted)
-        return 0;
-
</del><span class="cx">     *p++ = '\0';
</span><span class="cx">     return p - buffer;
</span><span class="cx"> }
</span></span></pre>
</div>
</div>

</body>
</html>