<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[245455] releases/WebKitGTK/webkit-2.24/Source</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/245455">245455</a></dd>
<dt>Author</dt> <dd>carlosgc@webkit.org</dd>
<dt>Date</dt> <dd>2019-05-17 04:25:29 -0700 (Fri, 17 May 2019)</dd>
</dl>

<h3>Log Message</h3>
<pre>Merge <a href="http://trac.webkit.org/projects/webkit/changeset/243049">r243049</a> - Improve normalization code, including moving from unorm.h to unorm2.h
https://bugs.webkit.org/show_bug.cgi?id=195330

Reviewed by Michael Catanzaro.

Source/JavaScriptCore:

* runtime/JSString.h: Move StringViewWithUnderlyingString to StringView.h.

* runtime/StringPrototype.cpp: Include unorm2.h instead of unorm.h.
(JSC::normalizer): Added. Function to create normalizer object given
enumeration value indicating which is selected. Simplified because we
know the function will not fail and so we don't need error handling code.
(JSC::normalize): Changed this function to take a JSString* so we can
optimize the case where no normalization is needed. Added an early exit
if the string is stored as 8-bit and another if the string is already
normalized, using unorm2_isNormalized. Changed error handling to only
check cases that can actually fail in practice. Also did other small
optimizations like passing VM rather than ExecState.
(JSC::stringProtoFuncNormalize): Used smaller enumeration names that are
identical to the names used in the API and normalization parlance rather
than longer ones that expand the acronyms. Updated to pass JSString* to
the normalize function, so we can optimize 8-bit and already-normalized
cases, rather than calling the expensive String::upconvertedCharacters
function. Use throwVMRangeError.

Source/WebCore:

* editing/TextIterator.cpp: Include unorm2.h.
(WebCore::normalizeCharacters): Rewrote to use unorm2_normalize rather than
unorm_normalize, but left the logic otherwise the same.

* platform/graphics/SurrogatePairAwareTextIterator.cpp: Include unorm2.h.
(WebCore::SurrogatePairAwareTextIterator::normalizeVoicingMarks):
Use unorm2_composePair instead of unorm_normalize.

* platform/graphics/cairo/FontCairoHarfbuzzNG.cpp:
(characterSequenceIsEmoji): Changed to use existing SurrogatePairAwareTextIterator.
(FontCascade::fontForCombiningCharacterSequence): Use normalizedNFC instead of
calling unorm2_normalize directly.

* platform/graphics/freetype/SimpleFontDataFreeType.cpp:
Removed unneeded include of <unicode/normlzr.h>.

* platform/text/TextEncoding.cpp:
(WebCore::TextEncoding::encode const): Use normalizedNFC instead of the
code that was here. The normalizedNFC function is better in multiple ways,
but primarily it handles 8-bit strings and other already-normalized
strings much more efficiently.

Source/WTF:

* wtf/URLHelpers.cpp: Removed unneeded include of unorm.h since the
normalization code is now in StringView.cpp.
(WTF::URLHelpers::escapeUnsafeCharacters): Renamed from
createStringWithEscapedUnsafeCharacters since it now only creates
a new string if one is needed. Use unsigned for string lengths, since
that's what WTF::String uses, not size_t. Added a first loop so that
we can return the string unmodified if no lookalike characters are
found. Removed unnecessary round trip from UTF-16 and then back in
the case where the character is not a lookalike.
(WTF::URLHelpers::toNormalizationFormC): Deleted. Moved this logic
into the WTF::normalizedNFC function in StringView.cpp.
(WTF::URLHelpers::userVisibleURL): Call escapeUnsafeCharacters and
normalizedNFC. The normalizedNFC function is better in multiple ways,
but primarily it handles 8-bit strings and other already-normalized
strings much more efficiently.

* wtf/text/StringView.cpp:
(WTF::normalizedNFC): Added. This has two overloads. One is for when
we already have a String, and want to re-use it if no normalization
is needed, and another is when we only have a StringView, and may need
to allocate a String to hold the result. Includes a fast special case
for 8-bit and already-normalized strings, and uses the same strategy
that JSC::normalize was already using: calls unorm2_normalize twice,
first just to determine the length.

* wtf/text/StringView.h: Added normalizedNFC, which can be called with
either a StringView or a String. Also moved StringViewWithUnderlyingString
here from JSString.h for use as the return value of normalizedNFC;
it is used for a similar purpose in the JavaScriptCore rope implementation.
Also removed an inaccurate comment.</pre>

<h3>Modified Paths</h3>
<ul>
<li><a href="#releasesWebKitGTKwebkit224SourceJavaScriptCoreChangeLog">releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/ChangeLog</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceJavaScriptCoreruntimeJSStringh">releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/runtime/JSString.h</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceJavaScriptCoreruntimeStringPrototypecpp">releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/runtime/StringPrototype.cpp</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWTFChangeLog">releases/WebKitGTK/webkit-2.24/Source/WTF/ChangeLog</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWTFwtfURLHelperscpp">releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/URLHelpers.cpp</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWTFwtftextStringViewcpp">releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/text/StringView.cpp</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWTFwtftextStringViewh">releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/text/StringView.h</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWebCoreChangeLog">releases/WebKitGTK/webkit-2.24/Source/WebCore/ChangeLog</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWebCoreeditingTextIteratorcpp">releases/WebKitGTK/webkit-2.24/Source/WebCore/editing/TextIterator.cpp</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWebCoreplatformgraphicsSurrogatePairAwareTextIteratorcpp">releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/SurrogatePairAwareTextIterator.cpp</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWebCoreplatformgraphicscairoFontCairoHarfbuzzNGcpp">releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/cairo/FontCairoHarfbuzzNG.cpp</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWebCoreplatformgraphicsfreetypeSimpleFontDataFreeTypecpp">releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/freetype/SimpleFontDataFreeType.cpp</a></li>
<li><a href="#releasesWebKitGTKwebkit224SourceWebCoreplatformtextTextEncodingcpp">releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/text/TextEncoding.cpp</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="releasesWebKitGTKwebkit224SourceJavaScriptCoreChangeLog"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/ChangeLog (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/ChangeLog   2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/ChangeLog      2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,3 +1,29 @@
</span><ins>+2019-03-16  Darin Adler  <darin@apple.com>
+
+        Improve normalization code, including moving from unorm.h to unorm2.h
+        https://bugs.webkit.org/show_bug.cgi?id=195330
+
+        Reviewed by Michael Catanzaro.
+
+        * runtime/JSString.h: Move StringViewWithUnderlyingString to StringView.h.
+
+        * runtime/StringPrototype.cpp: Include unorm2.h instead of unorm.h.
+        (JSC::normalizer): Added. Function to create normalizer object given
+        enumeration value indicating which is selected. Simplified because we
+        know the function will not fail and so we don't need error handling code.
+        (JSC::normalize): Changed this function to take a JSString* so we can
+        optimize the case where no normalization is needed. Added an early exit
+        if the string is stored as 8-bit and another if the string is already
+        normalized, using unorm2_isNormalized. Changed error handling to only
+        check cases that can actually fail in practice. Also did other small
+        optimizations like passing VM rather than ExecState.
+        (JSC::stringProtoFuncNormalize): Used smaller enumeration names that are
+        identical to the names used in the API and normalization parlance rather
+        than longer ones that expand the acronyms. Updated to pass JSString* to
+        the normalize function, so we can optimize 8-bit and already-normalized
+        cases, rather than calling the expensive String::upconvertedCharacters
+        function. Use throwVMRangeError.
+
</ins><span class="cx"> 2019-05-07  Yusuke Suzuki  <ysuzuki@apple.com>
</span><span class="cx"> 
</span><span class="cx">         [JSC] DFG_ASSERT failed in lowInt52
</span></span></pre></div>
<a id="releasesWebKitGTKwebkit224SourceJavaScriptCoreruntimeJSStringh"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/runtime/JSString.h (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/runtime/JSString.h  2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/runtime/JSString.h     2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -67,12 +67,6 @@
</span><span class="cx"> bool isJSString(JSValue);
</span><span class="cx"> JSString* asString(JSValue);
</span><span class="cx"> 
</span><del>-struct StringViewWithUnderlyingString {
-    StringView view;
-    String underlyingString;
-};
-
-
</del><span class="cx"> // In 64bit architecture, JSString and JSRopeString have the following memory layout to make sizeof(JSString) == 16 and sizeof(JSRopeString) == 32.
</span><span class="cx"> // JSString has only one pointer. We use it for String. length() and is8Bit() queries go to StringImpl. In JSRopeString, we reuse the above pointer
</span><span class="cx"> // place for the 1st fiber. JSRopeString has three fibers so its size is 48. To keep length and is8Bit flag information in JSRopeString, JSRopeString
</span></span></pre></div>
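<p>The struct moved out of JSString.h pairs a view with the string that keeps its characters alive, so a function can return either the caller's characters untouched or a freshly allocated result. The following is a minimal standard-library sketch of that idea, not WebKit code: <code>ViewWithOwner</code> and <code>lowercasedASCII</code> are hypothetical names, and ASCII lowercasing stands in for normalization. A <code>std::shared_ptr</code> owner is used here because, unlike the reference-counted WTF::String, a <code>std::string</code> member could relocate its characters when the struct is copied or moved.</p>

```cpp
#include <algorithm>
#include <cctype>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for StringViewWithUnderlyingString: a view plus an
// optional owner that keeps the viewed characters alive.
struct ViewWithOwner {
    std::string_view view;
    std::shared_ptr<const std::string> owner; // null when the view borrows the caller's buffer
};

// Illustrative transformation (ASCII lowercasing, not Unicode normalization).
// Returns the input view untouched when there is nothing to change.
ViewWithOwner lowercasedASCII(std::string_view input)
{
    auto isUpper = [](unsigned char c) { return std::isupper(c) != 0; };
    if (std::none_of(input.begin(), input.end(), isUpper))
        return { input, nullptr }; // fast path: no allocation

    // Slow path: allocate a new string and hand back a view into it.
    auto result = std::make_shared<std::string>(input);
    std::transform(result->begin(), result->end(), result->begin(),
        [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::string_view view = *result;
    return { view, std::move(result) };
}
```

<p>In this sketch a null <code>owner</code> tells the caller that <code>view</code> still points at the input, so the input must outlive the returned value.</p>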
<a id="releasesWebKitGTKwebkit224SourceJavaScriptCoreruntimeStringPrototypecpp"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/runtime/StringPrototype.cpp (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/runtime/StringPrototype.cpp 2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/JavaScriptCore/runtime/StringPrototype.cpp    2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -49,7 +49,7 @@
</span><span class="cx"> #include "SuperSampler.h"
</span><span class="cx"> #include <algorithm>
</span><span class="cx"> #include <unicode/uconfig.h>
</span><del>-#include <unicode/unorm.h>
</del><ins>+#include <unicode/unorm2.h>
</ins><span class="cx"> #include <unicode/ustring.h>
</span><span class="cx"> #include <wtf/ASCIICType.h>
</span><span class="cx"> #include <wtf/MathExtras.h>
</span><span class="lines">@@ -1805,58 +1805,84 @@
</span><span class="cx">     return JSValue::encode(JSStringIterator::create(exec, exec->jsCallee()->globalObject(vm)->stringIteratorStructure(), string));
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-enum class NormalizationForm {
-    CanonicalComposition,
-    CanonicalDecomposition,
-    CompatibilityComposition,
-    CompatibilityDecomposition
-};
</del><ins>+enum class NormalizationForm { NFC, NFD, NFKC, NFKD };
</ins><span class="cx"> 
</span><del>-static JSValue normalize(ExecState* exec, const UChar* source, size_t sourceLength, NormalizationForm form)
</del><ins>+static constexpr bool normalizationAffects8Bit(NormalizationForm form)
</ins><span class="cx"> {
</span><del>-    VM& vm = exec->vm();
-    auto scope = DECLARE_THROW_SCOPE(vm);
</del><ins>+    switch (form) {
+    case NormalizationForm::NFC:
+        return false;
+    case NormalizationForm::NFD:
+        return true;
+    case NormalizationForm::NFKC:
+        return false;
+    case NormalizationForm::NFKD:
+        return true;
+    }
+    ASSERT_NOT_REACHED();
+    return true;
+}
</ins><span class="cx"> 
</span><ins>+static const UNormalizer2* normalizer(NormalizationForm form)
+{
</ins><span class="cx">     UErrorCode status = U_ZERO_ERROR;
</span><del>-    // unorm2_get*Instance() documentation says: "Returns an unmodifiable singleton instance. Do not delete it."
</del><span class="cx">     const UNormalizer2* normalizer = nullptr;
</span><span class="cx">     switch (form) {
</span><del>-    case NormalizationForm::CanonicalComposition:
</del><ins>+    case NormalizationForm::NFC:
</ins><span class="cx">         normalizer = unorm2_getNFCInstance(&status);
</span><span class="cx">         break;
</span><del>-    case NormalizationForm::CanonicalDecomposition:
</del><ins>+    case NormalizationForm::NFD:
</ins><span class="cx">         normalizer = unorm2_getNFDInstance(&status);
</span><span class="cx">         break;
</span><del>-    case NormalizationForm::CompatibilityComposition:
</del><ins>+    case NormalizationForm::NFKC:
</ins><span class="cx">         normalizer = unorm2_getNFKCInstance(&status);
</span><span class="cx">         break;
</span><del>-    case NormalizationForm::CompatibilityDecomposition:
</del><ins>+    case NormalizationForm::NFKD:
</ins><span class="cx">         normalizer = unorm2_getNFKDInstance(&status);
</span><span class="cx">         break;
</span><span class="cx">     }
</span><ins>+    ASSERT(normalizer);
+    ASSERT(U_SUCCESS(status));
+    return normalizer;
+}
</ins><span class="cx"> 
</span><del>-    if (!normalizer || U_FAILURE(status))
-        return throwTypeError(exec, scope);
</del><ins>+static JSValue normalize(ExecState* exec, JSString* string, NormalizationForm form)
+{
+    VM& vm = exec->vm();
+    auto scope = DECLARE_THROW_SCOPE(vm);
</ins><span class="cx"> 
</span><del>-    int32_t normalizedStringLength = unorm2_normalize(normalizer, source, sourceLength, nullptr, 0, &status);
</del><ins>+    auto viewWithString = string->viewWithUnderlyingString(exec);
+    RETURN_IF_EXCEPTION(scope, { });
</ins><span class="cx"> 
</span><del>-    if (U_FAILURE(status) && status != U_BUFFER_OVERFLOW_ERROR) {
-        // The behavior is not specified when normalize fails.
-        // Now we throw a type error since it seems that the contents of the string are invalid.
-        return throwTypeError(exec, scope);
-    }
</del><ins>+    StringView view = viewWithString.view;
+    if (view.is8Bit() && (!normalizationAffects8Bit(form) || charactersAreAllASCII(view.characters8(), view.length())))
+        RELEASE_AND_RETURN(scope, string);
</ins><span class="cx"> 
</span><del>-    UChar* buffer = nullptr;
-    auto impl = StringImpl::tryCreateUninitialized(normalizedStringLength, buffer);
-    if (!impl)
</del><ins>+    const UNormalizer2* normalizer = JSC::normalizer(form);
+
+    // Since ICU does not offer functions that can perform normalization or check for
+    // normalization with input that is Latin-1, we need to upconvert to UTF-16 at this point.
+    auto characters = view.upconvertedCharacters();
+
+    UErrorCode status = U_ZERO_ERROR;
+    UBool isNormalized = unorm2_isNormalized(normalizer, characters, view.length(), &status);
+    ASSERT(U_SUCCESS(status));
+    if (isNormalized)
+        RELEASE_AND_RETURN(scope, string);
+
+    int32_t normalizedStringLength = unorm2_normalize(normalizer, characters, view.length(), nullptr, 0, &status);
+    ASSERT(status == U_BUFFER_OVERFLOW_ERROR);
+
+    UChar* buffer;
+    auto result = StringImpl::tryCreateUninitialized(normalizedStringLength, buffer);
+    if (!result)
</ins><span class="cx">         return throwOutOfMemoryError(exec, scope);
</span><span class="cx"> 
</span><span class="cx">     status = U_ZERO_ERROR;
</span><del>-    unorm2_normalize(normalizer, source, sourceLength, buffer, normalizedStringLength, &status);
-    if (U_FAILURE(status))
-        return throwTypeError(exec, scope);
</del><ins>+    unorm2_normalize(normalizer, characters, view.length(), buffer, normalizedStringLength, &status);
+    ASSERT(U_SUCCESS(status));
</ins><span class="cx"> 
</span><del>-    RELEASE_AND_RETURN(scope, jsString(exec, WTFMove(impl)));
</del><ins>+    RELEASE_AND_RETURN(scope, jsString(&vm, WTFMove(result)));
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> EncodedJSValue JSC_HOST_CALL stringProtoFuncNormalize(ExecState* exec)
</span><span class="lines">@@ -1867,29 +1893,28 @@
</span><span class="cx">     JSValue thisValue = exec->thisValue();
</span><span class="cx">     if (!checkObjectCoercible(thisValue))
</span><span class="cx">         return throwVMTypeError(exec, scope);
</span><del>-    auto viewWithString = thisValue.toString(exec)->viewWithUnderlyingString(exec);
-    RETURN_IF_EXCEPTION(scope, encodedJSValue());
-    StringView view = viewWithString.view;
</del><ins>+    JSString* string = thisValue.toString(exec);
+    RETURN_IF_EXCEPTION(scope, { });
</ins><span class="cx"> 
</span><del>-    NormalizationForm form = NormalizationForm::CanonicalComposition;
-    // Verify that the argument is provided and is not undefined.
-    if (!exec->argument(0).isUndefined()) {
-        String formString = exec->uncheckedArgument(0).toWTFString(exec);
-        RETURN_IF_EXCEPTION(scope, encodedJSValue());
</del><ins>+    auto form = NormalizationForm::NFC;
+    JSValue formValue = exec->argument(0);
+    if (!formValue.isUndefined()) {
+        String formString = formValue.toWTFString(exec);
+        RETURN_IF_EXCEPTION(scope, { });
</ins><span class="cx"> 
</span><span class="cx">         if (formString == "NFC")
</span><del>-            form = NormalizationForm::CanonicalComposition;
</del><ins>+            form = NormalizationForm::NFC;
</ins><span class="cx">         else if (formString == "NFD")
</span><del>-            form = NormalizationForm::CanonicalDecomposition;
</del><ins>+            form = NormalizationForm::NFD;
</ins><span class="cx">         else if (formString == "NFKC")
</span><del>-            form = NormalizationForm::CompatibilityComposition;
</del><ins>+            form = NormalizationForm::NFKC;
</ins><span class="cx">         else if (formString == "NFKD")
</span><del>-            form = NormalizationForm::CompatibilityDecomposition;
</del><ins>+            form = NormalizationForm::NFKD;
</ins><span class="cx">         else
</span><del>-            return throwVMError(exec, scope, createRangeError(exec, "argument does not match any normalization form"_s));
</del><ins>+            return throwVMRangeError(exec, scope, "argument does not match any normalization form"_s);
</ins><span class="cx">     }
</span><span class="cx"> 
</span><del>-    RELEASE_AND_RETURN(scope, JSValue::encode(normalize(exec, view.upconvertedCharacters(), view.length(), form)));
</del><ins>+    RELEASE_AND_RETURN(scope, JSValue::encode(normalize(exec, string, form)));
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> } // namespace JSC
</span></span></pre></div>
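<p>The new JSC::normalize first tries to return the input unchanged, then calls unorm2_normalize twice: once with no buffer to learn the result length, and once to fill a buffer of exactly that size. The sketch below imitates that control flow without depending on ICU. <code>toyCompose</code>, <code>toyIsComposed</code> and <code>composed</code> are hypothetical helpers, and the only rule they know is that 'e' followed by U+0301 becomes U+00E9; real normalization is far more involved.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Toy stand-in for the preflight contract (an assumption for illustration):
// writes at most `capacity` code units, always returns the length the full
// result needs, and sets `overflow` when the result did not fit.
std::int32_t toyCompose(const char16_t* source, std::int32_t length,
    char16_t* dest, std::int32_t capacity, bool& overflow)
{
    std::int32_t needed = 0;
    for (std::int32_t i = 0; i < length; ++i) {
        char16_t c = source[i];
        if (c == u'e' && i + 1 < length && source[i + 1] == 0x0301) {
            c = 0x00E9; // precomposed e with acute
            ++i;        // consume the combining mark
        }
        if (needed < capacity)
            dest[needed] = c;
        ++needed;
    }
    overflow = needed > capacity;
    return needed;
}

// Toy "already normalized" check: true when no composable pair is present.
bool toyIsComposed(const std::u16string& s)
{
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (s[i] == u'e' && s[i + 1] == 0x0301)
            return false;
    }
    return true;
}

std::u16string composed(const std::u16string& input)
{
    // Fast path: nothing to do, so skip both passes.
    if (toyIsComposed(input))
        return input;

    bool overflow = false;
    auto length = static_cast<std::int32_t>(input.size());

    // Pass 1: null buffer with zero capacity, only to learn the length.
    std::int32_t needed = toyCompose(input.data(), length, nullptr, 0, overflow);

    // Pass 2: fill a buffer of exactly the reported size.
    std::u16string result(static_cast<std::size_t>(needed), u'\0');
    toyCompose(input.data(), length, result.data(), needed, overflow);
    return result;
}
```

<p>The sketch returns a copy on the fast path, whereas the patch returns the original JSString pointer and so avoids even that copy.</p>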
<a id="releasesWebKitGTKwebkit224SourceWTFChangeLog"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WTF/ChangeLog (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WTF/ChangeLog      2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WTF/ChangeLog 2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,3 +1,41 @@
</span><ins>+2019-03-16  Darin Adler  <darin@apple.com>
+
+        Improve normalization code, including moving from unorm.h to unorm2.h
+        https://bugs.webkit.org/show_bug.cgi?id=195330
+
+        Reviewed by Michael Catanzaro.
+
+        * wtf/URLHelpers.cpp: Removed unneeded include of unorm.h since the
+        normalization code is now in StringView.cpp.
+        (WTF::URLHelpers::escapeUnsafeCharacters): Renamed from
+        createStringWithEscapedUnsafeCharacters since it now only creates
+        a new string if one is needed. Use unsigned for string lengths, since
+        that's what WTF::String uses, not size_t. Added a first loop so that
+        we can return the string unmodified if no lookalike characters are
+        found. Removed unnecessary round trip from UTF-16 and then back in
+        the case where the character is not a lookalike.
+        (WTF::URLHelpers::toNormalizationFormC): Deleted. Moved this logic
+        into the WTF::normalizedNFC function in StringView.cpp.
+        (WTF::URLHelpers::userVisibleURL): Call escapeUnsafeCharacters and
+        normalizedNFC. The normalizedNFC function is better in multiple ways,
+        but primarily it handles 8-bit strings and other already-normalized
+        strings much more efficiently.
+
+        * wtf/text/StringView.cpp:
+        (WTF::normalizedNFC): Added. This has two overloads. One is for when
+        we already have a String, and want to re-use it if no normalization
+        is needed, and another is when we only have a StringView, and may need
+        to allocate a String to hold the result. Includes a fast special case
+        for 8-bit and already-normalized strings, and uses the same strategy
+        that JSC::normalize was already using: calls unorm2_normalize twice,
+        first just to determine the length.
+
+        * wtf/text/StringView.h: Added normalizedNFC, which can be called with
+        either a StringView or a String. Also moved StringViewWithUnderlyingString
+        here from JSString.h for use as the return value of normalizedNFC;
+        it is used for a similar purpose in the JavaScriptCore rope implementation.
+        Also removed an inaccurate comment.
+
</ins><span class="cx"> 2019-05-07  Brent Fulgham  <bfulgham@apple.com>
</span><span class="cx"> 
</span><span class="cx">         Correct JSON parser to address unterminated escape character
</span></span></pre></div>
<a id="releasesWebKitGTKwebkit224SourceWTFwtfURLHelperscpp"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/URLHelpers.cpp (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/URLHelpers.cpp     2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/URLHelpers.cpp        2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2005, 2007, 2014 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2005-2019 Apple Inc. All rights reserved.
</ins><span class="cx">  * Copyright (C) 2018 Igalia S.L.
</span><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="lines">@@ -33,7 +33,6 @@
</span><span class="cx"> #include "URLParser.h"
</span><span class="cx"> #include <mutex>
</span><span class="cx"> #include <unicode/uidna.h>
</span><del>-#include <unicode/unorm.h>
</del><span class="cx"> #include <unicode/uscript.h>
</span><span class="cx"> 
</span><span class="cx"> namespace WTF {
</span><span class="lines">@@ -735,17 +734,35 @@
</span><span class="cx">     return result;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-static String createStringWithEscapedUnsafeCharacters(const String& sourceBuffer)
</del><ins>+static String escapeUnsafeCharacters(const String& sourceBuffer)
</ins><span class="cx"> {
</span><ins>+    unsigned length = sourceBuffer.length();
+
+    Optional<UChar32> previousCodePoint;
+
+    unsigned i;
+    for (i = 0; i < length; ) {
+        UChar32 c = sourceBuffer.characterStartingAt(i);
+        if (isLookalikeCharacter(previousCodePoint, sourceBuffer.characterStartingAt(i)))
+            break;
+        previousCodePoint = c;
+        i += U16_LENGTH(c);
+    }
+
+    if (i == length)
+        return sourceBuffer;
+
</ins><span class="cx">     Vector<UChar, urlBytesBufferLength> outBuffer;
</span><span class="cx"> 
</span><del>-    const size_t length = sourceBuffer.length();
</del><ins>+    outBuffer.grow(i);
+    if (sourceBuffer.is8Bit())
+        StringImpl::copyCharacters(outBuffer.data(), sourceBuffer.characters8(), i);
+    else
+        StringImpl::copyCharacters(outBuffer.data(), sourceBuffer.characters16(), i);
</ins><span class="cx"> 
</span><del>-    Optional<UChar32> previousCodePoint;
-    size_t i = 0;
-    while (i < length) {
</del><ins>+    for (; i < length; ) {
</ins><span class="cx">         UChar32 c = sourceBuffer.characterStartingAt(i);
</span><del>-
</del><ins>+        unsigned characterLength = U16_LENGTH(c);
</ins><span class="cx">         if (isLookalikeCharacter(previousCodePoint, c)) {
</span><span class="cx">             uint8_t utf8Buffer[4];
</span><span class="cx">             size_t offset = 0;
</span><span class="lines">@@ -752,7 +769,7 @@
</span><span class="cx">             UBool failure = false;
</span><span class="cx">             U8_APPEND(utf8Buffer, offset, 4, c, failure)
</span><span class="cx">             ASSERT(!failure);
</span><del>-            
</del><ins>+
</ins><span class="cx">             for (size_t j = 0; j < offset; ++j) {
</span><span class="cx">                 outBuffer.append('%');
</span><span class="cx">                 outBuffer.append(upperNibbleToASCIIHexDigit(utf8Buffer[j]));
</span><span class="lines">@@ -759,52 +776,16 @@
</span><span class="cx">                 outBuffer.append(lowerNibbleToASCIIHexDigit(utf8Buffer[j]));
</span><span class="cx">             }
</span><span class="cx">         } else {
</span><del>-            UChar utf16Buffer[2];
-            size_t offset = 0;
-            UBool failure = false;
-            U16_APPEND(utf16Buffer, offset, 2, c, failure)
-            ASSERT(!failure);
-            for (size_t j = 0; j < offset; ++j)
-                outBuffer.append(utf16Buffer[j]);
</del><ins>+            for (unsigned j = 0; j < characterLength; ++j)
+                outBuffer.append(sourceBuffer[i + j]);
</ins><span class="cx">         }
</span><span class="cx">         previousCodePoint = c;
</span><del>-        i += U16_LENGTH(c);
</del><ins>+        i += characterLength;
</ins><span class="cx">     }
</span><ins>+
</ins><span class="cx">     return String::adopt(WTFMove(outBuffer));
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-static String toNormalizationFormC(const String& string)
-{
-    Vector<UChar> sourceBuffer = string.charactersWithNullTermination();
-    ASSERT(sourceBuffer.last() == '\0');
-    sourceBuffer.removeLast();
-
-    UErrorCode uerror = U_ZERO_ERROR;
-    const UNormalizer2* normalizer = unorm2_getNFCInstance(&uerror);
-    if (U_FAILURE(uerror))
-        return { };
-
-    UNormalizationCheckResult checkResult = unorm2_quickCheck(normalizer, sourceBuffer.data(), sourceBuffer.size(), &uerror);
-    if (U_FAILURE(uerror))
-        return { };
-
-    // No need to normalize if already normalized.
-    if (checkResult == UNORM_YES)
-        return string;
-
-    Vector<UChar, urlBytesBufferLength> normalizedCharacters(sourceBuffer.size());
-    auto normalizedLength = unorm2_normalize(normalizer, sourceBuffer.data(), sourceBuffer.size(), normalizedCharacters.data(), normalizedCharacters.size(), &uerror);
-    if (uerror == U_BUFFER_OVERFLOW_ERROR) {
-        uerror = U_ZERO_ERROR;
-        normalizedCharacters.resize(normalizedLength);
-        normalizedLength = unorm2_normalize(normalizer, sourceBuffer.data(), sourceBuffer.size(), normalizedCharacters.data(), normalizedLength, &uerror);
-    }
-    if (U_FAILURE(uerror))
-        return { };
-
-    return String(normalizedCharacters.data(), normalizedLength);
-}
-
</del><span class="cx"> String userVisibleURL(const CString& url)
</span><span class="cx"> {
</span><span class="cx">     auto* before = reinterpret_cast<const unsigned char*>(url.data());
</span><span class="lines">@@ -889,8 +870,7 @@
</span><span class="cx">             result = mappedResult;
</span><span class="cx">     }
</span><span class="cx"> 
</span><del>-    auto normalized = toNormalizationFormC(result);
-    return createStringWithEscapedUnsafeCharacters(normalized);
</del><ins>+    return escapeUnsafeCharacters(normalizedNFC(result));
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> } // namespace URLHelpers
</span></span></pre></div>
<a id="releasesWebKitGTKwebkit224SourceWTFwtftextStringViewcpp"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/text/StringView.cpp (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/text/StringView.cpp        2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/text/StringView.cpp   2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,6 +1,6 @@
</span><span class="cx"> /*
</span><span class="cx"> 
</span><del>-Copyright (C) 2014-2017 Apple Inc. All rights reserved.
</del><ins>+Copyright (C) 2014-2019 Apple Inc. All rights reserved.
</ins><span class="cx"> 
</span><span class="cx"> Redistribution and use in source and binary forms, with or without
</span><span class="cx"> modification, are permitted provided that the following conditions
</span><span class="lines">@@ -29,6 +29,7 @@
</span><span class="cx"> 
</span><span class="cx"> #include <mutex>
</span><span class="cx"> #include <unicode/ubrk.h>
</span><ins>+#include <unicode/unorm2.h>
</ins><span class="cx"> #include <wtf/HashMap.h>
</span><span class="cx"> #include <wtf/Lock.h>
</span><span class="cx"> #include <wtf/NeverDestroyed.h>
</span><span class="lines">@@ -240,6 +241,43 @@
</span><span class="cx">     return convertASCIICase<ASCIICase::Upper>(static_cast<const UChar*>(m_characters), m_length);
</span><span class="cx"> }
</span><span class="cx"> 
</span><ins>+StringViewWithUnderlyingString normalizedNFC(StringView string)
+{
+    // Latin-1 characters are unaffected by normalization.
+    if (string.is8Bit())
+        return { string, { } };
+
+    UErrorCode status = U_ZERO_ERROR;
+    const UNormalizer2* normalizer = unorm2_getNFCInstance(&status);
+    ASSERT(U_SUCCESS(status));
+
+    // No need to normalize if already normalized.
+    UBool checkResult = unorm2_isNormalized(normalizer, string.characters16(), string.length(), &status);
+    if (checkResult)
+        return { string, { } };
+
+    unsigned normalizedLength = unorm2_normalize(normalizer, string.characters16(), string.length(), nullptr, 0, &status);
+    ASSERT(status == U_BUFFER_OVERFLOW_ERROR);
+
+    UChar* characters;
+    String result = String::createUninitialized(normalizedLength, characters);
+
+    status = U_ZERO_ERROR;
+    unorm2_normalize(normalizer, string.characters16(), string.length(), characters, normalizedLength, &status);
+    ASSERT(U_SUCCESS(status));
+
+    StringView view { result };
+    return { view, WTFMove(result) };
+}
+
+String normalizedNFC(const String& string)
+{
+    auto result = normalizedNFC(StringView { string });
+    if (result.underlyingString.isNull())
+        return string;
+    return result.underlyingString;
+}
+
</ins><span class="cx"> #if CHECK_STRINGVIEW_LIFETIME
</span><span class="cx"> 
</span><span class="cx"> // Manage reference count manually so UnderlyingString does not need to be defined in the header.
</span></span></pre></div>
<a id="releasesWebKitGTKwebkit224SourceWTFwtftextStringViewh"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/text/StringView.h (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/text/StringView.h  2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WTF/wtf/text/StringView.h     2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2014-2017 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2014-2019 Apple Inc. All rights reserved.
</ins><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="cx">  * modification, are permitted provided that the following conditions
</span><span class="lines">@@ -211,6 +211,16 @@
</span><span class="cx"> inline bool operator!=(const LChar*a, StringView b) { return !equal(b, a); }
</span><span class="cx"> inline bool operator!=(const char*a, StringView b) { return !equal(b, a); }
</span><span class="cx"> 
</span><ins>+struct StringViewWithUnderlyingString;
+
+// This returns a StringView of the normalized result, and a String that is either
+// null, if the input was already normalized, or contains the normalized result
+// and needs to be kept around so the StringView remains valid. Typically the
+// easiest way to use it correctly is to put it into a local and use the StringView.
+WTF_EXPORT_PRIVATE StringViewWithUnderlyingString normalizedNFC(StringView);
+
+WTF_EXPORT_PRIVATE String normalizedNFC(const String&);
+
</ins><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> #include <wtf/text/AtomicString.h>
</span><span class="lines">@@ -218,12 +228,17 @@
</span><span class="cx"> 
</span><span class="cx"> namespace WTF {
</span><span class="cx"> 
</span><ins>+struct StringViewWithUnderlyingString {
+    StringView view;
+    String underlyingString;
+};
+
</ins><span class="cx"> inline StringView::StringView()
</span><span class="cx"> {
</span><del>-    // FIXME: It's peculiar that null strings are 16-bit and empty strings return 8-bit (according to the is8Bit function).
</del><span class="cx"> }
</span><span class="cx"> 
</span><span class="cx"> #if CHECK_STRINGVIEW_LIFETIME
</span><ins>+
</ins><span class="cx"> inline StringView::~StringView()
</span><span class="cx"> {
</span><span class="cx">     setUnderlyingString(nullptr);
</span><span class="lines">@@ -280,6 +295,7 @@
</span><span class="cx"> 
</span><span class="cx">     return *this;
</span><span class="cx"> }
</span><ins>+
</ins><span class="cx"> #endif // CHECK_STRINGVIEW_LIFETIME
</span><span class="cx"> 
</span><span class="cx"> inline void StringView::initialize(const LChar* characters, unsigned length)
</span><span class="lines">@@ -996,3 +1012,4 @@
</span><span class="cx"> using WTF::append;
</span><span class="cx"> using WTF::equal;
</span><span class="cx"> using WTF::StringView;
</span><ins>+using WTF::StringViewWithUnderlyingString;
</ins></span></pre></div>
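<p>The ownership contract described in the header comment above (a view plus a string that is null when the input was already normalized) can be tried outside WTF. The following is a minimal sketch under stated assumptions: <code>ViewWithOwner</code> and <code>composeAcuteE</code> are hypothetical names, the "normalization" is a toy that only composes e + U+0301, and a <code>std::shared_ptr</code> stands in for the reference-counted <code>String</code> so that moving the result does not invalidate the view.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for StringViewWithUnderlyingString, built on the standard library.
struct ViewWithOwner {
    std::u16string_view view;
    // Null when the view simply aliases the caller's input (nothing was allocated).
    std::shared_ptr<const std::u16string> owner;
};

// Toy "normalization" (an assumption for illustration, not real NFC):
// composes 'e' followed by U+0301 into U+00E9.
ViewWithOwner composeAcuteE(std::u16string_view input)
{
    constexpr std::u16string_view decomposed = u"e\u0301";

    // Fast path: nothing to compose, so hand back the input view and a null owner.
    if (input.find(decomposed) == std::u16string_view::npos)
        return { input, nullptr };

    std::u16string result;
    result.reserve(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (input.substr(i, 2) == decomposed) {
            result.push_back(u'\u00E9');
            ++i; // skip the combining mark that was just consumed
        } else
            result.push_back(input[i]);
    }

    // The heap-allocated string keeps its buffer address when the struct is moved,
    // so the view stays valid for as long as the caller keeps the owner alive.
    auto owner = std::make_shared<const std::u16string>(std::move(result));
    std::u16string_view view = *owner;
    return { view, owner };
}
```

<p>As with <code>normalizedNFC</code>, the simplest safe usage is to keep the whole returned struct in a local variable and read only its <code>view</code> member while that local is in scope.</p>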
<a id="releasesWebKitGTKwebkit224SourceWebCoreChangeLog"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WebCore/ChangeLog (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WebCore/ChangeLog  2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WebCore/ChangeLog     2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,3 +1,32 @@
</span><ins>+2019-03-16  Darin Adler  <darin@apple.com>
+
+        Improve normalization code, including moving from unorm.h to unorm2.h
+        https://bugs.webkit.org/show_bug.cgi?id=195330
+
+        Reviewed by Michael Catanzaro.
+
+        * editing/TextIterator.cpp: Include unorm2.h.
+        (WebCore::normalizeCharacters): Rewrote to use unorm2_normalize rather than
+        unorm_normalize, but left the logic otherwise the same.
+
+        * platform/graphics/SurrogatePairAwareTextIterator.cpp: Include unorm2.h.
+        (WebCore::SurrogatePairAwareTextIterator::normalizeVoicingMarks):
+        Use unorm2_composePair instead of unorm_normalize.
+
+        * platform/graphics/cairo/FontCairoHarfbuzzNG.cpp:
+        (characterSequenceIsEmoji): Changed to use existing SurrogatePairAwareTextIterator.
+        (FontCascade::fontForCombiningCharacterSequence): Use normalizedNFC instead of
+        calling unorm2_normalize directly.
+
+        * platform/graphics/freetype/SimpleFontDataFreeType.cpp:
+        Removed unneeded include of <unicode/normlzr.h>.
+
+        * platform/text/TextEncoding.cpp:
+        (WebCore::TextEncoding::encode const): Use normalizedNFC instead of the
+        code that was here. The normalizedNFC function is better in multiple ways,
+        but primarily it handles 8-bit strings and other already-normalized
+        strings much more efficiently.
+
</ins><span class="cx"> 2019-05-15  Zalan Bujtas  <zalan@apple.com>
</span><span class="cx"> 
</span><span class="cx">         Do not create a shape object outside of the layout context
</span></span></pre></div>
<a id="releasesWebKitGTKwebkit224SourceWebCoreeditingTextIteratorcpp"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WebCore/editing/TextIterator.cpp (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WebCore/editing/TextIterator.cpp   2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WebCore/editing/TextIterator.cpp      2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2004-2017 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2004-2019 Apple Inc. All rights reserved.
</ins><span class="cx">  * Copyright (C) 2005 Alexey Proskuryakov.
</span><span class="cx">  *
</span><span class="cx">  * Redistribution and use in source and binary forms, with or without
</span><span class="lines">@@ -60,6 +60,7 @@
</span><span class="cx"> #include "TextControlInnerElements.h"
</span><span class="cx"> #include "VisiblePosition.h"
</span><span class="cx"> #include "VisibleUnits.h"
</span><ins>+#include <unicode/unorm2.h>
</ins><span class="cx"> #include <wtf/Function.h>
</span><span class="cx"> #include <wtf/text/CString.h>
</span><span class="cx"> #include <wtf/text/StringBuilder.h>
</span><span class="lines">@@ -71,10 +72,9 @@
</span><span class="cx"> #include <wtf/text/TextBreakIteratorInternalICU.h>
</span><span class="cx"> #endif
</span><span class="cx"> 
</span><ins>+namespace WebCore {
</ins><span class="cx"> 
</span><del>-namespace WebCore {
</del><span class="cx"> using namespace WTF::Unicode;
</span><del>-
</del><span class="cx"> using namespace HTMLNames;
</span><span class="cx"> 
</span><span class="cx"> // Buffer that knows how to compare with a search target.
</span><span class="lines">@@ -2014,32 +2014,27 @@
</span><span class="cx">     return false;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-ALLOW_DEPRECATED_DECLARATIONS_BEGIN
-// NOTE: ICU's unorm_normalize function is deprecated.
-
</del><span class="cx"> static void normalizeCharacters(const UChar* characters, unsigned length, Vector<UChar>& buffer)
</span><span class="cx"> {
</span><del>-    ASSERT(length);
</del><ins>+    UErrorCode status = U_ZERO_ERROR;
+    const UNormalizer2* normalizer = unorm2_getNFCInstance(&status);
+    ASSERT(U_SUCCESS(status));
</ins><span class="cx"> 
</span><span class="cx">     buffer.resize(length);
</span><span class="cx"> 
</span><del>-    UErrorCode status = U_ZERO_ERROR;
-    size_t bufferSize = unorm_normalize(characters, length, UNORM_NFC, 0, buffer.data(), length, &status);
-    ASSERT(status == U_ZERO_ERROR || status == U_STRING_NOT_TERMINATED_WARNING || status == U_BUFFER_OVERFLOW_ERROR);
-    ASSERT(bufferSize);
</del><ins>+    auto normalizedLength = unorm2_normalize(normalizer, characters, length, buffer.data(), length, &status);
+    ASSERT(U_SUCCESS(status) || status == U_BUFFER_OVERFLOW_ERROR);
</ins><span class="cx"> 
</span><del>-    buffer.resize(bufferSize);
</del><ins>+    buffer.resize(normalizedLength);
</ins><span class="cx"> 
</span><del>-    if (status == U_ZERO_ERROR || status == U_STRING_NOT_TERMINATED_WARNING)
</del><ins>+    if (U_SUCCESS(status))
</ins><span class="cx">         return;
</span><span class="cx"> 
</span><span class="cx">     status = U_ZERO_ERROR;
</span><del>-    unorm_normalize(characters, length, UNORM_NFC, 0, buffer.data(), bufferSize, &status);
-    ASSERT(status == U_STRING_NOT_TERMINATED_WARNING);
</del><ins>+    unorm2_normalize(normalizer, characters, length, buffer.data(), normalizedLength, &status);
+    ASSERT(U_SUCCESS(status));
</ins><span class="cx"> }
</span><span class="cx"> 
</span><del>-ALLOW_DEPRECATED_DECLARATIONS_END
-
</del><span class="cx"> static bool isNonLatin1Separator(UChar32 character)
</span><span class="cx"> {
</span><span class="cx">     ASSERT_ARG(character, character >= 256);
</span></span></pre></div>
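<p>The overflow-and-retry shape of <code>normalizeCharacters</code> can be exercised without ICU. This is a minimal sketch, assuming a hypothetical converter <code>doubleCharacters</code> that follows the ICU-style preflight convention (always return the required length, and flag an overflow when the destination is too small). <code>Status</code>, <code>doubleCharacters</code> and <code>convertWithRetry</code> are illustrative names, not WebKit or ICU API.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

enum class Status { Ok, BufferOverflow };

// Hypothetical converter that writes every input character twice.
// Like ICU's preflighting functions, it always returns the length the full
// output needs and reports BufferOverflow when capacity is too small.
std::size_t doubleCharacters(std::u16string_view source, char16_t* destination, std::size_t capacity, Status& status)
{
    std::size_t needed = source.size() * 2;
    if (needed > capacity) {
        status = Status::BufferOverflow;
        return needed; // nothing written; the caller should resize and retry
    }
    for (std::size_t i = 0; i < source.size(); ++i) {
        destination[2 * i] = source[i];
        destination[2 * i + 1] = source[i];
    }
    status = Status::Ok;
    return needed;
}

std::vector<char16_t> convertWithRetry(std::u16string_view source)
{
    // First attempt: guess that the output is no longer than the input.
    std::vector<char16_t> buffer(source.size());
    Status status = Status::Ok;
    std::size_t needed = doubleCharacters(source, buffer.data(), buffer.size(), status);

    // Shrink or grow to the exact length the converter reported.
    buffer.resize(needed);
    if (status == Status::Ok)
        return buffer;

    // Second attempt: pass the capacity of the resized buffer, not the original guess.
    status = Status::Ok;
    doubleCharacters(source, buffer.data(), buffer.size(), status);
    assert(status == Status::Ok);
    return buffer;
}
```

<p>The detail worth noting is the capacity argument on the second call: it has to describe the resized buffer, otherwise the retry overflows in exactly the same way as the first attempt.</p>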
<a id="releasesWebKitGTKwebkit224SourceWebCoreplatformgraphicsSurrogatePairAwareTextIteratorcpp"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/SurrogatePairAwareTextIterator.cpp (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/SurrogatePairAwareTextIterator.cpp       2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/SurrogatePairAwareTextIterator.cpp  2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2003, 2006, 2008, 2009, 2010, 2011 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2003-2019 Apple Inc. All rights reserved.
</ins><span class="cx">  * Copyright (C) 2008 Holger Hans Peter Freyther
</span><span class="cx">  * Copyright (C) Research In Motion Limited 2011. All rights reserved.
</span><span class="cx">  *
</span><span class="lines">@@ -23,7 +23,7 @@
</span><span class="cx"> #include "config.h"
</span><span class="cx"> #include "SurrogatePairAwareTextIterator.h"
</span><span class="cx"> 
</span><del>-#include <unicode/unorm.h>
</del><ins>+#include <unicode/unorm2.h>
</ins><span class="cx"> 
</span><span class="cx"> namespace WebCore {
</span><span class="cx"> 
</span><span class="lines">@@ -69,29 +69,24 @@
</span><span class="cx">     return true;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-ALLOW_DEPRECATED_DECLARATIONS_BEGIN
-// NOTE: ICU's unorm_normalize function is deprecated.
-
</del><span class="cx"> UChar32 SurrogatePairAwareTextIterator::normalizeVoicingMarks()
</span><span class="cx"> {
</span><span class="cx">     // According to http://www.unicode.org/Public/UNIDATA/UCD.html#Canonical_Combining_Class_Values
</span><del>-    static const uint8_t hiraganaKatakanaVoicingMarksCombiningClass = 8;
</del><ins>+    static constexpr uint8_t hiraganaKatakanaVoicingMarksCombiningClass = 8;
</ins><span class="cx"> 
</span><span class="cx">     if (m_currentIndex + 1 >= m_endIndex)
</span><span class="cx">         return 0;
</span><span class="cx"> 
</span><span class="cx">     if (u_getCombiningClass(m_characters[1]) == hiraganaKatakanaVoicingMarksCombiningClass) {
</span><del>-        // Normalize into composed form using 3.2 rules.
-        UChar normalizedCharacters[2] = { 0, 0 };
-        UErrorCode uStatus = U_ZERO_ERROR;  
-        int32_t resultLength = unorm_normalize(m_characters, 2, UNORM_NFC, UNORM_UNICODE_3_2, &normalizedCharacters[0], 2, &uStatus);
-        if (resultLength == 1 && !uStatus)
-            return normalizedCharacters[0];
</del><ins>+        UErrorCode status = U_ZERO_ERROR;
+        const UNormalizer2* normalizer = unorm2_getNFCInstance(&status);
+        ASSERT(U_SUCCESS(status));
+        auto composedCharacter = unorm2_composePair(normalizer, m_characters[0], m_characters[1]);
+        if (composedCharacter > 0)
+            return composedCharacter;
</ins><span class="cx">     }
</span><span class="cx"> 
</span><span class="cx">     return 0;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-ALLOW_DEPRECATED_DECLARATIONS_END
-
</del><span class="cx"> }
</span></span></pre></div>
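<p>The pairwise composition used by <code>normalizeVoicingMarks</code> can be illustrated with a small lookup. This is only a sketch: <code>composeVoicedKana</code> is a hypothetical helper covering three hiragana, whereas the real code asks ICU's <code>unorm2_composePair</code>. It mirrors the convention the patch relies on, where a non-negative result is the composite and a negative result means the pair does not compose.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, table-driven stand-in for a pairwise composer.
// Returns the composed code point, or -1 when the pair has no composite.
std::int32_t composeVoicedKana(char32_t base, char32_t mark)
{
    // U+3099 COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (combining class 8).
    constexpr char32_t combiningVoicedSoundMark = 0x3099;
    if (mark != combiningVoicedSoundMark)
        return -1;

    switch (base) {
    case 0x304B: return 0x304C; // HIRAGANA KA -> GA
    case 0x304D: return 0x304E; // HIRAGANA KI -> GI
    case 0x306F: return 0x3070; // HIRAGANA HA -> BA
    default: return -1;         // not covered by this toy table
    }
}
```

<p>A caller tests the sign of the result, as the patch does with <code>composedCharacter &gt; 0</code>, and falls back to treating the two code points separately when no composite exists.</p>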
<a id="releasesWebKitGTKwebkit224SourceWebCoreplatformgraphicscairoFontCairoHarfbuzzNGcpp"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/cairo/FontCairoHarfbuzzNG.cpp (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/cairo/FontCairoHarfbuzzNG.cpp    2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/cairo/FontCairoHarfbuzzNG.cpp       2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -32,7 +32,6 @@
</span><span class="cx"> #include "CharacterProperties.h"
</span><span class="cx"> #include "FontCache.h"
</span><span class="cx"> #include "SurrogatePairAwareTextIterator.h"
</span><del>-#include <unicode/normlzr.h>
</del><span class="cx"> 
</span><span class="cx"> namespace WebCore {
</span><span class="cx"> 
</span><span class="lines">@@ -46,11 +45,10 @@
</span><span class="cx">     return false;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-static bool characterSequenceIsEmoji(const Vector<UChar, 4>& normalizedCharacters, int32_t normalizedLength)
</del><ins>+static bool characterSequenceIsEmoji(SurrogatePairAwareTextIterator& iterator, UChar32 firstCharacter, unsigned firstClusterLength)
</ins><span class="cx"> {
</span><del>-    UChar32 character;
-    unsigned clusterLength = 0;
-    SurrogatePairAwareTextIterator iterator(normalizedCharacters.data(), 0, normalizedLength, normalizedLength);
</del><ins>+    UChar32 character = firstCharacter;
+    unsigned clusterLength = firstClusterLength;
</ins><span class="cx">     if (!iterator.consume(character, clusterLength))
</span><span class="cx">         return false;
</span><span class="cx"> 
</span><span class="lines">@@ -100,36 +98,27 @@
</span><span class="cx">     return false;
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-const Font* FontCascade::fontForCombiningCharacterSequence(const UChar* characters, size_t length) const
</del><ins>+const Font* FontCascade::fontForCombiningCharacterSequence(const UChar* originalCharacters, size_t originalLength) const
</ins><span class="cx"> {
</span><del>-    UErrorCode error = U_ZERO_ERROR;
-    Vector<UChar, 4> normalizedCharacters(length);
-    const auto* normalizer = unorm2_getNFCInstance(&error);
-    if (U_FAILURE(error))
-        return nullptr;
-    int32_t normalizedLength = unorm2_normalize(normalizer, characters, length, normalizedCharacters.data(), length, &error);
-    if (U_FAILURE(error)) {
-        if (error != U_BUFFER_OVERFLOW_ERROR)
-            return nullptr;
</del><ins>+    auto normalizedString = normalizedNFC(StringView { originalCharacters, static_cast<unsigned>(originalLength) });
</ins><span class="cx"> 
</span><del>-        error = U_ZERO_ERROR;
-        normalizedCharacters.resize(normalizedLength);
-        normalizedLength = unorm2_normalize(normalizer, characters, length, normalizedCharacters.data(), normalizedLength, &error);
-        if (U_FAILURE(error))
-            return nullptr;
-    }
</del><ins>+    // Code below relies on normalizedNFC never narrowing a 16-bit input string into an 8-bit output string.
+    // At the time of this writing, the function never does this, but in theory a future version could, and
+    // we would then need to add code paths here for the simpler 8-bit case.
+    auto characters = normalizedString.view.characters16();
+    auto length = normalizedString.view.length();
</ins><span class="cx"> 
</span><span class="cx">     UChar32 character;
</span><span class="cx">     unsigned clusterLength = 0;
</span><del>-    SurrogatePairAwareTextIterator iterator(normalizedCharacters.data(), 0, normalizedLength, normalizedLength);
</del><ins>+    SurrogatePairAwareTextIterator iterator(characters, 0, length, length);
</ins><span class="cx">     if (!iterator.consume(character, clusterLength))
</span><span class="cx">         return nullptr;
</span><span class="cx"> 
</span><del>-    bool isEmoji = characterSequenceIsEmoji(normalizedCharacters, normalizedLength);
</del><ins>+    bool isEmoji = characterSequenceIsEmoji(iterator, character, clusterLength);
</ins><span class="cx"> 
</span><span class="cx">     const Font* baseFont = glyphDataForCharacter(character, false, NormalVariant).font;
</span><span class="cx">     if (baseFont
</span><del>-        && (static_cast<int32_t>(clusterLength) == normalizedLength || baseFont->canRenderCombiningCharacterSequence(characters, length))
</del><ins>+        && (clusterLength == length || baseFont->canRenderCombiningCharacterSequence(characters, length))
</ins><span class="cx">         && (!isEmoji || baseFont->platformData().isColorBitmapFont()))
</span><span class="cx">         return baseFont;
</span><span class="cx"> 
</span></span></pre></div>
<a id="releasesWebKitGTKwebkit224SourceWebCoreplatformgraphicsfreetypeSimpleFontDataFreeTypecpp"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/freetype/SimpleFontDataFreeType.cpp (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/freetype/SimpleFontDataFreeType.cpp      2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/graphics/freetype/SimpleFontDataFreeType.cpp 2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -50,7 +50,6 @@
</span><span class="cx"> #include <ft2build.h>
</span><span class="cx"> #include FT_TRUETYPE_TABLES_H
</span><span class="cx"> #include FT_TRUETYPE_TAGS_H
</span><del>-#include <unicode/normlzr.h>
</del><span class="cx"> #include <wtf/MathExtras.h>
</span><span class="cx"> 
</span><span class="cx"> namespace WebCore {
</span></span></pre></div>
<a id="releasesWebKitGTKwebkit224SourceWebCoreplatformtextTextEncodingcpp"></a>
<div class="modfile"><h4>Modified: releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/text/TextEncoding.cpp (245454 => 245455)</h4>
<pre class="diff"><span>
<span class="info">--- releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/text/TextEncoding.cpp     2019-05-17 11:25:20 UTC (rev 245454)
+++ releases/WebKitGTK/webkit-2.24/Source/WebCore/platform/text/TextEncoding.cpp        2019-05-17 11:25:29 UTC (rev 245455)
</span><span class="lines">@@ -1,5 +1,5 @@
</span><span class="cx"> /*
</span><del>- * Copyright (C) 2004-2017 Apple Inc. All rights reserved.
</del><ins>+ * Copyright (C) 2004-2019 Apple Inc. All rights reserved.
</ins><span class="cx">  * Copyright (C) 2006 Alexey Proskuryakov <ap@nypop.com>
</span><span class="cx">  * Copyright (C) 2007-2009 Torch Mobile, Inc.
</span><span class="cx">  *
</span><span class="lines">@@ -31,10 +31,8 @@
</span><span class="cx"> #include "DecodeEscapeSequences.h"
</span><span class="cx"> #include "TextCodec.h"
</span><span class="cx"> #include "TextEncodingRegistry.h"
</span><del>-#include <unicode/unorm.h>
</del><span class="cx"> #include <wtf/NeverDestroyed.h>
</span><span class="cx"> #include <wtf/StdLibExtras.h>
</span><del>-#include <wtf/text/CString.h>
</del><span class="cx"> #include <wtf/text/StringView.h>
</span><span class="cx"> 
</span><span class="cx"> namespace WebCore {
</span><span class="lines">@@ -71,48 +69,18 @@
</span><span class="cx">     return newTextCodec(*this)->decode(data, length, true, stopOnError, sawError);
</span><span class="cx"> }
</span><span class="cx"> 
</span><del>-ALLOW_DEPRECATED_DECLARATIONS_BEGIN
-// NOTE: ICU's unorm_quickCheck and unorm_normalize functions are deprecated.
-
-Vector<uint8_t> TextEncoding::encode(StringView text, UnencodableHandling handling) const
</del><ins>+Vector<uint8_t> TextEncoding::encode(StringView string, UnencodableHandling handling) const
</ins><span class="cx"> {
</span><del>-    if (!m_name || text.isEmpty())
</del><ins>+    if (!m_name || string.isEmpty())
</ins><span class="cx">         return { };
</span><span class="cx"> 
</span><del>-    // FIXME: Consider adding a fast case for ASCII.
-
</del><span class="cx">     // FIXME: What's the right place to do normalization?
</span><span class="cx">     // It's a little strange to do it inside the encode function.
</span><span class="cx">     // Perhaps normalization should be an explicit step done before calling encode.
</span><del>-
-    auto upconvertedCharacters = text.upconvertedCharacters();
-
-    const UChar* source = upconvertedCharacters;
-    unsigned sourceLength = text.length();
-
-    Vector<UChar> normalizedCharacters;
-
-    UErrorCode err = U_ZERO_ERROR;
-    if (unorm_quickCheck(source, sourceLength, UNORM_NFC, &err) != UNORM_YES) {
-        // First try using the length of the original string, since normalization to NFC rarely increases length.
-        normalizedCharacters.grow(sourceLength);
-        int32_t normalizedLength = unorm_normalize(source, sourceLength, UNORM_NFC, 0, normalizedCharacters.data(), sourceLength, &err);
-        if (err == U_BUFFER_OVERFLOW_ERROR) {
-            err = U_ZERO_ERROR;
-            normalizedCharacters.resize(normalizedLength);
-            normalizedLength = unorm_normalize(source, sourceLength, UNORM_NFC, 0, normalizedCharacters.data(), normalizedLength, &err);
-        }
-        ASSERT(U_SUCCESS(err));
-
-        source = normalizedCharacters.data();
-        sourceLength = normalizedLength;
-    }
-
-    return newTextCodec(*this)->encode(StringView { source, sourceLength }, handling);
</del><ins>+    auto normalizedString = normalizedNFC(string);
+    return newTextCodec(*this)->encode(normalizedString.view, handling);
</ins><span class="cx"> }
</span><span class="cx"> 
</span><del>-ALLOW_DEPRECATED_DECLARATIONS_END
-
</del><span class="cx"> const char* TextEncoding::domName() const
</span><span class="cx"> {
</span><span class="cx">     if (noExtendedTextEncodingNameUsed())
</span></span></pre>
</div>
</div>

</body>
</html>