<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>[287024] trunk</title>
</head>
<body>

<style type="text/css"><!--
#msg dl.meta { border: 1px #006 solid; background: #369; padding: 6px; color: #fff; }
#msg dl.meta dt { float: left; width: 6em; font-weight: bold; }
#msg dt:after { content:':';}
#msg dl, #msg dt, #msg ul, #msg li, #header, #footer, #logmsg { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt;  }
#msg dl a { font-weight: bold}
#msg dl a:link    { color:#fc3; }
#msg dl a:active  { color:#ff0; }
#msg dl a:visited { color:#cc6; }
h3 { font-family: verdana,arial,helvetica,sans-serif; font-size: 10pt; font-weight: bold; }
#msg pre { overflow: auto; background: #ffc; border: 1px #fa0 solid; padding: 6px; }
#logmsg { background: #ffc; border: 1px #fa0 solid; padding: 1em 1em 0 1em; }
#logmsg p, #logmsg pre, #logmsg blockquote { margin: 0 0 1em 0; }
#logmsg p, #logmsg li, #logmsg dt, #logmsg dd { line-height: 14pt; }
#logmsg h1, #logmsg h2, #logmsg h3, #logmsg h4, #logmsg h5, #logmsg h6 { margin: .5em 0; }
#logmsg h1:first-child, #logmsg h2:first-child, #logmsg h3:first-child, #logmsg h4:first-child, #logmsg h5:first-child, #logmsg h6:first-child { margin-top: 0; }
#logmsg ul, #logmsg ol { padding: 0; list-style-position: inside; margin: 0 0 0 1em; }
#logmsg ul { text-indent: -1em; padding-left: 1em; }#logmsg ol { text-indent: -1.5em; padding-left: 1.5em; }
#logmsg > ul, #logmsg > ol { margin: 0 0 1em 0; }
#logmsg pre { background: #eee; padding: 1em; }
#logmsg blockquote { border: 1px solid #fa0; border-left-width: 10px; padding: 1em 1em 0 1em; background: white;}
#logmsg dl { margin: 0; }
#logmsg dt { font-weight: bold; }
#logmsg dd { margin: 0; padding: 0 0 0.5em 0; }
#logmsg dd:before { content:'\00bb';}
#logmsg table { border-spacing: 0px; border-collapse: collapse; border-top: 4px solid #fa0; border-bottom: 1px solid #fa0; background: #fff; }
#logmsg table th { text-align: left; font-weight: normal; padding: 0.2em 0.5em; border-top: 1px dotted #fa0; }
#logmsg table td { text-align: right; border-top: 1px dotted #fa0; padding: 0.2em 0.5em; }
#logmsg table thead th { text-align: center; border-bottom: 1px solid #fa0; }
#logmsg table th.Corner { text-align: left; }
#logmsg hr { border: none 0; border-top: 2px dashed #fa0; height: 1px; }
#header, #footer { color: #fff; background: #636; border: 1px #300 solid; padding: 6px; }
#patch { width: 100%; }
#patch h4 {font-family: verdana,arial,helvetica,sans-serif;font-size:10pt;padding:8px;background:#369;color:#fff;margin:0;}
#patch .propset h4, #patch .binary h4 {margin:0;}
#patch pre {padding:0;line-height:1.2em;margin:0;}
#patch .diff {width:100%;background:#eee;padding: 0 0 10px 0;overflow:auto;}
#patch .propset .diff, #patch .binary .diff  {padding:10px 0;}
#patch span {display:block;padding:0 10px;}
#patch .modfile, #patch .addfile, #patch .delfile, #patch .propset, #patch .binary, #patch .copfile {border:1px solid #ccc;margin:10px 0;}
#patch ins {background:#dfd;text-decoration:none;display:block;padding:0 10px;}
#patch del {background:#fdd;text-decoration:none;display:block;padding:0 10px;}
#patch .lines, .info {color:#888;background:#fff;}
--></style>
<div id="msg">
<dl class="meta">
<dt>Revision</dt> <dd><a href="http://trac.webkit.org/projects/webkit/changeset/287024">287024</a></dd>
<dt>Author</dt> <dd>commit-queue@webkit.org</dd>
<dt>Date</dt> <dd>2021-12-14 08:29:11 -0800 (Tue, 14 Dec 2021)</dd>
</dl>

<h3>Log Message</h3>
<pre>TextDecoder doesn't detect invalid UTF-8 sequences early enough
https://bugs.webkit.org/show_bug.cgi?id=233921

Patch by Andreu Botella &lt;andreu@andreubotella.com&gt; on 2021-12-14
Reviewed by Darin Adler.

LayoutTests/imported/w3c:

Import WPT tests from
https://github.com/web-platform-tests/wpt/pull/31537.

* web-platform-tests/encoding/textdecoder-eof.any.js:
(test):
* web-platform-tests/encoding/textdecoder-streaming.any-expected.txt:
* web-platform-tests/encoding/textdecoder-streaming.any.js:
(string_appeared_here.forEach.):
(string_appeared_here.forEach.test):
(string_appeared_here.forEach):
* web-platform-tests/encoding/textdecoder-streaming.any.worker-expected.txt:

Source/WebCore/PAL:

In streaming mode, when TextCodecUTF8 found a lead byte for which a
valid sequence would span longer than the currently available bytes, it
used to defer any processing of that sequence until all such bytes were
available, even if errors could be detected earlier. Additionally, if
the stream was flushed at that point, it would emit a single replacement
character, regardless of whether the remaining bytes formed a valid
sequence, even if they had lead bytes, resulting in skipped characters.
Both issues are solved by always checking the validity of partial
sequences.

This patch uses `decodeNonASCIISequence` to find the length of the
maximal subpart of a partial sequence. If that length equals the partial
sequence size and we're not at EOF, no error is emitted yet. This is
enough to handle the missing characters at EOF. Combined with changing
the condition of the outer do-while loops in the `decode` method from
`flush && m_partialSequenceSize` to just `m_partialSequenceSize`, it
also fixes the streaming issue.

This patch is a port of
https://chromium-review.googlesource.com/c/chromium/src/+/3263938

Tests: imported/w3c/web-platform-tests/encoding/textdecoder-eof.any.html
       imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.html

* pal/text/TextCodecUTF8.cpp:
(PAL::TextCodecUTF8::handlePartialSequence): Changed to always process
partial sequences.
(PAL::TextCodecUTF8::decode): Changed the loop condition of the outer
do-while loops to not depend on `flush`.</pre>
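<p>The early error detection described in the log message can be sketched in JavaScript, the language of the imported tests. This is a minimal model of a byte-at-a-time state machine in the style of the WHATWG Encoding Standard's UTF-8 decoder, not WebKit's <code>TextCodecUTF8</code>. The class name <code>Utf8StreamSketch</code> and its fields are hypothetical, and input is assumed to be an array of byte values.</p>

```javascript
// Hypothetical illustration class; this is NOT WebKit's TextCodecUTF8.
class Utf8StreamSketch {
  constructor() {
    this.reset();
  }

  reset() {
    this.codePoint = 0;
    this.bytesNeeded = 0; // continuation bytes still expected
    this.lower = 0x80;    // allowed range for the next continuation byte
    this.upper = 0xBF;
  }

  decode(bytes = [], { stream = false } = {}) {
    let out = "";
    for (let i = 0; i < bytes.length; i++) {
      const b = bytes[i];
      if (this.bytesNeeded === 0) {
        if (b <= 0x7F) {
          out += String.fromCharCode(b);
        } else if (b >= 0xC2 && b <= 0xDF) {
          this.bytesNeeded = 1;
          this.codePoint = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) {
          if (b === 0xE0) this.lower = 0xA0; // rejects overlong forms
          if (b === 0xED) this.upper = 0x9F; // rejects surrogates
          this.bytesNeeded = 2;
          this.codePoint = b & 0x0F;
        } else if (b >= 0xF0 && b <= 0xF4) {
          if (b === 0xF0) this.lower = 0x90; // rejects overlong forms
          if (b === 0xF4) this.upper = 0x8F; // rejects values above U+10FFFF
          this.bytesNeeded = 3;
          this.codePoint = b & 0x07;
        } else {
          out += "\uFFFD"; // byte that can never start a sequence
        }
        continue;
      }
      if (b < this.lower || b > this.upper) {
        // The sequence is already known to be invalid: report it now,
        // without waiting for more input, then reprocess this byte.
        this.reset();
        out += "\uFFFD";
        i--;
        continue;
      }
      this.lower = 0x80;
      this.upper = 0xBF;
      this.codePoint = (this.codePoint << 6) | (b & 0x3F);
      this.bytesNeeded--;
      if (this.bytesNeeded === 0) {
        out += String.fromCodePoint(this.codePoint);
        this.codePoint = 0;
      }
    }
    // At end of input, a still-incomplete sequence is a single error.
    if (!stream && this.bytesNeeded !== 0) {
      this.reset();
      out += "\uFFFD";
    }
    return out;
  }
}

const sketch = new Utf8StreamSketch();
// The error is reported in the streaming call itself, before any flush.
console.log(sketch.decode([0xF0, 0x41], { stream: true }) === "\uFFFDA");
```

<p>Tracking an allowed range for the next byte is what lets the sketch reject a bad continuation byte as soon as it arrives, which is the behaviour the imported tests check.</p>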

<h3>Modified Paths</h3>
<ul>
<li><a href="#trunkLayoutTestsimportedw3cChangeLog">trunk/LayoutTests/imported/w3c/ChangeLog</a></li>
<li><a href="#trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecodereofanyjs">trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-eof.any.js</a></li>
<li><a href="#trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderstreaminganyexpectedtxt">trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any-expected.txt</a></li>
<li><a href="#trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderstreaminganyjs">trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.js</a></li>
<li><a href="#trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderstreaminganyworkerexpectedtxt">trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.worker-expected.txt</a></li>
<li><a href="#trunkSourceWebCorePALChangeLog">trunk/Source/WebCore/PAL/ChangeLog</a></li>
<li><a href="#trunkSourceWebCorePALpaltextTextCodecUTF8cpp">trunk/Source/WebCore/PAL/pal/text/TextCodecUTF8.cpp</a></li>
</ul>

</div>
<div id="patch">
<h3>Diff</h3>
<a id="trunkLayoutTestsimportedw3cChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/ChangeLog (287023 => 287024)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/ChangeLog 2021-12-14 16:20:09 UTC (rev 287023)
+++ trunk/LayoutTests/imported/w3c/ChangeLog    2021-12-14 16:29:11 UTC (rev 287024)
</span><span class="lines">@@ -1,3 +1,22 @@
</span><ins>+2021-12-14  Andreu Botella  &lt;andreu@andreubotella.com&gt;
+
+        TextDecoder doesn't detect invalid UTF-8 sequences early enough
+        https://bugs.webkit.org/show_bug.cgi?id=233921
+
+        Reviewed by Darin Adler.
+
+        Import WPT tests from
+        https://github.com/web-platform-tests/wpt/pull/31537.
+
+        * web-platform-tests/encoding/textdecoder-eof.any.js:
+        (test):
+        * web-platform-tests/encoding/textdecoder-streaming.any-expected.txt:
+        * web-platform-tests/encoding/textdecoder-streaming.any.js:
+        (string_appeared_here.forEach.):
+        (string_appeared_here.forEach.test):
+        (string_appeared_here.forEach):
+        * web-platform-tests/encoding/textdecoder-streaming.any.worker-expected.txt:
+
</ins><span class="cx"> 2021-12-14  Rob Buis  &lt;rbuis@igalia.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Incorrect aspect ratio size
</span></span></pre></div>
<a id="trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecodereofanyjs"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-eof.any.js (287023 => 287024)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-eof.any.js        2021-12-14 16:20:09 UTC (rev 287023)
+++ trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-eof.any.js   2021-12-14 16:29:11 UTC (rev 287024)
</span><span class="lines">@@ -1,7 +1,14 @@
</span><span class="cx"> test(() => {
</span><ins>+  // Truncated sequences
</ins><span class="cx">   assert_equals(new TextDecoder().decode(new Uint8Array([0xF0])), "\uFFFD");
</span><span class="cx">   assert_equals(new TextDecoder().decode(new Uint8Array([0xF0, 0x9F])), "\uFFFD");
</span><span class="cx">   assert_equals(new TextDecoder().decode(new Uint8Array([0xF0, 0x9F, 0x92])), "\uFFFD");
</span><ins>+
+  // Errors near end-of-queue
+  assert_equals(new TextDecoder().decode(new Uint8Array([0xF0, 0x9F, 0x41])), "\uFFFDA");
+  assert_equals(new TextDecoder().decode(new Uint8Array([0xF0, 0x41, 0x42])), "\uFFFDAB");
+  assert_equals(new TextDecoder().decode(new Uint8Array([0xF0, 0x41, 0xF0])), "\uFFFDA\uFFFD");
+  assert_equals(new TextDecoder().decode(new Uint8Array([0xF0, 0x8F, 0x92])), "\uFFFD\uFFFD\uFFFD");
</ins><span class="cx"> }, "TextDecoder end-of-queue handling");
</span><span class="cx"> 
</span><span class="cx"> test(() => {
</span><span class="lines">@@ -15,4 +22,19 @@
</span><span class="cx"> 
</span><span class="cx">   decoder.decode(new Uint8Array([0xF0, 0x9F]), { stream: true });
</span><span class="cx">   assert_equals(decoder.decode(new Uint8Array([0x92])), "\uFFFD");
</span><ins>+
+  assert_equals(decoder.decode(new Uint8Array([0xF0, 0x9F]), { stream: true }), "");
+  assert_equals(decoder.decode(new Uint8Array([0x41]), { stream: true }), "\uFFFDA");
+  assert_equals(decoder.decode(), "");
+
+  assert_equals(decoder.decode(new Uint8Array([0xF0, 0x41, 0x42]), { stream: true }), "\uFFFDAB");
+  assert_equals(decoder.decode(), "");
+
+  assert_equals(decoder.decode(new Uint8Array([0xF0, 0x41, 0xF0]), { stream: true }), "\uFFFDA");
+  assert_equals(decoder.decode(), "\uFFFD");
+
+  assert_equals(decoder.decode(new Uint8Array([0xF0]), { stream: true }), "");
+  assert_equals(decoder.decode(new Uint8Array([0x8F]), { stream: true }), "\uFFFD\uFFFD");
+  assert_equals(decoder.decode(new Uint8Array([0x92]), { stream: true }), "\uFFFD");
+  assert_equals(decoder.decode(), "");
</ins><span class="cx"> }, "TextDecoder end-of-queue handling using stream: true");
</span></span></pre></div>
<a id="trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderstreaminganyexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any-expected.txt (287023 => 287024)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any-expected.txt        2021-12-14 16:20:09 UTC (rev 287023)
+++ trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any-expected.txt   2021-12-14 16:29:11 UTC (rev 287024)
</span><span class="lines">@@ -14,6 +14,7 @@
</span><span class="cx"> PASS Streaming decode: utf-16be, 3 byte window (ArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-16be, 4 byte window (ArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-16be, 5 byte window (ArrayBuffer)
</span><ins>+PASS Streaming decode: UTF-8 chunk tests (ArrayBuffer)
</ins><span class="cx"> PASS Streaming decode: utf-8, 1 byte window (SharedArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-8, 2 byte window (SharedArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-8, 3 byte window (SharedArrayBuffer)
</span><span class="lines">@@ -29,4 +30,5 @@
</span><span class="cx"> PASS Streaming decode: utf-16be, 3 byte window (SharedArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-16be, 4 byte window (SharedArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-16be, 5 byte window (SharedArrayBuffer)
</span><ins>+PASS Streaming decode: UTF-8 chunk tests (SharedArrayBuffer)
</ins><span class="cx"> 
</span></span></pre></div>
<a id="trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderstreaminganyjs"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.js (287023 => 287024)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.js  2021-12-14 16:20:09 UTC (rev 287023)
+++ trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.js     2021-12-14 16:29:11 UTC (rev 287024)
</span><span class="lines">@@ -28,10 +28,11 @@
</span><span class="cx">                 var decoder = new TextDecoder(encoding);
</span><span class="cx">                 for (var i = 0; i < encoded.length; i += len) {
</span><span class="cx">                     var sub = [];
</span><del>-                    for (var j = i; j < encoded.length && j < i + len; ++j)
</del><ins>+                    for (var j = i; j < encoded.length && j < i + len; ++j) {
</ins><span class="cx">                         sub.push(encoded[j]);
</span><del>-                        var uintArray = new Uint8Array(createBuffer(arrayBufferOrSharedArrayBuffer, sub.length));
-                        uintArray.set(sub);
</del><ins>+                    }
+                    var uintArray = new Uint8Array(createBuffer(arrayBufferOrSharedArrayBuffer, sub.length));
+                    uintArray.set(sub);
</ins><span class="cx">                     out += decoder.decode(uintArray, {stream: true});
</span><span class="cx">                 }
</span><span class="cx">                 out += decoder.decode();
</span><span class="lines">@@ -39,4 +40,50 @@
</span><span class="cx">             }, 'Streaming decode: ' + encoding + ', ' + len + ' byte window (' + arrayBufferOrSharedArrayBuffer + ')');
</span><span class="cx">         }
</span><span class="cx">     });
</span><ins>+
+    test(() => {
+        function bytes(byteArray) {
+            const view = new Uint8Array(createBuffer(arrayBufferOrSharedArrayBuffer, byteArray.length));
+            view.set(byteArray);
+            return view;
+        }
+
+        const decoder = new TextDecoder();
+
+        assert_equals(decoder.decode(bytes([0xC1]), {stream: true}), "\uFFFD");
+        assert_equals(decoder.decode(), "");
+
+        assert_equals(decoder.decode(bytes([0xF5]), {stream: true}), "\uFFFD");
+        assert_equals(decoder.decode(), "");
+
+        assert_equals(decoder.decode(bytes([0xE0, 0x41]), {stream: true}), "\uFFFDA");
+        assert_equals(decoder.decode(bytes([0x42])), "B");
+
+        assert_equals(decoder.decode(bytes([0xE0, 0x80]), {stream: true}), "\uFFFD\uFFFD");
+        assert_equals(decoder.decode(bytes([0x80])), "\uFFFD");
+
+        assert_equals(decoder.decode(bytes([0xED, 0xA0]), {stream: true}), "\uFFFD\uFFFD");
+        assert_equals(decoder.decode(bytes([0x80])), "\uFFFD");
+
+        assert_equals(decoder.decode(bytes([0xF0, 0x41]), {stream: true}), "\uFFFDA");
+        assert_equals(decoder.decode(bytes([0x42]), {stream: true}), "B");
+        assert_equals(decoder.decode(bytes([0x43])), "C");
+
+        assert_equals(decoder.decode(bytes([0xF0, 0x80]), {stream: true}), "\uFFFD\uFFFD");
+        assert_equals(decoder.decode(bytes([0x80]), {stream: true}), "\uFFFD");
+        assert_equals(decoder.decode(bytes([0x80])), "\uFFFD");
+
+        assert_equals(decoder.decode(bytes([0xF4, 0xA0]), {stream: true}), "\uFFFD\uFFFD");
+        assert_equals(decoder.decode(bytes([0x80]), {stream: true}), "\uFFFD");
+        assert_equals(decoder.decode(bytes([0x80])), "\uFFFD");
+
+        assert_equals(decoder.decode(bytes([0xF0, 0x90, 0x41]), {stream: true}), "\uFFFDA");
+        assert_equals(decoder.decode(bytes([0x42])), "B");
+
+        // 4-byte UTF-8 sequences always correspond to non-BMP characters. Here
+        // we make sure that, although the first 3 bytes are enough to emit the
+        // lead surrogate, it only gets emitted when the fourth byte is read.
+        assert_equals(decoder.decode(bytes([0xF0, 0x9F, 0x92]), {stream: true}), "");
+        assert_equals(decoder.decode(bytes([0xA9])), "\u{1F4A9}");
+    }, `Streaming decode: UTF-8 chunk tests (${arrayBufferOrSharedArrayBuffer})`);
</ins><span class="cx"> })
</span></span></pre></div>
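<p>The windowed loop touched by this hunk can be written more compactly. The following is a minimal sketch, assuming a runtime that provides the global <code>TextEncoder</code> and <code>TextDecoder</code>; the helper name <code>decodeInWindows</code> is hypothetical and does not appear in the test file.</p>

```javascript
// Hypothetical helper: decode `bytes` (a Uint8Array) by feeding the decoder
// `windowSize` bytes at a time, as the "N byte window" tests do.
function decodeInWindows(bytes, windowSize, encoding = "utf-8") {
  const decoder = new TextDecoder(encoding);
  let out = "";
  for (let i = 0; i < bytes.length; i += windowSize) {
    // subarray() returns a view that is clamped to the end of the array.
    const chunk = bytes.subarray(i, i + windowSize);
    out += decoder.decode(chunk, { stream: true });
  }
  // A final call without `stream` flushes any pending partial sequence.
  return out + decoder.decode();
}

const text = "a\u00E9\u{1F4A9}";
const encoded = new TextEncoder().encode(text);
// For well-formed input, the window size should not change the result.
console.log(decodeInWindows(encoded, 1) === text);
```

<p>Using <code>subarray</code> avoids copying each window into a fresh buffer; the test file copies instead because it also needs to exercise <code>SharedArrayBuffer</code>-backed views.</p>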
<a id="trunkLayoutTestsimportedw3cwebplatformtestsencodingtextdecoderstreaminganyworkerexpectedtxt"></a>
<div class="modfile"><h4>Modified: trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.worker-expected.txt (287023 => 287024)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.worker-expected.txt 2021-12-14 16:20:09 UTC (rev 287023)
+++ trunk/LayoutTests/imported/w3c/web-platform-tests/encoding/textdecoder-streaming.any.worker-expected.txt    2021-12-14 16:29:11 UTC (rev 287024)
</span><span class="lines">@@ -14,6 +14,7 @@
</span><span class="cx"> PASS Streaming decode: utf-16be, 3 byte window (ArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-16be, 4 byte window (ArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-16be, 5 byte window (ArrayBuffer)
</span><ins>+PASS Streaming decode: UTF-8 chunk tests (ArrayBuffer)
</ins><span class="cx"> PASS Streaming decode: utf-8, 1 byte window (SharedArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-8, 2 byte window (SharedArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-8, 3 byte window (SharedArrayBuffer)
</span><span class="lines">@@ -29,4 +30,5 @@
</span><span class="cx"> PASS Streaming decode: utf-16be, 3 byte window (SharedArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-16be, 4 byte window (SharedArrayBuffer)
</span><span class="cx"> PASS Streaming decode: utf-16be, 5 byte window (SharedArrayBuffer)
</span><ins>+PASS Streaming decode: UTF-8 chunk tests (SharedArrayBuffer)
</ins><span class="cx"> 
</span></span></pre></div>
<a id="trunkSourceWebCorePALChangeLog"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/PAL/ChangeLog (287023 => 287024)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/PAL/ChangeLog       2021-12-14 16:20:09 UTC (rev 287023)
+++ trunk/Source/WebCore/PAL/ChangeLog  2021-12-14 16:29:11 UTC (rev 287024)
</span><span class="lines">@@ -1,3 +1,40 @@
</span><ins>+2021-12-14  Andreu Botella  &lt;andreu@andreubotella.com&gt;
+
+        TextDecoder doesn't detect invalid UTF-8 sequences early enough
+        https://bugs.webkit.org/show_bug.cgi?id=233921
+
+        Reviewed by Darin Adler.
+
+        In streaming mode, when TextCodecUTF8 found a lead byte for which a
+        valid sequence would span longer than the currently available bytes, it
+        used to defer any processing of that sequence until all such bytes were
+        available, even if errors could be detected earlier. Additionally, if
+        the stream was flushed at that point, it would emit a single replacement
+        character, regardless of whether the remaining bytes formed a valid
+        sequence, even if they had lead bytes, resulting in skipped characters.
+        Both issues are solved by always checking the validity of partial
+        sequences.
+
+        The approach used in this patch uses `decodeNonASCIISequence` to find
+        the length of the maximal subpart of a partial sequence, and if the
+        length is equal to the partial sequence size and we're not at EOF, we
+        don't emit the error. This is enough to handle the missing characters at
+        EOF, and when combined with changing the condition of the outer do-while
+        loops in the `decode` method from `flush && m_partialSequenceSize` to
+        only `m_partialSequenceSize`, it also fixes the streaming issue.
+
+        This patch is a port of
+        https://chromium-review.googlesource.com/c/chromium/src/+/3263938
+
+        Tests: imported/w3c/web-platform-tests/encoding/textdecoder-eof.any.html
+               imported/w3c/web-platform-tests/encoding/textdecoder-stream.any.html
+
+        * pal/text/TextCodecUTF8.cpp:
+        (PAL::TextCodecUTF8::handlePartialSequence): Changed to always process
+        partial sequences.
+        (PAL::TextCodecUTF8::decode): Changed the loop condition of the outer
+        do-while loops to not depend on `flush`.
+
</ins><span class="cx"> 2021-12-14  Ben Nham  &lt;nham@apple.com&gt;
</span><span class="cx"> 
</span><span class="cx">         Add web push message decryption routines
</span></span></pre></div>
<a id="trunkSourceWebCorePALpaltextTextCodecUTF8cpp"></a>
<div class="modfile"><h4>Modified: trunk/Source/WebCore/PAL/pal/text/TextCodecUTF8.cpp (287023 => 287024)</h4>
<pre class="diff"><span>
<span class="info">--- trunk/Source/WebCore/PAL/pal/text/TextCodecUTF8.cpp      2021-12-14 16:20:09 UTC (rev 287023)
+++ trunk/Source/WebCore/PAL/pal/text/TextCodecUTF8.cpp 2021-12-14 16:29:11 UTC (rev 287024)
</span><span class="lines">@@ -189,25 +189,34 @@
</span><span class="cx">         if (!count)
</span><span class="cx">             return true;
</span><span class="cx"> 
</span><ins>+        // Copy from `source` until we have `count` bytes.
+        if (count > m_partialSequenceSize && end > source) {
+            size_t additionalBytes = std::min&lt;size_t&gt;(count - m_partialSequenceSize, end - source);
+            memcpy(m_partialSequence + m_partialSequenceSize, source, additionalBytes);
+            source += additionalBytes;
+            m_partialSequenceSize += additionalBytes;
+        }
+
+        // If we still don't have `count` bytes, fill the rest with zeros (any
+        // other lead byte would do), so we can run `decodeNonASCIISequence` to
+        // tell if the chunk that we have is valid. These bytes are not part of
+        // the partial sequence, so don't increment `m_partialSequenceSize`.
+        bool partialSequenceIsTooShort = false;
</ins><span class="cx">         if (count > m_partialSequenceSize) {
</span><del>-            if (count - m_partialSequenceSize > end - source) {
-                if (!flush) {
-                    // The new data is not enough to complete the sequence, so
-                    // add it to the existing partial sequence.
-                    memcpy(m_partialSequence + m_partialSequenceSize, source, end - source);
-                    m_partialSequenceSize += end - source;
-                    return false;
-                }
-                // An incomplete partial sequence at the end is an error, but it will create
-                // a 16 bit string due to the replacementCharacter. Let the 16 bit path handle
-                // the error.
-                return true;
-            }
-            memcpy(m_partialSequence + m_partialSequenceSize, source, count - m_partialSequenceSize);
-            source += count - m_partialSequenceSize;
-            m_partialSequenceSize = count;
</del><ins>+            partialSequenceIsTooShort = true;
+            memset(m_partialSequence + m_partialSequenceSize, 0, count - m_partialSequenceSize);
</ins><span class="cx">         }
</span><ins>+
</ins><span class="cx">         int character = decodeNonASCIISequence(m_partialSequence, count);
</span><ins>+        if (partialSequenceIsTooShort) {
+            ASSERT(character == nonCharacter);
+            ASSERT(count <= m_partialSequenceSize);
+            // If we're not at the end, and the partial sequence that we have is
+            // incomplete but otherwise valid, a non-character is not an error.
+            if (!flush && count == m_partialSequenceSize)
+                return false;
+        }
+
</ins><span class="cx">         if (!isLatin1(character))
</span><span class="cx">             return true;
</span><span class="cx"> 
</span><span class="lines">@@ -236,29 +245,35 @@
</span><span class="cx">             consumePartialSequenceByte();
</span><span class="cx">             continue;
</span><span class="cx">         }
</span><ins>+
+        // Copy from `source` until we have `count` bytes.
+        if (count > m_partialSequenceSize && end > source) {
+            size_t additionalBytes = std::min&lt;size_t&gt;(count - m_partialSequenceSize, end - source);
+            memcpy(m_partialSequence + m_partialSequenceSize, source, additionalBytes);
+            source += additionalBytes;
+            m_partialSequenceSize += additionalBytes;
+        }
+
+        // If we still don't have `count` bytes, fill the rest with zeros (any
+        // other lead byte would do), so we can run `decodeNonASCIISequence` to
+        // tell if the chunk that we have is valid. These bytes are not part of
+        // the partial sequence, so don't increment `m_partialSequenceSize`.
+        bool partialSequenceIsTooShort = false;
</ins><span class="cx">         if (count > m_partialSequenceSize) {
</span><del>-            if (count - m_partialSequenceSize > end - source) {
-                if (!flush) {
-                    // The new data is not enough to complete the sequence, so
-                    // add it to the existing partial sequence.
-                    memcpy(m_partialSequence + m_partialSequenceSize, source, end - source);
-                    m_partialSequenceSize += end - source;
-                    return;
-                }
-                // An incomplete partial sequence at the end is an error.
-                sawError = true;
-                if (stopOnError)
-                    return;
-                *destination++ = replacementCharacter;
-                m_partialSequenceSize = 0;
-                source = end;
-                continue;
-            }
-            memcpy(m_partialSequence + m_partialSequenceSize, source, count - m_partialSequenceSize);
-            source += count - m_partialSequenceSize;
-            m_partialSequenceSize = count;
</del><ins>+            partialSequenceIsTooShort = true;
+            memset(m_partialSequence + m_partialSequenceSize, 0, count - m_partialSequenceSize);
</ins><span class="cx">         }
</span><ins>+
</ins><span class="cx">         int character = decodeNonASCIISequence(m_partialSequence, count);
</span><ins>+        if (partialSequenceIsTooShort) {
+            ASSERT(character == nonCharacter);
+            ASSERT(count <= m_partialSequenceSize);
+            // If we're not at the end, and the partial sequence that we have is
+            // incomplete but otherwise valid, a non-character is not an error.
+            if (!flush && count == m_partialSequenceSize)
+                return;
+        }
+
</ins><span class="cx">         if (character == nonCharacter) {
</span><span class="cx">             sawError = true;
</span><span class="cx">             if (stopOnError)
</span><span class="lines">@@ -353,7 +368,7 @@
</span><span class="cx">             source += count;
</span><span class="cx">             *destination++ = character;
</span><span class="cx">         }
</span><del>-    } while (flush && m_partialSequenceSize);
</del><ins>+    } while (m_partialSequenceSize);
</ins><span class="cx"> 
</span><span class="cx">     buffer.shrink(destination - buffer.characters());
</span><span class="cx">     if (flush)
</span><span class="lines">@@ -433,7 +448,7 @@
</span><span class="cx">                 continue;
</span><span class="cx">             destination16 = appendCharacter(destination16, character);
</span><span class="cx">         }
</span><del>-    } while (flush && m_partialSequenceSize);
</del><ins>+    } while (m_partialSequenceSize);
</ins><span class="cx"> 
</span><span class="cx">     buffer16.shrink(destination16 - buffer16.characters());
</span><span class="cx">     if (flush)
</span></span></pre>
</div>
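<p>The zero-padding check in the hunks above can be modeled outside C++. The following is a minimal sketch in JavaScript (the language of the imported tests), not a port of the patch. The function names are hypothetical, <code>partial</code> is assumed to be a plain array of byte values starting with a non-ASCII byte, and the prefix-length loop stands in for what the log message says <code>decodeNonASCIISequence</code> reports.</p>

```javascript
// Total sequence length implied by a lead byte, or 0 if it cannot start one.
function expectedLength(lead) {
  if (lead >= 0xC2 && lead <= 0xDF) return 2;
  if (lead >= 0xE0 && lead <= 0xEF) return 3;
  if (lead >= 0xF0 && lead <= 0xF4) return 4;
  return 0;
}

// Allowed range for the byte that directly follows the lead byte.
function secondByteRange(lead) {
  if (lead === 0xE0) return [0xA0, 0xBF];
  if (lead === 0xED) return [0x80, 0x9F];
  if (lead === 0xF0) return [0x90, 0xBF];
  if (lead === 0xF4) return [0x80, 0x8F];
  return [0x80, 0xBF];
}

// Hypothetical model of the decision made for a buffered partial sequence.
function checkPartial(partial, flush) {
  const count = expectedLength(partial[0]);
  if (count === 0) return { action: "error", consumed: 1 };

  // Pad with zeros up to `count`. Zero is never a valid continuation byte,
  // so the padding can only shorten the valid prefix, never extend it.
  const padded = partial.slice(0, count);
  while (padded.length < count) padded.push(0);

  // Length of the longest prefix still consistent with a valid sequence.
  let valid = 1;
  while (valid < count) {
    const [lo, hi] = valid === 1 ? secondByteRange(padded[0]) : [0x80, 0xBF];
    if (padded[valid] < lo || padded[valid] > hi) break;
    valid++;
  }

  if (valid === count) return { action: "complete", consumed: count };
  // Every buffered byte is fine so far and more input may follow: wait.
  if (!flush && valid === partial.length) return { action: "wait", consumed: 0 };
  // Otherwise the valid prefix is an error worth one replacement character.
  return { action: "error", consumed: valid };
}

console.log(checkPartial([0xF0, 0x9F], false).action); // "wait"
console.log(checkPartial([0xF0, 0x41], false).action); // "error"
```

<p>Returning the number of bytes consumed lets a caller emit one replacement character and resume decoding at the first byte that did not fit, so a following ASCII byte or lead byte is not skipped.</p>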
</div>

</body>
</html>