[Webkit-unassigned] [Bug 234030] New: TextCodecUTF8 can skip characters after an invalid sequence near EOF
bugzilla-daemon at webkit.org
Wed Dec 8 12:55:22 PST 2021
https://bugs.webkit.org/show_bug.cgi?id=234030
Bug ID: 234030
Summary: TextCodecUTF8 can skip characters after an invalid
sequence near EOF
Product: WebKit
Version: WebKit Nightly Build
Hardware: Unspecified
OS: Unspecified
Status: NEW
Severity: Normal
Priority: P2
Component: Page Loading
Assignee: webkit-unassigned at lists.webkit.org
Reporter: andreu at andreubotella.com
CC: beidson at apple.com
Created attachment 446414
--> https://bugs.webkit.org/attachment.cgi?id=446414&action=review
Sample to show that this bug affects page loading.
WPT tests: https://wpt.fyi/results/encoding/textdecoder-eof.any.html?label=experimental&label=master&aligned (also tests for bug 233921).
When the TextCodecUTF8 decoder finds a non-ASCII lead byte, it waits until enough bytes have been consumed to make a valid sequence starting at that position before it starts processing them. If the stream is flushed before that point, the decoder assumes that all remaining bytes belong to a single truncated sequence, so it discards them and emits a single replacement character. That assumption does not necessarily hold, and it can result in non-replacement characters being skipped:
// "�A" in Firefox and Chromium 98, and according to the spec.
// "��A" in earlier versions of Chromium.
// "�" in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x9F, 0x41]));
This can also result in fewer replacement characters being emitted than the spec requires:
// "��A" in Firefox, Chrome, and according to the spec.
// "�" in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x80, 0x41]));
This bug also affects page loading, as with the attached sample.
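For comparison, here is a minimal sketch of the byte-at-a-time UTF-8 decoding that the Encoding Standard describes, which yields the expected results for both inputs above. It is an illustration only, not WebKit's TextCodecUTF8 code; the function name decodeUtf8Sketch and its variable names are made up for this example, and it handles a single complete buffer rather than a stream.

```javascript
// Hypothetical helper, not part of WebKit: decodes a complete byte array
// following the state machine of the Encoding Standard's UTF-8 decoder.
function decodeUtf8Sketch(bytes) {
  let out = "";
  let codePoint = 0;
  let bytesSeen = 0;
  let bytesNeeded = 0;
  // Allowed range for the next continuation byte.
  let lower = 0x80;
  let upper = 0xBF;

  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];

    if (bytesNeeded === 0) {
      if (b <= 0x7F) {
        out += String.fromCharCode(b);
      } else if (b >= 0xC2 && b <= 0xDF) {
        bytesNeeded = 1;
        codePoint = b & 0x1F;
      } else if (b >= 0xE0 && b <= 0xEF) {
        if (b === 0xE0) lower = 0xA0; // reject overlong forms
        if (b === 0xED) upper = 0x9F; // reject surrogates
        bytesNeeded = 2;
        codePoint = b & 0x0F;
      } else if (b >= 0xF0 && b <= 0xF4) {
        if (b === 0xF0) lower = 0x90; // reject overlong forms
        if (b === 0xF4) upper = 0x8F; // reject values above U+10FFFF
        bytesNeeded = 3;
        codePoint = b & 0x07;
      } else {
        // Lone continuation byte or invalid lead byte.
        out += "\uFFFD";
      }
      continue;
    }

    if (b < lower || b > upper) {
      // The pending sequence is broken: emit one U+FFFD for it, reset,
      // and reprocess the current byte instead of discarding it.
      codePoint = 0;
      bytesNeeded = 0;
      bytesSeen = 0;
      lower = 0x80;
      upper = 0xBF;
      out += "\uFFFD";
      i--;
      continue;
    }

    lower = 0x80;
    upper = 0xBF;
    codePoint = (codePoint << 6) | (b & 0x3F);
    bytesSeen++;
    if (bytesSeen === bytesNeeded) {
      out += String.fromCodePoint(codePoint);
      codePoint = 0;
      bytesNeeded = 0;
      bytesSeen = 0;
    }
  }

  // End of input in the middle of a sequence: one U+FFFD for the
  // truncated sequence only.
  if (bytesNeeded !== 0) out += "\uFFFD";
  return out;
}

// "\uFFFDA": 0x41 ends the partial sequence and is then decoded as "A".
const skipped = decodeUtf8Sketch(new Uint8Array([0xF0, 0x9F, 0x41]));
// "\uFFFD\uFFFDA": 0x80 is not a valid second byte after 0xF0.
const tooFew = decodeUtf8Sketch(new Uint8Array([0xF0, 0x80, 0x41]));
```

The point of the sketch is that a byte which fails the continuation check is put back and decoded on its own, so reaching the end of the input never swallows bytes that do not belong to the truncated sequence.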