[Webkit-unassigned] [Bug 234030] New: TextCodecUTF8 can skip characters after an invalid sequence near EOF

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Wed Dec 8 12:55:22 PST 2021


https://bugs.webkit.org/show_bug.cgi?id=234030

            Bug ID: 234030
           Summary: TextCodecUTF8 can skip characters after an invalid
                    sequence near EOF
           Product: WebKit
           Version: WebKit Nightly Build
          Hardware: Unspecified
                OS: Unspecified
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: Page Loading
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: andreu at andreubotella.com
                CC: beidson at apple.com

Created attachment 446414

  --> https://bugs.webkit.org/attachment.cgi?id=446414&action=review

Sample to show that this bug affects page loading.

WPT tests: https://wpt.fyi/results/encoding/textdecoder-eof.any.html?label=experimental&label=master&aligned (also tests for bug 233921).

When the TextCodecUTF8 decoder finds a non-ASCII lead byte, it waits until it has received enough bytes to form a complete sequence starting at that position, and only then processes them. If the stream is flushed before that happens, the decoder assumes the buffered bytes are a truncated partial sequence, so it discards them and emits a single replacement character. That assumption doesn't always hold: a buffered byte may not be a valid continuation byte at all (0x41 in the examples below is plain ASCII "A"), and discarding it causes non-replacement characters to be skipped:

// "�A" in Firefox and Chromium 98, and according to the spec.
// "��A" in earlier versions of Chromium.
// "�" in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x9F, 0x41]));
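For comparison, here is a minimal JavaScript sketch of the UTF-8 decoder algorithm from the WHATWG Encoding Standard, which is what the WPT tests check. It is not WebKit's TextCodecUTF8 code, and the function name `utf8DecodeSpec` is hypothetical. The key difference from the behavior described above is that each byte is validated as soon as it arrives. A byte that cannot continue the current sequence produces one U+FFFD and is then reprocessed as a possible lead byte. Only a genuinely incomplete sequence is collapsed into a single replacement character at end of input.

```javascript
// Sketch of the WHATWG Encoding Standard UTF-8 decoder with the "replacement"
// error mode. Illustrative only; not WebKit's TextCodecUTF8 implementation.
function utf8DecodeSpec(bytes) {
  let codePoint = 0;
  let bytesSeen = 0;
  let bytesNeeded = 0;
  let lower = 0x80; // allowed range for the next continuation byte
  let upper = 0xbf;
  let out = "";

  let i = 0;
  while (i < bytes.length) {
    const byte = bytes[i];
    if (bytesNeeded === 0) {
      i++; // a lead byte is always consumed
      if (byte <= 0x7f) {
        out += String.fromCharCode(byte);
      } else if (byte >= 0xc2 && byte <= 0xdf) {
        bytesNeeded = 1;
        codePoint = byte & 0x1f;
      } else if (byte >= 0xe0 && byte <= 0xef) {
        if (byte === 0xe0) lower = 0xa0; // reject overlong forms
        if (byte === 0xed) upper = 0x9f; // reject surrogates
        bytesNeeded = 2;
        codePoint = byte & 0x0f;
      } else if (byte >= 0xf0 && byte <= 0xf4) {
        if (byte === 0xf0) lower = 0x90; // reject overlong forms
        if (byte === 0xf4) upper = 0x8f; // reject code points above U+10FFFF
        bytesNeeded = 3;
        codePoint = byte & 0x07;
      } else {
        out += "\uFFFD"; // invalid lead byte
      }
      continue;
    }
    if (byte < lower || byte > upper) {
      // Not a valid continuation byte: one error for the bytes buffered so far,
      // then reprocess this byte as a new lead byte (i is NOT advanced).
      codePoint = 0;
      bytesNeeded = 0;
      bytesSeen = 0;
      lower = 0x80;
      upper = 0xbf;
      out += "\uFFFD";
      continue;
    }
    i++;
    lower = 0x80;
    upper = 0xbf;
    codePoint = (codePoint << 6) | (byte & 0x3f);
    bytesSeen++;
    if (bytesSeen === bytesNeeded) {
      out += String.fromCodePoint(codePoint);
      codePoint = 0;
      bytesNeeded = 0;
      bytesSeen = 0;
    }
  }
  // End of input: only a still-incomplete (but so far valid) sequence
  // produces a final replacement character.
  if (bytesNeeded !== 0) out += "\uFFFD";
  return out;
}
```

With this sketch, [0xF0, 0x9F, 0x41] decodes to U+FFFD followed by "A": 0x9F is a valid continuation after 0xF0, but 0x41 is not, so it is reprocessed as ASCII instead of being discarded.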

This can also cause fewer replacement characters to be emitted than the spec requires:

// "��A" in Firefox, Chrome, and according to the spec.
// "�" in WebKit.
new TextDecoder().decode(new Uint8Array([0xF0, 0x80, 0x41]));
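Because "�" is easy to miscount in rendered output, a small check like the following can run both inputs through the engine's own `TextDecoder` and print code points instead. This makes it obvious whether one or two U+FFFD characters were emitted. It is a sketch: `codePoints` and `cases` are hypothetical names, and the expected strings are the spec results quoted above.

```javascript
// Hypothetical helper: list the code points of a string as "U+XXXX" so that
// replacement characters (U+FFFD) can be counted without relying on rendering.
function codePoints(str) {
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  ).join(" ");
}

// The two inputs from this report, with the output the Encoding Standard requires.
const cases = [
  { bytes: [0xf0, 0x9f, 0x41], expected: "\uFFFDA" },
  { bytes: [0xf0, 0x80, 0x41], expected: "\uFFFD\uFFFDA" },
];

for (const { bytes, expected } of cases) {
  // TextDecoder is a global in browsers and in Node.js (since v11).
  const actual = new TextDecoder().decode(new Uint8Array(bytes));
  const status = actual === expected ? "ok" : "MISMATCH";
  console.log(`${status}: got ${codePoints(actual)}, expected ${codePoints(expected)}`);
}
```

In an affected WebKit build, both lines would report a mismatch, with only a single U+FFFD and no U+0041.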

This bug also affects page loading, as the attached sample demonstrates.
