[Webkit-unassigned] [Bug 235308] New: The encoding argument to PAL::decodeURLEscapeSequencesAsData is unnecessary

bugzilla-daemon at webkit.org
Mon Jan 17 17:50:26 PST 2022


https://bugs.webkit.org/show_bug.cgi?id=235308

            Bug ID: 235308
           Summary: The encoding argument to
                    PAL::decodeURLEscapeSequencesAsData is unnecessary
           Product: WebKit
           Version: WebKit Nightly Build
          Hardware: Unspecified
                OS: Unspecified
            Status: NEW
          Severity: Normal
          Priority: P2
         Component: WebCore Misc.
          Assignee: webkit-unassigned at lists.webkit.org
          Reporter: abotella at igalia.com

While investigating bug 235307, I noticed that `PAL::decodeURLEscapeSequencesAsData` only seems to be used in `WebCore::DataURLDecoder`, and the `TextEncoding` object that is passed corresponds to the charset parsed from the data URL's MIME type. That encoding is used to encode the parts of the input string that aren't percent escapes. But while the spec's algorithm to process data URLs (https://fetch.spec.whatwg.org/#data-urls) does parse the MIME type, it does not try to extract the charset, let alone use it for decoding the body.

What seems to be happening is that the input to the data URL processor is a URL object, and that URL is then serialized in step 2 of the processor (in WebKit, this happens in `DecodeTask::process()`). For data URLs, the result of parsing and serializing is always an ASCII string, with non-ASCII characters percent-encoded as UTF-8 (or as the encoding with which the URL was parsed, if they happen to be parsed as part of the query). Therefore, as long as the `string` parameter to `decodeURLEscapeSequencesAsData` is a serialized URL, there are no code points in the input string that would encode differently depending on the passed encoding*, and so the encoding is effectively irrelevant.
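The argument above can be checked with a small model. The following Python sketch is not WebKit's implementation: `decode_escapes_as_data` is a hypothetical stand-in that mimics the described behaviour (percent escapes become raw bytes, everything else is encoded with the given encoding), and the sample URLs and the list of encodings are made up for illustration. It assumes every encoding under test is ASCII-compatible.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_escapes_as_data(text: str, encoding: str) -> bytes:
    """Hypothetical model: percent escapes become raw bytes, and all
    other characters are encoded with `encoding`."""
    out = bytearray()
    i = 0
    while i < len(text):
        is_escape = (
            text[i] == "%"
            and i + 2 < len(text)
            and text[i + 1] in HEX_DIGITS
            and text[i + 2] in HEX_DIGITS
        )
        if is_escape:
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out += text[i].encode(encoding)
            i += 1
    return bytes(out)


# A serialized data URL is pure ASCII: non-ASCII code points are already
# percent-encoded, so the chosen encoding has nothing left to affect.
serialized = "data:text/plain;charset=iso-8859-1,caf%C3%A9%20ol%E9"
ascii_compatible = ["utf-8", "iso-8859-1", "windows-1252", "koi8-r"]
serialized_results = {
    decode_escapes_as_data(serialized, enc) for enc in ascii_compatible
}

# An unserialized string with a raw non-ASCII code point is where the
# encoding argument would actually matter.
raw = "data:text/plain;charset=iso-8859-1,caf\u00e9"
raw_results = {
    decode_escapes_as_data(raw, enc) for enc in ["utf-8", "iso-8859-1"]
}

print(len(serialized_results), len(raw_results))
```

In this model, the serialized input yields a single byte string no matter which encoding is passed, while the raw input yields different bytes per encoding. That difference is only reachable if the caller passes something other than a serialized URL.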

Removing this argument would also make the `charset` field of `WebCore::DataURLDecoder::Result` unnecessary.

*. C0 controls are also percent-encoded when the URL is serialized, so ISO-2022-JP will behave the same as the rest of the encodings.

-- 
You are receiving this mail because:
You are the assignee for the bug.