[webkit-reviews] review granted: [Bug 195535] WebKit has too much of its own UTF-8 code and should rely more on ICU's UTF-8 support : [Attachment 368529] Patch

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Tue Apr 30 09:30:45 PDT 2019


Alexey Proskuryakov <ap at webkit.org> has granted Darin Adler <darin at apple.com>'s
request for review:
Bug 195535: WebKit has too much of its own UTF-8 code and should rely more on
ICU's UTF-8 support
https://bugs.webkit.org/show_bug.cgi?id=195535

Attachment 368529: Patch

https://bugs.webkit.org/attachment.cgi?id=368529&action=review




--- Comment #30 from Alexey Proskuryakov <ap at webkit.org> ---
Comment on attachment 368529
  --> https://bugs.webkit.org/attachment.cgi?id=368529
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=368529&action=review

Large patch, somewhat superficial review.

Overall comment: I don't think that hiding the types in "auto result =
convertUTF16ToUTF8" and "auto success = convertLatin1ToUTF8" is an improvement,
especially as the types are different, and the plan is to change them further.
This took me an outstanding amount of effort to follow, with all the subtle
differences with regards to where "target exhausted" is a failure, and where it
is a success.

> Source/WTF/wtf/text/WTFString.cpp:862
> +    if (!convertUTF8ToUTF16(stringCurrent, reinterpret_cast<const char
*>(stringStart + length), &bufferCurrent, bufferCurrent + buffer.size()))

Pre-existing issue, misplaced star.

> Source/WTF/wtf/unicode/UTF8Conversion.cpp:2
> - * Copyright (C) 2007, 2014 Apple Inc. All rights reserved.
> + * Copyright (C) 2007-2019 Apple Inc. All rights reserved.

Have we actually made non-trivial changes every year?

> Source/WebCore/platform/SharedBuffer.cpp:340
> -	       result = WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p
+ buffer.size());
> +	       if (!WTF::Unicode::convertLatin1ToUTF8(&d, d + length, &p, p +
buffer.size()))

Typically, we export symbols in WTF into the global namespace, and don't use
the WTF prefix at call sites. Is there a good reason for WTF::Unicode to be
special?

> Source/WebKit/Shared/API/APIString.h:26
> +#pragma once

Do we now fully rely on pragma once even in cross-platform API headers?

> LayoutTests/ChangeLog:14
> +	   * css3/escape-dom-api-expected.txt:
> +	   * fast/text/dangling-surrogates-expected.txt:
> +	   * js/dom/webidl-type-mapping-expected.txt:
> +	   * js/invalid-utf8-in-syntax-error-expected.txt:
> +	   Updated expected results to have the Unicode replacement character
in cases where the
> +	   text contains unpaired surrogates. The tests are still doing the
same operations, and
> +	   still getting the same results, but the text output no longer
includes illegal UTF-8.

It would be useful to mention whether the new behavior matches other browser
engines.

> LayoutTests/js/invalid-utf8-in-syntax-error.html:1
> +<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">

This should use HTML5 doctype.

How did you create the wrapper? I just verified that "make-new-script-test
js/invalid-utf8-in-syntax-error.html" creates a modern wrapper, also without
pre/post scripts.


More information about the webkit-reviews mailing list