[webkit-reviews] review granted: [Bug 195330] Improve the toNormalizationFormC function : [Attachment 363643] Patch

Tue Mar 5 11:02:24 PST 2019

Michael Catanzaro <mcatanzaro at igalia.com> has granted Darin Adler
<darin at apple.com>'s request for review:
Bug 195330: Improve the toNormalizationFormC function
https://bugs.webkit.org/show_bug.cgi?id=195330

Attachment 363643: Patch

https://bugs.webkit.org/attachment.cgi?id=363643&action=review

--- Comment #4 from Michael Catanzaro <mcatanzaro at igalia.com> ---
Comment on attachment 363643
  --> https://bugs.webkit.org/attachment.cgi?id=363643
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=363643&action=review

>> Source/WTF/wtf/URLHelpers.cpp:779
>> +	if (string.is8Bit() || string.isEmpty())
> 
> This isEmpty check isn’t needed.

Why not?

Anyway, returning early for 8-bit strings was a good call. Now, because you
know the WTFString is internally 16-bit, you're able to use the nice
characters16 function. Much better than what I came up with when I tried.

> Source/WTF/wtf/URLHelpers.cpp:786
> +    ASSERT(U_SUCCESS(error));
> +    if (U_FAILURE(error))
> +	   return string;

I'm surprised to see you handling the case where the assertion fails. Normally
I would complain that U_SUCCESS(error) is guaranteed to be true here, because
of the assert, but I suppose you prefer to handle it for release builds with
asserts disabled....

Anyway, I do like that you're returning the original string on error, instead
of the empty string like before. More robust.

> Source/WTF/wtf/URLHelpers.cpp:795
> +    auto normalizedLength = unorm2_normalize(normalizer,
string.characters16(), string.length(), nullptr, 0, &error);
> +    ASSERT(error == U_BUFFER_OVERFLOW_ERROR);

Er, what? You assume the normalized string will *always* be larger than the
original string? I see it passes the tests, which surprises me, so that's good,
and I'm no expert on Unicode normalization... but this doesn't seem like a good
assumption at all. Shouldn't the code be prepared to handle the normalized
result being smaller than, or equal to, the original? Please reconsider this
before landing.

As we discussed previously, there are many places in WebKit outside
URLHelpers.cpp currently using unorm_normalize() or unorm2_normalize()
manually, which would benefit from use of this function. So even if for some
strange reason normalized URLs are guaranteed to be larger, I would still make
this more general. The original code looks safer.