[webkit-reviews] review granted: [Bug 45594] Add AtomicString::fromUTF8 : [Attachment 73819] Patch
bugzilla-daemon at webkit.org
bugzilla-daemon at webkit.org
Mon Nov 15 12:11:32 PST 2010
Darin Adler <darin at apple.com> has granted Patrick R. Gansterer
<paroga at paroga.com>'s request for review:
Bug 45594: Add AtomicString::fromUTF8
https://bugs.webkit.org/show_bug.cgi?id=45594
Attachment 73819: Patch
https://bugs.webkit.org/attachment.cgi?id=73819&action=review
------- Additional Comments from Darin Adler <darin at apple.com>
View in context: https://bugs.webkit.org/attachment.cgi?id=73819&action=review
Looks good. Please consider some of my suggestions for improvement. For now, I
set commit-queue- so you could think over if you want to make these changes
before you land.
> JavaScriptCore/wtf/text/AtomicString.cpp:233
> + UChar *target;
The * is positioned incorrectly here. It should be next to the UChar.
> JavaScriptCore/wtf/text/AtomicString.cpp:234
> + location = StringImpl::createUninitialized(buffer.utf16Length,
target).releaseRef();
Please use the new name, leakRef, not the old name, releaseRef, in new code.
> JavaScriptCore/wtf/text/AtomicString.h:112
> + // the input data contains invalid UTF-8 characters.
Comment should say “invalid UTF-8 sequences” not “invalid UTF-8 characters”.
> JavaScriptCore/wtf/text/WTFString.h:313
> + // the input data contains invalid UTF-8 characters.
Same comment.
> JavaScriptCore/wtf/unicode/UTF8.cpp:238
> + UChar32 ch = 0;
I’d much prefer “character” for this local.
> JavaScriptCore/wtf/unicode/UTF8.cpp:243
> + case 6: ch += static_cast<unsigned char>(*sequence++); ch <<= 6; //
remember, illegal UTF-8
> + case 5: ch += static_cast<unsigned char>(*sequence++); ch <<= 6; //
remember, illegal UTF-8
Who are we saying “remember” to in these comments? I know you just moved them,
but I think these comments are unhelpful and unclear. We should remove the
comments.
> JavaScriptCore/wtf/unicode/UTF8.cpp:262
> + if (source + utf8SequenceLength > sourceEnd) {
This should be written like this:
if (sourceEnd - source < utf8SequenceLength)
It’s not great to compute an address past the end of the buffer, and doing it
this way guarantees we don’t try to do that.
> JavaScriptCore/wtf/unicode/UTF8.cpp:313
> +unsigned calculateHashFromUTF8(const char* data, const char* dataEnd,
unsigned& utf16Length)
This function needs a policy for handling lengths that do not fit in unsigned.
Even if that policy is simply to call CRASH when encountering a giant length.
Not doing so can lead to security vulnerabilities. Alternatively, the type of
the out argument could be changed to a size_t, but then the callers would need
to check if they narrowed it to an unsigned.
> JavaScriptCore/wtf/unicode/UTF8.cpp:322
> + if (!(*data & 0x80)) {
It would be better to use isASCII from ASCIICType.h for this.
> JavaScriptCore/wtf/unicode/UTF8.cpp:337
> + ASSERT(ch >= 0x80);
It would be better to use isASCII from ASCIICType.h for this.
> JavaScriptCore/wtf/unicode/UTF8.cpp:339
> + if (ch <= 0xFFFF) {
It would be better to use U_IS_BMP for this. This is a macro from ICU that we
may already implement for non-ICU platforms, but if not we can do so.
> JavaScriptCore/wtf/unicode/UTF8.cpp:341
> + if (ch >= 0xD800 && ch <= 0xDFFF)
It would be better to use U_IS_SURROGATE for this. Another ICU macro.
> JavaScriptCore/wtf/unicode/UTF8.cpp:348
> + } else if (ch <= 0x10FFFF) {
> + ch -= 0x0010000UL;
> + stringHasher.addCharacters(static_cast<UChar>((ch >> 10) +
0xD800),
> + static_cast<UChar>((ch & 0x03FF) +
0xDC00));
It would be better to use U_IS_SUPPLEMENTARY, U16_LEAD, and U16_TRAIL for this.
More ICU macros.
> JavaScriptCore/wtf/unicode/UTF8.cpp:360
> + if (!(*b & 0x80)) {
It would be better to use isASCII from ASCIICType.h for this.
> JavaScriptCore/wtf/unicode/UTF8.cpp:361
> + if (*a++ != static_cast<UChar>(*b++))
The static_cast is unfortunate. Can we use a local variable to avoid the
typecast?
> JavaScriptCore/wtf/unicode/UTF8.cpp:375
> + ASSERT(ch >= 0x80);
It would be better to use isASCII from ASCIICType.h for this.
> JavaScriptCore/wtf/unicode/UTF8.cpp:377
> + if (ch <= 0xFFFF) {
It would be better to use U_IS_BMP for this.
> JavaScriptCore/wtf/unicode/UTF8.cpp:379
> + if (ch >= 0xD800 && ch <= 0xDFFF)
It would be better to use U_IS_SURROGATE for this.
> JavaScriptCore/wtf/unicode/UTF8.cpp:381
> + if (*a++ != static_cast<UChar>(ch))
I don’t think we need the static_cast here. Please remove it.
> JavaScriptCore/wtf/unicode/UTF8.cpp:389
> + } else if (ch <= 0x10FFFF) {
> + ch -= 0x0010000UL;
> +
> + if (*a++ != static_cast<UChar>((ch >> 10) + 0xD800))
> + return false;
> + if (*a++ != static_cast<UChar>((ch & 0x03FF) + 0xDC00))
> + return false;
It would be better to use U_IS_SUPPLEMENTARY, U16_LEAD, and U16_TRAIL for this.
> JavaScriptCore/wtf/unicode/UTF8.h:73
> + unsigned calculateHashFromUTF8(const char* data, const char* dataEnd,
unsigned& utf16Length);
Still unfortunate to have this here in this file. At the very least this needs
a comment to explain what kind of hash we are talking about.
> JavaScriptCore/wtf/unicode/UTF8.h:75
> + bool equalUTF8(const UChar* a, const UChar* aEnd, const char* b, const
char* bEnd);
This name should mention UTF-16, not just UTF-8, since the function compares
UTF-16 to UTF-8.
More information about the webkit-reviews
mailing list