[webkit-reviews] review granted: [Bug 8769] TextEncoding::fromUnicode() - support non-BMP characters and convert to NFC : [Attachment 8140] proposed patch

Sun May 7 10:39:47 PDT 2006

Darin Adler <darin at apple.com> has granted Darin Adler <darin at apple.com>'s
request for review:
Bug 8769: TextEncoding::fromUnicode() - support non-BMP characters and convert
to NFC
http://bugzilla.opendarwin.org/show_bug.cgi?id=8769

Attachment 8140: proposed patch
http://bugzilla.opendarwin.org/attachment.cgi?id=8140&action=edit

------- Additional Comments from Darin Adler <darin at apple.com>
Looks fine.

Using more ICU in cases like this leads to code that I think is more readable.
For example:

    - Instead of unsigned int, ICU has the UChar32 type.
    - Instead of (badChar & 0xfc00) == 0xd800, ICU has the U16_IS_LEAD macro.
    - Instead of ((low & 0xfc00) == 0xdc00), ICU has the U16_IS_TRAIL macro.
    - instead of:

+		     badChar <<= 10;
+		     badChar += low;
+		     badChar += 0x10000 - (0xd800 << 10) - 0xdc00;

    ICU has the U16_GET_SUPPLEMENTARY macro.

There are other advantages to ICU too. One of the worst things about this
function is the number of times it copies the string. It's particularly
unfortunate that we create a brand new CGString by calling
DeprecatedString::getCFString and then immdiately turn around and call
CFStringCreateMutableCopy.

If this function was hot at all, we could take advantage of the
unorm_quickCheck call in ICU which will quickly determine if there's a
possibility normalization is needed; in the common case we won't have to make a
copy.

But I suppose we can do some of those sorts of things when we make a
non-Mac-specific version of the function.

I suppose that ultimately that's what I would have liked to see. A
non-Macintosh-specific ICU-based version.

r=me on this anyway