[webkit-reviews] review granted: [Bug 115953] TextCodecICU complains about ambiguous codec names with current ICU release : [Attachment 201450] proposed fix

Sat May 11 16:31:29 PDT 2013

Darin Adler <darin at apple.com> has granted Alexey Proskuryakov <ap at webkit.org>'s
request for review:
Bug 115953: TextCodecICU complains about ambiguous codec names with current ICU
release
https://bugs.webkit.org/show_bug.cgi?id=115953

Attachment 201450: proposed fix
https://bugs.webkit.org/attachment.cgi?id=201450&action=review

------- Additional Comments from Darin Adler <darin at apple.com>
View in context: https://bugs.webkit.org/attachment.cgi?id=201450&action=review

It would have been nice to do all the renaming in a separate patch first. The
name changes combined with other style changes and behavior changes makes a
much harder to review patch.

I’m OK with this, but I also think we could make some significant improvements
to make the code more readable.

>> Source/WebCore/platform/text/TextCodecICU.cpp:190
>> +	const char* canonicalConverterNameForISO_8859_8_I =
ucnv_getCanonicalName("ISO-8859-8-I", "IANA", &error);
> 
> canonicalConverterNameForISO_8859_8_I is incorrectly named. Don't use
underscores in your identifier names.  [readability/naming/underscores] [4]

I think you could leave out the underlines in the name of this local variable.
Note that it has a small scope so it doesn’t have to have a long name.

Separately, it’s not great that the string "ISO-8859-8-I" is repeated twice in
these three lines of code. Would be nice to avoid that.

And there is no “why” comment explaining why this has to be handled
differently. Presumably the "MIME" standard name for this is bad and needs to
be ignored?

> Source/WebCore/platform/text/TextCodecICU.cpp:198
> +	   const char* webStandardName =
ucnv_getStandardName(canonicalConverterName, "MIME", &error);

Not good that this ucnv_getStandardName, first MIME, then IANA, technique is
repeated twice in these two functions. Can’t we share the code?

> Source/WebCore/platform/text/TextCodecICU.cpp:206
> +	   // Don't register codecs for overridden encodings.

It worries me that this repeats the list of encodings that are also listed
above. Is there a way to guarantee they stay in sync? Maybe even share a table?

> Source/WebCore/platform/text/TextCodecICU.cpp:210
> +	       || strcasecmp(webStandardName, "iso-8859-9") == 0

Why is this one a strcasecmp, but all the others strcmp? Also, historically we
mostly steered clear of strcasecmp since it uses a locale-specific concept of
case, for the same reason we use toASCIILower instead of tolower.

> Source/WebCore/platform/text/TextCodecICU.cpp:269
> -    m_converterICU = ucnv_open(m_encodingName, &err);
> -#if !LOG_DISABLED
> -    if (err == U_AMBIGUOUS_ALIAS_WARNING)
> -	   LOG_ERROR("ICU ambiguous alias warning for encoding: %s",
m_encodingName);
> -#endif
> +    m_converterICU = ucnv_open(m_canonicalConverterName, &err);
> +    ASSERT(U_SUCCESS(err));

If we do get an error, now we won’t see the error in the log, just that some
error occurred, but not which one. Not sure this is an improvement.

> Source/WebCore/platform/text/TextCodecICU.h:46
> +	   static PassOwnPtr<TextCodec> create(const TextEncoding&, const void*
canonicalConverterName);

Why const void* for the canonical converter name instead of const char*?