[webkit-reviews] review denied: [Bug 107459] WebKit should support decoding multi-byte entities in XML content : [Attachment 183800] Patch

Mon Jan 21 17:07:58 PST 2013

Darin Adler <darin at apple.com> has denied Martin Robinson
<mrobinson at webkit.org>'s request for review:
Bug 107459: WebKit should support decoding multi-byte entities in XML content
https://bugs.webkit.org/show_bug.cgi?id=107459

Attachment 183800: Patch
https://bugs.webkit.org/attachment.cgi?id=183800&action=review

------- Additional Comments from Darin Adler <darin at apple.com>
View in context: https://bugs.webkit.org/attachment.cgi?id=183800&action=review

Code changes looks like it will function OK, but it’s not using String and
CString the way they should be. It’s worth it to spend a little more time on
this to get rid of all the needless memory allocation.

> Source/WebCore/html/parser/HTMLEntityParser.cpp:153
> +static void appendUChar32ToString(UChar32 value, String& string)
> +{
> +    if (U_IS_BMP(value)) {
> +	   UChar character = static_cast<UChar>(value);
> +	   ASSERT(character == value);
> +	   string.append(character);
> +	   return;
> +    }
> +    string.append(U16_LEAD(value));
> +    string.append(U16_TRAIL(value));
> +}

This has terrible performance! We’ll allocate a new StringImpl for each call to
String::append. We should not have code like this anywhere in WebKit. When we
need to build up a string we need to use StringBuilder or a Vector. If the
string is always short and fixed size then we can use a local UChar array and
construct a string from it.

> Source/WebCore/html/parser/HTMLEntityParser.h:37
> +String decodeNamedEntity(const char*);

Returning a String from this function is not a good idea for performance; we
don’t want to do memory allocation here. Since this string can only be a few
characters long, then I suggest using a fixed size buffer of some sort instead.

> Source/WebCore/xml/parser/XMLDocumentParserLibxml2.cpp:1184
> -    UChar c = decodeNamedEntity(reinterpret_cast<const char*>(name));
> -    if (!c)
> +    String decodedEntity = decodeNamedEntity(reinterpret_cast<const
char*>(name));
> +    if (decodedEntity.isNull())
>	   return 0;
>  
> -    CString value = String(&c, 1).utf8();
> -    ASSERT(value.length() < 5);
> +    CString value = decodedEntity.utf8();
> +    ASSERT(value.length() < 9);

For both the old and new code, doing this memory allocation is bad for
performance. We can convert from UTF-16 to UTF-8 without creating String and
CString objects, which result in memory allocation. We should use some short
fixed size arrays for this work.