[webkit-reviews] review requested: [Bug 15914] [GTK] Implement Unicode functionality using GLib : [Attachment 26793] 1/4 - Moving WTF Unicode backend to GLib (v2)

Fri Jan 16 07:46:07 PST 2009

Dominik Röttsches <dominik.roettsches at access-company.com> has asked  for
review:
Bug 15914: [GTK] Implement Unicode functionality using GLib
https://bugs.webkit.org/show_bug.cgi?id=15914

Attachment 26793: 1/4 - Moving WTF Unicode backend to GLib (v2)
https://bugs.webkit.org/attachment.cgi?id=26793&action=review

------- Additional Comments from Dominik Röttsches
<dominik.roettsches at access-company.com>
(In reply to comment #62)

Darin, thanks for taking the time to review the patch. Here's another iteration
that tries to address your comments. 

> (From update of attachment 26711 [review])
> > Index: JavaScriptCore/wtf/unicode/glib/CasefoldTableFromGLib.h
> > ===================================================================
> > [...]
> 
> Is this file really needed?

I got rid of it at the price of a single-character ucs4->utf8->casefold->ucs4
conversion. See below regarding the conversion costs.

> > +} casefold_table[] = {
> This is not the normal naming for identifiers in WebKit. This would typically

> be inside the WTF::Unicode namespace and be named something like
caseFoldTable
> or CaseFoldTable.

gone.

> > +#include "CasefoldTableFromGLib.h"
> Do we really need to compile in a copy of the table into each source file? 

gone as well.

> > +	     inline UChar32 foldCase(UChar32 ch)
> 
> These functions are pretty long. I don't think it makes sense to inline them.

I disabled inlining for the longer ones and moved them to a new file
UnicodeGLib.cpp

> > +	     inline int umemcasecmp(const UChar* a, const UChar* b, int len)

> Converting to UTF-8 just to do the case folding is going to be very
> inefficient. This is going to result in a quite-slow GTK port if it's used
> anywhere.
> 
> In general we had to work hard to eliminate the conversion from UTF-16 and
> UTF-8 and back that used to be present in WebKit and JavaScriptCore in its
core
> algorithms, and move that conversion to the external API. I think you need to

> do some performance tests if you're going to introduce memory allocation and
> UTF-8 conversion into these algorithms, unless it's OK to have the
performance
> of the GTK port suffer.

Looking at the public GLib Unicode API
(http://library.gnome.org/devel/glib/stable/glib-Unicode-Manipulation.html) it
seems difficult to call the more interesting unicode functionality with utf-16
strings directly - casefolding, collation, normalization all require utf-8.
Please see below for my my view regarding these costs.

> > +#ifndef UnicodeGLibTypes_h
> > +#define UnicodeGLibTypes_h
> > +
> > +typedef uint16_t UChar;
> > +typedef int32_t UChar32;
> > +
> > +#endif
> 
> Seems a little excessive to have a separate header file for these, but it
might
> be OK.

Pulled that into UnicodeGLib.h

Also added a FIXME in umemcasecmp that discusses the discrepancy to the icu
implementation.

> I'm going to say review- because of some of the issues with the case folding
> table and the performance of the UTF-8 conversion.

While I am aware it's probably not a very appropriate benchmark, for a first
and very rough idea, I compared the ICU build against the GLib build running
sunspider. You can see a perfomance regression of 2-3% in most of the string
tests. I will attach these results.

My understanding is that this patch helps to reduce ROM footprint when
deploying on an embedded platform by eliminating the ICU dependency, saving 
approximately 10MB (as Alp reports). Currently, the glib backend is optional,
icu would remain default. So in my opinion, the benefit of this patch is that
it gives a choice to integrators between sacrificing a little performance in
favor of the binary reduction or package ICU onto their targets. Future changes
in GLib could lead to removing the utf-16 to utf-8 conversions eventually.