[Webkit-unassigned] [Bug 15914] [GTK] Implement Unicode functionality using GLib

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Jan 16 07:46:08 PST 2009


https://bugs.webkit.org/show_bug.cgi?id=15914


dominik.roettsches at access-company.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26711|0                           |1
        is obsolete|                            |
  Attachment #26793|                            |review?
               Flag|                            |




------- Comment #63 from dominik.roettsches at access-company.com  2009-01-16 07:46 PDT -------
Created an attachment (id=26793)
 --> (https://bugs.webkit.org/attachment.cgi?id=26793&action=view)
1/4 - Moving WTF Unicode backend to GLib (v2)

(In reply to comment #62)

Darin, thanks for taking the time to review the patch. Here's another iteration
that tries to address your comments. 

> (From update of attachment 26711 [review])
> > Index: JavaScriptCore/wtf/unicode/glib/CasefoldTableFromGLib.h
> > ===================================================================
> > [...]
> 
> Is this file really needed?

I got rid of it. The price is a ucs4->utf8->casefold->ucs4 conversion for each
single character. See below regarding the conversion costs.
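
To make the per-character round trip concrete, here is a minimal sketch of the
two conversion steps around the case fold, written against the C++ standard
library only. The helper names encodeUtf8 and decodeFirstUtf8 are hypothetical
and are not taken from the patch; in a GLib build the fold itself would
presumably go through g_utf8_casefold() between the two steps, and that call
is deliberately left out here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the patch): encode one UCS-4 code point as UTF-8.
// Assumes a valid code point (<= 0x10FFFF); surrogates are not rejected here.
std::string encodeUtf8(uint32_t cp)
{
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper (not from the patch): decode the first code point of a
// well-formed UTF-8 string. Malformed input is not handled in this sketch.
uint32_t decodeFirstUtf8(const std::string& s)
{
    assert(!s.empty());
    const unsigned char lead = static_cast<unsigned char>(s[0]);
    if (lead < 0x80)
        return lead;

    std::size_t extra;
    uint32_t cp;
    if ((lead & 0xE0) == 0xC0) {
        extra = 1;
        cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
        extra = 2;
        cp = lead & 0x0F;
    } else {
        extra = 3;
        cp = lead & 0x07;
    }
    assert(s.size() > extra);
    for (std::size_t i = 1; i <= extra; ++i)
        cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
    return cp;
}
```

The point of the sketch is only that each folded character pays for one encode
and one decode on top of the fold itself.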

> > +} casefold_table[] = {
> This is not the normal naming for identifiers in WebKit. This would typically
> be inside the WTF::Unicode namespace and be named something like caseFoldTable
> or CaseFoldTable.

Gone.

> > +#include "CasefoldTableFromGLib.h"
> Do we really need to compile in a copy of the table into each source file? 

Gone as well.

> > +        inline UChar32 foldCase(UChar32 ch)
> 
> These functions are pretty long. I don't think it makes sense to inline them.

I disabled inlining for the longer ones and moved them to a new file,
UnicodeGLib.cpp.

> > +        inline int umemcasecmp(const UChar* a, const UChar* b, int len)

> Converting to UTF-8 just to do the case folding is going to be very
> inefficient. This is going to result in a quite-slow GTK port if it's used
> anywhere.
> 
> In general we had to work hard to eliminate the conversion from UTF-16 and
> UTF-8 and back that used to be present in WebKit and JavaScriptCore in its core
> algorithms, and move that conversion to the external API. I think you need to
> do some performance tests if you're going to introduce memory allocation and
> UTF-8 conversion into these algorithms, unless it's OK to have the performance
> of the GTK port suffer.

Looking at the public GLib Unicode API
(http://library.gnome.org/devel/glib/stable/glib-Unicode-Manipulation.html), it
seems difficult to call the more interesting Unicode functionality with UTF-16
strings directly: case folding, collation and normalization all require UTF-8.
Please see below for my view regarding these costs.
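
As a rough illustration of the memory side of that cost, the following sketch
computes how many bytes a temporary UTF-8 copy of a UTF-16 buffer would need.
It uses only the C++ standard library, and the function name
utf8LengthOfUtf16 is hypothetical, not part of the patch or of GLib. The
sketch assumes that an unpaired surrogate would be replaced by U+FFFD, which
also takes three bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the patch): number of UTF-8 bytes needed to
// hold the UTF-16 buffer s[0..len). Unpaired surrogates are counted as three
// bytes, on the assumption that they would be replaced by U+FFFD.
std::size_t utf8LengthOfUtf16(const uint16_t* s, std::size_t len)
{
    assert(s || !len);
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < len; ++i) {
        const uint16_t u = s[i];
        if (u < 0x80) {
            bytes += 1; // ASCII
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < len
                   && s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            bytes += 4; // valid surrogate pair: one code point above U+FFFF
            ++i;        // skip the low surrogate
        } else {
            bytes += 3; // rest of the BMP, or an unpaired surrogate
        }
    }
    return bytes;
}
```

For mostly-ASCII input the temporary buffer is about one byte per UTF-16 code
unit, but it still has to be allocated and filled for every call that needs
UTF-8.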

> > +#ifndef UnicodeGLibTypes_h
> > +#define UnicodeGLibTypes_h
> > +
> > +typedef uint16_t UChar;
> > +typedef int32_t UChar32;
> > +
> > +#endif
> 
> Seems a little excessive to have a separate header file for these, but it might
> be OK.

Pulled that into UnicodeGLib.h.

I also added a FIXME in umemcasecmp that discusses the discrepancy with the ICU
implementation.

> I'm going to say review- because of some of the issues with the case folding
> table and the performance of the UTF-8 conversion.

I am aware that it's probably not a very appropriate benchmark, but for a first,
very rough idea I compared the ICU build against the GLib build running
SunSpider. The GLib build shows a performance regression of 2-3% in most of the
string tests. I will attach these results.

My understanding is that this patch helps to reduce the ROM footprint when
deploying on an embedded platform by eliminating the ICU dependency, saving
approximately 10MB (as Alp reports). Currently, the GLib backend is optional and
ICU would remain the default. So in my opinion, the benefit of this patch is that
it gives integrators a choice between sacrificing a little performance in
favor of the smaller binary and packaging ICU onto their targets. Future changes
in GLib could eventually make the UTF-16 to UTF-8 conversions unnecessary.

