[Webkit-unassigned] [Bug 15914] [GTK] Implement Unicode functionality using GLib

Thu Jan 15 10:26:01 PST 2009

https://bugs.webkit.org/show_bug.cgi?id=15914

darin at apple.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #26711|review?                     |review-
               Flag|                            |

------- Comment #62 from darin at apple.com  2009-01-15 10:26 PDT -------
(From update of attachment 26711)
> Index: JavaScriptCore/wtf/unicode/glib/CasefoldTableFromGLib.h
> ===================================================================
> --- JavaScriptCore/wtf/unicode/glib/CasefoldTableFromGLib.h	(revision 0)
> +++ JavaScriptCore/wtf/unicode/glib/CasefoldTableFromGLib.h	(revision 0)
> @@ -0,0 +1,200 @@
> +/*
> + *  Copyright (C) 2006 George Staikos <staikos at kde.org>
> + *  Copyright (C) 2006 Alexey Proskuryakov <ap at nypop.com>
> + *  Copyright (C) 2007 Apple Computer, Inc. All rights reserved.
> + *  Copyright (C) 2008 Jürg Billeter <j at bitron.ch>
> + *  Copyright (C) 2008 Dominik Röttsches <dominik.roettsches at access-company.com>

Is this file really needed?

I don't think the copyrights are right on this file.

> +} casefold_table[] = {

This is not the normal naming for identifiers in WebKit. This would typically
be inside the WTF::Unicode namespace and be named something like caseFoldTable
or CaseFoldTable.

> +#include "CasefoldTableFromGLib.h"

Do we really need to compile in a copy of the table into each source file? That
seems like a very bad idea. Since the case folding table is declared static, it
will have internal linkage so each source file that uses this header will get a
separate copy of the table!

> +        inline UChar32 foldCase(UChar32 ch)

These functions are pretty long. I don't think it makes sense to inline them.

> +        inline int umemcasecmp(const UChar* a, const UChar* b, int len)
> +        {
> +            GOwnPtr<char> utf8a;
> +            GOwnPtr<char> utf8b;
> +
> +            utf8a.set(g_utf16_to_utf8(a, len, 0, 0, 0));
> +            utf8b.set(g_utf16_to_utf8(b, len, 0, 0, 0));
> +
> +            GOwnPtr<char> foldedA;
> +            GOwnPtr<char> foldedB;
> +
> +            foldedA.set(g_utf8_casefold(utf8a.get(), -1));
> +            foldedB.set(g_utf8_casefold(utf8b.get(), -1));
> +
> +            int result = g_utf8_collate(foldedA.get(), foldedB.get());
> +
> +            return result;
> +        }

Converting to UTF-8 just to do the case folding is going to be very
inefficient. This is going to result in a quite-slow GTK port if it's used
anywhere.

In general we had to work hard to eliminate the conversion from UTF-16 and
UTF-8 and back that used to be present in WebKit and JavaScriptCore in its core
algorithms, and move that conversion to the external API. I think you need to
do some performance tests if you're going to introduce memory allocation and
UTF-8 conversion into these algorithms, unless it's OK to have the performance
of the GTK port suffer.

> +#ifndef UnicodeGLibTypes_h
> +#define UnicodeGLibTypes_h
> +
> +typedef uint16_t UChar;
> +typedef int32_t UChar32;
> +
> +#endif

Seems a little excessive to have a separate header file for these, but it might
be OK.

I'm going to say review- because of some of the issues with the case folding
table and the performance of the UTF-8 conversion.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.