[webkit-dev] compact ICU unicode

Thu Jun 13 18:31:34 PDT 2013

On Sat, Jun 8, 2013 at 3:15 AM, Salisbury, Mark <mark.salisbury at hp.com> wrote:

> Hello,
>
> What would people think about including specific ICU data tables in WTF in
> order to provide a lightweight (but functional) unicode implementation?
>

FWIW, I'd suggest you port ICU to your platform or, if it is too large,
port only the portion of it that WK uses and then use that portion.
However, I think the ICU library, or even a subset of it, should NOT be
added to WTF.

>
> On embedded systems the size of ICU is prohibitive.  Determining the right
> way to package it to make it small enough isn't simple either.
>
> A patch that attempted to add ICU data tables directly to WTF was reviewed
> once, and two concerns were raised:
> 1) Checking in generated files (https://bugs.webkit.org/show_bug.cgi?id=27305#c8)
> 2) Whether the ICU license is compatible with WebCore (https://bugs.webkit.org/show_bug.cgi?id=27305#c9)
>
> I believe the patch could be done differently so as not to check in
> generated files.  Regarding the second concern, ICU has a very permissive
> license (http://www.icu-project.org/repos/icu/icu/trunk/license.html).
> There are three requirements, basically that the copyright and permission
> notice must appear with copies of the software.  I believe that is already
> a requirement for distributions of WebKit that use ICU.  Except for WCHAR
> Unicode, I believe all WebKit builds now use ICU Unicode.
>
> This Unicode path could replace WCHAR_UNICODE or be introduced as a third
> option; call it what you like: BASIC_ICU_UNICODE, ICU_LITE_UNICODE,
> COMPACT_ICU_UNICODE, etc.  I think it might be valuable for other ports
> that are size-conscious; the up-and-coming NIX port comes to mind.
>
> Thanks,
> Mark
>
> Background:
> After rebasing my WinCE port of WebKit, I ran into an ASSERT in
> WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp,
> acquireLineBreakIterator().  I thought I'd be able to fix this easily,
> since I had already modified how LineBreakIterator works (on my own branch)
> to take prior context into account and find line breaks in a stream of
> non-ASCII characters.
>
> However, the WCHAR Unicode implementation is very bare-bones and does not
> even support returning the Unicode character category (
> http://trac.webkit.org/browser/trunk/Source/WTF/wtf/unicode/wchar/UnicodeWchar.cpp#L35).
> WCHAR Unicode was originally called WinCE Unicode; it was later renamed
> because it had nothing to do with WinCE.
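
For illustration only: embedded tables of the kind proposed here could answer the character-category query with a two-stage lookup. This is a hypothetical sketch, not ICU's data format (ICU's own property data uses a more elaborate trie, UTrie2) and not a WTF API; the `Category` values, the `Range` input type and the `CompactCategoryTable` class are invented for this example and cover only the BMP. Stage 1 maps each 128-code-point block to a block stored in stage 2, and identical blocks are stored only once, which is where the space saving comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical, illustrative subset of Unicode general categories.
// These values are NOT ICU's UCharCategory numbering.
enum class Category : std::uint8_t {
    Unassigned,
    UppercaseLetter,
    LowercaseLetter,
    DecimalNumber,
    SpaceSeparator,
    OtherPunctuation,
};

// Hypothetical input format: an inclusive code point range and its category.
struct Range {
    char32_t first;
    char32_t last;
    Category category;
};

// Two-stage table: stage 1 holds one 16-bit block index per 128 code points,
// stage 2 holds the deduplicated 128-entry blocks.
class CompactCategoryTable {
public:
    static constexpr unsigned kShift = 7;               // 128 code points per block
    static constexpr char32_t kBlockSize = 1u << kShift;
    static constexpr char32_t kLimit = 0x10000;         // BMP only in this sketch

    explicit CompactCategoryTable(const std::vector<Range>& ranges)
    {
        // Expand the ranges into a flat table (done once, at build time).
        std::vector<Category> flat(kLimit, Category::Unassigned);
        for (const Range& r : ranges) {
            for (char32_t c = r.first; c <= r.last && c < kLimit; ++c)
                flat[c] = r.category;
        }

        // Store each distinct block once; repeated blocks reuse its index.
        std::map<std::vector<Category>, std::uint16_t> seen;
        for (char32_t start = 0; start < kLimit; start += kBlockSize) {
            auto begin = flat.begin() + static_cast<std::ptrdiff_t>(start);
            std::vector<Category> block(begin, begin + static_cast<std::ptrdiff_t>(kBlockSize));
            auto found = seen.find(block);
            if (found != seen.end()) {
                m_index.push_back(found->second);
                continue;
            }
            auto index = static_cast<std::uint16_t>(m_blocks.size() / kBlockSize);
            m_blocks.insert(m_blocks.end(), block.begin(), block.end());
            seen.emplace(std::move(block), index);
            m_index.push_back(index);
        }
        assert(m_index.size() == kLimit / kBlockSize);
    }

    Category category(char32_t c) const
    {
        if (c >= kLimit)
            return Category::Unassigned; // supplementary planes not covered here
        std::size_t block = m_index[c >> kShift];
        return m_blocks[(block << kShift) | (c & (kBlockSize - 1))];
    }

    std::size_t storedBlocks() const { return m_blocks.size() / kBlockSize; }

    std::size_t byteSize() const
    {
        return m_index.size() * sizeof(std::uint16_t) + m_blocks.size() * sizeof(Category);
    }

private:
    std::vector<std::uint16_t> m_index; // stage 1
    std::vector<Category> m_blocks;     // stage 2
};
```

With only a few ASCII ranges populated, every block from U+0080 upward maps to one shared all-unassigned block, so stage 2 holds 256 entries and stage 1 holds 512 16-bit indices (1,280 bytes in total), versus 65,536 entries for a flat BMP table.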
>
> WinCE Unicode was originally added in
> https://bugs.webkit.org/show_bug.cgi?id=27305.  It was introduced to save
> space (filesystem and RAM).  ICU, if not packaged very carefully
> (http://userguide.icu-project.org/packaging), is actually larger than
> WebKit itself.  On embedded systems, this is a big deal.  The original plan
> in the bug above was to include specific ICU data tables in WebKit.
>
> I've been compiling WTF with Unicode tables embedded for some time now.  I
> don't believe I've seen many layout test regressions due to using a
> simplified ICU implementation.
>