[webkit-dev] compact ICU unicode

Fri Jun 14 13:26:12 PDT 2013

Thanks, Glenn, for the feedback.  I rather like how Torch Mobile slimmed it down back in '09 for the WinCE port, but if the primary concern is architectural purity and long-term maintainability, that argument makes sense, and I can see why we don't want to muddy WTF with it.
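
For anyone who hasn't looked at the embedded-table idea, here is a rough sketch of what it amounts to.  This is not the Torch Mobile code and not ICU's API; the names (Category, CategoryRange, kRanges, lookupCategory) are made up for illustration, and the table holds only a handful of hand-picked ranges.  A real table would presumably be generated from the Unicode data files.  The idea is just a sorted range table compiled into the binary, searched with a binary search to return a character's general category.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical subset of the Unicode general categories (illustration only).
enum class Category : uint8_t {
    Unknown,
    UppercaseLetter,
    LowercaseLetter,
    DecimalNumber,
    SpaceSeparator
};

// One entry covers an inclusive range of code points with the same category.
struct CategoryRange {
    uint32_t first;
    uint32_t last;
    Category category;
};

// Tiny hand-written table, sorted by code point, with non-overlapping ranges.
// A real table would be far larger and generated by a script (assumption).
static const CategoryRange kRanges[] = {
    { 0x0020, 0x0020, Category::SpaceSeparator },  // SPACE
    { 0x0030, 0x0039, Category::DecimalNumber },   // 0..9
    { 0x0041, 0x005A, Category::UppercaseLetter }, // A..Z
    { 0x0061, 0x007A, Category::LowercaseLetter }, // a..z
    { 0x0391, 0x03A1, Category::UppercaseLetter }, // Greek capital Alpha..Rho
    { 0x03B1, 0x03C1, Category::LowercaseLetter }, // Greek small alpha..rho
};

// Binary search for the first range that ends at or after codePoint,
// then check that the range actually contains it.
Category lookupCategory(uint32_t codePoint)
{
    const CategoryRange* end = std::end(kRanges);
    const CategoryRange* it = std::lower_bound(
        std::begin(kRanges), end, codePoint,
        [](const CategoryRange& range, uint32_t cp) { return range.last < cp; });
    if (it != end && it->first <= codePoint)
        return it->category;
    return Category::Unknown; // Not covered by this toy table.
}
```

With this table, lookupCategory(0x0041) gives Category::UppercaseLetter, and a code point that falls between ranges, such as 0x0021, gives Category::Unknown.  The point is only that each range costs a few bytes, so the size is driven by how many properties and ranges a port actually needs.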

At the same time, ICU IS huge and contains way more functionality than WebKit uses.  Perhaps it can be slimmed down sufficiently by following the standard procedures (http://userguide.icu-project.org/packaging) and still work.  If anyone has had success doing this for WebKit and wouldn't mind sharing what they did to trim it down, I'd appreciate hearing about it.  The way it's built for Windows (and shipped with WebKitLibraries.zip), it is rather large: larger than WebKit.dll.

Thomas is right about WebKit growing substantially, but I notice it has recently shrunk by a few MB.  (I presume this is due to pulling out huge features like memory inspector / shadow DOM that are now unmaintained...)

Mark

-----Original Message-----
From: Thomas Fletcher [mailto:thomas at cranksoftware.com] 
Sent: Thursday, June 13, 2013 7:38 PM
To: Glenn Adams
Cc: Salisbury, Mark; WebKit Development (webkit-dev at lists.webkit.org)
Subject: Re: [webkit-dev] compact ICU unicode

What if we created a new project, based on ICU, called lilICU?
Would the WebKit community then accept an alternative binding to this new library?

Not to split hairs, but that seems to be essentially what we would have to do: create a new library before the WebKit community becomes interested in ways of trimming down WebKit for embedded devices, where the resource demands of dependent libraries are significant.

Having worked on porting WebKit to a variety of embedded platforms over the last five years (most of that work could not be contributed back due to lack of interest in esoteric, non-mainstream platforms), I have seen the size of a typical WebKit build grow significantly while the number of tuning options has decreased.

Thanks,
  Thomas

Glenn Adams wrote:
>
>
>
> On Sat, Jun 8, 2013 at 3:15 AM, Salisbury, Mark <mark.salisbury at hp.com 
> <mailto:mark.salisbury at hp.com>> wrote:
>
>     Hello,
>
>     What would people think about including specific ICU data tables in
>     WTF in order to provide a lightweight (but functional) Unicode
>     implementation?
>
>
> FWIW, I'd suggest you port ICU to your platform or, if it is too
> large, port only the portion of it that WK uses and use that.
> However, I think the ICU library, or even a subset of it, should NOT
> be added to WTF.
>
>
>     On embedded systems the size of ICU is prohibitive.  Determining the
>     right way to package it to make it small enough isn't simple either.
>
>     A patch was reviewed once that attempted to add ICU data tables
>     directly in WTF and there were two concerns:
>     1) Checking in generated files
>     (https://bugs.webkit.org/show_bug.cgi?id=27305#c8)
>     2) Questions about whether the ICU license is compatible with
>     WebCore (https://bugs.webkit.org/show_bug.cgi?id=27305#c9)
>
>     I believe the patch could be done differently so as not to check in
>     generated files.  Regarding the second concern, ICU has a very
>     permissive license
>     (http://www.icu-project.org/repos/icu/icu/trunk/license.html).
>     There are three requirements, basically that the copyright and
>     permission notices have to appear with copies of the software.  I
>     believe that is already a requirement for distributions of WebKit
>     that use ICU.  Except for WCHAR Unicode, I believe all WebKit builds
>     now use ICU Unicode.
>
>     This Unicode path could replace WCHAR_UNICODE or be introduced as a
>     third option, call it what you like: BASIC_ICU_UNICODE,
>     ICU_LITE_UNICODE, COMPACT_ICU_UNICODE, etc.  I think it might be
>     valuable for other ports that are size-conscious; the up-and-coming
>     NIX port comes to mind.
>
>     Thanks,
>     Mark
>
>     Background:
>     After rebasing my WinCE port of WebKit, I ran into an ASSERT in
>     acquireLineBreakIterator() in
>     WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp.  I thought
>     I'd be able to fix this easily, since I had already modified how
>     LineBreakIterator works (on my own branch) to take prior context
>     into account and find line breaks in a stream of non-ASCII
>     characters.
>
>     However, the WCHAR Unicode implementation is very bare-bones and
>     does not even support returning the Unicode character category
>     (http://trac.webkit.org/browser/trunk/Source/WTF/wtf/unicode/wchar/UnicodeWchar.cpp#L35).
>     WCHAR Unicode was originally called WinCE Unicode; it was later
>     renamed, properly, since it had nothing to do with WinCE.
>
>     WinCE Unicode originally came in here:
>     https://bugs.webkit.org/show_bug.cgi?id=27305.  The reason it was
>     introduced was to save space (filesystem and RAM).  ICU, if not
>     packaged very carefully
>     (http://userguide.icu-project.org/packaging), is actually larger
>     than WebKit itself.  On embedded systems, this is a big deal.  The
>     original plan with the bug above was to include specific ICU data
>     tables in WebKit.
>
>     I've been compiling WTF with Unicode tables embedded for some time
>     now.  I don't believe I've seen many layout test regressions due to
>     using a simplified ICU implementation.
>
>
>     _______________________________________________
>     webkit-dev mailing list
>     webkit-dev at lists.webkit.org <mailto:webkit-dev at lists.webkit.org>
>     https://lists.webkit.org/mailman/listinfo/webkit-dev
>