[webkit-dev] compact ICU unicode

Thu Jun 13 02:02:01 PDT 2013

Hello all,

Any update on this topic?
We are also very interested in this as we're using the Qt port on embedded devices.

Thanks,

Julien

-----Original Message-----
From: webkit-dev-bounces at lists.webkit.org [mailto:webkit-dev-bounces at lists.webkit.org] On Behalf Of Salisbury, Mark
Sent: Friday, June 7, 2013 21:16
To: WebKit Development (webkit-dev at lists.webkit.org)
Subject: [webkit-dev] compact ICU unicode

Hello,

What would people think about including specific ICU data tables in WTF in order to provide a lightweight (but functional) Unicode implementation?

On embedded systems the size of ICU is prohibitive.  Determining the right way to package it to make it small enough isn't simple either.

A patch was reviewed once that attempted to add ICU data tables directly in WTF and there were two concerns:
1) Checking in generated files (https://bugs.webkit.org/show_bug.cgi?id=27305#c8)
2) Questions about whether the ICU license is compatible with WebCore (https://bugs.webkit.org/show_bug.cgi?id=27305#c9)

I believe the patch could be done differently so as not to check in generated files.  Regarding the second concern, ICU has a very permissive license (http://www.icu-project.org/repos/icu/icu/trunk/license.html).  There are three requirements, basically that the copyright and permission notice has to appear with copies of the software.  I believe that is already a requirement for distributions of WebKit that use ICU.  Except for WCHAR Unicode, I believe all WebKit builds now use ICU Unicode.

This Unicode path could replace WCHAR_UNICODE or be introduced as a third option, call it what you like: BASIC_ICU_UNICODE, ICU_LITE_UNICODE, COMPACT_ICU_UNICODE, etc.  I think it might be valuable for other ports that are size-conscious; the up-and-coming NIX port comes to mind.

Thanks,
Mark

Background:
After rebasing my WinCE port of WebKit, I ran into an ASSERT in acquireLineBreakIterator() in WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp.  I thought I'd be able to fix this easily, since I had already modified how LineBreakIterator works (on my own branch) to take prior context into account and to find line breaks in a stream of non-ASCII characters.

However, the WCHAR Unicode implementation is very bare-bones and does not even support returning the Unicode character category (http://trac.webkit.org/browser/trunk/Source/WTF/wtf/unicode/wchar/UnicodeWchar.cpp#L35).  WCHAR Unicode was originally called WinCE Unicode; it was later renamed because it had nothing to do with WinCE.

WinCE Unicode originally came in here:  https://bugs.webkit.org/show_bug.cgi?id=27305.  The reason it was introduced was to save space (filesystem and RAM).  ICU, if not packaged very carefully (http://userguide.icu-project.org/packaging), is actually larger than WebKit itself.  On embedded systems, this is a big deal.  The original plan with the bug above was to include specific ICU data tables in WebKit.

I've been compiling WTF with Unicode tables embedded for some time now.  I don't believe I've seen many layout test regressions due to using a simplified ICU implementation.
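
To give a rough idea of what "Unicode tables embedded" can look like, here is a minimal sketch of a character category lookup over a sorted range table.  It is not the code from my branch or from the patch on bug 27305.  Every name in it (Category, CategoryRange, kRanges, categoryOf) is hypothetical, and the table is a tiny hand-picked excerpt; a real table would be generated from the Unicode data at build time.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>

// Hypothetical names throughout; none of this is WTF or ICU API.
enum class Category : std::uint8_t {
    NotInTable,        // code point is not covered by this excerpt
    UppercaseLetter,   // Lu
    LowercaseLetter,   // Ll
    DecimalNumber,     // Nd
    SpaceSeparator,    // Zs
    OtherPunctuation,  // Po
    OtherLetter        // Lo
};

// One entry covers a contiguous run of code points sharing a category.
struct CategoryRange {
    char32_t first;
    char32_t last;
    Category category;
};

// Hand-picked excerpt for illustration only. Entries must stay sorted by
// code point and must not overlap, since the lookup is a binary search.
static const CategoryRange kRanges[] = {
    { 0x0020, 0x0020, Category::SpaceSeparator },
    { 0x0021, 0x0023, Category::OtherPunctuation },  // ! " #
    { 0x0030, 0x0039, Category::DecimalNumber },     // 0-9
    { 0x0041, 0x005A, Category::UppercaseLetter },   // A-Z
    { 0x0061, 0x007A, Category::LowercaseLetter },   // a-z
    { 0x4E00, 0x9FA5, Category::OtherLetter },       // CJK ideographs
};

Category categoryOf(char32_t c)
{
    const CategoryRange* end = std::end(kRanges);
    // Find the first range whose last code point is >= c.
    const CategoryRange* it = std::lower_bound(
        std::begin(kRanges), end, c,
        [](const CategoryRange& range, char32_t value) {
            return range.last < value;
        });
    // That range contains c only if it also starts at or before c.
    if (it != end && it->first <= c)
        return it->category;
    return Category::NotInTable;
}
```

With this excerpt, categoryOf(U'A') yields Category::UppercaseLetter and categoryOf(U'$') yields Category::NotInTable, because '$' falls in a gap between ranges.  Storing runs instead of one entry per code point is what keeps a table like this small, at the cost of a binary search per lookup.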
