[webkit-dev] compact ICU unicode
Salisbury, Mark
mark.salisbury at hp.com
Fri Jun 7 12:15:36 PDT 2013
Hello,
What would people think about including specific ICU data tables in WTF to provide a lightweight (but functional) Unicode implementation?
On embedded systems the size of ICU is prohibitive. Determining the right way to package it to make it small enough isn't simple either.
A patch that attempted to add ICU data tables directly to WTF was reviewed once, and reviewers raised two concerns:
1) Checking in generated files (https://bugs.webkit.org/show_bug.cgi?id=27305#c8)
2) Whether the ICU license is compatible with WebCore (https://bugs.webkit.org/show_bug.cgi?id=27305#c9)
I believe the patch could be done differently so as not to check in generated files. Regarding the second concern, ICU has a very permissive license (http://www.icu-project.org/repos/icu/icu/trunk/license.html). There are three requirements, which basically amount to including the copyright and permission notice with copies of the software. I believe that is already a requirement for WebKit distributions that use ICU. Except for WCHAR Unicode builds, I believe all WebKit builds now use ICU Unicode.
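One way to avoid checking in generated files is to produce the tables during the build. The following is a hypothetical sketch, not part of any existing patch: instead of ICU's own data files it reads UnicodeData.txt from the Unicode Character Database (field 0 is the code point in hex, field 1 the name, field 2 the General_Category) and merges consecutive code points with the same category into ranges. `CategoryRange`, `buildCategoryRanges()`, `emitCategoryTable()` and `kCategoryRanges` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical: one run of consecutive code points sharing a General_Category.
struct CategoryRange {
    uint32_t first;
    uint32_t last;
    std::string category; // two-letter code from UnicodeData.txt, e.g. "Lu"
};

// Splits one UnicodeData.txt record into its semicolon-separated fields.
static std::vector<std::string> splitFields(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    std::istringstream stream(line);
    while (std::getline(stream, field, ';'))
        fields.push_back(field);
    return fields;
}

// Merges adjacent code points with the same category into ranges.
// A "<..., First>" record followed by "<..., Last>" denotes a whole block.
std::vector<CategoryRange> buildCategoryRanges(std::istream& input)
{
    static const std::string firstSuffix = ", First>";
    std::vector<CategoryRange> ranges;
    std::string line;
    while (std::getline(input, line)) {
        std::vector<std::string> fields = splitFields(line);
        if (fields.size() < 3)
            continue; // skip blank or malformed lines
        uint32_t codePoint = static_cast<uint32_t>(std::stoul(fields[0], nullptr, 16));
        uint32_t last = codePoint;
        const std::string& name = fields[1];
        if (name.size() >= firstSuffix.size()
            && name.compare(name.size() - firstSuffix.size(), firstSuffix.size(), firstSuffix) == 0) {
            // The matching "<..., Last>" record supplies the end of the block.
            std::string lastLine;
            if (std::getline(input, lastLine)) {
                std::vector<std::string> lastFields = splitFields(lastLine);
                if (!lastFields.empty())
                    last = static_cast<uint32_t>(std::stoul(lastFields[0], nullptr, 16));
            }
        }
        assert(ranges.empty() || codePoint > ranges.back().last); // input must be sorted
        if (!ranges.empty() && ranges.back().category == fields[2] && ranges.back().last + 1 == codePoint)
            ranges.back().last = last; // extend the current run
        else
            ranges.push_back({ codePoint, last, fields[2] });
    }
    return ranges;
}

// Writes the ranges out as C++ source that the build could compile into WTF.
void emitCategoryTable(std::ostream& out, const std::vector<CategoryRange>& ranges)
{
    out << "static const CategoryRange kCategoryRanges[] = {\n";
    for (const CategoryRange& range : ranges)
        out << "    { 0x" << std::hex << range.first << ", 0x" << range.last << std::dec
            << ", \"" << range.category << "\" },\n";
    out << "};\n";
}
```

Because the output is regenerated from a pinned copy of the input data on every build, only the generator and the input would need to live in the repository.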
This Unicode path could replace WCHAR_UNICODE or be introduced as a third option; call it what you like: BASIC_ICU_UNICODE, ICU_LITE_UNICODE, COMPACT_ICU_UNICODE, etc. I think it might be valuable for other size-conscious ports; the up-and-coming NIX port comes to mind.
Thanks,
Mark
Background:
After rebasing my WinCE port of WebKit, I ran into an ASSERT in WebCore/platform/text/wchar/TextBreakIteratorWchar.cpp, acquireLineBreakIterator(). I thought I'd be able to fix this easily, since I had already modified (on my own branch) how LineBreakIterator works so that it takes prior context into account and finds line breaks in a stream of non-ASCII characters.
However, the WCHAR Unicode implementation is very bare-bones and does not even support returning the Unicode character category (http://trac.webkit.org/browser/trunk/Source/WTF/wtf/unicode/wchar/UnicodeWchar.cpp#L35). WCHAR Unicode was originally called WinCE Unicode and was later renamed, rightly, because it had nothing to do with WinCE.
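For comparison, a category lookup of the kind WCHAR Unicode lacks can be served from a small embedded table. This is an illustrative sketch only: it assumes a sorted, non-overlapping range table searched with binary search, which is a simpler layout than the trie-based tables ICU uses internally. `GeneralCategory`, `EmbeddedCategoryRange`, `kRanges` and `categoryOf()` are hypothetical names, not WTF or ICU APIs, and the table is a four-entry excerpt covering only the ASCII space, digits and letters.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>

// Hypothetical subset of Unicode General_Category values; a real table would
// cover all 30 categories (or mirror ICU's UCharCategory numbering).
enum class GeneralCategory : uint8_t {
    NotInTable,         // fallback for code points this excerpt does not cover
    SpaceSeparator,     // Zs
    DecimalDigitNumber, // Nd
    UppercaseLetter,    // Lu
    LowercaseLetter,    // Ll
};

struct EmbeddedCategoryRange {
    char32_t first;
    char32_t last;
    GeneralCategory category;
};

// Tiny excerpt of a generated table: sorted by code point, non-overlapping.
static const EmbeddedCategoryRange kRanges[] = {
    { 0x0020, 0x0020, GeneralCategory::SpaceSeparator },
    { 0x0030, 0x0039, GeneralCategory::DecimalDigitNumber },
    { 0x0041, 0x005A, GeneralCategory::UppercaseLetter },
    { 0x0061, 0x007A, GeneralCategory::LowercaseLetter },
};

// Binary search for the first range ending at or after c, then check that
// the range actually starts at or before c.
GeneralCategory categoryOf(char32_t c)
{
    assert(c <= 0x10FFFF); // precondition: a valid Unicode code point
    const EmbeddedCategoryRange* end = std::end(kRanges);
    const EmbeddedCategoryRange* it = std::lower_bound(std::begin(kRanges), end, c,
        [](const EmbeddedCategoryRange& range, char32_t value) { return range.last < value; });
    if (it != end && it->first <= c)
        return it->category;
    return GeneralCategory::NotInTable;
}
```

Storing one entry per run of identically categorized code points keeps the table small at the cost of an O(log n) lookup, which fits the size constraints of embedded targets.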
WinCE Unicode originally came in here: https://bugs.webkit.org/show_bug.cgi?id=27305. It was introduced to save space (filesystem and RAM). ICU, if not packaged very carefully (http://userguide.icu-project.org/packaging), is actually larger than WebKit itself. On embedded systems, this is a big deal. The original plan in the bug above was to include specific ICU data tables in WebKit.
I've been compiling WTF with embedded Unicode tables for some time now, and I don't believe I've seen many layout test regressions from using a simplified ICU implementation.