[webkit-dev] Proposal: Use ICU in WebKit code

Sun Oct 6 12:42:22 PDT 2013

I think the question was about the performance impact of using UTF-16 as
an internal representation of characters.

The original claim was, in effect, that the encoding conversion to UTF-16
is so costly that it offsets any gain from doing code point operations on
UTF-16 instead of UTF-8.

It is a very strong claim because experiments so far have proven the
opposite. I think the statement against ICU/UTF-16 needs to be backed by
experimental data.
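
As a starting point for that kind of data, here is a minimal Python
sketch that compares only the storage size of the same text in UTF-8 and
UTF-16. It is not from the thread: the sample strings and the helper name
encoded_sizes are made up for illustration, and it says nothing about
conversion cost or the speed of code point operations, which would need a
real benchmark on real pages.

```python
# Illustrative samples only (assumption); real page text mixes markup and
# content in proportions that these strings do not capture.
SAMPLES = {
    "ascii_markup": '<div class="post"><p>Hello, world</p></div>',
    "russian": "Привет, мир",
    "japanese": "こんにちは世界",
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, with no BOM counted."""
    # "utf-16-le" is used so that Python does not prepend a byte order mark.
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


for name, text in SAMPLES.items():
    utf8, utf16 = encoded_sizes(text)
    print(f"{name}: UTF-8 {utf8} bytes, UTF-16 {utf16} bytes")
```

ASCII-heavy text comes out about half the size in UTF-8, while CJK text
is smaller in UTF-16, so the memory argument depends heavily on what is
actually being stored.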

Benjamin

On 10/6/13, 12:31 PM, Alp Toker wrote:
> Geoffrey, http://userguide.icu-project.org/conversion/converters says:
> 
> "Since ICU uses Unicode (UTF-16) internally, all converters convert
> between UTF-16 (with the endianness according to the current platform)
> and another encoding."
> 
> That said, I don't think it's a major concern because ICU works on byte
> streams. It's not like these strings will persist internally somewhere
> eating lots of memory.
> 
> From experience, the old WTF in-place converters found in WebKit
> "mobile" ports of the past were way buggy and probably only ever tested
> with ASCII. I'd say use ICU and don't look back :-)
> 
> Alp.
> 
> 
> On 06/10/2013 20:08, Geoffrey Garen wrote:
>>> There is an issue with ICU: it uses UTF-16 as its internal representation, while most of the Web nowadays is UTF-8. Therefore, page text goes through an unnecessary encoding conversion and takes more memory than it would in UTF-8 (for most languages). So tying WebKit to ICU might not be a good development direction.
>> Is there a benchmark or website that can verify these claims?
>>
>> Thanks,
>> Geoff