[Webkit-unassigned] [Bug 67022] New: Investigate how to pass 8-bit characters to ICU

bugzilla-daemon at webkit.org bugzilla-daemon at webkit.org
Fri Aug 26 00:16:18 PDT 2011


https://bugs.webkit.org/show_bug.cgi?id=67022

           Summary: Investigate how to pass 8-bit characters to ICU
           Product: WebKit
           Version: 528+ (Nightly build)
          Platform: Unspecified
        OS/Version: Unspecified
            Status: UNCONFIRMED
          Severity: Normal
          Priority: P2
         Component: Web Template Framework
        AssignedTo: webkit-unassigned at lists.webkit.org
        ReportedBy: wangxianzhu at chromium.org
                CC: sullivan at chromium.org


ICU is a big category of the final usages of const UChar* on certain platforms (see https://docs.google.com/a/google.com/spreadsheet/ccc?key=0AsX8ECAuTqEzdEVFYnZaUlZ3QzJCbnBKamwxdGMxVkE&hl=en_US&ndplr=1#gid=10 for details). To efficiently support 8/16-bit string buffers, we should make sure we can pass the strings to ICU without conversion between 8/16-bit formats in most cases.

For now WebKit passes const UChar* parameters to the following components of ICU:

* format: accepts only 16-bit strings. Only used from NumberInputType -> LocalizedNumberICU to parse and format numbers which seem not performance critical.

* ubrk: from 3.4, ICU added ubrk_setUText which accepts an abstract UText parameter to support alternative string formats. We can have our own UText callbacks to support the new 8/16-bit string buffers. We should make sure if the version is available on each platform. We also need to measure the performance of UText concerning the cost of function callbacks.

* ucnv: because we store only ASCII characters in 8-bit string, we can simple take different branch for 8- and 16-bit strings. For 8-bit strings, we simply calls the ASCII conversion functions which accept const char* parameters.

* ucol: we can pass UCharIterator to ICU to support 8/16-bit strings. Also need to measure the performance concerning the cost of function callbacks.

* unorm: When calling unorm_*(), UNORM_NFC is the only mode used by WebKit. In this mode, unorm_*() has no effect on ASCII strings, so we can just test 8/16-bit format and call unorm_*() for only 16-bit strings.

* usearch: accepts only 16-bit strings. The only place in WebKit calling usearch_*() is WebCore/editing/TextIterator.cpp.

* uset: uset_openPattern() accepts only 16-bit strings, but this is not a problem because it’s only called twice in each WebKit process.

-- 
Configure bugmail: https://bugs.webkit.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.


More information about the webkit-unassigned mailing list