[webkit-dev] 8 Bit Strings Turned On in JavaScriptCore

Michael Saboff msaboff at apple.com
Fri Nov 18 11:14:12 PST 2011

With the recently landed changes in <http://trac.webkit.org/changeset/100510> (and two subsequent fixes in <http://trac.webkit.org/changeset/100523> and <http://trac.webkit.org/changeset/100729>), strings in JavaScriptCore are now stored internally in either 8 bit or 16 bit form.  This is implemented in the StringImpl class and the classes based upon it, like JSC::UString and WTF::String.  Because "char" is signed on most platforms, unsigned on a few, and selectable via a compiler option on others, we added typedef unsigned char LChar in <wtf/unicode/Unicode.h> so that 8 bit characters are always unsigned.
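To illustrate why the signedness matters, here is a minimal sketch of what happens when an 8 bit character above 0x7F is widened to 16 bits. Only the LChar typedef comes from the change described above; the helper names widenUnsigned and widenSigned are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdint>

// Same definition as the typedef added in <wtf/unicode/Unicode.h>.
typedef unsigned char LChar;

// Hypothetical helper: widening an unsigned 8 bit character zero-extends it.
std::uint16_t widenUnsigned(LChar c)
{
    return c;
}

// Hypothetical helper: widening a signed 8 bit character sign-extends it,
// so values above 0x7F pick up 0xFF in the high byte.
std::uint16_t widenSigned(signed char c)
{
    return static_cast<std::uint16_t>(c);
}
```

With these helpers, widenUnsigned(0xE9) yields 0x00E9, while widenSigned(-23), the same bit pattern read as a signed char, yields 0xFFE9.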

Changes to Using Strings

Although the UChar* characters() method for the various string classes still works, all new code should check which "flavor" a string is by using the new is8Bit() method on the various string classes.  After determining the flavor, call either LChar* characters8() or UChar* characters16(), as appropriate, to access the raw characters of the string.  Calling characters() on an 8 bit string will create a 16 bit buffer, convert the native 8 bit string into it, and keep the converted copy for future use before returning the 16 bit result.  The cost of this conversion grows with the string's length, and because both copies are retained, it increases the memory footprint beyond what was required by the original 16 bit string implementation.
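The calling pattern described above might look roughly like the following sketch. ToyString is a hypothetical stand-in written for this example, not the real StringImpl; it only mimics the is8Bit(), characters8() and characters16() accessors. UChar is assumed here to be a 16 bit code unit, and the countCodeUnit helpers are likewise made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef unsigned char LChar;
typedef char16_t UChar; // assumption: a 16 bit code unit

// Hypothetical toy string that stores either 8 bit or 16 bit characters.
class ToyString {
public:
    explicit ToyString(std::vector<LChar> chars)
        : m_is8Bit(true), m_chars8(std::move(chars)) {}
    explicit ToyString(std::vector<UChar> chars)
        : m_is8Bit(false), m_chars16(std::move(chars)) {}

    bool is8Bit() const { return m_is8Bit; }
    std::size_t length() const
    {
        return m_is8Bit ? m_chars8.size() : m_chars16.size();
    }

    // Each accessor is only valid for the matching flavor.
    const LChar* characters8() const { assert(m_is8Bit); return m_chars8.data(); }
    const UChar* characters16() const { assert(!m_is8Bit); return m_chars16.data(); }

private:
    bool m_is8Bit;
    std::vector<LChar> m_chars8;
    std::vector<UChar> m_chars16;
};

// One templated loop serves both character widths.
template <typename CharType>
std::size_t countCodeUnitIn(const CharType* chars, std::size_t length, UChar target)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (chars[i] == target)
            ++count;
    }
    return count;
}

// Check the flavor first, then use the matching accessor; no 16 bit
// copy of an 8 bit string is ever created on this path.
std::size_t countCodeUnit(const ToyString& string, UChar target)
{
    if (string.is8Bit())
        return countCodeUnitIn(string.characters8(), string.length(), target);
    return countCodeUnitIn(string.characters16(), string.length(), target);
}
```

Writing the inner loop as a template on the character type keeps a single implementation for both flavors, with the is8Bit() check done once at the entry point.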

The various string construction methods as well as the Identifier constructors have been modified to create natively sized strings.  The JavaScriptCore lexers and parsers favor making 8 bit strings where possible, even if the source text is 16 bit.  There are cases where parsing an 8 bit native source string will produce a 16 bit string, e.g. the string literal "abc\u1234", whose escape denotes a character that does not fit in 8 bits.
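As a rough sketch of the "8 bit where possible" decision, the following checks whether every code unit of a 16 bit buffer fits in 8 bits and narrows it if so. This is not the JavaScriptCore lexer code; fitsIn8Bits and narrowTo8Bit are hypothetical names, and the sketch assumes 8 bit strings hold code units in the range 0x00 to 0xFF.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned char LChar;
typedef char16_t UChar; // assumption: a 16 bit code unit

// True when every code unit is in the range 0x00 to 0xFF.
bool fitsIn8Bits(const std::vector<UChar>& units)
{
    for (UChar unit : units) {
        if (unit > 0xFF)
            return false;
    }
    return true;
}

// Narrow a 16 bit buffer to 8 bits. The caller must have checked
// fitsIn8Bits() first; otherwise characters would be truncated.
std::vector<LChar> narrowTo8Bit(const std::vector<UChar>& units)
{
    assert(fitsIn8Bits(units));
    std::vector<LChar> result;
    result.reserve(units.size());
    for (UChar unit : units)
        result.push_back(static_cast<LChar>(unit));
    return result;
}
```

Under these assumptions, the decoded characters of "abc" can be narrowed, while the decoded characters of "abc\u1234" cannot, because U+1234 is above 0xFF and so the result has to stay 16 bit.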

Future Work

This change and its prior dependent changes are not the end of the 8 bit string work.  In fact, they should be seen as the foundation for the real 8 bit string work, which includes tuning JavaScriptCore as well as work in WebCore.  The goal is to make WebCore's processing of text use appropriately sized strings.  For Latin-1 based documents, string processing will be done using 8 bits except where string escapes require 16 bit strings.

- Michael Saboff
msaboff at apple.com
