[webkit-dev] custom containers advantage over stl containers

Darin Adler darin at apple.com
Mon Jan 7 13:19:22 PST 2008


On Jan 7, 2008, at 1:04 PM, Jakob Praher wrote:

> Just out of curiosity, I would like to ask why you decided to
> implement your own container structures, like Vector or
> HashTable/Map/Set ...
>
> What was your driving force?

We didn't make a blanket decision to implement our own container  
objects. We decided separately about each one.

For HashMap and HashSet, there was no suitable standard library  
version available. The details of the hash algorithms are carefully  
tuned, and the way we can store a RefPtr in a hash table with minimal  
overhead is as well. The hash-based collections from the standard C++
library are still not present in all the compilers we need to support,
and even if they were, I believe they'd be insufficient for our needs.
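
To illustrate the RefPtr point, here is a minimal sketch of the general
idea (the names are made up; this is not WebKit's actual code): if the
table's buckets hold the smart pointer by value, an insert can hand
ownership to a bucket with a plain pointer swap, without ever touching
the reference count.

    #include <utility>

    // Hypothetical stand-in for a ref-counted smart pointer; only the
    // parts needed to show the ownership hand-off are sketched here.
    template<typename T>
    class RefPtrSketch {
    public:
        RefPtrSketch() : m_ptr(0) { }
        explicit RefPtrSketch(T* ptr) : m_ptr(ptr) { if (m_ptr) m_ptr->ref(); }
        RefPtrSketch(const RefPtrSketch& other) : m_ptr(other.m_ptr) { if (m_ptr) m_ptr->ref(); }
        ~RefPtrSketch() { if (m_ptr) m_ptr->deref(); }

        // Swapping the raw pointers transfers ownership without any
        // ref()/deref() calls at all.
        void swap(RefPtrSketch& other) { std::swap(m_ptr, other.m_ptr); }

        T* get() const { return m_ptr; }

    private:
        RefPtrSketch& operator=(const RefPtrSketch&); // assignment omitted from this sketch
        T* m_ptr;
    };

    // Sketch of a hash-table insert path: afterwards the bucket owns the
    // object and the caller's local pointer is empty, with zero
    // reference-count traffic.
    template<typename T>
    void storeInBucket(RefPtrSketch<T>& emptyBucket, RefPtrSketch<T>& value)
    {
        emptyBucket.swap(value);
    }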

For Vector, one of the reasons was that WTF::Vector has a feature where
it stores an initial fixed capacity inline in the vector object itself.
We use this in contexts where we have a variable-sized object but don't
want to do any memory allocation unless it exceeds that fixed size.
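
As a rough sketch of that storage strategy (heavily simplified, only
valid for plain-old-data element types, and not the real WTF::Vector),
the container below keeps a small buffer inside the object and only
goes to the heap once that buffer is outgrown:

    #include <stddef.h>
    #include <string.h>

    // Simplified vector with inline capacity: the first inlineCapacity
    // elements live inside the object itself, so small vectors never
    // allocate. Copying is omitted and elements are assumed to be POD.
    template<typename T, size_t inlineCapacity>
    class InlineVectorSketch {
    public:
        InlineVectorSketch() : m_buffer(m_inlineBuffer), m_size(0), m_capacity(inlineCapacity) { }
        ~InlineVectorSketch() { if (m_buffer != m_inlineBuffer) delete[] m_buffer; }

        void append(const T& value)
        {
            if (m_size == m_capacity)
                grow();
            m_buffer[m_size++] = value;
        }

        size_t size() const { return m_size; }
        T& operator[](size_t i) { return m_buffer[i]; }

    private:
        void grow()
        {
            // Only when the fixed capacity is exceeded do we touch the heap.
            size_t newCapacity = m_capacity * 2;
            T* newBuffer = new T[newCapacity];
            memcpy(newBuffer, m_buffer, m_size * sizeof(T));
            if (m_buffer != m_inlineBuffer)
                delete[] m_buffer;
            m_buffer = newBuffer;
            m_capacity = newCapacity;
        }

        InlineVectorSketch(const InlineVectorSketch&);            // not implemented
        InlineVectorSketch& operator=(const InlineVectorSketch&); // not implemented

        T m_inlineBuffer[inlineCapacity];
        T* m_buffer;
        size_t m_size;
        size_t m_capacity;
    };

Something like InlineVectorSketch<int, 16> never allocates until the
seventeenth element is appended, which is the behavior the inline
capacity feature is after.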

The standard C++ library's std::vector and std::hash_map also rely on
C++ exceptions, and our entire project works with a limited dialect of
C++ that doesn't use RTTI or exceptions.

Note that we do use the standard C++ library functions such as  
std::sort in a number of places.
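
For example, a trivial sketch (using std::vector here only so the
snippet is self-contained; the point is just that the standard
algorithm is reused rather than reimplemented):

    #include <algorithm>
    #include <vector>

    int main()
    {
        std::vector<int> widths;
        widths.push_back(30);
        widths.push_back(10);
        widths.push_back(20);

        // std::sort works on any random-access range, including the
        // contiguous storage our own containers expose.
        std::sort(widths.begin(), widths.end());
        return 0;
    }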

> In addition, why did you choose to make the string's internal
> representation (UChar) 2 bytes wide? Isn't it the case that most
> websites are encoded in UTF-8/Latin-1?


It's true that most websites are encoded in Latin-1 (although in
practice it's the Windows variant, which gives different meanings to
0x80-0x9F). And many modern websites are encoded in UTF-8. Note,
though, that those are two different encodings; the internal encoding
couldn't be Latin-1, because Latin-1 can't cover all of the Unicode
characters. So the alternative candidate for the internal encoding is
UTF-8.

There are multiple reasons we chose UTF-16 over UTF-8.

One "reason" is that the KHTML code base was already using UTF-16 when  
we started the WebKit project.

Another reason is that the JavaScript language gets at the DOM through
JavaScript strings, and all JavaScript string operations are defined
in terms of UTF-16 code units. If the DOM's strings were stored as
UTF-8, they'd have to be converted back and forth between UTF-8 and
UTF-16. Or we could change JavaScript to use UTF-8 internally as well,
but then many JavaScript string operations would require scanning from
the beginning of the string to count UTF-16 code units.
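
To make that cost concrete, here is a rough sketch (made-up function,
not WebKit code) of what mapping a UTF-16 code unit index onto UTF-8
storage involves: every lookup has to decode from the start of the
string, because UTF-8 sequences are variable length and supplementary
code points take two UTF-16 code units.

    #include <stddef.h>

    // Sketch: find the byte offset in a well-formed UTF-8 buffer that
    // corresponds to a given UTF-16 code unit index. There is no way to
    // do this without scanning from the beginning. (The sketch ignores
    // the case where the index falls in the middle of a surrogate pair.)
    size_t byteOffsetForUTF16Index(const unsigned char* utf8, size_t byteLength, size_t utf16Index)
    {
        size_t byteOffset = 0;
        size_t utf16Count = 0;
        while (byteOffset < byteLength && utf16Count < utf16Index) {
            unsigned char lead = utf8[byteOffset];
            size_t sequenceLength;
            if (lead < 0x80)
                sequenceLength = 1; // U+0000..U+007F
            else if ((lead & 0xE0) == 0xC0)
                sequenceLength = 2; // U+0080..U+07FF
            else if ((lead & 0xF0) == 0xE0)
                sequenceLength = 3; // U+0800..U+FFFF
            else
                sequenceLength = 4; // U+10000..U+10FFFF

            // Code points above U+FFFF are a surrogate pair in UTF-16, so
            // they count as two code units; everything else counts as one.
            utf16Count += sequenceLength == 4 ? 2 : 1;
            byteOffset += sequenceLength;
        }
        return byteOffset;
    }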

I'm sure the reasons I list here are not all of the reasons behind any
of these decisions.

The theme seems to be performance.

     -- Darin


