[webkit-dev] custom containers advantage over stl containers
Darin Adler
darin at apple.com
Mon Jan 7 13:19:22 PST 2008
On Jan 7, 2008, at 1:04 PM, Jakob Praher wrote:
> Just out of curiosity, I would like to ask why you decided to
> implement your own container structures, like Vector or
> HashTable/Map/Set ...
>
> What was your driving force?
We didn't make a blanket decision to implement our own container
objects. We decided separately about each one.
For HashMap and HashSet, there was no suitable standard library
version available. The details of the hash algorithms are carefully
tuned, and the way we can store a RefPtr in a hash table with minimal
overhead is as well. The hash-based collections from the standard C++
library are still not present in all the compilers we need to support,
and even if they were, I believe they'd be insufficient.
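To give the flavor of the RefPtr trick (a simplified sketch, not the
actual WTF HashTable code; the names here are made up): the raw pointer
value stored in a bucket can itself encode the bucket's state, so a
smart pointer costs no more space per bucket than a raw pointer:

    // Sketch: two pointer values that can never refer to a real object
    // are reserved to mark "empty" and "deleted" buckets, so the table
    // needs no per-bucket state flag alongside the stored pointer.
    template<typename T>
    struct PointerBucketTraits {
        static T* emptyValue() { return nullptr; }
        static T* deletedValue() { return reinterpret_cast<T*>(-1); }
        static bool isOccupied(T* bucket)
        {
            return bucket != emptyValue() && bucket != deletedValue();
        }
    };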
For Vector, one of the reasons was that WTF::Vector has a feature
where it uses the vector object itself to store an initial fixed
capacity. We use this in contexts where we have a variable sized
object but don't want to do any memory allocation unless it exceeds
the fixed size.
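Here is a minimal standalone sketch of that idea (illustrative only,
not the actual WTF::Vector code; the names and growth policy are made
up):

    #include <cstddef>
    #include <new>

    // The first inlineCapacity elements live in a buffer inside the
    // object itself, so small vectors never touch the heap.
    template<typename T, std::size_t inlineCapacity>
    class InlineVector {
    public:
        InlineVector()
            : m_buffer(reinterpret_cast<T*>(m_inlineBuffer))
            , m_size(0)
            , m_capacity(inlineCapacity)
        {
        }

        ~InlineVector()
        {
            for (std::size_t i = 0; i < m_size; ++i)
                m_buffer[i].~T();
            if (m_buffer != reinterpret_cast<T*>(m_inlineBuffer))
                ::operator delete(m_buffer);
        }

        void append(const T& value)
        {
            if (m_size == m_capacity)
                grow(m_capacity * 2); // first allocation happens only here
            new (&m_buffer[m_size++]) T(value);
        }

        std::size_t size() const { return m_size; }
        T& operator[](std::size_t i) { return m_buffer[i]; }

    private:
        void grow(std::size_t newCapacity)
        {
            T* newBuffer = static_cast<T*>(::operator new(newCapacity * sizeof(T)));
            for (std::size_t i = 0; i < m_size; ++i) {
                new (&newBuffer[i]) T(m_buffer[i]);
                m_buffer[i].~T();
            }
            if (m_buffer != reinterpret_cast<T*>(m_inlineBuffer))
                ::operator delete(m_buffer);
            m_buffer = newBuffer;
            m_capacity = newCapacity;
        }

        alignas(T) unsigned char m_inlineBuffer[inlineCapacity * sizeof(T)];
        T* m_buffer;
        std::size_t m_size;
        std::size_t m_capacity;
    };

A declaration like InlineVector<UChar, 64> then gives you room for 64
elements with zero heap allocations unless the vector grows past that.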
The standard C++ library std::vector and std::hash_map also rely on
C++ exceptions, and our entire project works with a limited dialect of
C++ that doesn't use RTTI or exceptions.
Note that we do use the standard C++ library functions such as
std::sort in a number of places.
> In addition, why did you choose to make the internal string
> representation (UChar) 2 bytes wide? Isn't it true that most websites
> are encoded in UTF-8/Latin-1?
It's true that most websites are encoded in Latin-1 (although it's the
Windows variant, with different meanings for 0x80-0x9F). And many
modern websites are encoded in UTF-8. Note, though, that those are two
different encodings; the internal encoding couldn't be Latin-1 because
Latin-1 can't cover all the Unicode characters. So of those two, the
only candidate for the internal encoding is UTF-8.
There are multiple reasons we chose UTF-16 over UTF-8.
One "reason" is that the KHTML code base was already using UTF-16 when
we started the WebKit project.
Another reason is that the JavaScript language gets at the DOM with
JavaScript strings, and all JavaScript string operations are defined
in terms of UTF-16 code units. If strings were stored as UTF-8, they'd
have to be converted back and forth between UTF-8 and UTF-16. Or we could change
JavaScript to use UTF-8, but then many JavaScript string operations
would require scanning from the beginning of the string to count
UTF-16 code units.
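For instance, JavaScript's charCodeAt(i) must return the i-th UTF-16
code unit, which with UTF-16 storage is a constant-time array index.
With UTF-8 storage, even finding out how many UTF-16 code units a
string contains means decoding from the front, roughly like this (an
illustrative sketch, not WebKit code):

    #include <cstddef>
    #include <cstdint>

    // Count how many UTF-16 code units a well-formed UTF-8 buffer
    // decodes to. 1-, 2-, and 3-byte sequences each become one UTF-16
    // code unit; 4-byte sequences (supplementary planes) become a
    // surrogate pair, i.e. two code units.
    std::size_t utf16Length(const std::uint8_t* utf8, std::size_t byteLength)
    {
        std::size_t codeUnits = 0;
        for (std::size_t i = 0; i < byteLength; ++i) {
            std::uint8_t byte = utf8[i];
            if ((byte & 0xC0) == 0x80)
                continue; // continuation byte, not the start of a sequence
            codeUnits += ((byte & 0xF8) == 0xF0) ? 2 : 1; // 4-byte lead -> 2 units
        }
        return codeUnits;
    }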
I'm sure the reasons I list here are not all the reasons for any of
these decisions.
The theme seems to be performance.
-- Darin