New hash template structure API design proposal
I've been planning for a while to make really good, high-performance generic HashSet and HashMap templates for WebCore, to replace the mix of QDict, QMap and hand-coded hashtables we have now. Problems with the current situation are: * QMap is too slow (algorithmically inefficient by design) * The WebCore implementation of QDict is too slow, and non-portable * QDict/QPtrDict is not nearly flexible enough * Sprinkling the code with hand-coded hashtables sucks * Even the forthcoming Qt 4.0 QHash will not cut it - it does not allow making multiple hashtables keyed by the same type but with different hash and equality functions (important for things like case insensitive string tables, or having pointer hashes with some other concept of equality than identity), and it seems likely to be inefficient for pointer types. Here's my suggested replacement API. I would use these HashSet and HashMap templates for all situations in WebCore that require any kind of set or dictionary data structure, except when other types are required for classes that are part of the kde khtml API. I've already started on this and have a HashSet that vaguely approximates this API. One important point that is not clear just from the headers is that these classes have shared value semantics with refcounting and no copy-on-write. // The defaultHash and defaultEqual template functions. would be specialized for all the interesting types. template<typename _Key, unsigned _Hash(const _Key&) = defaultHash, bool _Equal(const _Key&, const _Key&) = defaultEqual> class HashSet { public: typedef typename _Key KeyType; // all the obvious iterator operations, ++, --, *, ->, etc; all iterators are const since it would // almost certainly break the hashtable to replace a key in place class Iterator; HashSet(); int size() const; bool isEmpty() const; Iterator begin() const; Iterator end() const; // If an item equal to "key" is already present in the table, it is left in and not replaced Iterator insert(const KeyType& key); // a special version of insert() that finds the object by hashing and comparing with some other type, // to avoid the cost of type conversion if the object is already in the table - useful for making efficient // unique object factories template<typename T, unsigned HashT(const T&), bool EqualT(const KeyType&, const T&), KeyType ConvertT(const T&, unsigned)> Iterator insert(const T& key); Iterator find(const KeyType& key) const; bool contains(const KeyType& key) const; void remove(const KeyType& key); void clear(); } template<typename _Key, typename _Value, unsigned _Hash(const _Key&) = defaultHash, bool _Equal(const _Key&, const _Key&) = defaultEqual> class HashMap { public: typedef typename _Key KeyType; typedef typename _Value ValueType; // ?? should it iterate keys, values, key/value pairs? The last would be good but then we'd have to define // a pair type (or bite the bullet and use std::pair) class Iterator; HashMap(); int size() const; bool isEmpty() const; Iterator begin() const; Iterator end() const; // If an item equal to "key" is already present in the table, it is left in and not replaced Iterator insert(const KeyType &key, const ValueType &value); // ?? is the template version of insert ever needed for a map as opposed to a set? const ValueType& operator[](const KeyType&) const; Iterator find(const KeyType&) const; void remove(const KeyType&); void clear(); }
Google recently released a SparseHash project to SourceForge that's implemented in C++ with a BSD license. "An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed." http://sourceforge.net/projects/goog-sparsehash/ It's still in beta, though, at version 0.2. Dave On Jun 26, 2005, at 1:06 AM, Maciej Stachowiak wrote:
I've been planning for a while to make really good, high- performance generic HashSet and HashMap templates for WebCore, to replace the mix of QDict, QMap and hand-coded hashtables we have now.
Problems with the current situation are:
* QMap is too slow (algorithmically inefficient by design) * The WebCore implementation of QDict is too slow, and non-portable * QDict/QPtrDict is not nearly flexible enough * Sprinkling the code with hand-coded hashtables sucks * Even the forthcoming Qt 4.0 QHash will not cut it - it does not allow making multiple hashtables keyed by the same type but with different hash and equality functions (important for things like case insensitive string tables, or having pointer hashes with some other concept of equality than identity), and it seems likely to be inefficient for pointer types.
[...]
On Jun 26, 2005, at 5:22 AM, David D. Kilzer wrote:
Google recently released a SparseHash project to SourceForge that's implemented in C++ with a BSD license.
"An extremely memory-efficient hash_map implementation. 2 bits/entry overhead! The SparseHash library contains several hash-map implementations, including implementations that optimize for space or speed."
Cool, I'm looking at it. Thanks for the pointer! It looks like their "dense_hash_map" uses the same design I was planning, storing the values inline and requiring distinguished values for empty and deleted buckets. Most C++ hashtable templates (including the gcc implementation of hash_map) use separately allocated nodes for the data. This makes it easier to put more of the implementation in non-template code without losing a lot of speed, but it is much slower, since you must allocate for every insertion. sparse_hash_map is very interesting and could be useful in places where saving memory is more important than access speed. I will add, however, that the included hash functions are pretty weak. Regards, Maciej
participants (2)
-
David D. Kilzer
-
Maciej Stachowiak