New hash template structure API design proposal

26 Jun 2005

      I've been planning for a while to make really good, high-performance  
generic HashSet and HashMap templates for WebCore, to replace the mix  
of QDict, QMap and hand-coded hashtables we have now.

Problems with the current situation are:

* QMap is too slow (algorithmically inefficient by design)
* The WebCore implementation of QDict is too slow, and non-portable
* QDict/QPtrDict is not nearly flexible enough
* Sprinkling the code with hand-coded hashtables sucks
* Even the forthcoming Qt 4.0 QHash will not cut it - it does not  
allow making multiple hashtables keyed by the same type but with  
different hash and equality functions (important for things like case  
insensitive string tables, or having pointer hashes with some other  
concept of equality than identity), and it seems likely to be  
inefficient for pointer types.

Here's my suggested replacement API. I would use these HashSet and  
HashMap templates for all situations in WebCore that require any kind  
of set or dictionary data structure, except when other types are  
required for classes that are part of the kde khtml API.

I've already started on this and have a HashSet that vaguely  
approximates this API.

One important point that is not clear just from the headers is that  
these classes have shared value semantics with refcounting and no  
copy-on-write.

// The defaultHash and defaultEqual template functions. would be  
specialized for all the interesting types.
template<typename _Key, unsigned _Hash(const _Key&) = defaultHash,  
bool _Equal(const _Key&, const _Key&) = defaultEqual>
class HashSet {
  public:
     typedef typename _Key KeyType;

     // all the obvious iterator operations, ++, --, *, ->, etc; all  
iterators are const since it would
     // almost certainly break the hashtable to replace a key in place
     class Iterator;

     HashSet();

     int size() const;
     bool isEmpty() const;

     Iterator begin() const;
     Iterator end() const;

     // If an item equal to "key" is already present in the table, it  
is left in and not replaced
     Iterator insert(const KeyType& key);

     // a special version of insert() that finds the object by  
hashing and comparing with some other type,
     // to avoid the cost of type conversion if the object is already  
in the table - useful for making efficient
     // unique object factories
     template<typename T, unsigned HashT(const T&), bool EqualT(const  
KeyType&, const T&), KeyType ConvertT(const T&, unsigned)> Iterator  
insert(const T& key);

     Iterator find(const KeyType& key) const;
     bool contains(const KeyType& key) const;
     void remove(const KeyType& key);
     void clear();
}

template<typename _Key, typename _Value, unsigned _Hash(const _Key&)  
= defaultHash, bool _Equal(const _Key&, const _Key&) = defaultEqual>
class HashMap {
  public:
     typedef typename _Key KeyType;
     typedef typename _Value ValueType;

     // ?? should it iterate keys, values, key/value pairs? The last  
would be good but then we'd have to define
     // a pair type (or bite the bullet and use std::pair)
     class Iterator;

     HashMap();

     int size() const;
     bool isEmpty() const;

     Iterator begin() const;
     Iterator end() const;

     // If an item equal to "key" is already present in the table, it  
is left in and not replaced
     Iterator insert(const KeyType &key, const ValueType &value);

     // ?? is the template version of insert ever needed for a map as  
opposed to a set?

     const ValueType& operator[](const KeyType&) const;
     Iterator find(const KeyType&) const;
     void remove(const KeyType&);
     void clear();
}

Maciej Stachowiak

David D. Kilzer

Maciej Stachowiak

tags

participants (2)