One of the important additions to the C++ standard in TR1 is the hash tables. There are four such containers: unordered_map and unordered_multimap defined in header <unordered_map> and unordered_set and unordered_multiset defined in header <unordered_set>. Moreover, header <functional> contains a new template class called hash that is the implementation for the default hashing function for these containers.
About STL Containers
Until TR1, there were four types of containers in STL:
- Containers: A container is an object that stores a sequence of other objects; such a container must:
- Have a default constructor, a copy constructor, a destructor (that destroys all the objects in the container), and an assignment operator (that copies all the objects from one container to the other)
- Provide access to the stored objects (elements), both for constant and non-constant container objects (begin() and end())
- Overload comparison operators (==, !=, <, >, <=, and >=)
- Provide functions for retrieving the size of the container (size(), max_size(), and si empty())
- Allow swapping elements between two containers (swap())
- Reverse containers: Those containers that, in addition to regular containers requirements, allow accessing the elements in reverse order (rbegin() and rend())
- Sequence containers: These are containers with variable size, with elements arranged in a strict linear order, allowing inserting and removing elements
- Associative containers: These are containers with variable size that have efficient elements access based on keys (associated with stored values). The difference between an associative container and a sequence container is that the former does not allow insertion of elements at a specified position.
In TR1, a fifth type of containers was added:
- Unordered containers: (or hash tables) These meet all the requirements for a container except that they don’t implement the six comparison operators. They provide two new typenames:
- key_type: Gives the type of the key
- key_equal: Gives the type of the predicate used to compare two keys
A hash table groups the stored objects in sequences called “buckets,” each such bucket having zero or more elements. Inserting an element in a bucket is done based on the hashing value for the key associated with the element. In the case of Microsoft implementation for TR1, the hash tables have 8 buckets by default, with a loading factor of 4. This means that on average, buckets can store up to 4 elements. (There could be buckets with 20 elements and others with 0; only the average is important.) When a new element is added, and the loading factor becomes greater than the maximum allowed loading factor, a new bucket is added to the hash table.
To add a new element, the hashing value of the associated key is computed, and then the value is trimmed with modulo with the number of existing buckets and the result is the (zero-based) index of the bucket in which the object is stored. Time of insertion for an element is O(1); it does not depend how many elements are stored in the container or the bucket.
When an element is looked up, the same steps are taken to identify the bucket. The standard find() function is applied to localize the element within the bucket. This means the look-up time is proportional with the number of elements in the bucket. For a good performance, the number of elements in a bucket should be kept at a reasonable limit, and the way to do it is by adding more buckets to the container.
Unordered containers (hash tables) can be parameterized in several ways:
- At compile time, by specifying the hashing function and comparison functions (for comparing two keys)
- At runtime, by adjusting the maximum load factor for the buckets (which is an average on all buckets).
Examples on how the buckets change
In this section, I will show the buckets’ number and their content change when adding new elements to an unordered_map. I must say it again, when adding elements, the value is not important for the insertion point, only the key.
- Initially, the container is empty
buckets: 8 max load factor: 4 load factor: 0 bucket 0[0] bucket 1[0] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[0] bucket 7[0]
hash(1) = 16807 bucket index = 16807 % 8 = 7 buckets: 8 max load factor: 4 load factor: 0.125 bucket 0[0] bucket 1[0] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[0] bucket 7[1] [1,2]
hash(2) = 33614 bucket index = 33614 % 8 = 6 buckets: 8 max load factor: 4 load factor: 0.25 bucket 0[0] bucket 1[0] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[1] [2,3] bucket 7[1] [1,2]
hash(55) = 924385 bucket index = 924385 % 8 = 1 buckets: 8 max load factor: 4 load factor: 0.375 bucket 0[0] bucket 1[1] [55,100] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[1] [2,3] bucket 7[1] [1,2]
hash(9) = 151263 bucket index = 151263 % 8 = 7 buckets: 8 max load factor: 4 load factor: 0.5 bucket 0[0] bucket 1[1] [55,100] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[1] [2,3] bucket 7[2] [9,0] [1,2]
hash(17) = 285719 bucket index = 285719 % 8 = 7 buckets: 8 max load factor: 4 load factor: 0.625 bucket 0[0] bucket 1[1] [55,100] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[1] [2,3] bucket 7[3] [17,0] [9,0] [1,2]
hash(25) = 420175 bucket index = 420175 % 8 = 7 buckets: 8 max load factor: 4 load factor: 0.75 bucket 0[0] bucket 1[1] [55,100] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[1] [2,3] bucket 7[4] [25,0] [17,0] [9,0] [1,2]
hash(33) = 554631 bucket index = 554631 % 8 = 7 buckets: 8 max load factor: 4 load factor: 0.875 bucket 0[0] bucket 1[1] [55,100] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[1] [2,3] bucket 7[5] [33,0] [25,0] [17,0] [9,0] [1,2]
To see how new buckets are added to the table, I will use a maximum loading factor of 0.2. That means that at the second insertion a new bucket should be added.
- Initially, the container is empty
buckets: 8 max load factor: 0.2 load factor: 0 bucket 0[0] bucket 1[0] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[0] bucket 7[0]
hash(1) = 16807 bucket index = 16807 % 8 = 7 buckets: 8 max load factor: 0.2 load factor: 0.125 bucket 0[0] bucket 1[0] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[0] bucket 7[1] [1,1]
hash(2) = 33614 bucket index = 33614 % 8 = 6 buckets: 9 max load factor: 0.2 load factor: 0.222222 bucket 0[0] bucket 1[0] bucket 2[0] bucket 3[0] bucket 4[0] bucket 5[0] bucket 6[1] [2,2] bucket 7[1] [1,1] bucket 8[0]