A TR1 Tutorial: Unordered Containers

One of the important additions to the C++ standard in TR1 is the hash tables. There are four such containers: unordered_map and unordered_multimap defined in header <unordered_map> and unordered_set and unordered_multiset defined in header <unordered_set>. Moreover, header <functional> contains a new template class called hash that is the implementation for the default hashing function for these containers.

About STL Containers

Until TR1, there were four types of containers in STL:

  • Containers: A container is an object that stores a sequence of other objects; such a container must:
    • Have a default constructor, a copy constructor, a destructor (that destroys all the objects in the container), and an assignment operator (that copies all the objects from one container to the other)
    • Provide access to the stored objects (elements), both for constant and non-constant container objects (begin() and end())
    • Overload comparison operators (==, !=, <, >, <=, and >=)
    • Provide functions for retrieving the size of the container (size(), max_size(), and si empty())
    • Allow swapping elements between two containers (swap())
  • Reverse containers: Those containers that, in addition to regular containers requirements, allow accessing the elements in reverse order (rbegin() and rend())
  • Sequence containers: These are containers with variable size, with elements arranged in a strict linear order, allowing inserting and removing elements
  • Associative containers: These are containers with variable size that have efficient elements access based on keys (associated with stored values). The difference between an associative container and a sequence container is that the former does not allow insertion of elements at a specified position.

In TR1, a fifth type of containers was added:

  • Unordered containers: (or hash tables) These meet all the requirements for a container except that they don’t implement the six comparison operators. They provide two new typenames:
    • key_type: Gives the type of the key
    • key_equal: Gives the type of the predicate used to compare two keys

A hash table groups the stored objects in sequences called “buckets,” each such bucket having zero or more elements. Inserting an element in a bucket is done based on the hashing value for the key associated with the element. In the case of Microsoft implementation for TR1, the hash tables have 8 buckets by default, with a loading factor of 4. This means that on average, buckets can store up to 4 elements. (There could be buckets with 20 elements and others with 0; only the average is important.) When a new element is added, and the loading factor becomes greater than the maximum allowed loading factor, a new bucket is added to the hash table.

To add a new element, the hashing value of the associated key is computed, and then the value is trimmed with modulo with the number of existing buckets and the result is the (zero-based) index of the bucket in which the object is stored. Time of insertion for an element is O(1); it does not depend how many elements are stored in the container or the bucket.

When an element is looked up, the same steps are taken to identify the bucket. The standard find() function is applied to localize the element within the bucket. This means the look-up time is proportional with the number of elements in the bucket. For a good performance, the number of elements in a bucket should be kept at a reasonable limit, and the way to do it is by adding more buckets to the container.

Unordered containers (hash tables) can be parameterized in several ways:

  • At compile time, by specifying the hashing function and comparison functions (for comparing two keys)
  • At runtime, by adjusting the maximum load factor for the buckets (which is an average on all buckets).

Examples on how the buckets change

In this section, I will show the buckets’ number and their content change when adding new elements to an unordered_map. I must say it again, when adding elements, the value is not important for the insertion point, only the key.

  • Initially, the container is empty
  • buckets: 8
    max load factor: 4
    load factor: 0
    
    bucket 0[0]
    bucket 1[0]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[0]
    bucket 7[0]
    
  • Add (1, 2); hash value is 16807, and to find the bucket index, you apply a modulo 8 (number of buckets), and the result is 7.
  • hash(1) = 16807
    bucket index = 16807 % 8 = 7
    
    buckets: 8
    max load factor: 4
    load factor: 0.125
    
    bucket 0[0]
    bucket 1[0]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[0]
    bucket 7[1]     [1,2]
    
  • Add (2, 3)
  • hash(2) = 33614
    bucket index = 33614 % 8 = 6
    
    buckets: 8
    max load factor: 4
    load factor: 0.25
    
    bucket 0[0]
    bucket 1[0]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[1]     [2,3]
    bucket 7[1]     [1,2]
    
  • Add (55,100)
  • hash(55) = 924385
    bucket index = 924385 % 8 = 1
    
    buckets: 8
    max load factor: 4
    load factor: 0.375
    
    bucket 0[0]
    bucket 1[1]     [55,100]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[1]     [2,3]
    bucket 7[1]     [1,2]
    
  • Add (9,0);
  • hash(9) = 151263
    bucket index = 151263 % 8 = 7
    
    buckets: 8
    max load factor: 4
    load factor: 0.5
    
    bucket 0[0]
    bucket 1[1]     [55,100]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[1]     [2,3]
    bucket 7[2]     [9,0] [1,2]
    
  • Add (17,0);
  • hash(17) = 285719
    bucket index = 285719 % 8 = 7
    
    buckets: 8
    max load factor: 4
    load factor: 0.625
    
    bucket 0[0]
    bucket 1[1]     [55,100]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[1]     [2,3]
    bucket 7[3]     [17,0] [9,0] [1,2]
    
  • Add (25,0). At this point the 8th bucket (with index 7) would have 4 elements, but the loading factor is only 0.75, which means you can still add new elements without adding an additional bucket; the new element is added to the 8th bucket also.
  • hash(25) = 420175
    bucket index = 420175 % 8 = 7
    
    buckets: 8
    max load factor: 4
    load factor: 0.75
    
    bucket 0[0]
    bucket 1[1]     [55,100]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[1]     [2,3]
    bucket 7[4]     [25,0] [17,0] [9,0] [1,2]
    
  • Add (33,0); a fifth element was added to the last (8th) bucket
  • hash(33) = 554631
    bucket index = 554631 % 8 = 7
    
    buckets: 8
    max load factor: 4
    load factor: 0.875
    
    bucket 0[0]
    bucket 1[1]     [55,100]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[1]     [2,3]
    bucket 7[5]     [33,0] [25,0] [17,0] [9,0] [1,2]
    

To see how new buckets are added to the table, I will use a maximum loading factor of 0.2. That means that at the second insertion a new bucket should be added.

  • Initially, the container is empty
  • buckets: 8
    max load factor: 0.2
    load factor: 0
    
    bucket 0[0]
    bucket 1[0]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[0]
    bucket 7[0]
    
  • Add (1,1); this will be added to the 8th bucket (index 7)
  • hash(1) = 16807
    bucket index = 16807 % 8 = 7
    
    buckets: 8
    max load factor: 0.2
    load factor: 0.125
    
    bucket 0[0]
    bucket 1[0]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[0]
    bucket 7[1]     [1,1]
    
  • Add (2,2); new loading factor becomes 2/8 = 0.25, which is greater than 0.2. That means that, before actually inserting the new element, a new bucket must be added to the table. Only then the index of the bucket is computed and the element stored in the appropriate bucket.
  • hash(2) = 33614
    bucket index = 33614 % 8 = 6
    
    buckets: 9
    max load factor: 0.2
    load factor: 0.222222
    
    bucket 0[0]
    bucket 1[0]
    bucket 2[0]
    bucket 3[0]
    bucket 4[0]
    bucket 5[0]
    bucket 6[1]     [2,2]
    bucket 7[1]     [1,1]
    bucket 8[0]
    

More by Author

Must Read