Click to See Complete Forum and Search --> : HashTable Question


johnnyICON
November 10th, 2004, 08:42 PM
How big should a hash table be?

I am implementing my own, and I am using Horner's Method to develop a hash code for anything that I enter in. I have roughly 50000 , but my professor told me that I'll only need to make the HashTable big enough to hold about 2500.

Also, I should mention that I am using Separate Chaining.

Yves M
November 11th, 2004, 06:14 PM
The optimal size of a hash table depends on the hash function, mostly. If you have N data items, then usually, the size of the hash table should be M = 4/3N for good performance. I.e. you want to have more space in the hash table than items. For very good hash functions, you go to a 1/1 ratio, but this is rare in the general case.

Most real hash containers contain some code do to dynamic reallocation if the ratio items/slots becomes too large. You could also use that in your code, but that means that insert operations will not be guaranteed amortized constant time anymore, because it could trigger a reallocation. Reallocation strategies vary, but the easiest is to just allocate another table with a bigger number of slots (smallest prime bigger than twice the original amount of slots is usually a decent choice) and insert the exisiting items one by one (since you need to recalculate the hash value for all items).

RoboTact
November 12th, 2004, 01:25 PM
It all depends on how many elements with the same hash value can you permit. Typically, hash value is random and probability of every value is the same. You want to have a hash map with MAXIMUM number of values with the same hash limited by predefined value. So, efficient value to estimate the hash efficiency is average number of elements for a given hash table size that reaches the given number of elements with the same hash. Alas, I couldn't estimate this value analytically in a short time (maybe it's really difficult), but at least for n element you need hash table of size n*n to keep only one element in table which is too expensive; for other maximum allowed numbers of elements with the same hash there is the table obtained by numirecal simulation for 2500 elements:2: 12712
3: 5321
4: 3004
5: 1962
6: 1450
7: 1135
8: 884
9: 761
10: 605
11: 524
12: 454
13: 409
14: 364
15: 335
16: 310
17: 283
18: 260So, you need a table of 13000 elements for the least number of elements in chain (it would give the same result on 1000000 elements table).

And here are the results for different number of elements.
Format: (number-of-elements): max-elements-with0same-hash:hash-size
(100): 3: 86 5: 37 7: 24 9: 17
(200): 3: 214 5: 88 7: 57 9: 38
(300): 3: 377 5: 146 7: 88 9: 62
(400): 3: 505 5: 204 7: 127 9: 88
(500): 3: 645 5: 265 7: 164 9: 109
(1000): 3: 1531 5: 690 7: 363 9: 252
(2000): 3: 4022 5: 1477 7: 880 9: 536
(3000): 3: 6661 5: 2499 7: 1316 9: 885
(4000): 3: 10096 5: 3617 7: 1848 9: 1237
(5000): 3: 13047 5: 4662 7: 2438 9: 1574
(10000): 5: 10454 7: 5461 9: 3414
(20000): 7: 12710 9: 7430
(40000): 7: 27482 9: 17227