Click to See Complete Forum and Search --> : Hash Join Algorithm


jackee1234
February 11th, 2008, 02:54 AM
I have two tables, table implements using List<int[]> structure.
The table has two columns(which mean int array is of size two)

List<int[]> tablea;
List<int[]> tableb;

Content of tablea
a b
{1,2}
{3,4}
{5,6}

content of tableb
b c
{2,3}
{2,4}
{6,7}

Now I wish to join two tables by field b, hence the result is
a b c
{1,2,3}
{1,2,4}
{5,6,7}

I know I can do this is sort merge join algorithms, and the algorithms is quite intuitive.
But I heard hybrid hashjoin(mentioned here http://en.wikipedia.org/wiki/Hash_join#_note-1) is faster. For my case, I am more concern on the speed rather than space.

May I know how to implement hybrid hashjoin algorithms to join tablea and tableb in C#(Preferably) or java(Preferably) or C++?

spoon!
February 11th, 2008, 05:36 AM
create a HashMap (hashtable) from Integer to Collection<Integer> (it could be some list or set), then put the entries for the second table into the hash table, mapping first column to second column (i.e. 2 -> 3, 2 -> 4, 6 -> 7). The reason we have a collection is that there may be more than one value per key. Then, iterate through the first table; for each row, look up the second column number in the hash table; then create new rows for each of the values that are associated with that key. If there are no results, we don't create any rows.