Combinations in C++, Part 2

Introduction

There are four new algorithms present in this article. The first one is about speed optimization by using integers as array indexes (I’ll explain that later). The second one is about concurrent programming. (Now, with multi-core CPUs becoming mainstream in the near future, you can cut the time to find a lot of combinations by half and more.) The third and fourth ones deal with finding combinations with repetitions.

In my articles, I place a strong emphasis on explaining the techniques as clearly as possible. It is of utmost importance for you, the reader, to understand the explanations because once you have understand them, you need not know how I do it; you can implement it yourself, even in your favourite computer language. You can implement it yourself, iterative way? recursive way?, your way? Heck! your implementation could even be faster than mine. Who knows?

Source Code Version

For the source code changes, all the classes and functions fall under the stdcomb namespace and they are given the name “Combination Library” and a version (1.5.0), so that the users know which is the latest version because there are a few copies of source code floating around the Internet. Always get the latest version; it contains new features and/or bug fixes. You can always be sure that CodeGuru has the most up-to-date version.

Another Way to Find the Combinations

A combination is the way to pick a different unique smaller sequence from a bigger sequence, without regard to the ordering (positions) of the elements (in the smaller sequence). This article teaches you how to find combinations. For every technique presented in this article, I will explain them as well as I can before going on to teach you how to use the source code.

The notations used in this article:

n : n is the larger sequence from which r sequence is picked

r : r is the smaller sequence picked from n sequence.

c : c is the formula for the total number of possible combinations of r picked from n distinct objects : n! / (r! (n-r)! )

The ! postfix means factorial.

The Technique for Finding Combinations Without Repetitions

I added this section because I realised the explanation in the original article is not easy for the readers to visualise the underlying pattern. Let me explain the new approach now: the shifting technique.

To find combinations, you will use a technique that involves shifting the last combination element from the left to the right.

Find combinations of 1 from a sequence of 5

First, find combinations of 1 from a sequence of 5 numbers{0,1,2,3,4}. Please note that the red box is the combination element.

0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4

A total of five combinations are generated:
0
1
2
3
4

Find combinations of 2 from a sequence of 5

The first example is pretty easy, isn’t it? Now, find combinations of 2 from a sequence of 5 numbers{0,1,2,3,4}. Please note the red boxes are the combination elements.

0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4

Oops, you can’t shift the last element any more. Now, shift the first element and bring back the last element to its right side. This is shown below.

0 1 2 3 4

For the next two combinations, you continue to shift the last element.

0 1 2 3 4
0 1 2 3 4

Again, you cannot shift any more and shift the first element and bring the last element to its side.

0 1 2 3 4

Shift the last element as usual

0 1 2 3 4

Shift the first element and bring the last to its side.

0 1 2 3 4

Oops, you can shift neither the first nor last element any more. This is the end of all the combinations generated.

A total of 10 combinations are generated:
01
02
03
04
12
13
14
23
24
34

Find combinations of 3 from a sequence of 5

Now, go on to find combinations of 3 from a sequence of 5 numbers{0,1,2,3,4}. Please note the red boxes are the combination elements. But for this, I will not explain. Observe the pattern.

0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4
0 1 2 3 4

A total of 10 combinations is generated:
012
013
014
023
024
034
123
124
134
234

Steps to Find Combinations

Of course, I can go on to demonstrate finding combinations of 4 elements out of 5 and so on. But I don’t. I trust that you can extrapolate the technique for finding different combinations from different sequences.

Let me define the steps for the shifting pattern you used to find all combinations.

  • Initially, All the element(s) must be on the leftmost of the array.
  • Shift the last red box until it cannot shift anymore, shift the next rightmost red box (if there is one), and bring back the last element to the rightside of it. This is the new combination.
  • Continue to shift the last element to the right.
  • It comes to a point where the last twp elements cannot shift any longer. Then, shift the 3rd one (from the right) and bring the last two to its side consecutively.
  • The previous point rephrased for all abitrary cases: (‘i’ here stands for number) It comes to a point where the last i elements cannot shift any longer; then, shift the i+1 element (from right) and bring the last i elements to its side consecutively.
  • Continue until all the elements cannot shift any longer/ All the combinations have been generated.

Optimised Version: Index Combination

Is there any way you can optimise the technique used in next_combination() function? The answer is yes but you have to give up the genericness of the next_combination().

The next_combination() compares the element that it is going to shift with other elements in the n sequence to know its current position in the n sequence. See if you can get rid of this finding operation.

Take a look at this combination (Take 2 out of 5):

01234
 x x

If the two elements in the r sequence are integers that stores its index position in the n sequence, the finding is not needed. And with this method, the combination returned is a combination of the indexes in the n sequence is no longer a combination of objects. The timing of finding combinations from a sequence of any custom objects is the same, O(1), because an integer index is used.

The == operator is not needed to be defined for this method to work; next_combination needs it to be defined unless you used the prediate version. Other requirements remain the same. Anyway, finding combinations of objects is actually the same as finding combinations of integers. And, integer variables should be faster than class objects because of the object overhead. So, you should use this!

This technique is implemented in the CIdxComb class. Here is an example on how to use the CIdxComb class.

#include <iostream>
#include <string>
#include <vector>
#include "IndexCombination.h"

using namespace std;
using namespace stdcomb;

void foo()
{
   CIdxComb cb;

   cb.SetSizes(5,3);

   // n sequence
   vector<string> vsAnimal;
   vsAnimal.push_back( "Elephant" );
   vsAnimal.push_back( "Deer" );
   vsAnimal.push_back( "Cat" );
   vsAnimal.push_back( "Bear" );
   vsAnimal.push_back( "Ape" );

   // r sequence
   vector<unsigned int> vi(3);
   vi[0] = 0;
   vi[1] = 1;
   vi[2] = 2;

   cout<< vsAnimal[ vi[0] ] << " "
      << vsAnimal[ vi[1] ] << " "
      << vsAnimal[ vi[2] ] << "n";

   int Total = 1;
   while ( cb.GetNextComb( vi ) )
   {
      // Do whatever processing you want
      // ...

      cout<< vsAnimal[ vi[0] ] << " "
         << vsAnimal[ vi[1] ] << " "
         << vsAnimal[ vi[2] ] << endl;

      ++Total;
   }

   cout<< "nTotal : " << Total << endl;
}

Benchmark Between next_combination() and CIdxComb

Below is a benchmark result of next_combination() and CIdxComb for finding all the combinations of r sequence of 6 from n sequence of 45, 10 times, using QueryPerformanceCounter(), on a Intel P4 2.66Ghz CPU and 1GB of DDR2 RAM PC.

next_combination – 1925.76 milliseconds
CIdxComb – 149.608 milliseconds

This benchmark program is included in the source!

There is a very strange occurrance that CIdxComb runs as slow or slower than next_combination in some benchmarks in debug build. I don’t know why, but I think this difference is due to how iterators (which next_combination() is used) and array subscript index (which CIdxComb is used) are handled in std::vector. The above result is from a release build without any optimization.

More by Author

Must Read