Implementing a Simple 2D Object Tracker


This article implements a relatively simple tracker for 2D objects. To make the process more interesting, the object to track is a ladybug. First, consider what an object tracker is and how it works. Suppose that you see a ladybug on a green leaf and want to know where it moves. You can fix your eyes on the ladybug and follow its movement and orientation (even while you are moving yourself). A program that does this for you automatically is called an object tracker.

Interesting link: Tracking is not limited to 2D objects. For example, here you can find a demo of a tracker that is able to track a human’s head motion in 3D space using a USB web camera.

Compiling the Code

To compile the code, you need to have Visual C++ 6.0 or later installed. Visual C++ Express Edition also fits your needs.

The program uses many functions from the OpenCV library (Intel Open Source Computer Vision Library), so you have to download it (about 18 MB) and install it on your computer.

After you’ve installed OpenCV, you need to tell Visual Studio where OpenCV is located. In the Visual Studio main window, open the Tools -> Options menu item and select the Projects and Solutions -> VC++ Directories page. In the “Show directories for:” combo box, select “Library files” and add the following line to the list:

C:\Program Files\OpenCV\lib

In the “Show directories for:” combo box, select “Include files” and add the following lines to the list:

C:\Program Files\OpenCV\cv\include
C:\Program Files\OpenCV\cxcore\include
C:\Program Files\OpenCV\otherlibs\highgui

Now, open the workspace file (tracker2d.dsw or tracker2d.sln), compile, and run it.

Running the Demo

When you run the program, you will see a console window. You will be asked to choose what image alignment method to use:

Please choose an image alignment method:
1 - forwards additive
2 - forwards compositional
3 - inverse additive
4 - inverse compositional
Your choice (1-4)? >

Type a number between 1 and 4 to choose the desired method. For example, type 4 to choose the inverse compositional method. I will discuss image alignment methods later.
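The mapping from the typed number to a method can be sketched as follows. This is only an illustration and not the demo's actual source: the enum `AlignmentMethod` and the functions `parseAlignmentChoice` and `alignmentMethodName` are hypothetical names, and the code assumes a C++11 compiler.

```cpp
#include <string>

// Hypothetical names for the four menu entries; the numeric values mirror the prompt.
enum class AlignmentMethod {
    Invalid = 0,
    ForwardsAdditive = 1,
    ForwardsCompositional = 2,
    InverseAdditive = 3,
    InverseCompositional = 4
};

// Maps the typed text to a method. Anything other than a single digit 1-4 is rejected.
AlignmentMethod parseAlignmentChoice(const std::string& input) {
    if (input.size() != 1 || input[0] < '1' || input[0] > '4') {
        return AlignmentMethod::Invalid;
    }
    return static_cast<AlignmentMethod>(input[0] - '0');
}

// Returns the label that the menu shows for a given method.
const char* alignmentMethodName(AlignmentMethod method) {
    switch (method) {
        case AlignmentMethod::ForwardsAdditive:      return "forwards additive";
        case AlignmentMethod::ForwardsCompositional: return "forwards compositional";
        case AlignmentMethod::InverseAdditive:       return "inverse additive";
        case AlignmentMethod::InverseCompositional:  return "inverse compositional";
        default:                                     return "invalid";
    }
}
```

Returning an explicit `Invalid` value lets the caller repeat the prompt instead of silently falling back to a default method.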

Then, you will be asked if additional windows (template and confidence map) should be displayed.

Show template and confidence map windows?
Press Y to show. Any other key to not show.

Press Enter to not show those windows. I will discuss later why template and confidence map windows are needed.

Next, you will see a window titled “Video” that displays a ladybug on a green leaf (see Figure 1). If you do not see the “Video” window, it is probably hidden behind the console window; click “Video” in the task bar to bring it forward. Position the “Video” window so that you can still read the console, because the program prints diagnostic information there. Make sure that the “Video” window has the input focus, because all subsequent user input is directed to this window (not to the console window).

Figure 1: The “Video” window.

What can you see in the “Video” window? There is a ladybug facing west, with a blue bounding rectangle around it. The window also shows the current FPS (frames per second) and the current tracking time, which is the time needed to estimate the motion parameters of the ladybug between two consecutive frames.
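Both numbers can be obtained with an ordinary timer. The sketch below is not taken from the demo; it assumes a C++11 compiler (newer than Visual C++ 6.0) and uses `std::chrono`. The helper names `elapsedMilliseconds` and `framesPerSecond` are hypothetical.

```cpp
#include <chrono>

// Runs one step (for example, the motion estimation for a frame)
// and returns how long it took, in milliseconds.
template <typename Fn>
double elapsedMilliseconds(Fn&& step) {
    const auto start = std::chrono::steady_clock::now();
    step();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Converts the total time spent on one frame (in seconds) into frames per second.
// A non-positive duration yields 0 so that the caller never divides by zero.
double framesPerSecond(double frameSeconds) {
    return frameSeconds > 0.0 ? 1.0 / frameSeconds : 0.0;
}
```

In a frame loop, you would wrap only the tracking call in `elapsedMilliseconds` to get the tracking time, and pass the duration of the whole loop iteration (capture, tracking, and drawing) to `framesPerSecond`. The two values therefore differ: the tracking time is only one part of the per-frame cost.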

Now, see what the console window shows:

Current motion type: still.
Press any key to choose another motion type.

You are asked to press any key to change the type of motion. Do that (see Figure 2). Now the motion type is translation, and the ladybug starts moving along the X and Y axes. You might object that ladybugs can’t walk backwards. That is true. Assume instead that the ladybug is not moving at all and that the camera is moving to the west, which produces the effect of the ladybug moving backwards.

Current motion type: translation.
Press any key to choose another motion type.

You can see that the tracker sometimes almost loses the ladybug. This happens because the tracker uses a dynamic template, and error accumulates over the sequence. I will discuss later what a dynamic template is.

Figure 2: Motion type is translation.

Now, press any key to choose the next type of motion (see Figure 3).

Current motion type: rotation.
Press any key to choose another motion type.

Figure 3: Motion type is rotation.

Because ladybugs, unfortunately, are very difficult to rotate, you assume that your camera rotates around the ladybug.

There are three other predefined types of motion (see Figures 4-6): scaling, combined motion (translation, rotation, and scaling at the same time), and slow translation. The last motion type shows that the tracker can lose an object even when the motion is very slow, if the tracker is not accurate enough to estimate that motion with a dynamic template.

Figure 4: Scale.

Figure 5: Combined motion.

Figure 6: Slow translation.
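The motion types in the demo (still, translation, rotation, scaling, and their combination) can all be described by four parameters. The sketch below is one possible parameterization, not the demo's actual code: the names `Motion2D`, `Point2`, and `applyMotion` are hypothetical, and it assumes that rotation and scaling are applied about a chosen center point (for example, the center of the bounding rectangle).

```cpp
#include <cmath>

struct Point2 {
    double x;
    double y;
};

// Hypothetical motion parameters; the defaults describe the "still" motion type.
struct Motion2D {
    double tx = 0.0;     // translation along X, in pixels
    double ty = 0.0;     // translation along Y, in pixels
    double angle = 0.0;  // rotation, in radians
    double scale = 1.0;  // uniform scale factor
};

Motion2D translation(double tx, double ty) { Motion2D m; m.tx = tx; m.ty = ty; return m; }
Motion2D rotation(double angle)            { Motion2D m; m.angle = angle;      return m; }
Motion2D scaling(double scale)             { Motion2D m; m.scale = scale;      return m; }

// Rotates and scales the point about the center, then translates it.
Point2 applyMotion(const Motion2D& m, Point2 p, Point2 center) {
    const double dx = p.x - center.x;
    const double dy = p.y - center.y;
    const double c = std::cos(m.angle);
    const double s = std::sin(m.angle);
    return { center.x + m.scale * (c * dx - s * dy) + m.tx,
             center.y + m.scale * (s * dx + c * dy) + m.ty };
}
```

For example, `applyMotion(scaling(2.0), Point2{3.0, 4.0}, Point2{1.0, 1.0})` moves the point twice as far from the center, to (5, 7). The combined motion type corresponds to setting all four parameters at once.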

Finally, you will be asked to press any key to exit the application.
