Simple 2D Object Tracker
In this section, you will see how your 2D tracker is implemented.
As you already know, the initial position of the object is found with an object detector (you use CObjDetectorStub for that purpose). You initialize the tracker by defining the initial position of the object model. But what is that model?
There is no universal model for every object, because a model reflects the object's shape, material, and other important properties. For your ladybug object, you take a simple elliptic model, because the ladybug's shape is similar to an ellipse. The math expression for the filled ellipse is:
(x^2)/(a^2)+(y^2)/(b^2) <= 1, where x and y are the coordinates of a pixel inside the ellipse, measured from its center, and a and b are the horizontal and vertical radii of the ellipse, respectively.
To see the model, run the demo again, choose the inverse compositional method (type 4), and press Y to show the confidence map (see Figure 7) and template windows. The confidence map window shows the model of the ladybug. Bright pixels belong to the ladybug model. The intensity of the pixels is non-uniform: it is highest at the center of the ellipse and decreases toward the edge. The intensity reflects the probability that a given pixel belongs to the object, so pixels near the edge of the ellipse have a much smaller probability of belonging to the object. This is used in tracking, where pixels with a high probability contribute more to motion estimation than others.
[fig7.JPG]
Figure 7: Elliptic model of ladybug.
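The elliptic model can be sketched in a few lines of plain C++. This is only an illustration under stated assumptions: MakeEllipticConfMap is a hypothetical helper, not a function from the article's sources, and the 1 - r^2 falloff is an assumed weighting, because the text does not say which falloff the demo uses.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the article's sources): builds a
// width x height confidence map for an ellipse inscribed in that rectangle.
// Weights fall off as 1 - r^2, where r^2 = x^2/a^2 + y^2/b^2. The falloff
// is an assumption; the article does not state the one used by the demo.
std::vector<std::uint8_t> MakeEllipticConfMap(int width, int height)
{
    assert(width > 0 && height > 0);
    const double a = width / 2.0;   // horizontal radius
    const double b = height / 2.0;  // vertical radius
    std::vector<std::uint8_t> map(static_cast<std::size_t>(width) * height, 0);

    for (int row = 0; row < height; ++row)
    {
        for (int col = 0; col < width; ++col)
        {
            // Pixel-center coordinates relative to the ellipse center.
            const double x = (col + 0.5) - a;
            const double y = (row + 0.5) - b;
            const double r2 = (x * x) / (a * a) + (y * y) / (b * b);
            if (r2 <= 1.0)
            {
                // Inside the ellipse: brightest at the center, dark at the edge.
                map[static_cast<std::size_t>(row) * width + col] =
                    static_cast<std::uint8_t>(255.0 * (1.0 - r2));
            }
        }
    }
    return map;
}
```

Pixels outside the ellipse keep a weight of zero, so they would not contribute to motion estimation at all.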
When initializing the model for the first frame, you create a confidence map image and a template image (see Figure 8). The template image is a gray-scale copy of the image in the area of the object. The template is updated for each frame: after you estimate the motion of the object between two subsequent frames, you update the image of the object. This is called a dynamic template. For comparison, a static template is created for the first frame only and is not updated for the subsequent frames.
The disadvantage of a dynamic template is that it has no "memory." If motion is estimated incorrectly, the error accumulates. For example, refer to Figure 6 (slow translation). In that type of motion, inter-frame translations are very small, and the alignment algorithm is not accurate enough to estimate the motion correctly, so the error accumulates very rapidly.
[fig8.JPG]
Figure 8: Template image.
The template and confidence map are used for motion estimation. If lighting conditions are constant, you can assume that the appearance of the template does not change in the next frame and that only its position and orientation change. By minimizing the difference between the template and the next frame, you can estimate the motion parameters.
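The idea of estimating motion by minimizing a difference can be illustrated with a deliberately simple stand-in. The sketch below performs an exhaustive search over integer translations and picks the one with the smallest sum of squared differences. It is not one of the four gradient-based alignment methods used by the tracker, it ignores rotation and pixel weights, and FindBestShift and Shift are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Shift { int dx; int dy; };

// Hypothetical illustration: finds the integer translation of a tw x th
// template inside an iw x ih image that minimizes the sum of squared
// differences. Both images are row-major gray-scale buffers.
Shift FindBestShift(const std::vector<unsigned char>& image, int iw, int ih,
                    const std::vector<unsigned char>& tmpl, int tw, int th)
{
    assert(tw > 0 && th > 0 && tw <= iw && th <= ih);
    assert(image.size() == static_cast<std::size_t>(iw) * ih);
    assert(tmpl.size() == static_cast<std::size_t>(tw) * th);

    Shift best = {0, 0};
    double best_err = std::numeric_limits<double>::max();
    for (int dy = 0; dy + th <= ih; ++dy)
    {
        for (int dx = 0; dx + tw <= iw; ++dx)
        {
            // Accumulate the squared difference for this candidate shift.
            double err = 0.0;
            for (int y = 0; y < th; ++y)
            {
                for (int x = 0; x < tw; ++x)
                {
                    const double d =
                        static_cast<double>(image[(dy + y) * iw + dx + x]) -
                        static_cast<double>(tmpl[y * tw + x]);
                    err += d * d;
                }
            }
            if (err < best_err)
            {
                best_err = err;
                best = {dx, dy};
            }
        }
    }
    return best;
}
```

An exhaustive search like this becomes expensive as soon as rotation or scale is added to the search space, which is one reason to use iterative alignment methods instead.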
The declaration for the CSimple2DTracker class is located in the simptrck.h file:
/* Class: CSimple2DTracker
 * Brief: This class can be used to track 2D objects using four
 *        available methods:
 *        Lucas-Kanade method, Baker-Dellaert-Matthews method,
 *        forwards compositional method, and Hager-Belhumeur method.
 */
class CSimple2DTracker
{
public:
    // Construction/destruction
    CSimple2DTracker(MethodName method, CvSize frame_size);
    ~CSimple2DTracker();

    // Operations
    bool InitModel(IplImage* pFrame, CvRect& obj_rect);
    bool TrackModel(IplImage* pFrame);
    void DrawModelPosition(IplImage* pFrame);
    IplImage* _GetImgT(){ return m_pImgT; }
    IplImage* _GetConfMap(){ return m_pConfMap; }

private:
    void UpdateTemplate(IplImage* pFrame, CvRect template_rect);
    void UpdateConfMap(CvRect& bounding_rect);

    MethodName m_Method;   // Method name.
    CvSize m_FrameSize;    // Size of input frame.
    CvMat* m_C;            // Composition of transformation matrices.
    // Warp matrix retrieved from image alignment method.
    CvMat* m_W;
    CvMat* m_Q;            // Temporarily used matrix.
    // Pixel coordinates in coordinate frame of template T.
    CvMat* m_X;
    // Pixel coordinates in coordinate frame of image I.
    CvMat* m_Z;
    // Size of object's initial bounding rectangle.
    CvSize m_ObjSize;
    // Incoming frame converted to gray-scale format.
    IplImage* m_pImgI;
    IplImage* m_pImgT;     // Template image T.
    IplImage* m_pConfMap;  // Confidence map image.
    // Current template's bounding rectangle.
    CvRect m_template_rect;
    // Pointer to currently selected image alignment method
    CImageAlignmentMethodBase* m_pAlignMethod;
};
The class constructor takes two parameters: the name of the image alignment method and the size of the frame.
CSimple2DTracker(MethodName method, CvSize frame_size);
The image alignment method can be one of the following: forwards additive, forwards compositional, inverse additive, and inverse compositional. I will discuss those methods in the following section.
enum MethodName
{
    FORWARDS_ADDITIVE      = 1,  // Lucas-Kanade
    FORWARDS_COMPOSITIONAL = 2,  // forwards compositional
    INVERSE_ADDITIVE       = 3,  // Hager-Belhumeur
    INVERSE_COMPOSITIONAL  = 4   // Baker-Dellaert-Matthews
};
The frame size should be the size of the frame retrieved from CSynthVideoCapture.
Next, you see several operations:
// Operations
bool InitModel(IplImage* pFrame, CvRect& obj_rect);
bool TrackModel(IplImage* pFrame);
void DrawModelPosition(IplImage* pFrame);
IplImage* _GetImgT(){ return m_pImgT; }
IplImage* _GetConfMap(){ return m_pConfMap; }
The InitModel() method initializes the tracker with the initial object position found by the object detector. TrackModel() tracks the object in the current frame of the video sequence. DrawModelPosition() is a helper method that visualizes the current position of the object.
The _GetImgT() and _GetConfMap() methods are two helper methods that return pointers to the template and confidence map images, respectively.
Image Alignment Methods
The user can select one of four available image alignment methods. An image alignment method estimates an object's motion parameters between two subsequent frames. Because the program uses four image alignment methods, it is a good idea to write a common base class for them. You can call this class CImageAlignmentMethodBase (see simtrck.h):
/* Class: CImageAlignmentMethodBase
 * Brief: This class defines base interface for all image
 *        alignment methods.
 */
class CImageAlignmentMethodBase
{
public:
    virtual bool AlignImage(IplImage* pImgT, IplImage* pConfMap,
        CvRect template_rect, IplImage* pImgI, CvMat* W, int step)
        = 0;
};
As you can see, this class is abstract. It declares the AlignImage() method, which every derived image alignment method class must implement.
AlignImage() takes the following parameters:
- pImgT: Pointer to the template image
- pConfMap: Pointer to the confidence map
- template_rect: Bounding rectangle of the template (a CvRect passed by value)
- pImgI: Pointer to the gray-scale copy of the current frame
- W: Resulting warp matrix
- step: Step for template walking cycle
The image alignment method takes a template, a confidence map, and the current frame as input parameters. It estimates the inter-frame motion parameters and returns the warp matrix W. The matrix W transforms the template to produce a warped template that has the same position and orientation as the object in the current frame (see Figure 9).
[fig10.JPG]
Figure 9: Motion estimation between two frames.
The step parameter is used to speed up the image alignment. In this program, its value is 2, which means that every second pixel of the template (along both the X and Y axes) is skipped while estimating the motion parameters. It is possible to take every pixel of the template into account (a step of 1), but this is slower. It is also possible to use a step larger than 2.
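To get a feel for what the step buys you, the following sketch counts how many template pixels are visited for a given step. CountSampledPixels is a hypothetical helper written for this illustration; the loop structure is an assumption about how a template walking cycle might look, not the article's actual code.

```cpp
#include <cassert>

// Hypothetical helper: counts the template pixels visited when walking a
// width x height template with the given step along both axes.
int CountSampledPixels(int width, int height, int step)
{
    assert(width > 0 && height > 0 && step > 0);
    int count = 0;
    for (int y = 0; y < height; y += step)
    {
        for (int x = 0; x < width; x += step)
        {
            ++count;  // a real method would accumulate gradients and errors here
        }
    }
    return count;
}
```

For a 10 x 10 template, a step of 1 visits 100 pixels and a step of 2 visits 25, so the per-iteration work drops to roughly a quarter.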
In this program, you use four image alignment methods:
- forwards additive (Lucas-Kanade) method: This method is implemented as the CForwardsAdditiveMethod class.
- forwards compositional method: This method is implemented as the CForwardsCompositionalMethod class.
- inverse additive (Hager-Belhumeur) method: This method is implemented as the CInverseAdditiveMethod class.
- inverse compositional (Baker-Dellaert-Matthews) method: This method is implemented as the CInverseCompositionalMethod class.
These methods are declared in the forwadditive.h, forwcomp.h, invadditive.h, and invcomp.h header files. I won't describe these methods in detail here. The interested reader can find more information and sample source code in the Image Alignment Algorithms article.
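One way the tracker could turn a MethodName value into a concrete alignment object is a small factory function. The sketch below is an assumption about such a design, not the article's code: AlignMethodSketch and the four derived classes are simplified stand-ins for CImageAlignmentMethodBase and its subclasses, Name() is an invented method used only to make the example observable, and CreateAlignMethod is a hypothetical function.

```cpp
#include <cassert>
#include <memory>
#include <string>

enum MethodName
{
    FORWARDS_ADDITIVE = 1,
    FORWARDS_COMPOSITIONAL = 2,
    INVERSE_ADDITIVE = 3,
    INVERSE_COMPOSITIONAL = 4
};

// Simplified stand-in for CImageAlignmentMethodBase (hypothetical).
class AlignMethodSketch
{
public:
    virtual ~AlignMethodSketch() {}
    virtual std::string Name() const = 0;
};

class ForwardsAdditiveSketch : public AlignMethodSketch
{
public:
    std::string Name() const override { return "forwards additive"; }
};

class ForwardsCompositionalSketch : public AlignMethodSketch
{
public:
    std::string Name() const override { return "forwards compositional"; }
};

class InverseAdditiveSketch : public AlignMethodSketch
{
public:
    std::string Name() const override { return "inverse additive"; }
};

class InverseCompositionalSketch : public AlignMethodSketch
{
public:
    std::string Name() const override { return "inverse compositional"; }
};

// Hypothetical factory: maps the enum value to a concrete method object.
std::unique_ptr<AlignMethodSketch> CreateAlignMethod(MethodName method)
{
    typedef std::unique_ptr<AlignMethodSketch> Ptr;
    switch (method)
    {
    case FORWARDS_ADDITIVE:      return Ptr(new ForwardsAdditiveSketch());
    case FORWARDS_COMPOSITIONAL: return Ptr(new ForwardsCompositionalSketch());
    case INVERSE_ADDITIVE:       return Ptr(new InverseAdditiveSketch());
    case INVERSE_COMPOSITIONAL:  return Ptr(new InverseCompositionalSketch());
    }
    assert(false && "unknown alignment method");
    return nullptr;
}
```

The stand-in base class has a virtual destructor so that destroying a derived object through a base pointer is well defined.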
Here, I just mention how pixel weights are taken into account when estimating motion. In Figure 7, you can see that the template pixels have non-uniform weights. Some pixels in the template can be more reliable than others. You assume that pixels near the center of the elliptic model are more reliable than the pixels near the edge, so the pixels near the center should contribute more to motion estimation than others. You can incorporate weights into the image similarity function as follows:
[formula.gif]
where w is the weight of pixel x. Derivation of image alignment algorithms with weights can be found here.
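For reference, a weighted sum of squared differences of this kind is commonly written as shown below. The symbols are assumed notation and may differ from the article's own formula: I is the current image, T is the template, W(x; p) is the warp applied to pixel x with motion parameters p, and w(x) is the weight of pixel x taken from the confidence map.

```latex
E(\mathbf{p}) = \sum_{\mathbf{x}} w(\mathbf{x})
    \left[ I\bigl(W(\mathbf{x}; \mathbf{p})\bigr) - T(\mathbf{x}) \right]^{2}
```

With w(x) = 1 for every pixel, this reduces to the unweighted similarity function.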
Another thing I should mention is an additional termination criterion for the image alignment algorithms. As you can see from the Image Alignment Algorithms (Part II) article, almost half of the iterations of the minimization process are not needed, because the mean error value is not reduced on them. To avoid such iterations, you add a new termination criterion. You remember the smallest mean error seen so far, together with the warp that produced it, and on each iteration you check whether the new mean error is smaller. If the mean error does not become smaller over several consecutive iterations (the limit is MAX_BAD_ITER), you exit the minimization process.
/*
 * Check termination criteria #1 - long oscillations
 */
if(iter==1)
{
    min_err_val=mean_error;
    cvCopy(m_W, m_BestW);
}
else
{
    if(mean_error>=min_err_val)
    {
        if(bad_iter>MAX_BAD_ITER) break;
        bad_iter++;
    }
    else
    {
        bad_iter=0;
        min_err_val=mean_error;
        cvCopy(m_W, m_BestW);
    }
}
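The same check can be tried in isolation on a list of mean error values. The sketch below mirrors the ordering of the snippet above (test bad_iter first, then increment), but it is a stand-alone illustration: RunOscillationCheck and StopInfo are hypothetical names, and the warp matrix copy is replaced by simply remembering the best iteration number.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct StopInfo
{
    int stop_iter;  // 1-based iteration at which the loop ended
    int best_iter;  // 1-based iteration with the smallest mean error
};

// Hypothetical stand-alone version of the oscillation check.
StopInfo RunOscillationCheck(const std::vector<double>& mean_errors,
                             int max_bad_iter)
{
    assert(!mean_errors.empty() && max_bad_iter >= 0);
    const int total = static_cast<int>(mean_errors.size());
    double min_err_val = 0.0;
    int bad_iter = 0;
    StopInfo info = {total, 1};

    for (int iter = 1; iter <= total; ++iter)
    {
        const double mean_error =
            mean_errors[static_cast<std::size_t>(iter - 1)];
        if (iter == 1)
        {
            min_err_val = mean_error;
        }
        else if (mean_error >= min_err_val)
        {
            // No improvement: stop once too many bad iterations piled up.
            if (bad_iter > max_bad_iter)
            {
                info.stop_iter = iter;
                break;
            }
            bad_iter++;
        }
        else
        {
            // Improvement: reset the counter and remember this iteration.
            bad_iter = 0;
            min_err_val = mean_error;
            info.best_iter = iter;
        }
    }
    return info;
}
```

Note that with this ordering the loop breaks on the (max_bad_iter + 2)-th consecutive non-improving iteration, because the counter is compared before it is incremented and the comparison is strict.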
Conclusion
In this article, you implemented a simple 2D object tracker with a dynamic template and template pixel weights. Inter-frame object motion can be estimated using one of four available image alignment algorithms: forwards additive, forwards compositional, inverse additive, or inverse compositional. The application is built on the OpenCV library.