Perspective projection is a slightly more complicated method of projection, and more frequently used because it creates the illusion of distance and thus produces a more realistic image. Geometrically speaking, the difference between this method and orthographic projection is that in perspective projection, the view volume is a frustum—that is, a truncated pyramid—rather than an axis-aligned box. You can see this in Figure 4.
Figure 4: Perspective projection.
As you can see, the near plane of the view frustum extends from (l, b, n) to (r, t, n). The extents of the far plane are found by tracing a line from the origin through each of the four corners of the near plane until it intersects the plane z = f. Because the view frustum gets increasingly wider as it extends further from the origin, and because you're transforming that shape into the canonical view volume, which is a box, the far end of the view frustum is compressed to a greater extent than the near end. Thus, objects further back in the view frustum are made to appear smaller, and this gives you the illusion of distance.
Because the shape of the volume changes in this transformation, perspective projection can't be expressed as a simple translation and scale like orthographic projection. You'll have to work out something a bit different. But, that doesn't mean that the work you did on orthographic projection is useless. A handy problem-solving technique in mathematics is to reduce a problem to one that you already know how to solve. So, that's what you can do here. The last time, you examined one coordinate at a time, but this time you'll do the x- and y-coordinates together, and then worry about z later on. Your plan of attack for x and y can be broken down into two steps:
Step 1: Given a point (x, y, z) within the view frustum, project it onto the near plane z = n. Because the projected point is on the near plane, its x-coordinate will be in the range [l, r], and its y-coordinate will be in the range [b, t].
Step 2: Using the formulae you derived in your study of orthographic projection, map the new x-coordinate from [l, r] to [–1, 1], and the new y-coordinate from [b, t] to [–1, 1].
Sound good? Then, have a look at Figure 5.
Figure 5: Projection of a point onto z = n using similar triangles.
In this diagram, you've drawn a line from a point (x, y, z) to the origin, and noted the point at which the line intersects the plane z = n; it's the one marked in black. From those points, you drop two perpendiculars to the z-axis, and suddenly you have a pair of similar triangles. If you've suppressed your memories of high school geometry, similar triangles are triangles that have the same shape but are not necessarily the same size. To show that two triangles are similar, it is sufficient to show that their corresponding angles are equal, and it's not hard to do that in this case. Angle 1 is shared by both triangles, and obviously it's equal to itself. Angles 2 and 3 are corresponding angles made by a transversal intersecting two parallel lines, so they are equal. And, right angles are of course equal to each other, so your two triangles are similar.
The property of similar triangles that you're interested in is that their pairs of corresponding sides all exist in the same proportion. You know the lengths of the sides that lie along the z-axis; they're n and z. That means that the other pairs of sides also exist in the ratio n / z. So, consider what you know. By the Pythagorean theorem, the perpendicular from (x, y, z) down to the z-axis has the following length:
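Written out from that description, with L denoting the length of the perpendicular (the name the next paragraph uses for it):

```latex
L = \sqrt{x^2 + y^2}
```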
If you knew the length of the perpendicular from your projected point to the z-axis, you could figure out the x- and y-coordinates of that point. But, that's easy! Because you have similar triangles, the length is simply L multiplied by n / z:
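As a sketch of that step, writing L' for the length of the shorter perpendicular (a name assumed here purely for illustration):

```latex
L' = L \cdot \frac{n}{z} = \sqrt{x^2 + y^2} \cdot \frac{n}{z}
```

Because the whole perpendicular shrinks by the factor n / z, its x- and y-components each shrink by that same factor.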
Thus, your new x-coordinate is x * n / z, and your new y-coordinate is y * n / z. Thus concludes Step 1. Step 2 simply called for you to perform the same mapping you did in the last section, and so it's time to revisit the formulae you derived in your study of orthographic projection. Recall that you mapped your x- and y-coordinates into the canonical view volume like so:
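For reference, the linear maps that take [l, r] and [b, t] onto [–1, 1] have the following standard form, which is assumed here to match the earlier orthographic derivation:

```latex
x' = \frac{2x}{r - l} - \frac{r + l}{r - l}
\qquad
y' = \frac{2y}{t - b} - \frac{t + b}{t - b}
```

A quick check: x = l gives x' = –1 and x = r gives x' = 1, and likewise for y with b and t.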
You now can invoke these same formulae again, except you need to take your projection into account; so, you replace x with x * n / z, and y with y * n / z:
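Carrying out that substitution, and assuming the orthographic x and y mappings take their usual [l, r] to [–1, 1] and [b, t] to [–1, 1] form, should give:

```latex
x' = \frac{2n}{r - l} \cdot \frac{x}{z} - \frac{r + l}{r - l}
\qquad
y' = \frac{2n}{t - b} \cdot \frac{y}{z} - \frac{t + b}{t - b}
```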
Now, you multiply through by z:
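Assuming the usual form of the orthographic x and y mappings with x * n / z and y * n / z substituted in, multiplying both sides by z should leave:

```latex
x'z = \frac{2n}{r - l}\,x - \frac{r + l}{r - l}\,z
\qquad
y'z = \frac{2n}{t - b}\,y - \frac{t + b}{t - b}\,z
```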
These results are a bit odd. To write these equations directly into a matrix, you need them to be written in this form:
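That is, each output coordinate would need to be a fixed linear combination of x, y, z and a constant. A sketch of that form, where the coefficients c_ij are placeholder names assumed for illustration:

```latex
\begin{aligned}
x' &= c_{11}\,x + c_{12}\,y + c_{13}\,z + c_{14} \\
y' &= c_{21}\,x + c_{22}\,y + c_{23}\,z + c_{24} \\
z' &= c_{31}\,x + c_{32}\,y + c_{33}\,z + c_{34}
\end{aligned}
```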
But clearly, that's not going to happen right now, so it looks like you're at a bit of an impasse here. What to do? Well, if you can find a way to get a formula for z'z like you have for x'z and y'z, you can write a matrix transform that maps (x, y, z) to (x'z, y'z, z'z). Then, you'll just divide the components of that point by z, and you'll end up with (x', y', z'), which is what you wanted.
Because you know that the transformation of z into z' does not depend on x or y in any way, you know that you want a formula of the form z'z = pz + q, where p and q are constants. And, you can find those constants pretty easily because you know how to get z' in two special cases: Because you're mapping [n, f] to [0, 1], you know that z' = 0 when z = n, and z' = 1 when z = f. When you plug the first set of values into z'z = pz + q, you're able to solve for q in terms of p:
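Plugging z = n and z' = 0 into z'z = pz + q, the step reads:

```latex
0 \cdot n = p\,n + q
\quad\Longrightarrow\quad
q = -p\,n
```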
Now, you plug in the second set of values, and get:
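With z = f and z' = 1 in z'z = pz + q, that is:

```latex
1 \cdot f = p\,f + q
\quad\Longrightarrow\quad
f = p\,f + q
```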
Substitute your value for q into that equation, and you can easily solve for p:
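Using q = –pn in f = pf + q, the algebra works out as:

```latex
f = p\,f - p\,n = p\,(f - n)
\quad\Longrightarrow\quad
p = \frac{f}{f - n}
```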
Now that you have a value for p, and you found earlier that q = –pn, you can solve for q:
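Since f = pf + q together with q = –pn implies p = f / (f – n), this gives:

```latex
q = -p\,n = -\frac{f\,n}{f - n}
```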
Finally, if you substitute these expressions for p and q back into your original formula, you get:
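With p = f / (f – n) and q = –fn / (f – n), as the two special cases z = n and z = f imply, the formula becomes:

```latex
z'z = \frac{f}{f - n}\,z - \frac{f\,n}{f - n}
```

As a sanity check, z = n gives z'z = 0 (so z' = 0), and z = f gives z'z = f (so z' = 1).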
You're nearly finished now, but the unusual nature of your approach to this problem requires that you do something with the homogeneous coordinate w. Normally, you're simply content to set w' = 1—you've probably noticed that the bottom row in a basic transform is almost always [0 0 0 1]—but now you're writing a transform to the point (x'z, y'z, z'z, w'z), and so instead of writing w' = 1, you write w'z = z. Thus, the final set of equations you'll use for perspective projection are:
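Collected in one place, and assuming the x and y results follow from the usual orthographic mappings applied to x * n / z and y * n / z, the set should read:

```latex
\begin{aligned}
x'z &= \frac{2n}{r - l}\,x - \frac{r + l}{r - l}\,z \\
y'z &= \frac{2n}{t - b}\,y - \frac{t + b}{t - b}\,z \\
z'z &= \frac{f}{f - n}\,z - \frac{f\,n}{f - n} \\
w'z &= z
\end{aligned}
```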
And, when you write this set of equations in matrix form, you get:
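A reconstruction of that matrix, assuming it multiplies column vectors (consistent with the text's remark about the bottom row of a basic transform) and that x'z and y'z have the usual off-center coefficients:

```latex
\begin{bmatrix}
\dfrac{2n}{r - l} & 0 & -\dfrac{r + l}{r - l} & 0 \\[2ex]
0 & \dfrac{2n}{t - b} & -\dfrac{t + b}{t - b} & 0 \\[2ex]
0 & 0 & \dfrac{f}{f - n} & -\dfrac{f\,n}{f - n} \\[2ex]
0 & 0 & 1 & 0
\end{bmatrix}
```

Direct3D's documentation lists its matrices for row vectors, so the layout shown there is the transpose of this one.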
When you apply this to the point (x, y, z, 1), it yields (x'z, y'z, z'z, z). But then, you apply the usual step of dividing through by the homogeneous coordinate, and so you end up with (x', y', z', 1). And that's perspective projection. Direct3D implements the above formula in the function D3DXMatrixPerspectiveOffCenterLH(). As with orthographic projection, if you assume that the view frustum is symmetrical and centered on the z-axis (meaning that r = –l and t = –b), you can simplify things considerably by writing the matrix in terms of the view frustum's width w and its height h:
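With r = –l and t = –b, the terms r + l and t + b vanish, and r – l and t – b become the width and height. A sketch of the resulting matrix under the column-vector convention (note that w here is the frustum's width at the near plane, not the homogeneous coordinate):

```latex
\begin{bmatrix}
\dfrac{2n}{w} & 0 & 0 & 0 \\[2ex]
0 & \dfrac{2n}{h} & 0 & 0 \\[2ex]
0 & 0 & \dfrac{f}{f - n} & -\dfrac{f\,n}{f - n} \\[2ex]
0 & 0 & 1 & 0
\end{bmatrix}
```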
Direct3D has a function for this matrix as well, called D3DXMatrixPerspectiveLH().
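To make the transform-then-divide idea concrete, here is a minimal C++ sketch of both the off-center and the symmetric matrices. It is not the D3DX implementation: the types Mat4 and Vec4 and the functions perspectiveOffCenterLH, perspectiveLH, transform, perspectiveDivide and project are hypothetical names chosen for illustration, and the matrix is assumed to multiply column vectors in a left-handed system with z' mapped to [0, 1].

```cpp
#include <cassert>

// Hypothetical helper types for illustration; not part of Direct3D.
struct Vec4 { double x, y, z, w; };
struct Mat4 { double m[4][4]; };  // m[row][col], applied as M * v (column vectors)

// Off-center perspective matrix: left-handed, z' mapped to [0, 1].
Mat4 perspectiveOffCenterLH(double l, double r, double b, double t,
                            double n, double f) {
    assert(r != l && t != b && n > 0.0 && f > n);
    return Mat4{{
        {2.0 * n / (r - l), 0.0, -(r + l) / (r - l), 0.0},
        {0.0, 2.0 * n / (t - b), -(t + b) / (t - b), 0.0},
        {0.0, 0.0, f / (f - n), -f * n / (f - n)},
        {0.0, 0.0, 1.0, 0.0},  // copies z into the homogeneous coordinate
    }};
}

// Symmetric frustum of width w and height h, centered on the z-axis.
Mat4 perspectiveLH(double w, double h, double n, double f) {
    return perspectiveOffCenterLH(-w / 2.0, w / 2.0, -h / 2.0, h / 2.0, n, f);
}

// Plain matrix-vector product M * v.
Vec4 transform(const Mat4& M, const Vec4& v) {
    const double in[4] = {v.x, v.y, v.z, v.w};
    double out[4] = {0.0, 0.0, 0.0, 0.0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out[i] += M.m[i][j] * in[j];
    return Vec4{out[0], out[1], out[2], out[3]};
}

// Divide through by the homogeneous coordinate.
Vec4 perspectiveDivide(const Vec4& v) {
    assert(v.w != 0.0);
    return Vec4{v.x / v.w, v.y / v.w, v.z / v.w, 1.0};
}

// Full projection of the point (x, y, z, 1): matrix first, then the divide.
Vec4 project(const Mat4& M, double x, double y, double z) {
    return perspectiveDivide(transform(M, Vec4{x, y, z, 1.0}));
}
```

For instance, with l = –2, r = 2, b = –1, t = 1, n = 1 and f = 10, project should map the near-plane corner (2, 1, 1) to approximately (1, 1, 0) and the far-plane point (20, 10, 10) to approximately (1, 1, 1), which is the compression of the far end described earlier.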
Finally, there's one more representation for perspective projection that often comes in handy. In this form, rather than worrying strictly about the dimensions of the view frustum, you define it based on the camera's field of view. See Figure 6 for an illustration of this concept.
Figure 6: The view frustum's height defined in terms of the vertical field of view angle a.
The vertical field of view angle is a. This angle is bisected by the z-axis, so with a bit of basic trigonometry, you can write the following equation that relates a to the near plane n and the height of the screen h:
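The z-axis splits the frustum's cross-section into two right triangles, each with angle a / 2, adjacent side n and opposite side h / 2. The relation below is a reconstruction from that geometry:

```latex
\tan\!\left(\frac{a}{2}\right) = \frac{h / 2}{n}
\quad\Longrightarrow\quad
h = 2n \tan\!\left(\frac{a}{2}\right)
```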
This expression allows you to replace the height in your projection matrix. Furthermore, you replace the width with the aspect ratio r, defined as the ratio of the width of the display area to its height. So, you have:
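Spelled out, and noting that r now means the aspect ratio w / h rather than the right edge of the near plane, the two scale terms of the symmetric matrix, 2n / w and 2n / h, would become:

```latex
\frac{2n}{h} = \cot\!\left(\frac{a}{2}\right)
\qquad
\frac{2n}{w} = \frac{2n}{r\,h} = \frac{1}{r}\cot\!\left(\frac{a}{2}\right)
```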
Thus, you have a perspective projection matrix in terms of the vertical field of view angle a and the aspect ratio r:
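A sketch of that matrix under the column-vector convention, assuming the scale terms 2n / w and 2n / h of the symmetric case become cot(a / 2) / r and cot(a / 2):

```latex
\begin{bmatrix}
\dfrac{1}{r}\cot\!\left(\dfrac{a}{2}\right) & 0 & 0 & 0 \\[2ex]
0 & \cot\!\left(\dfrac{a}{2}\right) & 0 & 0 \\[2ex]
0 & 0 & \dfrac{f}{f - n} & -\dfrac{f\,n}{f - n} \\[2ex]
0 & 0 & 1 & 0
\end{bmatrix}
```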
In Direct3D, you can get a matrix of this form by calling D3DXMatrixPerspectiveFovLH(). This form is particularly useful because you can just set r to the aspect ratio of the window you're rendering in, and a field of view angle of π / 4 is usually fine. So, the only things you really need to worry about defining are the extents of the view frustum along the z-axis.
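The field-of-view form can be sketched in C++ as follows. This is an illustration, not the D3DX source: Mat4 and perspectiveFovLH are hypothetical names, the angle is assumed to be in radians, and the matrix is assumed to multiply column vectors in a left-handed system with z' mapped to [0, 1].

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper type for illustration; not part of Direct3D.
struct Mat4 { double m[4][4]; };  // m[row][col], applied as M * v (column vectors)

// Perspective matrix from the vertical field of view (radians) and the
// aspect ratio (width / height).
Mat4 perspectiveFovLH(double fovY, double aspect, double n, double f) {
    assert(fovY > 0.0 && aspect > 0.0 && n > 0.0 && f > n);
    const double yScale = 1.0 / std::tan(fovY / 2.0);  // cot(a / 2), i.e. 2n / h
    const double xScale = yScale / aspect;             // cot(a / 2) / r, i.e. 2n / w
    return Mat4{{
        {xScale, 0.0, 0.0, 0.0},
        {0.0, yScale, 0.0, 0.0},
        {0.0, 0.0, f / (f - n), -f * n / (f - n)},
        {0.0, 0.0, 1.0, 0.0},
    }};
}
```

Only the two scale terms depend on the angle and the aspect ratio; the z row is the same as in the other two forms, which is why n and f are the only remaining values to choose once the window shape and field of view are fixed.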
That's about all you need to know about the mathematics behind projection transforms. There are some other, lesser-used projection methods out there, and of course things are slightly different if you use a right-handed coordinate system or a different canonical view volume, but you should be able to figure out those formulae easily by using the results in this article as a base. If you want more information on projection and other transforms, take a look at Real-Time Rendering by Tomas Moller and Eric Haines; or Computer Graphics: Principles and Practice by James D. Foley, Andries van Dam, Steven K. Feiner, and John F. Hughes; these are two excellent books on computer graphics that I referred to in writing this article.
If you have any questions about this article, or spot any corrections that need to be made, you can contact me via PM on the CodeGuru forums, where I go by the name Smasher/Devourer. Happy coding!