Of the basic matrix transforms in any 3D graphics programmer’s toolkit, projection matrices are among the more complicated. Translation and scaling can be understood at a glance, and a rotation matrix can be conjured up by anyone with a basic understanding of trigonometry, but projection is a bit tricky. If you’ve ever looked up the formula for such a matrix, you know that common sense isn’t enough to tell you where it came from. And yet, I haven’t seen many resources online that will describe just how one derives a projection matrix. That is the topic I will address in this article.
For those of you just starting out in 3D graphics, I should mention that understanding where a projection matrix comes from may be a matter of curiosity to the mathematically inclined among us, but it’s not a necessity. You can make do with just the formula, and if you’re using a graphics API like Direct3D that will build a projection matrix for you, you don’t even need that. So, if the details of this article seem a little overwhelming, fear not. As long as you understand what projection does, you needn’t concern yourself with how it works if you don’t want to. This article is for programmers who like to know a bit more detail than is strictly necessary.
Overview: What Is Projection?
A computer monitor is a two-dimensional surface, so if you want to display three-dimensional images, you need a way to transform 3D geometry into a form that can be rendered as a 2D image. And that’s exactly what projection does. To use a very simple example, one way to project a 3D object onto a 2D surface would be to simply throw away the z-coordinate of each point. For a cube, that might look something like Figure 1.
Figure 1: Projection onto the xy plane by discarding z-coordinates.
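The toy projection from Figure 1 can be sketched in a few lines. (The struct and function names here are my own, purely for illustration; they don’t come from any API.)

```cpp
#include <cassert>

// Minimal point types for the sketch.
struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

// Project a point onto the xy plane by simply discarding its
// z-coordinate, as in Figure 1.
Vec2 projectDiscardZ(const Vec3& p) {
    return { p.x, p.y };
}
```

For example, the cube corner (1, 2, 3) lands at (1, 2) on the plane, no matter how far away it is; that is exactly why this naive scheme is rarely useful on its own.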
Of course, this is overly simple and not particularly useful in most cases. For starters, you will not be projecting onto a plane at all; rather, your projection formulae will transform your geometry into a new volume, called the canonical view volume. The exact coordinates of the canonical view volume may vary from one graphics API to another, but for the purposes of this discussion, consider it to be the box that extends from (–1, –1, 0) to (1, 1, 1), which is the convention used by Direct3D. Once all your vertices have been mapped into the canonical view volume, only their x- and y-coordinates are used to map them to the screen. The z-coordinate is not useless, however; it’s typically used by a depth buffer for visibility determination. This is the reason you transform into a new volume, rather than project onto a plane.
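As a concrete sketch of these conventions, the following assumes Direct3D’s canonical view volume and a screen whose y-axis grows downward; the helper names are hypothetical, not part of any API.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

// Direct3D's canonical view volume is the box from (-1, -1, 0) to (1, 1, 1).
// After projection, a vertex is visible only if it falls inside this box.
bool insideCanonicalViewVolume(const Vec3& v) {
    return v.x >= -1.0f && v.x <= 1.0f &&
           v.y >= -1.0f && v.y <= 1.0f &&
           v.z >=  0.0f && v.z <= 1.0f;
}

// Map the x- and y-coordinates of a point in the canonical view volume
// to pixel coordinates; z is left alone for the depth buffer. The y-axis
// is flipped because screen y grows downward while view-volume y points up.
void toScreen(const Vec3& v, float width, float height,
              float& sx, float& sy) {
    sx = (v.x + 1.0f) * 0.5f * width;
    sy = (1.0f - (v.y + 1.0f) * 0.5f) * height;
}
```

For instance, the center of the view volume, (0, 0, 0.5), maps to the center of a 640×480 screen, (320, 240), while its z-coordinate of 0.5 would be what gets written to the depth buffer.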
Note that Figure 1 also depicts a left-handed coordinate system, where the camera is looking down the positive z-axis, with the y-axis pointing up and the x-axis pointing to the right. This is again a convention used by Direct3D, and one I’ll use throughout the article. None of the calculations are significantly different for a right-handed coordinate system, or for a slightly different canonical view volume, so everything discussed will still apply even if your API of choice uses different conventions than those used by Direct3D.
With that background in place, we can get into the actual projection transforms. There are quite a few different methods of projection out there, and I will cover two of the most common: orthographic and perspective.