AI Computer Vision & Graphics Research

Turing Award | A Closer Look Into Edwin Catmull’s PhD Thesis

In this article, we take a look at Edwin Catmull’s doctoral dissertation published in 1974, which laid the groundwork for 3D computer graphics.

Introduction

Last month, the Association for Computing Machinery (ACM) named Edwin Catmull and Patrick Hanrahan as co-winners of the 2019 Turing Award. The highest distinction in computer science went to the pair for their “fundamental contributions to 3-D computer graphics.”



The motivation behind this dissertation was to create high-quality computer-generated images (CGIs) of surfaces and curved solid objects on a raster-scan machine. Catmull’s work improved not only the accuracy of displayed objects but also the quality of their shading and textures.

Generating pictures of curved patches requires knowledge of 1) the correspondence between points on the surface and the elements of the display raster, 2) hidden, or unseen parts of the patches, and 3) light intensities that should be displayed on the raster. Before we proceed, we will introduce the definitions and terminology used in this work.

A raster-scan device, or raster display, reconstructs an image as a rectangular pattern of dots. In a raster scan, an image is subdivided into a sequence of (usually horizontal) strips known as scan lines. Each strip can be thought of as an array of dots, where the dots will be referred to as raster-elements. In terms of a computer system, each scan line is divided into several discrete pixels. Scan lines are usually processed/produced sequentially, with each raster-element associated with a brightness determined by an intensity value. The process of taking the intensity values and putting the dots on the raster display with the corresponding intensities is referred to as displaying.

A frame-buffer is a memory buffer that stores all the intensity values prior to displaying. The size of a frame-buffer is determined by the resolution of the raster display and the number of bits used to store intensity values.

Humans are used to visualizing and understanding objects in 3-dimensional space. To visualize such an object on a 2-dimensional plane, however, the 3-dimensional object must first go through a perspective transformation so that the flat picture is perceived as the original 3-dimensional object. We refer to the normal 3-dimensional space as the object space and the perspective-transformed space as the image space. The image space is also a 3-dimensional space, but with the objects distorted in such a way that an orthogonal projection of the object onto the x-y plane would result in the expected perspective image. The dimensions of the image space are kept at 3 to preserve depth information. The projection of the image space onto the x-y plane is referred to as the projected image, while the portion of the x-y plane associated with the raster is referred to as the screen.

In order to display curved patches, the relationship between the image space and the raster must be defined to transform information from the projected image to the raster. The screen is divided into small squares referred to as raster element squares, with each raster element square in a one-to-one correspondence with a raster element. Figure 1. shows the relationship between the different spaces, as well as the relationships between the image space, the frame buffer, and the actual display.

Figure 1. Relationships between the various terms defined.

The article is sectioned as follows. We will first discuss the subdivision algorithm upon which Catmull’s work is based and look at Catmull’s modifications to the subdivision algorithm for displaying curved patches. Then we will discuss the hidden surface problem, which tackles the challenge wherein any observed object should only display the parts which the observer can see. We will look at the two main algorithms Catmull utilizes to tackle this problem. Finally, we will go over the various factors involved in a raster display (e.g., intensity values and unwanted artifacts due to sampling), as well as Catmull’s contribution in modifying the subdivision algorithm to reduce aliasing.

A General Algorithm for Displaying Curved Patches

The subdivision algorithm

The subdivision algorithm establishes the correspondence between a patch and the raster elements. The centre of each raster element is denoted as a sample point. We will discuss this algorithm by walking through the sub-figures of Figure 2.

Figure 2 (top). The grid-like dots are the sample points, while the lines represent the curved edges of the projected object. Recall that each sample point corresponds to a raster element, which is associated with an intensity value; Figure 2 (middle). The algorithm works by dividing the patch of sample points which the object covers into smaller sub-patches that cover a smaller set of sample points; Figure 2 (bottom). This process continues until each sub-patch covers no more than a single sample point. The intensity of each sub-patch is then computed and written into the corresponding element of the frame buffer.

Some problems that arise from this algorithm include:

  1. How is the subdivision process terminated?
  2. What happens to patches which do not cover a sample point?
  3. What if a sub-patch intersects the edge of the screen or is behind the eye?
  4. How many times must a patch be sub-divided?
  5. What problems may result from discrete sampling?

Catmull addresses each of these challenges in his thesis. Here, we will discuss the corresponding solutions at a high level.

Termination conditions
There are two termination conditions. The first termination condition for a patch or sub-patch (these terms will be used interchangeably from here on) is that it covers only a single sample point. A patch can be approximated by forming a polygon by connecting the patch corners with straight edges. The polygon is then checked for whether only a single sample point exists within its region, as shown in Figure 3.

Figure 3. A rectangular polygon approximation of a patch.

The second termination condition is clipping, i.e., a patch stops being partitioned if it will not appear on the screen. That is, if the patch in the image space, when projected onto the x-y plane, lies outside the region of the screen, or if the patch is behind the eye such that it will not be displayed, it does not require further subdivision. This requires a method of determining whether a patch is completely within or outside the screen. For patches that are partially outside of the screen, the general rule is to sub-divide them into smaller sub-patches and repeat the check.

The number of sub-divisions
The number of times a patch is sub-divided is proportional to the area of the patch on the screen. Generally, the ratio of the number of sub-divisions to the rectangular area of the patch is roughly 1/3; for curved patches the ratio may be larger.

The sampling problem introduces challenges
With the introduction of sample points, some natural issues arise. A patch with a very small area may not cover any sample point and would therefore disappear, as its intensity value is not assigned to any sample point. This can be solved by assigning the patch’s intensity value to the nearest sample point.

The sub-division algorithm described was first applied to bi-cubic patches. Bi-cubic patches are convenient for most applications. One of Catmull’s main contributions is improving the sub-division algorithm to be applied to other kinds of surfaces. This will be discussed in the next section.

Sub-dividing a Cubic Curve

Catmull presented a new difference equation for determining the midpoint of a cubic curve segment. The midpoint is essentially the average of its two end points minus a correction term. A similar method is used to determine the derivative at the midpoint.

  • Consider the cubic: f(t) = a x t^3 + b x t^2 + c x t + d. The problem is to determine f(t) when f(t-h) and f(t+h) are known,
  • where f(t+h) = a x (t+h)^3 + b x (t+h)^2 + c x (t+h) + d, and f(t-h) = a x (t-h)^3 + b x (t-h)^2 + c x (t-h) + d,
  • so f(t-h) + f(t+h) can be calculated to be 2 x f(t) + 2 x h^2 x ( 3 x a x t + b ),
  • the midpoint is then f(t) = ( f(t-h) + f(t+h) )/2 - h^2 x ( 3 x a x t + b ).

Here f(t) represents a cubic curve segment, and a, b, c, d are scalar parameters that describe the curve. h is a scalar step indicating the parametric distance from the point t.

We can see that the midpoint is just the average of the two end points minus a correction term. Since f(t-h) and f(t+h) are known, the only thing left is the correction term h^2 x ( 3 x a x t + b ). The correction term at t can similarly be found from the correction terms at (t-h) and (t+h).

  • Let g(t) = h^2 x ( 3 x a x t + b ) be the correction term at t,
  • then g(t-h) = h^2 x ( 3 x a x (t-h) + b ) and g(t+h) = h^2 x ( 3 x a x (t+h) + b ),
  • calculating g(t-h) + g(t+h) = 2 x g(t),
  • thus, g(t) = ( g(t-h) + g(t+h) ) /2 .

Combining the calculations above to determine the final difference equation for the midpoint f(t), we can see that

  • f(t) = ( f(t-h) + f(t+h) )/ 2 – ( g(t-h) + g(t+h) ) /2.

Note that h is modified as needed depending on the level of sub-division. We will not go into the details of computing the h’s but provide the intuition. This dependency of h on the subdivision level is necessary because with more sub-divisions, each patch becomes smaller. Thus, a midpoint is computed from two end points that are closer together than those of a larger patch. For example, consider the following (polygon) patch already divided into sub-patches A, B, and C.


In order to find the midpoint (point a) information of the top edge in sub-patch A, we require the information at the endpoints (a-h) and (a+h) of the edge. And in order to determine the midpoint (point b) information of the top edge in sub-patch B, the endpoints (b-h’) and (b+h’) are required. We can see that h > h’, and that a smaller sub-patch requires the h parameter to be smaller.
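The difference equations above translate directly into code. The sketch below is a minimal illustration (not Catmull's actual implementation): it subdivides a segment given the values and correction terms at its end points, and rescales the correction terms for the half-size segments, since halving h divides the h^2 factor by 4.

```python
def subdivide(f0, f1, g0, g1):
    # f0, f1: curve values at the segment end points t-h and t+h.
    # g0, g1: correction terms h^2 x (3 x a x t + b) at those end points.
    g_mid = (g0 + g1) / 2.0           # correction term at the midpoint
    f_mid = (f0 + f1) / 2.0 - g_mid   # midpoint value
    # Each half-segment has step h/2, so its correction terms
    # (proportional to h^2) are the old ones divided by 4.
    left = (f0, f_mid, g0 / 4.0, g_mid / 4.0)
    right = (f_mid, f1, g_mid / 4.0, g1 / 4.0)
    return f_mid, left, right
```

Notice that no cubic coefficients appear in the subdivision step itself: once the end-point values and correction terms are seeded, each level needs only additions and halvings.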

Catmull’s contribution in this regard was introducing a way to calculate the midpoints of curves, not just of the straight edges of polygons!

From an application perspective, this modified sub-division algorithm allows us to determine the midpoint at which a curve (as opposed to a straight line) is to be sub-divided. Mathematically, instead of taking the plain midpoint of two points, an extra correction term is added when calculating the midpoint from two end points. The calculations can be transformed into matrix operations for mathematical convenience. Readers who are interested in the mathematics may refer to the thesis for more details.

Extending Cubic Sub-division to Surfaces

As a natural continuation of sub-dividing a curve, we must address the problem of sub-dividing a surface (with curved edges). With a cubic curve there is a value and a correction term at each end; with a cubic patch there is a value and correction term at each corner. Sub-dividing this patch involves finding:

  1. the midpoints of each of the edges (sides) of the patch,
  2. the midpoint of the patch itself.

In a nutshell, the midpoints of each of the edges can be calculated as mentioned in the previous section. The midpoint of the patch can then be interpolated using the 4 midpoints found on the 4 edges of the patch.
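The two steps above can be sketched as follows. For brevity this illustration drops the cubic correction terms of the previous section (i.e. it treats the edges as straight), so each edge midpoint is a plain average and the patch midpoint interpolates the four edge midpoints:

```python
def subdivide_patch(c0, c1, c2, c3):
    # c0..c3: values at the four corners of a patch, listed in order
    # around its boundary.
    e01 = (c0 + c1) / 2.0   # midpoint of edge c0-c1
    e12 = (c1 + c2) / 2.0   # midpoint of edge c1-c2
    e23 = (c2 + c3) / 2.0   # midpoint of edge c2-c3
    e30 = (c3 + c0) / 2.0   # midpoint of edge c3-c0
    # Patch midpoint: interpolate the four edge midpoints.
    centre = (e01 + e12 + e23 + e30) / 4.0
    return (e01, e12, e23, e30), centre
```

In the full algorithm, each edge midpoint would instead come from the corrected midpoint formula, carrying its own correction term down to the four sub-patches.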

The Hidden Surface Problem

In order to create a realistic view of any object, we must display only the surfaces visible to the observer and omit those that are not visible. The identification and removal of these surfaces is called the hidden surface problem, or visible surface detection problem. This is visually explained in Figure 4.

Figure 4. The hidden surface problem. Any object being looked at by some observer should only display the parts which the observer can see. The remaining parts of the object are irrelevant to what must be displayed.

Catmull’s thesis discussed two approaches, the Z-buffer algorithm and the modified Newell algorithm.

A high-level description of the Z-buffer algorithm

The Z-buffer method is now one of the most commonly used methods for hidden surface detection. The Z-buffer method compares surface depths at each pixel position on the projection plane, i.e., the plane that is mapped to the display. In general, the z-axis represents depth. Depending on the direction of the observer, the algorithm initializes the Z-buffer with either the maximum or the minimum Z value. Raster elements that no other surface overlaps require no comparison and are written directly to the frame buffer.

Typically we consider the observer to sit on the positive z-axis and look towards the negative z-axis. The object with the smaller z-value is further away from the observer and an object with a larger z-value is closer to the observer. In the case of the observer looking in from the negative z-axis, the above is reversed. Thus, the Z-buffer method may depend on the viewing direction of the observer. Figure 5. illustrates the Z-buffer method when the observer sits at the negative z-axis looking towards the positive z-axis.

Figure 5. The observer is at the negative z-axis looking towards the positive z-axis. Objects with smaller depth are closer and are stored into the final Z-buffer, which is then displayed on the screen.

The Z-buffer method has several advantages. Hidden surface problems and intersections of arbitrary surfaces are handled trivially. That is, surfaces can be written into the buffer in any order with no additional processing. Many references for this method can be found online. However, the take-away concept of this algorithm is straightforward. Each (x, y, z) point of a polygon surface corresponds to the orthogonal projection onto the x-y plane, which then maps to a point on the raster display. At each point (x, y) of the raster display, objects are compared by using their depth (z-) values. Finally, as a note, the Z-buffer does run into problems when dealing with transparent surfaces. In this case, another common method known as the A-buffer algorithm is used.
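The depth test at the heart of the method fits in a few lines. The sketch below uses a hypothetical flattened input (a list of per-pixel fragments rather than whole surfaces) and follows the convention of Figure 5, where a smaller z means closer to the observer:

```python
import math

def zbuffer_render(width, height, fragments, background=0.0):
    # fragments: iterable of (x, y, z, intensity) tuples. Smaller z is
    # closer to the observer, so the depth buffer starts at "infinitely
    # far" and a fragment wins when its z is below the stored depth.
    depth = [[math.inf] * width for _ in range(height)]
    frame = [[background] * width for _ in range(height)]
    for x, y, z, intensity in fragments:
        if z < depth[y][x]:
            depth[y][x] = z
            frame[y][x] = intensity
    return frame
```

Because each fragment is tested independently, the fragments can arrive in any order, which is exactly the advantage described above.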

Other common methods for the hidden surface (visible surface detection) problem are listed below. Interested readers can find online resources for further research and understanding.

  • Scanline algorithm: Works on a row-by-row basis rather than a polygon-by-polygon or pixel-by-pixel basis.
  • Warnock’s algorithm: Recursive subdivision of a screen until areas are obtained that are trivial to compute. Mostly used as a base for other algorithms, and is very parallel friendly.
  • Painter’s algorithm: Sorts all the polygons in a scene by their depth and then paints them in this order, farthest to closest. Mostly used for (simple) video games.
  • Ray tracing algorithm: Essentially geometric objects and physics done backwards. Imagine simply reversing the path that a light ray took to get to your eye, tracing it back to the source (or vice versa).

From my research on this topic, the Z-buffer and Ray tracing algorithms are more commonly used today.

A high-level description of the modified Newell algorithm

The modified Newell algorithm is an algorithm for displaying polygons that sorts the polygons in z-order and paints the polygons in that order into a frame buffer. The polygons further away from the observer are painted first. Subsequent polygons may be written over those already in the buffer, thus eliminating obscured polygons. If two polygons intersect or are difficult to sort according to their z-values, they are split into smaller pieces until they can be sorted correctly. In summary, the algorithm consists of the following steps:

  • Sort polygons by their minimum Z value, from furthest to nearest,
  • For each polygon, check other polygons it may overlap with,
  • See if any of the polygons are behind the polygon being tested,
  • If so, move this polygon behind it,
  • Split overlapping polygons into smaller pieces and repeat the above steps with each partition.
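The painting stage of the steps above can be sketched as below. The input format is hypothetical (each polygon already reduced to a depth, a set of covered pixels, and an intensity), and the overlap tests and polygon splitting of the full modified Newell algorithm are omitted; only the back-to-front painting survives:

```python
def paint_back_to_front(polygons, width, height, background=0.0):
    # polygons: iterable of (depth, pixels, intensity), where depth grows
    # with distance from the observer and pixels is a set of (x, y)
    # raster coordinates the polygon covers.
    frame = [[background] * width for _ in range(height)]
    # Paint furthest-first: nearer polygons simply overwrite whatever
    # is already in the frame buffer, eliminating obscured polygons.
    for depth, pixels, intensity in sorted(polygons, key=lambda p: -p[0]):
        for x, y in pixels:
            frame[y][x] = intensity
    return frame
```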

Catmull combines the modified Newell algorithm with the Z-buffer algorithm to process objects on the image space and write the intended intensity values into the buffer.

Intensity, Sampling, Rastering, and Aliasing

Intensity

As mentioned previously, once a sub-patch covers only a single sample point, an intensity value is then assigned to the sample point. There are several ways of obtaining the intensity at each point.

  • Calculate the intensity using the normal (orthogonal) to the surface,
  • Define some intensity function to calculate the intensity,
  • Map the intensity values from some picture,
  • Modify existing intensities for shadow or transparency.

We will briefly touch on each of the above techniques below.

Using the normal to the surface
The normal of a surface is frequently used to calculate the intensity. A typical way of doing so is by taking the dot product between the light vector and the normal vector. Finding the normal vector, however, is not an easy task and is where the challenge lies in this method.
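As an illustrative sketch (not the thesis' exact shading model), a diffuse intensity from the dot product might look like the following; the zero clamp for faces pointing away from the light and the small ambient floor are assumptions added here for completeness:

```python
import math

def diffuse_intensity(normal, light_dir, ambient=0.1):
    # Normalize both vectors, then take their dot product; clamp at
    # zero so surfaces facing away from the light get no diffuse term,
    # and add a small ambient floor so they are not pitch black.
    def unit(v):
        n = math.sqrt(sum(c * c for c in v))
        return [c / n for c in v]
    n, l = unit(normal), unit(light_dir)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    return ambient + (1.0 - ambient) * diffuse
```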

Using an intensity function
Instead of the intensity being a function of the orientation of the surface (as in the normal method), intensity could be defined as a function of pressure, strain, height, density, etc.

Using a mapping
Photographs, drawings, or any picture can be mapped onto bi-variate patches. Ultimately, this method attempts to make correspondences between any point on a patch and an intensity of a picture (which is mapped to said patch).

Using intensity modification
Once an intensity value is written to a buffer, there may still be reasons to modify it, e.g., to handle shadows or transparency. For example, an intensity value which requires modification might be a convex combination of the old and new intensity value. Mathematically,
new_value = modification_value + T x ( old_value - modification_value ) = T x old_value + (1-T) x modification_value
where T is a scalar parameter between 0 and 1.
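In code, the convex combination above is a one-liner; T = 1 keeps the old buffer intensity untouched, T = 0 replaces it entirely with the modification value (e.g., a full shadow), and values in between blend the two:

```python
def modify_intensity(old_value, modification_value, t):
    # Convex combination of the old buffer intensity and the
    # modification value, controlled by a scalar t in [0, 1].
    assert 0.0 <= t <= 1.0
    return t * old_value + (1.0 - t) * modification_value
```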

For the interested reader, each of these methods is discussed in Catmull’s thesis in further detail.

We have now touched on most of Catmull’s contributions. Recall that generating pictures of curved patches requires knowledge of (1) the correspondence between points on the surface and the elements of the display raster, (2) hidden, or unseen parts of the patches, and (3) light intensities that should be displayed on the raster. Catmull developed a new difference equation to determine the midpoint of a cubic curve segment and bi-cubic patches. This aided in the sub-division problem of the patches. The sub-division algorithm is performed until each sub-patch covers a single raster element, accomplishing point (1) above. Catmull combined the Z-buffer algorithm and a modified Newell algorithm to tackle the hidden surface problem in point (2). Finally, Catmull discussed several techniques to determine the intensity values that are displayed, covering point (3).

In the final parts of his thesis, Catmull discussed unwanted artifacts generated when using a raster display and proposed some standard techniques to combat the problem. I will provide a brief introduction to the aliasing problem, but will not go into detail on how to combat it. The techniques Catmull discussed and used are standard in signal processing fields, and the goal of the next section is only to connect signal sampling to computer graphics.

Sampling, Rastering, and Aliasing

Simply put, a raster element (pixel) is a discrete sample of some continuous image. Sampling deals with the question: how densely do we need to sample something continuous to capture its essence accurately? Without diving into signals and systems theory, I will provide a quick example of the relationship between sampling and aliasing. Suppose you have a time signal you want to sample at regular intervals. The time duration between these sample points may result in significant differences in the resulting sampled signal. Figure 6. illustrates this by sampling a sine wave.

Figure 6 (top). The sampling is fast enough that the reconstructed signal will have the same frequency as the original signal; Figure 6 (bottom). The sampling is too slow, so the reconstructed wave will appear to have a much lower frequency than the original. This is called an aliased signal.

Thus, the sampling frequency must be high enough to correctly reconstruct the signal. In sampling theory, the Nyquist theorem states that to accurately reconstruct a signal, the sampling rate must be at least twice the highest frequency present in the signal. However, we will not dive further into signal processing here.
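The sine-wave example of Figure 6 can be reproduced numerically. A 5 Hz sine sampled at 4 Hz (below its 10 Hz Nyquist rate) produces exactly the same samples as a 1 Hz sine: the tone aliases down to |5 - 4| = 1 Hz and the two are indistinguishable after sampling.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, n):
    # Sample a pure sine of the given frequency at n regular intervals.
    return [math.sin(2 * math.pi * freq_hz * k / sample_rate_hz)
            for k in range(n)]

# Under-sampled 5 Hz sine vs. a genuine 1 Hz sine at the same 4 Hz rate:
aliased = sample_sine(5.0, 4.0, 8)
low = sample_sine(1.0, 4.0, 8)
assert all(abs(a - b) < 1e-9 for a, b in zip(aliased, low))
```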

With that said, what does this have to do with computer graphics? Because an image can always be seen as an intensity map of values defined at the pixel centres, it can also be seen as the point sampling of a continuous function, and writing pixel intensity values is exactly like sampling the function at the pixel centres.

Visually, aliasing appears as the stair-stepping of edges that occurs in an image when the resolution is too low. Figure 7. illustrates the aliasing effect.

Figure 7. The rendered image exhibits aliasing.

The aim of anti-aliasing is to avoid the effects of aliasing as much as possible. Many standard signal processing techniques showing how this is done can be found online. We will briefly discuss anti-aliasing at a high level; the description below assumes some background in signal processing.

Sharp edges and small objects within an image typically correspond to high frequencies when the image undergoes a Fourier transform. The raster display, on the other hand, can only reproduce low frequencies. Thus, the upper limit on the frequency is determined by the resolution of the raster display. During the sampling process, frequencies that are higher than those that can be reproduced by the raster become indistinguishable, or aliased.

There are two main categories of algorithms for anti-aliasing,

  • Pre-filtering: treats pixels as an area, and computes pixel colour based on the overlap of the scene’s objects with a pixel’s area
  • Post-filtering: renders the scene at higher resolution, and computes the pixel value by a (weighted) average of the subpixels (supersampling)

Catmull discussed the pre-filtering technique of area sampling in some detail. As mentioned, in area sampling, pixel intensities are calculated proportional to areas of overlap of each pixel with objects to be displayed.

One final contribution of the thesis is modifying the sub-division algorithm to allow for anti-aliasing. The sub-division algorithm divides a patch until each sub-patch covers one and only one raster element. Catmull modified the algorithm to allow for area sampling. Consider the following example.


Each square represents a raster element square. The crossings of the horizontal and vertical lines are denoted as vertices. The modification Catmull introduced had each sub-patch cover at most one vertex (as opposed to a sample point). The polygon (dotted lines) that approximates the patch is used for area calculations. The polygon is divided into four pieces, belonging to each of the four squares. Each divided area is then used with the area averaging algorithm to calculate the new pixel value. For example, consider the top-right sub-patch. We can see the object sits on top of the background, with each having their own intensity values. The pixel contains a portion of the object and a portion of the background. The final pixel value is a weighted average of the intensity values of the background and the object.
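The final weighted average described above can be sketched as a simple coverage-weighted sum; the input format (a list of coverage fractions and intensities per pixel) is an assumption made here for illustration:

```python
def area_sampled_pixel(pieces):
    # pieces: (coverage_fraction, intensity) pairs for everything that
    # overlaps one raster element square; the fractions must together
    # partition the pixel's area.
    total = sum(area for area, _ in pieces)
    assert abs(total - 1.0) < 1e-9
    # Pixel value = area-weighted average of the overlapping intensities.
    return sum(area * intensity for area, intensity in pieces)
```

For the top-right sub-patch example, an object at intensity 0.9 covering 30% of the pixel over a background at intensity 0.1 would give `area_sampled_pixel([(0.3, 0.9), (0.7, 0.1)])`.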

Concluding Remarks

In his thesis, Catmull developed a process for computer graphics that addressed a specific set of research questions concerning the science of rendering complex objects, including the especially difficult task of rendering curved surfaces. Catmull introduced methods to calculate the midpoints of cubic curves to enable the sub-division algorithm — which previously worked for polygon patches — to also partition patches with curved boundaries. In addition, he introduced the Z-buffering algorithm, which utilized image depth information to determine the 2-dimensional view of a 3-dimensional object in computer graphics. Finally, he pioneered a modified sub-division algorithm that allowed for anti-aliasing through area sampling, which was a ground-breaking improvement in modern computer graphics. Many of the techniques introduced in Catmull’s thesis are still widely used today in video games and animated films.


Author: Joshua Chou | Editor: H4O & Michael Sarazen
