Ray Trace with Affine Transforms

This month there is only one article, Ray-Tracing with Affine Transforms. We will be adding the following articles over the next few months. An high efficiency single phase motor (Patent No. 5,300,870, granted April 5, 1994). The IMint interface (Patent applied for). Contouring in Hyperspace. Strange Recursion. And many others.

(C)opyright 1995, Seven seas Software Inc., MathVISION Inc.

Ray-Tracing with Affine Transforms Otto J. A. Smith June 13, 1995

For a hard copy version of this paper mail requests to the author at This article is available in post script by anonymous FTP from pub/sites/7seas under the name CONTAIN.PS

Abstract:: We derive affine transformations that change the basis of a three dimensional world space. The new basis for this space is chosen in such a manner that many ray-tracing calculations are simplified, particularly when doing z-buffering. We begin with the simple problem of determining whether a point is contained in a triangle and extend our derivation to three space. We present some algorithm optimizations that can make real-world computer programs faster.

Triangle Containment:

Let Tri( v1, v2, v3) be a triangle in 2-space. Each v and x are 2-dimensional row vectors. The parameters p1 and p2 are scalars. Figure 1

Given a point x, determine if x is interior to Tri( v1, v2, v3). In order to make this determination use vectors from the triangle to form a basis for the space, then any point can be represented by a parametric equation:

(A.) x = v1 + p1( v2 - v1) + p2( v3 - v2)

Now observe the following:

When p1 = 1 and 0 < p2 < 1 ,then x is on the line segment ( v2 , v3) .
When p2 = 0 and 0 < p1 < 1 , then x is on the line segment ( v1 , v2) .
When 0 < p1 < 1 and 0 < p2 < 1 and p1 = p2 , then x is on the
line segment ( v1 , v3) .

We constructed equation (A.) in the manner given here in order to insure that the three observations made above were true. Now if p2 < p1 then x is on the same side of the line determined by ( v1, v3) as v2 . If p2 > p1 then x is on the opposite side of the line. In other words, the point x is inside the triangle Tri( v1, v2, v3) when all of the following are true:

p2 < p1
0 < p1 < 1
0 < p2 < 1

Given the triangle Tri( v1, v2, v3) and x one can calculate p1 and p2.

Given the triangle Tri( v1, v2, v3) alone, one can calculate a two by two working matrix W such that ( x - v1) W =[ p1, p2 ] where [ p1, p2 ] is a row vector.

We construct a matrix M and define W = Inv(M) . The notation Inv(M) is used to indicate the matrix that is the inverse of M . That is Inv(M)M = I , where I is the identity matrix.

x = v1 + p1( v2 - v1) + p2( v3 - v2)

Let M =



        |   v2 -  v1  |

        |   v3 -  v2  |

Then:

[ p1, p2 ] M = x - v1
[ p1, p2 ] = ( x - v1)Inv(M)
[ p1, p2 ] = ( x - v1) W

In applications where many points will be checked against a single triangle this is a fast algorithm. It is fastest if p1 is calculated before p2 and verified against the limits zero and one so the algorithm can terminate prematurely if the point x is not contained in the triangle. If there are many points, the cost of calculating W = Inv( M ) is amortized over many points. W is a feature of the triangle and not the point being checked which is used in a multiplication with the matrix Inv( M ). When M is singular, W does not exist since v1, v2 and v3, all lie on a straight line. If this is the case the point is contained in the "triangle" ( line segment ) only if it is on the line defined by any two verticies from v1 , v2 and v3, and in addition has verticies from this set in both a positive and a negative direction.

Discussion of Basic Algorithm:

I have found no references to the derivation of this technique in the literature although the technique presented here produces an algorithm that could also be derived starting from barycentric coordinates. A discussion of this relationship is included in the appendix for this paper.

Before extending the algorithm to three (and higher dimensional spaces) and showing how it can simplify Ray-Tracing, particularly when doing z-buffering, we would like to enumerate some of its advantages and disadvantages.

One traditional method of determining containment in a triangle, or convex polygon, is to generate the equations for the bounding lines and then to determine which side of each bounding line the point falls on.

A second technique is based on the Jorden curve theorem. footnote1

I believe there are several reasons why the advantages of the triangle and hypertriangle containment algorithms presented here are not commonly used. One reason is that most computer 3-D graphics geometry has been based on homogeneous footnote7 system of coordinates and transforms taken from the theory of projective geometry, rather than affine transformations based in the theory of linear algebra.

Another reason is that normally a basis for the space would be constructed from the vectors (v2 - v1 ) and (v3 - v1) ( instead of (v3 - v2) ) so that hidden advantages of choosing other sets of vectors for the basis of the space have not been explored. We will call techniques, such as the one presented above, in which the vectors are not constructed from a common origin, Free Choice techniques. We will call techniques in which a common origin is used, Common Origin techniques. In common origin techniques, one point in the triangle, ( or hyper-triangle ) is chosen as the origin. In order to generate the set of vectors that form the basis for the new coordinate system, the value of this origin is subtracted from the values of the remaining points in the triangle.

I posted a common-origin algorithm on the Internet without comment about its derivation and recieved a reply from Prof. Dr. Heinrich Giesen pointing out that it is equivalent to the Barycentric coordinate technique, which is a well known technique. The demonstration of equivalency is included in the Appendix .

We will not use the common origin technique in this paper because by using free choice techniques, we can customize the results of our basis change to give us additional information that is useful and increases the efficiency of our algorithms.

Multi Dimensional Extensions:

The techniques presented above are easily extended to multi-deminsional spaces.

Let Tri( v1, v2, ... , vn) be a hypertriangle in (n-1) -space.

Given a point x, determine if x is interior to Tri( v1, v2, ... , vn) .

In order to make this determination use vectors from the hypertriangle
Tri( v1, v2, ... , vn) to form a basis for the space.

The Free Choice Extension:

Any point x can be represented by a parametric equation:

(B.) x = v1 + p1( v2 - v1) + p2( v3 - v2) + ... + pn-1( vn - vn-1)

When the following conditions are true, the point is contained in the hypertriangle.

0 < pi < 1
Given any i , j such that i < j then pi > = pj

Common Origin Extension:

In the common origin technique represent an arbitrary point x as follows:

(C.) x = v1 + p1( v2 - v1) + p2( v3 - v1) + ... + pn-1( vn - v1)

When the following conditions are true, the point is contained in the hypertriangle.

0 < pi < 1
p1 + p2 + ... + pn < 1 .

Given the triangle
Tri( v1, v2, ... , vn) and x we can calculate p1 + ... + pn.

Given the triangle
Tri( v1, v2, ... , vn) alone we can calculate a working matrix
W = Inv( M ) such that ( x - v1) W =[p1, p2, ... , pn] .

In this paper we will most frequently use the free choice technique to choose vectors with which to calculate the matrix.

Two Perspective Problems:

There are two perspective problems that are easily solved using hypertriangle containment.

The two problems for which we disclose new solutions here are,

The ray tracing perspective projection problem.
The standard perspective problem.

The Standard Perspective Problem:

The standard perspective problem is as follows. Figure 4 We are given the following information:

The location of a single point t, in world coordinates in three-space.
The location of the focal point f, of an eye or a camera in world coordinates in three space.
The location of the projection plane P in world coordinates in three space. This location is determined by a set of three vectors.
P is defined by the set { r1, r2, r3 } . footnote2

We are asked to determine the point at which a line from the focal point of the eye f to the single point t intersects the projection plane P. If this point of intersection exists, we will call it t'. We are asked to determine the coordinates of t' in the rectangular coordinate system of the projection plane P. The projection plane represents the surface of a video display device, such as a CRT tube and the coordinate system in which we want the information returned is the two dimensional pixel coordinates of the CRT tube.

The Ray Tracing Perspective Problem:

The ray tracing perspective problem is as follows, Figure 5 We are given the following information:

The location of the focal point, f of an eye or a camera in world coordinates in three space.
The location of the projection plane P in world coordinates in three space, (this is three points, the origin of the screen, the upper left of the screen and the lower right).
A point x on P in world coordinates footnote3
A two dimensional triangle T = Tri( v1, v2, v3 ) in world coordinates. This is a two dimensional triangle embedded in three space, its verticies are vectors with three coordinates. In the real world systems, the triangle is a portion of the surface of an object being ray-traced.

We are asked to determine two different things:

Does the line starting at the focal point f and passing through the point x, intersect the triangle T? If the intersect exists call it x'.
If the intersection exists, what is a measure of the distance from the point x to its intersection on the triangle T at x'.

Ray Tracing Problem Solved:

Let us use the free choice tetrahedron containment algorithm to develope a ray tracing perspective technique. If the line starting at the focal point f and passing through the point x intersects the triangle T, the first item we are asked to determine, is equivalent to asking if the point x is contained in the tetrahedron Tri( v1, v2, v3, f) . That is, is x contained in the tetrahedron defined by the union of the triangle with the focal point.? ( T U f ). The second item we are asked to determine, is a measure of the distance from x to its intersection on T.

We use this measure for z-buffering. We get this measure as a side benefit when we use the free choice technique of detecting whether x is in Tri( v1, v2, v3, f ) . Let the point x be represented as: footnote4

(D.) x = f + p1( v1 - f) + p2( v2 - v1) + p3( v3 - v2)

Calculate the working matrix W = Inv( M ) that transforms a point into the new coordinate system represented by the above equation.

The three by three matrix of row vectors vectors M is represented as:

M =

        | v1 -  f  |

        | v2 -  v1  |

        | v3 -  v2  |

from the three by three matrix M calculate W = Inv( M ). Represent W = Inv( M ) as a matrix of column vectors.

W =

     | c1   c2   c3 |

The algorithm to obtain the information we need is then as follows:

Calculate the dot product p1 = ( x - f) * c1 . If this product is greater than one or less than zero the algorithm terminates. Point x is not in the tetrahedron.
Calculate the dot product p2 = ( x - f) * c2 . If this product is greater than p1 or less than zero the algorithm terminates. Point x is not in the tetrahedron.
Calculate the dot product p3 = ( x - f) * c3 . If this product is greater than p2 or less than zero the algorithm terminates. Point x is not in the tetrahedron.
If the algorithm has not terminated, and we reach this step, We can use p1 as a measure of the distance from x to the triangle T since p1 is simply the ratio of the length of the line segment ( f, x) to the length of the line segment ( f, x') where x' is the point at which the line intersects T. Since ( f, x) has a fixed length, we can use the ratio as a measure.

In an actual computer system several things must happen in order to perform a ray tracing of a world object. First, areas that are not partially contained on the screen are clipped. One technique for doing this is disclosed as part of the algorithm for the standard perspective technique footnote5, then the real world coordinates for each pixel on the display screen is generated, then the ray from the focal point through the screen pixel is compared against triangles. The value of p1, our measure, is used for z-buffering to determine which points are in front of or behind other points.

Standard Perspective Problem Solved:

In the standard perspective problem we are given the following information:

The location of a single point t in world coordinates in three-space.
The location of the focal point f of an eye or a camera in world coordinates in three-space.
The location of the projection plane P in world coordinates in three space, (this is three points, the origin of the screen, the upper left of the screen and the lower right.).

Let the viewing area of P that we are interested in be represent by three coordinates in world-space. P = { r1, r2, r3 } .

That is, P is represented by the set of three vectors { r1, r2, r3 } . In "real world" situations, the vectors ( r2 - r1 ) and ( r3 - r1 ) are frequently chosen to be orthogonal, simply because the viewing area of the CRT is rectangular and has right angle corners. We can think of r1 as the lower left hand corner of the CRT, r2 as the lower right hand corner and r3 as the upper left hand corner. Let t be represented by the equation,

(E.) t = f + p1( v1 - f) + p2( v2 - v1) + p3( v3 - v1)

This equation has been generated differently than any of the above equations. It has been constructed this way in order to make two questions easy to answer.

Is t a point that can be projected onto P such that t', the projected point is in the rectangle determined by { r1, r2, r3, r2 + ( r3 - r1) } ?
What are the coordinates of t' in the two dimensional coordinate system determined by P where: footnote6
1. r1 is chosen to have the coordinates (0,0)
2. r2 is chosen to have the coordinates (1,0)
3. r3 is chosen to have the coordinates (0,1)

Now construct W = Inv( M ) from M as determined by equation (E.) above where we express M as:

M =

        | v1 -  f  |

        | v2 -  v1  |

        | v3 -  v1  |

We calculate W = Inv( M ) from M and let the columns of W be represented as:

W =

       |  c1  c2  c3 |

Now in order to solve the standard perspective problem we do the following:

Calculate p1 = ( t - f) * c1 . If p1 < 1 then the point t does not project onto the rectangle of interest on P and the algorithm terminates.
Calculate p2 = ( t - f) * c2 . If p2 < 0 or p2 > 1 then the point t does not project onto the rectangle of interest on P and the algorithm terminates.
Calculate p3 = ( t - f) * c3. If p3 < 0 or p3 > 1 then the point t does not project onto the rectangle of interest on P and the algorithm terminates.
If the algorithm has not terminated, then the point t does project onto P in the area of interest and t' has the coordinates t' = ( p2/p1, p3/p1 )

Optimizing the algorithm:

There are several ways in which the algorithm can be optimized. Without going into extensive detail we will give several of them here. Some properties of the triangles are invarient from different viewpoints. In the ray tracing problem only one of the rows of the three by three matrix M depends upon the viewpoint when the matrix is generated using the free-choice technique. Consequently, two rows of the matrix M are invarient and need be calculated only once for all viewpoints. This has the added benefit, that if the inversion is done by calculating the determinant by expansion of minterms, some of the minterms need be calculated only once for all viewpoints. Optimizations similar to the above can be realized for triangles that share a common edge.

One can avoid dividing by the determinant when calculating the inverses and postpone this computation until the comparisons have been made. In this case comparisons are made against the values of the determinant instead of one, and the divisions are only made if the containment has been established and we need to complete the computation.

Any set of four non-coplanar (non-colinear) points in three space can act as a basis for the space. From these four points we can find an affine transformation that takes any point in world space into a new coordinate system defined by these points. We can also find an inverse transformation that takes points from our new coordinate system back into world space.

The affine transformation shares the advantages of the homogeneous coordinate system with transforms. In both systems it is possible to multiply (concatenate) transforms in order to obtain a new transform equivalent to applying the transforms in sequence. Transforms for scaling, reflecting, rotating and distorting are easily constructed and combined and the inverses of these operations are easily found. This fact is not new, and has been mentioned in the literature before. The main arguments in favor of using a homogeneous coordinate system have been that, first, it reduces the complexity of code. With modern object oriented languages the code is only minimally more complex for the basic operations with a great increase in simplicity for more complex operations. Secondly, homogeneous coordinate systems are used because of the ease of explaining there use and for historical reasons.

This paper is an outgrowth of a larger research project in computer graphics that began with a 3-D modelling and display program that ran on an Apple II-E written around 1983. That system did not use a homogeneous system of coordinates and transformations but used affine transforms instead to translate points and calculate perspective. The free choice algorithms are a recent invention derived from that work. Containment is calculated by generating an affine transformation that takes the the point we are interested into a space in which simple comparisons can determine containment and the new coordinate system contains desirable information.

In that 1983 computer system, "cameras", that is coordinate systems associated with a viewpoint were stored or created from four points in three space. For orthographic projections the three vectors generated from these four points were orthogonal. Each camera could be used to generate an affine transformation that would move a vector from world space to camera space. Perspective transforms were done as they are traditionally done rather than as presented herein above. Transforms could be convolved into new transforms as they are in systems using homogeneous coordinates. Inverses of transforms could also be calculated to take a vector from camera space to world space. Transforms could also be converted back to cameras.

The system provided a simple language in which sequences of transforms and objects could be generated mathematically.

Some major advantages of the system were.

It was significantly faster than equivalent systems on the same machine.
Planes and lines did not need to be expressed in their cartesian form since they were easily expressed simply as two points or three points respectively.
Projections onto polygons were simplified.
The 30% overhead associated with storage of homogeneous coordinate system matricies was avoided.
Transformations from any camera (coordinate system) to any other camera in the system was easily and simply achieved.

On a completely biased note, I prefer the kind of system presented here not only because it is fast, but because the Mathematics of what is happening appears to me to be simpler and more transparent to a user of the system.

Appendix:

The Barycentric Coordinate Technique

Represent an arbitrary point x using the equation:

(F.) x = v1 + p1( v2 - v1) + p2( v3 - v1)

Now the point x is contained in the triangle only when all of the following conditions hold:

When 0 < p2 and
When 0 < p1 and
When p1 + p2 < 1 .

We call this technique, the common origin technique. Figure 3 This is based upon a traditional convention that when calculating a new basis for a space, the origin of the vectors originate from a common point.

This is equivalent to the using barycentric coordinates. Barycentric coordinates are based upon the fact that any point can be represented as the sum of the verticies of a triangle times a weight, where the sum of the weights equals one.

(G.)

x = p0v1 + p1v2 + p2v3
p0 + p1 + p2 = 1

Let p0 = 1 - p1 - p2, then (G.)-1 above becomes

x = (1 - p1 - p2) v1 + p1v2 + p2v3 x = v1 + p1( v2 - v1) + p2( v3 - v1)

This is identical to (F.) above.
Return to referance to appendix

Foot Notes

footnote 1, Triangle Containment Techniques Compared

Several of these techniques are compared by Eric Haines in Volume 5, Number 3, September 2, 1992, of the Ray Tracing News, an internet electronic journal. For triangles, his version of the barycentric coordinate system algorithm worked best for triangles, but the advantages were rapidly lost for polygons with more than three sides. We have some optimizations of the algorithm that drastically speed up these calculations.
Return to place document
footnote 2, Coordinates of CRT

That is, the three points { r1, r2, r3 } determine a rectalinear area in three space that we are interested in, possibly the origin, the upper left hand corner and the lower right hand corner of a CRT tube.
Return to place document
footnote 3, Translating pixel to world coordinates.

Translating pixel coordinates to world coordinates is easily accomplished. Let w0 be the world space coordinate of the origin of the screen at pixel coordinates (0,0) , let w1 be the world space coordinates of the top left of the screen at pixel coordinates (0,399) , let w2 be the world space coordinates of the bottom right of the screen at pixel coordinates (639,0) . .
The world space coordinates of (X,Y) is then
w0 + Y( w1 - w0 )/399 + X( w2 - w0 )/639.
Generate 640 values for x and 400 values for y and access them by their indicies to avoid frequent recalculations when generating scan lines.
Return to place document
footnote 4, Other free choice vector constructs

It is also possible to substitute a representation as in equation (E.) below for equation (D.) and change some of the checking conditions. This may be desirable in some real world programs.
Return to place document
footnote5, Other algorithms for clipping are known.

There are other more effective ways of doing this clipping when using BSD trees and other data structures that partition the world space, rather than depending on the algorithms presented here alone.
Return to place document
footnote6, Coordinates for projection plane should match CRT

This coordinate system is used to make reading this paper easy. In "real world" computer applications, the coordinates would be chosen to represent pixel coordinates on a CRT tube.
In otherwords, for a full screen 400 X 640 VGA system appropriate coordinates might be r1=(0,0), r2=(640,0), r3=(0,400).
Return to place document
footnote7, Definition of homogeneous

"homogenous", as used here does NOT mean a homogeneous system of equations, but derives from a projective geometry construct. Homogeneous coordinates are used in computer graphics as a redundant method of storing vectors. That is in 3-space a vector of length 4 is used to store a coordinate. The fourth coordinate is used to scale the first three coordinates. For example (12, 8, 4, 4), and (9, 6, 3, 3) both represent the coordinate (3, 2, 1).
In a homogenous system of coordinates, the equivalent of a matrix multiply, combined with a vector addition, is stored in a four by four matrix.
Return to place document

To MathVision Inc., (formerly 7seas software) Corporate page.

Appendix:

Foot Notes

footnote 1, Triangle Containment Techniques Compared

footnote 2, Coordinates of CRT

footnote 3, Translating pixel to world coordinates.

footnote 4, Other free choice vector constructs

footnote5, Other algorithms for clipping are known.

footnote6, Coordinates for projection plane should match CRT

footnote7, Definition of homogeneous