Topics in computer graphics

Ken Perlin



Images and colors

How does a computer store an image?

An image looks like a two dimensional array of pixels, nCols wide by nRows high, but it is generally stored in the computer as a one dimensional array of pixel values. The index into this array of a given pixel is:

row * nCols + col
In the illustration to the right, you can move your mouse over the image to select a single pixel. You can also select a pixel by moving your mouse over the linear array below the image. Each element of this array corresponds to exactly one pixel of the image.
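For example, here is a minimal JavaScript sketch of this indexing (the helper name pixelIndex is just illustrative):

    // pixels are stored row by row in a one dimensional array, one value per pixel.
    function pixelIndex(col, row, nCols) {
       return row * nCols + col;
    }

    // Example: in a 640-wide image, the pixel at column 10, row 2 is at index 1290.
    var i = pixelIndex(10, 2, 640);   // 2 * 640 + 10 == 1290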

How humans see colors

The human retina actually contains three different kinds of color receptor cells, each of which has a range of sensitivity to different parts of the visible light spectrum (which ranges from about 400 nanometers to 700 nm in wavelength). The illustration on the right shows an approximation to the color sensitivity of each of these types of receptors.

We approximate this in a computer by three numbers at each pixel: one for red, green and blue, respectively. But this is an imperfect approximation. In fact there are many combinations of [r,g,b] that the human eye cannot properly perceive, because our three types of color receptors overlap across the spectrum, as you can see by moving your mouse across the illustration to the right.

Complicating things even further, the "red", "green", and "blue" components of your computer screen each actually contain a mix of light frequencies.


Representing shapes with polygons

Cube

A unit cube is the cube that goes from -1.0 to +1.0 in all three dimensions. Its eight vertices are [-1,-1,-1], [-1,-1,+1] ... [+1,+1,+1].

It is fairly easy to represent a unit cube with triangles.

Move your mouse over the illustration on the right to see one way to make a cube out of twelve triangles.

Cylinder

A cylinder can be approximated by triangles by constructing an approximation to a round tube and two approximately circular end caps.

Neither the tube nor the end caps will be truly round; rather, they will be built around an N-sided polygon which only approximates a circle. As N gets larger, the approximation will become better, while the cost of storing and computing the shape will go up.

You can see the appearance of different approximations by moving your mouse around the illustration to the right.

Sphere

There are many ways to approximate a sphere with triangular meshes. One way is by building a longitude/latitude grid.

Another way to approximate a sphere with polygons is to subdivide each of the six faces of a unit cube into a mesh, and then "inflate" the cube.

This inflation is done by scaling each vertex [x,y,z] of the cube so that x² + y² + z² is the same for all vertices.

After this is done, all vertices will be the same distance from the origin, and the cube will deform into a sphere.

This method has the advantage that it is very simple to implement, while producing relatively similar sized polygons everywhere on the sphere.

Finally, if you want the result to consist only of triangles, you can split each of the squares into two triangles along one of its diagonals.


Homogeneous coordinates

We represent a point in homogeneous coordinates as [x,y,z,w], which is used to represent the physical location [x/w, y/w, z/w].

So a coordinate transformation could consist of X, Y and Z direction vectors and a position vector T as the translated origin. These can be written as the columns of a 4×4 matrix, with [0 0 0 1] as the bottom row, as shown below:

 Xx  Yx  Zx  Tx 
 Xy  Yy  Zy  Ty 
 Xz  Yz  Zz  Tz 
 0   0   0   1  

Forming a ray

Computing V

A simple place to put the camera eye point is along the positive z axis, so we can look at objects at the origin. If we place this eye point a distance of fl away (where "fl" is short for "focal length"), then we get:

V = [0.0, 0.0, fl, 1.0]

The homogeneous coordinate of V is 1.0, which indicates that V is a point, not a direction.

Computing W

We want the ray direction W to "scan" the image, as u and v each vary from 0.0 to 1.0. For any given value of [u,v], we can define a ray direction:

W = normalize([u-0.5, 0.5-v, -fl, 0.0])
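Here is a minimal JavaScript sketch of forming such a ray, assuming u and v have already been mapped into the 0.0 to 1.0 range (the helper names are illustrative):

    // Form the eye point V and ray direction W for image coordinates (u,v).
    function formRay(u, v, fl) {
       var V = [0, 0, fl, 1];                         // eye point (w == 1: a point)
       var W = normalize([u - 0.5, 0.5 - v, -fl, 0]); // ray direction (w == 0)
       return { V: V, W: W };
    }

    // Normalize the x,y,z part of a homogeneous vector, leaving w unchanged.
    function normalize(a) {
       var r = Math.sqrt(a[0]*a[0] + a[1]*a[1] + a[2]*a[2]);
       return [ a[0]/r, a[1]/r, a[2]/r, a[3] ];
    }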


Ray tracing to a sphere

Equation of a sphere, centered at c, with radius r:
(x-cx)² + (y-cy)² + (z-cz)² - r² = 0

Substitute components of ray equation into sphere equation:

Substituting:

x → Vx + t Wx
y → Vy + t Wy
z → Vz + t Wz
produces:
(Vx + t Wx - cx)² + (Vy + t Wy - cy)² + (Vz + t Wz - cz)² - r² = 0
which equals:
(t Wx + (Vx - cx))² + (t Wy + (Vy - cy))² + (t Wz + (Vz - cz))² - r² = 0
Regrouping terms, we get:

t² ( Wx² + Wy² + Wz² ) +

t ( 2 ( Wx (Vx - cx) + Wy (Vy - cy) + Wz (Vz - cz) ) ) +

(Vx - cx)² + (Vy - cy)² + (Vz - cz)² - r² = 0

which equals:
(W•W) t² + 2 W•(V-c) t + (V-c)•(V-c) - r² = 0

So we need to solve the quadratic equation A t² + B t + C = 0, where:

A = W•W

B = 2 W•(V-c)

C = (V-c)•(V-c) - r²

and the quadratic formula gives:
t = (-B ± √(B² - 4AC)) / 2A
Since W is normalized, the value of W•W is always 1.0, so the quadratic equation in this case simplifies to:
t = (-B ± √(B² - 4C)) / 2

Interpreting the results:

If there are no real roots, then the ray has missed the sphere.

Otherwise, the smaller of the two roots is where the ray enters the sphere, and the larger of the two roots is where the ray exits the sphere.

Using the root value to find the surface point:

Once we have found the smaller root t, we can substitute it back into the ray equation to get the surface point S:

S = V + t W
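Putting the pieces together, a ray/sphere intersection might be sketched in JavaScript as follows (vectors are plain [x,y,z] arrays; the function name is illustrative, and it returns null on a miss):

    // Intersect ray V + t W with a sphere centered at c with radius r.
    // W is assumed normalized, so A == 1 and we solve t*t + B*t + C == 0.
    function raySphere(V, W, c, r) {
       var D = [ V[0]-c[0], V[1]-c[1], V[2]-c[2] ];        // V - c
       var B = 2 * (W[0]*D[0] + W[1]*D[1] + W[2]*D[2]);    // 2 W•(V-c)
       var C = D[0]*D[0] + D[1]*D[1] + D[2]*D[2] - r*r;    // (V-c)•(V-c) - r²
       var discr = B*B - 4*C;
       if (discr < 0)
          return null;                                     // ray misses the sphere
       var t = (-B - Math.sqrt(discr)) / 2;                // smaller root: entry point
       return [ V[0] + t*W[0], V[1] + t*W[1], V[2] + t*W[2] ];  // surface point S
    }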


Simple lighting model:

Lights at infinity (like the Sun)

We are going to assume for now that light sources are infinitely far away, or at least far enough away that they can be considered infinitely far for practical purposes, like the Sun, which is 93 million miles from Earth.

This means that the direction vector L to the light source will have the same value for all points in the scene:

L = [ Lx, Ly, Lz, 0 ]

Lambert shading

We are going to assume for now that the surface is a perfect scattering diffuser, like moondust. In this case, incoming light from any direction scatters equally out to all directions. Such a surface is called Lambertian.

The brightness of a Lambertian surface depends only upon the cosine between its surface normal and the direction toward the light source.

Computing the surface normal

For a sphere, it is easy to compute the surface normal. It is simply the unit length (ie: normalized) vector from the center of the sphere toward the surface point:

n = normalize(S - c)
Once we know the surface normal vector n, then we can calculate Lambertian reflectance by:
[r,g,b] * max( 0, n•L )

Where [r,g,b] is the color of the surface. There is usually some ambient light bouncing around in the scene in all directions. The above equation can be modified to account for an approximation to that ambient light:

[r,g,b] * (A + (1.0 - A) * max( 0, n•L ))
The ambient term A is typically set to a low value, such as 0.1.
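A minimal sketch of this shading computation in JavaScript (colors and vectors as plain arrays; A is the ambient value):

    // Lambert shading with a simple ambient term.
    // n and L are unit length vectors; rgb is the surface color; A is ambient (e.g. 0.1).
    function lambert(rgb, n, L, A) {
       var nDotL = n[0]*L[0] + n[1]*L[1] + n[2]*L[2];
       var b = A + (1 - A) * Math.max(0, nDotL);
       return [ rgb[0]*b, rgb[1]*b, rgb[2]*b ];
    }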


Reflections:

Physical model: clear plastic with embedded dye particles

There are many types of materials. One common type is plastic with embedded dye particles. Photons that penetrate the plastic will bounce off multiple particles. If the photon is not absorbed, it emerges in a random direction, to create Lambertian reflectance.

Those photons that simply bounce off the surface will create mirror-like reflections, if the surface of the plastic is smooth. This portion of the surface reflectance can be handled by computing a reflection ray, and then ray tracing again into the scene starting from the point of emergence.

Computing the reflection vector

Given an incoming ray direction W and a surface normal direction n, we can calculate the emergent reflection direction by:

R = 2 (-W•n) n + W

Shooting the ray (no recursion in shader programs)

We can therefore form a new ray, starting a small distance ε outside the sphere surface, as:

V' = S + ε R

W' = R

You should use a small positive value for ε, such as 0.001.
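A sketch of forming this reflected ray in JavaScript, assuming the surface point S, the incoming direction W and the unit surface normal n are already known:

    // Reflect incoming direction W about unit surface normal n,
    // then start the new ray a small distance epsilon outside the surface.
    function reflectionRay(S, W, n, epsilon) {
       var wDotN = W[0]*n[0] + W[1]*n[1] + W[2]*n[2];
       var R = [ W[0] - 2*wDotN*n[0],
                 W[1] - 2*wDotN*n[1],
                 W[2] - 2*wDotN*n[2] ];               // R = 2(-W•n)n + W
       var Vp = [ S[0] + epsilon*R[0],
                  S[1] + epsilon*R[1],
                  S[2] + epsilon*R[2] ];              // V' = S + epsilon R
       return { V: Vp, W: R };                        // W' = R
    }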

Adding in the reflection

The reflected ray can simply be multiplied by some [r,g,b] color and added to the final color of the surface point.

Note that you cannot have the ray tracer call itself recursively in the shader, because WebGL shader programs which run on the GPU do not permit recursive function calls. But you can call the ray tracer for the primary ray, and then call it again to compute a reflection ray.


Shadows:

Shooting the shadow ray

Shadows are a bit like reflections, but simpler. To compute whether a surface point is in shadow, simply shoot a ray toward the light source. If any object is hit by the ray, then the surface point is in shadow. The shadow ray is formed as follows:

V' = S + ε L

W' = L

Using the result

The effect of a surface point being in shadow is that the point will have only ambient reflectance [Ar,Ag,Ab].


Ray tracing to general second order surfaces

First a special case: ray tracing to a unit radius cylindrical tube

The implicit function for a unit radius, infinitely long cylindrical tube along the z axis is:

x² + y² - 1 ≤ 0
We can solve the ray equation manually, like we did for the sphere, as follows:

  • Substituting in V+tW for (x,y,z):
    (Vx + t Wx)² + (Vy + t Wy)² - 1 ≤ 0
  • Expanding out:
    (Wx² + Wy²) t² + (2 Vx Wx + 2 Vy Wy) t + (Vx² + Vy² - 1) ≤ 0
  • Which just equals (using only the x and y components in each dot product):
    (W•W) t² + 2(V•W) t + (V•V - 1) ≤ 0
  • So to solve the quadratic equation in t:
    A = W•W
    B = 2 (V•W)
    C = V•V - 1

But this would not give us the ability to transform the cylinder, nor would it generalize to other quadratic shapes, including cones, paraboloids and hyperboloids.

So instead, let's work on the general case of being able to transform any implicit quadratic function by a 4×4 matrix, and then ray trace to it.

The general second order implicit surface:

Any quadratic implicit surface can be expressed as follows:

ax² + by² + cz² + dyz + ezx + fxy + gx + hy + iz + j ≤ 0
The ten coefficients of this equation can be stored as a vector:
[a, b, c, d, e, f, g, h, i, j]
The unit sphere, the infinite unit cylinder and the infinite unit cone, respectively, can all be expressed in such a form:
Sphere:   x² + y² + z² - 1 ≤ 0    [1, 1, 1, 0, 0, 0, 0, 0, 0, -1]
Cylinder: x² + y² - 1 ≤ 0         [1, 1, 0, 0, 0, 0, 0, 0, 0, -1]
Cone:     x² + y² - z² ≤ 0        [1, 1, -1, 0, 0, 0, 0, 0, 0, 0]
Raytracing to a general second order implicit surface:

Ray tracing to the general form just requires substituting the ray into this equation to get a quadratic equation in t, exactly as we did above in the case of the cylinder:

  • Substituting in V+tW for (x,y,z):
    a(Vx+tWx)² + b(Vy+tWy)² + c(Vz+tWz)² + d(Vy+tWy)(Vz+tWz) + e(Vz+tWz)(Vx+tWx) + f(Vx+tWx)(Vy+tWy) + g(Vx+tWx) + h(Vy+tWy) + i(Vz+tWz) + j ≤ 0
  • Expanding out:
    a(Vx+tWx)(Vx+tWx) + b(Vy+tWy)(Vy+tWy) + c(Vz+tWz)(Vz+tWz) + d(Vy+tWy)(Vz+tWz) + e(Vz+tWz)(Vx+tWx) + f(Vx+tWx)(Vy+tWy) + g(Vx+tWx) + h(Vy+tWy) + i(Vz+tWz) + j ≤ 0
  • Rearranging terms:
    (aWx² + bWy² + cWz² + dWyWz + eWzWx + fWxWy) t² + (2aVxWx + 2bVyWy + 2cVzWz + d(VyWz+VzWy) + e(VzWx+VxWz) + f(VxWy+VyWx) + gWx + hWy + iWz) t + (aVx² + bVy² + cVz² + dVyVz + eVzVx + fVxVy + gVx + hVy + iVz + j) ≤ 0
  • So to solve the quadratic equation in t:
    A   = aWx² + bWy² + cWz² + dWyWz + eWzWx + fWxWy
     
    B   = 2(aVxWx + bVyWy + cVzWz) + d(VyWz+VzWy) + e(VzWx+VxWz) + f(VxWy+VyWx) + gWx + hWy + iWz
     
    C   = aVx² + bVy² + cVz² + dVyVz + eVzVx + fVxVy + gVx + hVy + iVz + j
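A sketch of this computation in JavaScript, for a coefficient vector q = [a,b,c,d,e,f,g,h,i,j] and a ray V + tW (the function name is illustrative):

    // Return [A, B, C] of the quadratic A t² + B t + C for the quadric q along ray V + tW.
    function quadricABC(q, V, W) {
       var a=q[0], b=q[1], c=q[2], d=q[3], e=q[4], f=q[5], g=q[6], h=q[7], i=q[8], j=q[9];
       var A = a*W[0]*W[0] + b*W[1]*W[1] + c*W[2]*W[2]
             + d*W[1]*W[2] + e*W[2]*W[0] + f*W[0]*W[1];
       var B = 2*(a*V[0]*W[0] + b*V[1]*W[1] + c*V[2]*W[2])
             + d*(V[1]*W[2] + V[2]*W[1]) + e*(V[2]*W[0] + V[0]*W[2]) + f*(V[0]*W[1] + V[1]*W[0])
             + g*W[0] + h*W[1] + i*W[2];
       var C = a*V[0]*V[0] + b*V[1]*V[1] + c*V[2]*V[2]
             + d*V[1]*V[2] + e*V[2]*V[0] + f*V[0]*V[1]
             + g*V[0] + h*V[1] + i*V[2] + j;
       return [A, B, C];
    }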


Applying a linear transform to a second order surface

Let us first review the simpler case of linear surfaces: Half space P = (a, b, c, d) contains point p = (x,y,z), if and only if ax + by + cz + d ≤ 0. This can also be expressed as the product:

                  x
   a  b  c  d  •  y   ≤   0
                  z
                  1

or, in other words, P pᵀ ≤ 0. To find the plane that contains transformed points M pᵀ, you need to replace P with P M⁻¹.

Similarly, we can express the quadratic equation

ax² + by² + cz² + dyz + ezx + fxy + gx + hy + iz + j ≤ 0
as the following double product:

                  a  f  e  g       x
   x  y  z  1  •  0  b  d  h   •   y   ≤   0
                  0  0  c  i       z
                  0  0  0  j       1

or in other words, p Q pᵀ ≤ 0, where Q denotes the 10 quadratic coefficients arranged into the above 4×4 matrix.

This means that if you want to find the three-variable quadratic equation that contains transformed points M pᵀ, you need to replace Q by   (M⁻¹)ᵀ Q M⁻¹   because that will give you:

(M pᵀ)ᵀ     (M⁻¹)ᵀ Q M⁻¹     (M pᵀ)     =

p Mᵀ     (M⁻¹)ᵀ Q M⁻¹     M pᵀ     =

p Q pᵀ

when you create the product of

(M⁻¹)ᵀ Q M⁻¹
you can end up with non-zero values in all sixteen slots of your 4×4 matrix. This is because in addition to x*y, x*z, x*1, y*z, y*1, z*1 terms, you also end up with y*x, z*x, 1*x, z*y, 1*y, 1*z terms, as follows:

x*x  x*y  x*z  x*1
y*x  y*y  y*z  y*1
z*x  z*y  z*z  z*1
1*x  1*y  1*z  1*1
Since multiplication of numbers is commutative, you can just add the six numbers in the lower left into their mirror locations in the upper right, as follows:

A  E  I  M          A  E+B  I+C  M+D
B  F  J  N    →     0   F   J+G  N+H
C  G  K  O          0   0    K   O+L
D  H  L  P          0   0    0    P
This will leave you with just the ten unique coefficients of the quadratic in three variables, in the upper right of your transformed matrix.
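As a sketch, folding the lower triangle back into the upper one and reading off the ten coefficients might look like this in JavaScript, where the matrix is stored as a plain array of four rows:

    // Fold the lower left entries of a 4x4 quadric matrix into their mirror
    // positions, then read off the ten coefficients [a,b,c,d,e,f,g,h,i,j].
    function quadricCoefficients(m) {        // m[row][col]
       var a = m[0][0],
           b = m[1][1],
           c = m[2][2],
           j = m[3][3],
           f = m[0][1] + m[1][0],            // xy terms
           e = m[0][2] + m[2][0],            // zx terms
           g = m[0][3] + m[3][0],            // x terms
           d = m[1][2] + m[2][1],            // yz terms
           h = m[1][3] + m[3][1],            // y terms
           i = m[2][3] + m[3][2];            // z terms
       return [a, b, c, d, e, f, g, h, i, j];
    }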

Computing the surface normal:

Once you have computed your transformed values for [a,b,c,d,e,f,g,h,i,j], you can use high school differential calculus to compute its vector valued derivative:

f(x,y,z) = ax² + by² + cz² + dyz + ezx + fxy + gx + hy + iz + j

f'(x,y,z) = [  2ax + ez + fy + g  ,  2by + dz + fx + h  ,  2cz + dy + ex + i  ]

Just normalize this derivative to get the normal at the surface point:
N = normalize(f'(x,y,z))
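A sketch of this normal computation in JavaScript, using the same coefficient ordering [a,b,c,d,e,f,g,h,i,j]:

    // Surface normal of the quadric q = [a,b,c,d,e,f,g,h,i,j] at point p = [x,y,z].
    function quadricNormal(q, p) {
       var x = p[0], y = p[1], z = p[2];
       var nx = 2*q[0]*x + q[4]*z + q[5]*y + q[6];   // 2ax + ez + fy + g
       var ny = 2*q[1]*y + q[3]*z + q[5]*x + q[7];   // 2by + dz + fx + h
       var nz = 2*q[2]*z + q[3]*y + q[4]*x + q[8];   // 2cz + dy + ex + i
       var r = Math.sqrt(nx*nx + ny*ny + nz*nz);
       return [nx/r, ny/r, nz/r];
    }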


Light sources not at infinity

Light sources do not need to be at infinity. A local light source can add interesting chiaroscuro effects to your scene.

Unlike light sources at infinity, a local light source at some position (x,y,z) in your scene will more brightly illuminate surfaces that are near to it, and less brightly illuminate surfaces that are farther from it.

If a local light source were an infinitely small point, then intensity drop-off from that point would be 1/r², where r is the distance of the surface from (x,y,z). In reality, light sources, such as desk lamps, tend to have some non-zero extent, and so their intensity drops off with distance more slowly than 1/r².

A simple computational trick to model this slower dropoff is to define a power p, where 1.0 ≤ p ≤ 2.0, and define the dropoff function as 1/rᵖ.

Another visual trick you might use with local light sources is to give a very high ambient component to the surface material of one of your objects (eg: a sphere), and then place a local light source inside this object. The object will then appear to be a light source that illuminates the other objects in your scene.


Fog and atmosphere

Fog and atmosphere is very easy to implement and can make your ray traced scenes look a lot cooler.

In the presence of fog, objects gradually fade from view, as a function of their distance from your eye. Uniform fog (ie: fog with the same density everywhere) is quite easy to model and render. The key observation is that for any uniform fog there is some distance D that light will travel before half of the light has been scattered away by fog particles. The other half of the light continues along unobstructed.

Now consider what happens when the light which has made it through the fog travels further by another equal distance D. Half of the remaining light will now be scattered away, and half of the remaining light will make it through the second distance. In other words, the fraction of light that makes it through a distance 2D unobstructed is (1/2)². And in general, the fraction of light that makes it through a distance kD is (1/2)ᵏ.

To calculate uniform fog which scatters half of its light every multiple of D, you need to measure the distance d traveled by the ray (ie: the distance from the ray origin to the surface point that the ray hits). The proportion of this light that makes it through the fog unscattered will be t = (1/2)^(d/D). The effect of the fog can be shown by mixing t·surfaceColor + (1-t)·fogColor, where surfaceColor is the color that you would have calculated had there been no fog. Some uniform shade of light gray, such as (0.8,0.8,0.8), generally works well for the fog color.
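A sketch of applying this uniform fog in JavaScript:

    // Mix the surface color toward the fog color, given the distance d
    // traveled by the ray and the half-scattering distance D of the fog.
    function applyFog(surfaceColor, fogColor, d, D) {
       var t = Math.pow(0.5, d / D);          // fraction of light that gets through
       return [ t*surfaceColor[0] + (1-t)*fogColor[0],
                t*surfaceColor[1] + (1-t)*fogColor[1],
                t*surfaceColor[2] + (1-t)*fogColor[2] ];
    }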




Simple boolean shape combinations

Trying to implement the general case of boolean shapes is very ambitious.

So for now let's just talk about the simplest case (and a very useful one): intersection of convex shapes.

A convex shape is one that is not reentrant. That is, once a ray has entered the shape, the ray can then exit the shape at most once, and then never reenter that shape again. For example, a cube and a cylinder are convex, but a donut is not, since a single ray can enter and then exit the same donut twice.

Unlike the general case, the rule when raytracing for intersection of two convex shapes is very simple:

  • Let A and B be two shapes.
  • Let V+tW be a ray.
  • Let the entry and exit roots of A be tA0 and tA1, respectively.
  • Let the entry and exit roots of B be tB0 and tB1, respectively.
The intersection A∩B is found as follows:
  • The ray will enter A∩B at max(tA0,tB0)
  • The ray will exit A∩B at min(tA1,tB1)
Note: if max(tA0,tB0) > min(tA1,tB1), this means the ray does not intersect A∩B.
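A sketch of this rule in JavaScript, where each shape's roots along the ray are given as a [tIn, tOut] pair:

    // Intersect the entry/exit intervals of two convex shapes along one ray.
    // Returns null if the ray misses the intersection A∩B.
    function intersectRoots(tA, tB) {        // tA = [tA0, tA1], tB = [tB0, tB1]
       var t0 = Math.max(tA[0], tB[0]);      // where the ray enters A∩B
       var t1 = Math.min(tA[1], tB[1]);      // where the ray exits A∩B
       return t0 > t1 ? null : [t0, t1];
    }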

Example

To ray trace to a unit cylinder (a tube with front and back end-caps):

  • Start with the infinite unit tube:   x² + y² - 1 ≤ 0
  • Then intersect with half space:   z - 1 ≤ 0
  • Then intersect with half space:   -z - 1 ≤ 0


A note on matrix inversion

In order to solve the general second order equation, you will need inverse matrices. There is a well known algorithm for inverting an n×n matrix by Gaussian elimination, but alas it is not a good fit for hardware accelerated shaders.

One way to deal with this is by building the inverse matrix in stages. Every time you do an operation that would transform the forward matrix, such as translation, rotation or scale, perform the equivalent operation to transform the corresponding inverse matrix.

For example, suppose you want to first translate by (tx,ty,tz), then rotate by θ about the x axis, then scale by (sx,sy,sz):

 1  0  0  tx 
 0  1  0  ty 
 0  0  1  tz 
 0  0  0  1  

 1  0  0  0 
 0  c -s  0 
 0  s  c  0 
 0  0  0  1 

 sx 0  0  0 
 0  sy 0  0 
 0  0  sz 0 
 0  0  0  1 
In the second matrix above, c = cos(θ) and s = sin(θ).

In shader code, this might look something like:

mat4 m = identityMatrix();
m = m * translationMatrix(tx,ty,tz);
m = m * xRotationMatrix(theta);
m = m * scaleMatrix(sx,sy,sz);

You can compute the inverse of the resulting matrix by applying the inverse of each successive step, each time multiplying on the left rather than multiplying on the right:

mat4 mInv = identityMatrix();
mInv = translationMatrix(-tx,-ty,-tz) * mInv;
mInv = xRotationMatrix(-theta) * mInv;
mInv = scaleMatrix(1./sx,1./sy,1./sz) * mInv;

The product m*mInv is guaranteed to be the identity matrix, so this method does indeed compute the inverse.

Using this approach, you can even, if you like, implement functions translate(x,y,z), rotateX(theta), rotateY(theta), rotateZ(theta) and scale(x,y,z) that each update both a forward and an inverse transformation.

Note also that when doing rotations, the following three rotation primitives are convenient building blocks:

Rotation about x:

 1  0  0  0 
 0  c -s  0 
 0  s  c  0 
 0  0  0  1 

Rotation about y:

 c  0  s  0 
 0  1  0  0 
-s  0  c  0 
 0  0  0  1 

Rotation about z:

 c -s  0  0 
 s  c  0  0 
 0  0  1  0 
 0  0  0  1 


Procedural texture

Procedural textures provide a powerful and flexible way to create naturalistic and/or interesting variations in surface appearance. One nice thing about them is that they allow you to use the (x,y,z) position of a surface point itself to tell you where to place the texture on the surface. This is particularly convenient for ray tracing.

You can create a great variety of naturalistic surface textures by using the band limited noise function (see below), which has the nice property that it contains variations only around a specific scale. This property allows you to combine calls to noise at different scales to create a great variety of texture effects.

For example, a procedural stripe pattern that would look very artificial by itself:

f(x,y,z) = sin(x)
can be made to look more natural by varying the position of the stripes through the use of noise:
f(p) = sin(px + a * noise(f*p))
In the above example, you can use f to vary the frequency of the noise pattern, and a to vary its amplitude.

You can create variations in the stripe pattern at many different scales by adding sums of scaled noise:

f(x,y,z) = Σi sin(px + a * noise(2^i * f * p) / 2^i)
You can vary this to create more realistic marble patterns by adding "creases" to the noise at each scale. You can do this by using the absolute value |noise(p)| rather than noise(p):
f(x,y,z) = Σi sin(px + a * |noise(2^i * f * p)| / 2^i)
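As a sketch (assuming a 3D noise(x,y,z) function such as the one in inoise.js), the creased, multi-scale stripe pattern above might be written like this:

    // Stripes with variation at many scales: sum stripe terms whose phase is
    // perturbed by |noise| at successively doubled frequency and halved amplitude.
    function stripes(p, f, a, octaves) {
       var sum = 0;
       for (var i = 0; i < octaves; i++) {
          var s = Math.pow(2, i) * f;
          var n = Math.abs(noise(s*p[0], s*p[1], s*p[2]));   // |noise| adds "creases"
          sum += Math.sin(p[0] + a * n / Math.pow(2, i));
       }
       return sum;
    }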

These are just a few examples. I encourage you to try your hand at making your own custom procedural textures.


Procedural texture and Phong reflectance:

You can use procedural textures to vary any aspect of the Phong reflectance computation. For example, you can use it to vary the ambient or diffuse or specular color, or the specular power (which creates variations in surface shininess).

You can also use it to vary the surface normal, to create the appearance of bumpy surfaces. To do this, you can first add a procedural texture to each of the x,y and z components of your surface normal vector, and then renormalize the surface normal vector to unit length.

You can also use a similar technique for mirror-like surfaces. Here you would use the same technique as above to vary the surface normal, and then use the resulting normal vector to compute your reflection vector.


Refraction

Snell's Law determines how much the light that enters a transparent material (such as glass, water or clear plastic) will change direction when it crosses the material's surface.

Every transparent material has an index of refraction n, which measures how slowly light travels through that material. The larger the value of n, the slower light travels through the material, according to the rule:

n₁ sin θ₁ = n₂ sin θ₂
where θ₁ and θ₂ are the angles by which the ray's direction deviates from the surface normal on either side of the surface. The refractive index of air is essentially 1.0, since light travels pretty much as fast in air as it does in a vacuum. The refractive index of water is around 1.333, of alcohol is around 1.36, and of glass is around 1.5. The substance with the highest known refractive index is diamond, at 2.42.

As you can see in the diagram to the right, light coming from a less dense medium to a more dense medium bends more toward the normal direction. Light coming from a more dense medium to a less dense medium bends more away from the normal direction.

When going from a more dense medium to a less dense medium at progressively steeper angles, at some point the critical angle is reached. Beyond this angle, the ray simply reflects back into the more dense medium (total internal reflection). You can see this for yourself by moving your mouse within the lower half of the diagram to the right.


Layered fog

If fog is purely a function of y, then it is non-uniform, and therefore visually more interesting than uniform fog, yet also easy to integrate along a ray.

Here is an example of a layered fog density function, as a function of y (height):

d(y) = if y ≥ -π and y ≤ π
      cos(y) + 1
else
      0
The integral of this function with respect to y is:
dI(y) = if y < -π
      0
else if y ≤ π
      π + sin(y) + y
else
      2π
To compute the total density of a section of ray through this fog, we need to integrate the density along the ray (it doesn't matter which way -- we will be taking the absolute value of the result). Because the density varies only in y, the integral along the slanted ray is the same as the integral along a perfectly vertical ray, if it were "stretched" in length.

The effect of this stretching is to multiply the total density by 1/cosθ, where θ is the angle by which the ray deviates from the vertical.

A special case occurs when the two ends of the ray segment have the same value of y. In this special case, we can treat the fog the same way we would treat uniform fog.

Overview of algorithm:

  • Given ray origin V and surface point S:
    • Compute density integral d along vertical interval Sy...Vy
    • Compute stretch due to horizontal component: s = |V-S| / |Sy-Vy|
    • Transparency from V to S is (0.5)^(s·d)
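A sketch of this in JavaScript, using the example density function above (the special case Sy = Vy is not handled here; as noted above, it can be treated as uniform fog):

    // Transparency through the example layered fog between ray origin V and surface point S.
    function layeredFogTransparency(V, S) {
       var d = Math.abs(dI(S[1]) - dI(V[1]));                     // vertical density integral
       var dx = V[0]-S[0], dy = V[1]-S[1], dz = V[2]-S[2];
       var s = Math.sqrt(dx*dx + dy*dy + dz*dz) / Math.abs(dy);   // stretch factor
       return Math.pow(0.5, s * d);
    }

    // Integral of the example density cos(y) + 1 over -PI <= y <= PI.
    function dI(y) {
       return y < -Math.PI ? 0
            : y <= Math.PI ? Math.PI + Math.sin(y) + y
            : 2 * Math.PI;
    }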


Cone tracing

The basic idea of cone tracing is to replace the ray (V,W) by the cone, (V,W,α), where α is the radial angle at which the cone spreads out as it moves away from its origin point V.

For any distance t along the cone's medial ray, we consider not just the single point V+tW, but rather the entire disk around that point of radius t tan(α).

Using a cone instead of a ray accounts for the fact that the extent of a pixel is not a single point, but rather an area. Cone tracing is also useful for modeling diffuse reflection and refraction, such as reflection/refraction for non-smooth surfaces.

Tracing a cone to a sphere is slightly more complex than tracing a ray to a sphere. In addition to either hitting or missing the sphere, a cone can graze the sphere, in which case we need to approximate the fraction of the cone that intersects the sphere.

As John Amanatides described in his paper Ray Tracing with Cones, we can quickly compute whether a cone intersects a sphere as follows:

  • Given a cone with radial spread angle α, and a sphere centered at C of radius R, compute the point P where ray V+tW most closely approaches C.

  • Consider the disk Dp = {P , t tan(α)} representing the cross section of the cone at distance t. This disk is centered at P and has radius t tan(α).

  • Consider also the disk Dc = {C, R/cos(α)} representing the visible extent of the sphere at distance t along the cone. This disk is centered at C and has radius R/cos(α).

  • The cone will not intersect the sphere if the sum of the two disk radii t tan(α) + R / cos(α) is less than the distance |P-C| from the cone to the sphere.

If the cone does intersect the sphere, then we obtain the fraction of the cone that has intersected the sphere by computing the area of intersection of disk Dp and disk Dc and comparing this value with the entire area of disk Dp.


Other kinds of implicit surfaces (advanced topic)

It is possible to create more sophisticated types of implicit surface functions, which allow more general shapes to be ray traced. The basic idea is to create individual functions that form simple shapes, such as spheres, but which, when summed, form surfaces that "melt" together into more complex continuous shapes.

The first person to publish a description of such a technique was Jim Blinn in his 1982 paper A Generalization of Algebraic Surface Drawing. Unfortunately, the technique described by Blinn could be computationally expensive for ray tracing.

In contrast, the metaball technique, developed by a team of graduate students under the direction of Koichi Omura at the University of Osaka in the 1980s, is much less computationally expensive to render by ray tracing. Given a point p, consider procedure D(p):
r = |p|

if r < 1/3
   1 - 3*r²
else if r < 1
   1.5 * (1-r)²
else
   0
This procedure describes a smooth function with both value and gradient of zero at a radius of 1, and which is zero everywhere that r > 1.

We can create very complex smooth shapes by summing together many metaballs:

Σi ( ai D(Mi p) ) - ε

where ai and Mi are the amplitude and transformation matrix of each metaball, respectively. Subtracting a small ε from the sum (eg: ε = 0.1) guarantees a smooth transition between shapes and a non-zero gradient at the surface.
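A sketch of the metaball field in JavaScript, using only a translation and a scale per metaball (see the note below on why a full matrix is usually not needed):

    // Metaball falloff D(p) for a point p = [x,y,z].
    function metaballDensity(p) {
       var r = Math.sqrt(p[0]*p[0] + p[1]*p[1] + p[2]*p[2]);
       if (r < 1/3) return 1 - 3*r*r;
       if (r < 1)   return 1.5 * (1-r) * (1-r);
       return 0;
    }

    // Summed field over several metaballs, each with a center, scale and amplitude.
    // The surface is where this value crosses zero.
    function metaballField(p, balls, epsilon) {     // balls: [{center, scale, amplitude}, ...]
       var sum = -epsilon;
       for (var i = 0; i < balls.length; i++) {
          var b = balls[i];
          var q = [ (p[0]-b.center[0]) / b.scale,
                    (p[1]-b.center[1]) / b.scale,
                    (p[2]-b.center[2]) / b.scale ];
          sum += b.amplitude * metaballDensity(q);
       }
       return sum;
    }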

Note that unless you want to stretch the metaballs, you don't really need to do a full matrix transformation for each metaball, since a metaball has spherical symmetry, and therefore rotation is not needed. Instead, you can just translate and scale, which is computationally less expensive.

To ray trace through such a sum of metaballs, you can find the quadratic boundaries along the ray of each transformed metaball. Each metaball will contain two such boundaries -- an inner one and an outer one, reflecting the three cases of the metaball procedure. Finding such a boundary is essentially just ray tracing to a sphere.

Within each of the regions formed between successive boundaries along a ray, the ray encounters only a single smooth function, which is a sum of smooth functions from individual metaballs. Finding the surface (if there is one) between two successive boundaries requires finding the roots of this function along the ray.

To shade a metaball surface you need to find its surface normal. To do this, first compute the gradient of the function at the ray/surface intersection. The surface normal can then be found by normalizing this gradient to unit length.



Improved noise in JavaScript:

I've ported my Improved Noise algorithm to JavaScript, as the file inoise.js.

You can use it to model shapes in various ways. For example, a bumpy spheroid might be implemented like this:

    var sph = function(u,v) {
       var theta = 2 * Math.PI * u,
           phi = Math.PI * (v - .5),
           cosT = Math.cos(theta) , cosP = Math.cos(phi) ,
           sinT = Math.sin(theta) , sinP = Math.sin(phi) ;
       var x = cosT * cosP, y = sinT * cosP, z = sinP;
       var r = 1 + noise(2*x, 2*y, 2*z) / 12
                 + noise(4*x, 4*y, 4*z) / 24;
       return [ r * x, r * y, 1.3 * r * z ];
    }

Feel free to use this function in your geometric modeling.


Triangle strips:

Triangle strips allow you to keep the transfer of geometry data from your CPU to your GPU down to roughly one vertex per triangle. In the version of gl.js I provided last week, g.TRIANGLE_STRIP is already enabled, so all you need to do is add vertices to your vertices array in the proper order.
 
A way to do this with a general M×N mesh is to scan through rows (0 ≤ n ≤ N) in the outer loop, and by columns (0 ≤ m ≤ M) in the inner loop, scanning from left to right for even rows, then back from right to left for odd rows. At each step of the inner loop, append the two vertices [m,n] and [m,n+1] to the vertices array.

In order to avoid spurious triangles between rows, you can repeat the last vertex two more times in each row. This will create a degenerate triangle, which is harmless.
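Here is a JavaScript sketch of generating such an ordering, where vertex(m,n) stands for whatever function produces the vertex data at grid point [m,n]:

    // Build a single triangle strip ordering for an (M+1) x (N+1) grid of vertices.
    function stripOrder(M, N, vertex) {
       var verts = [];
       for (var n = 0; n < N; n++) {
          for (var i = 0; i <= M; i++) {
             var m = (n % 2 == 0) ? i : M - i;       // sweep left to right, then back
             verts.push(vertex(m, n), vertex(m, n+1));
          }
          // Repeat the last vertex two more times, creating harmless degenerate
          // triangles before the next row begins.
          verts.push(verts[verts.length - 1]);
          verts.push(verts[verts.length - 1]);
       }
       return verts;
    }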

To the right is an example of such an ordering.


Bump mapping:

For fine perturbations of a surface, it can be very expensive to generate and render the large number of triangles required for true surface normal perturbation. Therefore for finely detailed bumps, we sometimes just use Bump Mapping, a technique first described by Jim Blinn about 40 years ago.

The basic idea is to modulate the surface normal, within the fragment shader, to reflect the changes in surface direction that would be produced by an actual bumpy surface. Since the human eye is more sensitive to variations in shading than to variations in object silhouette, this technique can produce a fairly convincing approximation to the appearance of surface bumpiness, at a fraction of the computational cost of building finely detailed geometric models.

To do bump mapping of a procedural texture T that is defined over the (x,y,z) domain (the noise function is an example of one such procedural texture), we can think of the value of T(x,y,z) as a variation in surface height. In order to simulate the hills and valleys of this bumpy surface, we subtract the derivative of T from the normal (because the normal will point toward the valleys), and then renormalize to restore the normal vector to unit length.

We can approximate the vector valued derivative at surface point (x,y,z) by finite differences (where ε below is some very small positive number):

p0 = T(x,y,z)

px = (T(x+ε, y, z) - p0) / ε
py = (T(x, y+ε, z) - p0) / ε
pz = (T(x, y, z+ε) - p0) / ε

which we can then use to modify the surface normal:
normal ← normalize( normal - vec3(px,py,pz) )
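A sketch of this in JavaScript, where T is any scalar procedural texture function of (x,y,z):

    // Perturb a unit surface normal by the finite-difference gradient of a
    // scalar procedural texture T at the surface point p.
    function bumpNormal(normal, T, p, eps) {
       var p0 = T(p[0], p[1], p[2]);
       var px = (T(p[0]+eps, p[1], p[2]) - p0) / eps;
       var py = (T(p[0], p[1]+eps, p[2]) - p0) / eps;
       var pz = (T(p[0], p[1], p[2]+eps) - p0) / eps;
       var n = [ normal[0]-px, normal[1]-py, normal[2]-pz ];
       var r = Math.sqrt(n[0]*n[0] + n[1]*n[1] + n[2]*n[2]);
       return [ n[0]/r, n[1]/r, n[2]/r ];
    }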

Advanced topic: Bump mapping from a texture image is more tricky, since you need to know the direction to move the surface normal with respect to changes in the u and v directions of the texture, respectively. Note that in the implementation of function createParametric(), I already compute exactly those quantities for each vertex, as ux,uy,uz and vx,vy,vz, respectively. These two direction vectors, which are both tangential to the surface, are sometimes called the tangent and binormal vectors.

In order to do bump mapping from a texture image, you will need to include the tangent and binormal vectors as part of your vertex data for each vertex. This will increase the data size in each element of the vertices array from 10 numbers to 16 numbers. Only try this if you are very confident, and you think you understand the interface to WebGL.


Forward kinematics

Often we want to create hierarchical mechanisms. Such hierarchically structured mechanisms generally use forward kinematics, in which transformations form a tree structure that descends from a single root.

Here is a swinging arm, a simple example of forward kinematics.

Given a matrix object m that has methods to implement translate, rotate and scale, as well as the ability to maintain an internal stack of matrix values via push() and pop() methods, the swinging arm is implemented via the following code:
   m.push();
      standardView(.5,.5,0,0,.5);
      m.translate(-.5,2,0);
      m.push();
         m.rotateZ(sin(3*time));
         m.translate(0,-1,0);
         m.push();
            m.scale(.1,1,.1);
            unitCube();
         m.pop();
         m.translate(0,-1,0);
         m.rotateZ(.5 - .5 * cos(3*time));
         m.translate(0,-1,0);
         m.push();
            m.scale(.1,1,.1);
            unitCube();
         m.pop();
      m.pop();
   m.pop();

For clarity, I implemented the above example using push and pop methods. But if you want to create a system that allows users to put together their own object hierarchies, you are better off using explicit objects.

In such a scheme, each object would have its own matrix transformation, and would also maintain a list of child objects. The transformation of a child object would be relative to its parent, thereby forming a tree of object nodes.


Animation over time -- key frame animation

When creating animations, it is often convenient to specify values only at certain frames, and then use smooth curves to interpolate values at the frames between these key frames. In-betweening with ease curves which start and stop with zero derivative, such as 3t² - 2t³, produces natural looking interpolations.

Here is a hand that can be animated by setting key frames.

To show different animations of the hand, type "a1" or "a2" or "a3" followed by the space key. You can also read the on-line instructions on that page to learn how to vary the key-frame animation.


Inverse kinematics

Inverse kinematics is the opposite of forward kinematics. In an inverse kinematic system, you start with the location of an end effector (such as a hand or a foot or a robot gripper), and the system needs to figure out the corresponding chain of intermediate matrices from the root of the object hierarchy.

There are two common approaches to inverse kinematics, general N-link IK and special purpose 2-link IK. The N-link approach requires an iterative solver, whereas the 2-link approach has a closed form solution, with no need for iteration.

To talk about 2-link IK, consider the case of a leg, where the locations of the hip and ankle are known, and we need to find the location of the knee.

To make the math simpler, let's assume a hip located at the origin [0,0,0], an upper leg of length A and a lower leg of length B. If the ankle is located at some point P, we want to find the knee location Q.

Knee location Q can lie anywhere on the circle of intersection between two spheres: The sphere of radius A centered at hip location [0,0,0] and the sphere of radius B centered at ankle location P.

To pick the best point on this circle, we need to specify an aiming direction D that says which way is "forward" (since knees generally point forward).

What we need is a function that computes Q, given A, B, P and D.

Here is my solution to this, given a two-link chain at the origin, with limb lengths a and b, respectively, reaching toward point C, with elbow aim hint vector D:

   c = C•C
   x = ((a² - b²)/c + 1) / 2
   D -= C (C•D)/c
   y = √ max(0, a² - x²c)
   Q = xC + yD/|D|
My implementation of this in Java is here. Feel free to use and adapt my algorithm.
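A JavaScript sketch of the same computation (vectors as plain [x,y,z] arrays):

    // Two-link IK: limb lengths a and b, target point C, aim hint D.
    // Returns the middle joint position Q (the root joint is assumed at the origin).
    function solveIK2(a, b, C, D) {
       var dot = function (u, v) { return u[0]*v[0] + u[1]*v[1] + u[2]*v[2]; };
       var c = dot(C, C);
       var x = ((a*a - b*b) / c + 1) / 2;
       var k = dot(C, D) / c;
       var D2 = [ D[0] - k*C[0], D[1] - k*C[1], D[2] - k*C[2] ];   // D -= C(C•D)/c
       var y = Math.sqrt(Math.max(0, a*a - x*x*c));
       var dLen = Math.sqrt(dot(D2, D2));
       return [ x*C[0] + y*D2[0]/dLen,
                x*C[1] + y*D2[1]/dLen,
                x*C[2] + y*D2[2]/dLen ];                           // Q = xC + yD/|D|
    }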


Boids

There is an entire sub-field of computer animation devoted to swarms and particle animation. One historically important example of this was Craig Reynolds' Boids, which he first published in 1987. This technique for simulating herding and flocking behavior showed convincingly that a few simple procedural rules can create the impression of compelling group and social behavior. In cinema, this technique was first used in the 1992 feature film Batman Returns, and has since become a staple of the movie and game special effects industry.


Procedurally animating a mesh over time:

You can create a procedurally animated mesh, in which you define a VertexFilter. Behind the scenes this object loops through a mesh's vertices one by one, handing each one to a method filterVertex(double[] v) that allows you to modify their x,y,z values in place.

If you implement something like this in JavaScript, you will need to keep two copies of your mesh: The original unmodified mesh, and the one that gets copied from the original and then vertex-filtered at every animation frame.

When you modify the mesh, you will end up needing to change the vertex normals. To recompute vertex normals for a mesh, you can use the following algorithm:

  1. Compute the normal to the plane of each of the mesh's faces. You can do this for any polygonal face with vertices v0 ... vn-1 by summing the cross products of successive edges: (v(i+1) mod n - vi) × (v(i+2) mod n - v(i+1) mod n)

  2. For every vertex, sum the face normals of all faces adjoining that vertex, then normalize the result. This will give you a good approximation to the normal at that vertex.
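A sketch of this algorithm in JavaScript, for a mesh given as an array of [x,y,z] vertices and an array of faces, each face being a list of vertex indices:

    // Recompute vertex normals: sum face normals around each vertex, then normalize.
    function computeVertexNormals(vertices, faces) {
       var normals = vertices.map(function () { return [0, 0, 0]; });
       faces.forEach(function (face) {
          // Face normal: sum of cross products of successive edges.
          var fn = [0, 0, 0], n = face.length;
          for (var i = 0; i < n; i++) {
             var v0 = vertices[face[i]],
                 v1 = vertices[face[(i+1) % n]],
                 v2 = vertices[face[(i+2) % n]];
             var e1 = [v1[0]-v0[0], v1[1]-v0[1], v1[2]-v0[2]],
                 e2 = [v2[0]-v1[0], v2[1]-v1[1], v2[2]-v1[2]];
             fn[0] += e1[1]*e2[2] - e1[2]*e2[1];
             fn[1] += e1[2]*e2[0] - e1[0]*e2[2];
             fn[2] += e1[0]*e2[1] - e1[1]*e2[0];
          }
          face.forEach(function (vi) {                 // add to every adjoining vertex
             normals[vi][0] += fn[0];
             normals[vi][1] += fn[1];
             normals[vi][2] += fn[2];
          });
       });
       return normals.map(function (nv) {
          var r = Math.sqrt(nv[0]*nv[0] + nv[1]*nv[1] + nv[2]*nv[2]) || 1;
          return [nv[0]/r, nv[1]/r, nv[2]/r];
       });
    }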

If you are feeling ambitious, you can also try implementing this sort of filter in a vertex shader. In that case, you will need to be a bit more clever about modifying vertex normals. For example (since you will have greater compute power to work with), you can do finite differences to compute the new surface normals.


Making footsteps:

Here is a trick for placing feet during a procedural character walk.

The trick is to "freeze" time periodically for each foot position.

As the character's body moves along a path over time, each foot moves alongside the body as well, but the position of the foot is periodically "frozen" while the foot is pressed against the ground and supporting the weight of the body.

This is done by modifying the time argument, by adding a sawtooth function to the time value passed to that foot:

      timeL += saw(time)
      timeR += saw(time+1)
where:
      function saw(t) { t %= 2; return t < 1 ? t : 2-t; }

Layered keyframe animation:

Layered animation allows you to create layered transparency for parts of a movement, in a way analogous to how Photoshop lets you do layered transparency for just some pixels of an image but not others.


General n-link inverse kinematics:

Here is a simple way to do N-Link inverse kinematics, given a chain of N+1 points P[0]....P[N], building on our two-link IK algorithm:

For each link in the chain i = 1 through N-1:
  • Move the end effector P[N] fractionally toward the goal.

  • Compute a new position for P[i] by solving for (P[i-1],P[i],P[N]) as a two-link IK chain.

  • Transform all points P[i+1] through P[N-1] to match the translation+rotation defined by the new locations of P[i] and P[N].
We can vary the compliance at each joint by adjusting the fractional amount that we move toward the goal at each step of the loop.

Introduction to particle systems:

Examples of uses of particle systems:

This week we just scratched the surface of particle systems. Next week we will go into more detail about this rich topic. Meanwhile, here's a high level introduction to the subject.

Particle systems are very flexible; they can be used to simulate many natural phenomena, including water, leaves, clouds/fog, snow, dust, and stars.

When they are "smeared out" so that they are rendered as trails, rather than as discrete particles, they can be used to render hair, fur, grass, and similar natural objects.

Basic mechanism:

Generally speaking, particles in a particle system begin by being emitted from the surface of an "emitter" object. When a particle begins its life, it has an initial trajectory, which is usually normal to the surface of the emitter object.

After that, the path of the particle can be influenced by various things, including gravity and other forces, and collisions with object surfaces.

Particles usually have a lifetime, after which they are removed from the system.

Also, a particle can itself be an emitter of other particles, spawning one or more other particles in the course of its lifetime. In this way, particles can be made to cascade, generating complex patterns such as flamelike shapes.

All of the qualities of a particle -- its lifetime, its velocity and mass, how many particles it spawns -- can be randomly chosen values within some range. By controlling the ranges from which these various properties are chosen, artists can control the look and feel of a particle system.

History:

Particle systems were first developed by Bill Reeves at Pixar in 1981. Their first public use was for the Genesis Effect in Star Trek II: The Wrath of Khan (1982). Since then, they have become a mainstay of computer graphic films and games.

Rendering:

One nice thing about particle systems is that they are not that difficult to implement in vertex shaders. In addition to their behavior, their appearance can also be hardware accelerated. One common technique is to render each particle as a "billboard": a polygon that always faces the camera. This polygon is textured with a translucent image of a fuzzy spot. The effect is to make the particle look like a small gaseous sphere, but at fairly low computational cost.


Linear blend skinning:

Here we discuss an approximation to animating the soft skin of game characters which is cheap and can be implemented very easily in vertex shaders.

In an animated character, the rigid bones of the character's articulating skeleton are generally covered in some sort of soft skin. A fairly accurate way to model this skin would be to think of each point on its surface (approximated by the vertices of a polygon mesh), as being influenced by the various rigid matrix transformations of nearby bones in the skeleton.

To do this properly, one would compute a composite transformation matrix that was influenced by all of those individual bone matrices. However, in practice this is a more expensive operation than can be accommodated in the real-time rendering budget of game engines.

So most games instead do a kind of cheat called linear blend skinning. The basic idea is to transform each vertex separately by each of the various nearby bone matrices. This results in a different candidate position for each bone. Then these positions are blended together into a weighted average to find the final position for the vertex.

To make this work, each vertex maintains a list of [bone,weight] pairs, where all of the respective weights sum to 1.0.

This technique is very fast, and very easy to implement efficiently in hardware accelerated vertex shaders, but it has some practical deficiencies. For example, twisting between the two ends of a limb can cause the middle of the limb to appear to collapse. To handle cases like this, linear blend skinned skeletons are rigged with extra bones to mitigate the effects of such problems.


Marching Cubes:

Marching Squares (2D case):


Given a function f(x,y), where (x,y) are pixels in an image, marching squares is a way to approximate the curve along f(x,y) = 0.

For example, consider a function f(x,y) evaluated over the unit square.

To the right you can see a very low resolution (10×10) rendering of such a function. Suppose we want to know the shape of the curve where this function has its roots (that is, where f(x,y) = 0).

Ideally we'd like to know this without having to evaluate the function at more samples.


Marching squares provides a way to get a sense of what a level-set curve of a function looks like, without taking more samples.

The key insight is that the curve can be approximated just by looking at those pixels bounded by corner points (i,j),(i+1,j),(i+1,j+1),(i,j+1) for which the signs of f at the four corners are not all the same. If the signs of f are different at two adjoining corner points of a pixel's square, that means the curve will cut the edge which connects those two corners.

One thing we need to figure out is where this transition happens along each such edge.

Given a value of A at corner a, and a value of B at adjoining corner b, we can compute the proportional distance t of the transition point along the edge [a,b] by observing, by similar triangles:

     t/A = (1-t)/-B

     -Bt = (1-t)A

     -Bt = A - tA

     (A-B)t = A

     t = A / (A-B)

Each corner can have two states: f<0 or f≥0, so in general, there are sixteen cases, as shown in the diagram to the right. Consider the second case along the top row of the diagram, where f at the top left corner (i,j) of a pixel is positive, but is negative at the other three corners of the pixel.

In this case, there is a transition point p along the top edge -- between (i,j) and (i+1,j), and another transition point q along the left edge -- between (i,j) and (i,j+1). Within this pixel, we can approximate the f(x,y)==0 curve by the line segment [p,q].

So for any pixel we need to do three things:

  1. Figure out which edges, if any, of the pixel contain transition points;
  2. Compute the locations of these points;
  3. Draw line segments between transition points, to approximate pieces of the curve.
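A JavaScript sketch of the transition point computation, using the value of t derived above (the function names are illustrative):

    // Proportional position of the zero crossing along an edge whose endpoint
    // values are A and B (assumed to have opposite signs).
    function crossing(A, B) {
       return A / (A - B);
    }

    // For the case described below, where f is positive only at corner (i,j):
    function cellSegment(i, j, fTL, fTR, fBL) {     // values at (i,j), (i+1,j), (i,j+1)
       var p = [ i + crossing(fTL, fTR), j ];       // transition on the top edge
       var q = [ i, j + crossing(fTL, fBL) ];       // transition on the left edge
       return [p, q];                               // segment approximating the curve
    }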

Marching Cubes (3D case):

Marching cubes is the 3D equivalent of marching squares. Rather than approximate a closed curve where f(x,y)=0 via small straight edges inside square pixels, as in marching squares, the marching cubes algorithm approximates a closed surface where f(x,y,z)=0 via small triangles inside cubic voxels. The technical paper describing this algorithm, published by Lorensen and Cline in 1987, has been cited more often than any other paper in the field of computer graphics.

Each voxel cube has eight corners, which can be numbered as follows:

0 x=0 y=0 z=0
1 x=1 y=0 z=0
2 x=0 y=1 z=0
3 x=1 y=1 z=0
4 x=0 y=0 z=1
5 x=1 y=0 z=1
6 x=0 y=1 z=1
7 x=1 y=1 z=1

Because the value of f(x,y,z) at each of these eight corners can be either positive or negative, there are 2⁸ or 256 cases to consider. These are shown in the figure to the right.

I have included a table to make things easier for you. The table has 256 entries, one for each of the 256 cases. Each entry contains between 0 and 4 triangles, which is the number of triangles that will be produced by the marching cube algorithm for a voxel of that type.

Each triangle is described by the three edges of the cube that contain its respective vertices, and each vertex is described by identifying one cube corner as well as the orientation of the cube edge that contains that vertex.

For example, a particular vertex of a triangle in the table may be described by the number sequence 0,1, indicating that this vertex lies on edge [0,1] of the cube. This is the edge that connects the x=0 y=0 z=0 corner of the cube and the x=1 y=0 z=0 corner of the cube.

Marching Tetrahedra (simpler to implement, less efficient):

To avoid the big table look-up of Marching Cubes, a technique I've used is to split up each voxel into six tetrahedra. Given the same corner numbering we used for Marching Cubes, we can partition the voxel cube by "turning on" the binary bits of the numbered corners in different orders, giving the six tetrahedra:

(0,1,3,7)   (0,1,5,7)   (0,2,3,7)   (0,2,6,7)   (0,4,5,7)   (0,4,6,7)

Since a tetrahedron has only four vertices, there are only two non-trivial boundary cases: (1) the boundary is a single triangle, or (2) the boundary is a four sided shape, which can be split into two triangles.

This algorithm is less efficient than Marching Cubes, because it generally produces more triangles for each boundary cube. However it requires much less code, and therefore is easier to program, to debug, and to port to a vertex shader.