POST-PROCESSED SHADOW DETERMINATION FOR COMPOSITION OF DEPTH IMAGES

By Russell W. Krywolt
B. Sc. (Computer Science) University of Lethbridge

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES, COMPUTER SCIENCE

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October 1993
© Russell W. Krywolt, 1993

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Computer Science
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada
V6T 1Z1

Abstract

In the creation of computer generated pictures, many methods have been proposed to reduce the amount of time it takes to render an image or sequence of images. Efforts have been made to improve existing rendering algorithms and to create better ones, but the compositing method has received little attention. The compositing method breaks a scene up into parts which can be rendered separately and then put together in such a way that the result is nearly indistinguishable from an image rendered from the complete scene. This can not only reduce the total time taken to generate an image sequence, but can also create composite images from others rendered by different methods. In 1984, Porter and Duff first introduced a well defined approach to computer image composition, and in 1985, Duff proposed an extension to this method that employed depth images. Using depth images, visibility determination can be done by the computer, reducing the human effort required to produce the resultant image. The images produced are likely to be missing some elements that would be present if all objects were rendered in a single scene, such as shadows from objects in one image not being cast on objects from other images.

This thesis addresses the problem of modifying composited images such that the resultant picture contains shadows that would normally only be present in an image rendered from a scene containing all objects in both original pictures. A simple algorithm is developed to reconstruct the visible surface of objects in an image. The reconstruction is used to generate a shadow map for each image to be composited. The shadow map indicates where shadows would fall on one image were objects in the other image present when the original scene was rendered. The shadow map is then used to create shadows in appropriate places on the original images, and these reshadowed images are composited to obtain the desired result.
The application of these techniques to other areas is also investigated.

Table of Contents

Abstract
List of Figures
Acknowledgement
1 Introduction
2 Background to Image Composition
3 Enhancements to simple depth composition
  3.1 Compositing with centre sampled pixels
  3.2 Generating antialiased images for composition
  3.3 Compositing pixels with equal depths
4 Visible surface reconstruction
  4.1 Simple surface reconstruction
    4.1.1 World coordinate retrieval
    4.1.2 Surface construction from world coordinates
  4.2 Image segmentation
    4.2.1 The ideal solution
    4.2.2 Possible approaches
5 The reshadowing process
  5.1 Creating the shadow map
  5.2 Filtering the shadow map
  5.3 Using the shadow map
  5.4 Reshadowing already shadowed images
6 Other applications and extensions
  6.1 CAR - Computer Augmented Reality
  6.2 What now?
7 Conclusions
Bibliography
A Depth Files
  A.1 Optik depth files
  A.2 Other uses of depth files

List of Figures

2.1 Table of 2D compositing operations from [port84]
3.1 Division of pixel into four triangles and related depth values given by the dots at the corners
3.2 Division of centre sampled pixel into eight equal areas with associated depth values
3.3 Sampling values within a single pixel
3.4 A red cube composited with a gray sphere. The first image shows the overall appearance, the second shows a magnified section where colour sampling problems appear
3.5 Pixels partially covering a square polygon with sampling at the lower left corner
3.6 The red cube and gray sphere from figure 3.4, but with proper depth sampling used
3.7 Interpenetrating squares of differing colours
4.1 Reshadowing problems created by not segmenting images before performing surface reconstruction
4.2 Viewing vectors for the scene
4.3 Vectors for calculating world coordinates
4.4 Reconstructing four triangles in one pixel from corner sampled depth data: black circles are known depth data, the grey circle is the interpolated depth data
4.5 Reconstruction of triangles for centre sampled depth data: black circles are known depth values, grey circles are interpolated depth values, arrows show which known values contribute to the interpolated values
4.6 Edge problems when reconstructed objects slope sharply away from the camera. Original view is on the left, 90 degree rotation on the right.
4.7 Edge problems for sharply sloped objects. Left shows standard reconstruction, right shows reconstruction where sampled depth values are assigned to all parts of the pixel.
4.8 Original image
4.9 Associated depth map shown as greyscale
5.1 A shadow map created by the modified raytracer
5.2 A shadow map after filtering has been performed
5.3 The result of applying a weighted filter to the shadow map
5.4 Rendered and composite images of the same scene. The rendered image is on the bottom

Acknowledgement

I would like to thank my supervisor, Alain Fournier, for providing many interesting problems for me to work on, and for his understanding when I spent time on problems and procedures not related to my thesis. In spite of having many demands on his time, he was always able to assist me when necessary. My thanks to David Forsey, not only for being my second reader, but also for encouraging me to explore all of the aspects of computer graphics that I found interesting.
I would especially like to acknowledge the work and effort put in by everyone in the Computer Science Department in order to create the pleasant and relaxed research atmosphere of which I have become so fond.

Russell Krywolt
October 1993.

Chapter 1
Introduction

The generation of computer created images has always been limited by the speed at which such pictures can be rendered. While improvements in hardware have reduced the time it takes to render an image by orders of magnitude, the scenes being created today are vastly more complex than those made even five years ago. As a result, the increased computing power is being used to its fullest, the average rendering time for a typical scene has not been reduced in proportion to the speed increase of computers, and rendering times have remained fairly constant for images of high complexity. Since the trend of creating more complex images as faster hardware becomes available is likely to continue into the foreseeable future, computer graphics researchers have turned to other methods to reduce rendering times. These efforts can be grouped into two general categories: the first is developing algorithms and strategies to reduce the computational complexity of rendering a scene or series of scenes; the second is splitting the rendering effort into parts, rendering only those portions of the scene that change with time, and then compositing, or assembling, the parts of the images into a coherent whole. The second method is obviously better suited to creating series of images that are temporally related (an animation) rather than single pictures, while the first method can be applied to both still frames and animated sequences. It is for this reason that most of the effort put into decreasing overall rendering times has been directed towards reducing the number of calculations done by a computer.

Despite the drawbacks of the compositing method when applied to single images, animation rendering times may be greatly affected. Since in many animations a considerable amount of the image remains static as time progresses, the computer need only render the unchanging portion (or background) of the animation once, store it, render the parts of the scene that change from one frame to the next, and then composite the two portions together on a frame by frame basis to produce the animation. This method is best illustrated by the Walt Disney classic animated films, such as Snow White and the Seven Dwarfs, Pinocchio, and others. In these films, vast and elaborate backgrounds were painstakingly created. Trees were painted in such detail that veins on leaves could be clearly seen, and features of other background elements were similarly detailed. The animated characters were then created a frame at a time on clear celluloid, called an animation cell. Each cell was then placed over the background in the proper position, and photographed to create a single frame of film. Similarly detailed foreground cells could be placed over the background and character if needed, creating a multi-layered scene where the motion was restricted to a few cells in different layers. This process allowed the animation to be done much faster than if all the objects in every frame were repainted each time, as well as letting the artists create much greater detail in the background than would have been otherwise possible.

Another use for compositing is to form a single image from several images rendered by different processes.
General purpose modeling and rendering software packages are based upon established algorithms, and are thus not often useful in a research environment. Current research can produce innovative modeling or rendering methods that simply cannot be achieved with commercial software. Most often these creations are of a singular nature, that is, they are small pieces of software only intended to illustrate new ways of doing things. Many times the results of this research will somehow be used by other research based software, or perhaps by commercial software. As an example of this, the hierarchical B-spline surface modeler used in [fors88] has proven to be such a useful modeling tool that it is often used at Imager, the computer graphics laboratory at the University of British Columbia, to create objects that are then exported to a commercial animation system. This is done by approximating the B-spline surface by a triangular mesh. While this mesh representation is fairly accurate, it is not perfect, and meshes of even simple B-spline objects can be large, slowing down both the modeling and rendering of the commercial software. Depending on the use of the B-spline surfaces, in many cases it would be easier and more accurate to render the B-spline object without having to convert it to a triangular mesh, and then composite it into an image rendered from the scene created by the commercial package. This use of the compositing technique would allow the creation of complex images that would otherwise be difficult to make.

The first attempt to systematize the application of this method to computer graphics was reported by Porter and Duff in 1984 [port84]. They took the multi-layer approach of Disney and applied it to computer graphics, enabling animators to render those parts of an animation that could be considered to belong to the same layer, as if they were cells, render them separately, and then merge them such that each rendered frame (from the 'top' layer down) obscured the one below it, but only where pixels held interesting information [1]. In order to enable the computer to decide what information in a frame was considered important, Porter and Duff used what they termed the alpha channel or α-channel, which is the computer analogue of the matte signal used in video compositing. The α-channel can be thought of as a separate image of the same size as the original, having a white pixel at the same coordinates as the image for a pixel that contained interesting information, and a black pixel where the image had no important colour information. The α-channel value could also take on intermediate values between 1 (white) and 0 (black) to indicate transparency or partial coverage of a pixel in the original image. If a pixel with an intermediate α value was composited over top of another pixel, the resultant image would contain a pixel that was a blend of both pixel colours, with the upper pixel contributing α times its colour, the lower (1 − α) times its colour. Porter and Duff also defined several other compositing operations not normally used in cell animation, such as all boolean combinations of images (and, or, not, xor, etc.) as well as other operators that combine the upper and lower layers in various ways.

[1] Interesting information in this case refers to any area of the frame the animator decided was important, whether it contained colour information or not.
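To make the blend concrete, the following sketch (in Python, used here purely for illustration) composites a single pixel with the over operation, assuming colours are stored premultiplied by their α values; the function name and tuple layout are illustrative only and are not part of [port84].

    def over(fg, bg):
        """Porter-Duff 'over' for one pixel with premultiplied colour.

        fg and bg are (r, g, b, a) tuples whose colour components have
        already been multiplied by their alpha, so the composite is the
        foreground plus whatever fraction of the background its alpha
        lets through.
        """
        k = 1.0 - fg[3]
        return tuple(f + k * b for f, b in zip(fg, bg))

    # A pixel half covered by red, composited over an opaque blue background.
    print(over((0.5, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)))   # -> (0.5, 0.0, 0.5, 1.0)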
These operations, however powerful, are not always adequate. For example, if a moving character is supposed to be weaving in and out of various parts of the scene, many layers had to be constructed, and their operations specified exactly, in order to get the right look and feel for the final animation. This time consuming process would be simplified if all of the static portions of the animation could be rendered at once, rather than in layers, and the moving parts of the scene composited into it, with the computer taking care of any visibility problems. This can be done using a modified version of the popular Z-buffer visibility determination algorithm, as illustrated by Duff in [duff85]. Duff's method requires that for each pixel in a rendered image, a corresponding depth value is stored. This depth value represents (in some form) the distance of the pixel from the eye plane. When compositing images, the depth information of corresponding pixels in each image is compared, and the pixel with the smallest depth is displayed in the resultant image. This allows parts of animations to be composited easily, with some increase in storage.

While this method provides an easy way to composite images without much user intervention, the images produced can lack some elements that would be present if the models of the two scenes were combined and then rendered. In many cases, if a scene was rendered all at once, shadows from some objects would fall on other objects, but if these shadowing objects were rendered separately from the rest of the scene and then composited, those shadows would not appear in the resultant image. Likewise, reflections present in a rendered image may be absent in a composited image. This thesis will focus upon the former problem, presenting a solution that will allow the reshadowing of images to occur when compositing, giving the composite image the same shadows as would appear in a rendered image of the same scene.

When considering the complete reshadowing problem, many aspects of it must be solved before a complete solution is achieved. Because the scope of this thesis is insufficient to offer a complete solution to each and every aspect of the general problem, some restrictions are placed on the images to be reshadowed and composited. When solutions to the various problems that necessitate the restrictions are found, the restrictions can be removed and the techniques described in the thesis can be directly applied to help solve the complete problem.

The difficulty of discerning what objects are present in a scene (detailed in Chapter 4) gives rise to the first restriction. If two images are to be reshadowed such that shadows from one scene are to fall on the other, but not vice versa, then the scene that is causing the shadows must be made up only of non-occluding objects, but the other may be of arbitrary complexity. If the reshadowing is to be mutual, then both images must contain only non-occluding objects [2]. The second restriction on images is that all must have the same viewing parameters (such as image dimensions, eye position, and eye roll angle). The last restriction requires that all lighting parameters be the same (that is, number of lights, position, colouration, etc.).

[2] Consider the case where a ball moves above a plane such that the ball never occludes the plane, but casts a shadow. If the ball and plane were two separate images that were to be reshadowed, only the image of the plane need be reshadowed. Again, see Chapter 4 for a complete explanation of why this is necessary.

Since the initial restriction placed on the images has two cases, the outline of the solution is slightly different. In the first case, where only one image is used to reshadow
the other, it is first necessary to get some notion of the original three-dimensional shape of the objects in the restricted (non-occluding) scene. To do this, world coordinates for each visible pixel of every object are retrieved using the stored depth data and existing ray-tracing techniques. These world coordinates are used to create a simple triangulated description of the visible surface of each object. Once this surface description is created it is passed to a modified ray-tracing renderer. If there are no problems with the reconstruction, a shadow map is created by the ray-tracing software. This shadow map is an image with the same dimensions as those being reshadowed but with all pixels having a full white value. The world coordinates of each visible pixel in the other (arbitrarily complex) scene are retrieved by the same method as used in the non-occluding scene, but without reconstruction. From each of these coordinates, the ray-tracer sends a ray to each light in the scene (since the assumption is that light positions are known). If any ray intersects a triangle in the reconstruction of the non-occluding image, a black pixel is placed in the shadow map at the same location as the pixel (from the arbitrarily complex image) whose world coordinate was the origin of the ray. The shadow map is filtered to smooth sharp intensity transitions, and overlaid on the arbitrarily complex image. This overlaying causes pixels in the image to have their intensity reduced in proportion to the amount of black present in the corresponding pixel of the shadow map. This is the actual reshadowing step, and once performed, the two images are composited (using the modified depth composition method discussed in Chapter 3) to give the final result.

The second case is solved in a similar manner, but the process is repeated for both images. Thus, the first image is initially treated as the complex image (even though it is not), and reshadowed. The images then reverse roles, the second image is reshadowed, and the images are composited.

Both methods of solution are discussed in detail in this thesis, and the results of their implementations are presented. The more difficult problems facing the solution of the general case are discussed, but not implemented. Applications of this technique to the merging of real video and computer generated images are also discussed, along with related work currently being undertaken. Conclusions are made about the suitability of the developed methods to the general problem, and directions for future work are outlined. Finally, other applications of the depth compositing algorithms, along with other uses for depth images, are discussed and implemented. It should be noted that all of the steps mentioned previously are implemented as separate programs. While this makes the process more cumbersome than if everything was automated, it allows flexibility in the choice of other methods that may be implemented and tested with the system. Also, should any other uses be found for the parts of the reshadowing process, it enables easy exporting of the desired results without having to rewrite a large section of a big program.

Chapter 2
Background to Image Composition

The use of image composition techniques in computer graphics has evolved for a number of reasons.
Composition provides a simple way to reduce the total time taken to produce an animation by allowing static parts of an animation to be rendered only once, and having the moving components rendered as needed, then put together to make the final scene. Using image composition, it is also possible to combine images rendered using different methods into a coherent whole. Since many tools that exist for specific purposes do not work in conjunction with complex general purpose renderers (or other tools), composition allows effects from all of these sources to be combined in an image or animation. Computerized image composition also mimics the sophisticated (and sometimes expensive) video composition tools used in video post-production, allowing computer artists and animators access to the effects without having to purchase separate equipment.

While image composition has been in use for a long time in the film and video industry, most people who wished to do computer compositing ended up writing specialized software to do specific tasks. In 1984, Thomas Porter and Tom Duff put forward an explicit method for compositing computer generated images ([port84]). This method relied upon the use of the α-channel. The amount by which an object covers a pixel is determined by the α value of that pixel. If pixel p in image a had an α value of 1, p would be completely covered by an object. If p in a had an α value of 0, then no objects would cover that pixel. Likewise, if the α value of p were between 0 and 1, the pixel would be partially covered by some object, and the ratio of covered to uncovered pixel area would be α_a/(1 − α_a). When compositing a pixel from image b with the pixel p in a, Porter and Duff make the assumption that the object partially covering p in b will cover the object partially covering p in a in the ratio α_b/(1 − α_b), since if the pixel coverages from a and b were simply added, it would be possible to get α values greater than one. The α value of a pixel is also used when that pixel is fully covered by an object that is semi-transparent, with α indicating the amount of the background colour that is blocked by the pixel, while (1 − α) is equal to the amount of the background colour allowed through the pixel. While this dual interpretation of the α value may not seem proper, Porter and Duff show that the two are equivalent for compositing purposes. Emphasis is placed on the multiplication of the actual colour information by the α information for each pixel before compositing, so that the alpha value of the resulting composite pixel can be used directly to determine its colour. If this is not done, other, more complex manipulations would have to be performed at a later stage in the compositing process.

Using this information, Porter and Duff define operations to be performed when compositing pixels. These operations are summarized in figure 2.1 (for a corresponding pixel in images a and b, with colour and alpha values rgbα_a and rgbα_b respectively). From figure 2.1, the operation x over y means that the output pixel will be covered as if x was placed in the foreground over y. Likewise, x in y means that portion of x that would be obscuring a portion of y, were they composited using the over operator. The operator x out y results in that part of x being shown that would not cover y if the over operator was used. Using x atop y is equivalent in coverage to using x over y and subtracting the coverage given by x out y.
Similarly, x xor y is the same as taking x over y, then removing the portion covered by x in y. To get the colour of the output pixel, the rgbα_a value is multiplied by F_a and added to the colour obtained by multiplying rgbα_b by F_b. Because the colours are premultiplied by α, the final α value is obtained in the same way, by adding α_a·F_a to α_b·F_b.

Figure 2.1: Table of 2D compositing operations from [port84]

This approach is conceptually similar to the way the Disney multiplane camera works ([levo77]), where every image is treated as a layer in the compositing step. Thus to create an image where an object is interpenetrating another object, the images involved must be constructed in such a way that when composited, the result looks correct. The extra effort in designing the scenes for composition was both time consuming and not easily done for some rendering software. To circumvent this extra work, Duff ([duff85]) put forward a way of compositing images so that the computer performs the object visibility calculations. The modified Z-buffer algorithm requires that some extra information be stored with the colour and alpha information of the image. For every pixel in the image, the depth value of that point (or the distance from the eye plane) is stored and used for visibility calculations. Given two images x and y, with depth values z_x and z_y, and colour and alpha values rgbα_x and rgbα_y at pixel location p, the pixel that has the smallest depth value is found by comparing z_x to z_y, and using the smaller value. The pixel with the smaller depth value then becomes the foreground pixel (denoted by f), and the other the background pixel (or pixel b). If the depth values are equal, Duff chooses the pixel from the first input image as the foreground pixel, and the pixel from the second input image as the background pixel. The depth value of the output pixel is the depth value of the foreground pixel, while the colour of the output pixel is given by β(f over b) + (1 − β)(b over f), where β is the fraction of the pixel covered by an object in f [1]. This method provides a very good way of creating one image from multiple input pictures with little effort. Some of the problems that show up in two-dimensional image composition, specifically those lighting effects not common to all input images, or those that the viewer would expect to see in a real scene but do not show up in a composited image (shadows, reflections, and other object-dependent light interactions), are still evident in the three-dimensional image compositing result.

[1] This is put in context of this thesis and explained more in Chapter 3.
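The per-pixel rule can be sketched as follows (again in Python, for illustration only). The sketch assumes corner-sampled depths are available for the pixel in both images, so that β can be estimated from the number of corners at which x is nearer, and it assumes premultiplied colours; the output depth is simplified to the nearest of the sampled depths. It illustrates the rule rather than reproducing Duff's implementation.

    def over(fg, bg):
        """Premultiplied Porter-Duff 'over' for one (r, g, b, a) pixel."""
        k = 1.0 - fg[3]
        return tuple(f + k * b for f, b in zip(fg, bg))

    def depth_composite_pixel(rgba_x, corners_x, rgba_y, corners_y):
        """Duff-style depth compositing of one pixel from images x and y.

        corners_x and corners_y hold the depth sampled at the four corners
        of the pixel in each image.  beta is the fraction of corners at
        which x is at least as near the eye as y; the output colour blends
        (x over y) and (y over x) by that fraction, and the output depth is
        the nearest corner depth, standing in for the foreground depth.
        """
        wins = sum(1 for zx, zy in zip(corners_x, corners_y) if zx <= zy)
        beta = wins / len(corners_x)
        x_over_y = over(rgba_x, rgba_y)
        y_over_x = over(rgba_y, rgba_x)
        colour = tuple(beta * a + (1.0 - beta) * b
                       for a, b in zip(x_over_y, y_over_x))
        depth = min(min(corners_x), min(corners_y))
        return colour, depth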
Much use has been made of the compositing method of [port84]. Almost every major production animation system in use today has a two dimensional image composition system capable of performing many useful operations on image sequences, such as cross dissolves, wipes, and so on, in addition to the basic compositing operations. Many of these animation packages also include a three dimensional composition system based on [duff85], although most do not extend the method as has been done with two dimensional compositing. Some research has been done into other uses of the image composition technique, both in hardware and software.

Nakamae, Ishizaki, Nishita and Takita ([naka89]) proposed a way to render portions of an image separately and then use a three dimensional composite to correctly merge them into a single picture. Their hope was that by raytracing only those portions of the image where raytraced effects were desired and using scanline methods on the rest, the rendering time for images would be reduced. While their method produced composite images nearly indistinguishable from singly rendered images of the same scene, the time component of their method, which was their stated goal, was not analysed. Another rendering compositing method was presented by Jahami and Beigbeder in [jaha91]. Hardware implementation of compositing has been investigated for some time. Molnar, Eyles and Poulton ([moln92]) developed the PixelFlow system, a custom designed hardware architecture for image generation using image composition methods to combine various image elements produced by a network of hardware renderers. Mark Green and Chris Shaw ([shaw90]) described a parallel architecture for performing the three dimensional compositing in [duff85]. They designed a pipelined VLSI chip to implement the algorithms described by Duff. Earlier, Booth, Forsey and Paeth investigated hardware assistance and acceleration of the Z-buffer algorithm in [boot86]. Shaw and Green have also done other work on hardware composition, as described in [shaw89]. Other uses of compositing, and methods of doing it, have been investigated by Nakamae, Harada, Ishizaki, and Nishita in [naka86], as well as Foley and Kim ([fole87]). Compositing processes and systems have been looked at by Potmesil and Hoffert (FRAMES - [potm87]), and Nadas and Fournier (GRAPE - [nada87]). Variants of the Z-buffer algorithm have been developed by Salesin and Stolfi (the ZZ-buffer - [sale90]) and Haeberli and Akeley (the Accumulation buffer - [haeb90]).

While the various software and hardware methods of compositing have been useful, this thesis concentrates on the Duff method of [duff85]. The modifications made to the Duff three dimensional composition algorithm will be detailed in the following chapters, along with methods to use the depth data to assist in reshadowing composited images.

Chapter 3
Enhancements to simple depth composition

This chapter presents some modifications of the fairly robust method presented by Duff in [duff85] that will improve its results. Three main areas of the Duff method can be examined. First, the output produced by a raytracer should be modified to calculate depth and colour at the centre of the pixel instead of at the lower left corner. Secondly, the generation of antialiased images and the related depth values are only discussed briefly in the Duff paper, but the methods to produce these should be more rigidly defined due to problems that arise when calculating depth for antialiased pixels. Also, the method given does not deal explicitly with compositing two pixels that have the same depth value. While this problem is not too serious when compositing single frames, it can result in some temporal aliasing problems when an animation is being produced.

3.1 Compositing with centre sampled pixels

In order to composite depth images, it is necessary to determine which pixels of each image should be displayed in the output image. This is done by comparing the depth value of corresponding pixels in each input image, and outputting the pixel with the smallest depth value. While this may seem sufficient, in practice the point sampled nature of the images will result in colour aliasing in the output image. These problems are overcome by averaging colour values of pixels in each input image based on the pixel depth values.
Figure 3.1: Division of pixel into four triangles and related depth values given by the dots at the corners

In [duff85] depth and colour values are assumed to have been sampled at the lower left corner of the pixel. While it is true that many renderers do this, an equal number sample at the centre of the pixel, and this creates a more complex situation than Duff presents. When the pixels in the image are sampled at the lower left corner, the values at every corner of every pixel are known [1]. This allows us to divide the pixel into four triangles, each covering one-fourth of the pixel, and each having the depth value of the associated pixel corner (figure 3.1). By looking at the corners of the corresponding pixel in each composited image, it can be determined which pixel is nearer the camera, and thus which should be displayed in the final image. Given two images a and b and the pixel at location p in each image, if all four corners of p in image a have a smaller depth value than the corners of pixel p in image b, or all corners of p in b have a smaller depth value than all corners of p in image a, the output pixel of the composite image is taken directly from the appropriate input image. If, however, some corners of p in a are closer to the camera than in b, and some in b closer than in a (what Duff terms a confused pixel), the output pixel colour must reflect this mix, and so should be a blend of the pixel colours in a and b, proportionate to the amount of p that the pixels from each image cover. This method can be thought of as performing colour filtering using the depth data to determine the filter effects. For corner sampled images, then, the filter used is not symmetric around the origin point.

[1] Except for the top and rightmost rows, which is why Duff stores an extra row of pixels off the top and right.

Compositing with centre sampled pixels is done in much the same fashion, but some calculations must be modified, since for each pixel there is no knowledge of the depth at the corners. In keeping with the Duff method, consider two centre sampled images, a and b, and a pixel of interest at location p in each. Since p has eight neighbours, and the depth value at p is known only at the centre, the depth values of the neighbouring pixels can be used as the depth at the appropriate places in p. If the pixels surrounding p are labeled from 1 to 8, as shown in figure 3.2, the depth values at the corners of p are taken from the pixels diagonally adjacent to p at those corners, and the depth values at the sides of p are taken from the pixels that share those sides with p. With this interpretation, nine polygons of equal area can be constructed such that they cover p completely, each having the same depth value as the appropriate part of p (see figure 3.2). Using this, we can compute the portion of the pixel covered by a by counting the polygons in p covered by a and dividing by the total number of polygons in the pixel, in this case nine. If this amount is β, the output pixel will have rgbα = β(a over b) + (1 − β)(b over a), with the depth value equal to the minimum depth value of a and b, where over is the operator used in [duff85]. As stated previously, this method can be thought of as a filtering operation using the depth data to determine the resultant colour information.
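The nine-region blend just described can be sketched as follows, assuming each image supplies the nine depth values covering pixel p (the pixel's own centre sample plus the eight values borrowed from its neighbours, in scan order) and premultiplied colours; the data layout is illustrative only, and the over helper is repeated from the earlier sketch.

    def over(fg, bg):
        """Premultiplied Porter-Duff 'over' for one (r, g, b, a) pixel."""
        k = 1.0 - fg[3]
        return tuple(f + k * b for f, b in zip(fg, bg))

    def centre_sampled_composite(rgba_a, depths_a, rgba_b, depths_b):
        """Blend one centre sampled pixel of image a with the same pixel of b.

        depths_a and depths_b each hold nine values in scan order: the
        depths assigned to the nine equal-area regions of the pixel
        (corners and sides borrowed from the neighbouring pixels, centre
        from the pixel itself, which is element 4).  beta is the fraction
        of regions in which image a is nearer the camera than image b.
        """
        wins = sum(1 for za, zb in zip(depths_a, depths_b) if za <= zb)
        beta = wins / 9.0
        colour = tuple(beta * x + (1.0 - beta) * y
                       for x, y in zip(over(rgba_a, rgba_b), over(rgba_b, rgba_a)))
        depth = min(depths_a[4], depths_b[4])   # minimum of the two centre samples
        return colour, depth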
The main difference between this and the previous filter is that this one is symmetrical around the origin. As may seem obvious, it would be possible to use the original Duff filter (or compositing operation) to composite centre sampled images, using the centre sampled points in place of the appropriate corner samples. Due to the methods used to produce antialiased colour information, this approach is not desirable, since elements present in all surrounding pixels may have already contributed to the colour of the pixel being composited. Filtering in the manner of the Duff paper takes only three surrounding pixels' information into consideration, while ignoring seven others that may contain information that should be reflected by the pixel being composited.

Figure 3.2: Division of centre sampled pixel into eight equal areas with associated depth values

The original algorithm provides a simple means to prevent colour aliasing in composited corner sampled depth images, and is equally effective in dealing with centre sampled images with the modifications presented in this section. It should be noted that in either case, the filtering is independent of the order in which images are composited; thus depth compositing a over b is equivalent to depth compositing b over a.

3.2 Generating antialiased images for composition

Differing methods of generating antialiased images can produce odd effects when images are composited together. In [duff85], no constraints are given for antialiasing methods that will produce consistent results when images are composited. This section looks into the cause of the problems and presents a simple way of eliminating them.

Antialiasing provides a way of masking, or even eliminating, problems in images and image sequences caused by the discrete nature of computer generated pictures, and many papers have dealt with methods for doing this ([crow77], [crow81], and many more, with a good summary of antialiasing methods given in [fole90]). When dealing with single images, the most apparent effects are jagged edges at colour or object boundaries. These jaggies, as they are often referred to, result from the space and colour quantizing of the computer's display hardware. For example, when rendering a polygon, if part of a pixel in the image is covered by an edge of the polygon and part is not, there is no way of dividing up the pixel so that one part displays the colour of the polygon while the other displays the background colour. To overcome this, most antialiasing methods use an over-sampling approach. When rendering a scene, many samples of colour are taken within the area of a pixel. The colours obtained by each sample are then combined to give the final colour of the pixel (call this p) in the image. Thus, as in the previous example, if the polygon colour (call this P) is found in m samples, and the background colour (B) is found in n samples, the colour of the pixel in the image would be (Pm + Bn)/(m + n). This pixel is also given an α value representing the ratio of m to the total number of samples (figure 3.3), to be used for compositing purposes.

Antialiased pixels solve one problem, but bring about another during compositing ([fium83] and [carp84]). Considering the previous example, it is not easy to see what the depth value for the pixel p should be. Since the multiple sampling method
of antialiasing works so well on colour values, it might be tempting to use it to try to resolve the depth value problem. Taking the average depth value of all the samples is dangerous, however, since the average value may not be at all representative of the colour value and its association to other pixels. If the depth of the polygon is D and the depth of the background is I (consider this the largest value possible, representing the lack of any depth information - see Appendix A), the antialiased depth of the image pixel would be (Dm + In)/(m + n), as in the colour antialiasing, falling in between the polygon of interest and the background. Now consider that another image is composited such that the depth value of the second image at p would place it behind the polygon with depth D. Because of the averaging done on the depth from the first image, p from the second image would have less depth than p from the first image. The pixel colour displayed in the resultant image will be the colour of p from the second image, using the compositing algorithm. When looking at the resultant image, jaggies at the boundary of p will be noticeable, since p carried antialiased colour information from the polygon in the first image that was lost upon compositing, because the depth value at p did not accurately reflect the relationship of p to the polygon. An illustration of this is given in figure 3.4.

Figure 3.3: Sampling values within a single pixel: (a) pixel; (b) polygon crossing part of pixel; (c) sample point

To prevent the above situation, the depth of p must be taken to be the least depth of all the samples. If the depth value used is always a specific sample from the m + n samples taken, more problems can occur. If two pixels p and q are on opposite sides of a square polygon as in figure 3.5, and the sample that is always chosen is taken at the lower left corner of the pixel, the depths of p and q will be different even though one would expect them to be the same. The solution to this problem is to take the minimum value from all the samples in a pixel. This ensures that the colour and depth data are related to the polygon which the antialiased pixel partially covers. Sampling problems, while not eliminated, are reduced since the method will return a consistent result for pixels that are partially covered by objects.

Figure 3.4: A red cube composited with a gray sphere. The first image shows the overall appearance, the second shows a magnified section where colour sampling problems appear

Figure 3.5: Pixels partially covering a square polygon with sampling at the lower left corner

Figure 3.6: The red cube and gray sphere from figure 3.4, but with proper depth sampling used
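The sampling rule argued for in this section, averaged colour and coverage but minimum depth over the supersamples, can be sketched per pixel as follows. The sample representation and the use of an infinite sentinel for samples that hit nothing are assumptions of the illustration; the actual encoding of missing depth is described in Appendix A.

    NO_DEPTH = float("inf")   # stands in for the largest possible background depth I

    def resolve_pixel(samples):
        """Collapse the supersamples of one pixel into colour, alpha and depth.

        Each sample is ((r, g, b), depth), with depth equal to NO_DEPTH
        when the sample hit nothing.  The colour is the average of all
        sample colours, alpha the fraction of samples that hit an object,
        and the stored depth the minimum over the samples, so the depth
        stays tied to the nearest object the pixel partially covers.
        """
        n = len(samples)
        colour = tuple(sum(s[0][i] for s in samples) / n for i in range(3))
        hit_depths = [s[1] for s in samples if s[1] != NO_DEPTH]
        alpha = len(hit_depths) / n
        depth = min(hit_depths) if hit_depths else NO_DEPTH
        return colour, alpha, depth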
Thecolour of point p in the composite needs to be determined.Using the Duff algorithm, the pixel of interest (assuming it is opaque) would have aresultant colour the same as the pixel in the first image, in sequence, to be composited,modified by the compositing operation. Because the compositing operation acts as a2Equivalent, in this context, means that the depths of the two pixel are equal within some user definedtolerance E.Chapter 3. Enhancements to simple depth composition^ 23Figure 3.7: Interpenetrating squares of differing coloursChapter 3. Enhancements to simple depth composition^ 24filter for the depth values, it is unlikely that this choice of pixel will cause colour aliasingto appear. For example, consider a single row of pixels in two depth images where thedepth values in the first image are increasing, the depth values in the second decreasing,and the depth values equal at pixel p on the row. After compositing without filteringthe depth values, the colour of the row of pixels will undergo an abrupt change at pixelp, the point at which the depths are equal. Performing the same composite with depthfiltering results in pixel p having a colour that is a blend of the colour of the pixels fromthe original images at p. While this may not seem significant for single images, in ananimation, abrupt colour changes (depending on the motion) can distract a viewer andreduce the quality of the animation. The filtering done by the compositing algorithmwill generally produce more desirable results than if filtering is not performed.In some cases, compositing may be performed with objects where a large numberof pixels have quivalent depths. Consider a composition involving two images of thesame square, but with the square in one image red, the other blue. If the assumptionis that blending is desirable, as with the previous example, the composite image wouldshow a magenta square. Because there is no real world example of where differing colourobjects occupy the exact same space, this may seem a reasonable treatment. However,were this part of an animation where one square was moving with respect to the other,the effect would be very disconcerting, since a large region of a color not present in theoriginal scenes would be introduced. Thus some selection of colour must be done, ratherthan a blending. The Duff compositing algorithm performs this by selecting the colourof the first input image where two pixels' depths are equivalent, as noted earlier, andusing this in the final calculation after filtering the depth values. In this case, however,all depth values are equivalent, so the filtering does not affect any pixel depth values,and the colour of the first input image results for all equivalent depth pixels, giving theconsistent colour selection desired.Chapter 3. Enhancements to simple depth composition^ 25While no mention is made of the behaviour of compositing equal depth pixels in[duff85], it turns out that the compositing algorithm does, in fact, give the result that isappropriate in the above situations.Chapter 4Visible surface reconstructionIn order to reshadow composited images, world coordinate information must be extractedfrom the images, and some notion of the shape of the object must be inferred from thisdata. Because a depth image does not store enough information to completely describe anobject, the only information about an object comes from those portions that are presentin the picture, and thus only the visible surface of the object can be reconstructed. 
Of course, other assumptions about the shape of the object can be made based on human intuition, on various other aspects of the image, such as existing shadows, or on changes in the visible surface of the object over a sequence of images. Since there is much work being done on all of these methods ([horn86], [ball82], and many others), and the purpose of this thesis is to verify a process that can be used to reshadow images (and not to solve all parts of that process), they are not discussed here in any detail. For the purposes of this thesis, a surface reconstruction algorithm giving a reasonable model of the visible surface of objects is sufficient. This visible surface model is then used by a modified ray tracer to reshadow the images to be composited (see Chapter 5).

For arbitrarily complex images, the reconstruction of shape from visible surfaces is very difficult. Separate objects in the image must be identified and reconstructed separately, since if object a occludes object b in the scene, a surface reconstruction algorithm may consider the two objects as one, and create a model that does not reflect the separation of the objects. This would give rise to errors in the reshadowing calculations, because light that would pass between objects would be blocked by the reconstructed one, as shown in figure 4.1. To properly separate objects, some sort of image segmentation must be done. Work on this has been going on in the area of computer vision for some time, but the accurate segmentation of arbitrary images is still a long way from being usable. Because of the complexity of the problem, this thesis deals with the shadowing problem within the restrictions described in Chapter 1, that is, the lighting and viewing parameters are the same in all images to be composited, and the scenes are composed of non-occluding objects. The image segmentation problem is important and still relevant however, and various aspects of it are discussed later in this chapter.

Figure 4.1: Reshadowing problems created by not segmenting images before performing surface reconstruction

4.1 Simple surface reconstruction

This section details the method developed for reconstructing visible surfaces from depth information. The surface created is intended only to be used as an illustration that the method for reshadowing composited images is a valid one, and is not intended to be an ideal solution to the object reconstruction problem.

In the introduction to this thesis, it was stated that all of the viewing parameters for both images to be composited are the same, and are known. The knowledge of these parameters comes directly from the raytracing software used to produce a depth image (see Appendix A). Along with the depth information, the raytracer stores other information about a scene, such as the window size, distance from window to eye, eye location and bank, as well as lighting information, such as the location, type, and colour of all the lights in each scene. With this information, coupled with the depth of each pixel, a simple surface reconstruction is fairly straightforward. First, the original world coordinates of each pixel covered by an object in the scene must be obtained, and then these points are joined to form a surface.

4.1.1 World coordinate retrieval

To retrieve the world coordinates of pixels from a depth image, a modified raytracing algorithm can be used. When raytracing, a ray is cast from the eye position through a pixel into the scene.
The ray is then checked for intersections with all objects in the scene, and if an intersection is found, the world coordinate of that point, as well as the distance along the ray at which it occurred, is returned. In the raytracer used to render depth images (called optik, see [aman87], [buch92], and Appendix A), this intersection depth is recorded as the depth of the pixel through which the ray was cast. Thus the depth map for the image represents depth from the eye point, but an option exists within optik that allows this distance to be projected onto a ray perpendicular to the image plane.

In order to retrieve the world coordinates of the scene, this process must be performed in reverse. Calculating the various vectors needed from the eye and window information is done exactly as it is for standard ray tracing, since the viewing parameters are known ([whit79], [whit80], [fole90]). This results in two vectors of interest, namely the unit vector representing the direction of the ray from the eye through a pixel p on the window (the vector Ray), and the unit vector that is perpendicular to the viewing plane through the eye point (the vector In). It is also necessary to know the eye position relative to the world coordinate origin, Eye, which is stored in the depth file (see figure 4.2). Once this is done, the world coordinates of pixel p (which shall be denoted as P) can be found for each of the two ways the depth values are stored.

Figure 4.2: Viewing vectors for the scene

If the depth value, call it d, is stored as the distance along the eye ray, the world coordinates of p can be described as P = (d · Ray) + Eye. The calculation of d · Ray gives the world coordinates of p relative to the world origin, since Ray is a unit vector in the correct orientation, and d is a scalar. The addition of Eye moves the origin of the eye ray to the eye position, and is necessary because the depth is computed from the eye location.

Where the depth value is stored as the distance along the ray projected onto the vector In, some additional calculations are required. First the scalar d multiplies the unit vector In. The resulting vector, T, gives the depth of p from the viewing screen. Because the vector Ray is a unit vector, T cannot simply be projected onto it. The correct treatment that will give the proper P is P = Ray · (|T|² / (T · Ray)) + Eye. This recovers the distance along Ray that, when projected onto In, gives the stored depth d, and moves the resultant vector to the eye position.

Figure 4.3: Vectors for calculating world coordinates: (a) simple projection; (b) desired projection; (c) image plane
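Both retrieval formulas can be sketched as follows. The construction of Ray from the viewing parameters is shown as a generic pinhole camera, since optik's exact parameterization is not reproduced here; the eye point, In vector, window dimensions, and sampling convention passed to pixel_ray are therefore assumptions of the sketch.

    import math

    def norm(v):
        length = math.sqrt(sum(c * c for c in v))
        return tuple(c / length for c in v)

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def pixel_ray(eye, in_vec, up, win_w, win_h, win_dist, nx, ny, i, j):
        """Unit vector Ray from the eye through pixel (i, j).

        A generic pinhole set-up: in_vec is the unit view direction (the
        In vector), win_w and win_h the window dimensions in world units,
        win_dist the eye-to-window distance, and nx, ny the resolution.
        The sample sits at the lower left corner of the pixel; add 0.5 to
        i and j for centre sampling.
        """
        in_vec = norm(in_vec)
        right = norm(cross(in_vec, up))
        true_up = cross(right, in_vec)
        u = (i / nx - 0.5) * win_w
        v = (j / ny - 0.5) * win_h
        on_window = tuple(eye[k] + win_dist * in_vec[k] + u * right[k] + v * true_up[k]
                          for k in range(3))
        return norm(tuple(on_window[k] - eye[k] for k in range(3)))

    def world_coordinate(d, ray, in_vec, eye, projected):
        """Recover the world coordinate P of a pixel from its stored depth d.

        If 'projected' is False, d is the distance along the eye ray:
            P = d * Ray + Eye.
        If 'projected' is True, d was projected onto In, so the distance
        along Ray is |T|^2 / (T . Ray) with T = d * In:
            P = Ray * (|T|^2 / (T . Ray)) + Eye.
        """
        if projected:
            t_vec = tuple(d * c for c in in_vec)
            d = dot(t_vec, t_vec) / dot(t_vec, ray)
        return tuple(d * r + e for r, e in zip(ray, eye))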
4.1.2 Surface construction from world coordinates

Once the world coordinates for each pixel are found, a simple surface over the points can be constructed. The method used here is to construct a simple triangular mesh where each triangle shares every vertex with three other triangles. This is done because of the way world coordinates for pixels are retrieved. Since each retrieved point occupies a space in a regular grid (the window) with respect to all the other points, a triangulation can be trivially constructed. The triangular mesh has also been chosen due to the simplicity of the intersection calculations required when creating the shadow map to reshadow the images.

For the case of lower left corner sampled pixels, consider that each pixel has defined world coordinates at its corners. If the simplest construction was done, each pixel would be a square on a surface. This does not work, however, if the corner points are not co-planar. Since this is the more likely case, it would be possible to use a bilinear patch to represent the surface, but bilinear patches require complex intersection calculations (see Chapter 5), and would not be usable if one vertex in the pixel had no depth value, since any patch with a null vertex would not be reconstructed (as explained later in this chapter). If each pixel is split into two triangles, any null vertex would again give rise to problems, since neither of the triangles would be usable. This leads logically to breaking the pixel into four triangles. While some triangles might not be used, there is a very good chance that some of the pixel information would be reconstructed. To do this, an extra vertex must be introduced at the centre of the pixel, where the four triangles share a vertex, as in figure 4.4. The coordinates of the centre vertex are found by adding the coordinates of the surrounding four points and dividing each component by four.

Figure 4.4: Reconstructing four triangles in one pixel from corner sampled depth data: black circles are known depth data, the grey circle is the interpolated depth data

A similar method can be used to reconstruct centre sampled pixels. Since the centre vertex coordinates are already known, the four corner coordinates needed to construct the four triangles must be found. Each corner of the pixel has a total of four surrounding pixels adjacent to it. Since each of these pixels has a point associated with its centre, the average of all the centre points around the corner under consideration is used to calculate its value (see figure 4.5). Repeating this for each of the four unknown corners of the pixel allows us to make four triangles. There may be cases where the world coordinates of some vertices in a pixel cannot be retrieved because the depth value is undefined. When this occurs, any triangle with a null vertex should not be included in the reconstructed surface, and the calculation of interpolated vertices must reflect this. Thus if there are only three non-null vertices (assuming corner sampled depth values), the vertex created by interpolation should only use the non-null data, and the coordinates of the centre vertex are found by adding all three valid vertices and dividing by three, instead of four.

Figure 4.5: Reconstruction of triangles for centre sampled depth data: black circles are known depth values, grey circles are interpolated depth values, arrows show which known values contribute to the interpolated values
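A sketch of the corner sampled case is given below: each pixel cell contributes up to four triangles around an averaged centre vertex, and any triangle that would touch a null vertex is dropped. The grid representation, the use of None for missing data, and the decision to skip cells with fewer than three valid corners are assumptions of this illustration.

    def triangulate_corner_sampled(points):
        """Build triangles for a grid of corner sampled world coordinates.

        points[row][col] is the world coordinate (x, y, z) retrieved for
        the corner sample at that grid position, or None where no depth
        was stored.  Each pixel cell (four neighbouring corners) is split
        into four triangles sharing an interpolated centre vertex;
        triangles that would use a null vertex are simply not emitted.
        """
        triangles = []
        rows, cols = len(points), len(points[0])
        for r in range(rows - 1):
            for c in range(cols - 1):
                corners = [points[r][c], points[r][c + 1],
                           points[r + 1][c + 1], points[r + 1][c]]
                valid = [p for p in corners if p is not None]
                if len(valid) < 3:
                    continue                  # nothing useful to reconstruct
                # Centre vertex: average of the valid corner coordinates only.
                centre = tuple(sum(p[i] for p in valid) / len(valid)
                               for i in range(3))
                for k in range(4):
                    a, b = corners[k], corners[(k + 1) % 4]
                    if a is not None and b is not None:
                        triangles.append((a, b, centre))
        return triangles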
Centresampled images show this aliasing as well, since triangles of a pixel would also be thrownout under the same conditions. One way to solve this problem is to extrapolate thesampled depth from the pixel to all corners that would otherwise have no depth value.While this stops the algorithm from discarding triangles, it gives the reconstruction aflanged look (triangles whose values were extrapolated at null vertices would be parallelto the image plane, and not follow the slope of the rest of the object) where it was jaggedbefore (figure 4.7). Some way might be developed to estimate depth information for nullvertices from the existing vertices and triangles, but this enters into the more complexChapter 4. Visible surface reconstruction^ 34Figure 4.6: Edge problems when reconstructed objects slope sharply away from thecamera. Original view is on the left, 90 degree rotation on the right.realm of general surface reconstruction and thus is not investigated by this thesis. Thejaggedness exhibited does not detract from the reshadowing method, since utilising abetter surface reconstruction would eliminate the problem.The visible surface reconstruction method put forward here gives a fairly good ap-proximation of the surface. It is only useful for range images for which uniform densedepth data is present. If a non-uniformly sampled or sparsely valued depth image is used,the reconstructed surface will show some significant problems. This method is also sus-ceptible to some sampling problems. If a very small object is reconstructed the shape ofthe object may be obscured by the discontinuities between adjacent triangles on the sur-face. Even for surfaces without these problems, the reconstructed triangle mesh may notbe as accurate as some would like. Even though the reconstruction procedure proposedhere is meant only to verify that the overall reshadowing method is feasible, it would bepossible to perform some smoothing on the mesh to get a more accurate representation(a spline-based surface, for example). Other methods of retrieving surface informationChapter 4. Visible surface reconstruction^ 35Figure 4.7: Edge problems for sharply sloped objects. Left shows standard reconstruc-tion, right shows reconstruction where sampled depth values are assigned to all parts ofthe pixel.from depth data have been investigated by many people Ghopp92], [schm91], [tana92]),and perhaps some image space methods, such as shape from shading and surface frommotion ([horn86] gives an overview) may also provide superior surface reconstruction tothe procedure used here.4.2 Image segmentationAs mentioned before, this thesis does not deal with the most general case of compositingtwo arbitrarily complex images. This is mainly due to the problems encountered whentrying to retrieve some notion of where objects are relative to other objects in the scene.Much work has been done on the image segmentation problem in the field of computervision, and the techniques developed can be useful for reshadowing purposes. This sectionputs forward what should be included in a complete solution to the image segmentationproblem, and discusses some implementation problems.Chapter 4. Visible surface reconstruction^ 364.2.1 The ideal solutionAn ideal solution to the image segmentation problem for compositing purposes wouldresult in each pixel in an image belonging to a known object. An object would be knownif its boundaries with all adjacent, occluding, or occluded objects are known. 
If all of this information were available, it would be easy to apply the simple surface reconstruction algorithm given previously to each object in turn, treating each object as if it were the only thing in the image. When the surface reconstruction is complete for all objects, they could be merged into the same scene and viewed in three dimensions.

4.2.2 Possible approaches

Boundaries of objects can be determined by some form of edge detection (such as the one given in [cann86]). Typical edge detectors operate on a greyscale (luminance only) image, and find areas of the image where there is a steep intensity gradient. This is done by applying a two dimensional filter over the image and finding where the second derivative of the intensity changes from positive to negative (a zero-crossing). The various edges found by this method are then joined together and, in the ideal case, show the boundaries of objects in the scene. If the depth information is treated as intensity information, that is, if each depth value is mapped to a displayable greyscale value, this method can be used to find depth discontinuities and thus detect objects (figure 4.8 and figure 4.9).

Figure 4.8: Original image

Figure 4.9: Associated depth map shown as greyscale
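To make the depth-discontinuity idea concrete, the following Python sketch applies a discrete Laplacian to a depth map treated as a greyscale image and marks zero-crossings of sufficient strength. It is only an illustration of the general approach described above, not part of the thesis implementation; the four-neighbour Laplacian, the threshold value, and the assumption that pixels with no depth have already been replaced by a large depth are all choices made for this example.

import numpy as np

def depth_edges(depth, threshold=0.05):
    """Mark depth discontinuities as zero-crossings of the Laplacian.

    depth: 2D float array of depth values (no-depth pixels already
           replaced by, for example, the maximum scene depth).
    Returns a boolean array that is True on candidate boundary pixels.
    """
    # Second derivative estimate: 4-neighbour Laplacian.
    padded = np.pad(depth, 1, mode="edge")
    lap = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
           padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * depth)

    # A zero-crossing occurs where the Laplacian changes sign between
    # horizontal or vertical neighbours with sufficient magnitude.
    edges = np.zeros(depth.shape, dtype=bool)
    sign_change_x = (lap[:, :-1] * lap[:, 1:]) < 0
    strong_x = np.abs(lap[:, :-1] - lap[:, 1:]) > threshold
    edges[:, :-1] |= sign_change_x & strong_x
    sign_change_y = (lap[:-1, :] * lap[1:, :]) < 0
    strong_y = np.abs(lap[:-1, :] - lap[1:, :]) > threshold
    edges[:-1, :] |= sign_change_y & strong_y
    return edges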
Some problems occur with this approach, however. When edge detection is done on a greyscale non-depth image, abrupt changes in intensity due to texture or shading are usually identified as edges. With depth images there is no texturing or shading information present, but other problems occur instead. When separate objects occupy similar areas in depth (such as a slightly sloped ramp on a desk), they are often not identified as separate objects. Complex objects can also cause problems when detecting intensity edges. If an object has areas where there is a rapid change in slope direction (for example, a triangular prism viewed from above its long axis, where two rectangles meet), there is an edge that would be detected, but it is not a boundary edge. These different types of edges must somehow be differentiated, which is difficult, and together these problems make it hard to properly identify boundary edges.

Once all edges are found, each boundary edge must be joined with those from the same object to form the complete boundary of that object. In an ideal case, each boundary edge would touch only two other boundary edges belonging to the same object. Since the ideal case rarely happens, and many edges that are not boundary edges could touch the boundary edge at various places, a way of identifying the boundary edges must be found, along with a method for correctly determining which adjoining edge is a boundary edge on the same object. If this can be done, then the only remaining barrier to the segmentation of the image is to identify occluding objects. For example, if the image were of a sphere in front of a rectangle, it would have to be determined whether the rectangle was actually continuous behind the sphere, so that any boundary edges detected could be joined together properly.

People working in the area of computer vision have tackled these problems for many years, and will continue to do so, hopefully coming up with better segmentation methods that can be applied to the compositing process. When this is done, the reshadowing of composited images will be more usable.

Chapter 5

The reshadowing process

Creating shadows on composited images is a fairly straightforward process. Using the visible surface reconstructed from one image, a (via the method presented in the previous chapter), the locations of shadows cast by this surface onto the objects of the other image, b, are found using a simple raytracing method. This generates a shadow map, which is an image of the same dimensions as the ones to be composited, containing a black pixel wherever a pixel in image b would be shadowed by the object in image a. The shadow map will be used to place shadows on image b, and must be filtered to remove the hard edges and jaggies caused by the binary nature of the shadow map as first produced. Once image b has been reshadowed, the two pictures are composited to produce the final image. In the case of images that are appropriate for mutual shadowing, as mentioned in Chapter 1, the reshadowing process is performed twice, with the images swapping roles the second time through. This results in both images being reshadowed, and the compositing is then performed.

5.1 Creating the shadow map

The purpose of creating a shadow map is to show where shadows should be cast on image b from the reconstructed object in image a. This situation is analogous to the shadow determination problem when raytracing a scene. When a scene is raytraced, a ray is cast from the eye into the scene, where it may intersect some object. If this is the case, the point of intersection is found, and another ray is cast from that point to the light. If this ray intersects any object, then the point from which it was cast is considered to be in shadow. For depth images, it is a simple matter to retrieve the world coordinates of each pixel in the image (see Chapter 4). The world coordinate is the intersection point of the eye ray from the original rendering with an object in the scene. As with raytraced shadows, all that is needed is to cast a ray from that point to the light, and find any intersections along it. If an intersection is found, a black pixel is written to the shadow map in the same location as the pixel from image b being considered.

The existing raytracer, optik (see Appendix A), was modified to create the shadow map because many of the procedures needed already exist there or are readily adaptable for this purpose. A routine was added to read in a depth image and to convert all depths to world coordinate information using the same method as for surface reconstruction. The light position was also retrieved from the depth image header and added to the scene. Using the existing interface for optik, the triangles making up the reconstructed surface can easily be read into the scene. A shadow ray between a world coordinate point from the depth image and the light is created using existing routines in optik, and the original image coordinates of the point are noted. The triangles from the reconstruction are checked for intersections with the cast ray using the ray intersection routines in optik, and if there is an intersection, a black pixel is written to the final image at the position that was noted before. If there is no intersection, a white pixel is written. This process is repeated for every pixel that has a depth value in the original image b. Because there are likely to be many triangles in the visible surface reconstruction, using optik allows the existing ray-intersection acceleration techniques to be used ([woo89], [aman87]), reducing the time taken to produce the shadow map.

Figure 5.1: A shadow map created by the modified raytracer
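The shadow map construction just described amounts to one shadow ray per valid pixel of image b, tested against the reconstructed triangles of image a. The Python sketch below shows only that overall loop and is not the modified optik code; world_coordinate and intersect_any are hypothetical stand-ins for the depth-to-world conversion of Chapter 4 and for optik's accelerated ray intersection tests.

def make_shadow_map(depth_image_b, triangles_a, light_pos,
                    world_coordinate, intersect_any):
    """Create a binary shadow map for image b from the surface of image a.

    depth_image_b: 2D array of depth values, None where no object was hit.
    triangles_a:   reconstructed visible surface of image a.
    world_coordinate(x, y, depth): returns the 3D point seen at a pixel.
    intersect_any(origin, direction, max_dist, triangles): True if any
        triangle blocks the segment from origin towards the light.
    Returns a 2D list: 0 (black) where b is shadowed by a, 1 (white) otherwise.
    """
    height, width = len(depth_image_b), len(depth_image_b[0])
    shadow_map = [[1] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = depth_image_b[y][x]
            if d is None:
                continue                      # no depth value: leave white
            p = world_coordinate(x, y, d)     # eye-ray hit point from image b
            direction = tuple(l - q for l, q in zip(light_pos, p))
            dist = sum(c * c for c in direction) ** 0.5
            direction = tuple(c / dist for c in direction)
            if intersect_any(p, direction, dist, triangles_a):
                shadow_map[y][x] = 0          # black: point is in shadow
    return shadow_map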
5.2 Filtering the shadow map

The shadow map produced by the above method contains only black or white pixels. If it were used directly to reshadow an image, the result would be a hard edged shadow with jagged edges, depending on its orientation. Because the shadows produced by many rendering methods are soft edged (to mimic real world shadows), the shadow created here should be soft edged as well. To do this, a simple filtering technique is used, which also smooths jaggies and other artifacts that can result from shadows that do not follow a row or column of pixels, as in figure 5.1.

To smooth the shadow map, it is easiest to darken the white pixels that border black ones. The simplest way to do this is to darken the pixel being considered by a percentage equal to the fraction of black pixels among those surrounding it. If there were four white and four black pixels in the shadow map surrounding the one of interest, it would be darkened to fifty percent of its original value. If there are no black pixels in the area, then nothing is done. Doing this for all pixels in the shadow map completes the filtering (see figure 5.2). Slightly better results can be obtained using a different weighting on this simple box filter. If every white pixel is darkened by twenty percent times the number of surrounding black neighbours (so that a white pixel with five or more black neighbours becomes totally black), the intensity falloff around black edges becomes closer to a smooth gradient than with the initial simple filter. This appears less jagged to the human eye, and can thus be judged better than the initial method (figure 5.3). By varying the filter width (from a single pixel to many), differing smoothing effects can be created and tailored to a particular person's taste, since not everyone will agree on what amount of filtering is best. In an animation created using the first method, where a rotating square from one image was placed above a stationary square from another (figure 5.4), the filtering was sufficient to mask any jaggies from the original shadow map.

Figure 5.2: A shadow map after filtering has been performed

Figure 5.3: The result of applying a weighted filter to the shadow map
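The two filters described above are small neighbourhood operations on the binary shadow map. The following Python sketch is illustrative only; it assumes the map stores 0 for black and 1 for white (as in the sketch of section 5.1), a fixed eight pixel neighbourhood, and a single filtering pass, whereas the text allows the filter width to vary.

import numpy as np

def filter_shadow_map(shadow_map, weighted=True):
    """Soften a binary shadow map (0 = black, 1 = white).

    With weighted=False each white pixel is darkened by the fraction of
    black pixels among its eight neighbours; with weighted=True it is
    darkened by 20% per black neighbour, saturating at five neighbours.
    """
    sm = np.asarray(shadow_map, dtype=float)
    # Treat pixels outside the map as white.
    padded = np.pad(sm, 1, mode="constant", constant_values=1.0)
    black = 1.0 - padded
    # Count black neighbours in the 8-neighbourhood of every pixel.
    neighbours = sum(black[1 + dy:black.shape[0] - 1 + dy,
                           1 + dx:black.shape[1] - 1 + dx]
                     for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                     if (dy, dx) != (0, 0))
    if weighted:
        darkening = np.minimum(0.2 * neighbours, 1.0)
    else:
        darkening = neighbours / 8.0
    # Only white pixels are darkened; black pixels stay black.
    return np.where(sm == 1.0, 1.0 - darkening, sm)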
5.3 Using the shadow map

To reshadow an image, the antialiased shadow map is used as an overlay. Because the shadow map is the same size as the image to be reshadowed, each pixel in the image has a corresponding pixel in the shadow map. For every pixel p in the image, the corresponding pixel in the shadow map is examined. If that pixel is white, nothing is done to p in the image, but if it is not white, reshadowing occurs. This is done by comparing the colour of the pixel in the shadow map to black. If the pixel in the shadow map is black, the corresponding pixel in the image is reduced in intensity by half, in order to approximate the default shadowing that optik produces. If p in the image originally had red, green, and blue values of 150, 90, and 60, respectively, the reduction would change the values to 75, 45, and 30. If the pixel is between pure black and white, then the amount of black in the pixel, expressed as the black value divided by the maximum black value, is halved and used as the percentage by which to darken the original pixel. This method of reshadowing produces a soft shadow on the image that is a very close approximation of a shadow made by optik under default conditions (as in figure 5.4).

When raytracing the original scene, the selection of parameters for surfaces (specularity, ambient illumination, etc.) and light intensity can cause the reshadowing process to produce shadows that are less acceptable. The shadows may reduce the intensity of pixels too greatly, or not enough. To overcome this, a measure of control has been added so that the default intensity reduction can be altered by employing the equation ((S * (1 - D)) + D) * c, where S is the shadow map value at a pixel p, between 0 (black) and 1 (white), D is the fraction of the original intensity that remains when S is 0 (so D = 0.5 reproduces the default halving), and c is the colour of p in the image to be reshadowed. This allows the user to adjust the effect of the shadow to some degree. As an extension to this work, a more interactive interface should be created, allowing the user to change the reshadowing (not the shadow position, merely the intensity) and view it to judge whether it is satisfactory. As it is now, the interface is manageable, and reshadows images well for default optik images. The reshadowing process is completed when the reshadowed image and the image from which the surface was reconstructed are composited together using the method described in Chapter 3. In the case of two images shadowing each other, the reshadowed versions of both images are composited.
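The overlay step reduces to the equation above applied per pixel and per colour channel. The following is a minimal Python sketch of that overlay, assuming the shadow map has been scaled to the range 0 (black) to 1 (white) and the image is stored as floating point RGB; neither assumption is dictated by the optik file format itself.

import numpy as np

def reshadow(image, shadow_map, D=0.5):
    """Darken an image using a filtered shadow map.

    image:      float array of shape (H, W, 3), the colours of image b.
    shadow_map: float array of shape (H, W), 0 = full shadow, 1 = unshadowed.
    D:          fraction of the original intensity kept where the map is
                fully black (D = 0.5 matches optik's default shadowing).
    Implements c' = ((S * (1 - D)) + D) * c for every pixel.
    """
    s = shadow_map[:, :, np.newaxis]   # broadcast the map over the RGB channels
    return (s * (1.0 - D) + D) * image

With D = 0.5 and a fully black map value, a pixel with red, green, and blue values of 150, 90, and 60 becomes 75, 45, and 30, matching the example given above.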
5.4 Reshadowing already shadowed images

Throughout this chapter, the reshadowing process has been discussed as if no shadows existed in either original image. If original shadows do exist, problems will occur when using the method presented here. If the shadow map contained a shadow at pixel p, and p was already a shadow pixel in the original image, it would be additionally darkened by the overlay process. This would result in an image where newly created shadows and original shadows could be differentiated, because pixels darkened twice would be darker than those shadowed only in the original image, which is undesirable.

Figure 5.4: Rendered and composite images of the same scene. The rendered image is on the bottom.

To prevent this from happening, a couple of steps could be taken. First, each pixel in an image could have an extra bit of information stored with it: a 1 if the pixel is in shadow, or a 0 if it is not, or optionally one byte could be used to denote the amount of shadow present. This extra information would be determined by the renderer when the image is first created. When compositing, this information would be used to determine whether a pixel that would be reshadowed should actually be darkened. If the pixel is reshadowed, the compositor would then switch the shadow bit on so that further compositing operations would know the pixel is in shadow. The second method for preventing reshadowing of already shadowed pixels is to reconstruct the scene every time it is composited, and to create a shadow map for the scene without considering any other images or reconstructions. After doing this, another shadow map would be constructed as described in this chapter. The first shadow map would then be compared to the second, and wherever there was a black pixel in both, that pixel would be turned white in the second. The modified shadow map would then be overlaid on the image to be reshadowed; since pixels already in shadow are marked white on the shadow map, they would not be reshadowed, and the output image would look better.

The first method is a very good one, with the exception that added storage is required to keep the shadowing information. While the ratio of shadow information to colour information would be quite small (one byte, or 256 levels of grey, of shadow information versus four bytes of rgba information for the image), the extra data would not be used for most normal image operations, wasting some space. Since a person would not necessarily know the use to which an image might be put in the future, most would include this shadow information on the off chance that it may be needed at some time. The second procedure outlined above has some obstacles to successful implementation. First of all, the problem of image segmentation needs to be solved before it can be determined where shadows are in a single image. Secondly, the problem of complexity arises. When determining shadows, some surface reconstruction must be performed, which takes time as well as space, whereas the first solution presented above merely takes more space. Also, a shadow map created via this method will not be as accurate as one created by the renderer itself, due to the reconstruction process. The advantage this method has over the renderer-created one is that a shadow map can be made for any depth image, without having to store information while rendering, but it is currently impractical. Many of the shadowing problems noted here, as well as those related to the more general illumination issue, are also discussed in [gers87].

With the double shadowing problem either ignored or worked around in the ways given here, images produced with this reshadowing method are, without a detailed side-by-side comparison, nearly indistinguishable from those rendered from a complete scene under default conditions.

Chapter 6

Other applications and extensions

A logical extension of the reshadowing method presented in this thesis is to combine real video images with computer generated ones in the same way two computer pictures are combined. This would eliminate the need for teams of artists painstakingly creating shadows on images that combine computer objects and a real scene (the films Terminator 2 and Jurassic Park, for example), as is done currently.

6.1 CAR - Computer Augmented Reality

As computers become faster and algorithms are developed and improved, it has become easier to use computer graphics to create special effects for film and video. Technology is approaching the point where computer graphics will be able to be seamlessly integrated into real images with very little effort. The methods being developed to do this can be grouped into an area that has been termed computer augmented reality (or CAR). Various aspects of CAR are being investigated at the University of British Columbia, of which this thesis will hopefully form a part.
Problems dealing with global illumination (highlights, adding and removing lights, etc.), viewing parameter matching, and local illumination (shadows, reflections, etc.) are all being investigated by the UBC CAR project. A brief overview of the problems involved is presented here, along with the specific application of this thesis to CAR.

The main problem with CAR is to translate various aspects of the real image into values that can be understood by computer graphics programs. For example, much needs to be known about the camera in the real image, such as its position relative to the objects in the scene, and whether it is tilted, zoomed, defocused, and so on. Most computer graphics programs define the computer generated camera by several values, such as the point in space at which it is located, the point in the scene at which it is aimed, the angle of view, any rotations or tilts relative to the global axes, and the screen distance from the actual point where the camera is located. Taken together, all of this information forms the viewing pyramid of the scene. This viewing pyramid encompasses those parts of the scene that are visible to the camera, and is used to create the two dimensional image of the scene. In order to be able to use the real image effectively, the viewing pyramid of the camera that took the image must be determined.

The success of this determination depends largely on what can be inferred from the scene. It is not possible to retrieve the viewing pyramid from a scene unless something is known about the objects in the scene. If the scene is a staged one, every object in the scene can be measured for size and relationship to other objects, and a computer model made of the real objects. This model can then be manipulated until the objects in the computer scene match those in the real one. The viewing parameters of the computer model can then be taken as the viewing parameters of the real camera. If the real image is actually part of a sequence, this process can be repeated for each frame of the film or video. This method works well, but needs the measurements of the real objects. It also takes large amounts of time, mostly human time, to match the real scene with the computer model. Ways of making the computer aid in the scene matching are currently being investigated, but the process is still mostly manual ([four93]). Means of determining the viewing pyramid without measuring the scene are more desirable than the previous method, since if the physical objects do not have to be present, camera parameters can be determined for any images, not just staged ones. Several methods exist in the computer vision field for tracking objects within a scene ([horn86], [lowe90]), and these can be used in conjunction with human methods. If a person outlines objects in a scene, the computer can then make a guess as to the objects' dimensions and relations to each other. Once this is done, the viewing parameters can be calculated and tested with a defined object that is inserted into the scene. If the user thinks the inserted object fits with the scene, the guessed viewing parameters can be used as the parameters of the real camera. If the object does not fit, another guess, based on how the inserted object appears, can be made and the process repeated.
Once the parameters of one image are known, the objects in subsequent images can be related to the ones in the known image by the tracking method mentioned previously. While this method is more widely useful than the manual measurement procedure, more problems can occur because of the difficulty of tracking objects between images. Also, while the objects are outlined by the user and the relative sizes are easy to compute, the dimensions of the objects must be guessed at, which could take a large amount of time. If, when object detection algorithms become capable of doing so, the computer is used to determine object boundaries, the process would be sped up considerably.

Once the viewing parameters for the real image are known, the parameters for the computer generated image must be manipulated to match those of the real image, or vice versa. Changing the viewing parameters of one image to those of another is not a well defined problem. Since the parameters of the two images could be completely different, the transformation required might take more information than the image provides, and may not be possible. Simple operations, such as coordinate transformations when all but the viewing point are the same, are easy to implement, but aspect ratio changes and depth scaling are much more difficult. One answer to all of these image manipulation problems is to construct the computer scene using the viewing parameters of the real image. This eliminates all of the effort required to massage one set of viewing parameters into another, at the expense of being able to use an arbitrary computer generated image or sequence.

Having somehow related the viewing parameters of the real and computer generated images, and having a fairly accurate model of the real scene, some combination of the two images can be performed. If lights exist in the computer scene that are not visible in the real scene, the effects of these lights can be calculated, and the real image reshaded to show the new lighting effects. Likewise, if a scene model exists, and the locations of the real lights are known, these can be used to shade the computer generated objects in relation to the scene. The location of shadows caused by computer generated lights shining on the real scene can be found by using the model of the real scene. Finally, all of these effects can be applied to the real image. Highlights can be added or removed, and shadows created or eliminated in their proper locations, by use of image processing techniques. Correctly shaded computer generated objects can then be rendered and composited into the real scene by standard two dimensional compositing. Because the computer generated image is created specifically for the particular real image from a model of the real scene, the computer objects can be rendered so that they appear to interpenetrate and be occluded by real objects. This creates effects that make the final image or sequence look almost as if the computer objects and effects were present in the original scene ([four93]).

As an alternative to the methods for real scene modeling described above, stereo image pairs can be used to obtain a depth map of the images ([horn86], [bake81]), or methods that operate on sequences of images ([shao88], [matt88]) might be employed to determine a depth map.
This depth map can then be utilised by the reshadowing procedure described in this thesis as if the real image were, in fact, computer generated. Because the depth map of the real image will be sparser than that of a computer generated one, due to the method of retrieval, some form of reconstruction should be performed on the depth data so that the reshadowing process can make use of it. Also, real images will most likely be fairly complex, which presents problems for the current reshadowing method, exactly as in the case of complex computer generated scenes. As mentioned in Chapter 1, the solution to this problem is to come up with a good image segmentation algorithm so that objects in the image can be logically separated from one another for the reconstruction process. While the reconstruction method presented in Chapter 4 works well for visible surfaces, it does not give a good notion of the overall shape of the object, which may be needed to retrieve the viewing parameters of the real scene. If only reshadowing is desired, the viewing parameters need not be retrieved, but the depths of the computer generated image and the real image must be matched in some fashion so as to prevent errors of scale when compositing. This might be achieved using an interactive technique where the user dynamically scales the depths of one image against the other to get a satisfactory correlation between them. Problems with perspective and aspect ratio could occur as well, in which case the viewing parameters of the real image would have to be retrieved. In this case, and if the real image were to be used in other CAR-related operations, some way of finding the shape of an object, and not just its visible surface, should be found, as discussed in Chapter 4. This would facilitate the recovery of the viewing parameters, as well as be of assistance when performing global illumination tasks. Once the two images have been altered enough that reshadowing can proceed, the procedures described in this thesis can be employed to reshadow the real image. If the computer generated image is to be reshadowed as well, some way of finding the light position in the real image must be found.

Despite all of the hurdles yet to be overcome, the method of reshadowing computer generated images presented in this thesis shows promise for applications using computer graphics to enhance real images. Research currently being conducted on CAR problems may also provide results that will be useful in broadening the scope of the reshadowing method of this thesis to include general reshading of images, better surface reconstruction, and even image manipulation where necessary. It is also hoped that the procedures described here will be of use to those delving into the CAR field once the various limitations have been overcome.

6.2 What now?

As has been mentioned throughout this thesis, there are many things that should be done to make the reshadowing process truly useful. Most important of these is finding a good image segmentation method, so that arbitrarily complex depth images can be composited and reshadowed. Edge detection techniques may prove useful in doing this, as might modified surface fitting and discontinuity finding algorithms, such as [terz83], [marr84], [gamb87], and [terz88], which have been successfully applied to restricted domain problems.
After this is done, a better reconstruction method should be found. Initially, a better visible surface finding algorithm should be created, in which a smoother surface is constructed from the points while remaining at most as expensive to intersect in the shadow map calculations as the current triangular mesh. Some heuristic method should also be used to guess at the best way to reconstruct the complete object, using human input to verify the guesses, unless it becomes possible to easily retrieve the entire object description. Current shape from shading methods, as well as reconstruction from image sequences, coupled with the depth and colour information, will hopefully provide a basis for doing this.

Once it is possible to reshadow arbitrarily complex images, a method of dealing correctly with multiple, extended, and area light sources should be found. This entails not only the creation of shadows cast from such light sources, but also the removal or lessening of those shadows that would be illuminated by lights if the scenes were composited. This leads into the more general reshading problem of CAR, from which methods could be borrowed to create new highlights on objects illuminated by a light from a composited image, as well as to remove highlights from reshadowed areas of images. At the same time, ways should be found of overcoming the problems encountered when compositing images with different viewing parameters; again, CAR research will hopefully be able to assist in this.

Reshadowing real images should be investigated in parallel with all of these improvements. Depth maps for real images should be obtained somehow, either from image methods, such as stereo image matching, or from other methods, such as laser range finding or sonic echo location. Once the real images have the information that is needed to composite two computer generated pictures, all of the things that can be done to computer images can be performed on the real image. Thus, any combination of real and computer generated images will be able to be composited using the same set of tools. Ultimately this method should be used in some fashion by CAR research groups, providing a means of assisting both local and global shadowing calculations.

Chapter 7

Conclusions

This thesis has extended the three dimensional compositing method of Duff to enable objects within different scenes to interact with each other in such a way as to create shadows on images as if all the objects in the scenes were present at once. While the procedure described here does not solve all problems associated with this process, it does show that reshadowing of composited images can be done easily for images with restricted characteristics, and it provides thoughts and possible methods for dealing with images of arbitrary complexity. A fast and reasonably accurate visible surface reconstruction algorithm was given that allows a modified raytracer to create a shadow map for use in reshadowing composited images. Applications of this process to the area of computer augmented reality have been discussed.
A short video was produced in the course of this thesis (of which figure 5.4 is a part) that shows the results, both positive and negative, of the methods described.

The results of performing these processes to reshadow images show great promise. As it stands now, however, too many restrictions must be imposed on the images to produce a reshadowed image with the desired qualities, which makes the methods presented in this thesis usable only in a research environment. If the improvements suggested throughout this thesis are made, the reshadowing process will become more robust and useful. This thesis has shown that the reshadowing process can prove valuable in many aspects of image production and modification, and it is hoped that it will be an important initial effort upon which others can build.

Bibliography

[aman87] J. Amanatides and A. Woo. "Optik Users' Manual". Technical Report DGP 1987-2, University of Toronto, August 1987.

[bake81] H. H. Baker and T. O. Binford. "Depth from edge and intensity based stereo". Proc. 7th Int. Joint Conf. on Artificial Intelligence, pp. 631-636, 1981.

[ball82] D. H. Ballard and C. M. Brown. Computer Vision. Prentice-Hall, 1982.

[boot86] Kellogg S. Booth, David R. Forsey, and Alan W. Paeth. "Hardware assistance for Z-buffer visible surface algorithms". Proceedings of Graphics Interface '86, pp. 194-201, May 1986.

[buch92] John Buchanan. "[kmpsxroptik". Proceedings of the 1992 Western Computer Graphics Symposium, pp. 147-152, April 1992.

[cann86] J. F. Canny. "A computational approach to edge detection". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, pp. 679-698, 1986.

[carp84] Loren Carpenter. "The A-buffer, an Antialiased Hidden Surface Method". Computer Graphics (SIGGRAPH '84 Proceedings), Vol. 18, No. 3, pp. 103-108, July 1984.

[crow77] Franklin C. Crow. "The Aliasing Problem in Computer-Generated Shaded Images". Communications of the ACM, Vol. 20, No. 11, pp. 799-805, November 1977.

[crow81] F. C. Crow. "A Comparison of Antialiasing Techniques". IEEE Computer Graphics and Applications, Vol. 1, No. 1, pp. 40-48, January 1981.

[duff85] T. Duff. "Compositing 3-D Rendered Images". Computer Graphics (SIGGRAPH '85 Proceedings), Vol. 19, No. 3, pp. 41-44, July 1985.

[fium83] E. Fiume, A. Fournier, and L. Rudolph. "A Parallel Scan Conversion Algorithm with Anti-Aliasing for a General Purpose Ultracomputer". Computer Graphics (SIGGRAPH '83 Proceedings), Vol. 17, No. 3, pp. 141-150, July 1983.

[fole87] James D. Foley and Won Chul Kim. "Image composition via lookup table manipulation". IEEE Computer Graphics and Applications, Vol. 7, No. 11, pp. 26-35, November 1987.

[fole90] J. D. Foley, A. van Dam, Steven K. Feiner, and John F. Hughes. Fundamentals of Interactive Computer Graphics. Addison-Wesley Publishing Company, second edition, 1990.

[fors88] David R. Forsey and Richard H. Bartels. "Hierarchical B-Spline Refinement". Computer Graphics (SIGGRAPH '88 Proceedings), Vol. 22, No. 4, pp. 205-212, August 1988.

[four93] A. Fournier, A. Gunawan, and C. Romanzin. "Common Illumination Between Real and Computer Generated Scenes". Graphics Interface (GI) '93 Proceedings, May 1993.

[gamb87] C. Gamble and T. Poggio. "Visual integration and detection of discontinuities: the key role of intensity edges". AI-Memo-970, MIT AI Laboratory, Cambridge, MA, 1987.

[gers87] Ron Gershon. The use of color in computational vision. Ph.D. thesis, Dept. of Computer Science, University of Toronto, 1987.
"The Accumulation Buffer: Hardware Supportfor High-Quality Rendering". Computer Graphics (SIGGRAPH '90 Proceed-ings), Vol. 24, No. 4, pp. 309-318, August 1990.[hopp92] Hugues Hoppe, Tony DeRose, Tom Duchamp, John McDonald, and WernerStuetzle. "Surface reconstruction from unorganized points". Computer Graph-ics (SIGGRAPH '92 Proceedings), Vol. 26, No. 2, pp. 71-78, July 1992.[horn86] B. K. P. Horn. Robot Vision. McGraw-Hill, 1986.[jaha91] Ghassan Jahami and Michel Beigbeder. "A rendering compositing method".COMPUGRAPHICS '91, Vol. I, pp. 504-513, 1991.[levo77] M. Levoy. "A color animation system based on the multiplane technique".Computer Graphics (SIGGRAPH '77 Proceedings), Vol. 11, No. 2, pp. 65-71,July 1977.[lowe90] D. G. Lowe. "Integrated treatment of matching and measurement errors forrobust model-based motion tracking". Proc. 3rd International Conference onComputer Vision, pp. 436-440, 1990.Bibliography^ 60[marr84] J. L. Marroquin. "Surface reconstruction preserving discontinuities". AI-Memo-792, MIT Al Laboratory, Cambridge, MA, 1984.[matt88] L. Matthies, R. Szeliski, and T. Kanade. "Incremental estimation of densedepth maps from image sequences". Proc. IEEE Conf. Computer Vision andPattern Recognition, 1988, 1988.[moln92] Steven Molnar, John Eyles, and John Poulton. "PixelFlow: High-speed ren-dering using image composition". Computer Graphics (SIGGRAPH '92 Pro-ceedings), Vol. 26, No. 2, pp. 231-240, July 1992.[nada87] Tom Nadas and Alain Fournier. "GRAPE: An Environment to Build DisplayProcesses". Computer Graphics (SIGGRAPH '87 Proceedings), Vol. 21, No. 4,pp. 75-84, July 1987.[naka86] E. Nakamae, K. Harada, T. Ishizaki, and T. Nishita. "A Montage Method:The Overlaying of the Computer Generated Images onto a Background Pho-tograph". Computer Graphics (SIGGRAPH '86 Proceedings), Vol. 20, No. 4,pp. 207-214, August 1986.[naka89] Eihachiro Nakamae, Takao Ishizaki, Tomoyuki Nishita, and Shinichi Takita."Compositing 3D Images with Antialiasing and Various Effects". IEEE Com-puter Graphics and Applications, Vol. 9, No. 2, pp. 21-29, March 1989.[port84] T. Porter and T. Duff. "Compositing digital images". Computer Graphics(SIGGRAPH '84 Proceedings), Vol. 18, No. 3, pp. 253-259, July 1984.[potm82] M. Potmesil and I. Chakravarty. "Synthetic Image Generation with a Lensand Aperture Camera Model". ACM Transactions on Graphics, Vol. 1, No. 2,pp. 85-108, April 1982.[potm87] Michael Potmesil and Eric M. Hoffert. "FRAMES: Software Tools for Mod-eling, Rendering and Animation of 3D Scenes". Computer Graphics (SIG-GRAPH '87 Proceedings), Vol. 21, No. 4, pp. 85-93, July 1987.[sale90] David Salesin and Jorge Stolfi. "Rendering CSG Models with a ZZ-Buffer".Computer Graphics (SIGGRAPH '90 Proceedings), Vol. 24, No. 4, pp. 67-76,August 1990.[schm91] F. Schmitt, Xin Chen, and Wen-Hui Du. "Geometric Modelling from RangeImage Data". Eurographics '91, pp. 317-328, September 1991.[shao88] M. Shao, T. Simchony, and R. Chellappa. "New algorithms for reconstructionof a 3D depth map from one or more images". Proc. IEEE Conf. ComputerVision and Pattern Recognition, 1988, 1988.Bibliography^ 61[shaw89] Christopher D. Shaw, Mark Green, and Jonathan Schaeffer. "Anti-aliasingissues in image composition". Proceedings of Graphics Interface '89, pp. 113-120, June 1989.[shaw90] Christopher D. Shaw, Mark Green, and Jonathan Schaeffer. "A parallel graph-ics system using raster image composition". Proceedings of the 1990 WesternComputer Graphics Symposium, March 1990.[tana92] H.T. Tanaka and F. Kishino. 
"Recovering and Visualizing Complex Shapesfrom Range DataAdaptive". Visual Computing (Proceedings of CG Interna-tional '92), 1992.[terz83] D. Terzopoulos. "The role of constraints and discontinuities in visible-surfacereconstruction". Proc. 8th Int. Joint Conf. on Artificial Intelligence, pp. 1073-1077, 1983.[terz88] D. Terzopoulos. "The computation of visible surface representations". IEEETransactions on Pattern Analysis and Machine Intelligence, Vol. 10, No. 4,pp. 417-438, July 1988.[whit791 T. Whitted. "An improved illumination model for shaded display". ComputerGraphics (Special SIGGRAPH '79 Issue), Vol. 13, No. 3, pp. 1-14, August1979.[whit80] Turner Whitted. "An Improved Illumination Model for Shaded Display".Communications of the ACM, Vol. 23, No. 6, pp. 343-349, June 1980.[woo89] Andrew C. H. Woo. "Accelerators for Shadow Determination in Ray Tracing".M.Sc. Thesis, Department of Computer Science, University of Toronto, 1989.Appendix ADepth FilesThroughout this thesis there have been many references to the optik depth file format.While the procedures given can be applied to virtually any depth file format, it has beenuseful to give some concrete examples using the optik format. Here the optik format shallbe explained in more detail, and other uses for the depth file format will be outlined.A.1 Optik depth filesOptik is a program that renders mathematically modeled scenes using ray-tracing tech-niques based on [whit80]. It, and the variations developed by students in the courseof their work, is described in [aman87] and [buch92]. One of the variations of optikwas created by Chris Romanzin (one of the authors of [four93]) to write out a file withdepth information in addition to the regular rgba information. This depth file uses theexisting rgba storage routines in order to store depth information for every pixel in thecorresponding image. Included in the header of the image, which describes the size of theimage and the number of bits of information in each pixel, is information that completelydescribes the viewing pyramid and information about lighting, such as location, intensitycolour, and so forth. This extra information is used to reshadow images in the mannerdescribed by this thesis.In optik, it is possible to store between one and three bytes of colour information,with an optional byte of a information. To take advantage of this variability to store62Appendix A. Depth Files^ 63the minimum amount of depth information that the user needs, the depth data can alsobe stored as one to four bytes per pixel of information. In order to do this, the depthinformation, as calculated by optik from object-ray intersection points, must be put in aform where meaningful information can be stored at many resolutions. The depth datais thus mapped into 2871 discrete values, where n is the number of bytes per pixel touse to store the depth information, defined by the user. Because the depth for a pixelis originally calculated as a floating point number (from the ray-object intersection), amapping scheme is used where the original floating point value can be retrieved so thatit is possible to composite depth images that are stored with differing numbers of bytesper pixel. This is done by storing the minimum and maximum depth values of the scenein the depth map header. 
The way the depth values are computed can also be controlled by the user, allowing for some flexibility in using optik depth files with other depth file formats. The user may choose to have the floating point depth value stored as the distance along the ray from the eye through the pixel (as determined by the ray-object intersection calculations), or this value may be projected onto a vector perpendicular to the eye plane before being stored in the depth file. Various methods of obtaining the depth information when oversampling (as discussed in Chapter 3) are also implemented, namely taking the minimum depth over all samples, taking the maximum depth over all samples, or choosing a consistent sample number.

Additional modifications were made in the course of work on this thesis. A provision has been made to allow optik to read in depth map files directly, automatically retrieve world coordinates from them, and send shadow rays from each of these points to the light sources defined in the depth map header. These rays are then used to check for intersections with reconstructed objects (read in as a group of optik objects). This version of optik also creates a shadow map, of the form discussed in this thesis, with values that depend on the outcome of the aforementioned intersections.

A.2 Other uses of depth files

Other uses can be made of the depth information besides employing it for compositing purposes. Potmesil and Chakravarty ([potm82]) use depth information to create a depth of field effect in images. Duff in [duff85] uses the depth information to create a fog effect. Each of these methods has been implemented in the course of the work on this thesis, but because they are not directly relevant to the subject of this thesis, and because the original implementations are documented elsewhere, the details are not presented here.

Other possible applications of depth files can be seen when considering the application of various effects at the object or scene level. Because the depth values give partial object information, it may be possible to create postprocessed effects that simulate those created as modeled effects. For example, treating each depth value and corresponding pixel as a separate object, some particle system calculations could be performed, allowing objects to explode, drop, bounce, and so forth, without any need to know extra information about a scene. Glows around objects calculated from the depths might also be a useful effect. These are just some of the uses to which depth files might conceivably be put, and it is hoped that some effort may be put into the exploration of these uses at a later date, or by others.