UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Iterative depth from defocus (I-DFD) algorithms Sewlochan, Ray 1995

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-ubc_1995-0270.pdf [ 6.66MB ]
JSON: 831-1.0064877.json
JSON-LD: 831-1.0064877-ld.json
RDF/XML (Pretty): 831-1.0064877-rdf.xml
RDF/JSON: 831-1.0064877-rdf.json
Turtle: 831-1.0064877-turtle.txt
N-Triples: 831-1.0064877-rdf-ntriples.txt
Original Record: 831-1.0064877-source.json
Full Text

Full Text

Iterative Depth F r o m Defocus ( I -DFD) Algor i thms by Ray Sewlochan B.A.Sc, The University of Waterloo, 1991 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES DEPARTMENT OF ELECTRICAL ENGINEERING We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA April 1995 © Ray Sewlochan, 1995 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract Depth From Defocus (DFD) seeks to obtain depth information from camera images by measuring the amount of defocus in the images. Although DFD avoids many of the problems associated with stereo vision by maintaining only one point of view, other issues such as optical modelling and computational intensity determine speed and accuracy of DFD systems. This thesis presents the optical theory that establishes the relationship between defocus and depth. Defocus is modelled by the convolution of a scene and a defocus kernel to produce a defocussed image. Dependencies on the scene are removed by measuring the relative defocus between two images of the scene. This is modeled by the convolution of the sharper image with a relative defocus kernel to produce the blurrier image. The determination of the relative defocus kernel is the challenge of DFD. Spatial domain algorithms avoid the errors and assumptions associated with frequency domain transformations. Iterative techniques provide an accurate way of finding the relative defocus kernel, at the expense of computational intensity required by convolution calculations. This thesis develops a hierarchy of Iterative DFD (I-DFD) algorithms for the spatial domain. The hierarchy is divided into defocus modelling method (theoretical (T) or measured (M)), relative defocus shaping method (Gaussian (G) or regularized (R)) and convolution implementation (two-dimensional (2), separable (S) or integrated (I)). Four of the algorithms (TR2, TRI, TGS, TGI) in this hierarchy are implemented and fully tested on three classes of images. TR2 has been previously published and is used as a benchmark. TGS and TGI significantly reduce computational effort, but suffer from degraded accuracy. TRI reduces computational effort by several orders of magnitude with no degradation of accuracy for two of the three classes of images. Recommendations are made to alleviate the degradation in the third class of images. 11 Table of Contents Abstract ii List of Tables x List of Figures xi Chapter 1 INTRODUCTION 1 1.1 M O T I V A T I O N 1 1.2 M A C H I N E V I S I O N 2 1.3 E N V I R O N M E N T 3 1.4 S C O P E 4 Chapter 2 DFD OVERVIEW 6 2.1 I N T R O D U C T I O N A N D O V E R V I E W 6 2.2 B A S I C C O N C E P T 6 2.2.1 De focus 6 2.2.2 Depth of F ie ld and Foca l Grad ient 7 2.2.3 D F D 8 2.2.4 B io log ica l C o m p a r i s o n 9 2.3 A L T E R N A T I V E S 10 2.3.1 F o c u s B a s e d 10 2.3.2 S te reo B a s e d 10 iii 2.4 A D V A N T A G E S 11 2.4.1 S ing le Point of V iew 11 2.4.2 Su i tab le for Paral le l izat ion 11 2.4.3 Pass iv i ty 12 2.4.4 Low Reso lu t ion Algor i thm 12 2.5 D I S A D V A N T A G E S 13 2.5.1 Sensit iv i ty 13 2.5.2 S p e e d Const ra in ts 13 2.5.3 Env i ronmenta l Const ra in ts 13 Chapter 3 LITERATURE REVIEW 15 3.1 I N T R O D U C T I O N A N D O V E R V I E W 15 3.2 D E P T H F R O M D E F O C U S 15 3.2.1 A lex Pent land 16 3.2.2 J o h n E n s 16 3.2.3 Mura l idhara S u b b a r a o 17 Chapter 4 DEFOCUS THEORY 19 4.1 I N T R O D U C T I O N A N D O V E R V I E W 19 4.2 O P T I C A L T H E O R Y O F D E F O C U S 19 4.2.1 Assump t i ons 19 4.2.2 L e n s Mode l 20 4.2.3 Geomet r i c De focus 22 iv 4.3 C O N V O L U T I O N M O D E L O F D E F O C U S . . 26 4.3.1 Geomet r i c Opt ics 28 4.3.2 Diffraction Opt ics 29 4.3.3 Exper imenta l ly Der ived p s f s 31 Chapter 5 ITERATIVE DFD THEORY 35 5.1 I N T R O D U C T I O N A N D O V E R V I E W 35 5.2 E N V I R O N M E N T 35 5.2.1 C a m e r a Paramete rs 35 5.2.2 Two Solut ion Ambigu i ty 37 5.2.3 Soft / Sha rp E d g e Ambigu i ty 38 5.3 S Y S T E M M O D E L 39 5.3.1 Image Acquis i t ion Mode l 39 5.3.2 G e n e r a l D F D S y s t e m 41 5.3.3 Der ivat ion of H 3 42 5.3.4 Iterative D F D S y s t e m M o d e l 47 5.4 R E L A T I V E D E F O C U S K E R N E L G E N E R A T O R 49 5.4.1 Overv iew 49 5.4.2 Theoret ica l Mode l s (T) 50 5.4.3 M e a s u r e d Mode l s (M) 51 5.4.4 Rela t ive De focus Kerne l Ca lcu la t ion 52 5.4.5 Regu la r i zed S h a p i n g (R) 55 5.4.6 G a u s s i a n S h a p i n g (G) 64 v 5.5 C O N V O L U T I O N I M P L E M E N T A T I O N 68 5.5.1 Two D imens iona l Convo lu t ion 69 5.5.2 Sepa rab le Convo lu t ion 70 5.5.3 Integrated Convo lu t ion 72 5.5.4 Convo lu t ion S u m m a r y 7 5 5.6 A L G O R I T H M H I E R A R C H Y 76 5.6.1 Prev ious Work 78 5.6.2 N e w Work 78 5.6.3 Implementat ions 78 5.7 W O R K I N G A R E A 81 5.7.1 Foca l Grad ien t 82 5.7.2 Spat ia l Suppor t 83 5.7.3 E x a m p l e 85 Chapter 6 IMPLEMENTATION 86 6.1 I N T R O D U C T I O N A N D O V E R V I E W 86 6.2 E N V I R O N M E N T 86 6.2.1 Equ ipment 86 6.2.2 C a m e r a Paramete rs 87 6.3 I M A G E P R E P R O C E S S I N G 88 6.3.1 Radiometr ic Cal ibrat ion 89 6.3.2 Other Cor rec t ions 91 vi 6.4 S Y S T E M C H A R A C T E R I Z A T I O N 91 6.5 S E L E C T I N G R E G I O N O F I N T E R E S T 93 6.5.1 Interactive l-DFD 94 6.5.2 Depth M a p l-DFD 94 6.5.3 E d g e Detect ion l - D F D 94 6.5.4 R O I S i z e 97 6.6 D E P T H R E C O V E R Y 97 6.6.1 R D K G Sca l i ng 98 6.6.2 Compara to r 100 6.6.3 Dec is ion 102 6.6.4 l - D F D F lowchar t 103 Chapter 7 RESULTS 104 7.1 I N T R O D U C T I O N A N D O V E R V I E W 104 7.2 M E A S U R E M E N T S 104 7.2.1 A c c u r a c y 104 7.2.2 S p e e d 105 7.2.3 C o n s i s t e n c y 106 7.3 S T E P E D G E I M A G E S 106 7.4 A L P H A B E T I M A G E S 111 7.5 L O G I M A G E S 118 7.6 P E R F O R M A N C E E V A L U A T I O N 125 7.6.1 E n s ' s T R 2 125 7.6.2 A c c u r a c y 127 7.6.3 S p e e d 127 vii 7.6.4 Overa l l Pe r fo rmance 128 Chapter 8 CONCLUSIONS 132 8.1 I N T R O D U C T I O N A N D O V E R V I E W 132 8.2 C O N T R I B U T I O N 132 8.3 F U T U R E W O R K 133 8.3.1 Opt ica l S y s t e m 133 8.3.2 A lgor i thms 133 Bibliography 135 Appendix A NOTATION AND SYMBOLS 138 Appendix B ABEL TRANSFORM 139 B.1 I N T R O D U C T I O N A N D O V E R V I E W 139 B.2 F O R W A R D A B E L T R A N S F O R M 139 B. 3 I N V E R S E A B E L T R A N S F O R M 142 Appendix C PROPERTIES OF PSFS AND LSFS 146 C . 1 I N T R O D U C T I O N A N D O V E R V I E W 146 C . 2 D E F I N I T I O N S 146 C.2.1 Opt ica l S y s t e m 146 C .2 .2 Point S p r e a d Funct ions (psfs) 146 C .2 .3 S p r e a d 147 viii C . 3 D E R I V E D R E L A T I O N S H I P S 148 C.3.1 G a u s s i a n Isf 148 C .3 .2 Pi l lbox Isf 149 C .3 .3 Isf — psf S p r e a d Rela t ionsh ip 149 C.3 .4 S p r e a d of a G a u s s i a n psf 151 C .3 .5 S p r e a d of a Pi l lbox psf 152 C .3 .6 Isf Convo lu t ion Re la t ionsh ip 153 C .3 .7 Unit Vo lume of Relat ive De focus Kerne ls 155 C .4 G A U S S I A N R E L A T I V E D E F O C U S 156 C.4.1 Spec t rum of G a u s s i a n Isfs 156 C .4 .2 G a u s s i a n h 3 157 C .4 .3 Separabi l i ty 157 C.4 .4 U n i q u e n e s s of G a u s s i a n Rela t ive De focus psf 158 ix List of Tables Table 1 Convo lu t ion Computa t ion C o m p a r i s o n 75 Table 2 Implemented D F D Parame te rs 87 Table 3 S tep E d g e Resu l t s 111 Table 4 A lphabe t Resu l t s 118 Table 5 Log Resu l ts 125 x List of Figures Figure 1 Thin L e n s Mode l 20 F igure 2 Definit ion of Foca l Points 21 F igure 3 L e n s S y s t e m M o d e l 22 F igure 4 Geomet r i c De focus 23 F igure 5 Convo lu t ion Mode l of an Opt ica l S y s t e m 26 F igure 6 Pi l lbox Modula t ion Transfer Funct ion (MTF) 29 F igure 7 Image Acquis i t ion Se t Up 39 F igure 8 Image Acquis i t ion Mode l 41 F igure 9 Relat ive De focus S y s t e m 42 F igure 10 Relat ive De focus 43 F igure 11 Iterative D F D S c h e m e 48 F igure 12 Relat ive De focus Kerne l Genera to r ( R D K G ) 50 F igure 13 Regu la r i zed S h a p i n g R D K G 58 F igure 14 Pi l lbox C a m e r a L ine S p r e a d Funct ions 60 F igure 15 Relat ive De focus Kerne l (No Constra ints) 61 F igure 16 Rela t ive De focus Kerne l (Smoo thness Constra int) 62 F igure 17 Rela t ive De focus Kerne l (Smoo thness and Ze ro Tail Constra ints) 63 F igure 18 C o m p a r i s o n of Or ig inal and Ca lcu la ted h 2 64 F igure 19 G a u s s i a n Approx imat ion to Pi l lbox Kerne ls 66 F igure 20 Restr ic ted Convo lu t ion S i z e Re la t ionsh ip 69 F igure 21 l - D F D Algor i thm Hierarchy 77 F igure 22 Rad iomet r ic Cal ibrat ion Charac ter is t i cs 90 F igure 23 L S F S p r e a d Charac ter iza t ion 93 XI Figure 24 S e c o n d Derivat ive E d g e Detector 95 F igure 2 5 R O I Se lec t ion 98 F igure 26 Comp le te l -DFD S y s t e m 99 F igure 27 Iterative D F D F low Char t 103 F igure 28 S tep E d g e Error Funct ion E x a m p l e 107 F igure 29 S tep E d g e Images 108 F igure 30 Sma l l ROI S tep E d g e Test 109 F igure 31 Large ROI S tep E d g e Test 110 F igure 32 A lphabe t Error Funct ion E x a m p l e 112 F igure 33 A lphabe t Images 113 F igure 34 TRI A lphabet Test 114 F igure 35 T R 2 A lphabet Test 115 F igure 36 T G S A lphabe t Test 116 F igure 37 TGI A lphabet Test 117 F igure 38 Log Error Funct ion E x a m p l e 119 F igure 39 L o g Images 120 F igure 40 TRI Log Test 121 F igure 41 T R 2 Log Test 122 F igure 42 T G S Log Test 123 F igure 43 TGI Log Test 124 F igure 44 S tep E d g e Per fo rmance P l a n e 129 F igure 45 A lphabe t Pe r fo rmance P l a n e 129 F igure 4 6 L o g Per fo rmance P l a n e 130 F igure 47 Depth M a p E x a m p l e 131 F igure 48 Project ion of a Two D imens iona l Funct ion 140 F igure 49 First Order Approx imat ion of f 2 143 xii Chapter 1 INTRODUCTION 1.1 MOTIVATION Automation may be the key to effective and efficient use of resources for many industries. In the past, human workers were the tool used to implement many procedures. This could mean placing people in demanding and sometimes hazardous conditions. It would be desirable to use machines to perform hazardous, lower level tasks and employ people to perform safer, higher level ones. The automobile industry has for years used robots on assembly lines to do repetitive, menial tasks. Many chemical processes are very automated in nature. Some tasks are not as straight forward nor as repetitive as the previous examples. At the present time there is not sufficiently advanced technology to completely automate these more sophisticated processes. Tasks such as these may progress towards automation gradually. At an intermediate step to automation, these tasks may be called semi-autonomous. Both the forestry and construction industry may become more and more semi-autonomous. For example, the task of cutting trees and collecting logs is too complex to be done autonomously with current technology. However, the efficiency of the industry may be greatly enhanced if some of these tasks could be done with machines under some human supervision. This could increase both safety and productivity factors. A human supervisor could be safely and comfortably seated away from a hazardous cutting site while overseeing the higher level functions of the robots being used. This could be termed semi-autonomous tele-robotics. There are however, many challenges still to be faced on the road to automation or semi-automation. I 1.2 MACHINE VISION In order to use machines to perform sophisticated tasks, they must be able to understand and interact with their environment. The most powerful tool that humans have for perceiving their environment is vision. This naturally leads to the desire to synthesize machine vision. Horn 112 p.3] states that "a machine vision system analyzes images and produces descriptions of what is imaged." He points out that the description that is needed will depend on the task that is to be performed. Descriptions of a scene may include: 1. Finding objects in the image. 2. Identifying or classifying the objects. 3. Determining the location of the object. Some, all, or additional descriptions may be required for a particular task. Determination of each of the above descriptions may be thought of as machine vision functions. Obtaining the third descriptor is the subject of this thesis. Determining the location of an object may be approached in many ways. Some of them, such as pressure sensors, active communication with the object or active ranging (radar and sonar) do not rely on information found in an image so they would not fall into Horn's definition of machine vision. None the less, they may produce the required descriptor. The third descriptor may also be obtained by image based methods.'In acquiring an image, a three dimensional scene is mapped onto a two dimensional image. In other words, the depth has been lost. Determining the location of an object from an image may then be termed depth recovery. The research presented in this thesis seeks to perform depth recovery using a technique known as Depth from Defocus (DFD). 2 1.3 ENVIRONMENT There are many depth recovery algorithms and variations of algorithms. The choice of which algorithm to use is determined by the environment that the machine vision system must function in. The environment is in turn dictated by the application for which the system is intended. This definition of environment includes both the scene of interest and the automated task to be performed. The application of most interest to this research is the forestry and construction industry. Some general machine vision environmental issues are: 1. Lighting in the scene. 2. Working area. 3. Nature of the scene. 4. Speed required. 5. Accuracy required. An application may or may not permit varying degrees of control of these issues. All image based depth recovery algorithms require some light to illuminate the scene. Sometimes this light is actively provided by the machine vision system. Some algorithms require special lighting to take advantage of colour cues, shadow cues or patterned light cues to aid in depth recovery. In a laboratory or assembly line, the lighting of the scene may be very controllable and consistent. In a forestry application, i.e. outdoors, the light level would be subject to weather conditions and vary from one day to the next. The working area is the range of distances that the depth recovery algorithm is expected to recover. This may vary drastically depending on the application. A microchip manufacturer works in depths in the order of micrometers. Forestry machinery may function in a workspace of several meters. 3 The nature of the scene being viewed may play an important role in the effectiveness of a depth recovery algorithm. Mirrored or highly reflective surfaces create problems for many algorithms. An assembly line may have only a specific set of well defined, uniform objects such as machine parts that are being imaged. A forestry scene would consist of trees, logs, rocks, forestry equipment and so on. The speed required from a depth recovery algorithm depends on the application and on the nature of automated task to be performed. An assembly line may require a depth recovery algorithm that is fast enough to keep up with a conveyor belt while allowing time for a robot to pick up a part passing by on the belt. The speed required in forestry applications would be determined to a large extent by the speed of the machinery. There are few other sources of motion in this environment. Processing a few (2 or 3) images per second should be sufficient to keep up with the machinery. The accuracy required also depends on the application and on the nature of the automated task to be performed. Finding small defects in parts on an assembly line may require a high degree of accuracy. Locating a log to be loaded onto a waiting truck could tolerate errors of several centimeters. It is important to keep the intended application in mind when researching a potential depth recovery algorithm. A forestry application permits little control of the scene and the scene illumination. However, it is tolerant in terms of speed and accuracy requirements. 1.4 SCOPE This thesis presents the research that the author undertook in pursuit of his M.A.Sc. degree in Electrical Engineering at U B C . The work done here was part of the research into the field of semi-autonomous tele-robotics being done under the supervision of Dr. Peter Lawrence. The purpose of the thesis was to develop a depth recovery algorithm with the intended application of machine vision systems for the forestry and construction industries. 4 ' A depth recovery algorithm known as Depth from Defocus (DFD) is examined in detail. One particular set of DFD algorithms (Iterative DFD or I-DFD) is analyzed, implemented and tested. Conclusions are drawn and recommendations are made. The thesis is organized as follows: On overview of DFD and its comparison with other depth recovery methods is presented. This permits an understanding of the literature review which then follows. Some background optical and DFD theory is then presented. The DFD algorithms (I-DFD) that were implemented are analyzed in detail. The performance is examined and improvements are sought. To improve the readability of this thesis, important terms are emphasized with bold faced type when they are first introduced. Conventions for notation may be found in Appendix A 5 Chapter 2 DFD OVERVIEW 2.1 INTRODUCTION A N D OVERVIEW The purpose of this chapter is to present an overview of the concept of Depth from Defocus (DFD) and other depth recovery methods. DFD is examined in Section 2.2. Alternatives are examined in Section 2.3. This overview of depth recovery algorithms permits the understanding of the comparisons that follow in Sections 2.4 and 2.5. 2.2 BASIC C O N C E P T Defocus In order to understand DFD, the fundamental concept of defocus must first be understood. Defocus may be thought of as blurring or smearing. It is as the name implies a lack of sharp focus in an image. An image is a two dimensional representation of a three dimensional (real) scene, acquired with a camera. Defocus is an effect associated with the camera's acquisition of an image and not with the actual scene. Defocus is a blurring or smoothing of sharper edges in the scene to produce softer edges in the image. This can also be described as a loss of high frequency components. Note that frequency refers to spatial frequency instead of temporal frequency since images are functions of space rather than time. This will be assumed throughout this thesis unless otherwise specified. The measurement of defocus in an image is the key to DFD algorithms. Defocus could be measured in terms of the image's sharp edge or high frequency content. However, this is only a relative measure since it is dependent on the scene as well as the camera. A scene with many sharp edges would lead to an image with a higher frequency content than a scene with fewer sharp edges. High frequency content could only be used to measure the difference in defocus among 6 several images of the same scene. For different images of different scenes, a change in frequency content could reflect a change in the scene rather than a change in defocus. An absolute measure of defocus therefore would require' an unchanging scene. This can be defined to be the theoretical point source of light. The image of the point would be another point if the image acquisition system (camera) introduced no defocussing, or a blurred version of the point if the camera introduced some defocussing. The image of the theoretical point is known as the point spread function (psf) of the camera. A psf is a two dimensional analogy to an impulse response in one dimension. A system that introduces a larger amount of defocus would produce a larger (blurrier) image of the point. Therefore an absolute measure of defocus could be defined to be the size of the blurred image of a theoretical point source. This is known as the blur circle. A blur circle may not be finite in size. It may extend from a bright centre to a dim edge infinitely far away. In this case statistical quantities such as the standard deviation of its spread may be a more suitable measure of defocus. Depth of Field and Focal Gradient It will be shown that there is a certain distance from the lens at which an object will yield an image that is in sharp focus. This distance defines a "plane of sharp focus". The location of this plane is determined by several camera parameters: 1. Focal Length. (F) 2. Aperture f number, (f) 3. Lens to Image Sensor Distance. (D s) Any object displaced from this plane will yield a defocussed image. The amount of defocus is determined by the above camera parameters as well as the magnitude of the displacement. In short, defocus = f(F, f, Ds, displacement) (1) 7 Depth of field is a common photographic term that is used to summarize this effect. It is a measure of the range of distances at which objects are "reasonably" well focussed. Focal gradient coined by Pentland [20] is another such term. It is a measure of the rate of increase or decrease in defocus as an object is moved away from or towards the plane of sharp focus. A narrow depth of field is equivalent to a steep focal gradient. Of the camera parameters mentioned above, aperture size will be of particular interest in this thesis. A larger aperture increases the focal gradient. A smaller aperture decreases a focal gradient. In the limit of this case, a pinhole camera would yield an image in which all objects would be in sharp focus regardless of their location. DFD Assume that the camera parameters listed above are known. This implies that the location of the plane of sharp focus is also known. If the location of an object is known (i.e. its displacement from the plane of sharp focus), then the amount of defocus may be calculated as indicated in equation (1). DFD attempts to solve the converse of this relationship. Namely, if the amount of defocus is known, then the displacement can be calculated. In any one image, different points (objects) will possess different amounts of defocus that depend on the object's displacement from the plane of sharp focus. Each point can be processed individually (and in parallel) to create a depth map of all points in the image This inverse calculation is not as straightforward as the forward calculation. Firstly, there are certain ambiguities to overcome. Secondly, the amount of defocus must be measurable. The accuracy and reliability of this measurement determines.the accuracy and reliability of the displacement calculated. Reliable and accurate measurement of defocus presents many challenges. A l l of these issues will be addressed later in the thesis. 8 Biological Comparison DFD is not only a process used in machines. There is evidence that defocus information is also used in human vision. However, it is not precisely known how useful this is as a depth cue. Pentland devoted some time to studying human DFD [21]. He experimentally investigated subjective human perception of depth as a function of defocus. His experimental results indicate that people find a (subjective) increase in the depth of a scene if farther objects are more defocused. The same was not true for closer objects being defocused. He surmised that this may be due to the fact that people are used to focussing on closer objects. Furthermore, his subjects indicated a significant lack of (subjective) depth perception when viewing the world through the sharp images formed with a pinhole. Grossman [11] and Pentland [21] both note that the human eye continuously accommodates (changes focus) as it scans a scene. It is possible that this is a use of focus information as a depth cue, however Grossman could not find any data to prove this. He notes that the small aperture in human vision systems would limit the amount of defocus information and therefore would make DFD a weak depth cue on its own. Krolkov [16] examined two visual behaviours. Accommodation is the change in the curvature or thickness of the eye that is used to focus an image. Vergence is the rotation of the eyes either inward or outward in order to fixate an object. His references agree that either one of these mechanisms would provide little depth information on their own. However, the human vision system can combine these mechanisms to infer a sense of depth. Focussing and defocussing are mechanisms used in human vision. By itself, there is little evidence to suggest that this is used by people to accurately infer depth. Commercial cameras possess larger apertures than those found in human eyes. This could produce images with steeper focal gradients and therefore more useful defocus information. This would make DFD a useful tool on its own as a depth recovery method. 9 2.3 A L T E R N A T I V E S DFD is only one of many methods of depth recovery. A brief examination of other methods, would provide a better understanding of what DFD has to offer. Focus Based Depth From Focus (DFF) is similar to DFD in that it makes use of defocus information to recover depths in a scene. In DFF, the focus on the camera is adjusted until the object is brought sharply into focus. When this occurs, the lens position infers only one particular plane of sharp focus. Therefore, one can determine the location of the plane (i.e. the object) from the lens position. This can be repeated for many objects in the scene to create a depth map. Stereo Based Stereo disparity obtains depth estimates by acquiring simultaneous images of the same scene from two different points of view. This is usually done with two cameras whose optical axes are parallel. The object in question is located in each image. This image coordinate location, along with the known separation of the two cameras can be used in a triangulation scheme to determine the depth of every point that appears in both images, creating a depth map. Stereo vergence is similar to stereo disparity in that it uses two images of the same scene from different points of view. In a vergence scheme, the cameras are turned inwards (convergence) or outwards (divergence) to "centre" (on the optical axis) an object in both images. Once centered, the distance to that object can be triangulated from the camera angles along with the known separation of the two cameras. This can be repeated for many objects in the scene to create a depth map. DFD and DFF are analogous to stereo disparity and stereo vergence. In each of the former, all points in the acquired images can be processed to obtain a depth estimate. In each of the latter, only one point at a time in each image can be processed. Processing a new point (i.e. object) requires a 10 mechanical adjustment, either lens refocussing or camera rotation. These mechanical adjustments make each of the latter methods slow and difficult to implement in a parallel computing structure. 2.4 A D V A N T A G E S Single Point of View The most significant advantages of DFD stem from the fact that the image acquisition is done with one camera and one point of view. The obvious reduction in size, complexity and cost of the image acquisition system is only the first benefit gained by this feature. Stereo algorithms that require two or more points of view inherently include correspondence and occlusion problems. The correspondence problem is the problem of matching pixels (objects) from one image onto the corresponding pixels (objects) in the second image. This is the most challenging problem in stereo vision and is often very computationally intensive. There is only one point of view in DFD. An object at a particular location in one image, would appear at the exact same location in any other image since it would have been acquired from the same point of view. Hence there is no correspondence problem in DFD. When examining images from two different points of view, an object may appear in one image, but may be hidden behind something in the second image. This is known as the occlusion problem. In DFD, if an object is visible in one image, it must be visible in all images taken from this point of view. Therefore there is no occlusion problem in DFD. The above assumes that no relative motion occurs in the interval between acquiring DFD images. Suitable for Parallelization In DFD, the depth of each point in the image can be calculated simultaneously. Each calculation may proceed on its own. This feature lends itself very well to implementation in parallel hardware. Methods such as stereo vergence or DFF require not only sequential processing 1 1 of new points, but mechanical movement before processing a new point. This limits the speed of these algorithms. This is not to say that depth information from neighbouring points cannot be used to aid in determining a current depth estimate. This could certainly aid in maintaining continuity constraints in the image. However, it is not necessary. Passivity Laser or sonar ranging are active techniques. They must actively "illuminate" the scene with light or sound. Other methods such as photometric stereo require special structured lighting of the scene. In some environments, this may not be very feasible. Since DFD is a passive technique, it requires no special lighting mechanisms beyond enough illumination (natural or artificial) to acquire proper images. This places few constraints on the scene. Although DFD does not require structured lighting, it may be enhanced by it as described by Girod [9]. Low Resolution Algorithm Since DFD is based on defocus, it possesses advantages over systems such as stereo that require sharply focussed images. In some applications where the working area is distant, telephoto lenses are desirable. Telephoto lenses have a very narrow depth of field which implies larger amounts of defocus. This may inhibit sharp focus methods, but it actually enhances DFD. In another special case, if the image acquisition system were limited in resolution, sharply focussed images may simply not be available. Again DFD could still perform due to the fact that it does not desire sharply focussed images. 12 2.5 D I S A D V A N T A G E S Sensitivity Ens [6 pp.54-55] described the DFD calculations as ill-posed. It is sensitive to noise and requires careful calibration of the camera and lens. The error in the depth estimate increases quadratically with distance [6 p.9][15] as opposed to linearly with distance as stereo algorithms do. This can somewhat alleviated by choosing camera parameters (e.g. the focal length of the lens) that define what is "near" as the desired working area. This concept will be examined in more detail later. Speed Constraints Defocus is often modelled as a convolution operation. In some algorithms, including one implemented in this thesis, the calculation of two dimensional convolutions makes DFD a computationally intensive algorithm. The speed of DFD can be improved by paralleling the convolutions in special hardware. Furthermore, the speed of DFD algorithms is addressed by some of the new algorithms presented in this thesis. A DFD algorithm requires two or more images that contain different amounts of defocus. In order to avoid- occlusion or correspondence problems, the images must either be acquired simultaneously or fast enough so that there is no relative motion between the two images. Environmental Constraints DFD is a passive technique that requires no special illumination. But it does require some illumination. Active methods provide their own illumination. Active illumination could be added to any DFD algorithm, but if the working environment is already illuminated, it would be unnecessary. Furthermore the level of illumination must be such that the images of interest do not saturate and add non-linearites to the acquisition process. Other passive ranging techniques such as stereo also suffer from these effects. 13 Certain types of scenes are not well suited for DFD (or other image based ranging) methods. Specular or mirrored surfaces produce inaccurate depth estimates. Repetitive patterns can also introduce significant errors. While this is a correspondence problem in stereo, it produces another effect, phase inversion, that leads to inaccuracies in DFD [6 pp. 136-140]. There exists an ambiguity in DFD. An image of an object possessing a certain amount of defocus could be located either in front of or behind the plane of sharp focus. This can be resolved by either focussing at infinity or constraining the working area to be only on one side of the plane. 14 Chapter 3 LITERATURE REVIEW 3.1 INTRODUCTION AND OVERVIEW This chapter discusses the research being done in the area of DFD. The bulk of the articles found in this literature review were obtained from the following sources: 1. A recursive search of references in John Ens's Ph.D. thesis. [6] 2. Science Citation Index. 3. EI (Engineering Index) Page One. 4. IEEE / TEE Publications On-disk (IPO). 5. Science Information Network (Scinfonet) - "Current Awareness". Sources (2) to (5) are indexes available at either the Main Library or the Woodward Biomedical Library at the University of British Columbia. Section 3.2 reviews research in the use of defocus as a depth cue as pursued by several current or recent researchers. 3.2 DEPTH FROM DEFOCUS Research into DFD is a relatively new field when compared to other depth recovery methods. The following is a summary of some of the major contributions to this field. In general the object of a DFD algorithm is to infer depth information from defocus information. To achieve this objective, the amount of defocus in an image must be measured. Much of the research in this field is an attempt to accurately measure defocus. In a single image, this measurement is impossible without some additional information. A soft edge in an image may be due to a sharp edge in the scene that is out of focus or it may be due to a soft edge in the scene that is in focus. To overcome this ambiguity, two or more images with differing focal gradients are usually acquired. 15 Alex Pentland Alex Pentland [20][21] was the DFD pioneer. He coined the term focal gradient to describe the change in defocus with varying distance. He justified modelling defocus by a convolution with a two dimensional Gaussian psf: h(x>y) = a{r,a) = —[—e~^ = —l—e~ ^ (2) lira" lircr-He used this model to develop two methods of depth recovery, both based on frequency domain analysis. This is intuitive since the defocussing effect corresponds to a loss of high frequency components of the image. Both methods compared the defocus across multiple images of the same scene. The first method required two images, one being obtained through a pinhole aperture. The second method removed the pinhole requirement, but necessitated a third image. He later [19] implemented his pinhole method using a Laplacian band pass filter instead of Fourier Transforms to determine the high frequency content. This system was able to achieve an accuracy of 2.5% over a one cubic meter workspace. His system was able calculate four 32x32 depth maps per second. The Gaussian space parameter, a, of the defocus kernel, possesses a definite relationship to the geometric optics model of defocus. Pentland assumed the specifics of the relationship rather than measuring it. Ten-lee Hwang [17] extended Pentland's work by adding a preliminary phase to the DFD algorithm that calibrated a with respect to geometric optics. John Ens John Ens [6][7] identified that one of the most significant sources of error in DFD algorithms was the use of transforms to the frequency domain, and even the traditional convolution. Both of these methods assume a signal of zero outside the window of interest, when the signal is in fact not zero, but unknown. He addressed DFD in the spatial domain. The defocus psf of the camera and lens as a function of distance was determined experimen-tally ahead of time. Two images of the scene were acquired through different apertures. Matrix 16 and regularization techniques were used in an iterative search to find a defocus psf that mapped the sharper image onto the blurrier one. Once determined, this psf was a unique indicator of depth. His experiments in the one meter range yielded an accuracy of 13% RMS error (relative to the distance from the camera) and 7.4% RMS error (relative to the working area imposed on the system). Muralidhara Subbarao Muralidhara Subbarao is currently the most active researcher pursuing DFD. He has developed many ideas and tools in his endeavors. In [32] he used a Gaussian psf model, g(r,a), and frequency based analysis as Pentland did. He analyzed methods of depth recovery based on the comparison of the defocus of two images of a scene. The difference in the images would be intentionally caused by small (differential) changes in camera parameters (F, f, D s ) . The power spectral density (PSD) over a small region in space and some bandwidth was used as measurement of the defocus in the images. The difference in defocus measured between the images yielded a solvable quadratic in a which in turn implied a particular depth. He also formulated a calibration procedure [33] that determined a relationship between the spatial spread of the point spread function (psf) and distance. The psf was only constrained to be rotationally symmetric (not necessarily Gaussian). He used this calibration to extend [32] to a more practical implementation in [27] that allowed larger changes in the camera parameters. Another of his methods is called DFD IF [30], (a one dimensional version of DFD2F). This method is similar to Ens's [6][7] in that a look up table is searched for a best fit of defocus between two images. The look up table was created from Fourier domain information of the defocus psfs of the lens as a function of depth. The best fit in the table indicated a particular depth. Subbarao obtained an accuracy of 6% of the rotation of the focussing ring. One important difference from Ens's approach is that Ens did his measurement of the defocus difference in the spatial domain. 17 Recognizing the advantages of spatial domain analyses, Subbarao [29] defined the S Trans-form and the S Transform Method (STM). The S Transform relates the input and output of a linear shift invariant system based on statistical moments of the system psf, h(x,y). In a DFD context, the scene is the input, the camera / lens / frame grabber is the system and the acquired image is the output. By approximating the image (in some particular window) as a cubic polynomial, the forward and inverse S-transforms can be defined in terms of the second order statistics of h(x,y), or a2. Using two images acquired under differing defocus conditions yields a quadratic in a, which once solved infers a particular depth. Some suitable calibration would have to be clone ahead of time to relate a to depth. Once again, the defocus model could be any rotationally symmetric psf. Surya [34] implemented STM for changing aperture images (STMAP). He found it useful in autofocussing applications since it was computationally cheaper than other Fourier Transform based Depth From Focus (DFF) algorithms. He was able to achieve an accuracy of 2.3% at 0.6m, increasing linearly to 20% at 5m. 18 Chapter 4 DEFOCUS THEORY 4.1 INTRODUCTION A N D OVERVIEW The purpose of this chapter is to present the optical theories and models that will form the foundation of all later theoretical discussion. No attempt will be made to derive fundamental optical theory. Detailed examination of these topics may be found in [I4][8][l]. The results and implications of the fundamentals will be discussed. Practical and useful models of optical effects, specifically defocus, will be derived. Section 4.2 defines the framework of paraxial geometric optics. This framework is used to explain defocus. Section 4.3 presents defocus as a linear shift-invariant system with a characterizing psf. Several different models of psf's are explored. 4.2 O P T I C A L T H E O R Y O F D E F O C U S Assumptions This study of optical theory can begin with geometric optics. Klein [14 p.129] defines geometric optics as the optics of image formation that can be described by the laws of refraction, reflection and rectilinear propagation. These three phenomena can be used to explain and model many of the optical effects that will be encountered in this research. In a nutshell, geometric optics is ray tracing. It does not model diffraction effects of light. At some point it may be desirable to broaden the scope of the model to take diffraction effects into account as well. The vast majority of optical systems in use today employ surfaces that are spherical in nature. A system could be made up of lenses, mirrors and differing media such as air, glass, water, and so on. We shall restrict our discussion to optical systems of spherical surfaces unless otherwise noted. 19 Exact equations of refraction, reflection and rectilinear propagation could be used to model such a given system. However, without further assumptions, the exact equations can become cumbersome. A paraxial assumption assumes that an optical axis can be defined for the system under study and that all light rays and surface normals make small angles with the optical axis. This paraxial assumption model is sufficient to describe many systems. Finally, the shape of the aperture will have an effect on the model used to describe an optical system. Most commercial cameras today are made with polygonal apertures that approximate a circle. A circular aperture will be assumed unless otherwise specified. It is with paraxial geometric optics that the study begins. Lens Model A thin lens is a lens in which its thickness is much smaller that the focal length (F or F'), the object distance (D 0) and the image distance (Dj) [14 p.157]. It can be modeled as shown in Figure 1. A l l refraction takes place at the single plane that defines the lens. Light travels undeviated before and after the plane. Figure 1 Thin Lens Model The image focal point (F') is the point at which parallel rays entering the lens would converge as shown in Figure 2a. The distance from the lens to the image focal point is the image focal 20 length (also F'). The object focal point (F) is the point from which rays would become parallel after passing through the lens as shown in Figure 2b. The distance from the lens to the object focal point is the object focal length (also F). If the medium (e.g. air) on either side of the lens is the same, then these two distances are equal. Figure 2 Definition of Focal Points / / \ (a) (b) A thin lens in air is characterized by the thin lens equation [14 p.166]: ~D~o + D~ ~ F (3) It defines the location of the sharply focused image point of a specific object point for a given lens, F. Positive object distances are measured to the left of the lens while positive image distances are measured to the right of the lens (Figure I). Most practical lenses are actually made up of several lenses. A system of lenses can be modeled as a thin lens referred to a set of (mathematical) principal planes [14 p.157] as shown in Figure 3. This means that all refraction can be thought of as taking place at the two principal planes. Light travels undeviated before the front principal plane and after the rear principal plane. Between the two planes, the light is translated horizontally from the front plane to the rear plane. These are mathematical descriptions only. In reality, the light does refract at the surface of the lens of course. The location of the principal planes are determined by the geometry and optical qualities of the lens system being modelled. Each plane may be located within the physical lens system or external to the physical lens system. The planes may even be crossed so that the front principal plane is located behind the rear principal plane. The lens equation (3) can now be applied to this model of a complicated system of lenses if the distances are measured as shown in Figure 3. Figure 3 Lens System Model O b j e c t ^ ^ - ^ T . . . . . . . . . . j ^ \ J m a g e F . . . . -W V . . . . . . F' Do Di Front Rear Principal Principal Plane Plane Geometric Defocus The above lens model can be used to explain and quantify defocus. Figure 4 shows a geometric explanation of defocus. The optical system has a focal length F, and an aperture diameter d. The object (O) is a point source of light. The focused image (I) located behind the rear principal plane is also a point of light. The object (D 0), image (Dj) and sensor (D s) distances are measured as indicated in Figure 4. If the image sensor were located at a distance Ds = Di the observed image would be a point. However, in Figure 4 the image sensor is located in front of I (Ds < Di). This corresponds to the 22 plane of sharp focus being located at a depth D p where DP > D0. D p is measured from the front principal plane to the plane of sharp focus. The Lens Equation (3) summarizes these relationships: In the geometry shown in Figure 4. The image observed is a circle. In actuality, the shape of the defocused image will be the same shape as the aperture of the lens system [29][31J[34]. We shall however, assume that the aperture is circular. It should be noted that Figure 4 describes a typical commercial camera. An out of focus image implies that the image sensor plane and the sharply focused image are not aligned. Turning the focussing ring on the camera adjusts the relative geometry in an attempt to have Ds = D,;. Unit focussing, the simplest mechanism moves the entire lens system forward (relative to the image sensor plane). Several other focussing mechanisms are described by Sydney Ray [23 pp.86-89] 1 1 1 J_ F 1_ F 1 (4) Figure 4 Geometric Defocus Front Rear Principal Principal Plane Plane Image Sensor Plane O d Ds 23 In Figure 4 , the radius of the blur circle, R, may be used as a measure of the amount of defocus introduced by the lens system. It is derived as follows: AABI ~ ACDI AB _ CD ~BI ~ DI d _ 211 Ui ~ Di - Ds Solving this for R, R = dfDi-D, 2 d 2 Di 1-^1 Di (6) Rearranging the lens equation (3), we can substitute 1 1 1 in (6) to get: (7) Di F D0 d( , JI 1 R=A1-^\F-DC = d_cU\dD± 2 IF 2D„ In optics, the ratio of the lens focal length (F) to the lens aperture diameter (d) is denned as the f-number: A F f = T (9) a 24 Using this to substitute for d in equation (8), we have: R . F »' • I n» ( 1 2/ 2/ ' 2 / V /'.. n FD, ( 1 \ F - D, R=v-{Dj + -iT (11> Equation (II) expresses the size of the geometric optics blur circle as a function of camera parameters (F, f, D s ) and the object distance (D 0). This is the geometric optics equation corresponding to equation (1). DFD will try to solve the converse relationship obtained by rearranging Equation 11: FD, °° = D,-F + 2fR ( , 2 ) The above derivation assumed that the image sensor was located in front of the sharply focused image I (Ds < £)„-, Dp > D0). For the case in which the image sensor is located behind the sharply focused image (I) (Ds > Dj, Dp < D0), a similar derivation would yield: 'FD, ( 1 \ , F - D ; R • [ 2/ \D0) 2/ (13) and D° = Ds-F-2fR ( 1 4 ) In geometric optics, depth of field would be defined by some maximum allowable radius (Rmax) of the blur circle. There would be a range of object distances bracketing the plane of sharp focus that would yield a blur circle radius (R) less than this arbitrary R e -setting R = R m a x in Equations 12 and 14 we obtain: Dnear = n±: Ds F -\- 2 f Rmax F Ds D-?ar = n _ IT _ 9 f p ( - 1 5 - ) 25 depth of field = Dfar - Dnear (16) The focal gradient measures the rate of change of radius (R) with respect to object distance (Do): focal gradient = dR ±FD, (17) dD0 2/ Since there is a first order relationship between the blur circle radius and inverse object distance, a more useful measure of focal gradient may be the rate of change of R with respect to inverse object distance: dR ±FDS focal, gradient 2/ (18) It is now clearly obvious that the camera parameters that determine both depth of field and focal gradient are focal length (F), aperture size (f) and image sensor distance (D s). In the remainder of this thesis, the environment shall be constrained such that the plane of sharp focus is behind any object of interest. This can be implemented in practice by focusing beyond the workspace. Equations 11 and 12 and the corresponding focal gradient expressions describe an optical system under this constraint. 4.3 CONVOLUTION MODEL OF DEFOCUS An optical system can be modelled by the convolution operation [6 p. 13] as shown in Figure 5. It is assumed that the optical system is linear and shift invariant. This may not be perfectly true due to lens aberrations and imperfect optics, but it is a good starting point. Figure 5 Convolution Model of an Optical System s(x,y) s c e n e h(x,y) c a m e r a / l e n s i(x,y) i m a g e 26 The image formed by a camera and lens may then be expressed as: i(x,y) = h.(x,y)*s(x,y) i(x,y) = 1 1 h(x - a, y - ,8)s(a, /3)dadf3 (19) where: • s(x,y) is the scene, • i(x,y) is the image of the scene, and • h(x,y) is the psf of the optical system (camera and lens). For any optical system that does not absorb or produce light, the integral of the intensity of the psf must equal unity. Furthermore, for circular apertures, the psf will be rotationally symmetric. In short: h(x,y)dxdy = 1 (20) and h(x,y) -> h(r) w here r2 = x2 + y2 (21) The notation used throughout this thesis assumes that: (x,y) are spatial rectangular coordinates. (r,0) are spatial polar coordinates. 27 where: 2 2 , 2 r = x + y tan 6 = — x x — r cos 9 y — r sin 6 (22) Since the optical system is characterized by the psf, an essential part of DFD is in determining h(x,y) or at least its characteristics. Several possible choices of psfs shall now be examined. Geometric Optics Geometric optics reveals that the defocused image of a point is a blur circle of radius R. To a first approximation, the intensity is uniform within this circle. Therefore, geometric optics defocus can be modelled by the following system psf: where R is the radius of the blur circle derived in (11). This cylindrical shape is commonly referred to as a pillbox and this model as the pillbox model of defocus. Taking advantage of the rotational symmetry of the psf, the optical transfer function (OTF) which is the Fourier Transform of the psf may be shown to be [6 p.22]: (23) or, in polar coordinates: (24) H{P) = 2J1(2TTRP) 2irRp (25) 28 where: • h(p) is first order Bessel function, • R is the radius of the blur circle derived in (11). • p is the magnitude part of polar frequency coordinates. The magnitude of the OTF, known as the modulation transfer function (MTF) has a general low pass characteristic as shown in Figure 6. This is consistent with the concept of defocus introduced in Section 2.2 that links defocus to a loss of high frequency components (i.e. low pass filtering). Figure 6 Pillbox Modulation Transfer Function (MTF) Diffraction Optics If the optical model used is broadened to include diffraction effects that are described by the wave nature of light, then the psf / OTF of the optical system becomes much more complicated. 29 For incoherent monochromatic illumination, the diffraction model OTF can be expressed as [32][14 p.571]: H(a) = \ — (cos 1 a — a V l — or) ,• U a < 1 otherwise where : A P 2p0 A _ J _ 9 0 ~ 2XDi (26) where: • d is the aperture diameter of the lens, • D; is the distance to the sharply focused image, • A is the wavelength of the monochromatic light, • p0 is the cutoff frequency for a coherent light system, and p is the magnitude part of polar frequency coordinates. For incoherent white light (polychromatic) illumination, the OTF is the result of the cumu-lative effect of each wavelength of light. This leads to a very complicated model of defocus. Ens [6 pp.20-23] compared the use of diffraction optics and geometric optics models of defocus for use in DFD. He found that the resulting transfer functions were very similar for the camera parameters he tested. Savakis [25] studied the use of diffraction optics models in the restoration of blurred images. He too found that there was no significant improvement in his results for the camera parameters he tested. There appears to be little reason to chose the complicated diffraction model over the simpler geometric model. However, the complicated psf is often modelled by a simple two dimensional Gaussian shape by many researchers including Pentland [20][21][19], Subbarao [32][27], Cardillo 30 [5][3][4] and Hwang [17].' This psf is: Kx>y) - o 1 (27) e or h(r) = 1 (28) where a is the spread factor of the Gaussian shape. It is related to the pillbox radius, R, by a constant: of this model will vary from lens to lens. It may be improved by appropriate calibration procedures to determine an accurate value for k such as in [17]. Experimentally Derived psf s If a more exact model if defocus is desired, the psf may be experimentally measured. A point source of light could be simulated by a dot on a plain background or by a light source located behind a pinhole. The image formed of this "point light source" would be the psf of the lens system. Alternatively, the problem may be reduced to one dimension by using a step edge as the scene and deriving the line spread function (Isf) of the optical system. If the scene s(x,y) is a step edge of height b: where u(x) is the unit step function and the Dirac delta function, <5(x), is its derivative. Equation (19) expresses the image i(x,y) of this scene as: R = ka (29) The constant of proportionality is approximately equal to A / 2 for many lenses [29]. The accuracy x, y) = s{x) — a + bu(x) (30) i{x,y) = s{x) * h{x,y) (31) 31 where h(x,y) is the unknown (rotationally symmetric) psf of the optical system. Consider di(x, y) d . , , , , V ' J J = — [s(x)*h(x,y)} ox da dx ds(x) dx h(x,y) a + bu(x)] * h(x, y) d_ dx b8(x) * h(x, y) b5(x - a)h(ayj3)dad(3 j 5{x — a)h(a, j3)da b J h(x,[3)dp bhx(x) dp (32) where K{x) = J h(x,y)dy (33) and is known as the l ine spread funct ion (lsf) of the optical system h(x,y). The edge height b, may be found from the integral; di(x,y) di(x,ij) dx dx bhx(x)dx = b h(x,y)dy bJJ h(x,y)dxdy 6(1) b dx (34) 32 Combining Equations 32 and 34 yields: hx(x) = dx (35) In short, the line spread function can be measured from the one dimensional derivative of the image of a step edge. Once the Isf is known, the psf can be found by performing an inverse Abel Transform. An Abel Transform maps a two dimensional rotationally symmetric function (e.g a psf) onto a one dimensional projection of that function (e.g. an Isf). A detailed derivation of an implementation of the inverse Abel Transform may be found in Appendix B. The applicable results of Appendix B may be summarized as follows: Given a the projection hx(x) of a rotationally symmetric function h(x,y) or h(r,0) = h(r): The values of h(r) may be computed in an iterative manner if it is assumed that /i(oo) = 0. This assumption is true for the above geometric, diffraction and Gaussian models of defocus. Il is thus a reasonable assumption to make and may even be added as a constraint to the defocus model as done in [6 pp.97-99]. In a discrete system, this assumption takes the form of: h[r] = 0 for some r > R for some integer R. Beginning at the outermost non-zero value other values of h[r] may be computed as follows: (36) h[R h[R 2] = 1] = hx[R-l] hx[R-2}-h[R-l}(A0o_-Ali2) Ah2 ...etc.... n - l h„[R - n] - £ h[R - i]{Ai_1>n - Aiin) h[R n} = i=l n = 1,2,3,...,./? ...etc.... 33 MO] M O ] - JJ2 h[R - i](A-i,R - A,R) i = l A R-l.R (37) where: Al)} 4 2sJ(R-i)2 -(R-jf (38) From the now known polar form of h(r) it is straight forward to obtain the Cartesian h(x,y) using equation (22). Some implementations of DFD such as [33] do not require exact knowledge of h(x,y). It may be sufficient to know defining properties of h(x,y) such as the spread of its shape (i.e. its standard deviation). Statistical quantities such as this may be measured directly from the lsf with no need to compute the actual psf. 34 Chapter 5 ITERATIVE DFD THEORY 5.1 INTRODUCTION AND OVERVIEW The purpose of this chapter is to present the theory behind the Iterative DFD algorithms presented in this thesis. Specifics of the algorithm implementations are included. However, specifics of the equipment (camera, lens, framegrabber etc.) are presented in the following chapter. The type of environment required to implement any DFD scheme is described in Section 5.2. An iterative DFD (l-DFD) system is described in Section 5.3. There are two main features of the l -DFD system that are not straightforward. These are analyzed in Sections 5.4 and 5.5. Section 5.6 organizes these various aspects of l-DFD into specific algorithms to be implemented. Finally Section 5.7 defines the specific working area applicable to these algorithms. 5.2 ENVIRONMENT The function of a DFD (or any depth recovery) system can be summarized as follows: Given a set of images of a scene, determine the distance to one or more points in the image. In the DFD systems presented here, it will be assumed that the camera is well understood and controllable. The specifics of this are stated explicitly below. Furthermore, there are certain ambiguities to overcome. This is easy to do if the images are acquired under certain (very relaxed) conditions. This too is stated explicitly below. Camera Parameters Any DFD algorithm assumes that the camera is a known and controllable entity. This means that for any image acquired, the following parameters are known: 1. focal length (F) 2. f number (f = F/aperture size) 35 3. location of the image sensor plane (D s) The focal length (F) and the aperture (f) are printed on the barrel of any commercial lens. D s can be calculated from Equation 4 if the location of the plane of sharp focus, D p is known. Dp is printed on the focussing ring of commercial lenses. However, this is not an accurate source of measurement. More reliable measurements may be made by visual inspection of images to determine the distance at which objects appear sharpest. More accurate methods are described in Section 6.4. Since distances (e.g. D p ) are measured from the principal planes, it is implied that the locations of the principal planes are also known. These locations may be obtained from the lens manufacturer or by experiments as described by Ray [23 p.3 8] or may be approximated to some location near the lens. The third method would lead to significant errors in the depth estimates of nearby objects (small D 0 ) . For a DFD system to analyze a scene, at least two images of the scene must be acquired under different focal gradients. This is required in order to eliminate the soft / sharp edge ambiguity described below. Therefore, in addition to being known, at least one of the camera parameters that determine focal gradient (F, f, D s ) must also be controllable. Aperture size (f) is easily controllable by means of the aperture ring on the lens. Focal length (F) may be adjusted by either "zooming" on a zoom lens or by changing lenses if the lens has a fixed focal length. Obviously zooming is preferable to changing lenses as the latter would be slow and cumbersome. Image sensor distance (D s) is easily and independently adjusted by refocussing the lens to a new location (in unit focussing cameras). In the implementations that follow, the focal gradient is adjusted by changing the aperture (f). There are several advantages in this choice. . ' • The aperture size (f) is easy to control. The exact setting of the aperture size is easily repeatable if the lens has a "click stop" aperture ring as most lenses do. In contrast, a change 36 in focal length (F) on a zoom lens is a continuous setting on commercial lenses. This would make it more prone to error in attempting to reproduce identical camera parameters. A calibrated motor or other such mechanism would solve this problem, but add complexity to the system. Interchangeable lenses would also allow repeatable control of focal length, but would again add complexity to the system. It would also be highly impractical if not impossible to change lenses during real operation. Changing the aperture requires no magnification adjustment. Changing the focal length (F) or image sensor location (D s) changes the magnification of the image [23 p.45][I4 p. 146] and would therefore require either image processing to correct the images or some algorithm to match points in the first image to points in the second image. The disadvantage of using aperture size to control focal gradient is the resulting change in image intensity. The change in intensity may be compensated by means of a simple image processing correction. This correction can simultaneously eliminate intensity non-linearities. Since the intensity linearization is a required step, the aperture intensity correction does not slow the system. The intensity / linearization correction will be discussed in greater detail in Section 6.3. Note that the aperture intensity correction could be partially accomplished by adjustment of the shutter speed. Shutter speed affects image intensity, but would not affect the focal gradient. However, shutter speed compensation would not be as accurate as the image processing described above, nor would it be able to remove non-linearities from the system. Two Solution Ambiguity As previously mentioned, a certain amount of defocus can imply a location on either side of the plane of sharp focus. In order to remove this ambiguity, it shall be assumed that all objects of interest are located on one side of the plane of sharp focus. This can be implemented in practice by simply focussing beyond the known working area (D p > D 0 ) . The working area required will depend on the application for which the depth recovery algorithm is intended. 37 Soft / Sharp Edge Ambiguity Given a single image of a scene, a certain amount of defocus could be measured in the image. This measurement would be some reflection of the frequency spectrum distribution of the image. The defocus measured may be due to a sharp edge that is out of focus or a soft edge that is in focus. Without knowing some information about the scene being imaged, it is impossible to determine whether the defocus measured is due to the scene or due to the optical effects of the camera. It is the optical effects that the camera has on an object located at D 0 that reveals its depth. Information about the scene could be obtained by acquiring a second image through a pinhole. A pinhole aperture ( / = oo) has a constant focal gradient of zero. That is, it introduces no defocussing into the image. If this image were available, then any defocus present in the original image that is not present in the pinhole image would be a true measure of the defocussing effects of the optical system. Pinhole apertures allow very little light to reach the sensor and therefore need either very sensitive sensors or very long exposure times. Fortunately, a similar approach can be used with any two images as long as they are acquired under different focal gradients (one not necessarily a pinhole). The optical system would introduce different amounts of defocus into each image. The difference in defocus between the two images is called relative defocus. Since the same scene is used for both images, the relative defocus is a function of the optical system only and not the nature of the scene. The optical system would be defined by F, f], D s and D 0 for the first image and F, I2, D s and D 0 for the second image. If the camera parameters (F, f'i, t'2, D s ) are known and the relative defocus can be determined, then the only unknown would be the object's distance, D 0 . This relationship shall be explicitly derived in Section 5.3. Measuring the relative defocus is the principal challenge of a DFD system. 38 5.3 SYSTEM MODEL Image Acquisition Model A typical image acquisition system is shown in Figure 7. A scene being imaged is captured by a C C D camera. The discrete values in the C C D sensor are converted to an analog signal (typically N T S C format) and then captured by a frame grabber that converts this analog signal back to a digital matrix of numbers (i.e. a digital image). Image processing techniques can then be applied to the image to achieve a variety of results. For example an image correction may be performed in order to compensate for non-linearities in the lens, C C D and frame grabber. Figure 7 Image Acquisition Set Up F R A M E G R A B B E R (AID) IMAGE C O R R E C T I O N image S C E N E LENS C A M E R A To overcome the soft / sharp edge ambiguity, two or more images are acquired under different focal gradients. This can be represented as shown in Figure 8. The two defocus psfs h| and f)2 include the effects of the lens, C C D camera, frame grabber and initial image processing corrections. The image correction is designed so that it exactly cancels the non-linearities and intensity variation between apertures. These psfs can be termed camera psfs. The "scene" / "image" can be chosen to be a small region of the entire "scene" / "image". This small region can then be thought of as having one particular depth associated with it. Figure 8 then shows one particular hi and one particular h2 defocussing a local region of a scene to produce two images of the local region. 39 The camera psfs hi and Ii2 may be described by one of the models of defocus described in Section 4.3. These models (e.g. pillbox, Gaussian, etc.) are parametric models. The shapes remain constant, but they vary in "size" for differing amounts of defocus. The size is a function of the known camera parameters (F, fj, t'2, D s ) and the unknown depth (D 0). Note that the psfs are independant of the nature of the scene. They are functions of the optical system only. Some model for the camera psfs could be assumed. For example the geometric model has: The above equation reveals that pillbox psfs would have differing radii for differing apertures (fj). Since the depth (D 0) is unknown, the exact size of the psf (R) is also unknown. Throughout this thesis (and with no loss, of generality) it will be assumed that image 1 is the sharper image and image 2 is the blurrier image. This is easy to control by using the smaller aperture (larger f) to acquire image 1 (Equation 39). A 9 •where A{ — itR^ • A F D > 1 * = 1,2 + F-D, 2 / i (39) 40 Figure 8 Image Acquisition Model scene h i image 1 h2 image 2 General DFD System In a general DFD system, the acquired images are given. The camera parameters under which these images were acquired are known. Some model for the camera psfs has been assumed. The object distance (D 0) is unknown. Therefore the size of the camera psfs is also unknown. (See Equation 39). The size of the camera psf is a measure of the amount of defocus. Recall that in order to overcome the soft / sharp edge ambiguity, the relative defocus must be measured. Relative defocus can be thought of as defining how much more the sharp image would have to be defocused in order to be as blurry as the blurry image. This is equivalent to blurring image 1 with a new defocus psf, li3 as shown in Figure 9. This psf is not a camera psf, but a mathematically defined psf. 41 Figure 9 Relative Defocus System A large amount of relative defocus would be indicated by a large spatial spread of h3 or a very narrow (low) frequency distribution in the spectrum of 113. Similarly a small amount of relative defocus would be indicated by a small spatial spread of 113 or a wider frequency distribution in the spectrum of 113. The goal of a DFD system could now be summarized as follows: Given images 1 and 2 acquired under differing focal gradients, determine some measure of the relative defocus psf, [13 and use this measure to imply a particular depth. Derivation of H 3 If Ii3 is to be used as an indicator of depth, it should also be expected that a measure of 1:3 would not suffer from ambiguities of its own. Although it is impossible to prove that I13 is corresponds to a unique depth for arbitrarily shaped camera psfs, it is intuitive when considering realistic camera psfs. Consider a DFD system focussed at some D p . Both images would be exactly in focus at that depth. The relative defocus at D 0 — D p would then be zero (corresponding to hi(x,y) = Ii2(x,y) = h'i{x,y) — 6(x,y), the two dimensional impulse function). 42 As D 0 is displaced from D p , the amount of defocus in each image increases. However, this occurs at different rates corresponding to the different focal gradients determined by the different aperture sizes (f\, f2). The relative defocus should be a measure of the separation of these two defocus amounts. This would obviously increase as the D 0 is further displaced from D p . This is shown graphically in Figure 10. This increase in relative defocus should reveal itself in an increase in the spatial spread of I13. A different spatial spread would imply a different 113. Ens [6 pp.64-65] proved that (13 was a unique indicator of depth assuming the geometric (pillbox) model of camera defocus. Appendix Section C.4 contains a proof that I13 is a unique indicator of depth assuming the Gaussian model of camera defocus. Figure 10 Relative Defocus Several methods of measuring the relative defocus have been used by various researchers. These often involve a ratio of the frequency spectrums of hi and fi2. This approach is derived below. 43 The following relationships can be written upon inspection of Figure 9: h(x,y) = s(x,y) * hx(x,y) i2(x,y) = s(x,y)*h2(x,y) hn{x,y) = H{X, y) * I13(x, y) (40) In the frequency domain: /J./••-./:) >•[).:•, I11 .{.(.• • j), ) (41) H f x J y ) = s(fx,f,J)n2(fxjy) (42) hn(U-Jy) = h(fX,fy)H3(fx>fy) (43) where the following are Fourier Transform pairs: h(f*,fy) = Hh(x,y)} Wx,fy) = F{i2(x,v)} hn{fx , fy) = F{i2n(z,y)} Hx{fx,fy) = T{h1{x,y)} IIAL-.JV)---:'--{!>•.!(•••.•.!,)} ./:,-../„) ^F{h:d^!l)} (44) From Equation (43): MU,fy)=Iriff"!S} (45) h{.tx,.ly) When l2n(fx,fy) and i2n(x,y) are exactly equal to l2(f x,f y) and i2(x,y), then H3 will be a measure of the relative defocus between image 1 and image 2. 44 /:/ I f f \ _ hjfxjy) iiaUxJy) - 7 7 7 — T T S(L,fy)H2(fx,fy) S{UJy)Hl{fXJy) H2(fx,.fy) Hl{fx,fy) That is: which can be rewritten as: (46) TT If f \ _ hjfxjy) _ H2(fx,fy) , h U - J " ] " '7777777) ( 4 7 ) I-h(fx, fy)H3(fx, fy) = H2(f.,Jy) hi(x,y)*h3(x,y) = h2(x,y) (48) From the given images, H3 could be determined by the element by element division of the given image spectrums, Ij and I 2 . The frequency spectrum distribution of H 3 is often used as the measure of relative defocus and therefore an indicator of depth. The psf h 3 could then be obtained via an IFFT. Some measure of the spread of h 3 could also be used to as the measure of relative defocus and to imply depth. There are some problems inherent with this obvious approach. It would require a trans-formation to the frequency domain. An FFT makes several assumptions about the data. Some form of windowing must be employed. This would inherently include all the problems associated with windows such as truncation problems (Gibbs phenomenon), resolution problems, etc. Fur-thermore, windows make assumptions about the signal outside the window. Zeros are assumed 45 outside the window if processing is done in spatial domain. A periodic signal is assumed if FFT's are applied to a windowed signal. The signal outside the window is neither zero nor repetitive, it is unknown. These factors significantly reduce the accuracy of any frequency domain DFD analysis. [6 pp.69-76] The obvious alternative is then to contain the analysis in the spatial domain. Equation 47 can be rewritten as W * , f y ) = h(L,fy)H3(fX,fy) h(x, y) = ii(x, y) * h3(x, y) (49) The psf h 3 could be determined by an "inverse convolution" or "inverse filtering" approach. The discrete case of the above convolution can be represented as a matrix multiplication [10 pp.205-228]: «2 = H l}3 h3 = H1i1 (50) where • i2_ is an M N x l vector representing an M x N image 2 sector in stacked row form. • li3_ is an M N x l vector representing an AxB kernel padded to M x N with zeros in stacked row form. • ji_ is an M N x M N matrix representing a CxD image 1 sector padded with zeros to obtain a block circulant matrix. M > A + C - 1 N > B + D - 1 46 Note that the above matrix equation represents a conventional convolution in which the psf (h3) is convolved beyond the edges of the input image sector (ii) resulting in a larger output (ii). In order to overcome border effects that arise from the assumption of zeros beyond the image sector border, a restricted convolution can be implemented. A restricted convolution only convolves the psf (h3) within the input image sector (ii) resulting in a smaller output O2). In this case, the matrix vy becomes a Toeplitz-block Toeplitz matrix [6 p.67, Appendix B]. In either case, the spatial domain inverse filtering requires a matrix inversion to solve for h 3 . However, since the data (i| and i2) being used in the matrix inversion is subject to noise, the inversion becomes very unstable [6 p.79]. Various smoothing techniques could be used to try to alleviate this problem. An alternative approach is to avoid the inverse formulation with image data and solve the DFD problem using only the forward formulation of the image relative defocus relationship. This is the impetus for Iterative DFD (I-DFD). Note that forward and inverse formulations here are used with respect to image data. I-DFD will also require an inverse formulation, but it will not be with unconstrained noisy image data. Iterative DFD System Model Iterative DFD (I-DFD) seeks to measure the relative defocus psf (h3) without solving the inverse convolution. This can be implemented as shown in Figure 11. This type of DFD implementation was first used by Ens [6]. 47 Figure 11 Iterative DFD Scheme s(x,y) h^x.y) i/x.y) h3(x,y) Relative Defocus Kernel Generator NEXT Do CURRENT Do i2 n( x-y) i?(x-y) l-DFD SYSTEM Decision Do The Relative Defocus Kernel Generator (RDKG) would generate a unique relative defocus psf (I13) for any particular depth (D 0). The comparator would determine how well matched image i2n and image \ 2 are. Based on this measure, a new (D0,li3), combination would be tried. The process could be repeated until the comparator determines that the two images are a good match. The relative defocus psf used to obtain this match is associated with a unique depth. This is the depth (D 0) that is desired. The comparator is straightforward in design. The difference in intensities of each pixel between the two images could be averaged over the local region of interest as a measure of how well the images match. The decision function selects the next depth to try. There are many "canned" searching schemes that can be used. One specific implementation will be described later. The decision 48 function also determines when the match between the two images is "good enough". The accuracy of this system will depend on how well the relative defocus psfs match the camera psfs and correlate to the desired depth. There are an infinite number of relative defocus psfs that could be used. A well designed approach to determining these psfs would be needed. This issue is pursued in Section 5.4. The speed of this system will depend on how quickly the convolution with the relative defocus psf can be performed. Since this is an iterative scheme, the convolution will have to be performed many times for each local region being processed. The two dimensional convolutions would be the bottleneck in the throughput of the system. This issue is pursued in Section 5.5 5.4 RELATIVE DEFOCUS KERNEL GENERATOR Overview The purpose of the relative defocus kernel generator (RDKG) is to generate a relative defocus kernel (h^) for a given object depth (D 0) input. The camera parameters are assumed known. The R D K G should be able to output a kernel for any depth within a given working area. The relative defocus kernel (113) must characterize the relationship between the two camera psfs ( h i , h 2). Two main issues in the R D K G then arise. The first issue is determining how to model the camera psfs. The models available may be broadly grouped into two categories: theoretical models and measured models. The second issue is deals with shaping and smoothing h 3 . Straightforward calculation of h 3 does not yield useful relative defocus psfs. Two methods of shaping and smoothing examined are: Gaussian shaping and regularized shaping. A block diagram of the R D K G is shown in Figure 12. The blocks T, M , R and G represent Theoretical models, Measured models, Regularized shaping and Gaussian shaping respectively. Either T or M may be used, but not both. Either R or G may be used, but not both. 49 Figure 12 Relative Defocus Kernel Generator (RDKG) Do Determine hi T M Determine h 2 Do h i R G h 2 RDKG. Determine h 3 h 3 Theoretical Models (T) This subsection describes the "Determine h i " and "Determine h 2 " blocks under "T" control. A theoretical model of camera psfs can be written and evaluated as an analytic function of depth (D 0). For example, from Section 4.3, the pillbox model of camera psfs is: hi{x, y) = 0, otherwise where A{ = irRJ 2fi \DJ 1 2fi i= 1,2 (51) 50 The Gaussian model of camera psfs is: ht(x,y) 27T(7? 2ft \D0J 2fi J i 2kfi\D0)+ 2k i = 1,2 (52) where k is a constant approximately equal to A / 2 for many lenses. In either of the above models, the psf can be fully described for a given D 0 . Note that a theoretical model does not preclude measurements or calibrations. For example, the Gaussian model relates the spread of the Gaussian shape to other optical parameters by a single constant (k). If a calibration were performed to accurately determine the value of the constant, then the psf could still be modelled at any depth. This is still an analytic model. If however, a calibration was needed at each new depth value, then the psf would not be analytic in nature. It would be a measured model. The advantages of theoretical models of camera psfs are obvious. The psf can be described at any depth value. No calibrations (or fewer calibrations) are needed. Therefore the model is easily transferrable to different cameras and lenses. The disadvantage of theoretical models is that they may not model a given lens as accurately as a measured model would. Measured Models (M) This subsection describes the "Determine h i " and "Determine h 2 " blocks under " M " control. A measured model requires experimental calibrations in order to describe the camera psf as a function of depth (D 0). Since only a finite number of depth values could ever be calibrated, 51 it would be impossible to fully describe the psfs. Some interpolation scheme would have to be used between the calibrated depth values. Section 4.3 described a procedure for obtaining lsfs and psfs from the image of a step edge scene. This process could be repeated with the step edge located at many different depths. Thus a table of camera psfs could be created for a range of depth values within the working area. Simple linear interpolation could be used to generate relative defocus psfs for depth values located between tabulated values. The obvious advantage of a measured model psf is that it would be optimized to the camera and lens in use. The disadvantages of a measured model is the need to perform calibrations, the inability to calibrate the psf at all depth values and the lack of transferability to other systems. Relative Defocus Kernel Calculation This subsection describes in general terms, the "Determine 113" block. Once the camera psfs are known, the relative defocus psf should be calculated. In practice, the point spread functions are rotationally symmetric. Therefore the camera line spread functions fully describe the system. Appendix Section C.4 formally derives the following intuitive lsf relationship: Hlx{Jx)Hzx(fx) = H2X(fx) M * ) * M * ) = M * ) (53) The discrete version of this relationship may be written as; h2x[x) - hix[x] * h3x[x] 52 a (54) Gonzalez [10 pp.209-213] describes in detail a matrix implementation of a discrete convo-lution as follows. Given the discrete relationship: g[x] = f[x] * h[x) Let the lengths of the signals f and h be A and B respectively. To avoid circular wrap around errors (aliasing), the signals should be zero padded to a length of M , where M > A + B — 1. The convolution can then be written as: M-l It is easy to verify by expanding that this can be equivalently written in matrix form: (56) r .#] i r Mo] h[-i] = h[l) h[0] -g[M-i}. _h[M - 1] h[M - 2] h[-M + 1] h[-M + 2] /ifOl /[0] /[I] f[M - 1] (57) Furthermore, the nature of sampled signals inherently assumes periodic signals (with period M). Therefore the above relationship may be rewritten to use only positive indices by adding M to all negative indices: r m i -9[l] = .g[M-l]. h[0] h[l] h[M - 1] MO] Mi] h[2] .h[M - 1] h[M - 2] • f[Q) /[I] J[M - 1] (58) In a simpler notation this is: 9 - he f (59) 53 A single underscore indicates a vector. A double underscore indicates a matrix. h c is a circulant matrix formed from the elements of h[x] as indicated below. A circulant matrix has rows that are circular shifts of each other. The (k,i)th element of the matrix is given by: Applying this result to the camera defocus relationship yields: h-2x - h l x c h 3 x . . . (61) From this, the relative defocus kernel (I13) can be solved as: h 3 x = (hixc) h 2 x (62) It is important to understand the difference in solving for l i 3 X in this expression compared to solving for h 3 x in an inverse filtering approach (Equation 50 in Section 5.3). In inverse filtering, h3 X is determined from image data because the camera lsfs (h]X and h 2 x) are not known. In the R D K G , the camera lsfs (and the depth) are known. They are modelled based on either theoretical or a measured optics. Ens [6 pp.76-78] has demonstrated that the errors associated with solving for h3 X based on camera psfs are smaller than solving for h3 X based on image data. This is due to two factors. The first is the noise the additional noise that would be present in image data compared to theoretical camera lsfs or measured (and averaged) camera lsfs. The second is the more severe effects that windowing would have on image data that is definitely not spatially limited, nor necessarily smooth and tapered compared to the more appealing shapes, smoothness and spatial extent of camera lsfs. However, the l i 3 X that results from this process still has undesired properties. For the simple case of pillbox camera lsfs, the relative defocus lsf is erratic and does not taper to zero. A smooth fi3 X would be desired as it would then not overreact to noise in image data during the convolution operation. Zero tails are also desired in order to effectively contain the kernel in a finite spatial extent. 54 In order to overcome these deficiencies, several approaches may be taken. Regularization may be used to impose smoothness and zero tails. Alternatively, a specific smooth, zero tailed function (e.g. Gaussian) may be imposed on the kernel. Regularized Shaping (R) This subsection describes the "Determine h 3 " block under "R" control. Regularization is a technique that can be used to accomplish smoothing and shaping. Ens [6] was the first to apply regularization techniques to the DFD problem. The procedure described below follows his guidelines. The matrix equation h'2x = hlxc Il3x (63) can be rewritten as follows. Find the h 3 x that minimizes hlxc h3x — h.2x = minimum where the norm of a vector (v) of length M is defined by: M - l N I = E ( " H ) 2 (64) (65) The above minimization is equivalent to the earlier matrix equation (Equation 62). If other constraints other that merely satisfying the equation were desired, other terms would be added to the minimization. To add a smoothness constraint, a term that detects roughness of a function (such as a derivative) would be added. An appropriate weight would be associated with this term. To add a zero tail constraint, a term that detects non-zero tails would be added with an appropriate weight. In order to maintain consistency with Ens's [6 pp.96-99] results, the h 3 x desired is the one that minimizes the expression: hixc h3x-h2x\\ + Xi\\Rc h3x\\ + A 2 Id h3x\\ = minimum (66) 55 The weights were chosen as A, = 0.1 and A 2 = 1.0. Rc is a derivative function (R[x]) implemented in a circulant matrix. Recall from Equation 59 that premultiplication with a circulant matrix implements a discrete convolution. Specifically, R[x] is the Laplacian of a Gaussian: Y ' A / 2 ^ 2 choose a — 1 (67) R[x] = V2G{x) mtti \Pi-KCJ f R[k - i], \ R[k -i + M] 6 V 1 n-2 i < k i > k (68) where the Laplacian is defined by: V 2 G ( x - , y ) ^ (JL + ^ G(X)y) (69) Td is a diagonal matrix that weights non-zero tails. Specifically, |2 T[x] ± . / 7TX 7T ±(T[k], k = i ^\k,d \0, k ^ i (70) T[x] is zero at x = 0 and unity at x = +M/2. The matrices Rc, and hixc are made from shifted (by M/2) functions R[x], T[x] and h l x[x] respectively. This is done in order to centre the origin in the middle of the matrix. The resulting li.3x also has its origin shifted to the centre of the vector. 56 By taking partial derivatives of Equation 66 with respect to every element of h_3X and setting them equal to zero, a closed form expression for the h3 X that minimizes Equation 66 may be found to be: /'•3.x- = {hlxc hlxc + ^i Rc R-c + 2^ Tg1 Tj ) hlxc h2x (71) The above determination of h 3 x could be done during run time. However, the computational effort required by the above matrix calculation would severely slow the system. Therefore the above calculation should be done ahead of time for a range of input depths (D 0). The results of the calculations could then be stored in a look up table indexed by the depth value. For entries located between tabulated depth values, linear interpolation could be used. In short, for regularized smoothing, the R D K G is run ahead of time and a Look Up Table (LUT) is implemented during actual run time as shown in Figure 13. The entry in the look up table could be either the lsf or the psf if the Inverse Abel Transform is used. This choice would depend on which convolution scheme is being used. This shall be discussed in greater detail in Section 5.5. 57 Figure 13 Regularized Shaping R D K G Do Determine hi T M Determine h2 h1x R h h2x a prion run time Determine h3 LUT with Linear Interpolation R D K G . h3 Do 3x s 1xc 1xc '1 c c '2 d d' 1xc 2x To fully illustrate this relative defocus kernel creation, an example is presented below. The parameters used are: F = 50 mm Dp = 2817.22 mm D s = 50.9034 mm fi = 2.8 f2 = 1.4 DR = 65.4545 pixels/mm 58 D 0 = 2000 mm DR is the digitization ratio used when acquiring the image. Using a theoretical pillbox model for the camera psf yields: h\(x, y) — pillbox(Ri) Ii2(x, y) = pillbox(R2) The corresponding lsfs for pillbox psfs are [Appendix Section C.3, Equation 180] 2 2  / The radii are given by Equation 51 R l = ^ 1 (-L) + ^—^1 = 0.06592mm R2 = IEL(±.)+ IZJ?L = 0.13184mm 2.12 \L>o J 2 /2 Converting to pixels for discrete implementation yields: Rip = (Ri)(DR) = (0.06592)(65.4545) = 4.31476pia;e/s Rlp = (R2)(DR) = (0.13184)(65.4545) = 8.62952pix-e/s These pillbox lsfs are plotted in Figure 14. 59 Figure 14 Pillbox Camera Line Spread Functions PIXEL From these lsfs, the relative defocus Isf (fi3X) may be solved according to Equation 71. The above lsfs were implemented in two vectors of length 19 pixels with the origin located in the centre. They were then padded on the right to a length of M = 39. M was chosen to be double the size of the input vectors (plus one to maintain an odd length). The roughness and non-zero tail detection matrices were then generated according to Equa-tions 68 and 70 were implemented in MxM = 39x39 matrices. Three solutions to Equation 71 are shown. Figure 15 shows the resulting Ii3 x if neither smoothness nor zero tails are constrained. This is equivalent to solving Equation 62 directly or solving Equation 71 with A| = X2 = 0. Note that this kernel is very rough and still erratic at the tails. 60 Figure 15 Relative Defocus Kernel (No Constraints) -0.21 , , I I -20 -15 -10 -5 0 5 10 15 20 PIXEL Figure 16 shows the resulting h3x if smoothness is added as a constraint. This is equivalent to solving Equation 71 with A| =0.1 and A2 = 0. Note that this kernel is much smoother than the previous one. The tails are still too erratic though. 61 Figure 16 Relative Defocus Kernel (Smoothness Constraint) 0.1 0.08-Uj 0.06-Q Z) • 0.04-< 0.02--0.02-• h l v / \ M / i \ * 0.1 / 1 1 f \ x2= o.o S( \ 1 1 \ 1 1 1 - i i i i i i i i Tt -20 -15 -10 -5 0 PIXEL 10 15 20 Figure 17 shows the resulting h3x if smoothness and zero tails are added as constraints. This is equivalent to solving Equation 71 with X\ =0.1 and A2 = 1.0. This is the kernel that would be entered into the RDKG Look Up Table (LUT). 62 Figure 17 Relative Defocus Kernel (Smoothness and Zero Tail Constraints) 0.12 -0.02-I • > 1 1 • 1 • 1 -20 -15 -10 -5 0 5 10 15 20 PIXEL The generated relative defocus kernel has been warped away from completely satisfying the original equation (Equations 54 or 61). It should be verified that the regularized h 3 x still satisfies the original relationship reasonably well. Figure 18 shows a plot of the original h2 x as well as h2 x obtained by using the regularized h 3 x in Equation 54. The original h2 x has been shifted in order to centre the origin in the middle of the vector. The two lsfs are still very similar. 63 Figure 18 Comparison of Original and Calculated fi2 0.08 - 0 . 0 2 - 3 0 - 2 0 In a similar manner to the above example, other relative defocus lsfs (of differing lengths) can be created for different depths. They could be stored in the LUT as lsfs or as psfs. The LUT would then characterize the relative defocus for all depths in the table (i.e. the specified working area). Gaussian Shaping (G) This subsection describes the "Determine hj" block under "G" control. Gaussian shaping could be thought of as an extreme case of regularization. In Gaussian shaping, the Gaussian shape is imposed on the relative defocus Isf. This is different from' regularization in that it is not a requirement that could be relaxed according to the weight associated with it. Here the weight associated with a Gaussian shape would be "infinite". There are many reasons for choosing a Gaussian as the shape to impose on the relative defocus Isf. A Gaussian function is smooth and has tails that taper to zero. Both of these are desired properties in the relative defocus Isf. Furthermore, it is a separable function and could 64 be implemented in separable (and therefore faster) convolutions. This speed factor is pursued in the next section. A Gaussian shape may be imposed on the relative defocus lsf (h3x) regardless of the model (theoretical or measured) that is being used for the camera lsfs. Several approaches may be taken in determining the exact Gaussian to relate a given pair of camera lsfs (hix, h2x). One approach would proceed as follows. 1 If Gaussian shapes were imposed on the camera lsfs, then the relative defocus lsf would also be a Gaussian with a space parameter <r3 defined by: <rl=<>l- °l (76) where <j\ and <T2 are the spread parameters of the Gaussians used as the camera lsfs. This is derived in Appendix Section C.4. The remaining question is then, how to determine the Gaussians to impose on the camera lsfs. This may be simply a minimum square error criteria. Forexample, for a theoretical pillbox model of camera psfs, choose the Gaussian with the a that minimizes the error function: minimum error = mini [g(x,y,a) — p(x,y, R)]" > (77) l(*-v) J where g is a Gaussian: g{x,y,a) = -—je ^ (78) and p is a pillbox: A H , x2+y2<R2 w ^ ' " ^ ' 0, otherwise here A = TTR,1 „ A FD, 1 F - D , 65 The summation over (x,y) may be taken over the range ±3a since taking the summation over infinity would be impractical. A kernel extending to infinity could never be implemented. By performing the above calculations for a range of both radii (R) and Gaussian spread (cr), it was found that the minimum error was obtained when the spread was chosen to be a = 0.6R. A plot of these results is shown in Figure 19. Of course different relationships would have to be derived for different camera Isf models. Figure 19 Gaussian Approximation to Pillbox Kernels OC O rr rr LU Q LU DC < a CO CO 0.01 0 . 0 0 7 5 0 . 0 0 5 0 . 0 0 2 5 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1 SIGMA [Normalized To Radius] 8 2 In short, this Gaussian shaping RDKG would operate as follows: 1. Choose hi(x,y) = (n(x,y,ai) with a\ — 0.6i?i 2. Choose Ii2(x,y) = </2(£, y, c2) with <x2 = 0.6i?2 3. Define h3(x,y) — g3{x,y,(r3) with <r| = <r2 - <r\ 66 It should be noted that the above operations are not computationally expensive. This RDKG can therefore be implemented as is with no look up table. This also removes the need to interpolate between tabulated values. A kernel can be calculated "on the fly" for any depth value input. An example of such a calculation based on the same parameters in the previous subsection follows: Recall from Equation 74 that the pillbox lsfs are defined by: (80) Therefore, according to the above procedure: = O.6R1 = (0.6)(0.06592) = 0.039552mm cr2 = 0.6i? 2 = (0.6)(0.13184) = 0.079104m??} (81) From this the relative defocus kernel may be written as: 27TCT* 1 or h-3(z,y) = h3x{x)h3y{y) = ( ) (82) The latter form lends itself to separable implementation as described in the next section. That is the RDKG would output two ID kernels rather than one 2D kernels. 67 5.5 CONVOLUTION IMPLEMENTATION One of the main elements of the l-DFD system (Figure 11) is the convolution of a local image region (i|) with a relative defocus kernel (h3). There are several ways in which this could be done. A straightforward convolution is not appropriate. This would imply convolving the kernel (h3) beyond the ends of the input image sector (i[) assuming zeros outside the image sector. The output image sector (\2„) would be larger than the input image sector. Since the assumption of zeros is not true, the output image sector would contain side effects. Often, various windowing schemes are used to try to minimize these side effects. An alternative approach that Ens [6] found to be a superior solution was a restricted convolution. This kind of convolution only convolves the kernel within the limits of the image sector. The output (J2N) of this operation would be smaller in size than the input image sector. Note that the kernel (h3) is assumed to be zero outside its spatial support. Three methods of implementing restricted convolutions are presented below. These are two dimensional, separable and integrated convolutions. The definitions and descriptions presented are for restricted, not conventional convolutions. Consider an input image sector of size AxB and a relative defocus kernel of size CxD. The output image sector would be of size ExF. If the kernel is only allowed to convolve within the input image sector, then the output image sector's dimensions must satisfy: Sz(ouiput) + Sz(kernel) — Sz(input) + 1 (83) where Sz(x) represents the size of x (see Figure 20). Specifically: E + C = A + 1 F + D = B + 1 (84) 68 Figure 20 Restricted Convolution Size Relationship C Kernel Image Input Image Output Two Dimensional Convolution The two dimensional (restricted) convolution is the most straightforward. It is defined as: i-2n{x,y) = h(x, y) * h3(x, y) r • ii(a, fj)h3(x — a, y — (3)dad(3 (85) The discrete version of this would be: i2n[x, y) = h[x,y] * h3[x,y] Pa (86) 69 The limits for the above two definitions are defined by: • (x,y) exist within the ExF Image Output Sector. • (<x,P) exist within the AxB Image Input Sector. This convolution obviously requires a 2D input images sector and a relative defocus psf (as opposed to lsf). The output is a 2D image sector. Using the dimensions defined in Figure 20, the computational effort of a two dimensional convolution can be calculated. For each output pixel, the kernel elements are multiplied by input image pixels and summed. This would be CD multiplications and (CD-I) additions for each output pixel. The total for the entire output sector is then (CD)(EF) multiplications and (CD-1)(EF) additions. Assuming square image sectors and kernel dimensions for simplicity (A=B, C=D, E=F) yields: For most practical kernels, C 2 would be much more than one. Separable Convolution A separable convolution can only be done with separable kernels. Appendix Section C.4 proves that convolution with a 2D Gaussian kernel is separable and may be implemented as two ID Gaussian convolutions. The separable convolution is defined as: i2,i(x, y) = ii(x,y) * h3x{x) * h3y{y) mutts = C2E2 adds = (C2 - l)E2 ~ C2E2 (87) (88) 70 The discrete implementation of this would be: a followed by * 2 « [x, y] = J2 ^ a> $ H h ~ ft ^ P The limits for the above two definitions are defined by: • (x,y) exist within the ExF Image Output Sector. • (Q',/3) exist within the AxB Image Input Sector. This convolution requires a 2D input image sector and a separable relative defocus psf. The output is a 2D image sector. This scheme would be possible if Gaussian smoothing was used, but not possible if regularized smoothing was used. Additionally, it would be very easy to implement slightly different kernels in each direction to compensate for images with non-unity aspect ratios. Using the same dimensions defined in Figure 20, the computational effort of separable convolution can be calculated. The first convolution (in x) requires C multiplications and (C-1) additions for every output pixel. There are EB output pixels for the first convolution. Note that the number is not EF since additional pixels (above and below the output sector) must be calculated for use in the second convolution. The second convolution requires D multiplications and (D-l) additions for each of the EF output pixels. In total this yields: mults = (C)(EB) + (D)(EF) adds = (C - 1){EB) + (D - l)(EF) (90) Assuming square dimensions (A=B, C=D, E=F) yields: mults = (C)(EA) + (C)(E2) 71 = C(E2 + AE) adds = (C - 1){EA) + (C - l)(E2) = (C-1)(E2 + AE) ~ C(E2 + AE) (91) In practice, the size of the kernel (C) varies from one pixel to about the same size as the output image sector (E). This means that as a worst case, A is approximately equal to 2E (see Equation 84). A more typical approximation would be A equal to 1.5E. Using the worst case scenario implies: mults ~C(E2 + 2EE) = ZCE2 adds ~ C(E2 + 2EE) = ZCE2 (92) Integrated Convolution Integrated convolutions can be derived as follows: From Appendix Section C.3, it was shown that for a system described by: hx{x,y) * h3(x,y) = h2{x,y) (93) the following is also true: hix(x) * h3x(x) = h2x{x) (94) Since no special properties of camera lsfs were used to obtain the above result, it may also be stated that for a system described by: ii(x,y)* h3(x, y) = i2n{x, y) (95) 72 the following is also true: ilx(x) * h3x(x) = i2nx(x) where i\x{x) = J ii(x,y)dy i-2nx{x) = J i-2n{x, y)dy h3x(x) = J h3(x,ij)dy This is the basis of integrated convolutions which are defined as: l2nx{x) = In a discrete implementation this becomes: 'hx[x\ = ^2'ii[x,tj] y followed by i2nx[x] = X "iix[a]h3x[x - a] Similarly, the orthogonal relationship would be: iiy[y\ = Y^ldx,y] X followed by hnyiy] = Yjily^h^y ~ ft The limits for the above definitions are defined by: • (x,y) exist within the ExF Image Output Sector. • (a,/3) exist within the AxB Image Input Sector. v(x,y)dy\ * h3x(x) 73 Note that due to rotational symmetry, h3x(x) is identical to h3y(y). The same is not true for the integrated images, i|X(x) and iiy(y) since no rotational symmetry could be assumed. There are important effects associated with integrated convolutions. An integration operation is a low pass filtering operation. This has two consequences. Firstly, integration would eliminate any zero mean noise that occurs in the direction of integration. Secondly, integration may mask important frequency information that occurs in the direction of integration. Fortunately, this information may not be lost in the orthogonal direction. Consider the black (0) to white (1) step edge represented by the matrix: --o 0 1 1 1- •3" 0 0 1 1 1 3 0 0 1 1 1 3 0 0 1 1 1 3 .0 0 1 1 1. . 3 . [0 0 5 5 5] The column vector on the right represents the results of an integration in the horizontal direction. The row vector below represents the results of an integration in the vertical direction. All traces of the step edge are lost in the horizontal integration, yet it remains perfectly intact in the vertical integration. The magnitude of the elements would need to be scaled if comparison to the original matrix were desired. An implementation that uses integrated convolution would therefore have to perform a convolution in at least two directions in order to not miss important information. Integrated convolution requires integrated image sectors and integrated kernels (lsfs). This means that the Inverse Abel Transform used to convert lsfs to psfs would not be needed. Additionally, it would be very easy to implement slightly different kernels in each direction to compensate for images with non-unity aspect ratios. Using the same dimensions defined in Figure 20, the computational effort of integrated convolutions can be calculated. Only the horizontal convolution (vertical integration) shall be 74 considered for the moment. The integration of the input image would require (B-l) additions for each integrated pixel or (A)(B-I) additions to integrate the entire input image sector. Once integrated, the convolution requires C multiplications and (C-l) additions for each (integrated) pixel in the output image sector, or (C)(E) multiplications and (C-1)(E) additions for all E (integrated) pixels in the output image sector. In total, this is: mults = CE adds = A{B - 1) + (C- \){E) ~ A0 + CE (101) where it is assumed that C » 1. Furthermore, if square image sectors and kernels are assumed (A=B, C=D, E=F), this becomes: mults = CE adds ~ A2 + CE (102) To process in both the horizontal and vertical directions, the computational effort would double: mults = ICE adds ~2A2 + 2CE (103) Convolution Summary The computational effort required by the above convolutions are summarized in Table 1 Table 1 Convolution Computation Comparison 2-Dimensional Separable Integrated Multiplications C 2 E 2 3CE2 2CE Additions C 2 E 2 3CE2 2CE+2A2 75 Table 1 (Continued) Convolution Computation Comparison 2-Dimensional Separable Integrated CE Mults N 2 3N1-5 2N CE Adds N 2 3NK5 6N From the above table, it is easy to see that 2-D convolutions vary quadratically with the image sector size (E) and the kernel size (C). Separable convolutions vary quadratically with the image sector size, but only linearly with the kernel size. Integrated kernels vary linearly with both. Multiplication would be the dominant operation. Assume that the output image sector (E) is about the same as kernel size (C). Define N as follows: N = CE = C2 = E2 A=C+E-1~C+E= 2C A'2 = (2C*)2 = 4C' 2 = 4N CE2 = E3 = N1'5 (104) The above relationships can then be used to describe the convolutions in terms of CE operations as indicated in the table. This quickly reveals how fast each of the convolutions are. 5.6 ALGORITHM HIERARCHY The l-DFD systems presented here can be divided into a hierarchy. First the type of model for the camera lsfs must be chosen (Theoretical or Measured). Secondly, the type of shaping and smoothing used in determining h3 must be chosen (Regularized or Gaussian). Thirdly the type 76 of convolution to be implemented must be chosen (2-D, Separable or Integrated). This hierarchy is shown in Figure 21. Figure 21 I-DFD Algorithm Hierarchy (ENS) (ENS) The algorithms shall be referred to by three letter acronyms. For example, the TR2 algorithm uses a theoretical camera Isf model, regularized shaping and two dimensional convolutions. Note that there is no two dimensional convolution (2) associated with Gaussian shaping (G). There is no difference in the result of a two dimensional convolution compared with a separable convolution (S). However, the separable convolution is faster. Note that there is no separable convolution associated with regularized shaping (R).Tn general, a regularized kernel would not be separable. 77 Previous Work Previous research in DFD has concentrated on non-iterative techniques. Much of this work also concentrates on frequency domain approaches. Iterative DFD algorithms has been published by Subbarao and Ens. Subbarao's iterative method processed data in the Fourier domain. Ens iterative method processed data in the spatial domain and corresponds to TR2 and MR2 in the hierarchy. TR2 shall serve as a benchmark for comparison of the new algorithms. Refer to the literature review for further information on these and other DFD schemes. New Work Although Gaussians are often used as camera Isf models, they have not been pursued as relative defocus kernels in iterative schemes. Integrated image sectors have been used (by Subbarao), however the used of integrated kernels and integrated convolutions has not been used in Iterative DFD. The remaining algorithms (TRI, TGS, TGI, MRI, MGS, MGI) could then be considered new approaches to Iterative DFD. Implementations TR2, TRI, TGS, TGI, MR2 and MRI were implemented. TR2 and MR2 will serve as a basis for comparison for the new algorithms. In order to implement MGS and MGI, it would be necessary to develop a general Gaussian approximation scheme as the measured lsfs would not correspond to any predictable shape. This could be done based on statistical measures of the measured lsfs. However, the reliability of such a system has not yet been determined. Therefore MGS and MGI were not implemented. It was found that the lens used in the experiments behaved quite closely to theoretical pillbox models. Although MR2 and MRI algorithms were implemented, emphasis was placed on the theoretical algorithms. The theoretical algorithms are easier to restructure to different environments 78 because they do not require calibrations. It is expected that any measured algorithm could only improve on the results obtained from the theoretical algorithms. The specifics of the blocks "Determine hi", Determine "h2" and "Determine IV are stated below. The theoretical pillbox model is used for each of the Txx algorithms. The digitization ratio would be used to convert the expressions in mm to expressions in pixels. Note that the TRI, TGS and TGI RDKG's output two ID kernels rather than one 2D kernel. These algorithms use two digitization ratios (horizontal and vertical) in order to accurately reflect non unity aspect ratios. TR2 does not separate the psf into horizontal and vertical components, therefore a single average digitization ratio is used. The TR2 and TRI algorithms use a look up table (LUT) as indicated in the regularized RDKG block diagram (Figure 13). The TGS and TGI do not use a LUT as indicated in original RDKG block diagram (Figure 12). For the TR2 algorithm: 1. Determine hi: (105) 2. Determine h2: (106) 3. Determine h3: h3\x,y\ = Inverse Abel Trans f orm{h3x} (107) Store the h3 psf in the LUT. 79 For the TRI algorithm: Determine h i : Determine h2: Determine h 3 : 2 l>3x — ( hixcT h\xc + Ai RCT Rc + X-2 TdT Td Similarly: ll3y — (jllycT h\yC + Ai RCT Rc + A 2 TrfT Trf Store the horizontal and vertical h3 lsfs in the LUT. For the TGS algorithm: Determine h i : Only the pillbox radius needs to be defined. 2 / i \DJ ' 2 / i Determine h2: Only the pillbox radius needs to be defined. Determine h 2 / 2 V^ o7 2 / (Tl = O.6.R1 <J2 = 0.6^2 ° 3 = \JA ~ cr'f 1 - - ^ h3x(x) = 2 e J ' 3" \/27r(7g H O ) = J - ^ T e y 27T(T3 80 For the TGI algorithm: 1. Determine h\: Only the pillbox radius needs to be defined. 2A \D0J 2h 2. Determine h2: Only the pillbox radius needs to be defined. FDS / 1 \ F - Ds ^1 = -^" "FT +-ST-L (H5) Determine hy. FDS / 1 \ F - Ds ^ = I F U J + — (,,S> ax = O.GRi a2 = 0.6R2 <r3 = \/v2 - o-! 1 ft3y(y) = ~ i ^ e " ^ (117) \/2 7r<To Note that the TGI RDKG is identical to the TGS RDKG. This is due to the fact that the integrated Gaussian kernel is identical to the separated Gaussian kernel. The kernels are however used differently in the convolutions as previously described. 5.7 WORKING A R E A The working area specifies the range of depths in which the l-DFD system is expected to operate. This range is a somewhat arbitrary choice and would depend on many factors. Although the environment (i.e. intended application of the l-DFD system) would define a required working area, some analysis of l-DFD parameters could determine whether the system could function reliably in the required working area. There are two main l-DFD parameters that determine the applicability of the working area. These are focal gradient and spatial support. 81 Focal Gradient DFD schemes measure defocus and use this to infer depth. A steep focal gradient implies that a small change in depth corresponds to a large change in defocus (high sensitivity). Consequently, a large error in defocus measurement would only correspond to a small error in depth. This is clearly the ideal situation. Conversely, a shallow focal gradient would imply that a small error in defocus measurement would correspond to a large error in depth estimate. Recall from Equation 17, that the focal gradient (based on geometric optics) is: dR ±FDS ( 1 \ focal gradtent=— = — j (118) Since focal gradient is a proportional to inverse distance squared, it would be desirable either to use a working area with small D 0 (and therefore large focal gradient) or to choose camera parameters to compensate for larger D0. Furthermore recall that DFD systems are based on relative defocus. Therefore the focal gradient of the relative defocus must be considered. The same equation as above can be used for relative defocus, if a relative aperture (f3) is used. The relative aperture will relate the two apertures (f,, f2) under which the images were acquired. The remaining parameters (F, Ds) are constant between the two images. Appendix Section C.4 derives a relationship between the apertures: (1.9) f i ' f i fl Although this relationship was derived for Gaussian psfs, it also holds for geometric (pillbox) psfs. A similar proof as in Appendix Section C.4 could be used by assuming R = ka where R is the pillbox radius and a is the Gaussian spread parameter. If the average focal gradient (fgavg) for a given working area [Domjn, D o m a x] is defined as: fgavg = 7 p ; / {.focal gradient)d,D0 'omin J DOM,N 82 it is straightforward to show that: FD, 2 /3 ) ( D, omm 1 D, omax ) (121) This average focal gradient could be used as a measure of. the expected accuracy in a given working area. The size of the relative defocus kernels varies with depth. The spatial support available for the relative defocus kernels should be able to effectively contain even the largest kernel that the working area requires. Of course truncation and / or windowing of the kernels could be used to make the most of the available spatial support. Say the size of matrix available for the relative defocus kernels is K pixels (or KxK for 2D kernels). The working area could be defined as [D o mj n, D o m a x]. D o m i n is the depth at which the size of the relative defocus kernel is K. D o m a x is the depth at which the size of the relative defocus kernel is some minimum size. For simplicity in the analysis, this size could be taken to be one pixel, corresponding to an impulse function. This would of course occur at the distance Recall that the environment has been constrained such that all objects are closer to the camera than the plane of sharp focus (i.e. D 0 < Dp). The size of the relative defocus kernel depends not only on the optical system, but also on the type of shaping (regularized or Gaussian) used in the RDKG. Spatial Support D. omax (122) 83 In regularized shaping, the relative defocus kernel is created from a matrix that is made from padding the camera lsfs to twice the size of the larger camera Isf. The larger camera Isf is h2 and has a size of 2R2. After padding, this has a size of 4R2. Recall from geometric optics that: FDS / 1 \ F - Ds *2 = Tf7UJ + ^ T m m m ( 1 2 3 ) For this to fit within the given spatial support: 4* 2 <oio • ( 1 2 4 ) DR is the digitization ratio and converts K pixels to K/DR mm. Expanding R2 and solving for D 0 yields D o m i n . FDX ( 1 \ + F - Ds ^ K 2 / 2 \D0J 2 / 2 A(DR) A(DR)FDS D0 > 2f2K -4(DR)(F - Ds) MDR)FDS o m m ~ 2f2K - A(DR)(F - Ds) ( ' In Gaussian shaping the relative defocus kernel is a Gaussian contained to +3cr3. This has a size of 6<J3. In Appendix Section C.4, Equation 233 relates the relative defocus spread as: FD. f l \ . F-Dx ka3 2 / 3 \D0J 2 / a f'i / f /1 where -5- = -5- - - j (126) In Gaussian shaping for pillbox psfs, the model a = 0.6R was used. That is, k = = 1.667 For the kernel to fit within the spatial support, 2kf3 \D0J^ 2kh DR D0 > Kkf3-3(DR)(F - Ds) D • - HDR)FDS ( 1 2 ? ) o m m ~ Kkf3 - i(DR)(F - D,) { ' 84 Example As an example of a working area calculation, the following parameters will be used: • F = 50 mm Dp = 2817.22 mm D s = 50.9034 mm f, = 2.8 f2 = 1.4 Therefore f3 = 1.61658 DR = 65.4545 pixels/mm • K = 64 pixels k = 1.6667 Domin according to the regularized derivation (Equation 125) is 1603mm. D o n , i n according to the Gaussian derivation (Equation 127) is 1429mm. This range can be further extended by employing windowing and scaling techniques to kernels that would have to be truncated to (it within the spatial support of K. In doing so, it is easier to accommodate a particular working area required by a particular application. 85 Chapter 6 IMPLEMENTATION 6.1 INTRODUCTION AND OVERVIEW The purpose of this chapter is to describe the specifics of the various I-DFD implementations studied in this thesis. This includes the specifics of the equipment as well as other details of the algorithm that were not described in the previous chapter. Section 6.2 defines all camera parameters as well as the equipment used in the I-DFD experiments. Sections 6.3 and 6.4 describe the calibrations required to effectively use the equipment. Sections 6.5 and 6.6 specify the final details of the remaining steps in I-DFD. 6.2 ENVIRONMENT Equipment I-DFD was implemented using an image acquisition system as depicted in Figure 7 in the previous chapter. The following is a description of the equipment used: l. Lens: Fujinon CF50B 50mm C mount lens (s/n 662724). 10 bladed manual click stop aperture (F/1.4 to F/22). The decagon shaped aperture was a good approximation to a circle. A lens with a triangular aperture was also tested, but performed poorly. • Manufacturers specifications list the front and rear principal planes as 17.67 and 32.47 mm forward of the C mount flange (respectively). Note that in this lens the principal planes are crossed, i.e. the front plane is behind the rear plane. 86 2. CCD Camera: Panasonic black and white WV-BD400 CCD Camera (s/n 9IB 10819). • 8.8 (H) x 6.6 (V) mm2 CCD sensor. This is equivalent to a 2/3" sensor. Manual shutter speed 1/60 s (off) to 1/10000 s. NTSC output. • Automatic Gain Control (AGC) available, but turned off. 3. Frame Grabber: Datacube frame grabber connected to a unix workstation. It samples the NTSC image at 512 pixels horizontally by 480 pixels vertically. 4. Platform: The I-DFD algorithm was coded in C on a SPARC IPX unix workstation. Camera Parameters The camera parameters chosen for calibration and testing are shown in Table 2 Table 2 Implemented DFD Parameters Quantity Symbol Value Units Notes Focal length F 50 mm -Small aperture fi 2.8 - -Large aperture h 1.4 - -Slow shutter speed - 1/60 (off) s 1 Fast shutter speed - 1/250 s 1 Plane of sharp focus DP 2817.22 mm 2 Image sensor plane D s 50.9034 mm 3 Digitization Ratio (H) DRh 58.181818 pixels/mm 4 Digitization Ratio (V) DRV 72.727272 pixels/mm 4 Digitization Ratio (avg) DR 65.454545 pixels/mm 5 Kernel spatial support K 63 pixels -Close distance Domin 1000 mm -Far distance Domax 2500 mm -87 Table 2 (Continued) Implemented DFD Parameters Quantity Symbol Value Units Notes Working area - 1500 mm 6 LUT resolution - 100 mm 7 Notes: 1. The shutter speeds were chosen to help compensate for the difference in illumination allowed by each aperture. Aperture 1'! was used at 1/60 s. Aperture f2 was used at 1/250 s. In addition a third combination was tested (F/2.0 and 1/125 s). This provided little additional insight so these results were not pursued. 2. The distance indicated on the lens barrel was labeled 3m. However the calibration done in Section 6.4 provided the more accurate estimate. 3. The sensor plane distance was determined by combining the lens law (Equation 4) with the stated focal length (F) and plane of sharp focus (Dp). 4. The digitization ratios must be calculated based on the size of the camera CCD sensor (8.8mm x 6.7mm) and the resolution of the frame grabber (512 pixels x 480 pixels) and not the CCD resolution. This is due to the fact that the frame grabber resamples the (analog NTSC) image after it is output by the CCD camera. 5. The average DR is simply the mean of the DR|, and DRV. 6. The working area is simply the difference between the far and close distances. 7. The LUT resolution is the interval between adjacent entries in the RDKG LUT (for TRI and TR2 algorithms). 6.3 IMAGE PREPROCESSING The image acquisition process consisting of the lens, camera and frame grabber has two deficiencies. First the two apertures pass differing amounts of light to the CCD sensor (even with 88 shutter speed compensation). Secondly, the acquisition system is non-linear. An acquired pixel intensity of say 100 does not imply twice the amount of light as a pixel intensity of 50. The convolution models of defocus all assume a linear shift invariant (LSI) system. Therefore it is essential that the intensities are corrected to linearly represent the amount of light. This type of linearization is common to many DFD algorithms. A correction to the acquired pixel intensities is known as a radiometric correction. This correction should be applied to all images before any further processing is done. Once done, it would linearize the acquisition process. Radiometric Calibration A radiometric calibration of a system can be done with the aid of a Kodak Grey Scale. A Kodak Grey Scale is a set of printed rectangles of varying darkness. The variation in darkness is caused by the variation in density of (black) dots. The grey scale varies in density from d = 0.0 (white) to d = 1.9 (black). The amount of light reflected (as a percentage) is given by [6 p. 103]: Images of each rectangle were acquired through each aperture at the appropriate shutter speed. Care was taken to ensure that the lighting conditions remained constant throughout the calibration. Each rectangle was located in the same position for image acquisition. A displacement of just a few centimeters would compromise the calibration due to non-uniform illumination of the scene. Each rectangle was acquired at the centre of the image to minimize space varying lens imperfections. The intensity of the pixels in a small central region were averaged to eliminate (128) 89 any zero mean noise. The % reflection versus this average intensity was plotted and a polynomial fitted to the data (Figure 22). Figure 22 Radiometric Calibration Characteristics P I X E L I N T E N S I T Y The polynomials that characterized the image acquisitions were: 1. F/2.8: Pl(x) = (1.03362 x 10"5)x-3 - (0.00270419):r2 + (0.61623).x- - (21.6218) (129) 2. F/1.4: P2(x) = (1.49751 x 10-5)a;3 - (0.00419507)x2 + (0.798026)x- - (26.0572) (130) 90 These equations were applied to all images acquired for processing by l-DFD. All pixels acquired with an intensity of x are output with an intensity of p(x). The two characteristics are similar due to the shutter speed compensation. Note that the characterizing polynomials become negative at low pixel intensities. In practice such pixels were clipped to a % reflectance of zero. Similarly large pixel intensities were also clipped if they went beyond 100% reflectance. Furthermore all the pixels were scaled by 2.5 in order to map the images to a more appealing scale of 0 to 255. Other Corrections There are several other corrections that could be done to the images (but were not) in order to more closely approximate a true LSI (linear shift invariant) system. These include corrections for: 1. Cosine Losses: The intensity of acquired light varies with the fourth power of the cosine of the angle between the incident light and the optical axis [23 p.64]. This variation in acquired light was masked by the non-uniformity of the available light sources. Therefore cosine losses were deemed to be small enough to ignore. 2. Vignetting: The blades that create the aperture physically stop incident light and cause a darkening at the outer portions of images. Since large apertures were used, vignetting effects were minimized. 3. Optical Imperfections: The lens itself would have small space varying imperfections that may be correctable. 4. Geometric Distortions: Outer portions of images tend to be distorted geometrically. This is more pronounced with smaller focal lengths however. 6.4 SYSTEM CHARACTERIZATION A DFD system (specifically the RDKG in an l-DFD system) can be characterized by the relationship that exists between the camera lsfs and depth. Pillbox and Gaussian lsfs maintain 91 the following relationship: R = kcr = FDS ( 1 + F - Ds 2/ 2/ \D0 = M i (131) The spread of a pillbox lsf (S[lsf] = R/2) and a Gaussian lsf (S[lsfJ = a) are derived in Appendix Section C.3. Combining these relationships with the one above yields a relationship of the form: for either pillbox or Gaussian lsfs. Experimental research has suggested that this first order relationship with inverse distance may hold true for any rotationally symmetric camera psf [33]. A DFD system could then be characterized by measuring the lsfs of a system (Section 4.3), determining the spread of the lsfs (Appendix Section C.2, Equation 174) and fitting a first order polynomial to it. Subbarao [33] suggests some additional noise cleaning steps. The above procedure was performed for the image acquisition system being used. Note that the radiometric correction was performed before lsf calculations were made. This was done for three apertures (F/1.4, F/2.0, F/2.8) and for a range of distances both closer and farther than the plane of sharp focus. The measured lsf spreads are plotted in Figure 23. (132) 92 Figure 23 LSF Spread Characterization The characteristics observed strongly follow the expected first order relationship with respect to inverse depth. There are six characterizations. Each of three apertures are characterized in front of and behind the plane of sharp focus. Theoretically all curves should intersect at S[lsf] = 0 at a depth of Dp. In actuality, the limited resolution of the sensor and frame grabber mask this point of perfect shaipness. However, by averaging the points of intersection, it was determined that the plane of sharp focus lies at Dp = 2817.22mm. This agreed well with visual observations of the sharpness of the images. 6.5 SELECTING REGION OF INTEREST As indicated in Figure 11, l-DFD requires a local region of interest (ROI) to process. Given an entire image, the locations and the sizes of the local RGTs must be determined. 93 Interactive l-DFD A user may specify the exact location of a ROI for processing. This has the advantage that the user can ensure that he/she acquires information about the part of the image that is the most interesting. The disadvantage is that this would have to be done one point at a time and would therefore not take advantage of a parallel processing nature of DFD. Because each ROI can be processed separately, large speed ups compared to non-parallelizable algorithms are possible. Depth Map l-DFD The most straightforward way of taking advantage of parallelization is to process a regular pattern (grid) of ROFs, thus creating a depth map output. The disadvantage of this method is that the ROFs selected may not be the optimum ROFs to use. The amount of defocus is most easily measured in terms of how defocussed an edge becomes. Therefore if the ROI's contain no edges, it would be very difficult to measure the defocus. In fact, a blank featureless wall would provide no defocus information regardless of how defocussed it was. Edge Detection l-DFD An edge detection based l-DFD scheme seeks to use the edges in an image as the ROI locations. The first task would then be to detect the edges. In this implementation, a Marr and Hildreth [18] edge detector was passed over the entire image in order to locate edge points. The sharper image (ii) was used (after radiometric correction) as the input image. The output image is known as the zero crossing image. An M&H edge detector is a filter based on the Laplacian of a 2D Gaussian. The Gaussian shape acts as a low pass filter and removes unwanted noise. The Laplacian (second derivative) 94 operator acts as a high pass filter and reveals an edge in the input image as a zero.crossing in the output image. Figure 24 shows how the second derivative of an edge becomes a zero crossing. Figure 24 Second Derivative Edge Detector 0 0 EDGE FIRST DERIVATIVE 0 SECOND DERIVATIVE The M & H filter is defined as: where G(x,y) d2 dx2 dy2 ~ 2na2' 1 _ £ i ± i i e ^ - (133) Considering the first term, d2G(x,y) dx2 1 2TT<J2 1 2wa 2 dx2 .d_ dx -Ix 2>2" e =^ 95 Similarly, the second term is 02G(x,y) - 1 dy2 2ira4 1 -_£±±J£± e 2o-2 (135) Summing the two yields V 2 G ( x , y ) ± ^ 1 x2 + y 2 2<T2 e (136) This is a 2D filter that can be implemented as a 2D convolution. However, it may be verified (by simple expansion) that this can also be written as: V * G ( i , y) = Q(Ji(x)g2(y) + Qgz{x.)g3{y) A 1 where Q 27TCT6 gx (x) = e~ 92(y) = (y2 - 2a g3(x) = x2e 9i{y) (137) In this form it is a sum of separable filters and may be implemented as such to improve the speed. A true M & H edge detector would use several choices for a in order to check for edges of varying "strengths". However, in the interests of speed only one a was used (a = 2). 96 The zero crossing image was searched for zero crossings in the horizontal and vertical directions. When a zero crossing was found, the input image was checked in order to determine length and height of the edge. The length was defined as the distance (in pixels) between the minimum on one side of the edge and the maximum on the other side of the edge. The height was defined as the difference in intensities at these two locations. If the product of the length and height exceeded 500, the edge was deemed a strong edge. Furthermore, the area of this proposed ROI was scanned to ensure that the intensities had not been clipped by C C D saturation or radiometric calibration. Pixel intensities of less than 5 and greater than 254 (on a 0-255 scale) were considered clipped. If no clipping was observed this location was stored as a ROI. ROI Size The size of the ROI should be larger than the largest possible relative defocus kernel. Windowing could be applied to the relative defocus kernel to reduce its size. The ROI should be small enough to exclude edges from depths other than the one of interest. Furthermore, smaller ROFs would speed up depth processing (especially for 2D algorithms). A variety of ROI sizes were tested. Specific results shall be shown for ROI sizes of 101x101 and 61 x61. In the larger ROI definition, no (further) truncation of the relative defocus kernels were necessary. In the smaller ROI definition the relative defocus kernels would often need truncation in order to fit within the ROI. 6.6 D E P T H R E C O V E R Y At this point the R D K G (with L U T if necessary) has been defined. The convolution implementation has been chosen. The image acquisition system has been calibrated to provide accurate camera parameters and radiometric correction characteristics. An ROI selection scheme has been chosen. 97 I-DFD then takes the form shown in Figure 26. The camera psfs (hi, h2) include the effects of the lens, camera, frame grabber and radiometric correction. The ROI selection scheme takes the form of one of three methods as shown in Figure 25. The remaining blocks to be defined are the RDKG scaling, the comparator and the decision blocks. Figure 25 ROI Selection IMAGE E D G E D E T E C T O R FIND S T R O N G E D G E S DEPTH M A P GRID \ — i U S E R INPUT ROI S E L E C T O R ROI LOCATIONS R D K G Scaling Given an ROI of size AxB and a relative defocus kernel of size CxD, the output image size is ExF according to Equation 84. Some minimum output image size is required for the comparator to have a good statistical representation of the convolution. The minimum ExF was set (by trial and error) as 17x21 pixels. They reflect the aspect ratio of the image. 98 o Q o oo o LU Q o Q 1-X LU LU LX rr O => rr c/3 rr < L U L U o o ^ Q _ < f r < H O a : o a L U CC rr O CM L U o < O LU _l LU OO o CC L U o < g L U O N 0 h W 0 1 < n ME KG KG ME CO CO RD RD 'OS 'OS < LU o o o < LU o LU 00 LU o < < LU o a o < ~Z. LU CD LU CO LU o < LU H > 00 Q LL Q CM LU z LU o CO 99 If the output image dimensions were less than 17x21, the relative defocus kernel was truncated to (A+l-17)x(B+l-21). If only one dimension needed truncation, then only one dimension was truncated. This truncation causes additional problems. Most notably, the volume under the kernel would no longer be unity. This would cause a darkening of the output image and lead to a biased comparison. Therefore, all truncated kernels were scaled to maintain a volume of unity. The effects of the sharp truncation were not serious and no windowing (other than rectangular) was needed. Note that if the ROI size were chosen exceedingly small, then the relative defocus kernel would no longer be able to accurately model the true blurring phenomenon. Comparator The comparator calculates a measure of similarity between the two images fen, i2)- For convolutions that produce 2D outputs (TR2, TGS), this is defined as i2n[x,y] = k[x,y] *h3[x,y] where [x,y] S (E x F) output image sector (138) For convolutions that produce ID outputs (TRI, TGI) this is defined as X «2mW = hx[x] * h3x[x] y 'hny[x] = iiy[x] * h3y[y] A (err orx + err ory) error = - — where [x,y] G (E x F) output image sector (139) 100 Ens's original algorithm [6 p.l 10] (similar to TR2) added a final radiometric correction (FRC) before image comparison. Image i2„ was scaled (by m) and shifted (by b) to produce an optimized error function: e r r o r A (-^j E E ^ x > y} ~ ( m )(*2n [ a ; , y\) + bf \' ' y x hn[x,y) = h[x,y] * h3[x, y] where [x, y] £ (E x F) output image sector (140) The coefficients m and b are chosen so that the error is a minimum. By taking partial derivatives of the error with respect to m and b, it may be shown that A SriSz SrS'i m = SaSz Si Si • A Sr — mSi b = where S; = E E y ) x y S r = EE7*^* )^ x y x y S r i = E E r(x>y)i(x> y) x y x y where i(x,y) refers to the calculated image ii„ and r(x,y) refers to the reference image i 2 . Sz is the size of the image sector. Note that there was a typographical error in [6]. To ensure that the FRC did not completely disrupt the system, the FRC parameters were constrained to: 0.98 <m < 1.02 -5.0 <6 < 5.0 (142) 101 This FRC would disrupt the RDKG scaling previously described. As this scaling was critical to the integrated algorithms (TRI, TGI), the FRC was not incorporated. In the TR2 algorithm, FRC seemed to function as well as RDKG scaling. In order to better compare with Ens's results, FRC was used instead of RDKG scaling in (and only in) the TR2 algorithm. Note also that Ens did not normalize the error by the output image area (EF). This made little difference to his algorithm, but was important in comparing other algorithms this author designed. Therefore the error normalization was used in all algorithms. Decision The function of the decision block is to determine the next.depth (D0) that I-DFD should try, and to decide when the comparator has found an image match that is "close enough". Brent's algorithm [22 pp.402-405] was used to perform the above functions. Brent's algorithm is an iterative algorithm that minimizes an error or cost function based on up to six of the most recent error function evaluations. The error function was used as defined in the previous subsection. Brent's advantages are that no derivative information is required and that only one computa-tion of the error function is required per iteration. This is a significant issue since the computation of the error function involves convolutions that could be very time consuming. As in other minimization algorithms, Brent stops when the abscissa (D0) is changing very little. Ens stated [6 p.l 10] that he stopped iterating when the ordinate (error) fell below a tolerance. However this implementation used Brent exactly as described in [22]. The parameters passed to Brent were the minimum (1000mm) and maximum (2500mm) distances and a starting point of half way in between (1750mm). The tolerance required was 0.0010. 102 I-DFD Flowchart l-DFD may be summarized into the flow chart shown in Figure 27. Figure 27 Iterative DFD Flow Chart ^ START Acquire image 1 under through aperture 1. Acquire image 2 under through aperture 2. Apply radiometric correction to images 1 & 2 Locate ROI's using one of three schemes. Select an ROI to process.. Determine h3 for current Do from RDKG and the RDKG scaler. Compute image 2n via convolution of image 1 with h3 using one of three convolution methods. Calculate the error measure from the difference between image 2 and image 2n. Select next Do according to the rules of Brent's algorithm. No ' Has Do \ changed very . little? , , Yes Output current Do as depth. No / Is this / the last \ ROI? Yes 103 Chapter 7 RESULTS 7.1 INTRODUCTION AND OVERVIEW The purpose of this chapter is to present the results of the evaluations performed on the four l-DFD algorithms described (TRI, TR2, TGS, TGI). The two major performance criteria are speed and accuracy. Section 7.2 describes how these criteria were measured. Three different classes of images were used as tests. Section 7.3 presents the results obtained by testing the algorithms using simple step edge images. Section 7.4 uses images of text. Section 7.5 uses images of logs. This is appropriate since this algorithm was intended for the forestry industry. Finally Section 7.6 analyzes the observations made in the preceding three sections. 7.2 MEASUREMENTS Accuracy Accuracy was measured by comparing l-DFD depth estimates with physically measured depths (i.e. with a tape measure). The physical measurements were adjusted to measure depth from the front principal plane of the lens. In each of the step edge images, every point possessed the same depth. Only one measurement was needed for each image. In each alphabet image, the letters were located on a plane that sloped away from the camera at an oblique angle. A sample set of physical of measurements were made to points in that plane and a characterizing curve fit to them. This established a relationship between image co-ordinates and depth. The curve fitting procedure also reduced random (physical) measurement errors. In the log images, the points in the image were located on a very irregular surface. This made a similar characterization very difficult. Instead, individual (physical) measurements were made to particular points on the log. These same points were then processed by l-DFD for evaluation. 104 An absolute error in mm was obtained from the above measurements for each point processed by l-DFD. This can be expressed as a percent error either relative to the true depth (D0) or relative to the working area. For a large working area, a percent error relative to true depth acknowledges the fact that a 1mm error at D 0 - 100mm is more significant than the same error at D 0 = 1000mm. However, if the working area is small relative to the depth, this method could be misleading. For example, say the working area ranged from 1000mm to 1100mm. The maximum error possible would then be 100mm. Relative to true depth, this would represent an error of only 10%. In this case it would make more sense to scale the error relative to the working area (i.e. 100mm). For the geometry of the experiments presented here, the working area (1500mm) is similar to the true depths (1000mm - 2500mm). Both methods yield similar results and both are presented. Speed The measured absolute speed (or throughput) of any kind of algorithm would of course depend on the platform on which the algorithm was implemented. The purpose of the speed measurements presented below serve mainly to compare the algorithms. If a C program (say DFD.c) on a Sun unix workstation is compiled with a —pg option, execution of a program would generate a timing file, gmon.out. An application called gprof analyzes this file. This analysis is based on the CPU time allocated to DFD.c. This method seems fair and impartial and was used to judge the algorithms' speeds. Some parts of the algorithm (such as radiometric correction and selecting the ROFs.) are common to all algorithms. A typical radiometric correction takes about 0.6s. An edge detection ROI selection typically takes 16s. The actual iterative depth calculation is the part in which the algorithms differ and therefore is the only part that was compared. 105 Consistency Four algorithms (TRI, TR2, TGS, TGI) were tested over twelve images. Each algorithm was tested using large ROFs (101x101 pixels) and small ROFs (61x61 pixels). Small ROI algorithms are identified with a suffix of —30 representing the sector radius. Large ROI algorithms are likewise identified with a suffix of —50. All small ROI algorithms processed the same points. All large ROI algorithms processed the same points. These two sets of points, although similar were not identical. Some points that would be used in a small ROI algorithm would be too close to either image borders or saturated (clipped) image regions to accommodate a large ROI. In determining accuracy averages, each image was given equal weight regardless of the number of points processed in each image. In determining speed averages, each point was given equal weight. 7.3 S T E P E D G E IMAGES A step edge of a dark to light transition was created with two pieces of poster board (of different colours). Images were acquired with the step edge at four different distances (see Figure 29): a. 1216 mm b. 1515 mm c. 1816 mm d. 2421 mm A combination of the edge detection and user selection schemes were used to determine the ROFs. In total 203 small ROFs and 184 large ROFs were selected. Figure 28 shows an example of the error function that Brent's algorithm would be trying to minimize. This is the error between images i 2 and i 2 n versus depth (D0) as calculated by the 106 TRI-30 algorithm. This particular error function is for the point (260, 174) in image (b) and has a minimum located at D 0 = 1457mm. The true depth at this point is D 0 = 1515mm. This error function is fairly smooth. The roughness away from the global minimum is seldom a problem for Brent's algorithm. His use of parabolic interpolation is well suited for error functions that possess an overall parabolic shape around the minimum. The performance of the algorithms are shown graphically in Figures 30 and 31. These results are summarized in Table 3. 107 Figure 29 Step Edge Images Figure 30 Small ROI Step Edge Test 9001 . . 1 0 30 60 90 120 . 150 180 210 POINT PROCESSED 109 Figure 31 Large ROI.Step Edge Test I F -D . L U Q 2400 1900-1400-900-True Depth x TRI-50 o T R 2 - 5 0 30 60 90 120 150 POINT PROCESSED 180 210 2400-E E 1900-U J Q 1400-900-True Depth o T G S - 5 0 x TGI-50 30 60 90 120 150 POINT PROCESSED 180 210 110 Table 3 Step Edge Results ALG RMS ERROR SPEED mm % D 0 % working area Iterations per point ins per point Relative time per point TRI-30 44 2.7 2.9 8.6 50 1 TR2-30 46 3.5 3.1 11.7 6198 125 TGS-30 90 6.5 6.0 11.3 764 15 TGI-30 89 6.5 6.0 ' • 10.7 59 1.2 TRI-50 43 2.5 2.9 9.4 91 1.8 TR2-50 47 2.8 3.1 11.1 42337 854 TGS-50 44 2.8 2.9 11.0 3728 75 TGI-50 44 2.8 2.9 10.1 94 1.9 7.4 A L P H A B E T IMAGES Figure 33 shows the alphabet images used for the next test. The images span the following ranges of depths (approximately): a. 1170 mm - 2610 mm b. 1190 mm - 2440 mm c. 1080 mm - 2600 mm d. 1520 mm - 2640 mm An edge detection ROI selection scheme was used to obtain 2547 small ROI's and 1670 large ROFs for processing. Note that some of these points may be redundant if they were acquired by the vertical edge search as well as the horizontal edge search. 111 Figure 32 shows an example of an error function that TRI-30 would minimize. This particular error function is for the point (242, 237) in image (d) and has a minimum located at D 0 = 1819mm. The true depth at this point is D 0 = 1896mm. Figure 32 Alphabet Error Function Example 1000 1500 2000 DEPTH [mm] 2500 The performance of the algorithms for all the points tested are shown graphically in Figures 34,35,36 and 37. These results are summarized in Table 4. Note that the timing results for TR2 were unavailable. The timer seemed to have "wrapped around" giving unreasonable results. The timing analysis of TR2 shall be based on the step edge and log images. This should not bias results severely as the speed is a much stronger function of the convolutional requirements rather than the particular image used. 113 Figure 34 TRI Alphabet Test 9004 , ,—.—.—.—.—,— 1000 1500 2000 2500 TRUE DEPTH [mm] 114 Figure 35 TR2 Alphabet Test 900-1 1 . . — 1000 1500 2000. 2500 TRUE DEPTH [mm] 115 Figure 36 TGS Alphabet Test 900-1 . 1 . — 1000 1500 2000 2500 TRUE DEPTH [mm] 116 Figure 37 TGI Alphabet Test 900-1 • • • • . • • • • . —— 1000 1500 2000 2500 TRUE DEPTH [mm] 117 Table 4 Alphabet Results ALG RMS ERROR SPEED mm % D 0 % working area Iterations per point ms per point Relative time per point TRI-30 96 5.4 6.4 8.7 49 1 TR2-30 96 5.9 6.4 11.0 4975 102 TGS-30 160 10.5 10.7 10.7 740 15 TGI-30 160 10.3 10.7 9.8 56 1.1 TRI-50 108 6.8 7.2 9.7 . 92 1.9 TR2-50 78 5.3 5.2 10.6 n/a n/a TGS-50 91 5.6 6.1 10.0 3721 76 TGI-50 115 7.2 7.7 9.8 97 2.0 7.5 L O G IMAGES Figure 39 shows the log images used for the final test. The log used for these images was approximately 300mm in diameter. The images span the following ranges of depths (approximately): a. 1200 mm - 2700 mm b. 1200 mm - 2500 mm c. 2410 mm - 2470 mm d. 1300 mm - 2400 mm In order to verify the results of I-DFD, a set of specific points were physically measured. These points defined the ROI's processed by I-DFD. In total 69 small ROI's and 59 large ROI's were processed. 118 Figure 38 shows an example of an error function that TRI-30 would minimize. This particular error function is for the point (31, 388) in image (d) which has a true depth of D 0 = 1827mm. This function is very rough near the global minimum. The local minimum that Brent found was located at D 0 = 1901mm. Not all the ROI's possess error functions this rough. However, the ones that are rough are difficult to use. Brent may iterate to an incorrect local minima. The roughness may also completely mask the desired minimum and general shape of the error function. 1000 1500 2000 Depth [mm] 2500 The performance of the algorithms over all the processed points are shown graphically in Figures 40, 41, 42, and 43,. These results are summarized in Table 5. 119 120 Figure 40 TRI Log Test E E 2400J ^900-\ True Depth x TRI-30 Q. LU Q 1400H 900J 1000 1500 2000 TRUE DEPTH [mm] 2500 2400J E £, 1900^  X F-D_ LU Q 1400-1 True Depth x TRI-50 900^  1000 1500 2000 TRUE DEPTH [mm] 2500 121 Figure 41 TR2 Log Test 900-1 1 . . — 1 0 0 0 1500 2 0 0 0 2 5 0 0 TRUE DEPTH [mm] 900-1 1 1 . — 1 0 0 0 1500 2 0 0 0 2 5 0 0 TRUE DEPTH [mm] 122 Figure 42 TGS Log Test 24004, • True Depth T G S - 3 0 E E 1900H CL LU Q 14004, 9004 1000 1500 2000 T R U E D E P T H [mm] 2500 24004 E E, 1900-I X I-Q. LU Q 1400J, True Depth » TGS- 50 9004 1000 1500 2000 T R U E DEPTH [mm] 2500 123 Figure 43 TGI Log Test LU Q 240CH 1900J 140CH True Depth TGI-30 900^  1000 1500 2000 TRUE DEPTH [mm] 2500 2400^  E E, 1900-] I I— Q_ LU Q 1400H • True Depth TGI-50 900-I I 000 1500 2000 TRUE DEPTH [mm] 2500 124 Table 5 Log Results ALG RMS ERROR SPEED mm % Do % working area Iterations per point ms per point Relative time per point TRI-30 366 18.2 24.4 10.6 65 1 TR2-30 279 15.1 18.6 • 11.3 6265 96 TGS-30 211 11.1 14.0 11.0 812 12 TGI-30 295 15.5 20.0 11.4 63 0.96 TRI-50 237 10.9 15.8 11.2 112. 1.7 TR2-50 129 6.1 8.6 11.3 49092 751 TGS-50 207 9.9 13.8 10.7 3464 56 TGI-50 223 11.4 14.8 11.2 104 1.6 7.6 P E R F O R M A N C E EVALUATION The performance of the algorithms has been measured in absolute terms for the particular scenes that were used as tests. Although, these tests are unbiased, a more global frame of reference would be beneficial. Fortunately TR2 can be used as this benchmark. TR2 is the same in principle to Ens's algorithm which has been previously analyzed [6]. There are however some minor differences the two algorithms that should be noted. Ens's TR2 For the sake of clarity, Ens's algorithm shall be referred to as TR2-JE. The algorithms tested here are TR2-30 and TR2-50 or simply TR2 to indicate both. 125 Some aspects of TR2-JE may make it slightly superior to the TR2 (in terms of accuracy). These aspects are minor and are also missing from TRI. TGS and TGI. Therefore they should produce no bias in any comparison. These missing aspects are: 1. TR2-JE added a correction to account for space varying cosine losses, vignetting and optical imperfections. These effects were negligible under these test conditions used. 2. TR2-IE employed a statistical determination of the ROI size. It was found that the size usually reached a preset maximum. This preset was simply always used. 3. TR2-JE only used ROI's that would required no kernel truncation. TR2-50 algorithm is similar in this regard. It was found however that scaling could effectively reduce the side effects of truncation to increase the effective working area in the —30 algorithms and also to improve the localization of an ROI. 4. TR2-IE averaged 16 frames to help eliminate noise. Only one frame was used for the algorithms tested here. Some aspects of TR2 may make it slightly superior to TR2-JE. These aspects are common to all tested algorithms: 1. TR2 uses floating point convolutions. TR2-IE used an integer convolution for greater speed. 2. TR2 uses an error calculation normalized to the ROI area. TR2-JE calculated a total ROI error. It was found that normalization produced slightly smoother error functions. There are also some differences between TR2-JE and the TR2 that should neither improve nor degrade either's performance. Again these differences are common to all algorithms tested. They are: 1. Slightly different "strong edge" criteria. 2. Slightly different stopping criteria for Brent's algorithm. 3. Different definitions of working area. TR2 used a working area that is larger than TR2-JE. All tested algorithms used the same working area. 126 Accuracy All algorithms performed well on the step edge and alphabet images. For small ROFs, TRI fared better than the other algorithms. This was most noticeable at closer depths. The large size of the relative defocus kernel coupled with the small ROI would create severe truncation effects at closer depths. TRI seems better able to deal with these effects. In fact the size of the ROI seemed to have very little effect on TRI. The Gaussian algorithms (TGS, TGI) suffered the most under these conditions. It may be surmised that upon truncation, the Gaussian approximation to a pillbox weakens. The relationship that was used assumed that the Gaussian extends to ±3(7. TGS and TGI showed the greatest improvement when using larger ROFs. When using large ROFs TRI performed similarly to small ROFs. However, the other algorithms all improved, especially TGS and TGI. All small ROI algorithms (especially TRI) performed poorly on the log images. While all large ROI algorithms fared better, this was most noticeable for TR2. The optics, geometry and orientation of the log images were very similar to the alphabet images. Therefore the reason for the inferior performance may be due to the scene itself. The log images are much more complicated and have a higher occurrence of edges. It is possible that the reason for poorer performance is the edge bleed effect. This occurs when the defocussed neighbouring edges bleed into the edge under observation. It is also possible that the integrated algorithms (TRI, TGI) suffered because the edges were not as distinct as the previous test images. The integration process may have masked too much information that was contained in the many (less distinct) edges. The two dimensional algorithms (TR2, TGS) would not lose this information. Speed The algorithms span a wide range of speeds. The speed does not appear to be a strong 127 function of the type of image being processed. It is true however that the log images required a slightly larger number of iterations for each point processed. This is likely due to the rougher (i.e. non-parabolic) error functions that force Brent to use more linear steps and fewer parabolic steps in search of the minimum. The integrated algorithms (TRI, TGI) are quite clearly the fastest. TGS is approximately 14 times slower for small ROI's and 37 times slower for large ROI's. TR2 is approximately 110 times slower for large ROI's and 450 times slower for large ROI's. In terms of absolute time on the platform used, TRI-30 could process 1000 points in an image in less than one minute. TR2-30 would need more than one and a half hours. TR2-50 would require almost seven hours. The above comparison is based on the algorithm having the full resources of the CPU. In actual practice it was found that TRI-30 could process about 13 points per second rather than the 18 points predicted by the timing analysis. All of the foregoing timing analysis has been based on only the time required by the algorithm to determine depth. Other activities such as radiometric correction and edge detection are not included in the analysis. Overall Performance The overall performance of the algorithms can best be shown by plotting their performances in a speed — accuracy plane. This is done in Figures 44, 45 and 46. The error (in mm) and the time (in ms) have been normalized to the TR2-50 algorithm. The diagrams plot error versus time required, so the optimum location would be the lower left of the diagram. The time axis has been plotted on a logarithmic axis in order to be able to show the large differences in speed. 128 Figure 44 Step Edge Performance Plane 2.5-1 rr O rr rr LU Q LU 1.5H N rr o 0.5H TGI-30 TGI-50 ^ TRI-50 TRI-30 TGS-30 TR2-30 TGS-50 TR2-50 0.001 0.01 0.1 NORMALIZED TIME Figure 45 Alphabet Performance Plane 3 2.5 rr O rr 2 rr LU Q LU 1.5 N rr O 0.54 TGI-30 • TGS-30 • TGI-50 *TRI-50 TRI-30 TR2-30 TGS-50* * TR2-50 0-0.001 0.01 0.1 NORMALIZED TIME 129 Figure 46 Log Performance Plane 3-2.5-rr O rr 2-rr LU Q LU 1.5-N rr o 0.5-0-* T R I - 3 0 T G I - 3 0 • T R 2 - 3 0 • T R I - 5 0 " T G I - 5 0 . T G S - 3 0 J G S - 5 0 T R 2 - 5 0 0.001 0.01 0.1 NORMALIZED TIME For the step edge and alphabet images, it is obvious that TRI is the optimum choice of algorithms. TRI achieves similar or better accuracy at a much higher speed than the other algorithms. TRI's accuracy weakens considerably in processing the log images. This algorithm may not be suitable for certain tasks such as log profiling for instance. Even TR2-50 with an accuracy in the order of 129mm would not be accurate enough for profiling work under these conditions. Recall that the diameter of the log was 300mm. However, TRI may still be suitable for tasks such as locating and determining the orientation of a log. Figure 47 shows a depth map produced by TRI-50 overlaying the original image. A white square indicates a closer point and a black square indicates a farther point. While the error is still significant in TRI-50, the orientation of the log is revealed. The log is closest in the lower right and farthest in the upper left of the image. 130 Figure 47 Depth Map Example : ^ III f"&. ... • V * • s j TGS and TGI always achieved similar accuracy (except for small ROI log images). Since TGI is much faster, it would be preferable to TGS. This accuracy is not exceptional and TRI-50 would consistently be a better choice in terms of speed and accuracy. TR2-50 remains the most accurate and consistent algorithm. TR2-30, although good, also suffers when processing the log images. TR2 also remains prohibitively slow. 131 Chapter 8 CONCLUSIONS 8.1 INTRODUCTION AND OVERVIEW The purpose of this chapter is to present an overview of the research that has been presented in this thesis. Section 8.2 summarizes the contributions made by this thesis. Section 8.3 makes recommendations for future work. 8.2 CONTRIBUTION A hierarchy of Iterative DFD algorithms has been created. This hierarchy is based on choices of defocus models (theoretical (T) or measured (M)), shaping techniques (regularized (R) or Gaussian (G)) and convolution implementations (two-dimensional (2), separable (S) or integrated (I)). Six of these algorithms have been implemented and four (TRI, TR2, TGS and TGI) have been thoroughly tested. TR2 is equivalent to an algorithm previously designed by Ens [6] and serves as a benchmark. The test conditions have gone beyond previous testing in using a larger working area and more realistic images. TGS improved the speed of I-DFD by approximately one order of magnitude. However, TGS's accuracy suffered, especially under conditions of restricted spatial support (small region of interest, close depth). TGS degraded to as much as twice the error of the benchmark. TGI improved the speed of I-DFD one-hundred fold for small regions and four hundred times for large regions. TGI's accuracy suffered, especially under conditions of restricted spatial support. This degraded to as much as twice the error of the benchmark. TRI achieved speed improvements similar to that of TGI. However, there was no or little degradation in accuracy for simple images. Accuracy was degraded to as much as twice the error of the benchmark for complicated images. A complicated image can be defined as one that has 132 many edges of corresponding to different depths within a given region of interest in the image. The bark of the logs in the log images produces a complicated image. TRI also proved to be more robust under conditions of restricted spatial support. That is, it was able to maintain low error measurements under these conditions. Although none of the algorithms tested would be suitable for log profiling, determining log location and orientation in a log sort yard is possible. 8.3 F U T U R E W O R K Optical System The NTSC link in the image acquisition process is not needed. This added step could only cause problems. A camera that has a digital output could be used to improve image quality. Varying focal length (F) rather than aperture (f) could create steeper relative defocus focal gradients. Additional processing would then have to be performed to properly match pixels in the two images of different magnifications. For forestry applications, the combination of a 50mm lens and a 2/3" sensor creates a very narrow field of view. A single log occupies the entire image in this system's usable working area. Either a larger sensor or a wider lens would make the system more practical. Note that a wider lens would reduce the focal gradient. Algorithms TGS and TGI suffered the most when under spatial constraints. This may be due to a degradation to the Gaussian approximation to a pillbox when using truncated kernels. It should be established if this is true. If it is, then an adaptive scheme could be used to change to Gaussian approximation depending on the amount of truncation. TRI suffered when processing complicated images. This may be due to an edge interference problem. This occurs when neighbouring edges are located at different depths than the edge 133 of interest. A single defocus kernel can not simultaneously model both edges. It should be established if this is the cause of the degradation. If it is, then a space varying or multicomponent blurring model could be used to reduce this effect. The TRI degradation may also be due to a loss of information in the integration process. Many edges of differing orientations may be "integrated away". It should be established if this is true. If it is, then TRI could be improved by using only one integration that is parallel to the edge of interest, and performing the convolution in the perpendicular direction. This would preserve the information in the edge of interest while "integrating away" interfering edges. This should simultaneously alleviate edge bleed problems. Each of the tested algorithms could also be tested with measured models of defocus rather than theoretical models. This should only improve the accuracy of each algorithm. It may be possible to speed up TR2 by transforming the images and relative defocus kernels to the polar (r, 9) plane. In this plane, the kernel which is rotationally symmetric is just a function of r and is therefore separable. The convolution could then be performed as a separable convolution similar to TGS. 134 Bibliography [I] Max Born and Emil Wolf. Principles of Optics. Pergamon Press, Toronto, 5th edition, 1975. [2] Ronald N. Bracewell. The Fourier Transform and Its Applications. McGraw-Hill Book Company, New York, 1986. [3] J. Cardillo, M. A. Sid-Ahmed, and J. J. Soltis. 3-d position sensing using a single camera approach. Proceedings of the 32nd Midwest Symposium on Circuits and Systems, pages 325-328, 1990. [4] J. Cardillo and Maher A. Sid-Ahmed. 3-d position sensing using a passive monocular vision system. IEEE Transactions on Pattern Analysis and Machine Intelligence, V.13(N.8):809-813, 1991. [5] John Cardillo. 3-d robot vision metrology and camera calibration from focus information. Master's thesis, University of Windsor, 1988. [6] John Ens. An Investigation of Methods for Determining Depth From Focus. PhD thesis, University of British Columbia, 1990. [7] John Ens and Peter Lawrence. An investigation of methods for determining depth form focus. IEEE Transactions of Pattern Analysis and Machine Intelligence, V. 15(N.2):97-108, February, 1993. [8] M.H. Freeman. Optics. Butterworths, London, 10th edition, 1990. [9] Bernd Girod and Stephen Scherock. Depth from defocus of structured light. SPIE: Optics, Illumination, and Image Sensing for Machine Vision IV, 1194:209-215, 1989. [10] Rafael C. Gonzalez and Paul Wintz. Digital Image Processing. Addison-Wesley Publishing Company, Reading, Massachusetts, 2nd edition, 1987. [II] P. Grossman. Depth from focus. Pattern Recognition Letters, V.5(N.l):pp. 63-69, January, 1987. [12] Berthold Klaus Paul Horn. Machine Vision. The MIT Press and the McGraw-Hill Book Company, New York, 1986. [13] Anil K. Jain. Fundamentals of Digital Image Processing. Prentice Hall, Englewood Clifs, New Jersey, 1989. [14] Miles V. Klein and Thomas E. Furtak. Optics. John Wiley and Sons, New York, 2nd edition, 1986. 135 [15] Eric Krotkov. Focusing. International Journal of Computer Vision, V.l(N.3):223-237, 1987. [16] Eric Krotkov and Ruzena Bajcsy. Active vision for reliable ranging: Cooperative focus, stereo, and vergence. International Journal of Computer Vision, V.l 1(N.2): 187-203, October, 1993. [17] Ten lee Hwang, James J. Clark, and Alan L. Yulle. A depth recovery algorithm using defocus information. Proceedings: IEEE Computer Society Conference on Computer Vision and Pattern Recognition., pages 476^ 181, 1989. [18] D. Marr and E. Hildreth. Theory of edge detection. Proceedings of the R. Soc.Lond. B, 207:187-217, 1980. [19] A. Pentland, T. Darrell, M. Turk, and W. Huang. A simple, real-time range camera. Proceedings: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 256-261, 1989. [20] Alex P. Pentland. A new sense for depth of field. Proceedings: Ninth International Conference on Artificial Intelligence, V.2:988-994, August, 1985. [21] Alex Paul Pentland. A new sense for depth of field. IEEE Transactions on Pattern Analysis and Machine Intelligence, V.9(N.4):523-531, July, 1987. [22] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in C - The Art of Scientific Computing. Cambridge University Press, Cambridge, 2rd edition, 1992. [23] Sydney F. Ray. The Lens in Action. Focal Press Limited, London, 1976. [24] Azriel Rozenfeld and Avinash C. Kak. Digital Picture Processing, volume 1. Academic Press, New York, 2nd edition, 1982. [25] A E. Savakis and H. J. Trussell. Restorations of real defocused images ising blur models based on gemoetrical and diffraction optics. IEEE Proceedings of Southeastcon '91, pages 919-922, 1991. [26] Ferrel G. Stremler. Introduction to Communication Systems. Addison-Wesley Publishing Company Inc., Philippines, 2nd edition, 1982. [27] Murali Subbarao. Parallel depth recovery by changing camera parameters. Proceedings: Second International Conference on Computer Vision, pages 149-155, December, 1988. [28] Murali Subbarao and Ming-Chin Lu. Computer modelling and simulation of camera defocus. Proceedings of SP IE: Optics, Illumination and Image Sensing for Machine Vision VII, 1822, 1992. [29] Murali Subbarao and Gopal Surya. Application of spatial-domain convolution/deconvolution transform for determining distance from image defocus. Proceedings of SPIE: Optics, 136 Illumination and Image Sensing for Machine Vision VII, V. 1822:159-167, November, 1992. [30] Murali Subbarao and Tse-Chung Wei. Depth from defocus and rapid autofocusing: A practical approach. Proceedings: 1992 IEEE Computer Society Conference on Computer Vision and Patent Recognition, pages 773-776, 1992. [31] Murali Subbarao and Tse-Chung Wei. Fast determination of distance and autofocusing from image defocus: A new fourier domain approach. Technical Report 1 1794-2350, State University of New York at Stony Brook - Department of Electrical Engineering, 1994. [32] Muralidhara Subbarao. Direct recovery of depth-map i: Differential methods. Proceedings of the IEEE Computer Society Workshop on Computer Vision, pages 58-65, December, 1987. [33] Muralidhara Subbarao and Natarajan Gurumoorthy. Depth recovery from blurred edges. Proceedings: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 498-503, 1988. [34] Gopal Surya and Murali Subbarao. Depth from defocus by changing camera aperture: A spatial domain approach. Proceedings: IEEE Computer Society Conference on Computer Vision and Pattern Recognition: CVPR'93, pages 61-67, June, 1993. 137 Appendix A NOTATION AND SYMBOLS This appendix provides a summary of notational conventions used throughout this thesis, (x, y) represents spatial Cartesian co-ordinates. (r, 0) represents spatial polar co-ordinates. Since many of the functions used in this thesis are rotationally symmetric, the angle co-ordinate is often omitted. Square brackets are used to index a function if the function is a discrete time function (e.g. f[x]). Vectors are represented by a single underscore (e.g. v_). Matrices are represented by a double underscore (e.g. M), M1 represents the transpose of matrix M. [lVf] . represents the (k,i)th element of the matrix M. The following is a list of symbols used throughout this thesis. F represents the focal length of the lens. f represents the aperture number. This is defined as the ratio of focal length to aperture diameter. D 0 represents the distance to an object point, measured from the front principal plane. Dj represents the distance to an image point, measured from the rear principal plane. D p represents the distance to the plane of sharp focus, measured from the front principal plane. It represents the location that the lens is focussed on. D s represents the distance to the image sensor, measured from the rear principal plane. DR is the digitization ratio expresses as pixels per mm. ii(x,y) represents the sharper image or a local area of the sharper image. i2(x,y) represents the blurrier image or a local area of the blurrier image. 138 Appendix B ABEL TRANSFORM B.1 INTRODUCTION A N D OVERVIEW This appendix provides some insight into the Abel Transform. The Abel Transform is a special case (rotationally symmetric) of the Radon transform. Further reading on the Abel and Radon transforms may be pursued in [2 pp.262-266][6 pp.90-94][l3][24] Section B.2 defines and describes the forward Abel Transform. Section B.3 derives an implementation of the inverse Abel Transform. This implementation of the inverse places an additional constraint on the function that it must go to zero at infinity. The Abel Transform relates a two dimensional rotationally symmetric function and its projection. Figure 48 shows a projection p(u) for a general case (i.e. f(x,y) not necessarily rotationally symmetric). The projection p(u) is given by: B.2 F O R W A R D A B E L T R A N S F O R M oo (143) — C O For a rotationally symmetric f, this can be rewritten as: C O (144) — oo where the co-ordinates are related by: 2 2 , 2 2 , 2 r = x + y = u + v u = x cos ip + y sin ip v = x sin ip — y cos ip x = u cos Tp — v sin tp y = u sin 'ip + v cos ip (145) 139 Figure 48 Projection of a Two Dimensional Function Equation (148) is known as the Abel Transform. Often, the projection is taken parallel to the y axis making u is equivalent to x. The Abel Transform may be rewritten as: where p(«) = J ku{r)f(r)dr (149) o ku{r)±\f^> r > U (150) 1 0, v < u We shall rewrite this in another form shortly. First, define the following: j A 2 <p = U A 2 p = r p 2 ( ' M 2 ) = P2(<j>) =P(u) / 2 ( r 2 ) = / 2 l > ) ^ / ( r ) K(a)±{]T5> - (151) 10, a > 0 The rotational symmetry of f(r) [f(r) = f(-r), p(u) = p(-u)] makes it possible to define p2 and f2 as indicated. Only non-negative values of r and u need to be considered. The remainder of the derivation is constrained to this domain. Furthermore it should be noted that the quantities <j> and p are also always greater than or equal to zero since the represent squared (real) numbers. Now consider: K(4-p) = {^*' ^ < P (152) I 0, <f>> p dpK'4 -p) = dp 2rdv \A'2-u2 ' 0, = ku(r)dr P < p f> > p 9 ^ 2 u < r u > r (153) Therefore p(u) = / ku(r)f(r)dr = I K{<f> - p)f2(p)dp (154) 141 Or O O p2(<f>) = J K(<j>-p)f2(p)dp o P 2(^) = /v(»*/2(» (155) That is, the Abel Transform can be represented as a convolution in the <j>, p domain. The Abel Transform in this domain is more correctly referred to as the Modified Abel Transform. B.3 INVERSE A B E L T R A N S F O R M Now an iterative implementation of the inverse Abel Transform shall be derived. Assume that the projection p(u) of a rotationally symmetric function is given. We wish to obtain the original function f(x,y) or equivalently in polar co-ordinates, f(r). The objective is to be able to use this inverse Abel Transform in a discrete implementation. Begin with the assumption that f(r) goes to zero outside some radius. If this is not true then no discrete implementation would be able to spatially contain the entire projection p(u) or function f(r). Mathematically: /(r) = 0, Vr > R p{u) = 0, Vw > R / 2 l » = 0, Vp>E 2 p2((j,) = 0, V<£ > R2 (156) Also, from equation (152), Therefore 155 becomes: K(4-p) = Q Vp<<f> (157) p 2 l » = I K(4> - P)h{p)dp = I K(4> ~ P)f2(p)dp, 0<<P<R (158) o It is important to keep in mind the domain (i.e. u, r or <j>, p) of the functions in the above expressions. In order to bridge the discrete and continuous cases, some assumptions must be made as to how the functions behave between known sample points. The projetion p(u) is known at integral 142 points in the u domain. A first order approximation would be that the functions are constant between sample points as shown in Figure 49. Mathematically: p[t]=p(«), i = 0,1,2,... /['] = /(>•), ' = 0,1,-',... P 2 [ i 2 ] ^ P 2 l » , ?2 = 02,12,22,... h\}2]=h{p\ i 2 = 0M2,22,... (159) Now, f{r)~f(r)±f[i\, i±lr) Mp)^Mp) = f2\}2], ^ = LV^J (160) The square brackets are used to emphasize the discrete nature of the expression. The "hat" notation indicates an approximation of a function. The floor symbol [_*J indicates the greatest integer less than *. Figure 49 First Order Approximation of f2 f2(p) = yp) =yLpJi=f • 2(p) 0 1 2 3 Equation (158) may then be broken down into a sum of integrals over each rectangle: R-l (< + 1 ) a PM= E J m - P)f2(p)dp, ^ = 0 2 ,l 2 ,2 2 , . . .(/?-l) 2 143 71-1 ^ 1 ) 2 ?2[<f>]=.J2 J I<^-P)h[i2]dp R-l W E U?\ J K(4-P)dp (*' + l ) 2 E M v^1 - r 9 n r / 1P=(»+1) 2 E / 2[* 2] [ 2 V ^ ] p = j 2 R-l E / 2 [ i 2 ] 2 ( V / ( f + l ) 2 - ^ - x A ^ ) (161) Note that f2 has been discretized, but K has been left continuous since K is known between sample points. Changing variables back into the u and r domain: p[«] = E / H 2 (\AJ' + 1 ) 2 - M 2 - ^ I 2 - . U 2 ) ( 1 6 2) Equation (162) is a discrete representation of the forward Abel Transform. The reverse Abel Transform may be determined iteratively starting at the outermost point (R-l). Recall from equation (156) that f(R) = 0 and p(R) = 0. First define A(j = 2\I(R- i)2 -(R- j')2 (163) Now rewrite the forward transform in a more useful form: R-l p[ij-i]= E /W2(v(« + i)2-(^ -i)2-v*2-(^ -i)2 i=R-l ^ p[R-l] = f[R-l](A0il) R-l P[R-2}= E m (\A I'+X)2 ~(R-2)2 - V I 2 - (R - 2 ) 2 i=R-2 ^ p[R-2] = f[R - 2]{Ai>2) + f[R - 1](A0,2 - Ax/Z) ...etc.... (164) 144 The general case being: R-l p[R- = J2 /W2 U{i + if -(R- nf - x/i2 ~(R- nf ) i=R-n ^ ' = /[iJ - n]2 (^J(R-(n-l)f - (R - nf - yj'(R - n)2 - (R - nf^j + f[R -(n- 1)]2 (^J(R-(n-2)f -(R-nf - (R - (n - l ) ) 2 - (R - nf^j ..etc. + /[iJ - 1]2 ^ ( i?) 2 — (R- nf - y/(R - if — (R — nf p[R-n] = f[R-n](An-1>n) + f[R - (n - l ) ] (A„_ 2 i n - A „ _ l i n ) + f[R - (n - 2)](A„_ 3, n - A n _ 2 , „ ) ...etc.... + / [ i J - l ] ( ^ o , n - A i , n ) n -1 pffl - n] = f[R - n](An_1>n) + J2^R~ * K ^ - i , » - ( 1 6 5 ) where n = 1,2,3,...R. The inverse transform (obtaining f from p) may be obtained by inspection of the above equations. Note that the outermost point f(R-l) is a function of only p(R-l). Inner points in f are functions of p at that point and outer values of f, which would have been previously determined. Mathematically: p[R-2]-p[R-l}(A0,2-Ali2) etc.... (166) The general case being: p[R - n] - "f: f[R - i](A,-_iin - Aiin) f[R - n] = 8 = 1 . (167) ^ n — l , n where n = 1,2,3,...R. Equation 167 is an expression for a discrete implementation of an iterative inverse Abel Transform. Once the polar form of f(r) is obtained, it is straight forward to determine the Cartesian form, f(x,y). 145 Appendix C PROPERTIES OF PSFS AND LSFS C.1 INTRODUCTION A N D OVERVIEW This Appendix establishes some fundamental properties of point spread functions (psfs) and line spread functions (lsfs). These properties are used throughout this thesis. Section C.2 defines many of the quantities of interest. Section C.3 derives properties and relationships of the psfs and lsfs. Section C.4 examines Gaussian psfs in detail. The following quantities are used to describe optical systems. 1. F is the focal length of the lens in mm. 2. D s is the distance from the rear principal plane to the image sensor plane in mm. 3. f is the ratio of focal length to aperture diameter. 4. D 0 is the distance from the front principal plane to an object in mm. 5. Dp is the distance from the front principal plane to plane of sharp focus in mm. A point spread function (psf) of a system is the response of the system if the input is the two dimensional impulse function, 5(x,y). Furthermore, a camera can be described by a camera psf. A camera psf, h(x,y), is a psf that models the defocussing introduced into images by the camera. It has two properties: 1. A camera psf is rotationally symmetric. 2. A camera psf has unit volume. That is, it neither absorbs nor adds light to the system. C.2 DEFINITIONS Optical System Point Spread Functions (psfs) (168) 146 A pillbox psf is a camera psf of the following form: h(x,y) i x2 + y2<R2 0, otherwise A ± TTR2 (169) Theoretical geometric optics predicts that the blur circle radius, R, is given by 2/ \DJ 2/ A Gaussian psf is a camera psf of the following form: Kx>y) - o — 2 e (170) (171) The space parameter, a, can be related to either the geometric optics radius, R, or directly to the object depth as follows: ka - R FD, f l\ F-Ds •2kf \D0 2kf 1 a = m I — | + b (172) where k is a constant approximately equal to \f2 for many lenses [29]. A line spread function (lsf) (hx(x)) is a projection of a point spread function (psf). K(x) = / h(x, y)dy (173) Spread The spread [S] of a function is defined as the square root of the function's second central moment. For a one dimensional function this is defined as: S2[/(*)] = / (x-x)2f(x)dx where x = j xf(yx)dx (174) For a two dimensional function this is defined as: S2[f(x,y)} (x - xf + (y - y)2 f(x, y)dxdy •jhere x = J xf(x,y)dxdy 17= j yf(x,y)dxdy (175) 147 C.3 DERIVED RELATIONSHIPS Gaussian Isf The Isf of a Gaussian psf is itself a Gaussian. The proof follows. M * ) = J Hx>v)dv 1 _ £ i ± j d , e 2<r2 ay 2TT<T2 1 luhere a 2ira2 _ 1 ~ 2TT<72 = ( J -A J _ 2(72 e 2 » 2 e 2 » 2 riy 00 e 2^2 I g 3,2 gfy e" 2 ^ 2 / e 0 1 / T T 2 V a (176) which takes advantage of the identity [26 p.640] (177) Therefore the resulting Isf is (178) K(x) = — 1 QED (179) This is a one dimensional Gaussian. 148 Pillbox lsf The lsf of a pillbox psf is: hx(x)= / h(x,y)dy i x2 + y2<R2 0, otherwise 2 A A dy hx(x) TfR2 \/R? - x2 VR2 - x2 (180) lsf — psf Spread Relationship The spread of a rotationally symmetric psf is equal to the spread of the corresponding lsf multiplied by a factor of root two: S\psf]= (y/2)S[l8f] (181) The proof follows. Consider the spread of the psf. S2\psf] = \(x- xf + (y - yf h(x,y)dxdy (182) A rotationally symmetric (about the origin) psf must have a centroid (x,y) = (0, 0). Therefore, s V / ] (x - 0)" + (y - 0)-\h(x,y)dxdy x2h(x,y)dxdy+ f f y2h(x,y)dxdy A + B (183) Considering the first term, A , and transforming to polar co-ordinates: A = 2h.(x, y)dxdy 149 (?• cos 6)2h(r,6)rdrd6 o o 2TT C O 1 cos2 Oh(r)drdO o o where h(r,#) = h(r) since the psf is rotationally symmetric. c o 2TT A - j r3h(r)dr j cos2 OdO o o lh(r)dr 1 1 - + - cos2^ de 2TT 0 ih(r)dr J o C O TT J r3h(r)d o Similarly, C O B = TT J r3h(r)dr So the spread squared may be written as: S2\psf] = A + B OO S2\psf]= 2TT j r3h(r)dr o Now consider the spread of the lsf. S2[lsf] = j {x-xfhx{x)dx \x — 0) hx(x)dx x2hx(x)dx h(x,y)dy x2h(x,y)dxdy dx (184) (185) (186) (187) (188) (189) 150 Transforming to polar co-ordinates, 52[/s/] = J j x2h{x,y)dxdy C O ) (r cosd)2 h(r, 0)rdrcW - C O — C O 2TT co 0 0 2?r co •d cosA 6h(r)drde 0 0 2JT r3h(r)dr [ cos2 i 2TT r3h(r)drj 2TT rdh(r)dr / -c*6> O O 52[/s.f] = TT y r3h(r)dr o By inspection of Equations 188 and 191, it is obvious that S2\psf] = 2S2[lsf] or S\psf]= [V2)S[lsf} QED Spread of a Gaussian psf The spread of a Gaussian psf is (\/2)(7. The proof follows: S2[lsf] — J x2hx(x)dx f - 1 _ J E L = / T2 , - P I"2 dx J O O 1 f 2 - -4, , = . / x e ^ dx J 151 oo (193) which takes advantage of the identity [26 p.640] x2e-ax'dx = (194) 4a V a o Therefore S2[lsf] = 1 (2)1^1^(2^) V l i r a 1 4 = cr2 (195) Taking the square root of both sides: S[lsf] = a- (196) Making use of the lsf-psf spread relationship (Equation 181): S[psf}= (y/2) S[lsf] S\p8f]=(V2)<T QED (197) Spread of a Pillbox psf The spread of a pillbox psf is ^j=. The proof follows. From Equation 188 the spread of a psf can be written in polar co-ordinates as: C O S'2\psf] = 2TT y r3h(r)dr (198) 152 Therefore, for a pillbox, s-'[psf] = 2w j i dr 2TT ~A <-3dr r„4 2TT ~A 2TT ~A _ 2TT RA ~A~X 2TTR4 VRH 2 ->R 4 (199) Therefore QED (200) Isf Convolution Relationship A system that can be described by the psf relationship: h!(x,y) * h3(x,y) = h2(x,y) can also be described by the Isf relationship: hlx(x) * h3x(x) = h2x(x) The proof follows. Beginning with /ii(a;, y) * h3(x,y) = h2(x, y) hi(a,(i)h3(x - a , y - (5)dad(3 = h2(x,y) (201) (202) (203) 153 Integrating both sides with respect to dy yields the lsf of lb as defined by equation 173 hi(a, P)h3(x - a, y - (3)dadf3dy = J h2(x,y)dy hi(a, f3)h3(x - a , y - [3)dadf3dy = h2x(x) (204) Reversing the order of integration: hi(a, /3)h3(x - a, y - /3)dydf3da = h2x(x) The inner integral is the definition of the lsf of fh h3(x - a , y - [3)dy dfSda = h2x{x) h3(x-a>y-f3)d(y-f3) d/3da = h2x(x) hi(a, f3)h3x(x - a)dj3da = h2x(x) (205) (206) Reversing the order of integration again yields the lsf of hi : h3x(x - a) / hi(cv, f3)d[3da = h2x(x) h3x{x - a) hi(a,p)df3 da — h2x(x) h3x(x - a)hyx(a)da = h2x(x) (207) This is simply the convolution definition expressed in terms of lsfs: hlx(x) * h3x(x) - h2x(x) QED (208) The frequency domain relationship is obtained by taking Fourier Transforms: Hlx(fx)H3X(fx) = H2X(fa) where Hix(fx) = F{hix(x)}, i = 1,2,3 (209) 154 Unit Volume of Relative Defocus Kernels If the camera psfs h] and h2 have unit volume, then the relative defocus psf that relates these two also has unit volume. The proof follows. Beginning with the Isf convolution relationship (Equation 201): hlx(x) * h3x(x) = h2x(x) h3x(a)hix{x - a)da = h2x(x) (210) h3x(a:)hix(x — a)dadx = / h2x(x)dx Since h2 has unit volume: Therefore 1 = / / h2(x,y)dxdy — I h2x(x)dx h3X(a)hix(x — a)dxda = 1 J h3x(a) J hXx(x - a)dx J h3x(a) j hlx(x - a)d(x - a) da = 1 da = 1 (211) (212) Since hi has unit volume: Therefore 1= hi(x,y)dxdy— / hix(x)dx = / hi(x — a)d(x — a) h3x{a)[l]da = 1 h3x(a)da = 1 h3x(x)dx = 1 (213) (214) Or in terms of psf: h3(x,y)dy j j h3(x,y)dxdy QED dx = 1 1 (215) This clearly states that fi3 must also have unit volume. 155 C.4 GAUSSIAN RELATIVE DEFOCUS The use of Gaussian psfs to model camera psfs has many mathematical advantages. Gaussian camera psfs lead to a Gaussian relative defocus psf (I13) This is proved below. A Gaussian function would be an analytic, easy to implement relative defocus kernel. Furthermore, Gaussian kernels have advantages when performing convolutions. Two dimen-sional Gaussians are separable. A separable function of x and y can be expressed as a function of x multiplied by a function of y. A two dimensional convolution can then be implemented as two one dimensional convolutions. This too is proved below. Spectrum of Gaussian lsfs The spectrum of a Gaussian Isf is easily found by taking the Fourier Transform of the Isf. If hix(x) 1 i = 1,2 (216) e then the spectrum is HiX(f*) = F{hix(x)} (217) which takes advantage of the identity [26 p.92] where uix = 2irf3 X (218) Therefore, Hlx(fx) = e~°>l H2x(fx) = e-'M (219) 156 Gaussian 113 The frequency domain relative defocus relationship is (from Equation 209): Hlx(fx)H3X(fx) = H2X(fx) (220) Solving for H3X H3x{fx) = ^ = ^ ± = e - < ^ ) (22,) The relative defocus lsf that relates two Gaussian lsfs, is itself a Gaussian lsf. Combining Equations 218 and 221: H3X(fx) = e-u-°3> where a\ = a\ — u\ (222) corresponding to 1 h3x{x) = where U 3 = a2, — a\ (223) in the spatial domain. By inspection of the psf— lsf relationships found in Equations 171 and 179, the relative defocus psf corresponding to the relative defocus lsf (Equation 223) is: 1 Q w ) h3(x, V) = Tr—i-e ^ /7T<7g where ai = <J\ — cr? (224) Separability A function f(x,y) is said to be separable if it can be expressed as f(x,y) = f(x)f(y). This is true for Gaussians. Furthermore, a two dimensional convolution with a separable two dimensional function is equivalent to two one dimensional convolutions. This is proved as follows. First, a two dimensional Gaussian can be shown to be separable as follows: 1 *2+«a — P 'Iv- P 2tr2 157 1 _ x i . e ^ 'jhere h(x) = and h(y) = h(x)h(y) 1 72 7 T 0 " 1 1 %/2J =e 2^  (225) Note that the separated functions h(x), h(y) are identical to the line spread functions of a Gaussian hx(x) and hy(y). Secondly the two dimensional convolution can be separated into two one dimensional convolutions. The proof below uses a subscript on the convolution operator (*) to indicate a one (1) dimensional or two (2) dimensional convolution. f(x,y) *2 h(x,y) f(a,P)h(x - a,y- P)dcvd/3 /(», P)hx(x - a)hy(y - f3)dcxdp hy(y-P) J f{a,(3)hx(x - a)dcx dj3 = J hy(y - p)g(p)d/3 where g(0) = / (» , /? ) * i hx(x) (226) Therefore f(x,y) *2 h(x, y) = J hy(y - P)g(P)d(3 = h.y{y) * i g{y) = hy(y) * i f(x,y) * i hx(x) — f{x,y) * i hx(x) * i hy(y) (227) That is, a two-dimensional convolution (of size N 2 ) can be evaluated as successive two one-dimensional convolutions (of size N). Uniqueness of Gaussian Relative Defocus psf If the Gaussian model of defocus is assumed, the relative defocus psf has an analytic form and is itself a Gaussian as derived above. An analytic relative defocus allows a straightforward proof that hj is a unique indicator of depth. 158 Recall from Equation 224 that the Gaussian relative defocus psf is defined as: 1 - i f l f l y) 2ir<r\ •jhere a\ = o\ — <J\ (228) Recall from Equation 172 that the spread parameters (a) of camera psfs are related to the theoretical blur circle radius by a constant: Solving this for a\ and a2 yields: FD,fl \ . F-Ds 2hk\D0J 2/ifc f FD, F - D , \ ( l \2k 2kD0 2k yv/i A Ti A (FD, | F-D, 2k w h ^ A = [ 2 m + - ^ ] ( 2 3 0 ) Similarly, A (?2 = y J2 (231) Note that F, D s and k would be constant over the image set. An image set is a set of images of the same scene acquired under the same conditions except for differing apertures. Similarly for any particular object (or image point), the depth (D0) would be constant. Therefore A is a constant under these conditions. The spread parameter of the relative defocus psf can be expressed as: 2 2 2 -(2 ~ f2 H t1 1 ft ~Tl A2 ~ fi 1 A 1 1 where -pr = -pr —-pr (232) fi fi fi 159 In order to examine the relationship between <r3 and depth (D0), A should be re-expanded. Note that f3 is not dependant on object depth. h = (FD° F-DA fl\ \2kD0+ 2k ) \ h ) 2kf3\D0J+ 2k h Note that this is the same relationship used for camera lsfs, but with an effective aperture f3 instead. where M 2kf, 3 and B 4 F fs (234) 2K J 3 Since neither M nor B are functions of D0, the above equation clearly shows that there is a one to one correspondence between the relative defocus spread parameter (<r3) and the depth (D0). Note that <r can only take on positive values as negative a would be meaningless. Therefore, the Gaussian model of camera defocus yields a Gaussian relative defocus psf that is a unique indicator of depth. The exception to the one to one correspondence occurs if M is zero or infinite. These conditions would not occur in any real camera. The above equation also reveals that as an object is moved farther from the plane of sharp focus (decreasing D0), the spread (size) of the relative defocus psf increases. Recall that the above derivations are based on the assumption that the camera is focused beyond the working area. 160 


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items