REAL-TIME LOCATION AND PARAMETERIZATION OF EYES IN AN IMAGE SEQUENCE AND THE DETECTION OF THEIR POINT-OF-GAZE

by

MICHAEL DAVID SMITH

B.A.Sc., The University of British Columbia, 1992

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in THE FACULTY OF GRADUATE STUDIES, THE DEPARTMENT OF ELECTRICAL ENGINEERING, THE UNIVERSITY OF BRITISH COLUMBIA

March 1998

© Michael David Smith, 1998

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Electrical Engineering, The University of British Columbia, Vancouver, Canada

ABSTRACT

The determination of the point-of-gaze of an imaged subject is the main focus of this thesis. Digitized image sequences from a single camera are analysed to determine the point-of-gaze of a subject without manual setup of algorithms for that subject or calibration of the system. A novel feature of this work is that eyes are initially located by using motion to characterize possible Eye-events. A priori knowledge of an eye's motions, spatial orientation, and characteristics is used to find Eye-events. Deformable templates based on a 2-D model of the eye and face are used to parameterize eyes in an image sequence into a parameterization vector. The pupil, iris, skin, hair, and upper and lower eyelids of a subject are parameterized to form a knowledge base for that subject. The tracking of these eye parameters is performed similarly to their initial parameterization. Point-of-gaze is determined by applying the 2-D eye and face parameters to a 3-D model of the eye.

The Eye Location, Parameterization, and Tracking algorithms are run using both Matlab on a SPARC 5 workstation and C on a TMS320C40 parallel processing network; the algorithms run equally well on both of these systems. The algorithms are shown to be able to locate eyes in a series of images over a wide range of eye scales. The accuracy of the algorithms is analysed on the SPARC system by having a subject view sets of targets at known locations. The system's determination of a person's point-of-gaze was compared to the actual target positions, showing an error of 4.01° or less along the horizontal axis and 4.99° or less along the vertical axis for targets at or above the eye level of the subject. The accuracy of the system lowers to 5.41° or less along the horizontal axis and 15.76° or less along the vertical axis for targets below the eye level of the test subject.

The C40 parallel processing system which runs these algorithms, automatically switching between the Location, Parameterization, and Tracking modes of operation, is described in detail. This system was able to track parameterized eye features at up to 15 frames per second (fps) during saccadic eye motion without requiring a fixed head position. The timing of these algorithms on the C40 network is analysed.

ACKNOWLEDGEMENTS

I would like to thank my parents, Ted and Violet Smith, for all of their love and support over the past thirty years of my life.
I would also like to thank Dr. Peter Lawrence for his support as supervisor during my extended degree/work term. I would like to thank Dr. David Lowe for sponsoring me in my use of the CISR LCI Lab. I would also like to thank Rod Barman and Stewart Kingdon for their help in configuring the LCI laboratory in order to perform eye-tracking experiments. I would like to thank Johan Thorntan for his beverage induced help on some of the figures in my thesis. I would also like to thank Mike and Agata for being my guinea pigs during the testing of this system. Last and certainly not least, I would like to thank the Lord for without whose help nothing would be possible. iv T A B L E OF CONTENTS A B S T R A C T II A C K N O W L E G E M E N T S I l l T A B L E O F C O N T E N T S : V LIST O F FIGURES VIII LIST O F T A B L E S XII G L O S S A R Y XIII 1. INTRODUCTION 1 1.1. RELATED WORK 2 1.2. THESIS OVERVIEW A N D SCOPE 9 2. 2-D AND 3-D E Y E AND F A C E M O D E L S 17 2.1. E Y E MODEL ORIENTATION 20 3. E Y E L O C A T I O N A L G O R I T H M S 23 3.1. E Y E REGION LOCALIZATION 23 3.2. INITIAL SPATIAL MOTION SEGMENTATION 26 3.3. SECONDARY SPATIAL MOTION SEGMENTATION 30 3.4. TEMPORAL MOTION SEGMENTATION 32 v 3.5. MOTION REGION LIMITS 33 4. E Y E P A R A M E T E R I Z A T I O N A L G O R I T H M S 42 4.1. FILTER CORRELATION 45 4.2. PUPIL A N D IRIS FILTER 46 4.3. COARSE-SCALE IRIS A N D PUPIL PARAMETERIZATION 51 4.4. PUPILLARY A N D IRIDAL REGION DIFFERENTIATION 54 4.5. FINE-SCALE IRIDAL PARAMETERIZATION 57 4.6. EDGE DETECTION AND REGIONS 60 4.7. SCLERAL TONE PARAMETERIZATION 62 4.8. SKIN A N D HAIR INTENSITY ESTIMATION 65 4.9. UPPER A N D LOWER EYELID SEMI-PARABOLIC PARAMETERIZATION .... 69 4.9.1. Eyelid Edge Locations 70 4.9.2. Initial Eyelid Slope Calculations 74 4.9.3. Semi-parabolic Eyelid Fitting 81 4.10. UPPER EYELID FULL-PARABOLIC PARAMETERIZATION 88 4.11. LOWER EYELID SEMI-PARABOLIC A N D FULL-PARABOLIC PARAMETERIZATION 89 4.12. GOODNESS-OF-FIT CALCULATION 94 5. T R A C K I N G A L G O R I T H M S 96 5.1. IRIDAL REGION TRACKING 98 5.2. EYELID TRACKING 102 vi 5.3. GOODNESS-OF-FIT CALCULATION 104 6. P O I N T - O F - G A Z E D E T E R M I N A T I O N 105 7. S Y S T E M I M P L E M E N T A T I O N AND R E S U L T S 108 7.1. MATLAB IMPLEMENTATION A N D TEST RESULTS 108 7.2. C40 NETWORK IMPLEMENTATION AND EVALUATION 120 7.2.1. System Modes 122 7.2.2. Mode 1: Motion Segmentation and Initial Parameterization 126 7.2.2.1. Frame Grabber Node 126 7.2.2.2. Motion Segmentation Node 128 7.2.2.3. Display Node 131 7.2.3. Mode 2: Eye-Parameterization and Iridal Tracking 132 7.2.3.1. Frame Grabber Node 133 7.2.3.2. Iridal Tracker Nodes 1 And 2 134 7.2.3.3. Parameterization Nodes 1 And 2 135 7.2.3.4. Display Node 139 7.2.4. Mode 3: Iridal and Eyelid Tracking 140 8. 
S U M M A R Y CONCLUSIONS AND R E C O M M E N D A T I O N S F O R F U T U R E W O R K 145 R E F E R E N C E S 151 vii LIST OF FIGURES FIGURE 1: FEATURE COMPLEXITY SURROUNDING EYE 10 FIGURE 2: EDGE INFORMATION SURROUNDING EYE (a=l) 10 FIGURE 3: EDGE INFORMATION SURROUNDING EYE (G=2) 11 FIGURE 4: OVERALL SYSTEM FUNCTIONALITY 14 FIGURE 5: THE 2-D EYE MODEL 18 FIGURE 6: THE 2-D FACE MODEL 19 FIGURE 7: 3-D EYE MODEL SHOWING POINT-OF-GAZE CALCULATION PARAMETERS 20 FIGURE 8: FA CIAL AXIS ORIENT A TION 22 FIGURE 9: FLOWCHART OF EYE LOCATION ALGORITHMS 24 FIGURE 10: UPPER LIMIT OF ALLOWABLE EYE SCALE 27 FIGURE 11: EXAMPLE OF AN INITIAL SPATIAL MOTION SEGMENTATION REGION 29 FIGURE 12: EXAMPLE OF SECONDARY SPATIAL MOTION SEGMENTATION 31 FIGURE 13: EXAMPLES OF EYE-EVENT REGIONS AT POINTS IN THE SCALE SEARCH SPACE 33 FIGURE 14: CALCULATING THE IRIDAL RADIUS USING AVERAGE EYEBALL DIMENSIONS . 35 FIGURE 15: MOTION REGION CAUSED BY EYEBALL ROTATION 37 FIGURE 16: THE MAXIMUM IRIDAL MOTION REGION OF A SINGLE DIFFERENCE IMAGE 39 FIGURE 17: MOTION REGION CAUSED BY EYELID MOTION 40 FIGURE 18: FLOWCHART OF PARAMETERIZATION ALGORITHMS 43 FIGURE 19: MINIMUM AND MAXIMIM ALLOWABLE EYELID ASPECTRATIOS 44 FIGURE 20: IMAGE TO BE PARAMETERIZED 45 viii FIGURE 21: IRIDAL/PUPILLARYDEFORMABLE TEMPLATE FILTER 48 FIGURE 22: IMAGES SHOWING LASH DOWN (LEFT) AND LASH UP (RIGHT) CONDITION.. 49 FIGURE 23: RO SUBREGIONDEFINITION 51 FIGURE 25: INITIAL IRIS/PUPIL SEARCH RESULT , 54 FIGURE 26: CENTER SEARCH REGION FOR IRIDAL/PUPILLARY BORDER DIFFERENTIATION : 5 5 FIGURE 27: CHOICES FOR IRIDAL/SCLERAL AND PUPILLARY/IRIDAL BOUNDARIES 56 FIGURE 28: ERROR IN IRIDAL EDGE PARAMETRIZATION WITH ELPUPIL/IRIS = 1 58 FIGURE 29: VERY FINE-SCALE IRIDAL PARAMETRIZATION 60 FIGURE 30: COARSE-SCALE IRIDAL FILTER CENTER SEARCH REGION 62 FIGURE 31: FINE-SCALE IRIDAL FILTER CENTER SEARCH REGION 62 FIGURE 32: SCLERAL SEARCH REGIONS 64 FIGURE 33: SCLERAL REGION SELECTION 65 FIGURE 34: SKINAND EYEBROW SEARCH REGIONS 66 FIGURE 35: SKIN/HAIR BOUNDARIES OF AN EYEBROW 69 FIGURE 36: EYELID BOUNDARY SEARCH REGIONS 71 FIGURE 37: SUBREGION SCALINGS OF THE EI AND E2 EYELID SEARCH REGIONS 72 FIGURE 38: POSSIBLE UPPER AND LOWER EYELID EDGE CHOICES 74 FIGURE 39: IMAGE OF EYE REGION WITH LASH IN THE DOWN ORIENTATION 75 FIGURE 40: IMAGE OF EYE REGION WITH LASH IN THE UP ORIENTATION 76 FIGURE 41: SUBREGIONS USED FOR EDGE FOLLOWING 78 FIGURE 42: EYELID EDGE DATA 79 ix FIGURE 43: EYELID SLOPES CALCULA TED FROM EDGE DATA 80 FIGURE 44: UPPER AND LOWER EYELID BOUNDARY FILTER 82 FIGURE 45: PARABOLIC PARAMETERIZATION BOUNDS FOR THE UPPER AND LOWER EYELIDS 85 FIGURE 46: UPPER EYE FILTER REGIONS DEFINED BY IRIDAL BOUNDARY 87 FIGURE 47: POSSIBLE EYELID SEMI-PARABOLA PARAMETERIZATIONS 88 FIGURE 48: LOWER EYELID FILTER SUBREGIONS 91 FIGURE 49: PARAMETERIZATION OFIRIDAL/SCLERAL BOUNDARY, UPPER EYELID, AND LOWER EYELID 95 FIGURE 50: FLOWCHART OVERVIEW OF TRACKING ALGORITHM 98 FIGURE 51: COARSE IRIDAL FILTER LOCATION SEARCH SPACE . 99 FIGURE 52: FINE SCALE IRIDAL FILTER SEARCH SPACE 101 FIGURE 53: UPPER AND LOWER EYELID TRACKING TEMPLATES 103 FIGURE 54: DETERMINATION OFPOINT-OF-GAZE USING SCLERAL AND IRIDAL CENTERS 106 FIGURE 55: THREE TARGET ARRANGEMENT........ 
109 FIGURE 56: SUBJECT LOOKING AT TARGET 1 I l l FIGURE 57: SUBJECT LOOKING AT TARGET 2 111 FIGURE 58: SUBJECT LOOKING AT TARGET 3 112 FIGURE 59: NINE TARGET CONFIGURATION 113 FIGURE 60: GAZE AT TARGET 1 116 FIGURE 61: GAZE AT TARGET 2 116 X FIGURE 62: GAZE AT TARGET 3 117 FIGURE 63: GAZE AT TARGET 4 117 FIGURE 64: GAZE AT TARGET 5 118 FIGURE 65: GAZE AT TARGET 6 118 FIGURE 66: GAZE AT TARGET 7 119 FIGURE 67: GAZE AT TARGET 8 119 FIGURE 68: GAZE AT TARGET 9 .• 120 FIGURE 70: MODAL FLOW OF EYE TRACKING SYSTEM 124 FIGURE 71: SAMPLE SCREEN OUTPUT FROM C40 EYE TRACKING SYSTEM 125 FIGURE 72: MODE 1: MOTION SEGMENTATION AND INITIAL PARAMETERIZATION 126 FIGURE 73: MODE 2: EYE-PARAMETERIZATION AND IRIDAL TRACKING 132 FIGURE 74: MODE 3: IRIS AND EYELID TRACKING 140 x i LIST OF TABLES TABLE 1: THREE TARGETPOINT-OF-GAZE CALCULATIONRESULTS 110 TABLE 2: THREE TARGET POINT-OF-GAZE ERROR EVALUATION 110 TABLE 3: NINE TARGET POINT-OF-GAZE CALCULATION RESULTS 114 TABLE 4: NINE TARGET POINT-OF-GAZE ERROR EVALUATION 115 TABLE 5: FRAME GRABBER NODE TIMING 127 TABLE 6: VARIABLES USED IN THE MOTION SEGMENTATION ALGORITHM 130 TABLE 7: TIMING OF THE MOTION SEGMENTATION ALGORITHMS 131 TABLE 8: GOODNESS-OF-FIT CALCULATION THRESHOLD PARAMETERS 137 TABLE 9: PARAMETERIZATION ALGORITHM SUBSECTION TIMINGS 139 xii Alpha trimming Corneal sphere Deformable template Eye-event Eye gaze vector Goodness-of-fit sum Homogeneity measure GLOSSARY - The process of removing the top and bottom x% of the items in a sample group. - The clear dome covering the iridal and pupillary regions, extending from the scleral sphere. - A correlation filter containing regions defined by mathematical formulae and constrained to deform only within set parametric spaces. - A spatio-temporal volume region with a high likelihood of containing a portion of an eye from an imaged subject. - A vector defined as beginning a the center of the scleral sphere and passing through the center of the corneal sphere. - A mathematical sum representing the degree of accuracy to which a specific image feature has been represented by our models. - A parameter calculated for the correlation of a deformable template region over an image; representing the amount of intensity variation xiii Iridal Laplacian of Gaussian correlation Point-of-gaze Pupillary Sclera Scleral Scleral sphere Spatio-temporal Target plane Zero-crossing detection within that region. - An item pertaining to the iris. - Correlation performed using a filter described by a Laplacian of Gaussian operator used to detect intensity gradients within an image. - The visual attention point representing the intersection of the eye-gaze vector and the target plane. - An item pertaining to the pupil. - The white region of an eyeball surrounding the outside iridal boundary. - An item pertaining to the sclera. - The sphere formed by the scleral region; the eyeball sphere. - An item existing in both the spatial and temporal parametric spaces. - A plane formed by the targets which are being viewed by the imaged subject. - The determination of the point along a vector within an image at which point the Laplacian of Gaussian of the filtered image has a gradient crossing zero intensity. xiv 1. INTRODUCTION The human eye is generally accepted as an information receiving device. Throughout our lives, our eyes are used to gather vast amounts of data about the world around us and our path through it. A portion of the information gathered daily is from the faces of the people with whom we interact. 
As we are speaking with people, we analyse their faces gaining insight into their current state-of-being through their facial expressions. When we approach someone, we immediately locate their eyes and use these facial regions as focal points during our conversation with them. We can easily tell by looking at someone where it is they are currently looking. For instance, if we were facing someone approximately five feet away, we would be able to tell if they were looking at our left hand, elbow, or shoulder. Therefore, we are able to use their eyes as output devices, indicating where they are looking. This is the main goal addressed in this thesis, to determine where someone is looking (point-of-gaze) based on the visual information gathered from their face. Systems designed to determine a person's point-of-gaze can and have found use in a number of applications. The eyes provide a highly dexterous input to a machine through the monitoring of their point-of-gaze. Activities from the movement of the cursor on a computer monitor to the activation of certain functions within a control room can be achieved through point-of-gaze monitoring. 1 1.1. Related Work There are four basic techniques used by today's eye-tracking systems: the electro-oculography (EOG) method, the photoelectric method, the magnetic field method, and methods which use the imaging of the eye through a camera. One system designed around the EOG method was developed by Martin and Harris [1]. Using electrodes mounted in a ski mask, the electric potential across the eye is measured, and from this the point of gaze of the eye is detected. The foam of the ski mask helps to eliminate electrode thermal and pressure problems by absorbing excess perspiration and applying even pressure. The system is insensitive to common mode voltage fluctuations and uses low pass filtering to smooth muscular signals. The system is calibrated to a set of eye-gaze positions, remaining accurate by fixing the position of the head using a chinrest. The system described is used to select menu items on a video display. The photoelectric method is used in an eye-tracking system designed by Johnson, Drouin, and Drake [2]. An instrument was developed which enables horizontal eye position to be determined to within less than a degree of arc. The optical configuration places only imaging lenses in close proximity to the subject's eye, allowing for non-distracting observations of eye movements during normal activity. Reflected light gathered by these lenses is transmitted to remote detectors via lightweight optical fiber. There are two remote detectors per eye, detecting the left and right iridal/scleral boundaries. The amount of light sensed by each detector is proportional to the amount of iris in its viewing area. The left and right signals from 2 the two detectors are subtracted and the horizontal position of the eye is calculated. One system which uses the imaging of the eye through a camera was developed by Ebisawa et. al. [3]. The eye, illuminated by low level infrared light, is scanned using a CCD camera with an infrared light pass filter. With the infrared light source coaxial with the camera, the infrared light enters the pupil of the eye and is reflected off the retina, coming out through the pupil. The pupil image appears as a half-lighted disc called the bright eye against a darker background. Its position moves in the camera field following the motion of the eye. 
In addition to this, a fraction of the infrared light from the source is reflected off the corneal surface. This appears in the camera field as a small intense area, called the glint. The position of the glint relative to the bright eye is used to determine the eye's point of gaze. The camera used has auto-focus and has its zoom computer-controlled, using an ultra-sonic range finder to keep the size of the eye constant in spite of changing head range. The system keeps the eye being tracked in the field of view of the camera by servoing the camera around the vertical axis in response to movements of the bright eye image. Rotation about the horizontal axis is not performed. This system provides an output independent of head direction. This system has the drawback of having to be manually given the approximate position of the bright eye region. No algorithm for the automatic location of eye regions is currently implemented. Also, the system must be run through a calibration routine for each user, as reflection results vary with differing scleral and corneal surface curvatures. 3 Another system to use the imaging method was developed by Hutchinson et. al. [4] and Frey et. Al. [5]. This system, named ERICA for Eye-gaze Response Interface Computer Aid, uses the same glint/bright eye method used by the Shizoka system. An infra-red light source, coaxial with a CCD camera is used to illuminate the eye, creating the glint and bright eye regions. The vector distance between the glint and bright eye centers is mapped to a computer screen location and used to control a simple word processing system. This system requires the users' eyes to remain within a very small working volume in order to function correctly. An improvement on this system was made by K. P. White Jr., T. E. Hutchinson, and J. M. Carley [6], allowing for increased head motion. Another system utilizing a similar method to track eye motions was developed by T. N. Cornweet and H. D. Crane [7]. These systems suffer from the same limitations previously mentioned for the Ebisawa et. al. system. An eye system using the imaging of the eye through a camera and analysis of the actual eye features, was developed Yuile, Cohen, and Hallinan [8]. Although this system is not utilized to detect a person's point-of-gaze, it could be adapted for this purpose in a manner similar to that described in this thesis. This system uses deformable templates to parameterize the eye region . These templates are specified by a set of parameters which enable a priori knowledge about the expected shape of the eye features to guide the fitting process. A goodness-of-fit function is used to access the parameters of the template during the fitting process. The deformable template is fit to each image in a sequence and thus produces a sequence of eye 4 parameters describing the eyes in that sequence. This system requires that the deformable template be manually placed within a small region surrounding the eye at the start of the fitting process. It also requires that the weights used in the goodness-of-fit function be manually adjusted to help the template accurately fit the image data. The use of deformable templates in this way is also described in an earlier paper by Yuille, Cohen, and Hallinan [9]. A system developed Shackleton and Welsh [10], uses a similar approach to that used by Yuille, Cohen, and Hallinan in order to parameterize eyes in an image. 
They use the same deformable template, but change some of the energy functions which are used to guide the model to an accurate fit. The major improvement this system has over the Harvard system is the introduction of contrast enhancement in the area of the lower eyelid. This helps the model deform properly to an area which in many cases has a lack of grayscale contrast. However, it also suffers from the same limitations as the Yuile, Cohen,and Hallinan system. That is, it requires that its template be started in close proximity to the eye region and that its parameters be manually adjusted for proper template fit. Another technique for complex feature parametrization, similar to the deformable template methods used by some of the previous systems, uses a mathematical model known as a snake. A snake is an active contour model utilizing an energy minimization spline guided by external constraint forces and affected by image forces such as edges. Snakes are useful in describing objects with complex contours such as a human mouth. Two descriptions of the use of snakes in an image 5 processing system are given in papers by J. Waite and W. Welsh [11] and M . Kass, A. Witkin, and D. Terzopolous [12]. The use of deformable templates in a generic image sequence is described by K. V. Mardia, T. J. Hainsworth, J. F. Haddon [13]. Another example of the use of deformable templates for the tracking of objects in an image sequence is described by J. Rehg and A. Witkin in the IEEE International Conference on Robotics and Automation [14]. More current research into eye-gaze determination and tracking systems has yielded some improved results. A system designed by Collet, Finkel, and Gherbi [15] does preliminary segmentation of the image data in order to reduce the parametric search space of subsequent algorithms. The system, named CapRe, looks for pixels which it considers to be skin colored and uses them to locate a face in the image. After location of a face, the system uses matched filtering techniques to locate the eyes, nostrils, and corners of the mouth. These points are then tracked and used to determine the subject's head orientation. The system is able to work at 12fps and has a correct feature location accuracy of 80%. The point-of-gaze detection algorithms are currently being developed and will require a calibration routine to be performed prior to each system use. A system similar to the one designed by Collet, Finkel, and Gherbi was designed by Stiefelhagen, Yang, and Waibel [16]. The system locates and tracks the pupils, nostrils, and corners of the mouth in order to determine head orientation and point-of-gaze. It uses iterative thresholding and edge detection to determine these 6 points and is thus very susceptible to false matching. There is currently very coarse checking in place to verify the likelihood that a good fit has been achieved. The system allows for 15cm of head motion along the horizontal and vertical axis. The system is able to detect gaze with an accuracy of 12.0° of arc and track at 15fps. A calibration routine is required by this system for each user prior to use. A system designed by Ohtani and Ebisawa [17] and Sugioka, Ebisawa and Ohtani [18] uses an improved bright-eye and glint detection method to determine point-of-gaze. The system uses two infra-red light sources to illuminate the eye from different angles on odd and even scan lines. 
The system then uses differencing between the illuminations to detect the bright eye, while the glint is determined using a matched filter correlation. The system is capable of locating eye-gaze to within 1.0° of arc. However, a calibration routine is required to be performed prior to each use. Additionally although head motion is allowed, the camera's zoom and orientation is adjusted so that the pupil is centered in the image and its radius takes up 1.6 times the image radius. This camera motion and refocussing can take up to 5 seconds at each new head position and so the system is unusable during this period. An eye-gaze detection system using a color camera and a method based on a neural network was developed by Baluja and Pomerleau [19]. The system uses neural networks for the detection and tracking of the eyes and also for the determination of their point-of-gaze. A stationary light is placed in front of the user, and the system starts by finding the right eye of the user by searching the video image for the reflection of this light, the glint, distinguished by being a small, very bright 7 point surrounded by a darker region. It then extracts a smaller, rectangular section of the image centered at the glint and feeds this to the neural network. The output of the network is the coordinates of the point-of-gaze. The system works at 15fps with a gaze detection accuracy of 1.5°. However, the user's head motion is constrained to a 10cm displacement along the horizontal and vertical axis. Additionally the system requires that the neural nets are pre-trained for each user. A commercially available eye-gaze tracker was developed by L C Technologies Inc [20]. This system uses an infra-red imaging camera located under the monitor of the user's computer to detect the user's pupil center and corneal reflection from the infra-red source. The system runs on a standard PC using expansion cards. The system requires that the user be located 18-22 cm from the monitor and is able to determine eye-gaze with an accuracy of 3.5° of arc. Prior to each use, the user must run a 15-20 second calibration routine to adjust the system to characteristics of the current user's eye, lighting conditions, position within the cameras field of view, etc. The system is able to run at 30fps. A second commercially available system was developed by EyeTech Digital Systems [21]. This system uses a similar setup to the L C Technologies system. The system is able to determine eye-gaze with an accuracy of 1.5° of arc and the users head is allowed to move within a 4cm cube. This system also requires that a calibration routine is run prior to each use. A common feature among all of the eye tracking systems previously described is the need for system calibration prior to use. These systems require that the user 8 gaze at fixed targets to enable the system to perform the translation from 1-D or 2-D parameters to the 3-D model required to determine point-of-gaze. Additionally, some systems as described, required the manual entry of information about the eye features of the subject currently being tracked. Another restriction common to all of these systems, to varying degrees, is the restriction placed on head motion while the eye is being tracked. 
The main goal of our system as described in the following section is to design and implement a system which removes these procedures and restrictions from the user, instead providing a system which is self-calibrating and will allow a large degree of head motion requiring only that both eyes are within the system's imaging area. 1.2. Thesis Overview And Scope The eye-tracking system described in this thesis is based on the imaging method. The imaged facial region is a complex set of intensity minima and maxima. Even very close to the eye region, there is much information that can cause distraction or intractable complexity to a feature search or parameterization algorithm. As can be seen in Figures 2 and 3, even in what appears to be a simple eye image, Figure 1, the edge information is incredibly dense and does not provide a good representation on the level of the features to be parameterized. In order to locate and parameterize the eye regions, global methods need to be applied in order to avoid false matching. Motion characteristics, in addition to spatial characteristics need to 9 be considered when locating and parameterizing eye regions. Figure 1: Feature Complexity Surrounding Eye Figure 2: Edge Information Surrounding Eye (o=l) 10 10 20 30 40 50 60 70 80 90 100 Figure 3: Edge Information Surrounding Eye (G=2) In order to determine point-of gaze for a subject imaged through a camera, one must first locate the eyes in an image sequence. Rectangular image regions containing each eye are found by looking for eye-like motion. The iris and pupil are then located in each region. A goodness-of-fit calculation, evaluation of an energy function, is used to choose the best eye; a set of 2-D deformable templates are applied to the image to locate the iridal/scleral1 boundary, the sclera, skin and hair average intensities, and the upper and lower eyelid boundaries. From the 2-D information, a 3-D model of the eyeball is built and its point-of-gaze is determined. The point-of-gaze is then tracked by repeatedly deforming the templates to fit the changing image data and updating the 3-D model accordingly. The second eye could be used as a 1 The boundary region between the iris and the sclera. 11 check on gaze detection to add robustness to the system, however this is not done in the system described by this thesis. As described in Chapter 1, Subsection 1, previous systems suffer from a poor ability to locate the eyes without being directed to the local eye region or having system parameters set before each user. Additionally, previous systems all required the running of a calibration routine prior to use. In the system described in this thesis, the problem of eye location is solved by using the detection of characteristic eye motion as a tool to quickly locate the eyes in an image sequence. These eye motions, or Eye-events as they are referred to in this thesis, provide the system with small localized regions with which to continue searching for eye features. This use of motion has not been described in previous works in the field and is a significant contribution of this thesis. The system described in this thesis also requires no calibration be preformed prior to use. It is self-calibrating based on the person currently being imaged. Deformable templates are the basis for the eye parameterizations performed by this system. 
These templates are specified by a set of parameters using a priori knowledge about the expected shape of the features being matched, guiding the template contour deformation process. The templates described in this thesis are flexible enough to change their sizes and other parameters, so as to match themselves with the data. The final values of these parameters form a parameterization vector which is used to describe the eye features. Energy functions are defined containing terms attracting the templates to salient image features such as peaks and valleys in 12 the image intensity, detected edges, and absolute intensity levels within the image. The maxima of these energy functions correspond to the best fit of the template models to the image features. Each of the templates described in this thesis are fit to the eye in various stages, as parameters of our 2-D eye model are adjusted by their interaction with the image data. Region information is used by our deformable templates to combine local edge details into a perception on the level of the features being parameterized. Our system provides a scale and location independence, allowing the user a freedom of motion not found on other systems as described in Chapter 1, Subsection 1. However, although the system was designed with the goal of relatively unrestricted head orientation in mind, the current system as implemented, requires the head to be a fixed distance from the camera in order to perform the point-of-gaze calculation. The current implementation also requires the facial normal to be orthogonal to the imaging plane. The system enhancements required to eliminate these restrictions are discussed. The algorithms developed in this thesis do not require adjustments for different users. They calibrate themselves during the location and parameterization of eyes present in the image data, and then determine the subject's point-of-gaze without the need for calibration using a known gaze location. This contrasts with the systems previously mentioned, all of which require adjustments as stated. This thesis describes the components of a system that locates a person's eyes in an image sequence, chooses and then parameterizes one of the eyes, and finally 13 determines its point of gaze. The eye and face models used in these tasks are defined and the algorithms explained. The accuracy of the system is evaluated using comparison with known gaze points. I Acquire image Eye region localization using motion segmentation Eye region feature parameterization Figure 4: Overall System Functionality The overall system functionality is shown in Figure 4. The first phase of the 14 system is the Location phase. The task of this phase is to locate Eye-events in an image sequence; this is described in Chapter 3. If after possible Eye-events have been evaluated a likely candidate is chosen, the system will then begin the Parameterization phase; this is described in Chapter 4. If the Location phase does not find any Eye-events, it will continue searching in the next frames of the image sequence. The Parameterization phase uses the deformable templates described in Chapter 2 and Chapter 4 to model the located eye. After the parameterization is completed, the fit is evaluated, and the system determines whether or not the models have fit a trackable eye. If a good fit has been achieved, the system proceeds to the Tracking phase; this is described in Chapter 5. If a good fit has not been achieved the system returns to the Location phase and restarts the process. 
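The modal flow just described (and shown in Figure 4) can be summarized as a small state machine. The following C sketch is illustrative only: the function names and return conventions are placeholders assumed for this example, not the thesis implementation, which is detailed in Chapter 7.

```c
/* Minimal sketch of the modal flow of Figure 4 (Location -> Parameterization
 * -> Tracking).  Function names and return conventions are illustrative
 * placeholders, not the thesis code. */
#include <stdio.h>
#include <stdbool.h>

typedef enum { MODE_LOCATION, MODE_PARAMETERIZATION, MODE_TRACKING } Mode;

/* Placeholder stubs; a real system would run the algorithms of Chapters 3-5. */
static bool locate_eye_events(void)        { return true;  } /* Eye-event found?   */
static bool parameterize_located_eye(void) { return true;  } /* good fit achieved? */
static bool track_parameterized_eye(void)  { return false; } /* parameter lock kept? */

int main(void)
{
    Mode mode = MODE_LOCATION;
    for (int frame = 0; frame < 10; ++frame) {           /* one pass per frame */
        switch (mode) {
        case MODE_LOCATION:
            if (locate_eye_events())
                mode = MODE_PARAMETERIZATION;             /* Eye-event chosen    */
            break;                                        /* else keep searching */
        case MODE_PARAMETERIZATION:
            mode = parameterize_located_eye()
                 ? MODE_TRACKING                          /* good fit achieved   */
                 : MODE_LOCATION;                         /* restart Location    */
            break;
        case MODE_TRACKING:
            if (!track_parameterized_eye())
                mode = MODE_LOCATION;                     /* lock lost, restart  */
            break;
        }
        printf("frame %d -> mode %d\n", frame, (int)mode);
    }
    return 0;
}
```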
The Tracking phase re-parameterizes the eye previous parameterized in successive frames. This tracking parameterization takes place over a much smaller search space bounded by possible eye and head motions between successive frames. While a good parameter fit is maintained, the system will remain in the Tracking phase; if the parameter lock on the eye is lost, the system will return to the Location phase and the process is restarted. The algorithms described in this thesis were implemented using Matlab on a SPARC 5 workstation and on a six processor Texas Instruments TMS320C40 Parallel Processing Network. The C40 system consisted of 3 dual TMS32C40 V M E bus cards. One of the dual C40 cards had a frame-grabber mapped through dual port R A M into the main memory space of one of its C40s. This allowed 512 by 512, 8-bit grayscale images to be gathered from a CCD camera at 15fps. Another one of the 15 dual C40 cards contained a C40 sharing dual port R A M with a video display DAC, allowing the memory to be used to display images on a monitor. A third dual C40 card had a high-speed serial link connected to one of its C40s. This serial link connected the C40 to a V M E SPARC 2 card which in turn was connected to the U.B.C. network, thus enabling communication to the C40 network from any workstation. The network code described in this thesis was uploaded to the C40 network through this serial link to a single C40, and then distributed through parallel links between the various C40 nodes. The C40s in the network were connected using 32-bit, 15Mbyte/s parallel links. The implementation on the C40 network was however not fully debugged and currently suffers from communication problems between nodes. The individual image processing algorithms run equally well on both the SPARC and C40 systems. 16 2. 2-D AND 3-D E Y E AND F A C E M O D E L S The system uses one face and two eye models in the parameterization and determination of point-of-gaze for an eye, as imaged through a video camera. These models are deformable templates which will be used to convert the complex shape of an eye within the image sequence into a small number of parameters which can be used to determine the point-of-gaze of the person being imaged. Two-dimensional eye and face models are used in the eye's location and parameterization. A 3-D eye model is used in the determination of point-of-gaze from the parameterized eye region. These deformable templates provide a priori knowledge of eye regions which allows low level image features such as edges and pixel intensities to be coordinated into the parameterization of a high level object: the eye. The 2-D eye model consists of two concentric elliptical regions bounded by two parabolic regions as shown in Figure 5. The inner elliptical region's shape, the pupil, is parameterized by its minor axis, collinear with the e/ axis, and by its elongation factor along the e2 axis. This elongation factor determines the ellipse's deviation from being a perfect circle. The border of the pupil's elliptical region is defined by its average pixel intensity gradient; the region is parameterized by its average pixel intensity and homogeneity. The outer elliptical region, the iris, is parameterized in exactly the same way. These model parameters and those for the following parameters are defined fully in Chapter 4. 
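For concreteness, the 2-D model parameters introduced above can be pictured as a single parameterization vector. The C declarations below are a hedged sketch of such a grouping; the field names, types, and layout are assumptions made for illustration and do not reproduce the thesis' actual data structures.

```c
/* Illustrative grouping of the 2-D eye-model parameters of Figure 5 into one
 * parameterization vector.  All field names are assumptions of this sketch. */
typedef struct {
    double center_e1, center_e2;   /* ellipse center in facial (e1, e2) coordinates */
    double radius;                 /* minor-axis radius along e1                    */
    double elongation;             /* elongation factor along e2 (1.0 = circle)     */
    double mean_intensity;         /* average pixel intensity inside the region     */
    double homogeneity;            /* intensity-variation measure of the region     */
} EllipseRegion;

typedef struct {
    double a, b, c;                /* arc coefficients, e.g. e2 = a*e1*e1 + b*e1 + c */
} ParabolicArc;

typedef struct {
    EllipseRegion pupil;           /* inner elliptical region                       */
    EllipseRegion iris;            /* outer, concentric elliptical region           */
    ParabolicArc  upper_lid;       /* upper eyelid boundary arc                     */
    ParabolicArc  lower_lid[2];    /* two lower-lid arcs joined end to end          */
    double lower_lid_join_e1;      /* dynamically chosen division between them      */
    double scleral_intensity;      /* average scleral tone                          */
    double skin_intensity;         /* average skin intensity                        */
    double hair_intensity;         /* average hair (eyebrow) intensity              */
} EyeParameterization;
```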
17 Pupil Ellipse (Center, Radius, and Elongation Factor) Upper Eyelid ParaPolic Arc Iridal Ellipse (Center, Radius, ~ and Elongation Factor) Lower Eyelid ParaPolic Arc 2 Lower Eyelid _ Parabolic Arc 1 Division Between Lower Eyelid Parabolas • e Figure 5: The 2-D Eye Model The upper parabolic arc in Figure 5 defines the upper eyelid boundary. It can represent either the border between the eyelid and the eyelash, or the border between the eyelid and the sclera and iris, depending on whether the eyelash is laying upwards or downwards in the current image. This border is defined by a parabolic arc intensity gradient which is maximum along the eyelid/sclera boundary and by coefficients defining and placing it in relation to the iridal center (ellipse representing the iris). The lower lid is defined by two parabolic arcs joined end to end, the boundary between them is defined dynamically. The lower lid parabolic arcs are defined by their parabolic intensity gradients and by their parabola coefficients. Average skin, hair, and sclera intensities are also defined during the parameterization of the eye features. The 2-D face model shown in Figure 6 is used 18 to define probable regions for finding skin and eyebrows. The face model also provides the coarse relative location of the two eyes for the motion segmentation algorithm. Skin Intensity ^ _ i „ e Figure 6: The 2-D Face Model The 2-D eye model was designed with the intent of designing a transform from its parameters to those of a 3-D model. In order to determine the point-of-gaze for a single eyeball, six parameters need to be known: the three dimensional coordinates of the center of the eyeball sphere relative to the imaging camera, and the three dimensional position of the center of the cornea relative to the center of the eyeball sphere. Using these parameters, a ray originating at the center of the eyeball sphere and passing through the center of the cornea can be constructed. This ray determines the current point-pf-gaze for the parameterized eyeball as shown in Figure 19 7. The 2-D eyelid parameters are used to estimate the center of the eyeball sphere and its radius. The 2-D iridal center and the estimated radius of the eyeball sphere are used to calculate the center of the corneal dome and thus the ray which determines the current point-of-gaze. Figure 7: 3-D Eye Model Showing Point-Of-Gaze Calculation Parameters 2.1. Eye Model Orientation The algorithms in the thesis are designed to work with images sampled from a CCD camera. These images have two axes designated naturally by the image sampling grid's orthogonal rows and columns. However, these axes are in most cases 20 not optimal for our eye parameterization and tracking algorithms. These algorithms are designed to work best with axes oriented along the vertical and horizontal axis of a person's face as shown in Figure 8. The angle between these two sets of axes needs to be determined in order to run our algorithms on any eye regions present in the image data. This angle is determined using information gained using the motion segmentation information. The two spatio-temporal Eye-event boxes provide us with the horizontal facial axis running parallel to a line drawn between the centers of the two coarsely parameterized irises. Using this axis and the high probability that the vertical facial axis is not more than 90° from the axis designated by the columns of the frame grabber sampling grid2, we can determine the vertical facial axis. 
It will be orthogonal to the horizontal axis with its positive direction closest to that of the column axis. The horizontal axis will be termed the e/ axis and the vertical axis the e2 axis. The equations for these axes can be represented by Equation Group 1. el = (cos 0, sin Q) (1) e2 = (-sin 0, cos Q) where 0 is the angle between the horizontal face and the image row axis. 2 This will be true unless the person is standing on their head or the camera is upside down. 21 22 3. E Y E LOCATION ALGORITHMS The eye location algorithms can be divided up into four main sections. These sections are initial spatial motion segmentation, secondary spatial motion segmentation, temporal motion segmentation, and coarse-scale initial iridal/pupillary parameterization. The relationship between these sections is shown in Figure 9. The eye location algorithms utilize both eye region motion characteristics and spatial characteristics to locate the eyes in an image sequence. Motion is utilized to reduce the complexity of the search. The Location algorithms described in this thesis have a successful location rate of greater than 95% 3.1. Eye Region Localization The 2-D eye model described in this thesis is a deformable template. Its parameters can be varied to give a close approximation to the pupil, iris, and eyelid region boundaries for most eye images. One method of fitting this model to a particular image would be to correlate the image with the model in the space of the model. In other words, vary the parameters of the model while monitoring the templates' fit to the actual image. The best fit in terms of an energy function would be considered the correct parameterization of the image features. This method, however, would take an unworkable amount of time to complete. A method of reducing the parameter space of the template is therefore required. 23 Acquire images Initial spatial motion segmentation I Secondary spatial motion segmentation Coarse-scale initial iridal/pupillary parameterization Goto Parameterization algorithms Reset eye-event state machine Figure 9: Flowchart Of Eye Location Algorithms One method of reducing the search space is to isolate local regions around possible eyes in the image. Using this method, not only does the eye template now 24 have to be correlated over a much smaller region than the entire image, but maximum values of the templates' scaling parameters can be found from the size of the eye regions. Motion is used as the initial indication of possible eye locations. By searching for motion conforming to certain criteria, high probability regions of eye location are found. The main goal of the motion segmentation algorithm is to reduce motion which happens over a time sequence of image frames into two localised spatio-temporal motion boxes which satisfy criteria indicating they are likely eye regions. Further algorithms are then run on these regions to better determine whether or not they contain eyes. The motion segmentation takes place in three distinct steps: the initial spatial motion grouping, the secondary spatial motion grouping, and the temporal motion grouping. The motion searched for during these steps is termed an Eye-event in this thesis. The initial spatial grouping is a fast algorithm which quickly reduces the Eye-event parameter search space. The secondary spatial grouping is used to define more accurately possible Eye-events. Both of these searches parameterize spatial Eye-events in a single image frame. 
They provide no indication however of eye motion over periods of time larger than 1/15 of a second, the time between successive 3 The regions found exist as three dimensional objects. The first two dimensions being the plane of the image, the spatial domain, and the third being the time axis, temporal domain. Thus the term spatio-temporal applies to the motion boxes. 25 frames. Temporal grouping is used to characterize Eye-events occurring over larger time periods. 3.2. Initial Spatial Motion Segmentation An Eye-event is usually a blink or a change in point-of-gaze greater than approximately 15° arc. An Eye-event begins and ends with at least Tneg seconds of non-motion (motion below a set motion threshold.) This threshold of motion is determined by the combination of the type of Eye-events expected and the frame rate of the system. The derivation of this threshold is discussed later in the implementation section of this thesis. The criteria for an Eye-event are as follows: • Between the beginning and ending frames of an Eye-event there must be two main localized motion regions, see Figure 11. • No time-adjacent pair of image frames may contain more motion than that determined to be the maximum amount which could be caused by two eyes barely fitting into the image frame as shown in Figure 10, L max_motion • • An Eye-event must last more than Tmin_event. • The two main motion regions must be separated by at least 2/3 the maximum of the two region widths. • The aspect ratio of each region must be between 1:3 and 3:1. • The motion regions must be smaller than Thei_max along the e\ axis and The2_max along the e2 axis, see Chapter 3, Subsection 5 26 • The motion must occur in two similarly scaled localized regions. • An Eye-event is begun by motion above a specified level, Lsig, only after a specified length of time, Tneg, of negligible motion. Figure 10: Upper Limit Of Allowable Eye Scale The algorithm is designed to work on the motion caused by two eyes independent of scale. These conditions were derived experimentally by analysis of motion caused by both blinking and by typical eye motion. If all these conditions are met, and the coarse iris matching which is performed later on the motion regions returns a good match, then these motion regions become classified as Eye-events and are utilized in subsequent algorithms. Otherwise the search for Eye-events continues. The algorithm consists first of differencing images taken a specified time 27 interval apart. By subtracting each image from its predecessor, we end up with a sequence of differenced images with which to perform the remainder of the motion segmentation algorithm. Each difference image is thresholded to remove pixels below a certain value, Lmot. Pixels below this threshold are replaced with zero valued pixels. This threshold value is determined using the time interval between successive frames and is used to eliminate low frequency motion and noise from the differenced image. In order to detect Eye-events, thresholded difference images are scanned for non-zero pixels, and a fill-search algorithm is performed around each one to determine the size of this region of motion as shown in Figure 11. This is the initial motion segmentation. As moving pixels are found and grouped into regions, their values are set to zero to prevent their inclusion in another region. 
The fill algorithm is looking for 4-connected regions bounded by a border of zero-valued pixels of at least thickness Thzero-border- The border thickness, Thzero-border, is a dynamic limit based on the size of the region being bounded. It is calculated as one quarter of the maximum length of the current region in the dimension of the current direction of expansion, but has a minimum value of Thmin (See Chapter 7). For example, if the algorithm was finding the border of a region that so far had a maximum horizontal width of 12 pixels, the algorithm would now be looking for a zero-border with a contiguous horizontal width of 3 pixels. 28 Motion Region 0 0 o o 0 0 o o o o o o o o o o o o o o o o o o o 0 o o o o o o • • • O o o o o o o o o o o o o o o o o o o o o o o o 0 o o Image Grid i i i i i i r Motion pixel O Border pixel Non-motion pixel Figure 11: Example Of An Initial Spatial Motion Segmentation Region The region boundary thickness parameter, Thzero_horder, is made dynamic to provide some scale independence to the motion segmentation at this level. Using this algorithm, a larger region is permitted to have internal regions of non-motion which are based on the region's size. The regions are recorded in a table using the column values of the region's leftmost and rightmost points and the row values of the top and bottom rows of the region. These values create an isothetic rectangle representing the region as shown in Figure 11. 29 3.3. Secondary Spatial Motion Segmentation Once the table is completed, the secondary motion segmentation begins to operate on the regions specified in the initial spatial segmentation table. Regions are combined according to a closeness criteria based on their rectangle representation. The closeness criterion states that two rectangles are considered close if the distance between their border along each image axis is smaller than the average of the two rectangle's widths along that same axis. This closeness criteria is shown in Equation Group 2. This equation was determined to be an effective measure of closeness through the analysis of the eye motion of more than 15 test subjects. If max(xl) < min(x2), where x, andx2 are components of regions Xx andX2 along the ex axis. Then Xx andX2 are considered close if (min(x2 )-max(x,)) < X \ R J r X 2 R X\L X2L 7/"max(x,) < min(x2), where X, andx2 are components of regionsXx andX2 along the e2 axis. Then X\IJ~^~X1JI xin x~. Xx andX2 are considered close if (min(x2 )-max(x,)) < In Equation 2, the subscripts L and R refer to left and right along the horizontal image axis e\ and the subscripts U and D refer to up and down along the vertical image axis, e2. If two motion rectangles fall within their calculated closeness boundary then they are combined into a single region as shown in Figure 12. The motion rectangle combination uses the following formula. If rectangle 1, Xj had sides specified by {\n, xm, XJU, xm } and rectangle 2, X2, had sides specified by { x2i, X2R, x2u, X2D } then the combined rectangle, Xnew, would have side specified by { XNL, XNR, 30 XNU, *ND } and given by Equation Group 3. xNL =min(xu,x2L), XNR = m a X ( X l / ? ^ 2 « ) » (3) XNU = min ( x 1 [ 7 ,x2U), XND ~n^ax(xID,x2D) The initial and secondary spatial motion segmentation is performed on each difference image in the Eye-event. An example of this is shown in Figure 12. Following this segmentation, temporal grouping is used to group motion regions occurring at different time axis locations during the Eye-event. ; X, x. 
Secondary Segmentation Region, XN, (Regions X,, X;, and X, Meet The Closeness Criteria) Initial Segmentation Region, X, C D Initial Segmentation Region, X, Initial Segmentation Region, X, Isolated Motion Regions (Not Close To Any Other Motion Regions) 1 „ x. X. X. X, Secondary Segmentation^ Region, XB, (Regions X,, X,, Xs, and X, Meet The Closeness Criteria) Figure 12: Example Of Secondary Spatial Motion Segmentation 31 3.4. Temporal Motion Segmentation Temporal motion segmentation uses a closeness criteria and grouping equations similar to that used for spatial motion segmentation to create spatio-temporal boxes representing localized regions of motion. The image plane width and height of the box are derived with the spatial grouping, while the time axis depth of the box is the number of frames over which the event takes place. The volume of the box is calculated by summing the motion region areas used in the temporal grouping of each box. Thus the volume of each box is a measure of the combined areas used to create that box. The total volume of each box is then used to help determine which motion regions are most likely to be the eyes. Once an Eye-event has ended, the two most likely, largest, possible Eye-event regions are selected for further analysis. The regions must conform to the previously discussed restrictions placed on their relative positioning and aspect ratios. If they do, a coarse iridal search as described later is performed on each region. This search will determine if there is a likely iridal region present in each event box. Upon the successful completion of the iridal search, these regions are considered Eye-events and they are used for further algorithmic analysis. If the regions do not conform to the restrictions placed on them, or if the iridal searches indicate that an iris is not likely within one or both of these regions, then the segmentation algorithm resets and begins searching for new Eye-event possibilities. Figure 13 shows Eye-event regions, at different points within the scale search space, determined by the Motion Segmentation algorithm to have a high likelihood of containing iridal regions. 32 20 40 60 80 100 120 Figure 13: Examples Of Eye-Event Regions At Points In The Scale Search Space 3.5. Motion Region Limits In order to define motion region area bounds, and maximum and minimum median intensities for regions within these bounds, for motion regions caused by eye motions, it is necessary to parameterize possible eye motions. This parameterization will also be used to help determine maximum search regions used during the eye tracking algorithms described in Chapter 5. 33 In order to accomplish this parameterization, we need to look at both eyelid and eyeball motion. The spatio-temporal motion regions expected from each will be parameterized and used to bound the motion segmentation algorithm. In order to properly analyse the eyeball motion results, we will first take a brief look at the average relative size of parts of the eyeball. The eye of the adult human has been found [24] to have an average scleral sphere radius, rsciera, of 12mm, a corneal radius, rcorneal, of 8mm, and a distance between these spherical centers, dseparati0„, of 5mm, as shown in Figure 14. From these numbers one can calculate the size of the two dimensional iridal radius as viewed along a ray passing through both the corneal and scleral spherical centers. 
As shown in Figure 14, the scleral radius, the corneal radius, and the separation between the two spherical centers form a triangle from which the cosine triangle identity4 can be used to find the angle 0. 180-0 then becomes an angle in a right-triangle in which the hypotenuse is rcorneai and the opposite side is the iridal radius, riridai- This radius is calculated to be 5.8mm. We now therefore have a relative measure of the scleral radius based on the iridal radius. Furthermore, as we have found through the analysis of more than 15 test subjects, the eyelid corner-to-corner width across the eye region along the e/, axis to be an average of between 4.5 to 5.5 times the iridal radius, the corner to corner distance also becomes a fairly accurate measure to calculate the scleral radius. The size of the 4 B 2 = A 2 + C 2 -2ACcosG, where the sides A and C meet at an angle of 0, and B is the side opposite 0. 34 iridal radius relative to the scleral sphere radius will now be used in the determination of the maximum motion region which can be caused by eye motion in a single adjacent frame difference image. Apparent iridal radius ~* Figure 14: Calculating The Iridal Radius Using Average Eyeball Dimensions There are three main classifications of eyeball movement: saccadic, micro-saccadic, and smooth pursuit eye motions. We are concerned only with saccadic motions in the context of this thesis. Micro-saccadic motions are an order of magnitude smaller than saccadic motions and produce motion results on the order of Extrapolated coronal sphere Scleral sphere 35 the system noise floor at most eye scales. Smooth pursuit motions, caused by the eye tracking a moving object (or a stable object tracked from a moving reference), can reach velocities up to 90°/sec [25], but at these velocities, saccadic motion is also occurring to keep the eye accurately fixed on the target. Since saccadic motion can reach peak velocities of up to 500°/sec, its effect overshadows that of smooth pursuit motion and provides the upper bound that we require. Saccadic motion is characterized by four parameters: peak velocity, mean velocity, duration, and angle of rotation. Larger saccadic motions have increasingly larger peak velocities up to about 30° saccades, at which point peak velocity saturates at around 500°/sec [25]. It will take an average of 200ms for a large saccadic motion of 60° to complete. It will have a mean velocity of 3007sec and a peak velocity of 500°/sec. The velocity contour of this motion indicates that less than one third of the event time occurs at a velocity above 300°/sec. Taking one third as the maximum time the eye spends above mean velocity, and using the peak velocity as the velocity of the eye during this period, we arrive at the a conservative upper bound on the maximum rotational distance that the eye can move during one third of a large saccade, as shown in Equation 4. 1 dee - • 200 ns • 5 0 0 — = 33.3 deg (4) 3 s The eye therefore, at its fastest, will move less than 34° during one third of a large saccade, 66.7ms. The frame rate of our Motion Segmentation algorithm (15fps as discussed in Chapter 3, Subsection 4) also has a period of 66.7ms. This motion 36 could therefore occur between consecutive frames, becoming a single difference image. The motion region we could expect from a large saccade is shown in Figure 15. 
The effect of this motion on the difference image would be greatest if the motion is centered about a ray coaxial with the imaging camera, as the motion in this case would have the greatest component of motion in the imaging plane. Figure 15: Motion Region Caused By Eyeball Rotation Because of the way our motion segmentation algorithm parameterizes motion regions, we are concerned with the isothetic bounding rectangle of the difference region. Additionally, as the corneal radius will also cause motion in the eyelid region close to the iris, our expansion of this motion region to a bounding rectangle 37 encompassing the iridal difference region increases the likelihood that we are including the entire motion region when calculating our maximum bound. The largest eye scale we allow with our system occurs when both eyes are as large as they can be while remaining completely visible in the image. This situation is shown in Figure 10. The distance across the bridge of the nose between the two inside eye corners should be approximately equal to the distance across each eye [24]. At the maximum scale allowed by our eye model, the width of each eye takes up 32.4% of the image frame along the e/ axis, corner-to-corner. Assuming that corner-to-corner across each eye shows almost the entire scleral spherical diameter, the radius of the iris in the maximum case can be calculated. Based on a 12:5.8 average ratio between the scleral radius and the iridal radius, the iridal radius should be 8.0% of the image frame along the ei axis. With a rotation of 17° on either side of the image normal, the elliptical forshortening, the reduction of the width of the ellipse along the ei axis, is 4.4% for each iris, giving each iris an apparent width of 7.6% along the et axis while maintaining 8.0% along the e2 axis. From Figure 16, one can see that the intersection point of the two iridal regions is at a horizontal distance from their center calculated by the sine of 17° multiplied by the scleral radius. The iridal regions therefore intersect at 12.0% along the ei axis from their outer edge. The maximum motion region caused by the motion of the iris should therefore have a bounding rectangle of 16.0% along the e2 axis by 24.9% along the e/ axis and encompassing 4.0% of the total image area. 38 before saccade 1 — non-motion after saccade region Figure 16: The Maximum Iridal Motion Region Of A Single Difference Image The maximum motion region which can be caused by a blinking motion will now be examined. A blink lasts an average of 350ms [25], which at our motion segmentation frame rate, would occur over a period of five frame intervals. Assuming for our purposes that the eyelid speed is approximately constant throughout the blink, dividing the blink up into frames we can see the extent of a single difference images motion region as shown in Figure 17. The situation however is complicated by the fact that large eyelashes will tend to enlarge this motion region, providing a larger moving high-contrast region during a blink event than that 39 provided by the eye-lid alone. Additionally, depending on the force applied during a blink, surrounding facial regions may be involved. The surrounding facial regions should have low contrast and not contribute much to the difference region. The bounding rectangle of this motion region therefore should be the same as our maximum allowable eye size, plus a size allowance made for the eyelash region. 
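Returning briefly to the foreshortening figures used above: the apparent iris width along the rotation direction shrinks by the cosine of the rotation angle. The following C check (illustrative only, not from the thesis implementation) reproduces the 4.4% reduction and the 7.6% apparent width quoted for a 17 degree rotation of an 8.0% nominal iridal radius:

    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        const double nominal_width_pct = 8.0;   /* iridal radius, % of image */
        const double gaze_angle_deg    = 17.0;  /* rotation off image normal */
        const double pi = acos(-1.0);

        double angle_rad = gaze_angle_deg * pi / 180.0;
        double apparent  = nominal_width_pct * cos(angle_rad);  /* ~7.65%    */
        double reduction = 100.0 * (1.0 - cos(angle_rad));      /* ~4.4%     */

        printf("apparent width: %.2f%% of image, reduction: %.1f%%\n",
               apparent, reduction);
        return 0;
    }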
We will use 3:1 as our worst case upper lid parabolic aspect ratio, see Chapter 4, Subsection 9.3, and 3:2/3 as our worst case lower lid aspect ratio. By adding an extra 3:1/3 for the upper eyelash region, we estimate our worst case bounding rectangle to be 3:2. This bounding rectangle therefore covers a region measuring 32.4% of the image along the e/ axis by 21.6% of the image along the e2 axis. Since these values form a bounding rectangle which encompasses that formed by the values derived for the iridal motion, the eyelid motion values are used as the motion region threshold values Thei_max along the ei axis and The2_max along the e2 axis. e 2 Figure 17: Motion Region Caused By Eyelid Motion Because much of the space near the center of the eyelid motion bounding rectangle is not motion, it is useful to calculate the actual motion within this region as 40 an indication of the motion density of the region. The area of motion for the two parabolic lids is two thirds the area difference between the bounding rectangle for the initial lid position and the lid position a frame later5. Because one fifth of a blink can be assumed to occur between image frames, the lid will have moved one fifth of the size of the bounding rectangle along the e2 axis in this time. In order to be conservative, we will double this value to two fifths and assume the motion along the arc to be a constant equal to its maximum thickness. Since the length of the parabolic eyelid arc is bounded by the width of the bounding rectangle along the ej axis plus twice the height of the bounding rectangle along the axis, the area of motion within the bounding rectangle for the eyelid blink can be calculated using Equation 5. 2 Eyelid Motion Area = - HeightBomding ( WidthBounding + 2 HeightBomding ) (5) The maximum eyelid motion area can therefore be evaluated as 3.2% of the total image area. Since saccadic motion may occur concurrently with eyelid motion, one can combine this bound with the maximum motion area for the iridal motion, arriving at a maximum motion bound of 7.2% of the total image area. Since the highest contrast that can be present in an image is a full-scale sweep, a bound can now be placed on the maximum motion in a single Eye-event frame, Lmax_motion. 5 The area under a parabolic region can be shown to be two-thirds the area of the bounding rectangle enclosing that parabola. 41 4. E Y E PARAMETERIZATION ALGORITHMS Algorithms have been designed which, given the Eye-event regions found using the motion segmentation algorithms, choose the 'best' regions and parameterize the eye within that region according to the parameters of Figure 5. The algorithms have been designed heuristically to work well with the diversity of eye regions expected. The parameterization algorithms can be divided up as shown in Figure 18. The Coarse-scale Iris and Pupil Parameterization section is used to search a fairly large search space and roughly parameterize either the iris or the pupil, depending on which is more prominent in the chosen Eye-event region. The Fine-scale Iris and Pupil Parameterization section is used to search a small search space around the parameters found by the Coarse-scale algorithm in order to more accurately parameterize both the pupil and the iris of the chosen Eye-event. The sclera parameterization algorithm finds an average value for the sclera of the eye being parameterized. It also chooses a good location within the scleral region to begin searching along the e2 axis for other eye and facial features. 
The hair and skin intensity estimation routines are used to calibrate the eyelid templates to the Eye-event being parameterized. The upper and lower eyelid models, semi-parabolic arcs, are used to coarsely parameterize one half of both the upper and lower eyelids. Finally, the full-parabolic parameterization algorithms are used to accurately parameterize both the upper and lower eyelids present in the Eye-event region 42 From Motion Segmentation algorithm fr Coarse-scale iris and pupil parameterization Fine-scale iris and pupil parameterization I Sclera parameterization I Hair and skin intensity estimation I Upper and lower eyelid, semi-parabolic parameterization Upper and lower eyelid, full-parabolic parameterization Goodness-of-fit calculation Figure 18: Flowchart Of Parameterization Algorithms The goodness-of-fit calculation is then evaluated on the eye parameterization vector with the goal of determining whether the eye templates have been matched with the 43 image data well enough to be tracked by the Tracking algorithm or whether the system should return to the Motion Segmentation algorithm to search for another Eye-event region. The Parameterization algorithms described in this thesis have a successful parameterization accuracy of greater than 90%. The eye parameterization algorithms use a priori knowledge of eye shape and relative feature intensity to guide the deformable template. The 2-D model contains parametric limits, beyond which a feature match becomes improbable. For example, the intensities acceptable for an iris or sclera are not predetermined, however a scleral region will not be searched for at intensity levels lower than that of the previously determined iris. Also, the width from eye-corner to eye-corner is not pre-determined, but will not be searched for at scales larger than seven times the previously found iridal radius. Additionally, the parabolae used to parameterize the upper and lower eyelids, will not be searched for in regions of the parameter space where the width to height aspect ratios of these parabolae are less than 3:1 or greater than 7:1 (see Figure 19). 7:1 Figure 19: Minimum And Maximim Allowable Eyelid Aspect Ratios 44 The eye parameterization algorithms will be performed on the image shown in Figure 20. Figure 20: Image To Be Parameterized 4.1. Filter Correlation In this thesis, when we use the word correlation to describe the interaction of a filter with the current image under analysis, we mean the following. The filter is superimposed on the image at a starting image location and is moved through a region of its parameter space, interacting with the image at specified intervals within this space. The correlation space that the filter must be moved through includes not only image coordinates but also any filter scaling parameters that are being changed 45 to parameterize a particular feature. The level of parameterization currently being performed determines the step size with which each parameter moves through the correlation space. If a coarse parameterization is being performed then the correlation space may be large, but the step size will correspondingly be set large. If a fine parameterization is being performed, then the correlation space should be small (the coarse search having been completed) and the individual parameter step sizes will also be set small, allowing for a very accurate fit. 
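The structure of such a correlation sweep can be outlined in C as follows. This is only a schematic sketch: the goodness-of-fit routine is a placeholder, and the parameter ranges and step sizes would be set coarse or fine as just described, not hard-coded.

    #include <float.h>

    typedef struct { double x, y, r, el; } Params;

    /* Placeholder: evaluates the template against the image at one point
       of the correlation space and returns a goodness-of-fit value.       */
    extern double goodness_of_fit(const unsigned char *img, int w, int h,
                                  Params p);

    /* Sweep a rectangular block of the correlation space; 'step' controls
       the granularity (large for the coarse search, small for the fine).  */
    Params correlate(const unsigned char *img, int w, int h,
                     Params lo, Params hi, Params step)
    {
        Params best = lo;
        double best_fit = -DBL_MAX;

        for (double x = lo.x; x <= hi.x; x += step.x)
            for (double y = lo.y; y <= hi.y; y += step.y)
                for (double r = lo.r; r <= hi.r; r += step.r)
                    for (double el = lo.el; el <= hi.el; el += step.el) {
                        Params p = { x, y, r, el };
                        double fit = goodness_of_fit(img, w, h, p);
                        if (fit > best_fit) { best_fit = fit; best = p; }
                    }
        return best;   /* point of the correlation space with the best fit */
    }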
At each step in the correlation, the filter interacts with the image to produce a goodness-of-fit parameter, a "correlation", for that particular point in the correlation space. The interaction between each filter region and the image produces parameters for that region, and the interaction of these region parameters produce the goodness-of-fit parameter. The correlation ends when the correlation space has been traversed, and the solution to the correlation is the point in the correlation space with the best goodness-of-fit parameter. 4.2. Pupil And Iris Filter The main goal of this algorithm is to parameterize the scleral/iridal and the iridal/pupillary boundaries. This is done using a filter designed with a priori knowledge about the expected shape and the relative size and intensity of these regions. The deformable template filter, PI(x,y,r,el,t), has defined regions which interact with image areas in and around the boundaries being parameterized. The 46 template, shown in Figure 21, is placed on the image being parameterized and is correlated with the image in the Eye-event regions. The deformation of the template is performed using the template center coordinates xc and yc, the template radius, r, and the template elongation parameter, el, defined as the ratio between the template's height along the ej axis and the width along the axis. The pupillary/iridal template also has a thickness parameter defining the distance between the inner and outer filter regions, tpupii/iriS. This parameter is varied between searches of differing granularity and not during a single filter correlation. The correlation of the pupillary/iridal template can be described by Equation 6. C(x,y,r,el)= I(x,y) ® PI (x,y,r,el,t) (6) x<y£Se>e-e>enl x>y^JllKr <r^S pupil / irisradius • e ' s Spupi l I imehngatum • '-^inil We know from our 2-D eye model that when searching for a pupil or iris that we want a region which is nearly circular, an ellipse with usually not more than two times elongation along the major axis based on the possible gaze angle being imaged. However, we do not know the scale of the required feature. The correlation space includes not only the center point of the filter in image space, but also requires the searching of the scaling parameter space defined using the filter radius, r, and elongation, el, determining the size of our template regions. The thickness parameter, t, is held constant during the correlation at an initial thickness, Thinu. The correlation search therefore has four degrees of freedom: x, y, r, el. Al Figure 21: Iridal/Pupillary Deformable Template Filter Our pupil/iris filter contains nine distinct regions. The four inner regions, Ri-R4, are matched with corresponding outer regions Rs-Rs (e.g. R3 with R7, R4 with R$ etc.). Each of these regions is used to calculate the a-trimmed6 [23] average intensity of the region's corresponding image area. The average intensity difference between an inner section and its corresponding outer section is used as an indication of the edge intensity between those regions. The subsectioning of the inner and outer elliptical regions allows emphasis of different feature areas during different parts of 6 The process of a-trimming is the removal of x% of the highest and lowest percentile intensity values. 48 the pupil/iris parameterization process. For example, during the initial iris matching process, the effect of the R3/R7 and R4/R8 regions would be de-emphasized while the effect of the R1/R5 and R2/R6 regions would be emphasized. 
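Since every region interaction above is built on the a-trimmed average of footnote 6, a direct C implementation is sketched here (illustrative; the trim fraction is left as a parameter because the percentage used is not restated at this point in the text):

    #include <stdlib.h>

    static int cmp_uchar(const void *a, const void *b)
    {
        return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
    }

    /* Alpha-trimmed mean: discard the lowest and highest 'alpha' fraction
       of the samples, then average the remainder.  'buf' must provide n
       bytes of scratch space for the sort.                                 */
    double alpha_trimmed_mean(const unsigned char *pix, int n, double alpha,
                              unsigned char *buf)
    {
        int trim = (int)(alpha * n);
        int kept = n - 2 * trim;
        double sum = 0.0;

        if (kept <= 0)
            return 0.0;

        for (int i = 0; i < n; i++)
            buf[i] = pix[i];
        qsort(buf, (size_t)n, sizeof buf[0], cmp_uchar);

        for (int i = trim; i < n - trim; i++)
            sum += buf[i];
        return sum / kept;
    }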
This is due to the possible eyelid occlusion of the iridal/scleral boundary in the upper and lower regions, see Figure 22. Figure 22: Images Showing Lash Down (Left) And Lash Up (Right) Condition The Ro region is used to calculate two pupil/iris goodness-of-fit indicators. The first is the a-trimmed average image intensity over the Ro region. This helps to ensure that the circular edges detected by the other regions are surrounding a region with a certain overall intensity. The a-trimming is useful to eliminate the effects of bright light-reflection spots on the iris and pupil. A good fit for this region would be centered over either the pupil or the iris. The second goodness-of-fit indicator calculated in the Ro region is a measure of the intensity homogeneity within the region, Ro_homo- The region is divided up into concentric circular sub-regions as shown in Figure 23. The a-trimmed average 49 intensity in each sub-region is calculated. The difference between the average intensities for all neighboring sub-regions is then calculated and the average difference is used as a homogeneity indication for the Ro region, Ro_homo- This indicator is useful in differentiating between the solid dark intensity of a pupil, and the diverse marbled intensity of an iris. The iris/pupil parameterization filtering operation can be described by Equation Group 7. Iris/Pupil = max(I(x, y) ® Firis/pupu(x', y', r, el)) (7) = max( CQRO + c9R0_hOmo + ctRi + C2R2 + C3R3 + C4R4 + c5R5 + C6R6 + c7R7 + c8Rs) Where (x, y) are the coordinates in the image data, and (x', y') are the coordinates relative to the filter center (xc, yc): x' = x — xc and y' =y—yc- The minor axis of the inner elliptical region is r. The cx constants determine the weighting and sign of each region's contribution to the operation sum. 50 Figure 23: RO Subregion Definition 4.3. Coarse-scale Iris And Pupil Parameterization The goal of this algorithm is to find an initial coarse parameterization for either the pupillary/iridal or iridal/scleral boundaries. It will later be determined whether the edge found was the former or the latter. The algorithm consists of correlating the pupil/iris filter over image regions defined by the two Eye-events found during the eye location algorithms. The filter is applied in two stages at each location within these regions. The first application consists of calculating the a-trimmed average of the data in the image region corresponding to the current location of the Ro filter region. If this average is below the maximum allowable pupil 51 intensity, LpupiLmax, then it is considered that the filter is centered on a possible pupil region, and the second filter application is performed. The second application consists of varying the filter's radial scaling parameter, r, and calculating the a-trimmed average intensities of the image data within the R 0 , Ri, R 2 , R5, and R<5 regions at each r value. The intensity values are evaluated at each point in the filter search space using the filter sum described by Equation 8. Pupil/Iris = max{co(Ro_max- Ro) + ciR5 - C/Ri + ciRt - C1R2} (8) where Rn is the corresponding pupil/iris filter region intensity, and cm is the weight applied to each average. The value Ro_max is the maximum possible region average (based on the pixel intensity quantization level); it is used to increase the sum contribution from the Ro region for lower intensity values, helping to center the filter on a dark region (pupil). 
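One reading of this two-stage application, combining the L_pupil_max gate with the Equation 8 sum, is sketched below in C. The weights c0 and c1 and the thresholds are placeholders, and the region averages are assumed to come from the a-trimmed calculation described above.

    /* Region averages for one point of the coarse search (filter centred
       at (xc, yc) with radius r): R0 is the centre region, R1/R2 the
       lateral inner regions, R5/R6 their corresponding outer regions.     */
    typedef struct { double R0, R1, R2, R5, R6; } RegionAvgs;

    /* Stage 1: only consider positions whose centre region is dark enough
       to be a pupil.  Stage 2: evaluate the Equation 8 sum.  Rejected
       positions return a large negative value so the caller can simply
       keep the maximum over the search space.                             */
    double coarse_pupil_iris_score(RegionAvgs a,
                                   double L_pupil_max, /* max pupil level  */
                                   double R0_max,      /* full-scale value */
                                   double c0, double c1)
    {
        if (a.R0 >= L_pupil_max)        /* not centred on a dark region    */
            return -1.0e30;

        return c0 * (R0_max - a.R0)     /* darker centre scores higher     */
             + c1 * (a.R5 - a.R1)       /* lateral edge gradients          */
             + c1 * (a.R6 - a.R2);
    }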
The initial search has three degrees of freedom: the xc and yc center coordinates of the filter and the filter's radius along its minor axis. The search is described by Equation Group 9. ^pupa/iris, £2 axis elongation factor, - 1, tpupiuiris, edge thickness parameter, = tinuiai, (9) tinitiai, > 5 pixels rpupu/iris, ei axis scaling parameter, rmin < rpupi\nriS < rmax, traversed at increments of rcoarse The thickness, Unitiai, defines a fairly wide edge (greater than 5 pixels at the 52 sampled image level) to allow the elongation factor to remain at one (circular filter) and still detect the pupillary/iridal or iridal/scleral boundaries when the eye is looking to the left or right of a normal to the image plane (elliptical eye boundaries), as shown in Figure 24. The limits of the ei axis scaling parameter are based on the minimum and maximum allowable eye scales discussed in the Motion Limits section. Left-gaze C o a r s e Parameter izat ion Initial Right-edge Iridal Parameter izat ions Center -gaze C o a r s e Parameter izat ion Initial Right-edge Iridal Parameter izat ions Right -gaze Coa rse Parameter izat ion Initial Right-edge Iridal Parameter izat ions Initial Left-edge Iridal Parameter izat ions Initial Left-edge Iridal Parameter izat ions Initial Left-edge Iridal Parameter izat ions Figure 24: Iridal Boundary Parameterization For Different Gaze Angles The maximum value of the filter sum is used to determine the best pupil/iris parameterization within each Eye-event; the best overall match is chosen to be the eye-region used in all subsequent algorithms. The results of this initial parameterization and sum evaluation are shown overlaid on the image in Figure 25. 53 50 100 150 200 Figure 25: Initial Iris/Pupil Search Result 4.4. Pupillary And Iridal Region Differentiation The initial parameterization of the iridal and pupilary boundaries is performed with the intent of finding either boundary and then finding the other based on the parameterization result. This initial application of the iridal/pupillary filter is therefore done with the R 3 , R 4 , R 7 , and Rs regions ignored, thus only searching for elliptical boundaries oriented along the e2 axis. This filter will therefore provide good detection for both the iridal/scleral and pupillary/iridal boundaries. The 5 4 boundary of the iris is generally occluded in the upper and lower areas due to the eyelids and eyelashes. If the R 3 , R 4 , R 7 , and Rg regions were used in this search, there would be a good chance that they would erroneously be attracted to the eyelid/iridal or eyelash/iridal boundary. This initial calculation makes use of the Ro region for average intensity but does not perform the homogeneity calculation on it. Once the initial search is completed and an edge region has been selected, the search for a second edge begins. It has not yet been determined whether the edge selected is the scleral/iridal or iridal/pupillary boundary, therefore the parametric search region for the second edge must include elliptical region radii both less than and greater than that of the initial parameterized edge, rinmai. A small positional search space is defined at the center of the first parameterization as shown in Figure Figure 26: Center Search Region for Iridal/Pupillary Border Differentiation The search is performed differently for radii less than and greater than rinitiai-When searching at smaller radii, it is assumed that the boundary being sought is the 26. Iridal/pupillary center search region 55 iridal/pupillary border. 
The filter coefficients are adjusted to utilize the entire elliptical region and the homogeneity calculation is performed in region Ro. When searching at radii greater than r,„,„a/, it is assumed that the boundary being sought is the iridal/scleral boundary. Within this area of the search space, only the lateral portions of the elliptical regions are utilized and the homogeneity calculation is not performed. 40 50 h 60 70 VoO 110 120 130 140 150 160 170 180 190 200 Figure 27: Choices For Iridal/Scleral And Pupillary/iridal Boundaries The best edges at both a radii larger and a radii smaller than rinitiai are parameterized; these two edges are used in conjunction with the first edge to determine the most likely candidates for the iridal/scleral and the iridal/pupillary 56 boundaries (see Figure 27). In order to make this determination properly, each parameterized edge is filtered with the entire iridal/pupillary filter: all regions of the filter including the homogeneity calculation. Upon completion, the decision is based on the criteria described by Equation Group 10. pupil = max {FiridaUpupiUary (all regions)} (10) iris = max {Firidai/pupiUaiyt (regions R0, R3, R4, R7, and R8)} Additionally, the pupil region should be inside the iridal region, and should have a homogeneity lower than that of the iridal region. 4.5. Fine-Scale Iridal Parameterization Once the coarse pupil/iris filtering results have been used to select an Eye-event region and to coarsely parameterize both the iridal/scleral and pupillary/iridal boundaries, the fine parameterization of the iridal region is started. This parameterization consists of a pupil/iris filter correlation with four degrees of freedom: the xc and yc location of the filter center relative to the image, the radius of the filter's ei axis, rpupn/iris, and the e2 axis elongation factor, . The pupil/iris filter regions used are the same as in the coarse search, but the weighting of the center region is reduced to place more emphasis on the boundary fit. The fine-scale parameter search space is centered on the iridal parameters from the coarse search: Ximtiai, yinitiai, and rinniai. The edge thickness parameter is defined as r,„,7, a//i, to provide a smaller fitting region than that used in the initial search. The search region is bounded and sampled as described by Equation Group 11. 57 (xc, yc) = (xtnitiai, y^01) ± rinMJ2, traversed at increments of rinitiai/10 fpupil/iris - f'initial ± 20% traversed at increments of 5% (11) elpupii/iris = 1 £ elpupu/iris < 1.5 traversed at increments of 0.1 tpupu/iris, edge thickness parameter, = rinitiai/5 where r,„,„a/ is the filter's ei axis radius, (x,„,y,a/> yinitial) are the iridal center coordinates found during the previous section's initial search, tpupiiAns is the edge thickness parameter defining the distance between the inner and outer region of the filter Figure 28: Error In Iridal Edge Parametrization With elpupmris = 1 58 Because the filter correlation done using only the xc, yc, and rpupu/iris (elongation held at one) will in most cases center the filter as shown in Figure 28, the e\ axis radius will be near its correct value with error along both axis as shown in figure. The elongation parameter can therefore be varied as an independent step following a correlation varying the other three parameters, effectively reducing the search space. Following the elongation parameter scaling, a very fine-scale correlation is performed centered around the results of this last search. 
The search is done using an edge thickness parameter of riridai/5, varying the filter parameters as described by Equation Group 12. rpupa/Ms ~ rinitial ± 10% traversed at increments of 2% (12) (xc, yc) = (Xiridai, >'iridal) ± riridai/10, traversed at increments of riridai/20 elpupu/iris = el iridal ± 10% traversed at increments of 2% tpupii/iris, edge thickness parameter, = rinitiai/5 where rmdai, x^i, yiridai, and elindai are the parameterization results of the previous search. tpupnf„\s is the edge thickness parameter defining the distance between the inner and outer region of the filter. This final search is done to ensure a 'snug' fit on the iridal/scleral boundary. The results of this parameterization are shown in Figure 29. 59 60 190 200 Figure 29: Very Fine-Scale Iridal Parametrization 4.6. Edge Detection and Regions Although the goal of the pupil/iris filter is to detect the boundary edges of the scleral/iridal and the iridal/pupillary boundaries, the pupil/iris filter does not perform edge detection using the classic V 2 G, Laplacian of Gaussian operator. Instead it uses a method which locates an accurate solution much faster. If the V G method were used, it would generate a dense clutter of edges in the eye region (see Figure 2 and 3 in Chapter 1) or with a coarse edge map with many subtle, yet necessary, edge details 60 lost (lower eyelid region for example), depending on the smoothing scale7 chosen. Instead, we search for regions where the difference of their average intensities has changed closest to that expected by our 2-D eye model. The area between these regions is defined as an edge. The distance between the two regions starts out large for coarse searching over large image regions (3 to 4 times the inter-region distance used in the fine parameter searching), and becomes smaller as a more accurate fit is required. When the distance between the regions is large, a less dense search grid is required to find regions matching a particular shape specification. The search grid density is appropriately increased as the inter-region distance is decreased. Thus, this model is efficient for large searches but still effective for accurate fits. For example, the search for the left side of the iridal/scleral boundary may start out with a search region and grid density as shown in Figure 30. The search region during the fine matching stages may look more like that shown in Figure 31. 7 A a of between 1 and 2 is required for the edges that we are parameterizing. 61 Iridal Filter / ' Expanded Search Region Figure 30: Coarse-scale Iridal Filter Center Search Region Iridal Filter Figure 31: Fine-scale Iridal Filter Center Search Region 4.7. Scleral Tone Parameterization The following algorithm is used to determine a likely average intensity for the scleral region in the image being parameterized. This intensity will be used to 62 initially calibrate parts of the deformable template to the current person in front of the camera and the current lighting conditions. It will help to provide a level of independence to the transformation from real world brightness to image intensity. The previously determined iridal boundary and center are used as reference points in this algorithm. Four regions, RSi, RS2, RS3, and RS4, surrounding the iris are defined as shown in Figure 32. The average intensity within each region is determined and region with the highest average is selected as the best choice for a scleral region. 
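This scleral-tone selection, together with the pair-combination refinement described in the next sentences (two neighbouring regions within 10% of one another are averaged together), can be sketched in C as follows. The four region averages are assumed to have been computed elsewhere, and the combined region's average is approximated by the mean of the two region averages, which holds for equal-sized regions.

    /* RS[0]..RS[3] hold the average intensities of the four scleral search
       regions RS1..RS4 of Figure 32 surrounding the parameterized iris.   */
    double scleral_tone(const double RS[4])
    {
        double d12 = RS[0] > RS[1] ? RS[0] - RS[1] : RS[1] - RS[0];
        double d34 = RS[2] > RS[3] ? RS[2] - RS[3] : RS[3] - RS[2];

        /* Combine RS1/RS2 or RS3/RS4 when the pair agrees to within 10%.  */
        if (d12 <= 0.10 * (RS[0] > RS[1] ? RS[0] : RS[1]))
            return 0.5 * (RS[0] + RS[1]);
        if (d34 <= 0.10 * (RS[2] > RS[3] ? RS[2] : RS[3]))
            return 0.5 * (RS[2] + RS[3]);

        /* Otherwise use the single brightest region as the scleral tone.  */
        double best = RS[0];
        for (int i = 1; i < 4; i++)
            if (RS[i] > best)
                best = RS[i];
        return best;
    }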
The darkness variance in the other scleral regions are most likely caused by the region in question crossing a lash or lid boundary, assuming that the skin and hair of most people have an average intensity lower than that of their scleral regions. The brightest region, therefore, has a good chance of being only sclera. By having the four regions distributed as they are, the most likely sclera can be chosen regardless of eye-gaze orientation and eyelid position. 63 RS4 e, Figure 32: Scleral Search Regions If either RSI and RS2 or RS3 and RS4 are within 10% of one another in average intensity, then a new region consisting of the combination the the two close regions is selected as the most likely scleral area. The scleral tone is defined as the average of this new region. By using an averaged region instead of a single point to define the scleral intensity, the susceptibility of the algorithm to noise in the image data or to a local reflective anomaly on the eye is reduced. The scleral parameterization shown in Figure 33 indicates with dots centered within each region that both RSI and RS2 have average intensities within 10% of each other. The average of the combined region is therefore defined as the scleral tone. 64 100 110 120 130 140 150 160 170 180 190 200 Figure 33: Scleral Region Selection 4.8. Skin And Hair Intensity Estimation The goal of this algorithm is to estimate good values for the current subject's average skin and hair intensity. Subsequent algorithms in distinguishing eyelid and eyelash regions from the scleral and iridal areas use these intensities. The algorithm uses the previously calculated iridal center as a reference point to designate regions of high probability for skin and hair based on our 2-D face model. The regions are scaled according to this iridal radius; this algorithm therefore runs independently of the size of the face in the image data. The size of the face in the image data is directly related to the closeness of the person to the lens. The search regions defined 65 by this algorithm will therefore be larger for a close face than for a distant face. Although sampled at a fixed scale by the imaging camera, the regions are sub-sampled at a variable rate to allow the algorithm to run to completion in a fixed amount of time. •|4Q! i i i i i i i i 1 1 100 110 120 130 140 150 160 170 180 190 200 Figure 34: Skin And EyeBrow Search Regions The regions are defined as shown in Figure 34. Each region is oriented along the ei and e2 axis and is subdivided into smaller regions. Skin is defined to be the average of the highest average intensity within the upper and lower regions. The a-trimmed average intensity within each of these subregions is calculated and the highest subregion averages within the upper and lower regions are averaged to 66 become the skin intensity parameter, Thsid„. In order to determine the intensity value for hair on the face of the current subject, the eyebrows are used as a sample region. The eyebrows are assumed to be the same intensity as the eyelashes, which is accurate enough for the purposes of subsequent algorithms. The previously estimated average skin intensity is used as a reference level for this algorithm. Hair on most people's faces has been found through a sampling of more than 15 subjects to be generally more than 30% darker in intensity than the surrounding skin intensity. A value of 70% of the skin intensity is therefore used as a threshold in locating hair regions. 
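The skin and hair thresholds just defined amount to two small calculations, sketched here in C (illustrative; the subregion averages of Figure 34 are assumed to be precomputed):

    /* Skin intensity estimate, Th_skin: the average of the brightest
       subregion average in the upper search region and the brightest in
       the lower search region of Figure 34.                               */
    double estimate_skin(const double *upper_avg, int n_up,
                         const double *lower_avg, int n_lo)
    {
        double up = upper_avg[0], lo = lower_avg[0];
        for (int i = 1; i < n_up; i++) if (upper_avg[i] > up) up = upper_avg[i];
        for (int i = 1; i < n_lo; i++) if (lower_avg[i] > lo) lo = lower_avg[i];
        return 0.5 * (up + lo);
    }

    /* Hair threshold: 70% of the estimated skin intensity.                */
    double hair_threshold(double Th_skin)
    {
        return 0.70 * Th_skin;
    }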
A region above the eye is defined as shown in Figure 34, and this region is further subdivided into thin rectangular, horizontally oriented subregions. The a-trimmed average intensity within each subregion is calculated and the largest contiguous group of subregions with intensities lower than the threshold is considered to be the eyebrow. The a-trimmed average intensities for the subregions within the eyebrow search region are evaluated as follows: 1) Calculate the a-trimmed average intensity within each subregion. 2) The largest contiguous group of regions with average intensities lower than the hair threshold, Thhair, is marked as a possible eyebrow region. 3) It the width of this region along the e\ axis is less than 0.5 x riridai then raise Thhair by 10%. 4) If Thhair - Thsian < 0.05 * Thskin then abort algorithm. Eyebrow region is 67 not parameterizable by our algorithm. 5) Return to step 2 and repeat search. If a suitable eyebrow region is found, then the algorithm finds the darkest localized area within this region. Correlating a small unit filter over the eyebrow region does this. The lowest correlation value is considered the hair intensity value. The hair value is calculated in this manner because the brighter regions can usually be attributed to skin reflecting light between the eyebrow hairs; a localized region is used to minimize this effect. If an eyebrow region cannot be located by our algorithm then a default average value for hair is substituted, the intensity value for medium brown hair under our lighting conditions. Figure 35 shows the a-trimmed average intensity of the subregion surrounding the eyebrow region. The plateau on the left side of the figure is the forehead of the subject. This middle valley is the eyebrow, and the right-side peak is the upper eyelid. 68 Image Intensity Cross-Section Across Eyebrow/Skin Boundaries 2501 i 1 1 1 1 1 1 1 ) I I I I I I I i u I 0 5 10 15 20 25 30 35 40 45 Image Rows Figure 35: Skin/Hair Boundaries Of An Eyebrow 4.9. Upper And Lower Eyelid Semi-parabolic Parameterization The following group of algorithms are used to fit deformable eyelid templates to the upper and lower eyelids found near the Eye-event region. The scleral region center found in Chapter 4, Subsection 7 is used to center a search region above and below the sclera along the e2 axis. These regions will be used to search for possible upper and lower eyelid edges, Chapter 4, Subsection 9.1. Following the choosing of likely upper and lower eyelid edge locations, the slope of these edges is calculated, 69 Chapter 4, Subsection 9.2, and used to coarsely fit a semi-parabolic arc to each lid, Chapter 4, Subsection 9.3 and Subsection 11. The semi-parabolic arc is used to allow a larger search space to be traversed in a shorter amount of time and with a lower likelihood of a false match. A full parabolic arc is then fit to the most likely upper and lower eyelid features, Chapter 4, Subsection 10 and Chapter 4, Subsection 11, and a goodness-of-fit calculation is performed to see if a good template fit has been achieved, Chapter 4, Subsection 12. 4.9.1. Eyelid Edge Locations This algorithm is designed to find edge locations that may belong to the upper and lower eyelids of the chosen eye of the current subject. Two regions Ej and E2 are defined, centered on the brightest scleral location (small region) selected from the scleral region defined during the scleral intensity algorithm. The equal sized E/ and E2 regions are shown in Figure 36. 
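The eyebrow search loop (steps 1 through 5) can be expressed as the C sketch below. The a-trimmed subregion averages are assumed to be precomputed, 'sub_w' is the extent of one subregion along the axis checked in step 3, and step 4 is read here as aborting once the raised hair threshold comes within 5% of the skin intensity, which appears to be the intent of the inequality as printed.

    /* Returns the index of the first subregion of the widest contiguous
       run whose averages fall below the hair threshold, and stores the
       run length; returns -1 if the eyebrow cannot be parameterized.      */
    int find_eyebrow(const double *sub_avg, int n_sub, double sub_w,
                     double r_iridal, double Th_skin, int *run_len)
    {
        double Th_hair = 0.70 * Th_skin;       /* initial hair threshold   */

        for (;;) {
            int best_start = -1, best_len = 0;

            /* Step 2: largest contiguous group darker than Th_hair.       */
            for (int i = 0; i < n_sub; ) {
                if (sub_avg[i] < Th_hair) {
                    int j = i;
                    while (j < n_sub && sub_avg[j] < Th_hair)
                        j++;
                    if (j - i > best_len) { best_len = j - i; best_start = i; }
                    i = j;
                } else {
                    i++;
                }
            }

            /* Step 3: accept the run only if it is wide enough.           */
            if (best_start >= 0 && best_len * sub_w >= 0.5 * r_iridal) {
                *run_len = best_len;
                return best_start;             /* plausible eyebrow region */
            }

            Th_hair *= 1.10;                   /* raise threshold by 10%   */
            if (Th_skin - Th_hair < 0.05 * Th_skin)
                return -1;                     /* step 4: give up          */
        }
    }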
The regions are oriented along the e2 axis; E\ is intended to intersect the upper eyelid region while E2 is intended to intersect the lower eyelid region. These regions are scaled using the iridal radius, rindai, found in Chapter 4, Subsection 5. This dynamic scaling allows the algorithm to adapt to subject currently being imaged. 70 Upper Eyelid Search Region, E1 a-trimmed And Zero Crossing Detected Subregions (x..,.,.„y.. ) Lower Eyelid / Detected Edge Lower Eyelid Search Region, E2 a-trimmed And Zero Crossing Detected Subregion Subdivided Lower Eyelid Search Region, E2 Figure 36: Eyelid Boundary Search Regions The eyelid search regions will be used to locate edges which have a high likelihood of belonging to either the upper or lower eyelids. In order to accomplish this, the algorithm prepares these regions for edge detection by correlating each region with a 7 x 7 Laplacian of Gaussian filter [26], producing a new region SI(x,y) as described in Equation Group 13. SI(x,y)= I(x,y) ® V2G(x,y) h Spine (13) V2G(x,y)--1 2a • .2 , 2 4/1 x +y \ l a After having been filtered, these new regions are then further processed using zero-crossing detection. Zero-crossings, regions where the sign of the filtered values changes, are searched for only in the e2 direction. This is because the scleral point 71 selected for the region center should force the region to intersect the eyelids at a point where their edges are oriented mostly along the el axis. By ignoring zero-crossings in the el direction, we are therefore eliminating edges that have a very low likelihood of belonging to the upper or lower eyelids. Although a small component of lid edge may be in the el direction, the edge will not be missed since the majority of it is in the e2 direction. Each zero-crossing found is replaced with the value of the crossing gradient. A l l other locations are replaced with zero values. The filtered and zero-crossing detected regions, E l and E2 are then subdivided into smaller regions scaled using the previously calculated iridal radius, riridai, as shown in Figure 37. These subregions can now be used to locate the possible upper and lower eyelid positions required by the eyelid parameterization algorithms. E E, Subdivided Region Of E1 And E2 Figure 37: Subregion Scalings Of The Et And E2 Eyelid Search Regions Each subregion is summed and its average intensity value calculated. These values become a measure of the edge intensity within each subregion. Because the eyelid edges for which we are searching are among the most prominent in the regions, we can reduce the possible choices (eliminate some minor edges and noise effects) by 72 thresholding the average edge values according to Equation Group 14. abs(Edge(x,y)) >0.3 max(abs(Edge(x,y))), Edge(x.y) = Edge(x,y) abs(Edge(x,y)) < 0.3 max(abs(Edge(x,y))), Edge(x,y) = 0 (14) x,y e Su u SL2 After this process, only subregions with non-zero average values will be considered as possible eyelids. The next step is to determine average intensities in the areas adjacent to the edge subregions. The a-trimmed average intensity is calculated for each region and combined with the edge locations to form two tables: the upper eyelid possibility table and the lower eyelid possibility table. Each table contains the center coordinates and intensity of each edge region, as well as the average intensity of local regions immediately above and below (along the e2 axis) the edge. This information will be used in subsequent algorithms to determine the most likely eyelid locations. 
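Equation Group 13 is the standard Laplacian-of-Gaussian operator; a C sketch of sampling it on a 7x7 grid and of the e2-only zero-crossing step follows (illustrative, with sigma taken from the 1-2 range of footnote 7 and the image assumed row-major with the row index running along e2):

    #include <math.h>

    #define KSIZE 7

    /* Standard form of the operator:
       LoG(x,y) = -1/(pi s^4) (1 - (x^2+y^2)/(2 s^2)) exp(-(x^2+y^2)/(2 s^2)),
       sampled on a KSIZE x KSIZE grid centred on the origin.               */
    void make_log_kernel(double k[KSIZE][KSIZE], double sigma)
    {
        const double pi = acos(-1.0);
        double s2 = sigma * sigma;
        for (int y = 0; y < KSIZE; y++)
            for (int x = 0; x < KSIZE; x++) {
                double dx = x - KSIZE / 2, dy = y - KSIZE / 2;
                double q = (dx * dx + dy * dy) / (2.0 * s2);
                k[y][x] = -(1.0 - q) * exp(-q) / (pi * s2 * s2);
            }
    }

    /* Zero-crossing detection in the e2 (row) direction only: each sign
       change between vertically adjacent filtered samples is replaced by
       the magnitude of the change (the crossing gradient); all other
       locations are set to zero.                                          */
    void zero_cross_e2(const double *f, double *out, int w, int h)
    {
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++) {
                out[y * w + x] = 0.0;
                if (y + 1 < h && f[y * w + x] * f[(y + 1) * w + x] < 0.0)
                    out[y * w + x] = fabs(f[(y + 1) * w + x] - f[y * w + x]);
            }
    }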
The locations in the upper and lower eyelid possibility tables for our test image are shown overlaid in Figure 38. 73 Figure 38: Possible Upper And Lower Eyelid Edge Choices 4.9.2. Initial Eyelid Slope Calculations This algorithm calculates the slope of edge locations found using the edge location algorithms described in Chapter 4, Subsection 9.1. for both the upper and lower eyelid possibility tables. The edge locations in the upper and lower possibility tables are shown overlaid on the image in Figure 38. These slopes will be used to help center the search region used by the eyelid deformable template in Chapter 4, Subsection 9.3 at a good starting location. 74 Prior to calculating slopes at the upper edge locations, we reduce the table to the two edges bordering the most likely eyelash region. This edge identification is done in the following manner. The upper eyelid search region, L I , should cross over two useful edge regions: the sclera/hair eyelid boundary and the hair/skin eyelid boundary. This is regardless of whether the eyelash is in the up or down orientation (see Figures 39 and 40). One of these boundaries is the parabolically parameterizable eyelid edge and the other is the jagged eyelash boundary. The differentiation between the two will not be made until after each has been parameterized. 75 Figure 40: Image Of Eye Region With Lash In The Up Orientation The likely candidates for these edge locations are chosen from the upper eyelid possibility table according to the local intensity values calculated adjacent to the edge regions. We are searching for two edges which indicate a scleral/hair boundary followed by a hair/skin boundary in the positive e2 direction. This is determined using the formulas in Equation Group 15. Edge Choices = min ( (LowerlntensityfnJ - Sclerallntensity) + (15) (UpperlntensityfnJ - Hairlntensity) + (LowerIntensity[n +1 ] - Hairlntensity) + (Upperlntensity[n +1] - Skinlntensity) ) where n is the possibility table position indicator, Lower Intensity is the intensity value of the of the region below edge n, and Upperlntensity is the intensity above row n. The two edges should be adjacent in the edge table, as indicated by the 76 n and n+1 indices. This is because the edge detection filter used contains a Gaussian averager which smoothes over most eyelash separations (at the eye scales we allow), which does not give strong edge indications for a very localized strong edge. Even in the case where a strong edge was indicated within the eyelash region, the local intensity values above and below this edge would still both indicate hair. The upper eyelid possibility table is thus prefiltered to remove any edges that indicate a hair/hair boundary to within a 20% tolerance. The sclera/hair and hair/skin boundaries should therefore be found in adjacent table locations. The two upper eyelid edge locations chosen using Equation Group 15 are now used as center points for localized slope edge calculations. The edges are followed in both directions for a length of 2/3 of the iridal radius. These edge points are then used in a least squares minimization to calculate their slope. The same V 2 G edge filter used previously is used to follow the edges. The edge search region is divided up into subregions of width 3/(2*radjrjdai). The V G filter is correlated within each subregion, starting with the two adjacent to the start point on either side. 
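The least-squares slope fit referred to here (written out as Equation Group 16 further below) reduces, for a straight line, to the closed form of the normal equations. A C sketch follows (illustrative only):

    /* Least-squares fit of a line y = a*x + b through n edge points; this
       is the closed form of x = (A^T A)^(-1) A^T b for a two-parameter
       line.  Returns 0 on success, -1 if all points share the same x.     */
    int fit_line(const double *xs, const double *ys, int n,
                 double *a, double *b)
    {
        double sx = 0, sy = 0, sxx = 0, sxy = 0;

        for (int i = 0; i < n; i++) {
            sx  += xs[i];
            sy  += ys[i];
            sxx += xs[i] * xs[i];
            sxy += xs[i] * ys[i];
        }

        double det = n * sxx - sx * sx;
        if (det == 0.0)
            return -1;

        *a = (n * sxy - sx * sy) / det;   /* slope used to seed the lid fit */
        *b = (sxx * sy - sx * sxy) / det; /* intercept                      */
        return 0;
    }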
Zero-crossing detection is done in the e2 direction on the filtered regions, and the strongest edge value within each subregion is chosen as an edge point. The successive subregions moving outwards from the start are centered, along the e2 axis, on the previous edge value. This causes the search regions to track the edge as shown in Figures 41 and 42. 77 Figure 41: Subregions Used For Edge Following The e2 length of the subregion is determined by the maximum slope that we need to find. It is assumed that an eyelid boundary does not achieve a slope with an absolute value greater than 60° except possibly in the corner regions. Additionally, knowing that the scleral point we have chosen to center our search space along the ei axis is not in the corner region, we calculate the required subregion length to be 3V3/radjridai as shown in Figure 41. 78 60 Figure 42: Eyelid Edge Data After calculating the edge points on either side of the two upper eyelid start points, these points are used to calculate least squares minimized slope lines through these points. This is done using the formulas in Equation Group 16. x = (ATAylATb (16) where x - , y = ax + b "1 x\~ >r 1 xl A = b = 1 xn yn 79 The coordinates are expressed as (x,y) where JC is the coordinate value along the ei axis and y is the coordinate value along the e2 axis. The line through these points is expressed using the parameters a and b as y = ax+b. The slopes we require are therefore the a values of the x vector. The application of this algorithm to the eyelid possibility table for our test image is shown in Figure 43. The slopes calculated at the potential upper and lower eyelid edge points are now used in the eyelid parameterization algorithm. 50 130h 140 110 120 130 140 150 160 170 180 190 200 Figure 43: Eyelid Slopes Calculated From Edge Data The lower eyelid boundary is generally much harder to parameterize than the 80 upper. There is low contrast between the skin of the lower lid and the scleral region8; also lower eyelash contrast is minimal. Therefore the elimination of any entries in the lower eyelid possibility table is deferred until further parameterization has occurred. The larger scope of the later algorithm greatly reduces the chance of misfitting. If the lower eyelid table were reduced to one edge at this point, it would likely be in error, the slope of each possible edge location is therefore calculated using the same method described to find the upper eyelid slope calculations. 4.9.3. Semi-parabolic Eyelid Fitting The parameterization of the upper and lower eyelid boundaries is accomplished using a parabola-fitting algorithm based on the eyelid filter, EL(x,y,a,sl,t), shown in Figure 44. A deformable, semi-parabolic arc template will be fit to the eyelid features above and below the scleral region found in Chapter 4, Subsection 7. The algorithm is applied first to the two upper eyelid candidates. Upon parameterization of the upper lid, the algorithm is then applied to the lower lid possibilities; in combination with the upper lid results, the parameterization of the upper and lower eyelid boundaries are completed. The upper and lower eyelid possibility edges shown in Figure 43 are used as starting points for the algorithm. To parameterize the upper eyelid, the eyelid filter uses two semi-parabolic regions, L i a and hi& or Lib and L2b, in order to determine the average intensities adjacent to a parabolic edge. The curvature of the parabola, as well as the spacing Except with tanned or dark skinned people. 
81 between the two regions, is variable. The initial application of the filter uses only the upper lid subregions on the side of the sclera containing the possible eyelid edges. For example, if the scleral region to the left of the pupil had been parameterized, then only the Lu and L2a regions would be used. These subregions are used because the model is initially fit to the edge information on the parameterized side. As the center of the search space is based on the slope and edge position of the possible lids, and the space is fairly large during the initial search, traversing this space on the opposite side of the model could cause interaction with strong edges such as the iridal/scleral boundary. This interaction could lead to a false fit as the parameters are varied to fit the model. For the initial search therefore, only half the model is used. Upper Eyelid Semi-paraPolic Initial Filter Regions Upper Eyelid (x^y j Upper Eyelid (x^y^ Upper Eyelid Initial Edge Parameterization slope, SI / A Scleral Centering Region Lower Eyelid (x^y, Upper Eyelid Semi-parabolic Filter Regions Ycu — OuppfR (Xoj-Xpu) + Vpu y = -aUPPEB(x-xcur + y c a YCL = a ^ l X a - x J 2 + yP l. y = -a^ l x - X a ) + y a Iridal Boundary Lower Eyelid Initial Edge Parameterization \ Lower Eyelid Semi-parabol ic Filter Regions Lower Eyelid ( x ^ : Figure 44: Upper And Lower Eyelid Boundary Filter The first application of the eyelid model to the upper eyelid regions, uses a 82 coarse search over a fairly large parameter space. This is done to reduce the possibility of matching to a local maxima. The slopes calculated at the possible eyelid region's edges are used to center the parameter search space of the eyelid filter. These slopes are related to the parabolic regions as follows. The region boundaries have edges determined by Equation 17. yc = a(xc - xp)2 + yp (17) Where (xc,yc) are the starting edge coordinates along the ei and e2 axis respectively, and (xp,yp) is the location of the peak of the parabolic edge region. Differentiating this equation, we find that the slope at the edge locations are related to the parabolic regions by Equation 18 slope - dy/dx = 2a(xc - xp) (18) The initial upper eyelid parabolic curvature parameter, a, is set so that when the parabola is one iridal radius high in the e2 direction, it has a width of 5.5 iridal radii in the ej direction. The initial a value is therefore set as shown in Equation Group 19. y' = ax'2 (19) rindai = a(5.5 r iridai)2 :. a = 4/(I21riridal) This a value was determined through measuring the eyes of more than fifteen people, to be a good starting point for the aspect ratio of the upper eyelid boundary. The boundary that we actually want to place on the eyelid aspect ratios is based on the vertical distance between the eyelid peaks and the intersection of the upper and 83 lower parabola. But since this intersection is unknown, it is assumed to be at a height of 1 iridal radius in the e2 direction for the upper lid, and 2/3 iridal radius for the lower lid. The edge slope and the initial a value completely define the parabola, and position it relative to the (xc,yc) point. The slope, si, is therefore used to place the parabolic regions over the image data so that the (xc,yc) point of the filter is placed over the edge point being used, and the slope of the parabola through this point is equal to the slope previously calculated at this edge. The eyelid coarse parameter search space is bounded and sampled as described in Equation Group 20. 
a = 4/(12Iririliai) ± 60% traversed at increments of 3% (20) si = initial edge slope ± 10% traversed at increments of 1% (xc,yc) = initial (xc,yc) ± riridai/5 traversed at increments of r^da/lO teyelid, edge thickness parameter, = riridai/5, The edge thickness parameter, teyend, is not varied during the search space correlation. The ±30% boundary on the a parameter allows the aspect ratio to vary between 3:1 and 7:1, which are reasonable limits on eye shape as shown in Figure 45. For example to go from 5:1 to 7:1 the calculation is as described in Equation Group 21. y = ax2, r^dai = ai($rmdai)2 ~ a2(7riridai)2 (21) 84 ci2/ai = 0.60 Bounding Rectangles For 7:1 Bounding Rectangles For 3:1 Figure 45: Parabolic Parameterization Bounds For The Upper And Lower Eyelids A further bound is placed on the eyelid parabola by the boundary between the lid and the eyeball above and below the pupil. In order to ensure that the region bounded by the lid parabola contains the pupil region, edges are found along the e2 axis above and below the pupil using the R3, R4, R7, and R# regions of the iris/pupil filter. The edge that is detected may either be the iridal/scleral boundary or the iridal/eyelid boundary, depending on the current point-of-gaze and the openness of the eye. The elongation parameter of the filter is set to 0.5 to detect the flatter curve 85 expected. The filter is centered on the parameterized pupil, started at a radius 50% greater than the pupil. It is then expanded at intervals of iridal radii/10 until an edge with a region gradient of at least Thud, is found for both the upper and lower areas independently. Thnd is defined as 50% of the edge gradient found when parameterizing the iridal/scleral boundary of the iridal region. These edges define a reference distance between the pupil and the eyelid boundaries along the e2 axis. The eyelid parabolae must be no closer than iridal radius/5 (to allow for eyelash error) when they are directly above or below the pupil. These conditions may reduce the previously specified eyelid search space bounds, depending on the initial search parameters. Although the bounds on the parabolic regions are based on symmetric parabolae, the filter calculations only utilize semi-parabolic regions in their initial search. The filter calculations performed at each location in the edge region's search space are as follows. The LIA and L2A regions are divided into two subsections each. The boundary between these subsections is determined by the parameterized iridal boundary, as shown in Figure 46. With the parameterized scleral region on the left side of the iris as shown in this section's example, the four regions are Luo and L2Ao, the upper eyelid regions outside the iridal boundary, and Lui and L2AI, the upper eyelid regions inside the iridal boundary. The a-trimmed average intensity is calculated for all four regions. The averages from the two regions outside the iridal boundary are subtracted so that a positive difference indicates that the edge gradient is in the same direction as the corresponding edge gradient in the upper eyelid 86 possibility table. For the two regions within the iridal boundary, the absolute value of the difference of their region averages is calculated. The differences calculated for the subregions on either side of the iridal boundary are summed and used as an indication of the parabolas' goodness-of-fit, with a larger value indicating a better eyelid fit. 
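The placement of the parabola from an edge point, its local slope, and the curvature parameter (Equations 17 through 19) can be summarized by the short C sketch below (illustrative; the sign of the curvature depends on whether the upper or the lower lid is being fit, as in Figure 44):

    /* Position the eyelid parabola  y = a*(x - xp)^2 + yp  so that it
       passes through the detected edge point (xc, yc) with the locally
       measured slope sl:  slope = 2*a*(xc - xp)  (Equation 18).           */
    void place_parabola(double xc, double yc, double sl, double a,
                        double *xp, double *yp)
    {
        *xp = xc - sl / (2.0 * a);
        *yp = yc - a * (xc - *xp) * (xc - *xp);
    }

    /* Initial curvature from Equation Group 19: a parabola one iridal
       radius high spans 5.5 iridal radii, so a = 4 / (121 * r_iridal).    */
    double initial_upper_lid_curvature(double r_iridal)
    {
        return 4.0 / (121.0 * r_iridal);
    }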
Regions Figure 46: Upper Eye Filter Regions Defined By Iridal Boundary If the iridal boundary does not intersect these regions, then only the first difference calculation is performed and used as the goodness-of-fit indicator. The summation of the two (or one) difference calculations is used as a goodness-of-fit criteria for the initial parameterization; the highest value with the search space is used as the semi-parabolic parameterization for the potential upper eyelid edge. The results of the semi-parabolic parameterizations performed in this section are shown in Figure 47. 87 120 -130 -140*- ' 1 1 1 1 1 1 1 1 100 110 120 130 140 150 160 170 180 190 200 Figure 47: Possible Eyelid Semi-Parabola Parameterizations After the initial parameterization has been completed on both possible eyelid candidates, the one with the highest difference sum is chosen as the upper eyelid half parabola parameterization. This is generally the eyelid boundary, whereas the other is usually the rough eyelash boundary. 4.10. Upper Eyelid Full-parabolic Parameterization Next, the chosen semi-parabola's parameters are used as the center for a full parabolic parameterization of the upper eyelid region. This is done using a fine resolution search within a small parameter space surrounding the starting location. The eyelid filter regions are used in a manner identical to the coarse search. The 88 parameter space is bounded and sampled according to the criteria in Equation Group 22. a = ahaif_paraboia± 10%, traversed in increments ofl% (22) si = slopehaif_paraboia ± 5%, traversed in increments of 0.5% (xc.yc) = (xhaifjamboia, }'haifjamboia) ± iridal radius/10, traversed in increments of iridal radius/20 teyeiid, edge thickness parameter, = riri(iai/10 The best fit is chosen using the same goodness-of-fit criteria as that used during the coarse semi-parabola parameterization. The fit chosen for the fine search is considered the parameterization of the upper eyelid boundary. 4.11. Lower Eyelid Semi-parabolic And Full-parabolic Parameterization Unfortunately, the method used to identify the two upper eyelid possibilities cannot accurately be applied to the lower eyelid. The lower eyelid generally has low contrast with the scleral region; further, the lower lid contains very little eyelash region. For this reason, none of the edge locations specified in the lower eyelid possibility table can be discarded until after each one has been parameterized to the semi-parabola level, and a goodness-of-fit calculation has been done to choose the best edge. Each edge-start specified in the lower eyelid parameterization table is followed for a distance of iridal radius/3 in either direction, using the method previously specified for the upper eyelid edges. These edge points are then used in 89. least squares minimization calculations to find the slope at each edge start. Each slope and edge start coordinate pair is then used to apply the lower eyelid template to the image data. The lower eyelid parameterization algorithm consists of two stages: a coarse semi-parabola fit, and a fine-scale full parabola search. The coarse search is performed on each edge start. This search is performed using either the L3 or L4 regions eyelid filter regions, chosen so that lower eyelid region is adjacent to the parameterized scleral region. Regions L3 and L 4 are defined by their curvature parameters a2 and a3, and by their thickness parameter /. The division between the L3 and L 4 regions is initially defined to be centered at their parabolic valley. 
Lower Eyelid - max(I(x, y) ® Fu/4 u LO(X ', y', a2/a3)) (23) The coarse search, Equation 23, consists of varying the template parameters on a search space around the initial template parameters. The template starting point is defined similarly to the upper eyelid calculations. The coarse search also includes the L 0 template region in its correlation sum in order to ensure centering around the sclera of the upper and lower eyelid parameterizations. The lower eyelid filter regions are applied in a manner differing from the upper eyelid regions. During the initial search, the L3/4 filter region is divided into parabolic subregions as shown in Figure 48. Each subregion is used to calculate the a-trimmed average intensity of the corresponding image area. These averages are then differenced with their neighbouring regions, and the sum of these differences used to indicate the homogeneity of the region. The region homogeneity is lowest 90 (highest sum) when the region is surrounding the lid edge. The region sum is calculated in this manner because an edge with adjacent homogenous regions is not usually present: sparse eyelash interspersed with a low contrast skin/scleral boundary. The maximum sum calculated as the L3/4 regions traverses the initial search space, which is defined around each edge of the lower eyelid possibility table, is used as the initial semi-parabola parameterization for this edge. Iridal Boundary As in the upper eyelid parameter search space, the parabolic curvature parameter, a, is bounded based on expected maximum and minimum aspect ratios for the lower eyelid edge. As stated previously, we expect the intersection of the upper and lower eyelid regions to be at an average distance of 2/3 the iridal radius from the lower eyelid minima along the e2 axis. Based on this assumption, the aspect ratio bounds used for the upper eyelid edge are adjusted by 2/3 as follows. The center of A e Figure 48: Lower Eyelid Filter Subregions 91 the curvature parameter search region is changed from 11/2:1 for the upper eyelid boundary, to 11/3:1 for the lower eyelid boundary. The 3:1 minimum ratio becomes 3:2/3 and the 7:1 maximum ratio becomes 7:2/3. The initial lower eyelid search region is therefore defined as described in Equation Group 24. a = 9/(121radiridai) ± 60% traversed at increments of 3% (24) slope = initial edge slope ± 10% traversed at increments of 1% (xc,yc) = initial (xc,yc) ± radiridai/5 traversed at increments of radiridai/10 The ±60% boundary on the a parameter allows the aspect ratio to vary between 3:2/3 and 7:2/3, which are reasonable limits on eye shape as shown in Figure 45. Once the initial parameterization search region has been traversed for each edge, the parameterization of each edge is used in a goodness-of-fit calculation to determine the best lower lid parameterization. The goodness-of-fit measure, Equation 25, is calculated using two regions: the L 3 / 4 region and the Lo region. The edge detected using the L 3 / 4 region is combined with a scleral calculation performed in the Lo region; the Lo region is defined to be the side of the iris used for the edge calculations, and bounded at the top by the parameterized upper eyelid. The a-trimmed average intensity within this region is calculated and used to ensure that the area enclosed by the lower lid/upper lid combination is likely just sclera. Lower Lid Half Parabola = max(cl L3/4 + c2 Lo) (25) Where the constants cl and c2 are used to normalize the high scleral average with the low edge intensity. 
Two regions are used to parameterize the lower lid because the lower lid can sometimes have two distinct curvatures. The division between the L3 and L4 regions is determined by locating the point along the parameterized region where the edge intensity drops below 50% of the L3/4 average. This point is located by moving along the parabolic peak in both directions and calculating the average intensity of the subregions shown in Figure 48. The contiguous block of 3 subregions closest to the parameterized edge side and below the 50% threshold marks the division between the L3 and L4 regions. If a block below this threshold is not found within 2/3 the width of the semi-parabola region on either side of the peak, then a division between the regions is not made and a single parabola is used to characterize the L3 and L4 region. If this division point is found, the slope is calculated there using the algorithms described previously, and a second half-parabola fit is performed for the remaining edge. For example, if the L3 region was used for the first parameterization, then the L4 region will now be used for the second half parabola. Two regions are therefore used only if necessary.

Once the initial semi-parabola fits have been performed, a fine half-parabola fit is then performed over each region. If only one region was necessary, then a fine-scale full parabola fit is performed. The search space is defined for each region as shown in Equation Group 26:

    a = a_half_parabola ± 10%, traversed in increments of 1%    (26)
    slope = slope_half_parabola ± 5%, traversed in increments of 0.5%
    (xc, yc) = (x_half_parabola, y_half_parabola) ± iridal radius/10, traversed in increments of iridal radius/20

Where two curvature parameters, a_half_parabola, are used if the region division has been made. Using these search regions, the best fits are found for both of the semi-parabola edges (or for a single full parabola region) based on the maximum of the region calculations. The lower lid parameterization has now been completed.

4.12. Goodness-of-fit Calculation

The overall goodness-of-fit parameter for the completed parameterization is defined as the non-weighted sum of the individual goodness-of-fit parameters from the subsection parameterizations. This sum is described by Equation 27. The negative terms in this equation provide a differential sum with matching positive terms.

    Fit = R0 − R0_homo − R1 − R2 − R3 − R4 + R5 + R6 + R7 + R8 + L0 + L1 − L2 + L3 + L4    (27)

Each term in this equation has been described in previous sections. The sum is compared against a threshold, Th_fit, and used to determine whether the parameterization vector represents an accurate parameterization of the features of the eye in the chosen Eye-event region. If the sum is equal to or above Th_fit, the system moves on to the Tracking algorithms to continue parameterization of the eye features in subsequent image frames. If the sum is less than Th_fit, the system returns to the Motion Segmentation algorithms to search for new Eye-event regions in subsequent image frames. The completed feature parameterization of the eye in the chosen Eye-event region is shown in Figure 49. This parameterization of the eye features has a goodness-of-fit sum greater than Th_fit, enabling the system to move on to the Tracking algorithms.

Figure 49: Parameterization Of Iridal/Scleral Boundary, Upper Eyelid, And Lower Eyelid
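Taking the reconstruction of Equation 27 above at face value, the acceptance test reduces to a signed sum of the region scores compared against Th_fit. The C sketch below is purely illustrative; the struct fields, names, and enum are not taken from the thesis code.

```c
/* Individual region goodness-of-fit scores from the parameterization
 * (names follow the reconstruction of Equation 27, not the thesis code).   */
typedef struct {
    double R0, R0_homo, R1, R2, R3, R4, R5, R6, R7, R8;  /* iridal/scleral  */
    double L0, L1, L2, L3, L4;                           /* eyelid terms    */
} RegionScores;

/* Differential goodness-of-fit sum of Equation 27. */
double overall_fit(const RegionScores *s)
{
    return s->R0 - s->R0_homo - s->R1 - s->R2 - s->R3 - s->R4
         + s->R5 + s->R6 + s->R7 + s->R8
         + s->L0 + s->L1 - s->L2 + s->L3 + s->L4;
}

/* Mode decision: move on to Tracking if the sum clears Th_fit, otherwise
 * fall back to Motion Segmentation and look for a new Eye-event.           */
typedef enum { MODE_MOTION_SEGMENTATION, MODE_TRACKING } NextMode;

NextMode accept_parameterization(const RegionScores *s, double th_fit)
{
    return (overall_fit(s) >= th_fit) ? MODE_TRACKING : MODE_MOTION_SEGMENTATION;
}
```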
5. TRACKING ALGORITHMS

The Tracking algorithms developed in this thesis are used for two main purposes: the tracking of the iridal region and the tracking of the eyelid boundaries. These eye region features provide the information required by the 2-D and 3-D eye models in determining the point-of-gaze of the imaged subject. The Tracking algorithms use the last known feature locations and feature motion characteristics to quickly locate the features' new positions. Also utilized by these algorithms are the average intensities of the skin, hair, and scleral regions calculated for the 2-D eye and face models during the parameterization algorithms. The Tracking algorithms calculate a goodness-of-fit parameter, which is used to determine when eye track has been lost. In 9 out of 10 trials, the Tracking algorithms described in this thesis maintained lock on the eye for longer than 15 seconds. The tracking algorithms are outlined in Figure 50.

The first task of the Tracking algorithms is to find the iridal/scleral boundary in an image frame following the one used in the previous Parameterization algorithms. The previously calculated parameterization vector is used as a starting point for this template matching. The deformable template used in this parameterization is identical to the one used by the Parameterization algorithm, PI(x,y,r,el,t). The next task performed by the Tracking algorithm is the parameterization of the upper eyelid followed by the lower eyelid, using a deformable template similar to EL(x,y,a,sl,t). Finally, a goodness-of-fit calculation is performed to see whether an accurate eye feature parameterization has been achieved. If it has, the Point-of-gaze Detection algorithm described in Chapter 6 is run, a new image is sampled, and the Tracking algorithm repeats, using the just-calculated feature parameters (the parameterization vector) as the starting point for a new search. If an accurate parameterization has not been achieved, the system returns to the Motion Segmentation algorithm to search for a new Eye-event.

The main difference between the Tracking algorithms and those performed by the Parameterization section is the speed of execution. When the Tracking algorithms are used in the C40 system, they must be able to perform an eye feature parameterization within 66 ms in order for the system to track eye movements at 15 fps. The Parameterization algorithms described in Chapter 4 can take many times longer than this without impacting system performance. For the purposes of the SPARC 5 simulation, no significant (under 200-300 ms) real-time constraints are placed on the execution speed of the Parameterization algorithms. However, when executing in the C40 system, the Parameterization algorithms are required to complete in under two frames (132 ms) to ensure that the eye features do not change significantly from the image being parameterized. In order not to lose the Eye-event region during this time period, the C40 system runs a coarse iridal parameterization in parallel with the Parameterization algorithm. Therefore, when the Parameterization algorithm completes, the Tracking algorithm is able to reposition the accurately parameterized eye features over the most current iridal center.

Figure 50: Flowchart Overview Of Tracking Algorithm (find the iridal position, then the upper and lower eyelid positions, in the new frame; run point-of-gaze detection; acquire the next image and repeat)
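The per-frame cycle of Figure 50 can be summarized as a short loop. The sketch below is illustrative C, not the thesis or C40 code: every function is a placeholder for the corresponding stage described in this chapter, and the parameterization vector is reduced to a handful of fields.

```c
#include <stdbool.h>

/* Simplified parameterization vector: iris circle plus one parabola per lid. */
typedef struct {
    double iris_x, iris_y, iris_r;
    double upper_a, upper_xc, upper_yc;
    double lower_a, lower_xc, lower_yc;
} ParamVector;

typedef struct Image Image;   /* opaque frame buffer type */

/* Placeholders for the stages described in this chapter. */
extern Image      *acquire_image(void);
extern ParamVector track_iris(const Image *frame, ParamVector prev);
extern ParamVector track_upper_eyelid(const Image *frame, ParamVector pv);
extern ParamVector track_lower_eyelid(const Image *frame, ParamVector pv);
extern double      goodness_of_fit(const ParamVector *pv);
extern void        point_of_gaze(const ParamVector *pv);

/* One tracking cycle per frame; returns false when track is lost and the
 * system must return to the Motion Segmentation algorithms (Mode 1).        */
bool tracking_loop(ParamVector pv, double th_fit)
{
    for (;;) {
        Image *frame = acquire_image();

        pv = track_iris(frame, pv);           /* coarse + fine iridal search */
        pv = track_upper_eyelid(frame, pv);
        pv = track_lower_eyelid(frame, pv);

        if (goodness_of_fit(&pv) < th_fit)
            return false;                     /* track lost                  */

        point_of_gaze(&pv);                   /* Chapter 6                   */
    }
}
```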
5.1. Iridal Region Tracking

The iridal region is tracked by utilizing the same filter used in its original parameterization. This tracking is done in two stages: a coarse search over a limited parametric space, followed by a fine search over a smaller but more complete (utilizing more parameters) parametric space. Using the last known iris location, a search region is defined as shown in Figure 51. The size of this region is based on the speed of the eye and head motion we require the algorithm to track. Based on the saccadic motion calculations of Chapter 3, Subsection 5, the maximum eye motion we will track between two adjacent frames is a 34° saccade. This results in a motion region which is square with sides 3/2·r_iridal. In order to account for minor horizontal head motions and rotations, we double this region in the e1 direction, allowing for head motions on the order of a saccade during any frame period (66 ms). The vertical direction need not be increased, as saccades tend to be smaller in this direction and thus the 3/2·r_iridal region height should be enough to account for both saccadic and small head motions. Large head motions will be tracked, but their component of velocity in the image plane must not exceed this region boundary within any single frame period.

Figure 51: Coarse Iridal Filter Location Search Space

The image region is traversed using a fairly coarse sampling density; points are spaced at intervals of 1/5 the iridal radius along the e1 axis and 3/14 the iridal radius along the e2 axis. At each of these points the iridal filter is applied in a limited sense to determine the need for full filter application on the image data. The limited application entails calculating the α-trimmed average intensity within an image area defined by the small circular iridal/pupil template region, R0a, defined in Chapter 4, Subsection 2 and shown in Figure 23. If the average intensity of this region is found to be lower than the maximum pupil intensity threshold, I_pupil_max, then the region becomes a possible candidate for the center of the pupil, and the filter is applied further. The next stage makes use of the R0, R1, R2, R5, and R6 filter regions; however, only the iridal radius parameter is varied at each candidate location. The search region is centered about the last known iridal center, (x_initial, y_initial). This coarse iridal template deformation is described by Equation Group 28.

    If ∫∫_R0a I(x, y) dx dy < I_pupil_max, then C(x, y, r, el) = I(x, y) ⊗ PI(x, y, r, el)    (28)
    S_coarse_search: (xc, yc) = x_initial ± 3·r_iridal/2, traversed at increments of r_iridal/5;
                     y_initial ± 3·r_iridal/4, traversed at increments of 3·r_iridal/14
    S_coarse_radius: 0.8·r_last_fit ≤ r ≤ 1.2·r_last_fit, traversed at increments of r_last_fit/10

The elongation in this portion of the tracking algorithm is set to 1, and the region separation parameter, t, is set according to the last known iridal radius, at r_last_fit/5. Once the filter has been evaluated throughout the search region, the best filter match, the largest C(x,y,r,el), is selected as the tentative iridal match.
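A sketch of this coarse stage (Equation Group 28) is given below. It is illustrative only: the iris filter correlation PI is abstracted behind a callback, and the radius used for the small pupil pre-test patch (R0a) is an assumption, since that region is defined back in Chapter 4.

```c
typedef struct { double x, y, r, score; } IrisMatch;

/* Mean intensity inside a small circular patch centred at (cx, cy). */
static double patch_mean(const unsigned char *img, int w, int h,
                         double cx, double cy, double radius)
{
    double sum = 0.0;
    long count = 0;
    for (int y = (int)(cy - radius); y <= (int)(cy + radius); y++)
        for (int x = (int)(cx - radius); x <= (int)(cx + radius); x++) {
            if (x < 0 || y < 0 || x >= w || y >= h) continue;
            double dx = x - cx, dy = y - cy;
            if (dx * dx + dy * dy > radius * radius) continue;
            sum += img[y * w + x];
            count++;
        }
    return count ? sum / count : 255.0;   /* empty patch: never passes test */
}

/* Iris filter correlation, standing in for PI(x,y,r,el); caller-supplied. */
typedef double (*IrisFilterFn)(const unsigned char *img, int w, int h,
                               double cx, double cy, double r);

/* Coarse iridal tracking: scan a 3r x 3r/2 window around the last iris
 * centre; apply the full filter only where the small central patch is dark
 * enough to be pupil.  Patch radius last_r/4 is an assumption.             */
IrisMatch coarse_iris_track(const unsigned char *img, int w, int h,
                            double last_x, double last_y, double last_r,
                            double pupil_max, IrisFilterFn filter)
{
    IrisMatch best = { last_x, last_y, last_r, -1e30 };
    double step_x = last_r / 5.0;
    double step_y = 3.0 * last_r / 14.0;

    for (double cy = last_y - 0.75 * last_r; cy <= last_y + 0.75 * last_r; cy += step_y)
        for (double cx = last_x - 1.5 * last_r; cx <= last_x + 1.5 * last_r; cx += step_x) {
            if (patch_mean(img, w, h, cx, cy, last_r / 4.0) > pupil_max)
                continue;                      /* too bright to be a pupil */
            for (double r = 0.8 * last_r; r <= 1.2 * last_r; r += last_r / 10.0) {
                double s = filter(img, w, h, cx, cy, r);
                if (s > best.score) {
                    best.x = cx; best.y = cy; best.r = r; best.score = s;
                }
            }
        }
    return best;
}
```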
A fine filter correlation is now performed within a small parameter space surrounding this new match. The search space is shown in Figure 52. The fine-scale iridal parameterization search space is defined by Equation Group 29.

    (xc, yc) = (x_initial, y_initial) ± r_initial/2, traversed at increments of r_initial/10
    r_pupil/iris = r_initial ± 20%, traversed at increments of 5%    (29)
    el_pupil/iris: 1 ≤ el_pupil/iris ≤ 1.5, traversed at increments of 0.1

Where r_initial is the filter's minor axis (e1 axis) radius, and (x_initial, y_initial) are the iridal center coordinates found during the coarse iridal search. All filter parameters are varied, with the exception of t, which is set to r_last/10, and the best fit is selected. This fit is then verified using the goodness-of-fit function previously described for the iris. If the fit is rejected, then the algorithm signals that it has lost track and ends. Otherwise, the tracking continues with the next frame.

Figure 52: Fine Scale Iridal Filter Search Space

5.2. Eyelid Tracking

The eyelid tracking is performed using an eyelid filter similar to that used in the eyelid parameterization algorithm. The filter is simplified for speed by making the lower lid region a single parabola. The upper and lower lid regions are positioned by points defined at their peaks, as shown in Figure 53. As with the iridal tracking, eyelid tracking uses a two-stage algorithm: coarse and fine. However, unlike the eyelid parameterization algorithm, the tracking algorithm does not base its parabola on an initial slope value calculated at an edge location. The coarse matching is performed over a small parameter space defined around the last known eyelid parameters. The size of the image region covered in this search depends on the maximum speed of eye and head motion we expect to track and on the scale of the parameterized region. The coarse search region for the eyelid search is defined to be a rectangle with dimensions one half those of each eyelid's bounding rectangle. These regions therefore allow for motions on the order of half the eye region during any single frame period. If the actual motion exceeds this region, then the track will be lost and the algorithm ended. Based on the saccadic motion calculations of Chapter 3, Subsection 5, the maximum eye motion we will track between two adjacent frames is a 34° saccade. This results in a motion region which is square with sides 3/2·r_iridal.

Figure 53: Upper And Lower Eyelid Tracking Templates (upper and lower eyelid filter regions defined by the parabolas y = a_UPPER·(x − x_PU)² + y_PU and y = a_LOWER·(x − x_PL)² + y_PL, positioned at the eyelid center points)

The eyelid parameters varied in both search stages are the upper and lower region curvature parameters, a1 and a2, and the positional parameters X_P1, Y_P1, X_P2, and Y_P2. The bounds placed on these parameters are identical to those described in the Parameterization section, with one exception: the upper and lower iridal edge is not tracked and therefore does not place a constraint on the eyelid parabolas. This constraint is, however, implicit in the tracking of the eyelid boundaries. The region curvature parameters are varied during the coarse search by ±10% of the last known parameter value using a step size of 2% of this value; a sketch of this coarse sweep for a single eyelid is given below.
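The sketch below shows one way the coarse sweep could be enumerated for a single eyelid parabola. It is an illustration under stated assumptions rather than the thesis code: the scoring callback stands in for the eyelid filter regions, and the half-bounding-rectangle window and its position step are passed in by the caller.

```c
typedef struct { double a, xp, yp; } Parabola;   /* y = a*(x - xp)^2 + yp */

/* Filter response for one eyelid parabola; supplied by the caller. */
typedef double (*LidScoreFn)(const unsigned char *img, int w, int h,
                             const Parabola *p);

/* Coarse tracking stage for a single eyelid: curvature varied by +/-10% of
 * its last value in 2% steps, the peak swept over a window half the size of
 * that eyelid's bounding rectangle.  pos_step must be positive.             */
Parabola coarse_lid_track(const unsigned char *img, int w, int h,
                          Parabola last, double win_w, double win_h,
                          double pos_step, LidScoreFn score)
{
    Parabola best = last;
    double best_s = score(img, w, h, &last);

    for (int c = -5; c <= 5; c++)                        /* 2% curvature steps */
        for (double dy = -win_h / 2; dy <= win_h / 2; dy += pos_step)
            for (double dx = -win_w / 2; dx <= win_w / 2; dx += pos_step) {
                Parabola p = last;
                p.a  *= 1.0 + 0.02 * c;
                p.xp += dx;
                p.yp += dy;

                double s = score(img, w, h, &p);
                if (s > best_s) { best_s = s; best = p; }
            }
    return best;
}
```

The fine stage described next is the same kind of sweep with tighter bounds (curvature ±2% in 1% steps, an r_iridal × r_iridal window stepped at r_iridal/10).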
The fine-scale search space is defined with the parabolic center points moving in a region r_iridal by r_iridal centered about the coarse search result's center points, traversed at an interval of r_iridal/10 along both axes. The curvature parameters are varied by ±2% of the last known parameter value using a step size of 1% of this value.

5.3. Goodness-Of-Fit Calculation

After the fine-scale template matching is completed, the eye feature parameterization is evaluated using a goodness-of-fit calculation. The calculation uses a sum similar to that used by the parameterization algorithm (see Equation 27). The only difference is the combining of the L3 and L4 regions into a single region, L3/4, due to the single-parabola representation of the lower eyelid by the Tracking algorithm. If the goodness-of-fit sum is greater than or equal to Th_fit, the Tracking algorithm calculates the point-of-gaze of the imaged subject using the current eye feature parameterization. A description of the Point-of-gaze algorithm and results from its use are given in Chapter 6 and Chapter 7. Following the Point-of-gaze calculation, the Tracking algorithm retrieves the next image in the sequence and begins parameterizing the new eye features using the previous results as a starting point. If the sum of the goodness-of-fit calculation is less than Th_fit, the system returns to the Motion Segmentation algorithms to search for an Eye-event in the subsequent images.

6. POINT-OF-GAZE DETERMINATION

Once the eye has been parameterized, the 3-D eye model is used to calculate the current image's point-of-gaze. Seven parameters are needed for this calculation: the three coordinates of the scleral sphere center, the three coordinates of the iridal region center, and the distance from the scleral center to the target along a normal to the coronal plane. These parameters are then applied to the 3-D eye model shown in Figure 54, and the point-of-gaze determined.

The parameterized eyelids are used to estimate the scleral sphere parameters. Since we have shown in Chapter 3, Subsection 5 that the corner-to-corner width of the eye is a good estimate of double the scleral radius, we use this width to estimate the distance along the optical axis (the gaze direction) between the iris and the scleral center. We propose that the eyelids can be thought of as two straight-edged pieces of skin pulled across the eye, and as such, the opening between the two will center itself on the scleral sphere, ignoring the effects of the corneal sphere. The center of mass of the opening will therefore provide a good estimate of the scleral sphere center when the coronal and imaging planes are near parallel. The validity of this assumption will be tested by the results to come.

A second method for calculating the center of the scleral sphere could also be applied. This method uses a line between the intersection points of the upper and lower eyelids to derive the y coordinate of the center, and the center of this line to derive the x coordinate of the center. The radius of the scleral sphere would still be derived as one half the distance across the eye. In this thesis, we use the first method of estimating the scleral sphere center.
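A numerical sketch of this first method is given below: the two eyelid parabolas are intersected to find the eye corners, half the corner-to-corner width is taken as the scleral radius, and the centroid of the enclosed opening is evaluated by sampling columns. The parabola form and the convention that image y grows downward are assumptions of the sketch, not details taken from the thesis.

```c
#include <math.h>

typedef struct { double a, xp, yp; } Parabola;   /* y = a*(x - xp)^2 + yp */
typedef struct { double x, y, r; }   ScleralEstimate;

ScleralEstimate scleral_from_eyelids(Parabola upper, Parabola lower)
{
    /* Corner x coordinates: a_u(x-xu)^2 + yu = a_l(x-xl)^2 + yl, a quadratic. */
    double A = upper.a - lower.a;
    double B = -2.0 * (upper.a * upper.xp - lower.a * lower.xp);
    double C = upper.a * upper.xp * upper.xp - lower.a * lower.xp * lower.xp
             + upper.yp - lower.yp;
    double disc = B * B - 4.0 * A * C;

    ScleralEstimate est = { 0.0, 0.0, 0.0 };
    if (fabs(A) < 1e-12 || disc < 0.0) return est;   /* degenerate input */

    double x1 = (-B - sqrt(disc)) / (2.0 * A);
    double x2 = (-B + sqrt(disc)) / (2.0 * A);
    if (x1 > x2) { double t = x1; x1 = x2; x2 = t; }
    est.r = 0.5 * (x2 - x1);                 /* half the corner-to-corner width */

    /* Centroid of the opening between the lids, by sampling columns. */
    double area = 0.0, mx = 0.0, my = 0.0;
    int n = 200;                             /* number of column samples        */
    for (int i = 0; i < n; i++) {
        double x  = x1 + (x2 - x1) * (i + 0.5) / n;
        double yu = upper.a * (x - upper.xp) * (x - upper.xp) + upper.yp;
        double yl = lower.a * (x - lower.xp) * (x - lower.xp) + lower.yp;
        double hgt = fabs(yl - yu);          /* column height of the opening    */
        area += hgt;
        mx   += hgt * x;
        my   += hgt * 0.5 * (yu + yl);
    }
    if (area > 0.0) { est.x = mx / area; est.y = my / area; }
    return est;
}
```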
The coordinates of the center of the iris have already been parameterized within the imaging plane, and by combining these results with the estimated scleral center and radius, a ray along the optical axis can be calculated as shown in Figure 54.

Figure 54: Determination Of Point-Of-Gaze Using Scleral And Iridal Centers

The camera must be located at known real-world coordinates, and the transformation from 3-D real-world coordinates to 2-D pixel coordinates must be known. For a camera aligned so that its optical axis is coaxial with that of our real-world coordinate system, with the x and y axes defined identically for the real world and the image plane, the x and y image coordinates relate to the x, y, and z real-world coordinates according to Equation Group 30.

    x_2D = λ·x_3D / z_3D,    y_2D = λ·y_3D / z_3D    (30)

Where λ is the camera focal length. This allows one to make the transformation from the image to the physical world.

7. SYSTEM IMPLEMENTATION AND RESULTS

7.1. Matlab Implementation And Test Results

The Matlab implementation of the previously described algorithms was performed on a SPARC 5. This implementation was, however, not real-time critical: the algorithms were run on image sequences without a 66 ms (15 fps) constraint on each stage's processing time. The images and results shown in the previous sections are from the Matlab implementation of the Location and Parameterization algorithms.

Our point-of-gaze detection algorithm is demonstrated in this thesis on two test subjects. The first subject, a female, was tested by having her sit in front of the imaging camera and look at a series of three targets for a minimum of 1 second per target. The second subject, a male, was tested by having him sit in front of the imaging camera and look at a series of nine targets for a minimum of 1 second per target. The female subject is representative of an average-difficulty subject for point-of-gaze determination: she has good lower eyelid contrast and even-length, solid eyelash regions. The male subject is representative of difficult subjects for point-of-gaze determination: he has less lower eyelid contrast, a less parabolic eyelid shape, and jagged, non-uniform eyelash regions. The gaze accuracies calculated for this subject should be a good representation of an outer accuracy bound for the system described in this thesis. The three targets were arranged as shown in Figure 55 and the nine targets as shown in Figure 59. The target positions were fixed, and the distance from the camera to the person known. The focal length of the camera was entered into the system to allow the transformation between image coordinates and real-world coordinates to be performed.

Figure 55: Three Target Arrangement

The accuracy of the Tracking and Point-of-gaze Determination algorithms was tested on each subject by only allowing the Location algorithm to be performed once, at the beginning of an image sequence. All subsequent parameterizations are the result of the Tracking algorithm following the eye through the sequence of images.
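The transformation of Equation Group 30, together with its inverse at the known subject distance, is all the test setup needs from the camera model. A minimal sketch, assuming the standard pinhole form with focal length λ:

```c
typedef struct { double x, y; }    Point2D;
typedef struct { double x, y, z; } Point3D;

/* Perspective projection of Equation Group 30: the camera optical axis is
 * the world z axis and lambda is the focal length.                          */
Point2D project_to_image(Point3D p, double lambda)
{
    Point2D q = { lambda * p.x / p.z, lambda * p.y / p.z };
    return q;
}

/* Inverse mapping used in the gaze tests: an image point is lifted back into
 * the world once the depth z (the known camera-to-subject distance) is given. */
Point3D backproject_to_world(Point2D q, double z, double lambda)
{
    Point3D p = { q.x * z / lambda, q.y * z / lambda, z };
    return p;
}
```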
The results of the eye-gaze calculations for the first subject are shown in Table 1. These results were derived using the eyelid parameterization's center of mass to estimate the scleral center.

    Image    | Real World Scleral Center | X Coordinate Target Position | Y Coordinate Target Position | X Error | Y Error
    Target 1 | (-1.2, 0.5, 80.0)         | 2.1 cm                       | 1.0 cm                       | 2.1 cm  | 1.0 cm
    Target 2 | (-6.1, -1.2, 80.0)        | -33.5 cm                     | 2.2 cm                       | 2.0 cm  | 2.2 cm
    Target 3 | (-7.9, 0.3, 80.0)         | 36.1 cm                      | 1.5 cm                       | 1.1 cm  | 1.5 cm

Table 1: Three Target Point-of-gaze Calculation Results

Using the radial distance from the scleral center to the selected target, the point-of-gaze error can be quantified as an angular error about the x and y axes. The results obtained from the three target gaze-points show an error of less than or equal to 1.19° along the x axis and an error of less than or equal to 1.58° along the y axis. These results are shown in Table 2.

    Image    | Radial Distance from Scleral Center to Target | X Error | Y Error | X Angular Error | Y Angular Error
    Target 1 | 80.0 cm                                       | 2.1 cm  | 1.0 cm  | 1.51°           | 0.72°
    Target 2 | 87.3 cm                                       | 2.0 cm  | 2.2 cm  | 1.19°           | 1.58°
    Target 3 | 87.3 cm                                       | 1.1 cm  | 1.5 cm  | 0.66°           | 1.08°

Table 2: Three Target Point-of-gaze Error Evaluation

The images used for the previous calculations are shown in Figures 56, 57, and 58. The results of the parameterization of the eye chosen by the Location algorithm are shown overlaid on each image.

Figures 56, 57, And 58: Subject Looking At Targets 1, 2, And 3

The accuracy of the Tracking and Point-of-gaze Determination algorithms was also tested using a nine target array. The subject and targets were arranged as shown in Figure 59. As with the first subject, the Location algorithm was only performed at the start of the image sequence. All subsequent parameterizations are the result of the Tracking algorithm following the eye through the sequence of images. The results of the nine target gazes from the subject's left eye are shown in Table 3. These results were derived using the eyelid parameterization's center of mass to estimate the scleral center.

Figure 59: Nine Target Configuration

    Image    | Real World Scleral Center | X Coordinate Target Position | Y Coordinate Target Position | X Error  | Y Error
    Target 1 | (2.9, 3.6, 79.7)          | 2.9 cm                       | 3.6 cm                       | 2.9 cm   | 3.6 cm
    Target 2 | (2.8, 2.9, 79.7)          | -52.7 cm                     | 29.8 cm                      | 8.3 cm   | -8.2 cm
    Target 3 | (2.5, 2.6, 79.7)          | 3.1 cm                       | 30.9 cm                      | 3.1 cm   | -7.1 cm
    Target 4 | (3.8, 3.3, 79.7)          | 52.6 cm                      | 32.2 cm                      | -8.4 cm  | -5.8 cm
    Target 5 | (2.9, 2.3, 79.7)          | -53.3 cm                     | 4.1 cm                       | 7.7 cm   | 4.1 cm
    Target 6 | (2.3, 3.0, 79.7)          | 62.0 cm                      | 6.7 cm                       | 1.0 cm   | 6.7 cm
    Target 7 | (2.9, 3.6, 79.7)          | -48.1 cm                     | -6.1 cm                      | 12.9 cm  | 31.9 cm
    Target 8 | (2.8, 3.4, 79.7)          | -7.1 cm                      | -18.1 cm                     | -7.1 cm  | 19.9 cm
    Target 9 | (1.6, -0.7, 79.7)         | 50.3 cm                      | -21.5 cm                     | -10.7 cm | 16.5 cm

Table 3: Nine Target Point-of-gaze Calculation Results

Using the radial distance from the scleral center to the selected target, the point-of-gaze error can be quantified as an angular error about the x and y axes. The results obtained from the nine target gaze-points show an error of 4.01° or less along the x axis and 4.99° or less along the y axis for the targets above the scleral center (targets 2, 3, and 4) and for the targets at the same elevation as the scleral center (targets 1, 5, and 6). The results obtained from the targets below the scleral center (targets 7, 8, and 9) show an increased maximum error of less than or equal to 5.41° along the x axis and less than or equal to 15.76° along the y axis. These results are shown in Table 4.

    Image    | Radial Distance from Scleral Center to Target | X Error  | Y Error | X Angular Error | Y Angular Error
    Target 1 | 79.8 cm                                       | 2.9 cm   | 3.6 cm  | 2.08°           | 2.59°
    Target 2 | 108.1 cm                                      | 8.3 cm   | -8.2 cm | 3.58°           | -4.99°
    Target 3 | 87.2 cm                                       | 3.1 cm   | -7.1 cm | 2.23°           | -4.30°
    Target 4 | 104.1 cm                                      | -8.4 cm  | -5.8 cm | -4.01°          | -3.49°
    Target 5 | 102.2 cm                                      | 7.7 cm   | 4.1 cm  | 3.33°           | 2.94°
    Target 6 | 99.0 cm                                       | 1.0 cm   | 6.7 cm  | 0.45°           | 4.81°
    Target 7 | 110.3 cm                                      | 12.9 cm  | 31.9 cm | 5.41°           | 15.76°
    Target 8 | 89.8 cm                                       | -7.1 cm  | 19.9 cm | -5.09°          | 10.51°
    Target 9 | 106.2 cm                                      | -10.7 cm | 16.5 cm | -5.17°          | 8.87°

Table 4: Nine Target Point-of-gaze Error Evaluation

The results show a good correspondence between actual and calculated gaze position for the upper and middle row targets. Very poor results were obtained for the lower three targets; this is due to the amount of upper lid occluding the iris in these cases, causing the iridal match to suffer in accuracy. A portion of this error can also be attributed to our method for estimating the scleral sphere center: when the eye is gazing at a lower-than-eye-level target, this center is calculated too low. The images used for the previous calculations are shown in Figures 60 through 68. The results of the parameterization of the eye chosen by the Location algorithm are shown overlaid on each image. The results achieved would allow the use of the system to visually position a cursor over a word within a paragraph in a word processing application for words at or above the eye level of the user (assuming 12 point text, a 17" monitor, and the user 30 cm from the screen). However, the system accuracy for targets lower than the eye level of the user is better suited to having the user select from 4-5 choice buttons distributed equally across the lower portion of the monitor.

Figures 60 Through 68: Gaze At Targets 1 Through 9

7.2. C40 Network Implementation And Evaluation

The results shown in the preceding sections were obtained from a simulation run using Matlab on a SPARC 5 workstation. The eye parameterization system described in this thesis was also implemented on a network of six TMS320C40 Digital Signal Processors using the network topology shown in Figure 69. Each link shown is a 32-bit wide, 15 MByte/s parallel communications channel between processors. Each port can be operated through interrupts, providing background DMA transfer of data between processors in the system. Bugs in the nodal communication dealing with mode switching were, however, not removed, and the system was not completed.

Figure 69: C40 Network Topology

The code for each C40 was written in C using the Texas Instruments Optimizing C compiler. C40 Node 1 is connected to a SPARC 2 using a 10 Mbps serial connection. C40 Node 5 contains a frame grabber, which is mapped into its C40's main memory space through dual port RAM, sharing the dual port RAM with the C40 in order to facilitate the transfer of image data. The frame grabber samples images at 30 interlaced frames per second. C40 Node 6 has display memory mapped into its C40's main memory space through dual ported VRAM. Pixel data written into this bank of VRAM is read out by a display DAC and converted to a video signal. The CRT display attached to the system is updated at 60 fields per second, 30 frames per second interlaced. The C40 system implementation described in this thesis was implemented in 1994. It requires approximately 200 MIPS in order to run at 15 fps.
Today, a Pentium II, 300Mhz system is capable through its pipelined architecture and the use of an optimizing compiler to avoid pipeline stalls, of approximately 300mips. Therefore, this eye location, parameterization, and tracking system would be able to run on a single Pentium II system at rates greater than 15fps. 7.2.1. System Modes The system was designed to operate in three distinct modes. The flow between these modes is shown in Figure 70. The first mode is the Motion Segmentation and Initial Parameterization Mode. In this mode possible Eye-events in the image sequence are isolated and analysed. If an Eye-event as defined in this system is confirmed, the segmentation and initial parameterization information is sent to the Parameterization Nodes and the system switches to the second mode. The second mode is the Eye-Parameterization and Iridal Tracking Mode. In this mode, the frame occurring just after the end of the chosen Eye-event is analysed 122 to find the parameters of the 2-D eye model previously discussed. In addition to this, the coarse iridal parameterization calculated during Mode 1 is used to track the iris until the 2-D eye model parameterizations are complete. Once the parameterization is complete, the eye-parameter goodness-of-fit function is calculated and the criteria for eye acceptance are checked. If the criteria are met, then the system switches to the third mode of operation. If the acceptance conditions are not satisfied, the system switches back to the first mode. 123 r Acquire image $ _ Mode 1: Motion Segmentation And Initial Parameterization 5 Mode 2: Eye Parameterization And Iridal Tracking J Confirmed \ ^ ^ features? Yes Acquire image Yes Figure 70: Modal Flow Of Eye Tracking System The third mode of operation is the Iridal Region and Eyelid Tracking Mode. In this mode, the iris and the eyelids are tracked in a pipelined fashion. In other Mode 3: • Iris And Eyelid Tracking 124 words, the iris is tracked on the current frame, while the eyelids are being tracked on the previous frame. This mode ends when the iridal or eyelid parameterizations fail to satisfy an acceptance criteria. When this mode ends, the system switches back to Mode 1. A sample of the screen output of the Eye Tracking system is shown in Figure 71. The four quadrants are, from upper left clockwise: Current Image, Differenced Image, Parameterized Image, Most Recent Eye-events. 50 100 150 200 250 300 350 400 450 500 Figure 71: Sample Screen Output From C40 Eye Tracking System 125 7.2.2. Mode 1: Motion Segmentation and Initial Parameterization Three C40 nodes take part in this mode: the Grabber Node, the Motion Segmentation Node, and the Display Node as shown in Figure 72. Two Parameterization Nodes are also in use, but their function in this mode is to wait for initial parameterization and segmentation information, as well as image data, from the Motion Segmentation Node. Upon reception of this information, these nodes begin the parameterization of eye features. However, shortly after sending this info to the Parameterization Nodes, the Motion Segmentation Node switches the system into Mode 2. Therefore, most of the Parameterization Node processing actually occurs during Mode 2. Frame Grabber Node Image Motion Segmentation Node Image a n d segmentation! 
information Initial Parameterization Node 1 Image Segmen ted i m a g e a n d regional parameters Display Node Initial Parameterization Node 2 Figure 72: Mode 1: Motion Segmentation And Initial Parameterization 7.2.2.1.Frame Grabber Node The Frame Grabber Node serves three main purposes in this mode. The first 126 is the copying of the current image from the frame grabber's dual port memory. This is done after the frame grabber indicates it has written the odd scan lines for the current image; only the odd scan line are copied by the Frame Grabber Node. Since the second grabber function is to reduce the image by a factor of two in both image dimensions, the copying of only odd scan lines accomplishes half of this task, the vertical sub-sampling. The pixel byte data is packed into four byte words in the frame-grabber memory and must therefore be unpacked before the removal of every second pixel in the horizontal dimension. Once the horizontal sub-sampling is done, the pixels are then repacked and sent to the Motion Segmentation Node at its request. This is the grabber's third task. As can be seen in Table 5, the Frame Grabber Node is able to perform its tasks in under 33ms. The Frame Grabber Node is therefore able to work at the full video frame rate of 30 fps. The actual rate at which this node operates is dictated, however, by the Motion Segmentation Node. Function Time Reauired Image Copying, Un-packing, Reduction, and Re-packing 22 ms Imaging Transmission 6 ms Table 5: Frame Grabber Node Timing The Frame Grabber Node during Mode 1 is responsible for the following: • Copying the image from the frame-grabbers dual port R A M • Reducing the image size by sub-sampling by a factor of two along both the horizontal and vertical image axis 127 • Packing byte pixel data into 4 byte word packets • Waiting for the Motion Segmentation Node to signal that it is ready for an image • Sending the image to the Motion Segmentation Node The frame Grabber Node continues grabbing images at the rate dictated by the Motion Segmentation Node (up to the 30 fps maximum) until the Motion Segmentation Node informs the Frame Grabber Node to switch to Mode 2. 7.2.2.2.Motion Segmentation Node The Eye-event location, Motion Segmentation, algorithm described in Chapter 3 is implemented in this node. The Motion Segmentation Nodes signaling to the Frame Grabber Node, requesting a new image, is done immediately following the uncompression (unpacking) of the current image. This is possible because the uncompression routine moves the uncompressed image to a new buffer, thereby freeing up the compressed image buffer for a new image. Three consecutive images may be in memory at any one time, the two being differenced, and the one being received and written to memory by a background D M A task. All image transferring is done through background parallel-port/DMA transfers, thereby not disrupting the main task flow (with the exception of bus delay and port/DMA interrupt handling). The timing requirements of the various sections of the Motion Segmentation algorithm force the system to run at 15fps. The Motion Segmentation Node cannot process images in less than 33ms, forcing the Frame Grabber Node to skip an entire 128 frame, as the previous one is being overwritten once the delay is greater than 33ms. When the Motion Segmentation Node has found two acceptable Eye-event regions, it sends these region parameters, along with the current image, to the Parameterization Node 1. 
At this point, the Motion Segmentation Node informs the Frame Grabber Node to switch to Mode 2 operation. The Motion Segmentation Node then becomes the Iridal Tracker Node 1. Iridal Tracker Node 2 sits dormant during Mode 1 operation, waiting until an image is sent to it (this does not occur until Mode 2). The Motion Segmentation Node performs well, detecting an Eye-event by registering and utilizing small test subject eye-gaze changes of less than 15° or blinks. This allows the system to begin Mode 2 operation usually in under 2 seconds. The Motion Segmentation Node during Mode 1 is responsible for the following: • Accepting images from the Frame Grabber Node • Reduction of images using sub-sampling by a factor of two along both vertical and horizontal image axis to reduce data rate • Creating a difference image sequence by differencing consecutive pairs of reduced images. • Using the previously described motion segmentation algorithm to search for possible Eye-events in the difference image sequence • Sending the current reduced image to the Display Node • Sending the current difference image to the Display Node • Sending confirmed Eye-event regions to both the Parameterization Node 1 and the Display Node 129 • Setting the system to Mode 2 operation upon confirmation of Eye-event regions The values of variables used by this algorithm are set shown in Table 6 follows: Motion Segmentation Variables Value T 133ms Tmin_event 200ms Lsig 1.0 x 104 pixel intensity sum Lmax_motion 1.1 x 106 pixel intensity sum Lmax 3.1 x 106 pixel intensity sum Thel_max 81 pixels The2_max 54 pixels Table 6: Variables Used In The Motion Segmentation Algorithm The timing of the subsections of the Motion Segmentation routines are summarized in Table 7. Motion Segmentation Function Time Required Uncompress (unpack) image 14ms Difference image with previous 12ms Initial segmentation 8ms 130 Secondary segmentation 1ms Coarse pupillary/iridal search 1ms Receive new image from Frame Grabber Node 1ms (Received as background task) Send current image to Display Node 6ms (Sent as background task) Send difference image to Display Node 6ms (Sent as background task) Send Eye-event region data to Parameterization Node 1 and Display Node 1ms Send current image to Parameterization Node 1 6ms (Sent as background task) Total Time 66ms Table 7: Timing Of The Motion Segmentation Algorithms 7.2.2.3.Display Node The Display Node receives image and parameterization information from a number of nodes. This information is used by the Display Node to display four distinct regions of information on the display monitor as shown in Figure 70. The upper left region, area 1, displays the current reduced image. This image is received from different nodes depending on the current mode of operation. For example, in Mode 1 this image data comes from the Motion Segmentation Node. The upper right region, area 2, displays the current difference image received from the Motion Segmentation Node. The lower left region, area 3, displays confirmed eye regions. This image is created by taking the current image at the time the Eye-event information is received by the Display Node, and using it to black out any image area 131 outside the Eye-event regions. The lower right region, area 4, is not used during Mode 1 operation. 7.2.3. 
Mode 2: Eye-Parameterization and Iridal Tracking One set of objectives for Mode 2 is to select one of the Eye-event regions as the best choice for parameterization, to perform a fine-scale parameterization of this eye region according to the parameters of our 2-D eye-model, and finally to decide by calculating a goodness-of-fit function, whether or not an eye was likely parameterized. A parallel objective of Mode 2 is to track the coarsely parameterized iridal/scleral boundary present in the Eye-event chosen for fine scale parameterization. Frame Grabber Node Iridal Tracker Nodel Image Dsday Node Tracking parcmeters Iridal Tracker Node2 Ftrarreterization Nodel Segnenled image and regard paareters taarreferization Node2 Figure 73: Mode 2: Eye-Parameterization and Iridal Tracking Six C40 nodes take part in Mode 2 operation. These are the Frame Grabber Node, two Parameterization Nodes, two Iridal Tracker Nodes, and a Display Node. The general interaction of these nodes is shown in Figure 73. The Iridal Tracking 132 Nodes track the the iridal region until the Parameterization Node 1 signals to the Iridal Tracker Node 1 that the parameterization is complete, also indicating whether or not the parameterized region was likely an eye. Iridal Tracker Node 1 then either sets the system back to Mode 1 in the case of a non-eye parameterization or sets the system to Mode 3. 7.2.3.1.Frame Grabber Node During Mode 2 operation, the grabber receives area-of-interest information region information from Iridal Tracker Node 1, and sends images segmented according to this information to Iridal Tracker Node 2. It also continues to send entire images to the Iridal Tracker Node 1. The entire images are sent because it is the most efficient way to get them to the Display Node. The Frame Grabber Node in Mode 2 is responsible for the following activities: • Copying the image from the frame-grabbers dual port R A M • Reducing the image size by sub-sampling by a factor of two along both the horizontal and vertical image axis • Packing byte pixel data into 4 byte dword packets • Receiving chosen Eye-event region information from the Iridal Tracker Node 1 • Sending the image data to the Iridal Tracker Node 1 • Sending the region of the image data with the Eye-event to the Iridal Tracker Node 2 133 7.2.3.2.Iridal Tracker Nodes 1 And 2 The Iridal Tracker Nodes use the Iridal Region Tracking algorithms as previously described in Chapter 5, Subsection 2. These nodes track the iridal region in parallel by splitting the pupil/iris filter search region equally between themselves. Communication between the Iridal Tracker Nodes consists of search region parameters sent from Node 1 to Node 2, and search results sent from Node 2 to Node 1. The search is therefore coordinated by Node 1. Node 1 signals to the grabber that it is ready to receive the next image by sending the new segmented-image, region parameters to the Frame Grabber Node. At this point, the Frame Grabber Node sends the segmented image to the Iridal Tracker Node 2. It has, however, already sent the new image to Node 1. Because the node will buffer the image until it is requested, it speeds the image receiving time to less than 1ms. 
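The division of labour between the two Iridal Tracker Nodes can be pictured as splitting the search window in half and keeping the better of the two partial matches. The C sketch below is purely illustrative: it splits along the e1 axis (an assumption; the thesis only states that the region is split equally) and evaluates both halves locally rather than over the C40 communication ports.

```c
typedef struct { double x0, x1, y0, y1; } SearchRegion;
typedef struct { double x, y, r, score; } IrisMatch;

/* Search one subregion; on the C40 network this would run on each node. */
typedef IrisMatch (*SubSearchFn)(const unsigned char *img, int w, int h,
                                 SearchRegion region);

/* Split the iridal search window into two equal halves, evaluate each half
 * (conceptually: one locally, one on the peer node), and keep the better
 * of the two matches.                                                      */
IrisMatch parallel_iris_search(const unsigned char *img, int w, int h,
                               SearchRegion full, SubSearchFn search)
{
    double mid = 0.5 * (full.x0 + full.x1);
    SearchRegion left  = { full.x0, mid, full.y0, full.y1 };
    SearchRegion right = { mid, full.x1, full.y0, full.y1 };

    IrisMatch a = search(img, w, h, left);    /* Iridal Tracker Node 1 */
    IrisMatch b = search(img, w, h, right);   /* Iridal Tracker Node 2 */

    return (a.score >= b.score) ? a : b;
}
```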
The Iridal Tracker Node 1 is responsible for the following activities: • Receiving the current image data from the Frame Grabber Node • Defining a search region based on the last known iris position • Dividing the search region into two equal subregions and sending the parameters of one subregion to Iridal Tracker Node 2 • Performing the iridal search algorithm on the subregion not sent • Receiving search results for the sent subregion from Iridal Tracker Node 2 • Finding the best iridal parameterization from the combined subregions • Sending the best iridal match information to the Display Node 134 • Sending the current image to the Display Node The Iridal Tracker Node 2 is responsible for the following activities: • Receiving the segmented image from the Frame Grabber Node • Receiving the search parameter information from the Iridal Tracker Node 1 • Performing the iridal search algorithm using the search parameter information • Sending the search results to Iridal Tracker Node 1 7.2.3.3.Parameterization Nodes 1 And 2 The Parameterization Nodes run the eye parameterization routines described in Chapter 4. The parameterization search regions are divided equally between the two nodes. As the Parameterization Node 2 is working on image regions specified by Parameterization Node 1, only the image region of interest is passed to Parametrization Node 2. The orientation of the head is not utilized by the current parameterization software, and so the head must remain within 15° of vertical to be parameterized accurately. All other aspects of the parameterization algorithms are implemented as previously described. Once the Parameterization Nodes have finished parameterizing the current image, Parameterization Node 1 calculates the overall Goodness-of-fit criteria for the 135 entire parameterization as described in Chapter 4, Subsection 12. The individual Goodness-of-fit inequalities are calculated during the parameterization process, and the parameterization can be aborted at various points should one fail (e.g. after the fine-scale iridal region parameterization, after the fine-scale upper lid parameterization, etc.). If the overall goodness-of-fit calculation indicates an acceptable fit, the Parameterization Nodes are switched into Mode 3 operation. They become the Eyelid Tracker Nodes 1 and 2 respectively. The Parameterization Node 1 sends the parameterization results to Iridal Tracker Node 1, indicating that the system has switched to Mode 3 In this case, the parameterization results, as well as the image on which the parameterization was done, are also sent to the Display Node. If the Goodness-of-fit calculation indicates that an acceptable fit has not occurred, then a message is sent to the Iridal Tracker Node informing it that the system should be switched back to Mode 1 operation. At this point, Iridal Tracker Node 1 informs the Frame Grabber Node to switch to Mode 1 (send images only to the Motion Segmentation Node) and then once again becomes the Motion Segmentation Node. Minimum Goodness-of-fit values are shown in Table 8. Goodness Of Fit Subsection Min. 
Value Pupil 20 Iris 32 Upper Lid 15 136 Lower Lid 10 Min Allowable Total, Thfil: 77 Table 8: Goodness-of-fit Calculation Threshold Parameters The Parameterization Node 1 is responsible for the following activities: • Receiving an image from the Motion Segmentation Node • Dividing the pupil and iris search regions into two equal subregions and sending the search information for one of these subregions to Parameterization Node 2 • Sending the image received from the Motion Segmentation Node to the Parameterization Node 2 • Searching for the pupil and iris in the subregion not sent • Receiving the pupil and iris search information from Parameterization Node 2 • Choosing the best pupil and iris parameterizations from the combined subregions • Parameterizing the average scleral, hair, and skin intensities using the algorithms previously discussed • Splitting the upper and lower eyelid search space into two equal subregions and sending one of the subregions to Parameterization Node 2 • Receiving the upper and lower eyelid search results from Parameterization 137 Node 2 • Choosing the best upper and lower eyelid search results from Parameterization Node 2 • Calculation of the overall goodness-of-fit parameter based on the 2-D eye model parameterizations • Sending the parameterization results and the goodness-of-fit results to Iridal Tracker Node 1 • Sending the parameterization results to the Display Node Parameterization Node 2 is responsible for the following activities: • Receiving the image to be parameterized from Parameterization Node 1 • Receiving the pupil and iris search space from the Parameterization Node 1 • Searching for the pupil/iridal and iridal/scleral boundaries in the received search space • Sending the pupil and iris search results to Parameterization Node 1 • Receiving the upper and lower eyelid search space • Searching for the upper and lower eyelid boundaries in the received search space • Sending the upper and lower eyelid search results to Parameterization Node 1 138 The timing of the Parameterization algorithm subsections is listed in Table 9. Parametrization Subsection Min. Time Required Max.Time Required Send Image To Parameterization Node 1 6ms 6ms Send Image To Parameterization Node 2 1ms 1ms Stage 1 Coarse Pupillary/ Iridal Parameterization 1 ms 12 ms Stage 2 Coarse Pupillary/ Iridal Parameterization 1 ms 5 ms Fine Scale Iridal Parameterization 2 ms 7 ms Scleral Search 0.5 ms 1 ms Hair And Skin Parameterization 0.5 ms 1 ms Coarse Upper Eyelid Parameterization 2 ms 10 ms Coarse Lower Eyelid Parameterization 1 ms 9 ms Fine Upper Eyelid Parameterization 1 ms 8 ms Fine Lower Eyelid Parameterization 1 ms 7 ms Goodness-of-fit Calculation 0.5 ms 0.5 ms Switch To Mode 3 Operation 1 ms 2 ms Total Time 24.5 ms 69.5 ms Table 9: Parameterization Algorithm Subsection Timings 7.2.3.4.Display Node The Display Node does not change modes with the rest of the system. 139 However, as there is no Motion Segmentation Node present during Mode 2, screen area 2, the difference image sequence, and screen area 3, the Eye-event regions, are not updated. The Display Node receives parameterization information and the image used for this parameterization from the Parameterization Node 1. The Display Node then draws the pupil, iris, and eyelid boundaries, as well as the selected scleral region, directly on the image. This modified image is then displayed in screen area 4. 7.2.4. Mode 3: Iridal and Eyelid Tracking Mode 3's purpose is to track the chosen eyes iridal /scleral and eyelid boundaries. 
It consists of six C40 nodes as shown in Figure 74. The Frame Grabber Node, Display Node, two Iridal Tracker Nodes, and two Eyelid Tracker Nodes comprise the Mode 3 system.

Figure 74: Mode 3: Iris And Eyelid Tracking (Frame Grabber, Iridal Tracker 1 and 2, Eyelid Tracker 1 and 2, and Display Nodes, with segmented images and iridal/eyelid parameters passed between them)

During Mode 3, the Frame Grabber Node and Display Node operate exactly as they do during Mode 2 operation. The Iridal Tracking pair run the same algorithm as in Mode 2; their timing results are therefore identical to those during Mode 2. The only difference in operation is Iridal Tracker Node 2 sending the segmented image to Eyelid Tracker Node 2 and the Iridal Tracker Node 1 sending the segmented image and new iris parameterization to the Eyelid Tracker Node 1. Iridal Tracker Nodes 1 and 2 both send the segmented images to their respective Eyelid Tracker Nodes as soon as they receive the image themselves. The Eyelid Tracker Nodes' background parallel port/DMA task buffers the image data until it is requested by the main task. In this way, the apparent image transfer time is reduced from 6 ms down to less than 1 ms.

The Iridal Tracker Nodes use the Iridal Region Tracking algorithm as previously described in Chapter 5, Subsection 1, and also described for Mode 2 operation. The Iridal Tracker Node 1 is responsible for the following activities:
• Receiving the current image data from the Frame Grabber Node
• Defining a search region based on the last known iris position
• Dividing the search region into two equal subregions and sending the parameters of one subregion to Iridal Tracker Node 2
• Performing the iridal search algorithm on the subregion not sent
• Receiving search results for the sent subregion from Iridal Tracker Node 2
• Finding the best iridal parameterization from the combined subregions
• Sending the best iridal match information to the Display Node
• Sending the current image to the Eyelid Tracker Node 1

The Iridal Tracker Node 2 is responsible for the following activities:
• Receiving the segmented image from the Frame Grabber Node
• Receiving the search parameter information from the Iridal Tracker Node 1
• Performing the iridal search algorithm using the search parameter information
• Sending the search results to Iridal Tracker Node 1
• Sending the segmented image to the Eyelid Tracker Node 2

During Mode 3 operation, both the Iridal Tracker Node 1 and the Eyelid Tracker Node 1 perform Goodness-of-fit calculations on each frame's parameterization. If any of these fall below the acceptable minima, Th_fit, specified for the goodness-of-fit calculation and described in Table 8, or below 50% of the best fit of the last 5 frames (to avoid sharp quality transitions), the system is returned to Mode 1 operation. If the Iridal Tracker Node 1 loses a good track, it sends a message to both the Frame Grabber Node and Eyelid Tracker Node 1, informing them to switch to Mode 1 operation; it then itself becomes the original Motion Segmentation Node. If the Eyelid Tracker Node 1 loses its good track, it instructs the Iridal Tracker Node 1 to behave as if it had just lost track, and the system switches to Mode 1 operation. A sketch of this track-loss test is given below.
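The track-loss test just described (a fit below Th_fit, or below 50% of the best fit over the last five frames) can be captured in a few lines. The rolling-history structure below is an illustration, not the data structure used on the C40 nodes.

```c
#include <stdbool.h>

#define FIT_HISTORY 5

/* Rolling record of recent goodness-of-fit values. */
typedef struct {
    double fits[FIT_HISTORY];
    int    count;                 /* valid entries, up to FIT_HISTORY */
    int    next;                  /* circular write index             */
} FitHistory;

/* Returns true if tracking should continue, false if the system must switch
 * back to Mode 1.  Track is dropped when the fit falls below Th_fit or below
 * 50% of the best fit seen over the last five frames.                       */
bool track_still_good(FitHistory *h, double fit, double th_fit)
{
    double best_recent = 0.0;
    for (int i = 0; i < h->count; i++)
        if (h->fits[i] > best_recent) best_recent = h->fits[i];

    bool ok = (fit >= th_fit) &&
              (h->count == 0 || fit >= 0.5 * best_recent);

    /* Record the new value for the following frames. */
    h->fits[h->next] = fit;
    h->next = (h->next + 1) % FIT_HISTORY;
    if (h->count < FIT_HISTORY) h->count++;

    return ok;
}
```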
142 If after the Eyelid Tracker Nodes have finished processing an image, Eyelid Tracker Node 1 determines that the processing took longer than 66ms, the tracker will skip the next image to retain the overall system frame rate. The Eyelid Tracker Nodel is responsible for the following activities: • Receiving the current image from the Iridal Tracker Node 1 • Defining a search region based on the last known eyelid position • Dividing the search region into two equal subregions and sending the parameters of one subregion to the Eyelid Tracker Node 2 • Performing the eyelid search algorithm on the subregion not sent • Receiving the eyelid search results from the Eyelid Tracker Node 2 • Choosing the best eyelid search results from the combined subregion searches • Sending the chosen eyelid parameters to the Display Node • Sending the current image to the Display Node The Eyelid Tracker Node 2 is responsible for the following activities: • Receiving the current segmented image from the Iridal Tracker Node 2 • Receiving the search parameter info from the Eyelid Tracker Node 1 • Performing the eyelid search algorithm within the specified search region • Sending the search results to Eyelid Tracker node 1 During Mode 3 operation, the Display Node behaves identically to its Mode 2 operation. The C40 tracking algorithms performed identically to the SPARC simulation 143 algorithms and were tested at up to 15fps. Eye feature track was maintained as a subject moved slowly, = 10 cm/sec, along the e2 axis, as well as towards and away from the camera at a similar rate. Future work should add to the Display Node functionality, the running of the point-of-gaze detection algorithm described in Chapter 6. The user's distance from the imaging camera could be entered manually into the system, and the point-of-gaze coordinates displayed for a fixed distance target plane. This would allow the accuracy of the C40 system to be analysed quantitatively as was done with the SPARC simulation in addition to the qualitative accuracy analysis presented here. 144 8. SUMMARY CONCLUSIONS AND RECOMMENDATIONS FOR F U T U R E W O R K This thesis has presented research into an eye tracking system. The main focus was on designing a system which could adapt to different users, not requiring manual adjustment for an individual user. Additionally, the system was required to provide freedom of movement for the user. Algorithms for the location of eyes in an image sequence were presented. These algorithms locate eyes using spatio-temporal motion regions to identify Eye-events. Eye-events with a goodness-of-fit parameter above a set threshold are passed onto the parameterization routines. Algorithms for the parameterization of eyes in an image were presented. These algorithms accurately parameterize the iridal/scleral and pupillary/iridal boundaries. They also parameterize the upper and lower eyelid boundaries and the average skin and hair intensities. By utilizing filters designed using a priori knowledge of eye feature shapes and boundaries on these shapes, the Eye-event regions are efficiently parameterized. Modified versions' of the eye parameterization algorithms were presented to track eye motions. The goal of these algorithms is to use previous parameterization vectors to track the incremental eye motions and deformations in an image sequence. Efficiency while maintaining accuracy is the main focus of these algorithms. 
Tracking speed is enhanced by utilizing information about eye motion limits to 145 reduce the parameter search space. These algorithms were developed and tested using Matlab on a Sparc workstation. They were run as three distinct functions: eye location, eye parameterization, and eye tracking/eye-gaze parameterization. The algorithms were then developed into a multi-processor eye tracking system utilizing six Texas Instruments TMS320C40 digital signal processors configured as a parallel processing mesh. The timing results presented in this thesis were produced using the C40s. The algorithms were made to work as well on the C40's as they did using Matlab, however the communication between the nodes was not implemented completely. A future project could be the further analysis of the implementation of the eye tracking algorithms on the C40 network and the elimination of the timing problems between the nodes. The eye-gaze calculation routines were not implemented on the C40 system; this could also be part of a future project. The C40 implementation of the Motion Segmentation algorithms was able to detect an Eye-event utilizing small eye-gaze changes or blinks. This allowed the system to begin Mode 2 operation usually in under 2 sec. The Tracking algorithms implemented on the C40 system were able to track parameterized eye features at up to 15 fps during saccadic eye motion without the requirement for a fixed head position The eye feature Parameterization algorithms gave excellent results, visual inspection of these results show that the algorithms are able to find the eye feature boundaries near the limit of the image resolution. The Eye-gaze parameterization 146 algorithms utilizing these results were however not as accurate as anticipated at some gaze angles. Using the SPARC system, the accuracy of the system's determination of a person's point-of-gaze was compared to the actual target positions, showing an error of 4.01° or less along the horizontal axis and 4.99° or less along the vertical axis for targets at or above the eye level of the subject. The accuracy of the system lowers to 5.41° or less along the horizontal axis and 15.76° or less along the vertical axis for targets below the eye level of the test subject. The accuracy of the results that our system has achieved is lower than the accuracy achieved by some current eye tracking systems [17], [18], [19], [20], and [21]. However, our system requires no prior calibration for individual users and allows a freedom of head motion not found on other systems. Although calibration would have increased the accuracy of our system, its absence provides us with a more robust system. As seen in the parameterization results, the center of the iris is calculated accurately. The position of the eyeball in 3-space is also calculated accurately. The center of the scleral sphere therefore needs to be calculated more accurately as it is appears to be the major reason for the lack of accuracy found in the point-of-gaze calculation routine. In this thesis, the center of the scleral region is calculated using the center of mass and corner to corner distance of the area bounded by the eyelid region. A future project could be the improvement of the calculation of the scleral sphere center One weakness of this thesis is the relatively few subject trials documented. Future work should be done to increase the number of trials performed, increasing the 147 confidence level in the algorithms described. 
Another future optimization would be the use of elliptical segments to parameterize the eyelid boundaries. Since the eyelids can be modeled as straight pieces of skin stretched over a spherical object, they form elliptical intersections with the sphere. Using non-linear least squares fitting, the eyelids could be fit in a manner similar to the parabolic fitting previously described in this thesis. The intersection of the eyelid boundaries with the scleral sphere could be used to give an estimation of its center, as they describe curves which uniquely define the sphere's parameters.
The eye tracking algorithms presented in this thesis assume that a normal to the coronal plane (a plane constructed parallel to the face of the subject) is parallel to a normal to the imaging plane. This assumption places a restriction on the head orientation of the user, since we calculate the center of the scleral sphere by moving one scleral radius directly back, along the normal to the imaging plane, from the eye center calculated using the eyelid parameterizations. As the normals of the imaging and coronal planes begin to differ, this use of the eyelid parameterizations becomes less accurate.
A future improvement would be the automation of the determination of the distance from the camera's imaging plane to the subject's corneal surface. This could be accomplished using stereo correlation. A second camera would be required, located a fixed distance from the first and with its optical axis parallel to that of the first camera. By sampling images from both cameras a small time interval apart, stereo information present between the two images could be used to derive the depth to image features. Small areas in the neighbourhood of the parameterized scleral/iridal boundary could be chosen to be correlated with the image from the second camera. The scleral/iridal boundary would likely be chosen due to its high spatial frequency content, making it a good choice for correlation matching. The best correlation match within a local region of the second image would be noted, and its positional difference in relation to the original mask from the first image used as a measure of the disparity between the two images at the depth of the feature. This disparity would directly relate to the depth of the correlated feature from the imaging plane of the camera (the disparity-to-depth relation is sketched at the end of this chapter).
Another future improvement would be to calculate the coronal plane of the subject. This would require a sequence of stereo image pairs instead of the single images currently used. Test points on the subject's face would be stereo correlated as previously described to determine their depth from the image plane. The coronal plane would then be calculated from these points. Depth calculations on the two iridal regions would be used to determine a vector in the e2 direction of the face. A vector in the e1 direction is a more difficult calculation. Depth calculations of the subject would have to be taken along the nose and forehead to form a "T" shaped depth map with the head in a known orientation. These measurements would then be used to calculate a contour of the subject's face in the e1 direction. This contour would then be tracked to continually update the head's orientation about the e2 axis. The depth of the iridal regions would also be continually tracked to update the head's orientation about the e1 axis. These two orientations determine the coronal plane. The center of the scleral sphere would then be calculated as one scleral radius along the normal to the coronal plane from the center of the eye region bounded by the eyelid parameterizations.
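A minimal sketch of the disparity-to-depth relation that this proposed stereo scheme would rely on is given below. The pinhole-camera relation is standard, but the numeric values and the function name are assumptions for illustration only, not measurements from the thesis system.

```c
/* Illustrative sketch (assumed values, not part of the thesis system):
 * depth of a correlated feature from its stereo disparity, assuming two
 * parallel cameras separated by a known baseline (pinhole model). */
#include <stdio.h>

/* depth = focal_length * baseline / disparity, with the focal length and
 * disparity measured in the same units (e.g. millimetres on the sensor). */
static double depth_from_disparity(double focal_mm, double baseline_mm,
                                   double disparity_mm)
{
    if (disparity_mm <= 0.0)
        return -1.0;   /* no valid match, or feature effectively at infinity */
    return focal_mm * baseline_mm / disparity_mm;
}

int main(void)
{
    /* e.g. a 16 mm lens, a 100 mm baseline, and a 2.0 mm measured disparity */
    printf("depth = %.0f mm\n", depth_from_disparity(16.0, 100.0, 2.0));
    return 0;
}
```

Depths computed in this way at the iridal regions and along the nose and forehead would provide the test points from which the coronal plane described above could be estimated.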
REFERENCES
[1] R. J. Martin and M. G. Harris, "Eye Tracking Joystick," Display System Optics, SPIE Volume 778, 1987.
[2] D. C. Johnson, D. M. Drouin, and A. D. Drake, "A Two Dimensional Fiber Optic Eye Position Sensor For Tracking And Point-Of-Gaze Measurements," CH2666-6/88/0000-0012, IEEE, 1988.
[3] Y. Ebisawa, K. Kaneko, and S. Kojima, "Non-invasive, Eye-Gaze Position Detecting Method Used On Man/Machine Interface For The Disabled," Computer Based Medical Systems: Fourth Annual IEEE Symposium, 1991.
[4] T. E. Hutchinson, K. P. White, W. N. Martin, K. C. Reichert, and L. A. Frey, "Human-Computer Interaction Using Eye-Gaze Input," IEEE Transactions On Systems, Man, and Cybernetics, Vol. 19, Nov./Dec. 1989.
[5] L. A. Frey, K. P. White, Jr., and T. E. Hutchinson, "Eye-Gaze Word Processing," IEEE Transactions On Systems, Man, and Cybernetics, Vol. 20, July/Aug. 1990.
[6] K. P. White Jr., T. E. Hutchinson, and J. M. Carley, "Spatially Dynamic Calibration Of An Eye-Tracking System," IEEE Transactions On Systems, Man, And Cybernetics, Vol. 23, No. 4, July/Aug. 1993.
[7] T. N. Cornsweet and H. D. Crane, "Accurate Two-dimensional Eye Tracker Using First And Fourth Purkinje Images," J. Opt. Soc. Amer., Vol. 63, No. 8, Aug. 1973.
[8] A. L. Yuille, D. S. Cohen, and P. W. Hallinan, "Feature Extraction From Faces Using Deformable Templates," IEEE Computer Vision And Pattern Recognition, 1989.
[9] A. L. Yuille, D. S. Cohen, and P. W. Hallinan, "Facial Feature Extraction By Deformable Templates," Harvard Robotics Lab Tech. Rep. 88-2, 1988.
[10] M. A. Shackleton and W. J. Welsh, "Classification Of Facial Features For Recognition," Image Processing Research Group, British Telecom Research Labs, 1991.
[11] J. Waite and W. Welsh, "Head Boundary Location Using Snakes," British Telecom Technical Journal, Vol. 8, No. 3, 1990.
[12] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active Contour Models," Proceedings First International Conference On Computer Vision, London, June 1987.
[13] K. V. Mardia, T. J. Hainsworth, and J. F. Haddon, "Deformable Templates In Image Sequences," Department Of Statistics, The University Of Leeds, 1992.
[14] J. Rehg and A. Witkin, "Visual Tracking With Deformable Models," IEEE International Conference On Robotics And Automation, Sacramento, CA, April 1991.
[15] C. Collet, A. Finkel, and R. Gherbi, "CapRe: A Gaze Tracking System In Man-Machine Interaction," LSV-CNRS & ENS, Cachan Cedex, France, 1997.
[16] R. Stiefelhagen, J. Yang, and A. Waibel, "A Model-Based Gaze Tracking System," Proceedings Of IEEE International Joint Symposium On Intelligence And Systems - Image, Speech, And Natural Language Systems, Washington DC, USA, 1996.
[17] M. Ohtani and Y. Ebisawa, "Eye-Gaze Detection Based On The Pupil Detection Technique Using Two Light Sources And The Image Difference Method," IEEE Engineering In Medicine And Biology, 1995.
[18] A. Sugioka, Y. Ebisawa, and M. Ohtani, "Non-Contact Video-Based Eye-Gaze Detection Method Allowing Large Head Displacements," Faculty Of Engineering, Shizuoka University, 1997.
[19] S. Baluja and D. Pomerleau, "Non-Invasive Gaze Tracking Using Artificial Neural Networks," CMU Technical Report CMU-CS-94-102, 1994.
[20] LC Technologies Inc., 9455 Silver King Court, Fairfax, Virginia, 22031, Voice: (703) 385-7133, Fax: (703) 385-7137.
[21] EyeTech Digital Systems, LLC, Mesa, Arizona, Voice: (602) 386-6303.
[22] R. Szeliski and S. B. Kang, "Recovering 3D Shape And Motion From Image Streams Using Nonlinear Least Squares," Journal Of Visual Communication And Image Representation, Vol. 5, No. 1, March 1994.
[23] R. C. Gonzalez and P. Wintz, "Digital Image Processing," Second Edition, Addison-Wesley Publishing Company, Inc., 1987.
[24] J. R. Cronly-Dillon, "Vision And Visual Dysfunction, Volume 2: Evolution Of The Eye And Visual Systems," CRC Press, Boca Raton, Florida, 1991.
[25] R. H. S. Carpenter, "Vision And Visual Dysfunction, Volume 8: Movement," CRC Press, Boca Raton, Florida, 1991.
[26] D. Marr and E. Hildreth, "Theory Of Edge Detection," Proceedings Of The Royal Society Of London, B 207, 1980.
