UBC Theses and Dissertations


A real-time 3D motion tracking system Kam, Johnny 1993

A Real-time 3D Motion Tracking System

by

JOHNNY WAI YEE KAM

B.Sc., The University of British Columbia, 1990

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE IN THE FACULTY OF GRADUATE STUDIES, DEPARTMENT OF COMPUTER SCIENCE

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April, 1993
© Johnny Kam, 1993

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Computer Science, The University of British Columbia, Vancouver, Canada

Abstract

Vision allows one to react to rapid changes in the surrounding environment. The ability of animals to control their eye movements and follow a moving target has always been a focus in biological research. The biological control system that governs the eye movements is known as the oculomotor control system. Generally, the control of eye movements to follow a moving visual target is known as gaze control.

The primary goal of motion tracking is to keep an object of interest, generally known as the visual target, in the view of the observer at all times. Tracking can be driven by changes perceived from the real world. One obvious change introduced by a moving object is the change in its location, which can be described in terms of displacement.
In this project, we will show that by using stereo disparity and optical flow, two significant types of displacements, as the major source of directing signals in a robotic gaze control system, we can determine where the moving object is located and perform the tracking duty, without recognizing what the object is.

The recent advances in computer hardware, exemplified by our Datacube MaxVideo 200 system and a network of Transputers, make it possible to perform image processing operations at video rates, and to implement real-time systems with input images obtained from video cameras. The main purposes of this project are to establish some simple control theories to monitor changes perceived in the real world, and to apply such theories in the implementation of a real-time three-dimensional motion tracking system on a binocular camera head system installed in the Laboratory for Computational Intelligence (LCI) at the Department of Computer Science of the University of British Columbia (UBC).

The control scheme of our motion tracking system is based on the Perception-Reasoning-Action (PRA) regime. We will describe an approach of using an active monitoring process together with a process for accumulating temporal data to allow different hardware components running at different rates to communicate and cooperate in a real-time system working on real world data. We will also describe a cancellation method to reduce the unstable effects of background optical flow generated from ego-motion, and to create a "pop-out" effect in the motion field to ease the burden of target selection.
The results of various experiments conducted, and the difficulties of tracking without any knowledge of the world and the objects, will also be discussed.

Contents

Abstract
Table of Contents
List of Figures
Acknowledgements
1 Introduction
2 Background and Related Work
2.1 The Biological Oculomotor Control System
2.2 Detection of Moving Objects
2.3 Integrating Stereo and Optical Flow
2.4 Active Vision, Behavioral Vision, Animate Vision, and Passive Vision
2.5 Gaze Control
2.5.1 Real-time Binocular Gaze Holding System at Rochester
3 Proposed Techniques and Control Theories
3.1 Objectives, Purposes, and Assumptions
3.2 Perception-Reasoning-Action Control Scheme
3.2.1 Perception
3.2.1.1 Computing Optical Flow and Stereo Disparity
3.2.1.2 An Active Monitor
3.2.1.3 Optical Flow Accumulation
3.2.2 Reasoning
3.2.2.1 Cancelling Background Optical Flow Caused by Ego-Motion
3.2.2.2 Segmenting the Optical Flow Field into Connected Components
3.2.2.3 Picking the Visual Target
3.2.3 Action
3.3 Discussion of a Prediction System
4 Implementation
4.1 Overview
4.2 Hardware Configuration for the LCI Robot Head
4.3 Software Description
4.3.1 The Datacube Program
4.3.2 The Transputer Programs
4.3.2.1 The G.R.S. Server
4.3.2.2 The Optical Flow Accumulator
4.3.2.3 The Tag Program
4.3.2.4 The Front Program
4.3.2.5 The Back Program
4.4 Computing the Motion Parameters of the Robot Head
4.4.1 Panning and Tilting
4.4.2 Verging
5 Evaluation and Discussion
5.1 Evaluation and Performance of our Motion Tracking System
5.1.1 Experiments, and What Can Be Done
5.1.1.1 Detection of Moving Objects when Robot Head is Not Moving
5.1.1.2 The Vergence Only Experiment
5.1.1.3 Panning and Tilting without Verging
5.1.2 Deficiencies, Problems, and What Cannot Be Done
5.1.2.1 Rigid Objects Assumption
5.1.2.2 Disadvantages and Problems of Using Correlation Matching
5.1.2.3 Problems with Background Optical Flow Cancellation
5.1.2.4 Reliance on Connectedness
5.1.2.5 Panning, Tilting, and Verging Simultaneously
5.1.2.6 Other Minor Issues
5.2 Additional Discussion
5.2.1 The Dumb Motion Trackers versus Our Motion Tracker
5.2.2 Thresholding in Segmentation
5.2.3 The Selection Methods
5.2.4 Robustness versus Speed Tradeoff
5.2.5 Comparisons with Other Motion Tracking Systems
6 Conclusions and Future Directions
6.1 Concluding Remarks
6.2 Future Work and Possible Improvements

List of Figures

3.1 PRA Communication
3.2 The Perception System
3.3 Optical Flow Accumulation
3.4 Data Transmission between the Perception and Reasoning Systems
3.5 The Two-Dimensional and Three-Dimensional Looks of the Optical Flow Field
3.6 Communication between the Reasoning and Action Systems
4.1 Hardware Configuration of the LCI Robot Head System
4.2 The LCI Robot Head
4.3 Various Motions of the Robot Head
4.4 Software Components and Data Flow
4.5 The Four Subframes in the Output Image of the Datacube Program
4.6 An Example Optical Flow Field and the Result Returned by the Labelling Algorithm
4.7 The Geometry of Using Stereo Cameras
4.8 Determination of the Verge Angle θ
5.1 Results of the Motion Detection Experiment

Acknowledgements

Many thanks ...

... to my parents, Mr. Gilbert Sik-Wing Kam and Mrs. Muner Lee Kam, for your love and faith all these years, for providing me an excellent environment to grow up in, and for being so understanding and patient during difficult times. This thesis could not have been finished without your support and encouragement.

... to my sister, Elaine, the Pharm.D. to be, and my brothers, Timothy, the M.D. to be, and Danny, the software engineer, for everything! Well, what more can I say? I am proud of all of you.

... to Dr. James Little for supervising this thesis, for always coming up with new ideas and problems, and for being such a good friend during my years of studying at UBC. Wishing you and your family all the best, Jim.

... to Mr. Vincent Manis, for your encouragement and effort in making me work so hard for my undergraduate degree.

... to Dr. Alan Mackworth for reading this thesis.

... to Dr. David Poole for teaching me to be reasonable.

... to Dr. Robert Woodham and Dr. David Lowe for making Computational Vision interesting and challenging.

... to Mr. Dan Razzell, our LCI manager, for helping me out all these years.

... to Rod Barman and Stewart Kingdon, our LCI technical staff, for setting up the hardware environment, and for providing the still usable software.

... to Ms. Valerie McRae, our CIAR secretary, for the caring and support, and for proofreading part of this thesis.

... to my friends, Pierre Poulin, Chris Healey, Carl Alphonce, Mike Sahota, Swamy, Esfandiar Bandari, Stanley Jang, Yggy King, Art Pope, Ying Li, Andrew Csinger, Scott Flinn and Karen Kuder, for valuable ideas, suggestions, and discussion.

... to George Phillips for helping me numerous times with the LaTeX and PostScript problems.
... to all the present and former staff and graduate students at UBC CPSC for making this department a friendly and enjoyable place to work in.

THANK YOU!

Johnny Kam
Vancouver, B.C., CANADA
April 18, 1993

Chapter 1

Introduction

Vision allows one to react to rapid changes in the surrounding environment. Cues for animals to notice such changes are mostly visual. One obvious change introduced by any moving object is the change in its location with respect to other stationary objects in the environment, where such change can usually be described in terms of displacement, the difference in locations.

It has always been a main focus in research to examine animals' abilities to perceive the changes incurred by moving objects and to react to those changes simultaneously. A particularly interesting area is the ability of animals to control their eye movements and follow a moving target. The biological control system that governs the eye movements is known as the oculomotor control system. Within such a control system, the two significant types of eye movements are saccade and vergence. The saccadic, or gaze shifting, system enables the observer to transfer fixation rapidly from one visual target to another, while the vergence system allows the observer to adjust the angle between the eyes so that both eyes are directed at the same point. The control of eye movements to follow a moving visual target is generally known as gaze control.

Functionally, gaze control allows one to change the direction of gaze from one position to another, and consequently, one can maintain gaze on a chosen target, or in other words, fixate a moving object in the visual system. One can, as a result, gather additional information about such an object for further analyses and tasks such as recognition and learning.
The cooperation of gaze shifting, gaze holding, and vergence thus allows the task of motion tracking to be performed.

With recent developments in sensors, parallel processors, and special-purpose image processing hardware, it is now possible [Little et al., 1991] to attempt to build robotic devices that can simulate an animal's ability to visually track an object moving in three-dimensional space [Ferrier, 1992] [Christensen, 1992] [Jenkin et al., 1992] [Pahlavan and Eklundh, 1992] [Crowley et al., 1992] [Pretlove and Parker, 1992]. The main purposes of this project are to establish some simple control theories to monitor changes perceived in the real world, and to apply such theories in the implementation of a real-time three-dimensional motion tracking system on a binocular camera head system installed in the Laboratory for Computational Intelligence (LCI) at the Department of Computer Science of the University of British Columbia (UBC). The primary goal of this tracking system is to center the image of the object of interest, in this case an object in motion, as quickly as possible. Such a passive motion tracking system must be comprised of a module to detect moving objects, a module to select the visual target, and a module to respond in the form of gaze shifting and verging the binocular head. It is also our interest to investigate how different systems, namely the Datacube MaxVideo system and a network of Transputers, which run at different rates, can communicate and cooperate in real time.

The detection of moving objects in a scene is difficult when the observer is also in motion, because the dominant motion is usually generated by the moving observer, which leads to a complex pattern of displacements. It is necessary that the system be able to
Some segmentation process must occur to separate the apparentmotion, or ego-motion, caused by the moving observer, from the motion incurred bythe moving objects. The control system is required to stabilize gaze against ego-motionwhile tracking the moving target.The whole system, described in this report, will follow the Perception-Reasoning-Action (PRA) framework, but with very little representation of the world, or even theobserver. Real-time performance is necessary for a real-behaving system [Nelson, 1991].There is a delay between the time at which some visual observation is made and the timeat which the control command based on the observation is computed. To minimize suchdelays with our limited computational resources, we need to focus on the minimum pos-sible and non-trivial set of visual information necessary to achieve the purpose of motiontracking. The system has to be designed so that responses can be made appropriatelyand timely in the unpredictable environment, and that it can keep up with the pace ofthe world.Researchers have reported that humans have several interacting control systems thatstabilize gaze against ego-motion and follow moving targets, but failed to identify thenecessary visual cues that should be used in a robotic gaze control system for motiontracking [Ballard and Brown, 1992]. In this project, we show that by using stereo dis-parity and optical flow, the two significant sources of displacement measures, as primaryvisual cues in the robotic gaze control system, we can determine where the moving objectis located and perform the tracking duty, without recognizing what the object is. Wecan easily fixate on a 3D location while ignoring the distracting surrounding motion byintegrating stereo with optical flow. In theory, the vergence system can provide the gazecontrol system with extremely useful input for filtering purposes, and for reducing thevolume of space to be considered.Chapter 1. 
Several researchers have recently implemented gaze control systems with encouraging results [Brown, 1989] [Coombs, 1992], and have demonstrated that special-purpose hardware is required to perform complicated computation in real time. As an alternative to using the existing techniques for motion tracking and replicating the work that has already been done, it is our intention to build a system which makes good use of our current specialized hardware, and to carefully allocate resources so that other types of computations can be performed at the same time. It is our belief that if we can continuously monitor the changes in the environment accessible to the cameras, we should be able to achieve our objectives and compute displacements using a simple correlation matching technique, assuming that such a technique can produce reliable and dense data.

Our system uses optical flow and stereo disparity for tracking by panning, tilting, and verging the robot head. An active monitoring process, along with the optical flow accumulation processes, forms our perception system. The reasoning system consists of a cancellation process to eliminate the unstable effects of the background optical flow. A segmentation process is used to partition the flow field into connected components, and allows the visual target to be selected.

Chapter 2 of this thesis describes the research related to the field of motion tracking. The work done on uncovering the mystery behind humans' ability to track moving objects is discussed. Related work on motion detection and on the integration of stereo and optical flow will be described. Other robotic gaze control systems developed at various research sites, particularly at the University of Rochester, are also discussed.

Chapter 3 describes the PRA model, control theories, and techniques used in this project.
The various assumptions made in designing the system will be described.

Chapter 4 contains detailed descriptions of the implementation of our motion tracking system, in terms of both hardware and software.

Chapter 5 presents the evaluation of our motion tracking system. The performance, drawbacks, and various problems will be discussed. We will describe the different experiments that have been carried out, and will also compare our motion tracking system with other types of tracking systems implemented elsewhere.

The conclusions and directions for future work follow in Chapter 6.

Chapter 2

Background and Related Work

2.1 The Biological Oculomotor Control System

Researchers have spent a considerable amount of effort investigating the ability of animals, particularly humans, to visually track moving objects. The control system responsible for such tracking directs the eyes to move rapidly to follow a visual target and to stabilize the images of such a target on the retina in spite of relative movements between the target and the observer.

The rapid eye movements are known as saccades. During these movements, both eyes rotate in the same direction. It has been reported that animals do not see well during saccadic movements, and that the oculomotor control system may become unresponsive to stimulus during these movements. Therefore, the oculomotor control system will attempt to make the duration of each movement as small as possible [Robinson, 1968].

A separate system known as the smooth pursuit system, which responds to target velocity regardless of target position, operates independently of the saccadic system for stabilizing images on the retina. It is worth noting that the saccadic system is a sampled control system, whereas smooth pursuit is continuous.
Researchers have discovered that saccadic movements are made in response to a large burst of tension suddenly applied and suddenly removed, while smooth pursuit movements are created by smaller, smoothly applied forces.

Humans possess two eyes mainly to perceive depth in the surrounding environment. Stereo disparity serves as a great visual cue for recognizing objects at different depths, and for allowing the tracking of objects moving in three-dimensional space. With objects moving near to or far away from the observer, his eyes must make equal movements but in opposite directions, governed by the vergence system, in order to keep the target image focused and centered on the retina. When the eyes rotate nasally the movement is called convergence, and when they rotate temporally¹ it is referred to as divergence. It seems clear that a different control system is responsible for vergence movements. Vergence movements allow humans to register an object on the fovea (the central, high-resolution region of the retina) of each eye, so that the greatest possible amount of information about the object can be extracted. Experiments have shown that this system appears to be continuous with a very low gain integrator, which makes it the slowest of all the oculomotor subsystems [Robinson, 1968]. It has also been reported that the disparity vergence system is not only sensitive to the amount of disparity between the left and right retinal images, but also to the rate at which this disparity changes [Krishnan and Stark, 1977].

It is considered that all changes of fixation are made entirely by mixtures of pure saccadic movements and pure vergence movements; that is, the two systems operate independently, and it has been observed that they cooperate in a very complex way.

¹An anatomical term, meaning towards the temples or the sides of the forehead.
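Krishnan and Stark's finding that the disparity vergence system responds both to the disparity itself and to its rate of change is, in control-engineering terms, a proportional-derivative law. The following is a minimal sketch under that interpretation only; the function name and gains are hypothetical, not taken from the thesis or from the cited experiments:

```python
def vergence_command(disparity, prev_disparity, dt, kp=0.5, kd=0.1):
    """Proportional-derivative vergence correction: the output depends on
    both the current left/right disparity and the rate at which it changes,
    mirroring the dual sensitivity reported for the disparity vergence
    system.  Gains kp and kd are illustrative placeholders."""
    rate = (disparity - prev_disparity) / dt   # finite-difference disparity rate
    return kp * disparity + kd * rate          # PD control law
```

Driving the verge motors with such a signal each frame would slowly null the residual disparity, consistent with the low-gain, continuous behaviour described above.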
Based on experimental results on stimulation of the brain, voluntary saccades originate in the frontal eye fields, involuntary saccades and smooth pursuit movements in the occipital eye fields, and vergence movements in areas 19 and 22 of the brain. The intersampling interval for saccadic movements is 200 milliseconds, and the time delay is on the order of 150 to 200 milliseconds in the disparity vergence system.

2.2 Detection of Moving Objects

Detection of moving objects from a stationary observer is easy, as the difference of two frames in an image sequence will show the direction and magnitude of the object's motion. A more interesting and difficult problem is to detect moving objects from a moving observer, in most cases an electronically controlled camera. Being able to find out how much an object has moved with respect to the stationary environment, rather than with respect to the observer, is the key to determining whether or not the object is in motion while the observer is moving.

The dominant motion in the motion field is usually generated by the moving camera, if one assumes that the moving object is much smaller than the background environment in the field of view of the camera. The central idea in identifying a moving object involves segmentation of the motion field, based on consistency of the pixel values (e.g. the optical flow vectors of a flow image), into separate components. Some robust segmentation algorithms have been presented, such as the one reported in [Adiv, 1985], where the flow field is partitioned into consistent segments, and independently moving rigid objects can be analysed. A main interest in recent research is to identify which components correspond to moving objects, and which portions of the motion field are caused by the moving observer, with or without knowing the ego-motion parameters.
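The segmentation idea just described can be sketched as connected-component labelling over a dense flow field, grouping neighbouring pixels whose flow vectors are consistent. This is an illustrative flood-fill version, not the labelling algorithm used in the thesis; the consistency test, threshold, and 4-connectivity are assumptions:

```python
import numpy as np
from collections import deque

def segment_flow(flow, tol=1.0):
    """Partition a dense flow field of shape (H, W, 2) into 4-connected
    components of consistent motion: a neighbour joins the current
    component when its flow vector differs by less than `tol` (Euclidean
    norm).  Returns an (H, W) integer label image; labels start at 1."""
    h, w, _ = flow.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for sr in range(h):
        for sc in range(w):
            if labels[sr, sc]:
                continue                       # already assigned to a component
            next_label += 1
            labels[sr, sc] = next_label
            queue = deque([(sr, sc)])
            while queue:                       # flood-fill one component
                r, c = queue.popleft()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < h and 0 <= nc < w and not labels[nr, nc]
                            and np.linalg.norm(flow[nr, nc] - flow[r, c]) < tol):
                        labels[nr, nc] = next_label
                        queue.append((nr, nc))
    return labels
```

On a field where a small patch moves against a still background, the background forms one large component and the patch a second one, which is the "which component is the object" question the surveyed work then has to answer.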
Gibson makes an interesting observation that the flow vectors due to all stationary components intersect at the focus of expansion (FOE) [Gibson, 1979]. Moving objects have their individual FOEs, usually different from the FOE due to the stationary component. Jain describes how an ego-motion polar transform of the dynamic scenes acquired by a translating observer can make moving objects easily distinguishable from stationary ones [Jain, 1984]. However, such a technique requires intensive computation for the transformation and the recovery of the FOEs, and is rather inefficient for use in any real-time system.

Thompson and Pong describe the principles of detecting moving objects under various circumstances [Thompson and Pong, 1987], but leave open the question of how to apply such theories in practice. Different assumptions are made under different situations, and various detection algorithms are developed. In particular, situations where the camera motion, either a translation or a rotation, is known or unknown are examined. Point-based, edge-based, and region-based techniques which rely on knowing the FOE are applied, and the techniques have been shown to be quite robust against noise. They further state that no method for detecting moving objects will be effective if it depends on knowing precise values of optical flow. However, it is clear that the effectiveness of any reliable technique is directly proportional to the precision of the input data. Moving objects can be incorrectly identified if the input data is not reliable, and thus the motion detection process becomes impractical.

Another observation made during these studies is that if an object is being tracked, its optical flow is zero [Thompson and Pong, 1987].
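Returning to Gibson's observation above: under pure observer translation, every stationary point's flow vector lies on a line through the FOE, so the FOE can be recovered as the least-squares intersection of the flow lines, and a point whose line passes far from that estimate becomes a candidate independently moving object. A hypothetical sketch of this idea (the function name and formulation are ours, not from the cited work):

```python
import numpy as np

def estimate_foe(points, flows):
    """Least-squares focus of expansion: the image point minimizing the
    summed squared perpendicular distance to every flow line.

    points: (N, 2) image coordinates; flows: (N, 2) flow vectors.  Each
    stationary point's flow lies on a line through that point in the
    flow's direction, so under pure observer translation all the lines
    meet at the FOE."""
    d = flows / np.linalg.norm(flows, axis=1, keepdims=True)  # unit directions
    n = np.stack([-d[:, 1], d[:, 0]], axis=1)                 # line normals
    A = (n[:, :, None] * n[:, None, :]).sum(axis=0)           # sum of n n^T
    b = (n * np.sum(n * points, axis=1, keepdims=True)).sum(axis=0)
    return np.linalg.solve(A, b)                              # solve A x = b
```

Flagging points whose residual |n·(p − FOE)| exceeds a threshold is one way to realize the point-based FOE techniques mentioned above, though, as the text notes, it inherits the sensitivity of everything downstream to flow precision.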
This zero-flow phenomenon can be seen in the ideal situation where an observer is actively tracking a moving object and knows ahead of time how far the object will move, courtesy of a prediction system, so that the optical flow of the moving target being followed is effectively zero. But this observation cannot be made easily in a motion tracking system where the motion parameters of the camera are a function of how far the object has moved in a certain time interval. In other words, if we have a passive system whose actions are triggered by changes occurring in the real world, then it is unlikely that the passive system can maintain the tracking operation on a moving object without non-zero optical flow input. However, the observation still holds if one considers that the optical flow produced by the moving object is the displacement of such an object with respect to the image plane before the motion has started and after the motion is completed.

Some subspace methods and center-surround motion operators for recovering observer translation [Heeger et al., 1991], and a least squares method that solves the brightness change constraint equation for the motion parameters in the case of known depth [Peleg and Rom, 1990], have been introduced. These methods, however, may not be suitable for a real-time system, because the motion parameters must be recovered before different components can be identified, and an extensive amount of computation is usually required.

The work by Nelson addresses the problem of identifying independently moving objects from a moving sensor [Nelson, 1990]. Two methods are discussed, one making use of information about the motion of the observer, and the other using knowledge about how certain types of independently moving objects move. Implementations that run in real time on a parallel pipelined image processing system are described.
The first method, called constraint ray filtering, is based on the idea that in any rigid environment, the projected motion of any point is constrained to lie on a one-dimensional locus in velocity space whose parameters depend only on the observer motion and the location of the image point. An independently moving object can be detected because its projected velocity is unlikely to fall on this locus. The restriction is that the observer's motion has to be known prior to the computation. The second method, known as the animate motion method, uses the idea that the observer's motion is generally slow and smooth, whereas the apparent motions of independently moving objects change comparatively rapidly. The limitation of this method is that it is insensitive to smoothly moving objects. These methods are in general quite resistant to noise, and can make use of motion information of low accuracy. Both methods have been successfully implemented on the Datacube MaxVideo system, and an update rate of 10 Hertz is achieved. Our system is like Nelson's in that it combines motion, real-time processing, and known parameters of the observer's motion.

Some researchers have recently noted that it is easier to detect tracking signals in active visual following than in passive tracking, as motion blur emphasizes the signal of the target over the background [Coombs and Brown, 1992].

2.3 Integrating Stereo and Optical Flow

Stereoscopic motion analysis combines stereo data and motion data computed from the binocular image sequence. It relies on the dynamics of the scene. Stereoscopic image analysis usually requires images to be taken from the same scene at the same time from two parallel viewing points lying on a horizontal line.
Matching techniques have been developed to compute stereo disparities from the left and right image pair, as in [Marr and Poggio, 1976] [Drumheller and Poggio, 1986] [Fua, 1991], so that the 3D locations of the image points can be recovered, provided that the geometry of the stereo camera configuration is given. Real motion in the 3D world projects to 2D images to produce apparent motion in the image plane, known as optical flow. A simple correlation method can be used to compute optical flow, the 2D motion vector. A reliable coarse-to-fine matching technique has been developed in [Anandan, 1989]. Some simple, efficient, and robust parallel motion and stereo algorithms, as presented in [Bulthoff et al., 1989] [Little and Gillett, 1990], can also be used.

The 3D motion vector for each image point can therefore be computed using both stereo and optical flow information. The 3D motion parameters, with respect to a world coordinate system, can then be recovered for a corresponding group with consistent motion vectors.

Previous work in combining stereo and optical flow was mostly aimed either at solving the correspondence problem [Waxman and Duncan, 1986], or at recovering the motion-in-depth parameters [Balasubramanyam and Snyder, 1988]. Stereo data provides absolute depth information as additional constraints. These constraints will likely reduce the computation needed as compared to the situation where the motion parameters are to be determined without depth, even though it has been shown that information about depth, structure, and motion of objects relative to the observer can be determined from optical flow alone. Numerous experiments have also confirmed that a binocular camera system provides a more robust sensing mechanism while operating under realistic conditions.
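The simple correlation method mentioned above can be sketched as a sum-of-absolute-differences (SAD) search. The same matcher yields optical flow when applied to consecutive frames from one camera, and stereo disparity when applied to a left/right pair, after which depth follows from the standard relation Z = fB/d for focal length f and baseline B. Patch and search sizes here are illustrative, not the parameters used in the thesis:

```python
import numpy as np

def sad_match(ref, tgt, r, c, patch=3, search=8):
    """Correlation matching by sum of absolute differences: find the
    displacement (dr, dc) within a (2*search+1)^2 neighbourhood at which
    the patch centred at (r, c) in `ref` best matches `tgt`."""
    p = patch // 2
    tpl = ref[r - p:r + p + 1, c - p:c + p + 1].astype(np.int32)
    best_score, best_d = None, (0, 0)
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if rr - p < 0 or cc - p < 0:
                continue                      # window past the top/left edge
            win = tgt[rr - p:rr + p + 1, cc - p:cc + p + 1]
            if win.shape != tpl.shape:
                continue                      # window past the bottom/right edge
            score = int(np.abs(win.astype(np.int32) - tpl).sum())
            if best_score is None or score < best_score:
                best_score, best_d = score, (dr, dc)
    return best_d

def depth_from_disparity(d, f, B):
    """Depth from stereo disparity d, focal length f, baseline B: Z = f*B/d."""
    return f * B / d
```

Run per pixel (or per grid point), `sad_match` produces the dense displacement fields the text describes; its cost grows with the search window, which is why the surveyed real-time systems lean on specialized hardware for exactly this operation.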
Some work on extracting 3D motion and structural data illustrates that it makes more sense to work in 3D, but at the cost of being very sensitive to noise [Zhang and Faugeras, 1992].

2.4 Active Vision, Behavioral Vision, Animate Vision, and Passive Vision

An important idea in current computer vision research is that the vision process is dynamic rather than static [Clark and Ferrier, 1988] [Ballard and Brown, 1992]. As a result of the various work done by different research groups, several coherent themes have emerged. The use of active sensing to continuously gather information has received considerable attention. Issues of how to react to the rapid changes perceived, and how to control the visual sensors, have been the main foci in most studies.

Active vision, as considered by most researchers, refers to the process of making a vision problem that is ill-posed into a well-posed one, or in other words, converting an underdetermined problem into an overdetermined one by changing the parameters of the visual observer. The two common tasks of any active vision system are to figure out where to look next, and to carry out the motion that will let one look there. The class of active vision algorithms shares the idea of an observer actively providing constraints that can simplify the computation of image features and help eliminate ambiguities. Active vision algorithms can be more robust than static algorithms, and are often computationally efficient since irrelevant information is ignored, so that the process of finding a solution is easier and the results are more reliable.

In recent years, research at the University of Rochester has reflected the theme that understanding the phenomenon of intelligence, and discovering how to produce an artificial one, must proceed in the context of behaviour [Nelson, 1991].
The behavioral approach to AI, and to vision in particular, has received considerable attention, as it is observed that most ideas for machine intelligence are inspired by the abilities of animals, particularly humans. One important strategy that has been used is to throw out as much information as quickly as possible, since the total quantity of information contained in a visual signal often exceeds the capacity that any system can handle. The vision system has to keep up with the pace of the world, and it does not compute all things at all times, but only what it needs at a certain time. As a result, the amount of representation that is required may be drastically reduced, thus freeing up the valuable limited computational resources.

To emphasize the focus on the human-like aspects of vision and control schemes, the term animate vision was introduced. Animate vision is a framework for sequential decision making, gaze control, and visual learning [Ballard and Brown, 1992]. As stated in the report, the interactionist approach is based on the idea that the world and the perceiver should participate jointly in computation, and that neither is complete without the other. For instance, the world can be viewed as an external memory, where gaze control positions the eyes appropriately at the point of application to allow decisions to be made.

Prediction, and perhaps learning, are the most important ingredients that distinguish active systems from passive ones. If the goal of an active vision system is to track a moving object, it may have to first actively explore the environment to look for a specific target to follow, with or without recognition. A prediction engine will cooperatively command the observer to track the target, in addition to the signals returned by the motion detector. The behaviour of the visual target, learned through such active sensing, helps to strengthen the reliability of the responses made in the unpredictable environment.
In passive systems, however, the observer's motion is solely a reaction to the changes perceived. The importance of this type of reactive response is captured by Brooks' subsumption architecture, and his idea of using very little representation when behaviours are taken to be the fundamental primitives [Brooks, 1987].

2.5 Gaze Control

Humans have several interacting control systems that stabilize gaze against ego-motion and follow moving targets. Recent research in gaze control mechanisms has mostly sought to replicate this important behaviour in a real-time computer controlled environment. Because the processing is inevitably computationally intensive, parallel computer systems and specialized image processing hardware are needed. However, the strategies used for gaze control must cooperate with the hardware and be efficient enough that the system can interact with the world and maintain active control over its own state in a timely and consistent manner. Gaze control is made up of two subproblems: gaze holding and gaze shifting. The main concern here is with gaze holding, the operation of maintaining fixation on a moving object with the cameras on a moving platform.

There are a few reasons why we need gaze control. Functionally, we want to change the direction of gaze from one position to another, and to fixate an object to minimize motion blur. Being able to fixate an object allows the observer to gather more information for further analyses. In theory, it has been pointed out that the object of interest should be kept at the center of the images so that higher precision is obtained and the greatest amount of information can be extracted. In addition, segmentation is known to be a problem: it is difficult to separate an object from the background in a scene without recognition, but it is hard to recognize the object without separating it from the background.
Gaze control can help to solve this "chicken-and-egg" problem by separating a moving object from the background without recognition, based on the idea that stabilizing one point in the scene that is moving relative to the observer induces target "pop-out" due to the motion blur induced in the non-stabilized parts of the scene [Ballard and Brown, 1992].

The three types of controls which are common and necessary in a robotic camera head, or Eye-Head, system are panning, tilting, and verging.

Panning refers to the process of rotating the inter-camera baseline about a vertical axis. The pan motor thus moves the head in the horizontal direction.

Tilting refers to the process of rotating the inter-camera baseline about a horizontal axis. The tilt motor is responsible for vertical motions.

Vergence is the process of adjusting the angle between the eyes, or cameras, so that both eyes are directed at the same world point. It is an antisymmetric rotation of each camera about a vertical axis.

With these three degrees of freedom, one can theoretically place the intersection of the optical axes of the two cameras anywhere in the three-dimensional volume about the head.

2.5.1 Real-time Binocular Gaze Holding System at Rochester

David Coombs and his colleagues have successfully designed and implemented a gaze holding system on a binocular camera head system at the University of Rochester [Coombs, 1992]. Their work has focussed on the problem of using visual cues alone to hold gaze from a moving platform on an object moving in three dimensions. The system implemented on the Rochester head demonstrates that gaze holding can be achieved prior to object recognition, assuming smooth object motion. The vergence and pursuit systems perform complementary functions in the sense that the pursuit system centers the target, and the vergence system converges on it.
Interestingly, the vergence system minimizes the disparity on the target being foveated, while the pursuit system requires that the target be properly verged before it can locate it and center it in the coordinate system.

The vergence system estimates vergence error based on stereo disparity. Disparity is measured with a cepstral filter. The cepstrum of a signal is the Fourier transform of the log of its power spectrum, and the power spectrum is just the Fourier transform of the autocorrelation function of the signal [Olson and Coombs, 1991]. The disparity estimator reports the disparity that best accounts for the shift between the images, in this case being the tallest peak in the spectrum. The control system then generates smooth eye movements to correct the vergence error.

The pursuit system attempts to keep the visual target centered in the cameras' images, assuming that the vergence system can keep the cameras verged on that target. The visual target is the object that is both near the center of the current image and in the region called the horopter, which is the 3D locus of points with zero disparity at the current vergence angle. The core of this subsystem is the Zero-Disparity Filter (ZDF) [Coombs and Brown, 1992]. This filter does not measure disparity; rather, it locates the portions of the images that have zero stereo disparity. It is a non-linear image filter which suppresses features with non-zero disparity, and can be implemented in the real-time system using the total mask correlation method [Coombs, 1992]. The windowed output of this filter is the input to the pursuit system, which guides the cameras to pan and tilt so that the object, with zero disparity, is centered in the image.

Various control techniques have also been implemented to generate smooth camera movements.
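One family of such techniques is the α–β–γ (alpha–beta–gamma) predictor discussed next. The following is a minimal one-dimensional sketch; the critically damped gains derived from a single smoothing parameter θ are an illustrative choice of my own, not the gains used at Rochester:

```python
class AlphaBetaGamma:
    """A 1D alpha-beta-gamma predictor for a uniform-acceleration target."""

    def __init__(self, dt, theta=0.8):
        # Critically damped ("fading memory") gains from one smoothing
        # parameter theta, 0 < theta < 1 -- an illustrative assumption.
        self.dt = dt
        self.alpha = 1.0 - theta ** 3
        self.beta = 1.5 * (1.0 - theta) ** 2 * (1.0 + theta)
        self.gamma = 0.5 * (1.0 - theta) ** 3
        self.x = self.v = self.a = 0.0

    def update(self, z):
        dt = self.dt
        # Predict the state one frame ahead assuming constant acceleration
        x_p = self.x + self.v * dt + 0.5 * self.a * dt * dt
        v_p = self.v + self.a * dt
        # Correct each state component with a share of the residual
        r = z - x_p
        self.x = x_p + self.alpha * r
        self.v = v_p + (self.beta / dt) * r
        self.a = self.a + (2.0 * self.gamma / (dt * dt)) * r
        return self.x

    def predict(self, lag):
        # Extrapolate ahead to compensate for a known processing delay
        return self.x + self.v * lag + 0.5 * self.a * lag * lag


head = AlphaBetaGamma(dt=0.1)
for k in range(1, 201):
    head.update(2.0 * (k * 0.1))   # target drifting at 2 units/s
print(round(head.x, 2), round(head.v, 2))  # converges near 40.0 and 2.0
```

The `predict` call is what allows such a filter to compensate for delayed signals: the state is simply extrapolated forward by the known loop latency.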
In particular, the α–β–γ predictor, a linear Kalman filter, is used to smooth the verging angles, and to smooth and interpolate the target positional signal, under the assumption that the target signal has uniform acceleration [Coombs, 1992]. The α–β–γ predictor is also used to predict delayed signals, which can lead to more accurate tracking.

Delay can cause a system to be unstable. The two major factors causing delays are computation and transmission. Researchers at the University of Rochester have used Smith prediction and multiple Kalman filters to cope with delays, and the results have been satisfactory.

Chapter 3

Proposed Techniques and Control Theories

3.1 Objectives, Purposes, and Assumptions

The main objectives of this project are to establish some simple control theories to monitor and to react to changes perceived in the real world, and to apply such theories in the implementation of a three-dimensional motion tracking system on the robot head installed in LCI. The configuration of this LCI binocular camera head system will be described in the next chapter. The primary goal of our tracking, or gaze control, system is to center the image of the object of interest as quickly as possible. Such a passive motion tracking system must be comprised of a module to detect moving objects, a module to select the visual target, and a module to respond in the form of gaze shifting and verging the binocular head.

Displacement is perhaps the most explicit and direct form of measurement to represent the change in locations of an object that appears in the images input through a camera. Optical Flow is the motion displacement which allows the determination of how far the object has moved, usually during a short time interval. Stereo Disparity is the displacement measure which shows the difference in the relative locations of a point registered in the left and the right stereo images grabbed at the same time.
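Assuming the usual parallel-axis stereo geometry (assumption 4 below provides the focal length and baseline), disparity d maps to depth as Z = f·b/d. A minimal sketch of this relationship:

```python
def depth_from_disparity(d_pixels, focal_pixels, baseline_m):
    """Depth Z = f * b / d for a parallel-axis stereo pair.

    Stereo disparity is inversely proportional to depth: doubling the
    disparity halves the estimated distance to the point.
    """
    if d_pixels == 0:
        return float("inf")  # zero disparity: point at infinity
    return focal_pixels * baseline_m / d_pixels

# Illustrative numbers (f = 800 pixels, b = 0.2 m, both invented):
print(depth_from_disparity(4, 800.0, 0.2))  # 40.0 metres
print(depth_from_disparity(8, 800.0, 0.2))  # 20.0 metres
```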
It is worth noting that stereo disparity is inversely proportional to the depth measured with respect to the cameras. Our goal is to find out how to make appropriate use of optical flow and stereo disparity to perform tracking in a real world situation, and to stabilize the cameras' movement.

It is also our interest to investigate how different computer systems, namely the Datacube MaxVideo system and a Transputer network, which run at different rates, can communicate and cooperate in a real-time environment. The motion tracking system has to be designed so that responses can be made correctly and in a timely manner in the unpredictable environment, and so that it can keep up with the pace of the world. The demonstration systems implemented in this project to illustrate our theories are quite simple, but they do form, in our opinion, the basis of more sophisticated motion tracking systems.

Our control theories are designed to be as general as possible, but it is inevitable that assumptions have to be made, as there are many uncertainties and exceptions which are very difficult to handle without using special or complicated procedures. The following is the list of assumptions that we will adopt in our theories:

1. Much of the vision research to date is concerned with rigid objects, since non-rigid objects are very difficult to handle without prior knowledge about their behaviour. Therefore, to keep this project tractable, we will assume that we are only dealing with the motion of rigid objects.

2. We will assume that displacement measures are being returned continuously in a frame-by-frame manner. This mechanism is analogous to the natural ability of humans to detect changes caused by a moving object in the way that the changes can be seen in a continuous fashion.

3. The technique used in measuring displacements is fast and accurate, and dense and reliable integer-valued stereo disparity and optical flow values are always produced.

4.
The camera geometry is known. In particular, the focal length and the baseline separation of the stereo cameras used should be provided.

5. We will assume that the motion of any object is slow and smooth enough to work with. This assumption reflects the fact that every system has its limits, and it is unreasonable to expect our motion tracking system to work in all scenarios, such as following a high-speed bullet.

6. The motion of the observer will also be assumed to be slow and smooth. This allows the measuring technique to work on a smaller range of possible matches.

7. The motion tracking system is expected to be run on some special purpose image processing hardware and a multi-computer system, so that the control procedures can be designed to take advantage of the powerful computational resources.

8. We assume that the motion parameters of the observer are known in the entire system, or available upon request.

9. For the sake of simplicity and ease of illustration of our ideas, we assume that optical flow is computed from the viewpoint of only one camera, presumably the left camera.

10. We will not have any other specific knowledge of the objects we are dealing with.

3.2 Perception-Reasoning-Action Control Scheme

The perception-reasoning-action (PRA) control loop is commonly used in traditional AI problem solving techniques. In this control strategy, three subsystems are generally used: the perception system provides input to the reasoning system, and the task to be performed by the action system depends on the output of the reasoning system (see figure 3.1). The PRA loop is interesting and popular because it is a fundamental concept which is easy to understand. It even resembles one of the ways humans solve problems. In the human visual system, the visual input is often analysed without conscious attention before we proceed with our actions.
This phenomenon can easily be found in many of our activities such as driving a car, playing video games, or visually following a flying plane.

Figure 3.1: PRA Communication

Our control theories will follow the PRA framework. Changes in the environment will be observed from a pair of stereo cameras by the perception system. The reasoning system will process the changes represented by optical flow vectors and stereo disparities, find the new location of the moving target, and provide information for the action system to track the target by physically moving the robot head. Although the data flow appears to be sequential within the loop, the three subsystems are actually parallel processes running concurrently. For example, the perception process will continue to monitor the changes while the reasoning processes are busy analysing the data. Other processes can be actively communicating with one another at the same time.

3.2.1 Perception

The role of the perception system is to pay attention to the changes in the environment. Changes can be computed and returned as an image in minimal time by powerful image processing hardware. As a result, a continuous flow of these images will be produced for further processing. In consideration of the delays that accompany the computation in the reasoning process, special control procedures are needed to ensure that these images are not lost and that all data are available when the reasoning process needs them. We introduce the idea of using an active monitor and an accumulation process to keep track of the data provided by the optical flow images.
Asa consequence, we are observing the world and monitoring the changes in a continuousfashion, without any representation of the world, the objects', and even the observer.Besides, the perception system does not play an active role in passing data to the rea-soning system once they are available. In fact, the transfer of data is demand driven,1We are referring to special structural representation for recognizing objects. It should be obviousthat optical flow vectors represent the locations and sizes of the moving objects, but not the shape,color, or other recognizable features.Chapter 3. Proposed Techniques and Control Theories^ 23but the perception of changes is always active.The perception system is composed of three subsystems, each being a parallel processcommunicating with the other process through message passing (see figure 3.2).Figure 3.2: The Perception System3.2.1.1 Computing Optical Flow and Stereo DisparityOur control strategies rely on the assumption that reliable displacement measures canbe produced rapidly. It is anticipated that the displacements computation would beperformed on high-speed computers, so that results are available at a reasonable rate. Asimple correlation technique, such as the sum of squared differences (SSD) or the sumof absolute values of differences (SAD) algorithm, can be used effectively because therange of motion is expected to be small with the slow moving objects, and the range ofdisparity is not expected to be large in one dimensional space. Some correlation matchingtechniques have been shown to be capable of producing dense flow maps at an acceptableChapter 3. Proposed Techniques and Control Theories^ 24level of accuracy [Fua, 1991] [Anandan, 1989] [Barron et al., 1992].Correlation should be performed on the whole image. For every point of the image,an array of correlation scores can be computed by taking a fixed window in the firstimage, and a shifting window in the second. 
The pixel that has the best match is the one with the optimal correlation score, most likely the minimum value if SSD is used. A verification process can be used to ensure that the data returned are reliable, provided that any extra computation does not seriously degrade the overall performance and response time.

It should be pointed out that the more frequently the optical flow image is returned, the more changes the perception system should be able to capture from the real world. This observation can be made by comparing the absolute changes captured by x frames versus 2x frames returned by the matching operations over a fixed time interval. Given that the size of the correlation window does not change during such operations, the 2x-frame sequence should contain more accurate information about the motion of the moving objects.

3.2.1.2 An Active Monitor

Our design of the motion tracking system has taken into consideration that different programs running on different architectures have to communicate and cooperate in real-time. Since the processes for computing and returning optical flow and stereo disparity will likely be running on special purpose image processing hardware, other programs, possibly running on different platforms or configurations, attempting to work on the returned data should employ a process responsible solely for receiving the data. This frame grabbing process, which we denote as the active monitor, waits for data to come in, and keeps track of the data until it has been properly stored for later retrieval by the reasoning system. This observing or monitoring process is termed active since it is an independent process which has the initiative to grab a frame whenever it is available, without having to wait for instructions to do so.

This particular observing process is simple but plays an extremely important role in synchronizing the communication and uniting the different modules.
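As a hedged illustration, such an active monitor can be modelled as an independent thread that grabs every frame the moment it arrives and hands it off for storage, never blocking the producer (the queue wiring and frame source here are invented):

```python
import queue
import threading

def active_monitor(frame_source, store, stop):
    """Grab every frame as soon as it is produced, independently of
    whether the reasoning system is ready to consume it."""
    while not stop.is_set():
        try:
            frame = frame_source.get(timeout=0.1)  # wait for the next frame
        except queue.Empty:
            continue
        store.put(frame)  # hand off for accumulation; never discard a frame

# Illustrative wiring: the correlation hardware pushes frames into
# `frames`; the monitor moves them into `store` for later retrieval.
frames, store, stop = queue.Queue(), queue.Queue(), threading.Event()
monitor = threading.Thread(target=active_monitor, args=(frames, store, stop))
monitor.start()
for i in range(5):
    frames.put(i)
grabbed = [store.get(timeout=1.0) for _ in range(5)]
stop.set()
monitor.join()
print(grabbed)  # all five frames survive even with a busy consumer
```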
Optical flow and stereo disparity images will be pumped out continuously regardless of whether or not the reasoning system is prepared to process the data. It is extremely important that this observing process be executed at a rate no slower than the rate at which the displacement measures are being pumped out; otherwise, the loss of any data frame might greatly contribute to the inaccuracies of the overall system. This is especially critical since we accumulate displacements over several frames.

3.2.1.3 Optical Flow Accumulation

When using parallel processes running on multi-rate systems to perform motion tracking, there is usually no guarantee that a recipient process, in most cases a process in the reasoning system, can be freed to receive and work on the displacement data once they become available from the correlation system. However, it is unacceptable, in a real-time system, for any process to block itself without carrying on with its own duties simply because some recipient processes are busy and not ready for the data. The active monitor gives the perception system the opportunity to pay attention to all displacement data produced by the correlation system during one cycle of the PRA loop. Such data must then be properly stored and managed so that it is still accessible at the appropriate time.

The two different types of displacement measures used in this motion tracking system have different characteristics. Optical flow can be considered temporal data, as it represents how and how much an object has moved within a certain time period. A sequence of optical flow images can thus provide the motion path over a lengthy period of time. Stereo disparity gives the observer an idea of how close an object is at a particular time instant.
Although one can in general detect moving things from a sequence of stereo disparity images, extra effort for matching is required, and this is usually not desirable if optical flow is already present.

Knowing that optical flow vectors represent changes in locations over time, we suggest adopting the idea of accumulating optical flow vectors to store such changes and make them accessible for later use. The accumulation is basically simple vector addition, and it works as follows:

Let OF(P; t0, t1) be the optical flow vector representing the change of location of a point P from time t0 to time t1. Similarly, OF(P; t1, t2), OF(P; t2, t3), etc., will be available as time goes on. We can compute the absolute change of location of such a point P from time t0 to time t3 by adding the three vectors, i.e.,

OF(P; t0, t3) = OF(P; t0, t1) + OF(P; t1, t2) + OF(P; t2, t3).

In general (referring to figure 3.3),

OF(P; t0, tn) = OF(P; t0, t1) + OF(P; t1, t2) + ... + OF(P; tn-1, tn).

Figure 3.3: Optical Flow Accumulation

It should be pointed out that several of these accumulation processes can operate concurrently on different parts of the flow field, so that time delays due to accumulation can be minimized.

As a result of accumulating optical flow vectors, the absolute changes in the 2D locations of the moving objects during one cycle of the PRA loop can be safely stored until such data is demanded by the reasoning system. Stereo disparity data does not require any special procedure to manage, as we are only interested in knowing the depth of an object at a particular time instant.
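The accumulation just described amounts to following a point through successive flow frames and summing its per-interval vectors. A sketch, assuming dense integer-valued flow fields as in assumption 3 (the array sizes are invented):

```python
import numpy as np

def accumulate_flow(flows, start):
    """Accumulate per-frame optical flow vectors for one image point.

    `flows` is a sequence of dense flow fields of shape (H, W, 2), each
    holding integer (dy, dx) displacements; `start` is the point's
    location in the first frame.  The point is followed frame to frame
    and its per-interval vectors are summed:
        OF(P; t0, tn) = OF(P; t0, t1) + ... + OF(P; tn-1, tn)
    """
    y, x = start
    total = np.zeros(2, dtype=int)
    for flow in flows:
        v = flow[y, x]                 # displacement over this interval
        total += v
        y, x = y + v[0], x + v[1]      # follow the point into the next frame
    return total

# Three flow frames in which a point starting at (5, 5) moves by
# (1, 2) pixels per interval
flows = [np.zeros((16, 16, 2), dtype=int) for _ in range(3)]
for frame, (y, x) in zip(flows, [(5, 5), (6, 7), (7, 9)]):
    frame[y, x] = (1, 2)
print(accumulate_flow(flows, (5, 5)))  # [3 6]
```

Several such accumulators, each covering a different part of the flow field, can run concurrently as the text suggests.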
The 3D motion path of a moving object can easily be constructed even if we pay attention to depth only at the beginning and end of one PRA cycle.

By adopting these simple ideas of using an active monitor and the accumulation of data, the different modules of our motion tracking system running on various computer systems can communicate without having to worry about the timing and synchronization problem. In addition, simple but reliable correlation matching techniques can be used instead of complicated matching procedures. The determination of optical flow and stereo disparity will be a continuous process, instead of being demand driven; i.e., matching is performed automatically at all times, as opposed to having the matching process activated by the reasoning system on request for data.

3.2.2 Reasoning

The main objectives of the reasoning² system are to analyse the accumulated optical flow data along with stereo disparities, and to pick out the object to be tracked. The input to this reasoning system is obviously the displacement measures provided by the perception system. Data will be available on request in this interactive parallel environment (see figure 3.4).

Figure 3.4: Data Transmission between the Perception and Reasoning Systems

Three procedures are used in our design to select the visual target to follow. The background optical flow caused by the motion of the robot head will be taken care of by a cancellation process. The revised accumulated optical flow will be used by a segmentation process to find the different connected components.
The object of interest can be chosen among these components as either the region with the largest area, the region with the highest velocity, the region which is closest to the camera with a reasonable size, or various combinations of the above.

²One might argue that there isn't much reasoning, as in logical reasoning, involved in this motion tracking system. The term "reason" is used here in the context of attempting to generate some proof to justify the actions to be performed, and hence it is more than just simple perception.

3.2.2.1 Cancelling Background Optical Flow Caused by Ego-Motion

During each cycle of the PRA loop, the reasoning system is responsible for analysing the accumulated optical flow produced by the perception system before instructing the action system to move the robot head. During the delays incurred by processing the displacement data and moving the head, objects will continue to move in the real world, and the accumulation of optical flow will continue to operate as a separate parallel process. Not only does such accumulation record the changes caused by moving objects, it also records the background optical flow caused by the moving robot head.

The dominant motion in the motion field is usually generated by the moving cameras, if one assumes that the moving object is much smaller than the background environment in the field of view of the cameras. The background optical flow caused by ego-motion makes the job of detecting moving objects much tougher than with stationary cameras. The background flow perceived obviously travels in the opposite direction of the moving cameras.
In theory, the whole scene with respect to the cameras will shift during the cameras' movement, regardless of whether or not there is any moving object in view, with the exception that objects moving at exactly the same speed and direction as the cameras will appear stationary in the image sequence.

The magnitude of the background optical flow is a function of depth with respect to the cameras and the ego-motion parameters. Closer objects appear to have a larger background flow than farther objects. This phenomenon certainly leads to an extremely complex pattern of flow field for analysis. To compensate for such apparent motion of the background caused by camera motion, we suggest using a method of cancellation to reduce the effect of the background flow with the aid of stereo disparity and the known motion parameters of the cameras.

As part of the initialization process of our motion tracking system, a table of depth-to-flow mapping figures will be established. This mapping table can be indexed by stereo disparity, and it contains how much background optical flow is expected to be perceived if the cameras move one degree in a certain direction at a particular depth value; for instance, one degree of panning to the right with a stereo disparity of +2 may generate a -3 flow value. Separate entries for panning and tilting motions are required.

This table can be constructed using the following simple procedure. Assume that all the depth values the correlation system can handle are represented by objects that are within the reach of the cameras. We also assume that all objects are stationary during this initialization process. For each disparity value within the correlation range, we first find a representative point in the image that has the particular disparity value, or depth, as the reference point for determining the motion displacement.
The next step is to move the cameras by a known but small angle, and then perform correlation on the images taken before and after such movement. Several repeated moves with different angles may be required in order to get better approximations of the mapping values.

Given that we know how much the cameras have moved during one cycle, and that we have access to the stereo disparity data before and after the cameras' movement, we are able to first determine the depth of any point in the motion field, and then compute the expected background flow for that point using the mapping table.

Cancellation is therefore a simple procedure of subtracting the expected background flow from the motion field. In the ideal situation, the expected background flow should be equal to the apparent motion perceived during the accumulation process. However, this is not always the case, as time delays in both accumulation and correlation fail to ensure that the accumulated optical flow contains the exact changes that have occurred in the real world. Also, any round-off error in computation is always a factor causing such inequality. Nevertheless, cancellation creates a "pop-out" effect in the sense that although the background optical flow cannot be eliminated completely, its unstabilized effect should be drastically reduced. In other words, the optical flow incurred by any moving object should be significantly larger than the background flow after cancellation. As a result, the revised accumulated optical flow field should reflect the correct³ displacements that the moving objects really caused.

3.2.2.2 Segmenting the Optical Flow Field into Connected Components

The next step of the analysis is to segment the revised accumulated optical flow field into various connected components.
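Before moving on to segmentation, the cancellation step above can be sketched as follows; the table values, image sizes, and array layout here are invented for illustration:

```python
import numpy as np

def cancel_background(acc_flow, disparity, pan_deg, tilt_deg, table):
    """Subtract the expected background flow induced by ego-motion.

    `table[d]` holds (flow per degree of pan, flow per degree of tilt)
    for disparity d, as measured during initialization.  `acc_flow` is
    an (H, W, 2) accumulated flow field; `disparity` is (H, W).
    """
    revised = acc_flow.astype(float)
    for d, (per_pan, per_tilt) in table.items():
        at_depth = disparity == d
        revised[at_depth, 1] -= per_pan * pan_deg    # horizontal component
        revised[at_depth, 0] -= per_tilt * tilt_deg  # vertical component
    return revised

# Invented mapping matching the example in the text: disparity +2
# gives a -3 flow value per degree of panning to the right
table = {2: (-3.0, 0.0)}
disparity = np.full((8, 8), 2)
acc = np.zeros((8, 8, 2))
acc[..., 1] = -3.0        # background flow after one degree of pan
acc[4, 4, 1] = 2.0        # a moving object on top of the background
out = cancel_background(acc, disparity, pan_deg=1.0, tilt_deg=0.0, table=table)
print(out[0, 0, 1], out[4, 4, 1])  # background cancels; the object pops out
```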
³Or an extremely close approximation.

A common definition of an object encountered in a segmentation algorithm is that it is a set of tokens with the same kinematic parameters [Zhang and Faugeras, 1992]. Applying this concept to our system, an object can be a set of connected points with the same or extremely close optical flow features. A connected component can therefore be interpreted as the group of motion vectors that corresponds to an independently moving object in the scene.

Optical flow based segmentation methods have all the drawbacks associated with the computation of optical flow. It is our hope that the correlation errors can be minimized by the smooth motion constraint. Any segmentation algorithm which is fast and reliable can be employed in our motion tracking system.

Adiv's approach for segmentation first partitions the flow field into connected segments, and then groups the segments under the hypothesis that they are induced by a single, rigidly moving object [Adiv, 1985]. This technique has been shown to be relatively reliable, but the extensive computation involved makes it an undesirable candidate for use in a real-time system. Similarly, Jain's technique for segmentation [Jain, 1984], which is based on consistency of the focus of expansion, also requires a large amount of computation.

A simple labelling or tagging algorithm based on 4-connectedness on a 2D plane has been used in our implementation to partition the flow field using optical flow features. This algorithm will be described in the next chapter.
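In the meantime, a hedged sketch of such a 4-connected labelling pass; the threshold values and the choice to leave stationary pixels unlabelled are illustrative, and the direction comparison ignores the ±π wrap-around for brevity:

```python
import numpy as np

def label_flow_field(flow, mag_thresh=1.0, ang_thresh=0.5):
    """Partition a dense flow field (H, W, 2) into 4-connected regions
    of similar motion, grouping neighbours whose magnitude and
    direction differ by less than the given thresholds."""
    H, W, _ = flow.shape
    mag = np.hypot(flow[..., 0], flow[..., 1])
    ang = np.arctan2(flow[..., 0], flow[..., 1])
    labels = np.zeros((H, W), dtype=int)
    n_regions = 0
    for sy in range(H):
        for sx in range(W):
            if labels[sy, sx] or mag[sy, sx] == 0.0:
                continue  # already labelled, or no motion at this pixel
            n_regions += 1
            labels[sy, sx] = n_regions
            stack = [(sy, sx)]
            while stack:  # flood fill using only up/down/left/right moves
                y, x = stack.pop()
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if (0 <= ny < H and 0 <= nx < W
                            and not labels[ny, nx]
                            and mag[ny, nx] > 0.0
                            and abs(mag[ny, nx] - mag[y, x]) < mag_thresh
                            and abs(ang[ny, nx] - ang[y, x]) < ang_thresh):
                        labels[ny, nx] = n_regions
                        stack.append((ny, nx))
    return labels, n_regions

# Two separate moving blobs yield two connected components
flow = np.zeros((10, 10, 2))
flow[1:3, 1:3] = (0.0, 2.0)   # one object moving right
flow[6:9, 6:9] = (2.0, 0.0)   # another moving down
labels, n = label_flow_field(flow)
print(n)  # 2
```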
A region is 4-connected if every two pixels can be joined by a sequence of pixels using only up, down, left, or right moves.

Optical flow vectors corresponding to a moving object may not be equivalent, either because structural differences in the surfaces of an object cause the vectors to vary slightly, or because noise in the images or errors in correlation matching contribute to the inaccuracies of the vectors.⁴ In either case, thresholds ought to be used in the segmentation process to ensure that vectors are grouped together if their features match to a certain degree. Such features, for determining whether or not a vector should belong to a region, must be unique or have strong potential for classifying the flow vectors. The magnitude and direction of a vector are good candidates for defining connectedness in any vector segmentation algorithm. Two vectors can be defined to belong in the same region if the differences of their magnitudes and directions are less than some pre-defined thresholds.

⁴Another well-known cause is the aperture problem; that is, the image data may not be sufficient to determine the optical flow to more than just a linear collection of velocities.

The output of this segmentation process is a list of connected components, each described by its location, size, magnitude of the average flow, average disparity, and possibly other symbolic descriptions. A selection procedure is then required to pick out one of these components as the visual target.

3.2.2.3 Picking the Visual Target

The input to this procedure is the connected components computed by the segmentation process, and the stereo disparities produced by the perception system. No object recognition or any other feature extraction process has been or will be used. This selection procedure must work on what is given as quickly as possible, so that responses can be made in real-time.
The procedure must also take into consideration that multiple moving objects can be present at the same time, and there have to be some criteria for picking a specific one out.

There are several options as to which connected component should be picked as the visual target. Some of these options are:

• picking the region with the largest area

• picking the region with the highest average velocity

• picking the region which is closest to the camera

• using a combination of the above suggestions

Picking the region with the largest area without paying any attention to the velocities is perhaps the most unstable method. It has been pointed out that the cancellation of background optical flow is not guaranteed to zero out all background flow, because of round-off errors and errors in correlation matching. The largest connected component can be the region containing the remains of all the background optical flow vectors, and this is definitely not a target we are looking for.

One obvious choice for eliminating the picking of such a background region is to pick the region with the highest average velocity. This idea is inspired by the "pop-out" effect caused by cancellation. Although it is likely that the background region would never get picked, noise is the major reason why this method is not robust, as the region with the highest average velocity can be one that is created by noise or errors.

More constraints will undoubtedly make the selection task much easier. Using stereo disparities will give the connected components a three-dimensional look.
In other words, these components will be further segmented or layered in the new dimension (see figure 3.5).

Figure 3.5: The Two-Dimensional and Three-Dimensional Looks of the Optical Flow Field

Picking the region which is closest to the cameras is not a bad idea, if one considers that the motion tracking system is one that should always be alert to attackers. However, noise is again a big factor preventing this method from working to its full potential.

A slightly different idea is to pick the region which is closest to the zero-disparity surface imaginatively created by the two cameras [Olson and Coombs, 1991]. If our goal of motion tracking is simply to follow the moving object by panning, tilting, and also verging the robot head, then this is probably one of the better selection methods. However, only regions with a reasonable size should be picked if we want to eliminate the noise problem, but it is rather difficult to define such a condition based on what is given.
It is also an open question as to whether or not velocity should be a factor in this selection method.

Since we are assuming that the system has no knowledge about any moving object, perceptual grouping, the property of human visual perception to perceive discrete blobs moving together as a single object, cannot be modelled in our motion tracking system. Such a deficiency is certainly a big factor causing errors in the location of an object during tracking, in the case where the motion field corresponding to the rigid and contiguous visual target is split into several pieces because part of the object is occluded. Some model-based region growing techniques, e.g., [Shio and Sklansky, 1991], have been presented to solve this problem, but they are beyond the capacity of our system for the reason that object models must be used.

Picking the right region to track is perhaps a more difficult and ambiguous task than initially imagined. It certainly relies heavily on how the motion tracking system is defined to behave, and since there are many different ideas that can be explored, there is definitely no single solution which would work in all scenarios. Perhaps experimentation can provide some guidelines as to how a confidence measure can be computed to decide which method is best and most robust to use.

3.2.3 Action

The primary goal of the action system is to respond to the location changes of the objects in the world by gaze shifting and verging the robot head. It should be pointed out that both gaze shifts and vergence movements are rotational motions with respect to certain axes of the robot head.

The reasoning system provides information as to which region should be picked as the visual target. A reference point should be retrieved from this region so that the motion parameters for the robot head can be computed.
The centroid of the target region is possibly the best reference point, and is also relatively easy to calculate if the labelling procedure of the reasoning system collects some additional statistical data.

Computing the motion parameters for the robot head is a process of converting a flow value to a gaze shifting command, and converting a disparity value to a vergence movement command. The conversion is an inverse process to the computation of the expected background flow in the perception system. The special mapping table created during the initialization procedures can be used here for computing how much the head should pan and tilt. Knowing the depth of the region of interest, and how much its centroid is offset from the center of the image, it is quite easy to work out the pan and tilt parameters using the mapping table. Finding out how much to verge requires knowledge of the focal length of the cameras. A simple geometric equation, based on the configuration of the robot head, can be used to compute the vergence angle adjustment which will drive both cameras to point at the centroid of the visual target. More details on computing the motion parameters will be presented in the next chapter.

Although the reasoning and action systems are established as parallel processes, they are perhaps not intended to be concurrent or independent. The reasoning system should wait until the action system finishes moving the robot head before working on the next set of accumulated optical flow data (see figure 3.6). This way, the accumulation in the perception system can take into account how much apparent flow has been caused by the action system, and allows the cancellation process in the reasoning system to work with better approximations.
Figure 3.6: Communication between the Reasoning and Action Systems

The movement carried out by the robot head should be slow and smooth, so that the correlation system generates fewer mismatches, given that only a limited range of displacement measures is being focussed on.

By following this simple set of control strategies, we should be able to allow our passive motion tracking system to center the image of the object of interest in real-time.

3.3 Discussion of a Prediction System

Prediction is the key to stability of interacting closed-loop control systems with time delays [Brown, 1989]. Time delays are usually caused by interactions among different control systems, processes of acquiring real data, and computations performed on such data. For any real-time system to be successful in simulating or exhibiting realistic behaviour found in biological systems, it is often necessary that the response time of such a system be as short as possible.

Our proposed control scheme is aimed to function as a passive motion tracking system. It is certainly our hope that by using powerful computer equipment, the response time during each PRA loop can be minimized to an acceptable level of operation. However, in practice, such an assumption usually leads to unexpected breakdowns in unforeseen circumstances. In order to keep up with the pace of the world on a regular basis, a prediction module can assist the operation of the motion tracking system by indicating what is to be expected in future states.

A prediction module, possibly composed of a Kalman filter as used in [Singh, 1991] or [Matthies et al., 1988], can undoubtedly enhance performance by compensating for the time delays in computation and communication.
Based on our smooth movement constraint on both the moving objects and the moving observer, we can deduce that velocity changes will also be slow and smooth. This observation can be used as a key to predicting how far the objects are expected to move away from their current positions while the reasoning system is analysing the perceived data. Under this framework, the prediction module outputs the estimated destinations of the moving objects based on their current velocities, and the velocity changes they have previously undergone.

The application of prediction in motion tracking involves having a mediator process take into consideration, for the visual target, both the estimated distance to be travelled by the object during time delays, and the actual change in location of the object computed from the perceived data. Such a process will then be responsible for computing a corrective measure to drive the robot head to track the target. This is obviously a vast improvement over a passive system.

Prediction allows a motion tracking system to look ahead to future states, and also allows the robot head to move ahead to the destination during time delays. That is, the robot head will still be moving along with the object while data is being analysed. This control scheme, however, somewhat contradicts the concept of being passive and responsive. There are certainly many open questions, which require in-depth investigation, concerning the feasibility and methodology of applying such a prediction module to our proposed control scheme of motion tracking. As a result, prediction is not included as part of our current study of designing and implementing a gaze control system. However, it is clearly one area which needs to be explored in future studies.

Chapter 4

Implementation

4.1 Overview

This chapter describes the implementation of a real-time motion tracking system based on the control theories presented in the previous chapter.
The general hardware setup for the LCI robot head will be described. Different programs running on the Datacube hardware and on the Transputer network will also be discussed in detail in this chapter. The evaluation of our motion tracking system, the analyses and results of different experiments, and some suggested improvements will be presented in the next chapter of this thesis.

4.2 Hardware Configuration for the LCI Robot Head

This section describes the hardware configuration of the binocular camera head system installed in the Laboratory for Computational Intelligence (LCI) at UBC. The system consists of a robot head, otherwise known as the Eye-Head, and a special purpose image processing system together with a multicomputer system, both being connected to a host computer (see figure 4.1).

Correlation is done on a Datacube MaxVideo-200 system, which has a pipelined architecture. Digitizing of images of size 512 by 480 on the MaxVideo system can be performed at video rate, i.e., the rate of 30 frames a second. Processing then proceeds at 60 frames per second. Processing smaller images takes correspondingly less time. The Datacube system is attached to the VME bus of the host workstation.

A network of T-800 Transputers running at 25 MHz is also connected to the VME bus of the host, a Sun SPARCstation 2 with 32 MB RAM. Each Transputer processor has at least 2 MB RAM attached, with some special nodes, such as the frame grabber, containing more memory. Each Transputer can be connected to another Transputer through one of its four links by a custom-built crossbar switch. The bi-directional link is capable of transferring data at a speed of 20 megabits per second, which is roughly twice the Ethernet data transfer rate. The Transputer network can share data with the Datacube system via a MaxTran node, a special node consisting of a Transputer, a frame buffer, and a link to the MaxVideo board.
More details on the Datacube hardware and the Transputer network used in LCI can be found in [Little et al., 1991].

The LCI Robot Head contains a pair of Sony XC-77RR CCD monochrome cameras mounted on a motorized platform. The focal length of each camera is 8.5 mm. Two Cosmicar lenses, with manual focus and aperture control, are used. The cameras are mounted on the platform approximately 20 cm apart. The output of the cameras can be grabbed by either the Datacube system or a frame grabbing node in the Transputer network, allowing more flexibility in designing the image processing routines.

Figure 4.1: Hardware Configuration of the LCI Robot Head System

Figure 4.2: The LCI Robot Head

The robot head (see figure 4.2), made by Zebra Robotics, can be electronically controlled to pan, tilt, and verge with respect to some axes (refer to figure 4.3). The pan axis has a 4:1 spur gear reduction ratio, while the tilt axis has a 3:1 bevel gear reduction ratio. The pan and tilt axes are coupled, in that one revolution of the pan axis will cause a one-third revolution of the tilt axis. The verge axis is driven by a 20 threads per inch lead screw. Pittman motors are used, with each motor being connected to an HP encoder board. The motor amplifiers are driven by a digital-to-analog converter controlled by a Transputer in the network. The encoder boards are linked to a special Transputer node, and therefore the head motion can be controlled by software.

Figure 4.3: Various Motions of the Robot Head

The pan axis supports motion ranging from roughly -150 degrees to +150 degrees. The motion range for the tilt axis is -90 degrees to +65 degrees. The vergence angle of a binocular system is the angle between the optic axes of its cameras.
The maximum vergence angle allowed is roughly 20 degrees. The velocities of the head motions can be controlled by adjusting parameters during the head initialization process. The maximum velocity of the head is approximately 120 degrees per second (dps) for panning, 150 dps for tilting, and 15 dps for verging.

4.3 Software Description

As suggested by our control theories, both the Datacube hardware and the Transputer network will be used in our implementation of the motion tracking system. The role of the datacube program is to compute optical flow and stereo disparity repeatedly, and to return the data in a continuous flow of images. These images are put into the frame buffers of the MaxTran node, allowing a Transputer process to access the data. Programs running on the Transputer network will be responsible for storing and analysing the displacement data, and for controlling the motion of the robot head.

Figure 4.4 shows the software design of the overall system. It also depicts the arrangement and relationships of the different software components implemented, along with a brief description of the data flow in the network of Transputer programs. A circle in the figure denotes a Transputer node loaded with a program with a specific name, and the number enclosed in the circle identifies the particular Transputer node used for the program. An arrow indicates the direction of the data flow from one Transputer node to another. Data is transferred via message passing in the Trollius operating system running on the Transputers. The functionality and purpose of each program written will be described in detail in the following sections.

4.3.1 The Datacube Program

A datacube program has been written by Stewart Kingdon, one of our LCI staff, to compute optical flow and stereo disparity using the Datacube MaxVideo hardware. The input images are grabbed from the pair of stereo cameras. The images are taken at 30
The images are taken at 30frames per second; we only use one field of the frame. The images are 512x240, ourSony cameras average successive odd and even pairs so that we do not have missing dataDatacubeProgram-----check,-fOr new dataframe(with optical flow and stereo disparity)‘;:e-HeadEx.,c,o3der( --DAT20821 _Idisplacementatadisplacemendata displdata Perceptionconnectedcomponents connectedcomp.--ntsG.R.S.Servercement12^ 13 accumulatedoptical flostereodisparityreopiostingaccumulatedoptical flowaccumulatedoptical flostereodisparityregdostingaccumulatedopttcal flowaccumulatedoptical flustereodisparityreqiJestingaccumulatedopt],61 flowReasoningregyestingconriegtedcomponents -----completionsignalhead'smotionparametersrequestkigconnected--cOMponentsFrontActionDATsignaencoderommandChapter 4. Implementation^ 46Figure 4.4: Software Components and Data FlowChapter 4. Implementation^ 47in the vertical direction. We smooth with a Gaussian before subsampling to 128x120[Little et al., 1991]. An output image of size 128x512 is returned with the following foursubframes, which are in order: optical flow, stereo disparity, edges (zero-crossings of theLaplacian of Gaussian), and Laplacian of Gaussian images (see figure 4.5). Each of thesesubframes has a default size of 128x128, which can be scaled to a lower resolution of64x64 if needed. The output can be displayed on a TV monitor, in which different colorsare chosen to represent different optical flow values, and stereo disparities.OpticalFlowStereoDisparityEdges V2G Figure 4.5: The Four Subframes in the Output Image of the Datacube ProgramThe range of motion to be considered is specified in an input data file to the datacubeprogram. The common range of motion used is +2] for vertical flow and [-3, +3]for horizontal flow, and the range for stereo disparity to be considered is usually [-13,+13]. The input data file also contains some encoders used by the datacube program forreleasing the displacement data. 
Encodings are used so that only one optical flow image need be returned, with both horizontal and vertical flow at every pixel of the image. Any program that requires access to optical flow or stereo disparity can read the particular data file used by the datacube program, and decode the encoded result. The data file also contains information on the color used for each optical flow or stereo disparity data value. The datacube program is capable of returning about ten 128x512 output images per second with the input ranges of motion and disparity mentioned above.

A simple sum of absolute values of differences (SAD) correlation technique is used in measuring optical flow and stereo disparity. A 7x7 correlation window is used in the computation. Only the correlation range specified in the input data file will be considered. If there is no good match because of insufficient information, then no correlation result at that particular point will be returned.

4.3.2 The Transputer Programs

The Trollius 2.1 operating system is used in our implementation of the motion tracking system described as follows. Trollius is a programming environment designed for distributed memory multicomputers [The Ohio State University, 1991]. The environment includes an operating system, a user interface, a C compiler, and some libraries for C. A boot schema, which specifies the identifiers and types of nodes, and their interconnections, is required to set up the topology of the Transputer network. The current interconnection of any two Transputers follows the convention that link 0 connects to link 1, and link 2 connects to link 3. Programs written in C can be loaded onto the nodes after the booting process has completed.

Various programs (refer to figure 4.4) running in parallel have been written by the author of this thesis to model our PRA control theories. The displacement data is first handled by a G.R.S.
program, which is responsible for grabbing the displacement data returned by the datacube program, rearranging the data if necessary (e.g., resampling), and sending them to other processes. The optical flow accumulation (O.F.A.) program keeps the optical flow in pre-allocated storage, and sends out the accumulated data when requested by a process in the reasoning system. The Tag program is responsible for making such requests and for segmenting the accumulated optical flow into different connected components. The Front program acts as the commander of the reasoning system, leading the data flow and requesting that the accumulated optical flow be segmented. A decision process in the front program selects the visual target from the connected components, and allows another procedure to compute the motion parameters for the robot head. Such motion parameters, or motor commands, are sent to the Back program, which in turn instructs the robot head to move by sending signals to the encoder boards and to the digital-to-analog board. A special Stop program has been written to send a quit signal to both the front and G.R.S. programs, requesting that the ongoing operations be terminated.

The G.R.S. Server

The Grab-Rearrange-Send server acts as the active monitor of the perception system, responsible for dealing with the displacement data being pumped out by the datacube program. This server is intended to run on the MaxTran node so that data can be accessed directly from the frame buffers. The data received in the frame buffer will be the image containing the four subframes described in a previous section.

The grabbing process constantly checks for new data to arrive, and immediately rearranges the new data if necessary. The rearrangement includes the possibility of resampling the data to a lower resolution in order to speed up the response time. This is certainly an optional service of the G.R.S.
server, and should not be used unless it is absolutely necessary. It should be noted that resampling of data is preferably performed by the datacube program before the correlation process, so that displacements are measured with respect to the image size after resampling, rather than before. Such an ordering of events is crucial, since offsetting errors might seriously affect the results of later processes if resampling is done to the images after displacements have been measured with respect to the higher resolution image size.

The final step is to send displacement data to the optical flow accumulators. Data is sent between two Transputer nodes via the message passing mechanism in the Trollius operating system.

The G.R.S. server runs as a continuous loop, actively repeating the grabbing, rearranging, and sending operations without requiring any instruction from any parent process. It is, however, alert to a quit signal indicating when such operations should stop. The server is intended to run at a rate no slower than the displacement data is being returned, and therefore the workload of the server is kept as minimal as possible. It is necessary that the grabbing process be able to attend to all images. However, it is anticipated that a single accumulation process may not be able to handle the amount of data, due to the time delays of processing; as a result, a number of optical flow accumulators, each responsible for processing a portion of the image, should be employed to speed up the operations.

The Optical Flow Accumulator

In our current implementation, we elect to use three optical flow accumulators running on three separate Transputers at the same time.
In our previous attempts to accumulate optical flow with only a single accumulator, images were lost because the accumulator could not execute fast enough to manage all the data.

Each of the three accumulators in the current system is responsible for processing only one-third of the original optical flow image. The accumulation process is a separate looping process actively working to manage the incoming data. The so-called foreground process of the program is the one waiting for signals expressing the need for accumulated optical flow. Two buffers are required for accumulation, because one would not want to block the accumulation process when there is a need to send data. As a consequence, the two buffers alternate their roles from time to time, one responsible for storing accumulated optical flow data, and the other being used by the process attempting to send data over to a recipient process on another Transputer.

Optical flow accumulation, as we described before, is merely simple vector addition. The encoded optical flow data contained in the output of the datacube program is decoded into separate horizontal and vertical flow values before they are added to the existing flow data. The actual accumulation works by first looking at the flow values at the current pixel location, then retrieving the flow values at the position where the current flow values point to, and finally adding these retrieved flow values to the current flow values.

For instance, suppose we are trying to update the flow value of pixel (x, y). Assume that

    h_flow(0,i)(x, y) = h    and    v_flow(0,i)(x, y) = v

at time i. Then,

    h_flow(0,i+1)(x, y) = h + h_flow(i,i+1)(x + h, y + v)
    v_flow(0,i+1)(x, y) = v + v_flow(i,i+1)(x + h, y + v)

where (i, i+1) denotes the new optical flow image.

The accumulation program is built so that an image is completely processed before the next one is attended to. This means that the image sending process of the G.R.S. server will be temporarily blocked if the accumulation process is not ready for the data.
The idea of queuing up images on this receiving end will not work: once the accumulation process falls behind the pace, it will never be able to catch up with the real-time data. Therefore, downsizing the images is the more logical choice in this real-time system, where the time constraint is a major factor in performance.

Another common route to speed up response time, or to avoid image loss, is to add more accumulators. This is the perfect solution in terms of utilizing resources and maximizing the amount of data to be processed. Unfortunately, we are unable to plug more optical flow accumulators into the current structure of the Transputer network, since there are only 4 links connected to a single Transputer. It is possible to add a layer of optical flow transmitters, and therefore allow more accumulators to work at the same time. However, the time used for transferring data would increase significantly, and it would not likely help to speed up the operations by a significant factor.

The Tag Program

The tag program is responsible for segmenting the accumulated optical flow into different connected components. The segmentation process is activated by the need of the front program to pick a visual target. The number of tag programs, or workers, is equivalent to the number of optical flow accumulators, in which each tag worker is paired up with a particular accumulator at all times. The front program broadcasts a request signal to all tag workers for connected components. Upon receipt of such a signal, each tag worker requests that the accumulated optical flow be sent over by its respective coupled accumulator, and then transfers the accumulated data to the segmentation process.

Based on our control theories, a cancellation process is used to reduce the unstable effect of the background optical flow caused by ego-motion, before the segmentation process can work on partitioning the flow field.
Such a cancellation process is integrated into the segmentation process in our implementation, in that whenever the flow values of a pixel are retrieved from the flow field, the expected background flow computed for that pixel is subtracted from the flow values, resulting in a revised pair of horizontal and vertical accumulated optical flow values.

The expected background flow values for each pixel are, in theory, calculated by multiplying the ego-motion parameters with the mapping values corresponding to the depth of the pixel. A special mapping table, indexed by stereo disparity, is recommended for use with the cancellation procedure. However, in our current implementation of the motion tracking system, average mapping values, one for vertical motion and another for horizontal, are used instead of a special table. This approach is adopted because of the difficulty of establishing a reference point for each depth, without knowing for sure that such a point will exist when the cameras are pointing in a random direction during initialization. It is also our belief that since cancellation will not completely zero out the background optical flow, due to round-off errors and correlation errors, using average numbers is as good for approximation and creation of "pop-out" effects as using a complete table, whose entries are also subject to errors due to the use of correlation matching. As a result, the initialization process moves the head by a certain number of degrees on each axis, and computes how much optical flow has been incurred, assuming that the whole scene is stationary. This procedure is repeated a number of times, using different degrees of motion, before the average mapping values are derived.

We implement the segmentation process using a labelling algorithm. The magnitude and direction of a vector are used as the basic features for defining connectedness in our algorithm. Two vectors are loosely defined to belong to the same region, or match, if the differences of their magnitudes and directions are less than some pre-defined thresholds. The labelling algorithm works as follows:

For each vector v at position (x, y), its features are first compared with the features of the region containing vector v1 at position (x, y - 1). If the features match, then vector v should belong to that connected component, and share the same tag (or label) with vector v1. A checking procedure is then required to find out if the region containing vector v2 at position (x-1, y) has matching features with the region containing vector v. If it is the case that they match but do not share a common tag, i.e., the two regions have disjoint lists of tags, then a merging process will be used to group the two regions together so that both regions are combined into one connected component.

If the features of v do not match the features of the region containing vector v1, then the features of the region containing vector v2 will be compared with the features of vector v. If they match, then vector v belongs to that region and shares the same tag with vector v2. Otherwise, vector v belongs to a brand new region of its own, and will be assigned a new tag.

If either vector v1 or v2 does not exist because v is at the boundary line, then no comparison will be performed. The algorithm continues until all vectors in the flow field are grouped into some connected components. It should be obvious that a connected component can consist of numerous tags, but each tag can only belong to one unique region.

Consider the optical flow field in figure 4.6 as an example. Assume that the labelling process scans the field from left to right and from top to bottom. The vector at position (4,3) starts a new connected component with tag number 1.
Vectors (5,3) and (6,3) belong to the same region and are assigned the same tag number, since they have similar features to vector (4,3). The vector at position (9,3) creates a new region with a different tag number, as there is no flow value at positions (9,2) and (8,3). This particular region will be merged into the initial connected component after the vector at position (9,4) has been analysed. This is due to the fact that vectors (9,3), (9,4), and (8,3) have similar features and should belong to the same region. As a result, the initial connected component expands and its tag list also grows. Similar procedures are applied to vectors (4,5), (3,6), and (2,8). The vector at position (9,8) begins a new region since its features do not match those already examined. In this example, the labelling process finishes with two separate connected components, one with 39 vectors and containing five tags, and the other with just 5 vectors and using only one tag.

Figure 4.6: An Example Optical Flow Field and the Result Returned by the Labelling Algorithm

The definition of connectedness should also include stereo disparity as a constraint if we are to create a motion tracking system which works in three dimensions, i.e., which pans, tilts, and verges at the same time. It is still an open question, however, as to how the depth constraint fits into the algorithm, as we try to avoid running into situations where the flow field is overly divided into different connected components. This situation arises if the constraints used are too firm, in which case we may end up losing the target quite easily. Similarly, if the constraints are too flexible or loose, the whole flow field may not be partitioned at all, and thus the segmentation process becomes useless.
It should be obvious that the effects of any constraint depend on the values of the thresholds used, and it is our hope that subsequent experimentation can provide some useful and reliable thresholds for the system.

During the segmentation process, a number of statistical data are collected for later computation. In particular, the number of pixels in each region, and of those that correspond only to horizontal motion or to vertical motion, is kept track of. The minimum and maximum boundary values of each component are useful in later processes such as merging and determining the rough position of the region. The various sums of vertical optical flow, horizontal optical flow, and stereo disparity within each connected component are computed. A stereo disparity histogram is constructed to determine the depth of a region in a later procedure. Other useful data for computing the centroid of a region are also collected in this segmentation process.

The Front Program

This Transputer program can be regarded as the main controller of the activities carried out by the reasoning system. The tag program plays a passive role in the operations of the system by waiting for signals to start up the segmentation process. The front program has an active loop requesting that the connected components be sent over by the tag workers, and deciding which of those components should be used as the target to be tracked.

Upon receipt of the connected components, a selection procedure decides which region is the right or best one to follow. As discussed in the previous chapter, this is a rather ambiguous task which really depends on how the motion tracking system is defined to behave.
Our selection is based on the observation that if an object is being followed in three-dimensional space, its disparity should be close to zero, and, provided its motion is relatively slow, its centroid should be fairly close to the center of the image. Using these ideas as the selection criteria will at least ensure that once the target is being followed, the system will likely continue to track its motion. However, this does not address the initial objective of having our motion tracking system pay attention to multiple moving objects. In order to shift attention to other moving objects, there have to be conditions, such as deeming the closest object more important or the largest moving region more interesting, to override the zero-disparity criterion. However, doing so may cause the robot head to move aimlessly on noisy images. Different selection routines have been written to pick out a connected component, using the data collected by the segmentation process. Only testing can demonstrate which method behaves better.

The front program is also responsible for computing the centroid of the chosen target. The centroid (x_c, y_c) can be calculated by the following equations:

x_c = (Σ x_i) / n        y_c = (Σ y_i) / n

where x_i and y_i are the x and y positions of a point belonging to the region, and n is the number of points in the region.

The centroid is taken as the representative point of the region to be tracked when computing the motion parameters of the robot head. The motion parameters, once computed, are passed on to the back program for action.

The Back Program

The only responsibility of the back program is to move the robot head according to the motion parameters received. Signals are sent to the digital-to-analog converter board and the encoder boards to drive the various motors. The low-level controlling routines and some inverse kinematics functions have been provided by Rod Barman, one of our friendly LCI staff.
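As a quick illustration, the centroid equations above translate directly into code; this sketch assumes the region is given as a list of (x, y) pixel coordinates:

```python
def centroid(points):
    """Centroid (x_c, y_c) of a region, as in the equations above:
    x_c = sum(x_i) / n, y_c = sum(y_i) / n."""
    n = len(points)
    x_c = sum(x for x, _ in points) / n
    y_c = sum(y for _, y in points) / n
    return x_c, y_c
```

In the running system the sums would come from the statistics gathered during segmentation rather than from an explicit point list, so no second pass over the region is needed.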
The details of such procedures are beyond the scope of this thesis.

It should be re-iterated that the head motion must be slow and smooth in order to allow our control theories and correlation matching technique to function properly. The velocity and acceleration of the robot head can be adjusted to a suitable level during the head initialization process. Once the head has moved to its destination, a completion signal is sent back to the front program, indicating that processing can continue with the next set of data.

4.4 Computing the Motion Parameters of the Robot Head

4.4.1 Panning and Tilting

Once the centroid of the visual target has been computed, the pan and tilt motion parameters of the robot head can easily be determined. The goal of tracking is to keep the region of interest at the center of the stereo images. Finding out how much the centroid has moved away from the center of an image is an easy task. Such differences can simply be converted, using the average mapping values, to the degrees of motion that the robot head should undertake to follow the moving region. However, the centroid of the region identifies the location of the object before the motion, rather than after the motion. This is due to the fact that our optical flow accumulation process adds new flow values to the origin of the flow. That is, the coordinate system of the accumulated optical flow image is the one used in the first image frame of the motion sequence, and the optical flow values point to the destinations of the respective points at the end of the sequence. As a result, the pan motion parameter should also take into consideration the average horizontal optical flow of the region, and the tilt motion parameter should likewise be computed using the average vertical flow value of the region.
The expected background optical flow previously subtracted from the flow field should be added back to the average flow values, so that the pan and tilt motion parameters reveal the true displacements of the target object with respect to the robot head after its motion rather than before its motion.

The formulae used are:

S_pan = (x_c - w/2 + h_average + h_background) / map_horizontal

S_tilt = (y_c - w/2 + v_average + v_background) / map_vertical

where S_pan is the pan motion parameter, S_tilt is the tilt motion parameter, and w is the size of the image, assuming that the width and the height of the image are the same.

The resulting motion parameters, which are measured in radians, direct the robot head to move to a new location, and the cameras will still be looking at the object being tracked, if the total response time of the motion tracking system is compatible with the speed of the moving object.

4.4.2 Verging

Vergence movement is a coupled motion of the two cameras wherein the cameras rotate in opposite directions. Determining the verge motion parameter requires an understanding of the geometry of the stereo cameras and their image planes. Referring to figure 4.7, the two stereo cameras can be verged inward or outward. The two image planes are therefore not necessarily coplanar, but the optical axes of the cameras should still lie in the same plane.

Figure 4.7: The Geometry of Using Stereo Cameras

In the diagrams, cv denotes the current verge angle with both cameras pointing at point p, and nv denotes the new verge angle if the cameras are to direct attention to point q. The disparity δ of an image point p is defined to be:

δ = x_r - x_l    (4.1)

where x_r
is the location of the pixel found on the right image plane, and x_l is the location with respect to the left image plane.

Let us assume that the motion tracking system can react fast enough to follow the moving object, so that the object always appears close to the center between the stereo cameras. Based on the symmetrical movement of our stereo cameras on the motorized platform, we can assume that the centroid, being close to the center between the two cameras, is symmetrically displaced from the center of the image plane, so

d = w/2 - x_r = x_l - w/2

as can be seen from the diagrams, and therefore

x_l + x_r = w    (4.2)

where w is the size of the horizontal dimension of the image plane.

The point where the two optical axes intersect is usually referred to as the vergence point, which has zero disparity. For points that are closer to the cameras than the vergence point, disparities are always less than zero, and for any point which is farther away from the cameras than the vergence point, the disparity is always greater than zero.

Figure 4.8 expands the view of the right image plane introduced in figure 4.7. When the cameras verge, the image planes follow. By simple geometry, we are able to identify the values of the angles as labelled. The angle δ_verge represents the degree of vergence movement required in order to change the total vergence angle from cv to nv, and f is the focal length of the cameras.

Figure 4.8: Determination of the δ_verge angle

From the diagram, we can observe that

δ_verge/2 = nv/2 - cv/2    if δ < 0
δ_verge/2 = cv/2 - nv/2    if δ > 0

The goal here is to compute δ_verge based on the disparity of the region to be tracked. We know that

tan(δ_verge/2) = d/f

where

d = w/2 - x_r    if δ < 0
d = x_r - w/2    if δ > 0

Given that f is known, we need to retrieve x_r from the disparity δ. From equation 4.1, we know that

x_r = δ + x_l

and from equation 4.2, we know that x_l = w - x_r. Therefore, we can derive that

x_r = (δ + w)/2

and hence

tan(δ_verge/2) = -δ/(2f)    if δ < 0
tan(δ_verge/2) = δ/(2f)    if δ > 0

Finally,

δ_verge = 2 × arctan(-δ/(2f))    if δ < 0
δ_verge = 2 × arctan(δ/(2f))    if δ > 0

The direction of δ_verge is based on the sign of δ. As a result, we end up with only one equation:

δ_verge = 2 × arctan(δ/(2f))

The focal length f should be measured in units of pixels. A CCD camera calibration process [Beyer, 1992] is technically required to perform such a measurement; similar work is currently in progress. The value of f that we use in our experiments is obtained by a trial-and-error method.

We should re-iterate that our derivation of δ_verge relies heavily on the assumption that the reference point we are focussing on lies on the center plane between the stereo cameras. This is a fair assumption if the motion tracking system can center the region of interest as quickly as possible, and it thus allows the derivation to come up with a reliable estimate of how much vergence movement is required.

Chapter 5

Evaluation and Discussion

This chapter presents the evaluation of our motion tracking system implemented as described in the previous chapter. Several experiments have been performed to demonstrate what can and cannot be done by our motion tracking system. We will find out to what extent our system is able to operate satisfactorily, and will discuss the problems which we have encountered in our experiments.

We will attempt to find solutions to the questions brought up in previous discussions, as well as some issues which have to be addressed in future research. We will briefly compare our motion tracking system with other types of systems implemented elsewhere. We will suggest and describe some ideas to improve our current system in the next chapter.
5.1 Evaluation and Performance of our Motion Tracking System

The performance and usefulness of a real-time motion tracking system can be measured in terms of the following parameters:

• the amount and size of data that can be handled
• the update rate, and response time
• robustness, i.e., accuracy and stability
• the computational resources required
• the maximum speed of a moving object that can be followed
• the maximum velocity of the robot head that is allowed
• the amount of data that can be produced, perhaps for use by some other programs at the same time

The datacube program works on gray-scaled images input from the stereo cameras, with the ranges for correlation specified in an input data file. In the experiments that we have carried out, we generally use the correlation range [-3, +3] for horizontal motion, [-2, +2] for vertical motion, and [-13, +13] for stereo. Each subframe of the 128x512 output image is selected to be of the default size 128x128. The datacube program reports an output rate of approximately 10 new images per second based on this configuration. It should be noted that the datacube output is updated at video rate, i.e., 30 Hz, but new data is available at the rate of 10 Hz. Experimentation reveals that our Transputer processes cannot perform their duties in real-time when processing the 128x128 optical flow images. It is therefore necessary that these output images be scaled down to a lower resolution.
If rescaling of the optical flow and stereo subframes to the size of 64x64 is performed by the frame-grabbing Transputer process, then the active monitoring process cannot execute at a rate compatible with the datacube program's output rate, and as a result, the optical flow accumulators can only pay attention to about 95% of the output data.

Allowing the datacube program to rescale the output subframes to the size 64x64 still gets us only 10 new images per second, because the output image remains of size 128x512 but each of the four subframes is of size 64x64. Even though we are unable to speed up the output rate, rescaling done on the Datacube hardware is still the favorable option, because the offsetting errors of resampling after correlation, such as round-off errors, are eliminated. In addition, it appears that less noise is generated in the lower-resolution output. However, since the size of the input gray-scaled images remains unchanged, a smaller image size for correlation means that small motions will not be noticed as often. For example, if the input horizontal image size is 512, then 4 pixels of horizontal movement translate to 1 pixel of displacement in the 128-pixel-wide output image, but 8 pixels of horizontal movement are required for the 64-pixel-wide output image to record a pixel of displacement. Nevertheless, since there is less work to be done by the frame-grabbing Transputer process, the active monitoring process and the optical flow accumulators can consequently attend to all output images returned at the rate of 10 Hz.

The various response times of the different Transputer programs usually depend on the nature and the goals of the experiments, and will be reported along with the results of the different experiments explained below.

We assume that the motion of any moving object and of the robot head is slow and smooth when establishing our control theories.
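The resolution trade-off described above can be made concrete with a small helper; `output_displacement` is a hypothetical name, and the integer division models how sub-pixel remainders are simply lost in the subsampled output:

```python
def output_displacement(input_px, input_width=512, output_width=128):
    """Map a displacement measured in input-image pixels to the number of
    pixels it registers as in the subsampled correlation output."""
    return (input_px * output_width) // input_width
```

At an output width of 128, a 4-pixel input movement registers as one output pixel; at 64, a movement must reach 8 input pixels before it registers at all, which is exactly the loss of sensitivity noted above.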
With the correlation ranges for motion used as mentioned above, our correlation process can produce reliable matches for an object moving at a speed of approximately 1.2 metres per second at most, if the object is 2 metres away from the cameras. Moving objects that are farther away from the cameras can be supported at a slightly higher velocity. Objects that move faster than this limit cause unreliable correlation output, as can be instantly observed from the different color patches appearing on the TV monitor. Currently, we allow our robot head to pan or tilt with a maximum speed of roughly 15 degrees per second, so that the background velocities generated are always within the correlation ranges of the datacube program. These upper bounds for velocities are in general determined using the trial-and-error method.

5.1.1 Experiments, and What Can Be Done

Through experiments, we can show what our current system is able to achieve, study the advantages and disadvantages of our approach to motion tracking, and motivate new ideas for future improvements.

5.1.1.1 Detection of Moving Objects when Robot Head is Not Moving

Before we allow our robot head to track, we first have to confirm that the correlation system is returning usable data, the perception system is working properly in accumulating optical flow, and the reasoning system is able to segment the flow field and pick out the target. Therefore, our first and easiest experiment is to simply detect the locations of a moving object in view over time without following it. This way, we can analyse the output of the reasoning system without being distracted by the cameras' motion.

The experiment is conducted with a person walking into the view of the cameras from the left side, slowly moving to the right, and then walking back to the left and exiting from view.
It should be noted that, for the purpose of easy verification of results, only one moving object is present in the image sequence.

The datacube program outputs the displacement data at a rate of 10 Hz. The active monitoring process is able to notice all output images, and the optical flow accumulators can attend to all the 64x64 optical flow images returned.

Not surprisingly, the tag programs can correctly segment the moving region from the stationary background, and the front program correctly reports that an object can be seen moving from the middle left to the right of the image and then back to the top left, by monitoring the centroid and the boundary of the moving blob (see figure 5.1). In this experiment, we pick the largest motion blob as our visual target, and this appears to work very nicely with slow-moving objects. Each PRA iteration takes an average of 1.38 seconds to complete.

This experiment gives us confidence that our perception system is functioning properly, and that we are able to detect and identify the locations of a moving object, albeit in the simple case where the stationary cameras generate no background optical flow.

Figure 5.1: Results of the Motion Detection Experiment (centroid of the moving person over time)

5.1.1.2 The Vergence Only Experiment

The second experiment that we have carried out is to control the vergence movement for fixation without panning or tilting the robot head. This experiment is conducted with a person walking towards and away from the stereo cameras, while keeping himself or herself close to the center of the images at all times. The input disparity range for correlation is [-13, +13], and the datacube program is pumping out images of 64x64 subframes 10 times a second. Only stereo data is required for this experiment, as we have already derived an equation for verging the cameras based on only disparity.
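A sketch of how the verge parameter might be computed from the chapter 4 relation δ_verge = 2 arctan(δ / 2f), using the disparities gathered from the centre window of the stereo image; the function name, and the choice of the median as the summary statistic (rather than whatever the front program actually uses), are assumptions:

```python
import math
from statistics import median

def verge_angle(disparities, focal_px):
    """Vergence correction in radians from a set of disparity samples
    (e.g. the centre 16x16 window), using delta_verge = 2 * atan(d / 2f).
    `focal_px` is the focal length in pixels."""
    d = median(disparities)          # robust against scattered mismatches
    return 2.0 * math.atan(d / (2.0 * focal_px))
```

A positive result verges toward a point beyond the current vergence point, a negative result toward a nearer one, and a window full of zero disparities commands no movement at all, which is the stable-fixation behaviour observed in this experiment.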
It should be pointed out that the optical flow accumulators and the segmentation processes are not required to manage any data in this experiment. Stereo disparity images are transmitted directly from the frame-grabbing server to the front program for processing. For the sake of simplicity, we analyse only the data in the center 16x16 window of the stereo disparity image at one time.

The system is able to verge on the target, in the middle of the images, extremely quickly, at a rate of approximately 10 moves¹ per second, and quite accurately with dense and reliable stereo data input. It is observed that the system is fairly stable when the object is stationary; that is, the cameras can fixate on the unmoved target with only minor vergence movements due to round-off errors and noise. On the other hand, if the correlation process fails to return reliable and dense data, for instance when the person standing in front of the cameras is wearing a white shirt with no texture, then it is no surprise that the robot head will frequently lose control and oscillate back and forth in a ridiculous and random fashion.

What we have learned from this experiment is that it is extremely easy to control the robot head with the type of data that does not have to be accumulated. In this case, we are not interested in how stereo changes over time, but only in how close an object is at a particular time instant. The performance of the tracking system certainly relies heavily on the accuracy of the input data, especially in this experiment, where we do not interpolate the data of several stereo disparity images to obtain more accurate estimates of depth for tracking.

¹When the reasoning system decides not to wait for the action system to finish the robot head's movement before continuing its processing.

5.1.1.3 Panning and Tilting without Verging

In this experiment, we allow our system to track an object moving in 3D by panning and tilting the robot head.
In short, we will call this experiment the "pan-tilt" experiment. We use the same input configuration for the datacube program as in the other experiments. From the previous motion detection experiment, we know that our system can correctly locate moving objects when there is no head movement. The goals of this experiment are to find out whether we can correctly and reliably detect moving objects while the robot head is moving, and to examine the robustness of the overall system. The results collected will be analysed and further examined in the later discussions of the effectiveness of, and problems with, our background optical flow cancellation method, the segmentation process, and the different selection algorithms.

The three optical flow accumulators, each responsible for dealing with one-third of the received data, can handle all the 64x64 optical flow images available at the rate of roughly 10 Hz. Our perception system continuously adds up the flow vectors regardless of whether or not the robot head is moving. The accumulated optical flow vectors are transferred to the segmentation processes of the tag programs on request. Each of the three segmentation processes is logically responsible for working on one-third of the flow field. Such processes require an average of 0.85 seconds to complete the segmentation and statistical data collection duties. The connected components are then passed on to the front program, which selects the visual target for the robot head to track.

The PRA loop will not repeat itself until the robot head has finished moving to a new location. The average time for executing one PRA loop is 1.8 seconds, which includes the time for the massive amount of data transferred among the Transputer processes, the time for messages displayed by the different Transputer programs intended to report the
status of execution, and the time spent waiting for the robot head to complete its movement. As a side note, it should be pointed out that printing too many messages from the Trollius operating environment back to the terminals on the Sun host will significantly increase the overall response time.

We have tested our motion tracking system on the simplest-case scenario, where there is only one object moving slowly in front of a textured background environment. We should point out that the background is not artificial; that is, the background is the usual cluttered laboratory environment. It is not especially arranged to have sufficient texture. The responses of the system have been fairly consistent, in that the robot head has been able to follow the moving object most of the time. The output of the segmentation process does indicate that our system can separate the moving object from the background even when the robot head is allowed to move, given that our background scene has purposely been designed to contain rich textural patterns so that the correlation output is dense with minimal mismatches. However, the system exhibits slightly different behaviour, as expected, in the passive following of the moving object, depending on the selection method used for picking out the visual target.

If the moving object occupies at least half of the image, then using the largest-area selection method will enable the robot head to successfully track the moving object most of the time, except that when the object slows down or attempts to change its direction, the correlation process will fail to detect consistent optical flow within the expected region, and the system will eventually lose track of the target due to the background optical flow that cannot be completely eliminated.
However, we have observed that our system can easily recover from false alarms and "re-track" the moving object once the robot head has stopped moving.

If the closest-area selection method is used to pick out the target object, then the stability of tracking the moving target relies extremely heavily on the precision of the optical flow computation, for the reason that without recognition, we are forced to use optical flow vectors to access the stereo data when retrieving the disparity values of the object. Round-off errors and noise are the major reasons that we generally get poor performance with this technique, although it has been shown to be capable of following an object moving at a slow and constant speed. It has been observed, however, that the background optical flow has little effect on selecting the closest object as target.

Experimenting with multiple moving objects demonstrates that there are still a lot of problems remaining to be solved. One of the biggest questions is when the motion tracking system should shift attention to track another moving object. Depending on how the motion tracking system is defined to behave, there is certainly no single solution which will work perfectly in all situations.

The results of this pan-tilt experiment have revealed a number of problems with our current implementation, which will be discussed in detail in the following sections. Our current motion tracking system is frankly far from being robust or stable. It is our hope to speed up the response time to at most 1 second per PRA loop. That is, we would like the robot head to make at least one move per second. This would definitely be a vast improvement over the current near-real-time performance.

5.1.2 Deficiencies, Problems, and What Cannot Be Done

5.1.2.1 Rigid Objects Assumption

The efficiency and reliability of a control system must take into consideration what it fails to accomplish, either expectedly or unexpectedly.
We have described the assumptions made in connection with the limitations of our system. Specifically, we have assumed that we are dealing only with rigid moving objects, which retain their shapes and structures throughout the period of being tracked. Non-rigid objects in general will not cause a great deal of difficulty for our system, except that changes in the rigidity of an object will be regarded as changes in location by our reasoning system, which has no knowledge about any specific object it is dealing with.

5.1.2.2 Disadvantages and Problems of Using Correlation Matching

The engine for detecting motion is correlation matching. As we have explained previously, a correlation range has to be selected for the matching operation. Because this range is usually kept quite small in order to obtain fast output, we must assume that objects move smoothly and slowly enough that the matching operation can return reliable estimates of displacement. If an object moves too fast for the correlation range defined, then the correlation output will be inconsistent and extremely unreliable, making it impossible for the segmentation process to come up with a good description for driving the robot head to continue tracking. In other words, if correlation fails, the tracking operation may immediately behave abnormally and unexpectedly.

The output of correlation is often corrupted by noise, mostly caused by quantization errors on edges, and sometimes by the flickering of fluorescent lighting. The precision of the correlation output decreases as the images are subsampled to coarser resolutions. Given that images have been subsampled from the input size of 512 down to 64 for correlation, the fine details have been softened, and thus quantization errors increase with the coarser features in the images.
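For illustration, here is a one-dimensional sum-of-absolute-differences matcher over a small correlation range; this is only a sketch of the general technique, not the Datacube hardware implementation, and all names are invented:

```python
def best_shift(left, right, pos, search=3, window=2):
    """Find the shift in [-search, +search] that minimises the sum of
    absolute differences between a small window in `left` around `pos`
    and the shifted window in `right`.  1-D for brevity; the real
    correlation works on 2-D image patches."""
    best, best_err = 0, float("inf")
    for s in range(-search, search + 1):
        err = 0
        for k in range(-window, window + 1):
            i, j = pos + k, pos + k + s
            if i < 0 or i >= len(left) or j < 0 or j >= len(right):
                err = float("inf")           # window falls off the image
                break
            err += abs(left[i] - right[j])
        if err < best_err:
            best, best_err = s, err
    return best
```

The failure mode discussed above is visible here: a displacement larger than `search` can never be reported correctly, so the matcher returns whichever in-range shift happens to score best, producing the inconsistent output observed with fast-moving objects.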
The effect of noise manifests as small motion patches in the optical flow images that are frequently misinterpreted as moving objects. As a preventive measure against the distracting effect of noise, we decided to track only objects of a reasonable size by definition². In other words, small connected components are eliminated, and as a result, our motion tracking system is unable to track small moving objects, i.e., objects of sizes below the threshold defined for filtering out the small motion patches assumed to be produced by noise. This limitation is unavoidable in the attempt to stabilize the motions of the robot head based on noisy input data.

In addition, the disparities of a moving object are retrieved using optical flow vectors. Any error in correlation matching will seriously degrade the accuracy of accessing the stereo data, and thus handicap any other processes which rely on stereo disparities, such as the closest-region selection method.

Correlation errors have also directly caused problems in the segmentation process. The accumulated correlation errors in the optical flow accumulation processes will not necessarily cancel out, and this leads to an inconsistent and noisy flow field being used in segmentation.

5.1.2.3 Problems with Background Optical Flow Cancellation

The background optical flow cancellation process is used under the assumption that the correlation process is capable of producing reliable matches at every point in the image. Unfortunately, this is not always the case when using real-world data. As can be seen from our pan-tilt experiment, the background scene contains rich textural information for correlation.
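The cancellation step, and the failure mode examined next, can be sketched as follows; the `valid` mask marking where correlation actually succeeded is hypothetical, since the real system has no such per-pixel confidence signal:

```python
def cancel_background(flow, expected_bg, valid):
    """Subtract the expected pan/tilt-induced background flow from each
    measured vector.  Where correlation produced no reliable match we
    must not "cancel" anything, otherwise a textureless background is
    turned into a phantom moving region."""
    bu, bv = expected_bg
    out = {}
    for p, (u, v) in flow.items():
        if not valid.get(p, False):
            continue                 # no match here: drop the pixel
        out[p] = (u - bu, v - bv)
    return out
```

Without such a mask, subtracting the expected flow from a region where correlation silently failed (for example, a white wall) manufactures exactly the spurious "moving object" described in this section.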
If the robot head is moving but no optical flow is produced for the stationary background, for instance when the cameras are looking at a white wall, then our cancellation process will still attempt to subtract the expected background flow from everywhere in the flow field, not knowing that the correlation process has failed to return useful data. As a result, the motion tracking system will mistakenly identify the compensated motion flow field as moving objects, thus creating chaos and confusion for the selection routine.

²Possibly greater than 2% of the image size.

5.1.2.4 Reliance on Connectedness

The effectiveness of our control theories relies heavily not only on the reliability of the correlation output, but also on the fact that the flow field generated by a moving object must appear to be contiguous. As we have no knowledge about any object, we are required to assume that each connected component produced by the segmentation process corresponds to an individually moving object, which has no relation to any other moving object in view. The motion field corresponding to a rigid visual target may be split into several unconnected blobs if part of the target is occluded. This leads to inaccuracies in tracking, as the centroid of the target is computed from a partial motion field corresponding to that target. There is no simple solution to grouping unconnected motion blobs together without acquiring the service of an object recognition process similar to the one reported in [Lowe, 1987], which is mainly based on perceptual organization. Therefore, it is unavoidable for our motion tracking system to make such a mistake in computing the centroid.

On the other hand, two objects moving together with the same motion parameters may produce similar optical flow, and may be grouped into one single partition if their corresponding motion patches are connected in the optical flow images.
It is possible to separate the two objects perceptually using stereo disparity if they are located at different depths. However, if this fails, then without any further information about the objects, the single partition will be interpreted as one moving object by our motion tracking system.

5.1.2.5 Panning, Tilting, and Verging Simultaneously

Our current motion tracking system does not fully support the panning, tilting, and verging motions simultaneously at this time. The background optical flow cancellation process in our current system only deals with optical flow generated by the panning and tilting motions. Another cancellation process will be required to handle optical flow generated by the vergence motion separately, if we are to allow our robot head to move in all three axes at the same time to fixate on a 3D location. It is rather easy to implement this additional cancellation process; however, it is our intention at this moment to study the current setup thoroughly, and to refine the control theories and implementation for better performance, before we proceed to more complicated experiments.

The additional cancellation process for vergence can perhaps be implemented using the framework of the current cancellation process. It can also be based on the observation that optical flow generated from the right camera should in theory cancel out the optical flow generated from the left camera. The latter idea requires optical flow to be computed from the point of view of the right camera as well.

Being able to verge while tracking would allow us to use the zero-disparity feature to factor out the target object to be followed. We will certainly experiment with this idea once we have a more stable system.
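The zero-disparity idea mentioned above can be sketched in a few lines; the tolerance `tol`, which absorbs round-off and noise, is an assumed value:

```python
def zero_disparity_mask(disparity, tol=1):
    """Keep only the pixels whose stereo disparity is within `tol` of zero.
    After vergence, the fixated target is exactly this near-zero band,
    while nearer and farther surfaces fall outside it."""
    return {p for p, d in disparity.items() if abs(d) <= tol}
```

Applied to a verged pair of images, the surviving pixels form a crude target mask without any segmentation of the flow field at all, which is what makes the feature attractive for tracking while verging.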
5.1.2.6 Other Minor Issues

As with all motion tracking systems ever implemented, it should be obvious that if an object moves out of the range limit of the robot head, i.e., the object is out of reach, then it is impossible to continue the tracking operation on that object.

There are some other minor issues surrounding the performance of our motion tracking system which deserve some attention, but are often ignored since they are in general not expected to have as serious an impact as the other issues that have already been or will be discussed. If we intend to increase the robustness of our system in every aspect, then much effort would be required to examine these issues in detail.

For instance, we may need to investigate the effects on correlation matching of adjusting the focal length, aperture, shutter speed, gamma filters, and other controllable features of the stereo cameras. A CCD camera calibration process [Beyer, 1992] [Healey and Kondepudy, 1992] should be employed to estimate and eliminate noise in order to obtain more precise data. Backlash of the robot head often generates faulty input data. Applying more weight to the pan and tilt axes may help to eliminate the backlash effect, but it is still unknown whether such action would slow down the head motion. However, we will soon be moving to a new robot head with significantly less backlash.

We may also need to adjust the parameters of the inverse kinematics equations in order to obtain smoother head movements. We should also find ways to speed up the data transfer rate between two Transputer nodes within the Trollius environment or any other operating system we may use in the future.

5.2 Additional Discussion

5.2.1 The Dumb Motion Trackers versus Our Motion Tracker

The absolute simplest tracking system is one that attempts to find some features, e.g., a white dot on a black background, in a single image to follow, without even computing motion.
Such a system is undoubtedly useless when dealing with real-world data. Another simple tracker can be made by sensing motion through the difference of two images grabbed at different times, creating a binary image by thresholding the differences, and then finding the centroid of the changed pixels for tracking. Although such a system is fast, it is very difficult, if not impossible, for it to deal with noise or multiple objects. Computing displacements for motion using correlation matching, for example, is logically the next option if one wants to handle multiple moving objects.

The simplest motion tracking system is perhaps one that observes the world only when the robot head is not moving, finds out what is to be tracked, and then moves the robot head without paying any attention to the changes in the world. The stop-and-look mechanism used in these systems excludes the idea of imaging while moving, and the use of sequential operations is extremely easy to implement and manage. Although this simple control system can be characterized as a motion tracking system, it is one that is neither efficient nor robust. Such a system cannot continue to detect changes to any object while data is being analysed and the head is being moved to a new location. This implies that the system can only act on a limited set of knowledge of the world, and since grabbing an image is a relatively fast operation compared to data processing, this limited set of knowledge represents a very small proportion of the changes that have occurred. It should be obvious that this system will experience great difficulty in following any target in a reliable and consistent manner with discrete and discontinuous input. For instance, there is no guarantee that the system can always grab the images that identify the object generating the largest motion while the head is moving.
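As a concrete illustration of the difference-and-centroid "dumb" tracker described above, the following is a minimal numpy sketch; the threshold value and function name are illustrative, not taken from the thesis implementation:

```python
import numpy as np

def difference_tracker(prev_frame, curr_frame, threshold=20):
    """Dumb tracker: threshold the absolute difference of two frames
    and return the centroid (x, y) of all changed pixels, or None if
    nothing moved.  Fast, but noise or several moving objects corrupt
    the single centroid -- exactly the weakness noted in the text."""
    diff = np.abs(curr_frame.astype(int) - prev_frame.astype(int))
    mask = diff > threshold            # binary "motion" image
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return xs.mean(), ys.mean()        # point to aim the head at
```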
Using assumptions or previous results definitely will not help much in filling in the missing information. We consider the idea of blindfolding oneself while in a busy state to be a dumb behaviour, and in the extreme case unacceptable for use in a real-time system, where responses are expected to reflect changes in a smooth and steady manner.

The deficiencies of not being attentive to the world at all times motivate our use of parallel processes, which have already been described in our control theories. Our idea is to continuously monitor the changes in the world so that we know exactly what happened while our reasoning processes were busy working on something else.

5.2.2 Thresholding in Segmentation

Our goal in segmentation is to divide the optical flow field into separate connected components, each of which has a consistent set of optical flow values. Segmentation is known to be an interesting but hard problem, as reported by many researchers using different techniques [Jain, 1984] [Adiv, 1985] [Yamamoto, 1990] [Boult and Brown, 1991]. It is particularly difficult in our case, where the time constraint is a big factor preventing the use of sophisticated and complicated methods, which are usually time consuming.

The input to our segmentation process is accumulated integer-valued optical flow. The threshold values unfortunately depend on the number of images used, as the range of the round-off errors grows awkwardly and undesirably larger as more optical flow images are accumulated. Accumulated round-off error is a major source of problems, forcing us to use relatively large threshold values in order to prevent the flow field from being overly divided into tiny motion patches. Correlation errors and noise also greatly affect the performance of our segmentation process.

Currently, we are using the magnitude of a vector as the criterion for determining connectedness, with the corresponding threshold set at 5 pixels.
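The connected-components step can be sketched as a flood fill over the flow field. This is only an illustration: the use of 4-connectivity, the treatment of sub-threshold magnitudes as background, and the reading of the 5-pixel threshold as a bound on the difference between neighbouring vectors are our assumptions, not the thesis's exact routine:

```python
import numpy as np
from collections import deque

def segment_flow(flow, threshold=5.0):
    """Partition an optical-flow field (H x W x 2 integer pixel
    displacements) into connected components of consistent flow.
    Pixels whose flow magnitude is below `threshold` are treated as
    background (label 0); two 4-neighbours join the same component
    when their flow vectors differ by less than `threshold` pixels.
    Returns an integer label image."""
    h, w, _ = flow.shape
    labels = np.zeros((h, w), dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] or np.hypot(*flow[sy, sx]) < threshold:
                continue                      # background or already labelled
            next_label += 1
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if 0 <= ny < h and 0 <= nx < w and not labels[ny, nx] \
                       and np.hypot(*(flow[ny, nx] - flow[y, x])) < threshold:
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
    return labels
```

Each nonzero label would then be interpreted as one moving object, as described in the text.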
Smaller thresholds often misled the system into believing that the moving object was small in size, when in fact the object might be as large as one-third of the image. Larger thresholds, on the other hand, generate the inverse effect of loosening the constraints for connectedness but, in general, cause fewer problems with losing track of the target.

5.2.3 The Selection Methods

The problems we have with the background optical flow cancellation process have slowed down our investigation of the effectiveness of the different selection methods for picking out the visual target. As can be seen from our pan-tilt experiment, the largest-region technique has already provided us with a good starting point, if we can somehow apply tricks to remove the background region by examining the bounding box of the moving region.

Some work is underway to find out how to make appropriate use of the "pop-out" effect created by the cancellation process. In theory, the magnitude of the background optical flow after cancellation should be significantly smaller than that of the moving objects. A selection method attempting to pick out the object that has the largest motion, a reasonable size, and a disparity value close to zero (if vergence is also used in tracking) can perhaps exhibit stable behaviour if the data is reliable to begin with.

5.2.4 Robustness versus Speed Tradeoff

The robustness and speed tradeoff problem exists in almost all real-time applications. If we want our motion tracking system to be robust, then a huge amount of processing time is usually required for extensive computation. However, the robustness of a real-time system often depends on the response time and its ability to catch up with the pace of the real world.
In order to speed up the response time, we usually need to cut down on the amount of computation, but that will undoubtedly decrease robustness, given that computational resources are normally limited.

Knowing that it is difficult to have a system which is both extremely fast and robust, we need to weigh our goals and select the appropriate behaviour to suit our objectives. We would not want a motion tracking system which spends an enormous amount of time "thinking" before responding. Such a system would unlikely be classified as functioning in real-time. In other words, it is our belief that precision is in general less important than speed in a real-time system, considering that the system will have many future chances to recover from and correct any erroneous response.

We should point out that a response made by the motion tracking system is the reaction to changes that occurred while the system was busy working on the previous set of data. It would be wise to have a system which responds in a timely fashion, so that it does not fall seriously behind by analysing "out-dated" data. The faster the system can respond, the less the world has changed. Also, less round-off error and noise will be accumulated with a faster-responding system. A slow-responding motion tracking system can lose track of the target extremely easily, as objects may move out of the views of the cameras before they have been noticed.

5.2.5 Comparisons with Other Motion Tracking Systems

In this section, we briefly comment on the differences between our current motion tracking system and some other types of systems implemented elsewhere.

Following the largest moving object is a popular choice in motion tracking.
A simple one-camera motion tracking system has been implemented to pick the second-largest moving area as the target, assuming the dominant motion to always be the motion of the background [Woodfill and Zabih, 1992]. That system runs on a 16K-processor Connection Machine; its tracking algorithm is sequential, and it currently deals with only one object. In our opinion, there is no guarantee that the background motion is always the largest in magnitude or in size. Our system takes the approach of attempting to cancel out the background motion, or at least minimizing its effect so that it becomes the least significant motion, i.e., all moving objects should have flow values larger than the background motion after cancellation. Eventually, we will pick the target based primarily on the magnitude of motion, instead of the size of the region.

Coombs' approach [Coombs, 1992], as already discussed in Chapter 2, is to first verge the cameras and then use a Zero-Disparity Filter to pick out the target. Their system is quite robust and performs well with a prediction module. It is not clear, however, how such a system can deal with multiple objects, or shift attention. The approach taken at the University of Rochester is to throw away, or filter out, irrelevant data and useless information as quickly as possible. Our datacube program produces stereo data, so it would be simple to implement a ZDF in our system; we have instead decided to attack a more difficult problem, using motion data. Our current system uses more output data, and therefore has a much better record of the motion paths of all moving objects in view for further analysis.

An attentive control system [Clark and Ferrier, 1988] has been implemented to track features.
As described in their report, shifts in the focus of attention are accomplished by using a saliency map and by altering the feedback gains applied to the visual feedback paths in the position and velocity control loops of the binocular camera system. Since we do not have any knowledge about the objects we are dealing with, our system simply uses the motion field to drive attentional processing.

The KTH Head has 13 degrees of freedom and 15 different motors, simulating the essential degrees of freedom in mammals [Pahlavan et al., 1992]. Their current work suggests the integration of low-level ocular processes for fixation, and the use of cooperative vergence-focussing to assist the matching process. It is difficult for us to experiment with their ideas and the focussing algorithm, as our current LCI head does not have motorized focus control or zoom lenses.

Chapter 6

Conclusions and Future Directions

6.1 Concluding Remarks

The primary goal of motion tracking is to keep an object of interest, generally known as the visual target, in the view of the observer at all times. One of our objectives in this project is to implement a simple gaze control system on our robotic camera head system to demonstrate that we can find out where an object is without knowing what the object is. Our decision not to build into our system any particular knowledge about any object prevents the use of a feature-based object recognition process for tracking. As an alternative, tracking can be driven by changes perceived from the real world, and in this project we have chosen to use displacements as the major source of directing signals.

In this thesis, we have described a set of control theories to track a moving object in real-time with a three-degrees-of-freedom robot head. Recent advances in computer hardware, exemplified by our Datacube MaxVideo 200 system and a network of Transputers, make it possible for researchers to perform image processing operations at video rates, and to implement real-time systems with input images obtained from video cameras. We have taken advantage of our powerful equipment at LCI, and have developed a passive motion tracking system which reacts to changes in the surrounding environment by panning, tilting, and verging the robot head.

The control scheme of our motion tracking system is based on the Perception-Reasoning-Action regime. Our idea is to use parallel processes, which communicate with one another via message passing, to actively monitor changes in the environment, to process displacements and select the visual target, and to control the movements of the robot head. The amount of data to be processed and the amount of computation have been designed to be as small as possible, so that the system can react to perceived changes in a timely manner, and can keep up with the pace of the world.

We have described an elegant approach of using an active monitoring process together with a process for accumulating temporal data, allowing different hardware components running at different rates to communicate and cooperate in a real-time system working on real-world data. We have no control over changes in the real world, and therefore there should be no delay in the processes producing data to reflect such changes, even when some other processes are busy. The stream of output data can be stored and grouped together for the busy processes to retrieve at a later time.

We have also described a cancellation method to reduce the unstable effects of background optical flow generated by ego-motion, and to create a "pop-out" effect in the motion field, in the sense that after cancellation the motions of the moving objects should be significantly larger than the background motion.
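The cancellation idea can be summarized as subtracting the flow that the head's own motion is expected to induce from the measured flow. In this sketch, `expected_flow` stands in for the thesis's lookup of background flow induced by the pan/tilt motion, and the residual threshold `eps` is illustrative:

```python
import numpy as np

def cancel_background(measured_flow, expected_flow, eps=1.0):
    """Subtract the ego-motion-induced background flow from the
    measured flow field (both H x W x 2 arrays).  After cancellation,
    independently moving objects 'pop out': their residual flow is
    much larger than the (ideally near-zero) residual background.
    Returns the residual field and a boolean mask of moving pixels."""
    residual = measured_flow - expected_flow
    moving = np.hypot(residual[..., 0], residual[..., 1]) > eps
    return residual, moving
```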
A simple segmentation process has been developed to partition the motion field into separate connected components based on the consistency of the optical flow vectors; each component is interpreted as a moving object. A few selection routines have been implemented to pick out a target. No single selection method is sufficient for all situations; the one to be used really depends on how the motion tracking system is defined to behave.

The problem of tracking moving objects using only displacements has proven to be difficult and challenging. The results of various experiments provide us with insight into the difficulties of tracking without any knowledge of the world and the objects. Round-off errors, quantization errors, and correlation errors have seriously affected the performance and degraded the robustness of our motion tracking system. Nevertheless, we are able to track a moving person by panning and tilting the robot head in approximately 1.8 seconds, and to fixate an object by verging the stereo cameras at a rate of roughly 10 moves per second.

The system we have described in this thesis can assist other vision tasks, such as object recognition, by always keeping the object of interest in view for study. The motion path of the object being tracked can easily be recorded, and may allow a motion-path-based object recognition system to be developed.

The central assertion of this thesis is that we can track an object before that object is identified. Displacement has proven to be an important cue for tracking. It is arguable whether or not optical flow and stereo disparity can be classified as primary visual cues. Nevertheless, they provide sufficient information for our motion tracking system to follow an object moving in the three-dimensional space that is accessible to the cameras.
6.2 Future Work and Possible Improvements

There are a number of items and ideas on our list for improving the performance and expanding the functionality of our current motion tracking system. The following list also covers areas which we have to address in future research.

• We should include prediction in future versions of our motion tracking system. A prediction process, as we have already described, allows us to move the robot head even when some reasoning process is busy analysing data. A Kalman filter can be used to implement the prediction process. The control theories will have to be revised to accommodate the idea of parallelizing the operations in the reasoning and action systems. The prediction process can be very simple, as it is only concerned with the centroid of the target.

• We need to improve the correlation technique used in the datacube program so that the displacement output is more reliable. Neighbouring displacement values should be examined when there is a need to resolve ambiguities. We should consider returning the confidence measures computed within the correlation process to the Transputer processes, so that displacements can be better managed on the receiving end.

• We know that the correlation process will not guarantee reliable data at all times. We should find ways to make use of the edges and Laplacian of Gaussian images returned by our datacube program, so that more information can be made available to the segmentation process. Techniques such as those used in [Poggio et al., 1988] [Gamble and Poggio, 1987] can perhaps be applied to our system.

• We need sub-pixel accuracy for displacements in order to help eliminate the problem of round-off and quantization errors being grossly built up in the accumulation processes. This will help the background optical flow cancellation process work to its potential.
However, it is unclear how sub-pixel accuracy can be obtained or represented in our current implementation.

• One idea for cutting down the amount of computation and speeding up the response time significantly is to use previously computed data to assist in picking the visual target, as we already have an expectation of where the target will end up. The computation of stereo disparity can be performed on a demand or feedback basis, so that the current problem of accessing stereo data using optical flow vectors as guides can be eliminated.

• Experiments have shown that correlation matching performed on 128x128 images can return better and more consistent optical flow. We should find ways to make the datacube program return 64x64 output images, while allowing the correlation process to work on images of larger resolution for better matches.

• Some work has already been started on optimizing the Transputer programs. Specifically, we want to find ways to cut down the response time, and to speed up the two-way data transfer rate among the Transputer processes being executed on different Transputer nodes. We may need to upgrade the operating system running on the Transputer nodes, or use faster hardware, for example, Texas Instruments TMS320C40 processors.

• The current LCI robot head will be upgraded to use gears and motors that have significantly less backlash.

• If we can create the special stereo mapping table for background optical flow cancellation using a separate camera calibration process, then we may use the vergence angle to index the special table when all three axes move simultaneously, thus allowing a better approximation of the expected background flow to be used.

• We need a more robust segmentation algorithm, which will probably require more constraints for partitioning the optical flow field.
The prediction process may be able to provide useful data for filtering out areas which are unlikely to contain the target being followed, so that the overall response time can be reduced.

• There are a few advantages to computing optical flow from the viewpoints of both cameras, as opposed to the current setup of using just the left camera.

— The two sets of optical flow data provide more information, allowing a consistency check on displacements.

— We can implement the additional cancellation process for vergence, so that we can pan, tilt, and verge the robot head at the same time. In theory, optical flow due to vergence from the left and from the right cameras should cancel out.

— More importantly, we can process true three-dimensional data. Optical flow from the left camera and the right camera, together with stereo disparity, enables us to establish a field of 3D motion vectors. This allows the segmentation process to work with a more consistent and informative flow field. In addition, we can compute the 3D motion parameters of the moving objects for better analysis. Unfortunately, it is very unlikely that these features will be implemented in the real-time system at this moment, since the extra computation would be extremely time consuming. In any case, they are certainly worth looking into in the future.

Bibliography

[Adiv, 1985] G. Adiv. Determining Three-Dimensional Motion and Structure from Optical Flow Generated by Several Moving Objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 7(4):384-401, 1985.

[Anandan, 1989] P. Anandan. A Computational Framework and an Algorithm for the Measurement of Visual Motion. International Journal of Computer Vision, 2:283-310, 1989.

[Balasubramanyam and Snyder, 1988] P. Balasubramanyam and M. A. Snyder. Computation of Motion in Depth Parameters: A First Step in Stereoscopic Motion Interpretation. In Proc. 1988 DARPA Image Understanding Workshop, pages 907-920, 1988.

[Ballard and Brown, 1992] D. H. Ballard and C. M. Brown.
Principles of Animate Vision. Computer Vision, Graphics, and Image Processing, 56(1):3-21, July 1992.

[Barron et al., 1992] J. L. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of Optical Flow Techniques. TR-299, The University of Western Ontario, July 1992.

[Beyer, 1992] H. A. Beyer. Accurate Calibration of CCD-Cameras. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1992, pages 96-101, 1992.

[Boult and Brown, 1991] T. E. Boult and L. G. Brown. Factorization-based Segmentation of Motions. In Proc. IEEE Workshop on Visual Motion, 1991, pages 179-186. IEEE, 1991.

[Brooks, 1987] R. A. Brooks. Intelligence without representation. In Proceedings, Workshop on the Foundations of Artificial Intelligence, 1987.

[Brown, 1989] C. M. Brown. Prediction in Gaze and Saccade Control. TR-295, University of Rochester, May 1989.

[Bulthoff et al., 1989] H. Bulthoff, J. J. Little, and T. Poggio. A parallel algorithm for real-time computation of optical flow. Nature, 337:549-553, February 1989.

[Christensen, 1992] H. I. Christensen. The AUC robot camera head. In SPIE Vol. 1708 Applications of Artificial Intelligence X: Machine Vision and Robotics, pages 26-33, April 1992.

[Clark and Ferrier, 1988] J. J. Clark and N. J. Ferrier. Modal Control of an Attentive Vision System. In Proc. 2nd International Conference on Computer Vision, pages 514-523, 1988.

[Coombs and Brown, 1992] D. Coombs and C. Brown. Real-time Smooth Pursuit Tracking for a Moving Binocular Robot. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1992, pages 23-28. IEEE, 1992.

[Coombs, 1992] D. J. Coombs. Real-time Gaze Holding in Binocular Robot Vision. PhD thesis, University of Rochester, Rochester, New York, June 1992.

[Crowley et al., 1992] J. L. Crowley, P. Bobet, and M. Mesrabi. Layered Control of a Binocular Camera Head. In SPIE Vol. 1708 Applications of Artificial Intelligence X: Machine Vision and Robotics, pages 47-61, April 1992.

[Drumheller and Poggio, 1986] Michael Drumheller and Tomaso Poggio.
On Parallel Stereo. In Proc. IEEE Conf. on Robotics and Automation, 1986, pages 1439-1448, Washington, DC, 1986.

[Ferrier, 1992] N. J. Ferrier. The Harvard Binocular Head. In SPIE Vol. 1708 Applications of Artificial Intelligence X: Machine Vision and Robotics, pages 2-13, April 1992.

[Fua, 1991] P. Fua. A Parallel Stereo Algorithm that Produces Dense Depth Maps and Preserves Image Features. Technical Report 1369, INRIA, 1991.

[Gamble and Poggio, 1987] E. B. Gamble and T. Poggio. Visual integration and detection of discontinuities: The key role of intensity edges. A.I. Memo No. 970, MIT AI Laboratory, October 1987.

[Gibson, 1979] J. J. Gibson. The Ecological Approach to Visual Perception. Houghton Mifflin Co., Boston, MA, 1979.

[Healey and Kondepudy, 1992] G. Healey and R. Kondepudy. CCD Camera Calibration and Noise Estimation. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1992, pages 90-95, 1992.

[Heeger et al., 1991] D. J. Heeger, A. D. Jepson, and E. P. Simoncelli. Recovering Observer Translation with Center-Surround Operators. In Proc. IEEE Workshop on Visual Motion, 1991, pages 95-100. IEEE, 1991.

[Jain, 1984] R. C. Jain. Segmentation of Frame Sequences Obtained by a Moving Observer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(5):624-629, September 1984.

[Jenkin et al., 1992] M. Jenkin, E. Milios, and J. Tsotsos. TRISH: The Toronto-IRIS Stereo Head. In SPIE Vol. 1708 Applications of Artificial Intelligence X: Machine Vision and Robotics, pages 36-46, April 1992.

[Krishnan and Stark, 1977] V. V. Krishnan and L. Stark. A Heuristic Model for the Human Vergence Eye Movement System. IEEE Transactions on Biomedical Engineering, 24(1):44-49, January 1977.

[Little and Gillett, 1990] J. J. Little and W. E. Gillett. Direct Evidence of Occlusion in Stereo and Motion. TR-90-5, UBC Dept. of Computer Science, Vancouver, BC, 1990.

[Little et al., 1991] J. J. Little, R. Barman, S. Kingdon, and J. Lu.
Computational Architectures for Responsive Vision: the Vision Engine. TR-91-25, UBC Dept. of Computer Science, Vancouver, BC, 1991.

[Lowe, 1987] D. G. Lowe. Three-Dimensional Object Recognition from Single Two-Dimensional Images. Artificial Intelligence, 31:355-395, 1987.

[Marr and Poggio, 1976] David Marr and Tomaso Poggio. Cooperative Computation of Stereo Disparity. Science, 194(4262):283-287, October 1976.

[Matthies et al., 1988] L. Matthies, R. Szeliski, and T. Kanade. Kalman filter-based algorithms for estimating depth from image sequences. In Proc. 1988 DARPA Image Understanding Workshop, 1988.

[Nelson, 1990] R. C. Nelson. Qualitative Detection of Motion by a Moving Observer. TR-341, University of Rochester, April 1990.

[Nelson, 1991] R. C. Nelson. Introduction: Vision as Intelligent Behavior — An Introduction to Machine Vision at the University of Rochester. International Journal of Computer Vision, 7(1):5-9, 1991.

[Olson and Coombs, 1991] T. J. Olson and D. J. Coombs. Real-time Vergence Control for Binocular Robots. International Journal of Computer Vision, 7(1):67-89, 1991.

[Pahlavan and Eklundh, 1992] K. Pahlavan and J-O. Eklundh. Heads, Eyes, and Head-Eye Systems. In SPIE Vol. 1708 Applications of Artificial Intelligence X: Machine Vision and Robotics, pages 14-25, April 1992.

[Pahlavan et al., 1992] K. Pahlavan, T. Uhlin, and J. Eklundh. Integrating Primary Ocular Processes. In ECCV, pages 526-541, 1992.

[Peleg and Rom, 1990] S. Peleg and H. Rom. Motion Based Segmentation. In Proc. 10th International Conference on Pattern Recognition, pages 109-113. IEEE, 1990.

[Poggio et al., 1988] T. Poggio, E. B. Gamble Jr., and J. J. Little. Parallel integration of vision modules. Science, 242(4877):436-440, October 1988.

[Pretlove and Parker, 1992] J. R. G. Pretlove and G. A. Parker. A light weight camera head for robotic-based binocular stereo vision: An integrated engineering approach. In SPIE Vol.
1708 Applications of Artificial Intelligence X: Machine Vision and Robotics, pages 62-75, April 1992.

[Robinson, 1968] D. A. Robinson. The Oculomotor Control System: A Review. In Proceedings of the IEEE, volume 56, pages 1032-1049, 1968.

[Shio and Sklansky, 1991] A. Shio and J. Sklansky. Segmentation of People in Motion. In Proc. IEEE Workshop on Visual Motion, 1991, pages 325-332. IEEE, 1991.

[Singh, 1991] A. Singh. Incremental Estimation of Image-Flow Using a Kalman Filter. In Proc. IEEE Workshop on Visual Motion, 1991, pages 36-43. IEEE, 1991.

[The Ohio State University, 1991] Research Computing, The Ohio State University. Trollius 2.1 User's Reference and C Reference Manual, March 1991.

[Thompson and Pong, 1987] W. B. Thompson and T. C. Pong. Detecting Moving Objects. In Proc. 1st International Conference on Computer Vision, pages 201-208. IEEE, 1987.

[Waxman and Duncan, 1986] A. M. Waxman and J. H. Duncan. Binocular Image Flows: Steps toward Stereo-Motion Fusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 8(6):715-729, November 1986.

[Woodfill and Zabih, 1992] J. Woodfill and R. Zabih. Using Motion Vision for a Simple Robotic Task. In AAAI Fall Symposium Series: Sensory Aspects of Robotic Intelligence, pages 152-159, 1992.

[Yamamoto, 1990] M. Yamamoto. A Segmentation Method Based on Motion from Image Segmentation and Depth. In Proc. 10th International Conference on Pattern Recognition, pages 230-232. IEEE, 1990.

[Zhang and Faugeras, 1992] Z. Zhang and O. D. Faugeras. Three-Dimensional Motion Computation and Object Segmentation in a Long Sequence of Stereo Frames. International Journal of Computer Vision, 7(3):211-241, 1992.

