UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

A computational vision system for joint angle sensing Mulligan, I. Jane 1988

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


831-UBC_1988_A6_7 M84.pdf [ 14.42MB ]
JSON: 831-1.0051948.json
JSON-LD: 831-1.0051948-ld.json
RDF/XML (Pretty): 831-1.0051948-rdf.xml
RDF/JSON: 831-1.0051948-rdf.json
Turtle: 831-1.0051948-turtle.txt
N-Triples: 831-1.0051948-rdf-ntriples.txt
Original Record: 831-1.0051948-source.json
Full Text

Full Text

A C O M P U T A T I O N A L VISION S Y S T E M F O R JOINT A N G L E SENSING By I. J A N E M U L L I G A N B.C.S .H. Acadia University 1984 A THESIS S U B M I T T E D IN P A R T I A L F U L F I L L M E N T OF T H E R E Q U I R E M E N T S F O R T H E D E G R E E OF M A S T E R OF SCIENCE in T H E F A C U L T Y OF G R A D U A T E STUDIES ( D E P A R T M E N T OF C O M P U T E R SCIENCE) We accept this thesis as conforming to the required standard T H E U N I V E R S I T Y OF BRITISH C O L U M B I A October 1988 © I.J. Mulligan, 1988 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. Department of Computer Science The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract The research described was aimed at developing passive vision techniques to perform kinematic calibration for a multi-link manipulator, in near real time. Working with a single image of the arm of a Caterpillar excavator, edges are extracted first, then the constrained nature of the problem allows filtering of edge elements to select only those potentially arising from the arm. A model based matching process is performed on this refined data to determine the desired joint angles quickly and efficiently. Methods of exploiting the parallel nature of these techniques are also described. n Contents Abs t r ac t i i L i s t of Tables v i L i s t of Figures v i i Acknowledgement v i i i 1 In t roduct ion 1 1.1 Caterpillar Teleoperation 1 1.2 Joint Angle Sensing with Model Based Vision 2 2 Re la ted Research 5 2.1 Robotics and Vision 5 2.1.1 Vision Sensors 6 2.1.2 Kinematic Calibration 8 2.2 Computational Vision 11 2.2.1 Model Based Vision 12 2.2.2 Roberts - Machine Perception of 3-D Solids 12 2.2.3 R . A . Brooks - Symbolic Reasoning Among 3-D Models and 2-D Images 14 2.2.4 C. Goad - Special Purpose Automatic Programming for 3-D Model-Based Vision 17 2.2.5 D .G. Lowe - Three Dimensional Object Recognition from Single Two-Dimensional Images 19 2.2.6 Grimson - Recognition of Object Families Using Parameterized Models 22 2.2.7 Barrow et al. - Parametric Correspondence and Chamfer Match-ing: Two New Techniques for Image Matching 24 iii 3 V i s i o n and Jo in t A n g l e Sensing 27 3.1 Requirements 27 3.2 Chamfer Matching and Orientation Filtering 29 3.2.1 Image Processing 30 3.2.2 Problem Constraints 33 3.2.3 Orientation Filtering 36 3.2.4 Chamfer Matching 39 3.3 Camera Calibration 42 4 Implementa t ion 45 4.1 Representations 45 4.2 Edge Detection 47 4.3 Creating the Orientation Map 49 4.4 Chamfer Matching 51 4.4.1 Chamfering the Edge Image 51 4.4.2 Computing Error Measure 52 4.4.3 Finding Emin 57 5 Resul ts 61 5.1 Parameter Space 61 5.2 Free Parameters 63 5.2.1 Initial Interval 63 5.2.2 Orientation Tolerance 67 5.2.3 Filter Sampling Rate 69 5.3 Performance 70 5.3.1 Accuracy 70 5.3.2 Robustness 70 5.3.3 Convergence 72 5.3.4 Speed 75 6 Conclusions and Future W o r k 79 A Images 85 A . l Image 1 86 A.2 Image 2 87 A.3 Image 3 88 A.4 Image 4 89 A.5 Image 5 90 A.6 Image 6 91 A.7 Image 7 92 i v A . 8 Image 8 93 B C o d e 94 B. l A r m Model 94 B.2 Compute Orientation Filter 94 B.3 Chamfer for Edge Window 98 B.4 Computing Error Measure 100 B.4.1 Compute Error 100 B.4.2 Project Model Points 101 B.4.3 Mark Line 104 B.5 Optimisation 109 v List of Tables 5.1 True Joint Angles 5.2 Computed 0's . . 5.3 Image Processing C P U times 5.4 Average C P U times List of Figures 1.1 Excavator Configuration 3 3.1 Image 6 28 3.2 Analysis By Synthesis 31 3.3 Zero Crossings for Image Processed with V 2 G 32 3.4 Modelled A r m Components 34 3.5 E — 0 Low Dimensional Parameter Space 35 3.6 Filtered Edge Elements for Boom 38 3.7 Sequentially finding 92, 03 and 0 4 40 3.8 Computing the Error Measure 41 3.9 Chosen Algorithm 44 4.1 Pixel Neighbours 49 4.2 Scan Converting Lines 54 5.1 Unfiltered Parameter Space - Image 6, 03 63 5.2 Filtered Parameter Space - Image 6, 03 64 5.3 Unfiltered Parameter Space - Image 6, $2 65 5.4 Filtered Parameter Space - Image 6, 02 66 5.5 Boom Range of Convergence 68 5.6 Stick Range of Convergence 68 5.7 Bucket Range of Convergence 68 5.8 Histogram of Error 72 5.9 Image 7 - Overcast 73 5.10 Image 8 - Bright Sunlight 74 vii Acknowledgement M y sincerest thanks to my supervisors Alan Mackworth and Peter Lawrence for their patience and guidance. Thanks also to my family and friends for their support, and to N S E R C and Bell Northern Research for allowing me the time and funding to pursue postgraduate studies. viii Chapter 1 Introduction The Computational Vision techniques applied to Robotics in most industrial settings are primitive by the standards of the Computational Vision research community. The project described in this thesis is an attempt to apply innovations from model based vision research to the kinematic calibration of a robot manipulator, in this case the arm of a Caterpillar excavator. Kinematic calibration in general is the process of finding the accurate relationship between a manipulator's joint angles and endpoint positions [24]. More specifically the intent of this project is to perform joint angle sensing to aid in manipulator control and fault detection in a teleoperated industrial system. 1 . 1 Caterpillar Teleoperation The context for this effort is a project in the U B C Electrical Engineering Department aimed at teleoperating heavy equipment for the forestry industry. Current research and prototypes are based on a Caterpillar 215B excavator. In many industrial settings humans work in uncomfortable or dangerous conditions operating heavy equipment. Currently many of these machines are entirely hydraulic and are difficult and expensive to maintain as well as having controls which are non-intuitive and unnatural. The addition of electronics and computer control to these machines 1 CHAPTER 1. INTRODUCTION 2 could allow early detection of mechanical failures or degradations requiring preventative maintenance. Such a system could further detect unsafe operating conditions, for exam-ple when the machine is likely to topple over due to unsafe combinations of load and operating angle. The operator could even be removed from the machine entirely and teleoperate it from a remote site. In addition controls could be made to operate excava-tors in Cartesian, rather than joint angle coordinates, thus reducing the time required to train operators. A system to control and monitor excavators must be particularly robust in order that the equipment is not in fact made less useful than it was in its purely mechanical state. It must also be fail-safe because humans interact so closely with these machines. Ideally the system should be able to detect, isolate and take appropriate action on a fault, possibly with the help of an operator. Failures can occur as the result of electronic components, software errors, mechanical components or human or environmental errors. 1 . 2 Joint Angle Sensing with Model Based Vision Originally the vision system described here was proposed as part of an error detection strategy which looks for consistency in multiple sensor readings. One way to produce such redundancy is to compare readings from joint angle sensors with angles computed by analysing images of the excavator arm. As with all other parts of the excavator control system, a vision system for joint angle verification must be robust. It must work accurately and in real time. If a high degree of accuracy can be achieved, such a system could even supplant the absolute joint angle sensors currently used, and their expensive controller boards. Sensors in this environment are fragile in the sense that the wires linking them to the computer are vulnerable, and any flexing of the arm, due for example to a heavy load, makes their readings invalid. A inexpensive video camera system mounted in the cab could overcome these difficulties CHAPTER 1. INTRODUCTION 3 boom c e i 4 stick Figure 1.1: Excavator Configuration as well as having the dual purpose of providing visual feedback to a remote operator. The excavator in question has four joints formed by the rotating cab, the boom, stick and bucket, as illutrated in figure 1.1. For the purposes of this project only the angle of the boom 92, the angle between the boom and the stick 03, and the angle of the bucket 04 will be computed. The motion of the excavator arm through the image is highly constrained by its geometry. These constraints can be used to filter irrelevant information out of the image edge data. Model based chamfer matching can then be applied to the refined edge image to deduce the desired joint angles. The techniques devised for this project have been found to be fast, accurate and robust, and potentially much more broadly applicable than the narrow sphere of excavator arms. Chapter 2 of this thesis reviews some of the relevant vision and robotics research. Chapter 3 has a description of the techniques applied to the problem, for those interested in the heart of the thesis. Chapter 4 has the details of algorithms and implementation, while Chapter 5 has the charts and figures to explain the results. For conclusions and CHAPTER 1. INTRODUCTION future goals see Chapter 6. Chapter 2 Related Research 2 . 1 Robotics and Vision Computational vision as an independent area of research has sought to solve the prob-lems of object recognition and image understanding with considerable success. The vision capabilities of commercially available robotic systems are however severely lim-ited in comparison to these laboratory systems. For the most part working industrial systems are based on binary vision techniques, although a few other systems have used laser range finding and shape from techniques. A l l of these methods require specially structured lighting in the application environment, illustrating the special nature of robot perception which allows this type of engineering solution [8] as opposed to general computer perception. For the purpose of this discussion we will look at vision sensors applied to achieve two different ends. First the most common manifestation of this technology is aimed at facilitating industrial tasks such as inspection, assembly and materials handling, by locating objects relative to the manipulator. The second, relatively new application is kinematic calibration [24], which has the stated purpose of relating joint angles to endpoint positions, but may be more generally thought of as a kind of self awareness for robotic systems. In its many manifestations kinematic calibration involves sensors 5 CHAPTER 2. RELATED RESEARCH 6 observing the robot itself and thus encompasses the type of vision sensing required for the Caterpillar project. 2 . 1 . 1 V i s i o n S e n s o r s The types of vision sensors employed by commercially available robots, even in the robotics research community are generally restricted to structured lighting techniques such as binary imaging, laser range finding and 'shape-from' techniques. This is perhaps natural since the goal in robot sensing is to perform some task in the robot's environment, not to solve the general vision problem for its own sake. Binary imaging involves recognising an object in sharp contrast to the background environment, often the result of backlighting a work table or other special lighting tech-niques. The resulting image is thresholded so that pixels are marked either zero or one, depending on into which of the two contrasting regions, object or background they fall. Techniques such as run length coding can be used to store these images compactly, but the information available in binary images is similarly limited. Basically they are only useful in situations where all necessary information can be obtained from silhouettes [2]. Information available includes orientation, position, area and connectedness of the imaged object [26]. In industrial situations often knowing that the object is present, its position and orientation is sufficient for the purpose of manipulation [4]. Inspection tasks where a silhouette is adequate to detect flaws are also facilitated by binary imaging. In some situations a binary image is even sufficient to distinguish between a limited number of parts to be recognised [19]. Laser range finding can be used in two different ways to produce a partial depth map of the robot's environment. One method uses a projector and a camera in a fixed relationship to perform a triangulation. Less commonly systems use time of flight to measure range by enforcing frequency or amplitude modulation on a steerable laser CHAPTER 2. RELATED RESEARCH 7 beam. The light reflected by the object's surface can then be detected and the phase of its modulation compared with that of the projected beam. With either method the entire scene must be scanned to make the system useful. Possible scanning techniques include moving the scene relative to the ranging system, moving the ranging system relative to the scene, using moveable mirrors to scan the projected pattern across the scene or using multiple light stripes to cover the whole scene at once [1]. The derived depth information is of course extremely useful in robot manipulation tasks [37]. The so called shape from techniques are not generally commercially available, but have been applied in parts recognition in a few cases. In general they use the natural constraints of the physics of light and the imaging process to derive depth information. Shape from stereo finds correspondences between points in two different images of the same scene taken from different camera positions. Shape from shading uses multiple images taken under differing lighting conditions, combined with knowledge of the re-flectance properties of the imaged object to derive shape information based on the image irradience equation. Shape from motion also finds image point correspondences over multiple images taken over time. Object shape and depth information are derived based on the relative motion of features in the images. These methods are more often seen demonstrated in labs than in industry. Another form of endpoint tracking in three space commonly used for calibration tasks uses active infrared light emitting diode (IRED) markers and two or more lateral effect photodiode detector cameras. Selcom A B of Sweden produces Selspot I and 7/ systems commercially while Northern Digital Inc. produces a similar Watsmart system. Three non collinear IREDs visible in the images are sufficient to solve for position and orientation of the object to which they are fixed [25]. An interesting characteristic of vision systems for robot manipulators is the eye in hand concept, where vision sensors of various kinds are mounted on the distal link of the CHAPTER 2. RELATED RESEARCH 8 arm itself [27] [4]. Proponents point out that parallax, coordinate transformation and resolution are greatly simplified by this strategy. Others dismiss equipment attached to the end effector as interfering with robot operations [25]. 2 . 1 . 2 K i n e m a t i c C a l i b r a t i o n Kinematic calibration encompasses the various techniques and considerations involved in computing an accurate relationship between endpoint positions and joint angles. In contrast to the inverse kinematics problem which seeks to solve 0 — f(x) for joint angles and their velocities 0 given information such as workspace position and velocity x, kinematic calibration seeks to find the parameters of / . The field includes a number of common tasks. Robot repeatability determines how precisely a robot can position itself at a specific endpoint location in multiple trials. Parametric and Non-Parametric estimation of endpoint location have the goal of improving robot positioning and robot registration finds the location of the manipulator relative to objects in its environment [24]. Mooring et al. [15] have described a three level hierarchy of calibration techniques. The first they call joint level calibration, where the aim is to find the correct relationship between the real joint displacement and the joint transducer signal. The second level or kinematic model calibration involves improving the kinematic model of the manip-ulator, its geometric errors as well as such non-geometric effects as backlash and joint eccentricity. Level three is called dynamic model calibration and attempts to compute inertial properties of the manipulator links as well as kinematic properties. Level one calibration is relatively common since many manipulators using incremental encoders for joint transducers must perform this task on powering up [15]. Currently considerable attention is paid to level two calibration, whereas only a few papers address level three [31]. CHAPTER 2. RELATED RESEARCH 9 There seem to be two issues involved in the task of calibration. First, how to obtain accurate three dimensional information about the actual manipulator position from the environment. Second, how to relate this information to the numeric models used for robot control to improve accuracy and performance. In laboratories, techniques for performing kinematic calibration have often involved elaborate apparatus such as special guage blocks commonly used to measure robot re-peatability, and considerable manual intervention by operators. Agin [l] for example, requires a spirit level and special markings on the table to calibrate his P U M A robot with attached laser range finder in hand. Other systems utilise manually operated theodolites or specially machined calibration fixtures into which an insertion tool must be manually guided [25]. E l Zorkany et al. [14] use the Selspot II system to obtain 3-D calibration information from hand held targets, to correct errors in a learned trajectory. Hollerbach and Bennett [25] use the Watsmart system to calibrate the MIT serial link direct drive arm. A special calibration frame with six IREDs mounted on it is attached to the arm's third motor. The Watsmart system uses two cameras to triangulate the markers and compute their 3-D position. Taylor et al. [38] use passive markers in the form of retro reflective tape, illuminated by infrared LEDs placed near their T V camera to obtain positional information when analysing the gait of human subjects. Puskorius and Feld-kamp [35] use an eye in hand stereo pair of cameras trained on a fixed target point in space, rather than an external sensor observing one or more points on the arm. Ideally of course the goal is a quick and accurate means of calibrating a robot automatically with little human involvement or special equipment. The mathematical methods for attributing error in robot positioning to its various sources, often begin with the Denavit-Hartenberg method of assigning coordinate frames to manipulator links. Relative positions of various links can then be represented by matrices which are functions of joint angles and parameters describing the robot's ge-CHAPTER 2. RELATED RESEARCH 10 ometry [15]. If these parameters and angles are all considered variables, the first order terms of the Taylor series about their nominal values result in a linear expression for the differential errors in the nominal parameter values. This approach breaks down when two consecutive joint axes are parallel. If they are slightly misaligned and intersect, the position of the coordinate system origin for the axis moves a long way out along the axis. In this situation the kinematic parameters are not continuous and therefore small changes in axis alignment may cause large variations in geometric parameters. Hayati [22] modified the Denavit-Hartenberg parameterisation only for this parallel axis case, by forcing the distal link's origin to fall in the previous origin's X Y plane, and adding an angular alignment parameter fl. This technique has since been used by a number of other researchers [24]. Others have attempted to solve the parallel axis problem by applying special numeric techniques but with less success. Some researchers have abandoned the Denavit-Hartenberg representation in favour of the more general technique of relating two coordinate systems by three displacement and three relative orientation parameters. The advantage of these six parameter repre-sentations is that they allow handling of higher order kinematic pairs, although they do not take advantage of constraints provided by lower order kinematic pairs [24]. Parametric models do not always account for all geometric and non-geometric sources of error easily. They cover the whole workspace but may not be optimal for any spe-cific region. Non-parametric methods locally map the relationship between joint angles and endpoint positions, but they require large numbers of measurements and may be invalidated by changes in operating condition [24]. CHAPTER 2. RELATED RESEARCH 11 2 . 2 C o m p u t a t i o n a l V i s i o n Research in Computational Vision has evolved through several distinct phases. Initially considerable attention was concentrated on two dimensional, often binary, image prob-lems such as character recognition. The prevailing theory was that solutions to these problems would shed light on the more complex domain of 3-D images. Many of the techniques for binary image processing were and are still very effective in industrial en-vironments, as previously described. Other approaches such as the picture grammar method for image understanding, were eventually abandoned because of fundamental flaws in the theory and limited success. In an attempt to limit the scope of 3-D scene analysis, research in the early 70's con-centrated on the blocks world. Notable results include Guzman's segmenting techniques, and Huffman-Clowes, and Waltz' line labelling algorithms [29]. These were limited for the most part, by their dependence on corner contexts. In the late 70's and early 80's the work of David Marr [30] caused the blocks world to be abandoned in favour of paradigms more directly related to natural, animal vision systems. He proposed extracting as much information as possible from the image bottom up. This approach included more formal line finding techniques to determine the grouped edges, bars and blobs of the full primal sketch. His 2^-D sketch was created with the help of stereopsis, and shape from motion, texture and contour techniques. It made explicit the shape and orientation of surfaces in an image. The final 3-D level of processing involved generalised cone models which could be extracted from this image information, and compared to previously acquired models to perform recognition. Beginning perhaps with Brooks' A C R O N Y M [10] and continuing in the mid eighties with the work of Grimson, Goad and Lowe the focus has at least partially returned to recognition based on previously acquired models, or model-based vision. CHAPTER 2. RELATED RESEARCH 12 2.2.1 Model Based Vision The phrase model based vision is used in different contexts to refer to quite different types of systems. Binford [6], for example describes 11 general model based image analysis systems whose models vary from the specific, constrained object models of A C R O N Y M to Ohta's system, whose model for S K Y demands that sky regions be bright blue or grey, untextured and touching the top edge of the image. In relation to the task at hand the goal is to find a specific object, the manipulator arm, in an image where it is known to appear. Thus the model based vision systems reviewed here are those that take on similar tasks, in an effort to learn what gains this sort of limiting of the scene allows over truly general scene analysis. 2.2.2 Roberts - Machine Perception of 3-D Solids Roberts was one of the first researchers in the field of computational vision to exam-ine the three dimensional polyhedral world rather than two dimensional tasks such as character recognition [36]. He concluded that understanding such 2-D problems would not generate any progress in understanding the 3-D world. His work instead was based on the assumption that 3-D scenes were composed of various combinations of a limited set of solid models constrained by a number of assumptions about the picture taking process. The program he describes has three parts. The first reduces a photograph to an internally represented line drawing. The line drawing is then analysed to produce a list of its component 3-D objects. Finally a 3-D display program can produce alternate views of the scene described by the object list. The techniques used in the first and third parts have of course been usurped by advances in the image analysis and graphics fields, but Roberts' contribution as a pioneer in model based scene analysis is more remarkable. Humans make relatively consistent assumptions in their perceptions of depth. Rob-CHAPTER 2. RELATED RESEARCH 13 erts method depends on assumptions he felt emulated those of humans. Pictures are assumed to be views of the real world, and therefore perspective transformations of a 3-D field. This 3-D field is then assumed to consist of a number of solid objects occupying a finite region of space. Further these objects are assumed to be composed of one or more familiar models. In the implementation described, Roberts has only three models: a cube, a wedge and a hexagonal prism. He allows arbitrary translation and rotation, as well as three degrees of freedom for scaling for these models. He disallows skew and some questions have been raised as to whether his method handles perspective projection, as he claims [29]. A final important assumption is that all objects must be supported somehow, either by another object or the ground plane. This is necessary to place objects in space relative to one another. Let R be a homogeneous transform matrix which transforms model points into cor-responding real world points, and P a second matrix for transforming real world object points into picture points. If H = RP, then H will transform model points into pic-ture points. Our knowledge of the picture taking process allows us to compute P _ 1 and we can then attempt to solve R — HP-1 to determine how our models relate to the observed scene. In order to obtain a first guess at H however, we need a method of matching some observed picture points to corresponding model points. We can try fitting several models until the one with the least error is identified, but how do we reduce the search space of all possible translations and juxtapositions of the 3 models? Roberts observed that an object's topology remains invariant for transformations, and that its visible charac-teristics also remain constant over a wide range of viewpoints. Thus the arrangement of points, lines and approved polygons can be used to make reasonable guesses as to which model points should be matched to which picture points. An approved polygon for the models used by Roberts is a convex polygon with 3,4 or 6 sides, these correspond to the CHAPTER 2. RELATED RESEARCH 14 faces of the specific models he used. Although Roberts' work is still an important paradigm for scene analysis, it had problems not only with perspective projections. Instances where the order in which models were selected prevented the correct combination from being found, and where the corner contexts the method depends on are too narrow have been identified [29]. 2.2.3 R.A. Brooks - Symbolic Reasoning Among 3-D Models and 2-D Images About 1981, Rodney Brooks produced his A C R O N Y M program based on geometric reasoning about features of models which will be relatively invariant in any image [10]. The system has three phases : the prediction of invariant and quasi-invariant model features, low level search for corresponding features in the image and an interpretation phase where the located image features are matched to corresponding model features. To facilitate these operations Brooks uses viewpoint independent volumetric models based on generalised cones, which allow variation in size, structure and spatial relations, unlike Roberts' restricted set of polyhedra. These models are initially provided by the user. The internal representation is framelike in flavour, with basic object units. Objects are the nodes of the object graph which is tied together by subpart and affixment arcs. Subpart subgraphs are arranged in such a way that gross features appear at the top and with each successive level finer details are added. Affixment arcs relate coordinate systems of objects. They include symbolic transforms to transform the coordinate system of the referenced part, to that of the original object. Certain object characteristics can be replaced by unknowns to increase generality. For example the length or number of flanges on a modelled electric motor may be replaced by a quantifier representing a range of legal values. Thus size and structure are variable within a single motor model. Variable affixments can be used to model articulated objects since they represent varying spatial relationships between parts. Quantifiers can CHAPTER 2. RELATED RESEARCH 15 be related by algebraic constraints, for example a motor may have flanges or a base, but not both. Such constraints are organised into a directed restriction graph ordered from least to most restrictive, beginning with the base restriction node which has an empty constraint set. The user specifies part of this graph to the system. A C R O N Y M adds other parts while performing image understanding tasks. Given a user specified model in the form of object and restriction graphs, geometric reasoning is applied to determine invariant or quasi-invariant features of the object. It also serves to discover constraints which can be used to split the range of variations into cases in which still more quasi invariant features can be predicted. For example collinear or parallel features of models, produce observable collinearity and parallelism in the image over wide ranges of spatial relations between models and the camera coordinate system. Similarly connectivity is an invariant over all camera models, given that the connected features are observable. These quasi-invariant features are combined into predictions for a model's appearance in an image. Predictions are combined in a prediction graph which provides descriptions of model features and their relations, which ought to be matched to image features. The prediction graph has two functions : to guide low level image analysis in searching for features, and to provide instructions on how to use image feature measurements from partial interpretations to constrain the 3 - D models, identifying class memberships and specific 3 - D spatial relations. A prediction graph has associated with it a restriction node which refers to the object class constraints for the specific object being predicted. Predictions are created by examining the constraints on an object to be recognised. If possible special case predictions are made from these, otherwise case analysis is performed by adding constraints and hence restriction nodes, until such predictions are possible. Low level processes produce ribbons and ellipses as shape elements representing image features. Ribbons are 2-D specialisations of generalised cones. They are planar shapes CHAPTER 2. RELATED RESEARCH 16 described by a spine, a line segment cross section, and a sweeping rule. They emulate the lengthwise view of a generalised cone, while ellipses are a good way of describing the end cross sections. The descriptive module returns an observation graph, with ribbons and ellipses as the nodes and the image relations between them as arcs. In the interpretation phase matches are hypothesized between image features and major predicted features. The corresponding image measurements provide new back constraints which are a mechanism for finding mutually consistent features and hence identify object instances. In addition, back constraints provide information on position and orientation of the object instance. Back constraints from several objects interact to give relatively tight bounds on all unknowns. Interpretation continues by combining or conglomerating local shape matches for model features into more global matches for more complete objects. The requirements of the prediction graph and the new 3-D back constraints of each local match must be consistent. Consistent feature prediction pairs add a restriction node according to the instructions of the prediction. If this new restriction is unsatisfiable the pair is eliminated. A combinatorial search is then performed to discover consistent connected components, mainly by determining whether the restriction node of the conglomeration of components is satisfiable. The result is a number of interpretation graphs, only a few of which will be large and more or less similar in major aspects, but differing in detail. A large correct interpretation will have a restriction node which relates models and their spatial relations to the view of the world described by the predictions to which image features have been matched in the interpretation graph. Thus a 3-D understanding of the world has been derived by reasoning about models and a 2-D image. CHAPTER 2. RELATED RESEARCH 17 2.2.4 C. Goad - Special Purpose Automatic Programming for 3-D Model-Based Vision Goad proposed a method for automatic construction of fast special purpose vision pro-grams [20]. These programs are restricted to determining the position and orientation, relative to the viewer, of an exact model of the object to be recognised. The method is based on the observation that recognition is a task where a model description is pro-cessed once, in order to process multiple images. Goad concluded that models could be compiled automatically into efficient, highly specialised programs. He achieves this by starting with a basic matching algorithm and unwinding its loops and recursions so that the tree of its possible alternatives is expanded. The program assumes that the object for recognition is either wholly in view, or not visible at all. The model description is provided by the user. It need only include sufficient detail to ensure recognition, not as much as would be required for graphical display. The description consists of a list of oriented edges as well as conditions for their visibility in terms of the loci of points from which they are completely visible. If the edge's projection is too short to facilitate detection it is also considered invisible. The basic algorithm begins by selecting a model edge not yet matched, and using the currently hypothesised match, a prediction of where the corresponding image edge will appear is made. The edge selected must be visible given the currently hypothesized locus of visibility. Then a case analysis is performed according to whether the edge is actually visible. Another consideration is whether the edge detector has successfully located the edge. For this reason two probabilities are maintained by the program : the reliability of a match indicates the likelihood that the current configuration of edges could have occurred by chance among the background distribution of edges. The plausibility measures the probability that the edge would not have been detected given that the current partial match is correct. If reliability is high enough the match should be accepted, and if CHAPTER 2. RELATED RESEARCH 18 plausibility is low enough the algorithm should backtrack. The likelihood of an edge being detected gives rise to a second case analysis as to whether or not it has been found. First a selected edge is assumed visible, then assumed detected then the edge is sought in the predicted location in the image. If it is found the match is extended to include the edge, and the measured position and orientation are used to refine the current hypothesis of camera location. If the match fails the algorithm backtracks to assume the edge was not detected, and if this fails it backtracks to assume the edge is in fact not visible. For purposes of precompiling this algorithm into a specialised program its loops are unwound to form a tree in which nodes occur where edges are selected or assumed detected. Thus at every node or choice point there is an hypothesised match object position relative to the locus where the camera may lie. As a result the visibility of an edge at any node in this tree can be precomputed from the hypothesis. At each level of the tree the hypothesis is refined either by back projecting measurements from located edges, or from concluding an edge is visible or invisible in a case analysis. This latter refinement is available at compile time as well. With this information the selection of the next edge can also be determined in advance according to high likelihood of visibility, most specific prediction of position, and maximal derived information about camera position. Further optimisation is achieved by eliminating visibility case analysis nodes when the edge will always be visible, given the hypothesis, and by eliminating detection nodes when plausibility drops too low if the edge is not detected. Once three or four non-collinear points have been matched, position and orientation have been narrowed to one or two specific possibilities. Further matching only serves to verify the correctness of the match. Goad's experiments succeed in identifying a connecting rod casting, a universal joint casting and key caps of a specific shape in a jumbled pile. The implementation did not however fully automate the order of edge CHAPTER 2. RELATED RESEARCH 19 selection, which had to be done by hand. It is the basis of a successful, commercially available system which operates in real time [28]. 2.2.5 D.G. Lowe - Three Dimensional Object Recognition from Single Two-Dimensional Images Lowe in his S C E R P O system rejects the idea that depth information must be derived bottom up from images. Instead he uses ideas of perceptual organisation and determines features of his models which will be viewpoint invariant [28]. Similar instances of invari-ant features such as parallelism, connectivity or proxmity are sought in the images to be analysed and matched to the models. These initial matches are ranked by likelihood of being correct, and the best are used to reduce search as starting points for quan-titative computation of viewpoint. This search is based on the viewpoint consistency constraint which states that all projected model features in an image must be consistent with projection from a single viewpoint. Models are initially provided by the user. They consist of a set of 3-D line segments along with visibility specifications in the form of a boolean combination of hemispheres from which each segment is visible. Viewpoint invariant groups of features which can be produced by the modelled object are precomputed off line. Occurrences of proximity, collinearity and parallelism are located and grouped into lists of larger structures. Edge detection is performed on the image to produce straight line segments. These are stored in a grid-like data structure according to endpoint position and orientation. A l l of the invariant characteristics used involve endpoints of segments falling close together. The image data structures allow this arrangement to be easily identified since nearby endpoints can be easily located without searching all other segments including those far away. Next viewpoint invariant structures are identified and the probability that they are non-accidental computed. Since measurement is inexact such ranking is required to CHAPTER 2. RELATED RESEARCH 20 determine whether lines which are close together or nearly parallel or collinear are likely to be genuine instances of invariant features. A background of line segments uniformly distributed in the image with respect to orientation, position and scale is assumed. Further, scale independence requires that the density of lines of a given length vary inversely according to the square of their length. This implies shorter segments are expected to have higher density, allowing two long segments whose proximity might be obscured by closer short segments, to be assigned, appropriate significance. The lower the expectation of the feature arising accidentally the higher its significance. The significance of endpoints in close proximity is inversely related to the expected number of endpoints within a circle of radius equal to the gap between the endpoints. For lines nearly parallel the distribution of orientations is also assumed normal and the likelihood of lines falling within the deviation of orientation between the two lines is used to compute significance. A similar technique is used to rank instances of near collinearity. Identification and ranking of viewpoint invariant features provide a starting point for the quantitative solution for viewpoint and thus reduces the search space of possi-ble matches. To improve the likelihood that structures are non-accidental they can be combined, for example when they appear in trapezoid shapes or pairs of intersecting parallel lines. These larger structures are also viewpoint invariant and if they can be matched are highly unlikely to have arisen accidentally. The S C E R P O implementation requires groupings with at least three segments for matches. These structures are of course matched against structure lists composed from the original models. Even these more complex structures might be insufficient to narrow the search space if the system required many models to be recognised. For each of the matches found the quantitative verification procedure is executed to solve for viewpoint. Lowe observes that rather than being underconstrained, visual recognition with image and model data is overconstrained. The problem is difficult CHAPTER 2. RELATED RESEARCH 21 because of the independent and nonlinear nature of the constraints. He sets out to perform a form of quantitative spatial reasoning to provide a two way link between image measurements and the object model. Effectively, given matches between model and image points, we want to determine unknown projection and model parameters. The approach taken is to linearize the projection equations and apply Newton's method to solve for these unknowns in the following way. Let Dx and Dy indicate the position of the object in the image plane, Dz specifies the distance of the object from the camera, and (f>x, (f>y and <pz are incremental rotations to some initial rotation matrix R. Each iteration of the multi dimensional Newton convergence will attempt to solve for the vector of corrections [ADX, ADy, ADZ, A<f)x, A<f>y, A<f>z]. Since most parts of the model are only visible from a limited range of viewpoints we can select an initial orientation in depth consistent with this range for the object feature being matched. Other initial viewpoint parameters can be estimated from the match by comparing lines from the model and the matched image lines. For each model point (x,y, z) and its matched image point (u,v), the model point is projected into the image using the current parameter estimates, then the error in comparison to the image point is measured. The u and v components of the error can be used independently to produce separate linearised constraints expressed as the sum of the products of the partial derivatives of u or v times the unknown error correction values. Given three point correspondences from our three segment structures a linear system of six equations can be derived and solved for all six camera model corrections. The method usually ends up with more model-image correspondences than are neces-sary. The final viewpoint estimate is the result of a Gauss least-squares fit from greatly overconstrained data. CHAPTER 2. RELATED RESEARCH 22 2.2.6 Grimson - Recognition of Object Families Using Param-eterized Models Recent work by Grimson extends his original methods to allow for parameterized models [21]. These additions allow recognition of objects which may vary in scale, articulated objects or families of similar objects which stretch along one axis. Grimson's basic method searches for matches between the measured surface patches from the image, and surfaces of the known object model. The search consists of gener-ating possible interpretations meeting local geometric constraints, then testing them for consistency with the model by attempting to solve for the translation and rotation which produced the image view. This process takes the form of generating and traversing an interpretation tree depth first. At each level a new piece of image data is matched to the model, branching occurs by attempting to match the image feature to every possi-ble model feature. For the case of extraneous background data an additional null face branch is added at each node. The resulting search space is of course huge, and the method only becomes tractable when local geometric constraints are used to prune the tree. The geometric constraints used, to determine whether a partial interpretation is in-consistent have two parts : a set of image measurements and a set of computed model measurements. The angle constraint determines that the angle between the normals of two measured image edges is within measurement error of the angle between the normals of the model edges to which they are matched. The direction constraint uses the possible error in orientation to compute a range of components of the vector between the two image edge fragments in the direction of one of the edge normals. The ranges can be computed for both image edge pairs and corresponding model edges, and if the image range is a subset of the model range the match is consistent. For parameterised models that allow scaling the angle constraint remains unchanged, CHAPTER 2. RELATED RESEARCH 23 while the direction constraint must be changed to compute a range of scale factors. If the match is consistent the image range will be a scaled version of the model range, and thus we can solve for the range of scale factors for which the assignment of data to model edges is consistent. For the search tree we now require that a node's range of scales be computed by the intersection of the ranges computed for each of the pairwise constraints. If the range is empty the interpretation is inconsistent. For articulated objects the approach taken is to search for the individual rigid parts separately in the normal way. Parts are then tested to see that their interpretations are consistent with one another, this requires a simple test of the geometric constraints associated with the specific model's articulation, on their respective transforms. To improve efficiency one major subpart is found first, then the remainder can be sought in regions relative to this initial match, given model parameters for articulations. Stretchable models require a case analysis on measured image normal angle difference and corresponding model angles, to determine the stretching parameter for the angle con-straint. The edge direction constraint also produces a range of values for the stretching factor. The search process is altered by the addition of constraint propagation. The constraints are now used to determine the range of values for the stretching parameter that are consistent with the pairing of data fragments to model edges. These can be propagated through the search tree so that each node inherits a range of values consis-tent with its parent node, which is intersected with the range of values for the parameter. This local range is computed by pairing the current data point and its associated model edge, against all other data/model pairings in the interpretation and intersecting the resulting ranges. The match is inconsistent if the range becomes empty. CHAPTER 2. RELATED RESEARCH 24 2.2.7 Barrow et al. - Parametric Correspondence and Cham-fer Matching: Two New Techniques for Image Matching Iconic matching techniques such as correlation, exploit all available pictorial information, especially geometry and are efficient and precise in this sense, but they are sensitive to small changes in viewing conditions and cannot use other types of information. On the other hand symbolic matching techniques are more robust as a result of using invariant abstractions, they lose however in precision and efficiency in handling geometric rela-tionships. Barrow et al. claim to have found methods of combining the best of these two approaches in their parametric correspondence and chamfer matching [5]. They apply these methods in the context of determining the parameters of an analytic camera model, given an aerial image of a coastline. Instead of matching the image to a reference image via correlation, parametric cor-respondence matches it to a symbolic reference map, a type of 3-D model of the imaged landscape. An initial estimate for the camera model is used to predict the location of the landmarks in the image. The camera model is then adjusted until the prediction matches the observed image. In order for this to be tractable, the authors go on to introduce a matching technique they call chamfer matching, which allows comparison of two sets of curve segments at cost proportional to linear dimension. The process begins by extracting image features, basically edges, from the input image, thus retaining shape information while discarding greyscales. One possible and comparatively robust measure of similarity between two collections of feature points is the sum of the distances between each predicted point and the nearest image point. This measure can be efficiently computed if we transform the image feature array so that each pixel contains a number representing the distance to the nearest feature point. The distance values for the pixels where predicted features fall can then be summed to produce the desired measure of similarity. CHAPTER 2. RELATED RESEARCH 25 The transformation of the image begins with an array where feature points are marked with zero, and all other points are marked infinity. A forward and a backward pass use incremental distances of 2 and 3 to approximate Euclidean distances of 1 and y/2 computing integral distance values as follows : A l g o r i t h m 2 . 1 : For i 4= 2 step 1 un t i l N do For j 4= 2 step 1 un t i l N do F[i,j] 4= Minimum (F[i,j], (F[i-l,j] + 2), (F[i-1 j-1] + 3), (F[i,j-1] + 2), (F[i+1 + 3)); For i 4= (N-l) step -1 un t i l 1 do For j 4= (N-l) step -1 un t i l 1 do F[i,j] 4= Minimum (F[i,j], (F[i+l,j] + 2), (F[i+l,j+l] + 3), (F[ij+1] + 2), (F[i-1 j+1] + 3)); This distance array need only be computed once for each image. The chamfering process actually computes a medial axis transform [7], based on the image edgepoints. Chamfer matching is an unusual application for these transforms which are generally used to represent shapes based on their central skeletons and some weighting information. The parametric correspondence stage can now proceed to determine the three position and three orientation parameters for the analytic camera model. Traditional methods of the day included two stages, first matching a number of known landmarks then computing camera parameters from correspondences between them. Parametric correspondence, the authors claim, combines these two stages. The method works by repeatedly altering the transform matrix, reprojecting the symbolic map into the chamfered image accordingly, and hill climbing on the various camera parameters. Increases and decreases in the similarity measure indicate how the parameters should be adjusted. The advantages claimed for this method include that the match is based on shape CHAPTER 2. RELATED RESEARCH 26 which is rather more invariant than the intensity values used in correlation. Spatially extensive features are matched, rather than isolated patches of the image and all fea-tures are matched simultaneously, rather than searching the combinatorial space of local matches. Instead of numerous 2-D templates, a compact 3-D model is used and the cost of computing the similarity measure is linear in feature points. Problems include local minima which occur when the initial estimates for camera parameters are too far off or when a number of model features are confused with a single image feature, eg. for the picket fence problem, when the model contains a number of parallel lines. Chapter 3 Vision and Joint Angle Sensing 3 . 1 R e q u i r e m e n t s In the context of the teleoperation project, a vision system for joint angle sensing must be particularly robust, so that the humans who work closely with these machines are in no danger as a result of a misinterpreted image. Further, in introducing a new technology to an existing environment, the newcomer must provide capabilities at least as reliable as the original machines, otherwise it will not be accepted or used. Accuracy is another important issue. For the puposes of controlling a robot arm or excavator, accurate joint angles are essential to accurate positioning and task perfor-mance. Speed will also be crucial, particularly if the system is to supersede the existing joint angle sensors. As a backup sensor the vision system will have to operate at near real time rates of about an image per second, as a primary sensor it will have to operate at frame rates of about sixty per second. The only way to achieve this speed, given the amount of processing required is to use highly parallelisable methods and special, parallel hardware. The degree to which any algorithm can be implemented in parallel is therefore another criterion for choosing techniques. The environment of a excavator poses a number of challenges to the design of a Com-putational Vision system. Expected image quality is illustrated in figure 3.1. First of all 27 CHAPTER 3. VISION AND JOINT ANGLE SENSING Figure 3.1: Image 6 CHAPTER 3. VISION AND JOINT ANGLE SENSING 29 it is, in general out of doors and subject to the shadows, reflections and variable lighting of the natural world. Also these machines are obviously not stationary, so variations in lighting, and landmarks in the environment can occur throughout the machine's work-ing day. Immediately then, most of the machine vision techniques applied to robotics requiring controlled lighting, are eliminated. Special fixtures, cameras and LED's used by systems such as SELSPOT or Watsmart, as well as laser range finding, let alone any binary vision technique are inappropriate to these operating conditions. A second constraint imposed by the nature of the problem is that special markings or lights, or even a camera or fixture mounted on the arm itself are likely to fade, be obscured or knocked off entirely by the routine tasks of heavy equipment. Digging holes, colliding with branches, splashing mud or a number of other environmental hazards will affect these key instruments for recognition. If the goal is to backup or replace hydraulic joint sensors whose tenuous wires could be damaged by operating conditions, the vision system should at least avoid having identical weak points. What are the alternatives? The Computational Vision research community has stud-ied the recognition problem for twenty years: this is an ideal opportunity to apply industrially some small part of what they have learned. 3.2 Chamfer Matching and Orientation Filtering From the wealth of literature on Computational Vision the most promising techniques for joint angle sensing appear to be those from work on Model Based Vision. Successes such as Lowe's S C E R P O system [28], Goad's real time system [20], and Grimson's work on articulated objects [21] bode well for fast, accurate recognition of an articulated excavator arm. The constrained nature of the arm's movement through the image, and its smooth, man-made shape make the problem an ideal candidate for model based techniques. There is, after all, no point in deriving large amounts of information from the image bottom up CHAPTER 3. VISION AND JOINT ANGLE SENSING 30 as Marr suggested [30], if the scope of the problem can be reduced by specific knowledge of object shape and motion. The next question of course is which of the demonstrated techniques to investigate ? Lowe's system worked well but required several minutes to analyse an image. Goad's system was fast but required considerable preprocessing of the model, and high image quality. It was feared the system would not adapt well if the machine's end effector was changed, for example from a bucket to a feller-buncher, or to variations in the camera model or lighting and image quality. Grimson's original techniques require depth information from a laser range finder or stereo pair of images. Searching an interpretation tree can be time consuming if it is prone to false matches as the cluttered images of the excavator arm would be. Barrow et al.'s Chamfer Matching [5] promised to produce an error measure by com-paring the image to a synthetic image in time linear in projected model points. It was selected as worth pursuing as a solution to the problem. This decision then framed the basic analysis by synthesis approach adopted for this project. A n image would be processed to find edge points, while parameter estimates and arm and camera models would be used to produce a synthetic image. The two images would be compared to provide an error measure E. Parameters would then be repeatedly readjusted until E was minimised. This framework is illustrated in figure 3.2. 3 . 2 . 1 I m a g e P r o c e s s i n g The first step in determining joint angles in this system is some basic image processing, which transforms an eight bit, grey-scale image into a more usable form. Chamfer matching is known as an edge based technique because it compares only the position of edge points in the image to those in the synthetic image. It does not use information based on other image cues such as shading, motion or texture. Methods for extracting CHAPTER 3. VISION AND JOINT ANGLE SENSING I M A G E #2j #3, #4 Image Analysis Camera Scene Model Model Image Description Synthetic Image Description Image \ Matching J Difference Description Error Measure E Adjust for E rnin Figure 3.2: Analysis By Synthesis CHAPTER 3. VISION AND JOINT ANGLE SENSING 32 Figure 3.3: Zero Crossings for Image Processed with V2G edges from the grey scale image must therefore be applied before matching can proceed. A common strategy, first formalised by Marr and Hildreth [30] is to compute the Laplacian derivative of the image, smoothed with a Gaussian distribution function ( V 2 G ) . Since V2G is a second derivative operator, the points in the V 2 G image where intensity values change sign or zero crossings, represent points in the original image where intensity changes occurred. Such intensity changes generally arise from edges in the world and thus zero crossings of V 2 G are a reasonable approximation to edges projected by 3-D world objects as illustrated in figure 3.3. More recent work on edge detection by Canny [11] has shown that directional deriva-tives at varying scales produce better localisation of edges than V2G. Canny edge de-CHAPTER 3. VISION AND JOINT ANGLE SENSING 33 tection has the disadvantage of requiring more computation and being only partially parallelisable. In the context of the Caterpillar vision system, the results obtained with Canny edges were not significantly better than those for V 2 G , this plus the fact that V 2 G software is more readily available for parallel hardware make it a more likely choice for early image processing in this system. It is a fortunate coincidence that both of these operators are most effective at finding long, straight lines, and that such lines are the dominant features of the appearance of the Caterpillar arm in the image. 3 . 2 . 2 P r o b l e m C o n s t r a i n t s The basic geometry of the excavator itself provides a number of interesting constraints which can be exploited to help determine joint angles. As previously mentioned the dominant lines of the arm components are easily modelled. The model used in the system is as simple as possible, it consists of only three straight lines for each of the boom, stick and bucket. Again the image is constrained because of the situation of the camera in the cab of the excavator. Only the left upper edge and two lower edges of the boom and stick will ever be visible in the image, and thus only these lines need be used in the model. This has the added advantage that hidden line removal is unnecessary. The nature of robot manipulators is that the position of a link in space is dependent only on the positions and angles of the links and joints that come before it. This is no less true of the excavator arm, and has the useful effect of making the 9 parameter space quasi-separable. The position of the boom in the image is dependent only on 92, the stick position depends on 92 and 83 and the bucket on 92, 93 and 94, thus if we proceed by optimising E for each angle/component sequentially, at each step we are searching for only one parameter 0,-. E(92,93,94) = E'(92) + E'{2{93) + E'^3(94) (3.1) CHAPTER 3. VISION AND JOINT ANGLE SENSING 34 Figure 3.4: Modelled Arm Components CHAPTER 3. VISION AND JOINT ANGLE SENSING 35 CM «y» _ r— to V ^ OJ l i I <2 — CJ> — CO r— — 11 2 3 4 5 6 o 1 ' 1 ' 1 ' 1 ' 1 ' 1 ' 1 ' 1 ' 1 ' 1 ' 1 1 1 1 | ' 1 1 | 5 0.7 0.9 1.1 1.3 1.5 1.7 1.9 2.1 2.3 2.5 2.7 2.9 3.1 3.3 3.5 T H E T A 4 -Figure 3.5: E — 9 Low Dimensional Parameter Space The parameter space we are searching then is reduced to one dimension in 9 and one for E as illustrated in figure 3.5. Yet another feature of manipulator geometry and motion is that each link has a limited range of angles through which it can move. The excavator arm components each move through their angle range in a single plane. These limitations on motion also limit the regions of the image where each component can appear. For example the range of 92 and the shape of the boom, allow it to appear only in the right third of the image, thus we only need search this much smaller region in our effort to recognise the boom in the image. The stick and bucket windows are similarly limited, but their position and size vary depending on 62 and on 92 and 03 respectively. The nature of the problem then allows us to reduce its complexity, by modelling only easily manipulated straight lines, by determining only one parameter at a time, and by searching for each component in only a limited window in the image. CHAPTER 3. VISION AND JOINT ANGLE SENSING 36 3 . 2 . 3 O r i e n t a t i o n F i l t e r i n g A n initial attempt to apply chamfer matching using a synthetic image derived from the arm and camera models, and the zero crossing edge image of V 2 G failed due to the huge number of extraneous edge elements creating local minima in E. It became evident that as many of the irrelevant edge points as possible had to be removed before chamfer matching could become viable. The goal would be to use more information from the image and exploit the problem constraints to refine the edge image before chamfering. A number of approaches were considered. The first idea was to create a gradient strength map, by computing the first derivatives of G ® / with respect to x and y. (G ®I)X = GX®I (G®I)y^Gy®I ^> These indicate the rate of change of the smoothed intensity image in the x and y direc-tions. Computing \J(GX <g> I)2 + (Gy ® If (3.3) gives a value for gradient strength or the magnitude of change in intensity at that point in the image. If we associate edge elements with their gradient strength, we can use this value to threshold the V 2 G zero crossing map to produce a strong zero crossing map. This plan was discarded however because some of the background lines we want to eliminate, such as the horizon, were very strong while the middle line on the boom or stick could be quite weak. A second proposal was to compute a gradient orientation map where pixels are marked with Arctan^f- (3.4) (G ® I)y or the orientation angle of the edge on which they fall. The strong zero crossings could then be used, with their gradient orientations to perform a Hough Transform [13]. This is a process which finds edge elements belonging to straight lines, by determining whether CHAPTER 3. VISION AND JOINT ANGLE SENSING 37 the lines they define in a p — 0 parameter space intersect. It is parallelisable, but must be computed for each new image, and finds straight background lines, such as trees and horizons as well as manipulator lines. The nature of the computation based on orientation rather than position, will also associate edge elements with a line which are far removed from its main body. Returning to the constraints placed on the image by the geometry of the excavator and camera arrangement, the limitations on the position and orientation of each arm component in the image seemed useful to exploit. Any edge element from the image can only lie on a modelled component line 1) if it is in a window in the image where that component can appear, and 2) if at least one of the modelled component lines can fall on that (x,y) position for some valid joint angle 6, with orientation within some tolerance T of the real edge element's gradient orientation. In other words extraneous edge elements can be discarded based on their position and gradient orientation. One way to achieve this is to compute the orientation each model line would have when passing through (x,y) and accept or reject the edge element based on these computed values. Finding model line orientations involves finding the tangent from (x, y) to a projected, and therefore distorted, circle about the component's pivot point. This would require six multiplications, six divisions, three square roots, and three trig operations per edge element, per image even if the circle were not distorted. The cost was deemed too high, despite the advantage that computing model line orientations indicates likely 0 values for accepted edge elements. An alternative which seemed more attractive was to use the camera and arm models to create a filter marking each pixel in a component's window with the overall minimum and maximum orientation of all model lines falling on it. The resulting range could be used to quickly and efficiently accept or reject edge elements in the window, based on their gradient orientation. The filter is computed by sweeping each model component CHAPTER 3. VISION AND JOINT ANGLE SENSING 38 Figure 3.6: Filtered Edge Elements for Boom CHAPTER 3. VISION AND JOINT ANGLE SENSING 39 through its valid range of joint angles at fine increments of 0, updating the range for each pixel on which a model line falls. In this way only edge elements which can potentially fall on the arm component are retained. Figure 3.6 shows the result of filtering the window in the image where the boom can appear. There is an obvious improvement over the same region of the V2G zero crossing image in figure 3.3. The boom filter is the most time consuming to compute, but fortunately can be precomputed for each camera model. The position of the stick and bucket in the image depend on the proximal joint angles, and their filter size and shape vary depending on image position. At present their filters are computed for each image processed, although potentially a finite number of filters for a range of image positions could be precomputed for these two components. 3 . 2 . 4 C h a m f e r M a t c h i n g The goal of chamfer matching is to take real image edge points {R} and synthetic image edge points {S} and compute E = y2mm\Sj -Ri\. (3.5) i 3 The first step in the process is to compute the chamfered edge image, as described in section 2.2.7. The computation is much like computing a medial axis transform, it at-tempts to assign to each pixel an approximation to the Euclidean distance to the nearest image edge point. The actual edge points are marked zero, with pixel chamfer values increasing as you move away from them. Creating the chamfered image requires two passes in serial implementation, but relaxation techniques could be used to implement it in parallel. As proposed, the matching process proceeds by finding one angle at a time, beginning with 62, figure 3.7 illustrates such a solution. Minimising E implies adjusting the current Oi until the total distance between the synthetic image points and the actual image points is minimised, thus E acts as an L\ norm. For each edge image chamfered then, several CHAPTER 3. VISION AND JOINT ANGLE SENSING Figure 3.7: Sequentially finding 02, 03 and 04. The superimposed models in the above images illustrate a) the initial guess, b) the located boom, c) the located boom and stick, and d) the final computed solution. CHAPTER 3. VISION AND JOINT ANGLE SENSING 41 8 7 6 6 5 4 4 3, W 6 5 4 4 3 2 0 4 3 2 2 2 0 2 2 0 0 2 2 2 3 2 • 2 2 3 4 4 5 2 a> 4 4 4 5 6 6 7 7 4 5 6 6 6 7 8 8 9 PROJECTION EDGE ELEMENTS ERROR MEASURE FOR ILLUSTRATED PROJECTION = 19 Figure 3.8: Computing the Error Measure synthetic images will be computed based on adjusted 9 values, until the best match (lowest E) is found. Standard line scan conversion techniques from computer graphics select pixels in a raster grid which are marked in order to display a line. The same techniques are used here to select the locations in the chamfered image array where the current model line will fall. The chamfer values from these locations are summed to give E as illustrated in figure 3.8. Given the likelihood of poor image data, another advantage of using E is that it interpolates well over gaps in image lines. Emin will have a slightly higher value, but will still occur when parameters are close to their actual values. CHAPTER 3. VISION AND JOINT ANGLE SENSING 42 3 . 3 Camera Calibration Setting up an initial camera model, or even adjusting the current model to compensate for changes due to vibration or other long term operating effects is obviously an im-portant issue for the Caterpillar vision system. Using the chamfer matching techniques proposed above becomes much more difficult when searching for three orientation and three positional parameters, a i , . . . , a 6 at once, rather than three separable joint angle parameters. The proposed method will perform a steepest descent search on E, for the six camera parameters. The method of steepest descent uses estimates of 8a's to decide how to vary the ex's to obtain minimum E in seven dimensional E —a space. Once again the problem of how to eliminate image noise arose. The first attempt refined the image with the usual orientation filter computed for the current camera model, and produced a chamfered image accordingly. A l l the c5a's were computed based on this chamfered image, and only when the gradient techniques returned improved a values was the new filter for the new camera model computed. Obviously even relatively small changes in the camera parameters can make an edge element shift from valid to invalid or vice versa. Initially it was hoped that this approach would work with only slight inaccuracy, since the £a's were small, initial experiments were very discouraging however. A possibility for improved filtering which is currently being explored is to create a filter by selecting a fixed calibration position for the arm's joint angles, then sweeping the boom and stick models, based on this position through ranges of camera parameters. The goal is to be able to adjust for slight errors in camera parameters, so the a ranges used are the current values, plus or minus five degrees or five centimetres. This should produce a highly refined edge image, but unfortunately sweeping two models through fine increments of six OJ'S is massively time consuming in serial implementation. The slowness of obtaining results for filtering, have delayed results for camera calibration CHAPTER 3. VISION AND JOINT ANGLE SENSING 43 via these methods. Fortunately computing such filters could be hugely speeded up by implementing the process in parallel, for example by having a number of processors updating the filter for small ranges of the parameters, and need only be done once for any machine configuration. For clarity let us summarise the design decisions described in this chapter. The implemented system begins with a digitised image of the excavator arm, taken from the cab. Image processing tasks including computing of V 2 G , V G X , and VGy are applied, then the zero crossings of V2G are marked with their gradient orientation to produce the oriented zero crossing map. For each arm component in turn a chamfered edge window is computed, using an orientation filter to select only zero crossings whose orientations fall within an allowable range for the component lines. Optimisation techniques are used to adjust 6 until a minimum error measure E, is reached for the current joint angle. The diagram in figure 3.9 illustrates this process. CHAPTER 3. VISION AND JOINT ANGLE SENSING I M A G E Camera and Arm Models V2G,VGX,VG Compute Window Filter Oriented Zero Crossing Map Chamfering^ Chamfered Synthetic Edge Image Image (^Compute E Difference Description Error Measure E Adjust for Ern.ii Figure 3.9: Chosen Algorithm Chapter 4 Implementation The method proposed for joint angle sensing can be roughly divided into three parts: image processing, orientation filtering and chamfer matching. These have been imple-mented in C on SUN 3 and SUN 4 workstations. Images are rough approximations to real data expected when a video camera is actu-ally mounted in the excavator cab. The current data however was obtained using a wide angle lens with a 35mm camera, mounted on a tripod in the cab. Actual joint angles for 62, O3 and 64 were computed based on measurements including the vertical height of the second and fourth joints, and the straight line distance between them. Images were digitised on an Optronics Fi lm Scanner/Writer at 50 fj, then resampled , for purposes of manipulation to about 70^. 4 .1 R e p r e s e n t a t i o n s The main data structures employed by the vision system are large integer image arrays, one for the oriented zero crossing map, and one for the chamfered edge image. A third, somewhat smaller floating point array is used as the orientation map to store min and max allowable orientations for each pixel in a component window. The data used in the Caterpillar arm model was obtained in a similar manner to 45 CHAPTER 4. IMPLEMENTATION 46 the image data, in this case using a tape measure. These measurements have been reduced to three lines for each of the boom, stick and bucket, in the form of six, 3-D endpoints. Each component is represented within its own coordinate frame, not exactly in the Denavit-Hartenberg form, but with the origin centred at the proximal end, and the —z axis passing through the centre lengthwise. The actual 3 x 6 x 3 floating point array structure used can be found in Appendix B . l . The world model uses an approximation of the camera position as its origin. Using 0 estimates the arm components are translated and rotated to their 3-D world positions, based on measurements of the machine's geometry. A vector of six values, three for the camera's actual (x, y, z) position, and three for its x, y and z rotations is then used to transform the world coordinates into camera coordinates. Finally the camera's lens focal length is used to compute the perspective projection of these points as they will appear in the synthetic image. Both camera and coordinate transformations are performed using homogeneous trans-forms [32], common in Graphics and Robotics. In this form each rotation, or translation can be represented by a 4 x 4 matrix. These in turn can be multiplied together to repre-sent a sequence of translations and rotations as a single homogeneous transform matrix. For our purposes these matrices have been set up in the following way : x rotation 1 0 0 0 0 cosO —sinO 0 0 sinO cosO 0 0 0 0 1 y rotation cosd 0 sinO 0 0 1 0 0 —sinO 0 cosd 0 0 0 0 1 z rotation cosO —sinO 0 0 sinO cos9 0 0 0 0 1 0 0 0 0 1 translation 1 0 0 0 0 1 0 0 0 0 1 0 8x 8y 8z 1 3-D coordinates are represented as four value homogeneous coordinates [xw,yw, zw, w], CHAPTER 4. IMPLEMENTATION 47 but in general w will be 1 giving [x, y, z, 1]. In the case of computing perspective projec-tion however the transformation [x,y,z,l] / 0 0 0 0 / 0 0 0 0 / -1.0 0 0 0 / = [fxjyjzj- z) makes use of the w to produce the familiar projection equations JL a = fk-4 . 2 Edge Detection (4.1) The Gaussian distribution used for smoothing the image has the form G(x,y) = 27TC72 e 2" The form of the Laplacian operator is V 2/(x,</) = d2f(x,y) , d2f(x,y) thus, d2 V2G = lira2 e 2<r-dx2 2# + dx2 + dy2 _il±sr dy' (4.2) (4.3) (4.4) V 2 G can not only be applied in parallel with each processor convolving a small region with the filter, but the operator itself can be written as the sum of two separable oper-ators, making it more efficient still [26]. Since we are only interested in zero crossings, we allow the scale constant to be K where K > 0, thus the derivative becomes: V 2 G = Ke&e& {x2 - a2) + Ke&e& (y2 - a2). (4-5) From equation 4.5 there are /mearfilters of two different forms which can be precomputed. CHAPTER 4. IMPLEMENTATION 48 When using a n l x M filter, hi and h2 are computed for r = — -y to y . In addition V G ^ and VGj, are required to compute gradient orientations, they can also be reduced to a separable form. VGX = el£ (=§) VGy = e^r The only additional filter required is / i 3 - —re 2^, The program Gauss which computes the V 2 G <g> I image used by the system is part of the Laboratory for Computational Vision's library of image software. It was modified slightly to compute and apply h3 and output a gradient orientation map. The method is as described in algorithm 4.1. A l g o r i t h m 4.1: V 2 G { For y = Y to img_rows - 4f For x = f to img_cols - f { ix ® hfay] = E,-L_M M»] x ^ + *.y] M 2 4 ® M ^ , y] = E i l _ M M*] x l[x + i,y] M 2 Ix <8> h[x, y] = £.1 M h3[i] x I[x + i,y] * 2 M V 2 G [ x , y] = £ 1 M ^ [ i ] X 7, ® fc2[a:, y + i] 2 +/i2[i] x 4 (g) /ii[x,y + i]. V G X = £ 1 M x 7^  ® £3[a;, y + i] * 2 M V G y = £ 1 M h3[i] x Ix® /iJcE, y + i] 2 gradient orientation[x,y] = Arctan(^f-) } } * V 2 G * CHAPTER 4. IMPLEMENTATION 49 a b c d Figure 4.1: Pixel Neighbours Using this method, instead of a complexity of TV2 x M2 for an N x N image, the V 2 G processed image can be computed in iV 2 x M time. The zero crossings of V 2 G ® 7 are computed based on a neighbourhood of three pixels about the one being considered as illustrated in figure 4.1. If a, b, c, and d do not all have the same sign, or if the value of d is zero then d is marked as a zero crossing. If all four neighbours have the same sign d is not marked. This technique produces rather thick lines, but this is not necessarily detrimental, whereas choosing narrow lines could cause image information to be lost in the filtering process. 4 . 3 C r e a t i n g t h e O r i e n t a t i o n M a p Each component orientation map takes the form of a two level floating point array, with one level storing min orientations and the other max orientations for each pixel in a component window. It is computed by setting up offsets so that it lies over all possible positions for the current component, given currently known 6 for proximal joint angles. The routine Setup_Orientat ion_Map then iterates through small increments from 6min to 9max for the current joint angle, projecting each model line and computing its orientation. For the stick and bucket increments of ^ radians are used, for the boom, because of the distance of imaged points from the pivot of the joint, increments of radians are used. CHAPTER 4. IMPLEMENTATION 50 A l g o r i t h m 4.2: Setup_Orientat ion_Map { reinitialise orientation map, reset window limits, set-orientation = true, For 6 = 0min to 6max { project_model_points(for component i) For each pair of endpoints (xo, yo), (xi,yi) { If (XQ — XI) is non zero line.orientation = Arctan(^°~vi), \ XQ— X \ I ' Else line.orientation = ^ Mark_Line(x 0, y0,xx,yi), } } set-orientation — false, Mark_Line is called to determine at which pixels each model line falls in the orienta-tion map. When the global flag set^orientation is set, MarkJLine calls Mark.Orientation to update the orientation map for each point (x, y) it marks as a point on the projected model line. A l g o r i t h m 4.3: Mark_Or ien ta t ion If (x,y) is outside current window dimensions update window dimensions. If orientation-map[max, x, y] < line-orientation orientation-map[max, x, y] = line-orientation. If orientation-map[min,x,y] > line-orientation orientation-rnap[min,x,y] = line-orientation. CHAPTER 4. IMPLEMENTATION 51 Computing filters can be easily parallelised by having individual processors marking the orientation map for small subranges of the full 9min to 9max range. 4.4 Chamfer Matching 4.4 .1 C h a m f e r i n g t h e E d g e I m a g e Computing the chamfered edge image is done much as described in section 2.2.7, with the added twist that the orientation filter is applied simultaneously. In the initial pass of the algorithm each marked edgepoint in the oriented zero crossing map is compared to the orientation range for the corresponding point in the orientation filter. If the point falls within the valid range it is selected and marked with zero, all other points are given a distance value greater than zero, approximating their Euclidean distance from the nearest edge point. The second pass of the algorithm is unchanged. A l g o r i t h m 4.4: Chamfer ( edgeJmg, c h a m J m g ) { set window min and max based on window limits, initialise window region to infinity, For row = window_min_row to window_max_row For col = window_min_col to window_max_col { 0min = orientation-map[min,row,col] - Tolerance. 9max = orientation-map[max,row,col\ + Tolerance. If (edgeJmg[row,col] is a marked zero crossing) and ((0 > 9min) and (9 < 9max)) chamJmg[row,col] = 0, Else chamJmg[row,col] m i l l ( chamJmg[row,col], cham _img[row-1 ,col-1] +3, CHAPTER 4. IMPLEMENTATION 52 cham Jmg[row-1 ,col]+2, cham Jmg [row-1, col+1 ]+3, chamJmg[row,col-l]4-2 ). } * end For * For row = window_max_row down to window_min_row For col = window_max_col down to window_min_col { cham Jmg [row,col] -min ( cham Jmg [row,col], cham Jmg [ro w+1 ,col+1 ]+3, cham Jmg [ro w+1 ,col]+2, cham Jmg [row+1 ,col-1 ]+3, chamJmg[row,coi+l]-(-2 ). } * end For * } * end Chamfer * Chamfering is a two pass process in serial, but can be implemented as a parallel relaxation process, by starting various processors at edge points and having them prop-agate the distance approximations outward [3]. The method is based on the analogy of starting fires at edgepoints and propagating them until they meet at quench points. At each iteration the next level of surrounding pixels are marked with appropriate distance values, until all pixels are labelled. The number of iterations required is equal to the greatest distance labels must be propagated until a quench point is reached, that is the maximum distance from an edgepoint to the window edge, or half the greatest distance between neighbouring edgepoints, whichever is greater. 4 . 4 . 2 C o m p u t i n g E r r o r M e a s u r e The error measure E is computed via three routines : Compute_Error , Project JVIo-delJPoints and M a r k J i n e . ComputeJ5rror calls ProjectJModelJ°oints to obtain 2-D image coordinates of model endpoints. MarkJLine is then called for each of the three model lines and the error values it returns are summed to give the total error for the CHAPTER 4. IMPLEMENTATION 53 current 9 estimate for the current component. M a r k _ L i n e The Mark_Line routine calls on techniques for scan converting lines used in Raster Graph-ics. The projection routine Project_Model_Points has converted the six model points to their 2-D image projections, according to the camera model and the current 9 estimate. Mark_Line is called for each point pair representing a line. Scan conversion based on these endpoints indicates the coordinates of pixels which lie near the model line in a raster grid, or in this case an integer array representing the synthetic image. The chamfer values of these points are added to give the desired error measure. Raster Graphics places huge emphasis on speed and efficiency versus appearance of edges. Incremental methods which minimise the number of operations, particularly multiplications and divisions, are generally used to achieve the speed required. The basic incremental algorithm is based on the familiar line equation y = mx + b. If unit increments are assumed for x, = Roundlyi+m). Of course if the slope is greater than one, y is incremented and x t + i = Round(xi + ^ ) . One such algorithm was proposed by Bresenham [17]. It uses no real variables, instead a decision variable T is used to decide whether to increment y (if m < 1) or x (if m > 1). The decision value is based on comparing du and d; illustrated in figure 4.2. If (du — di) < 0 then L is chosen as the next marked pixel, otherwise U is chosen. For ease of computing the decision value, endpoints are presumed translated to the origin. du and d\ then become: du = {Vi - Vo) + 1 - - x 0 ) + 1) But the value for the decision variable is [du — d\). The computation required to acquire CHAPTER 4. IMPLEMENTATION 54 Figure 4.2: Scan Converting Lines this value can be reduced as follows: r,- = d,-du = %{{Xi - xo) + 1) - (yi - y0) - (yi -y0)-l + - x0) + 1) = 6x(dt - du) = 26y(xi - x0) + 26y - 26x(yi - y0) - Sx To make the decision computation incremental, Ti+i - Ti = 26y(xi+1 - x0) + 26y - 28x(yi+l - yt) - 6x -26y(xi - xQ) - 28y + 26x(yi - y0) + 6x = 28y(xi+1 - x{) - 28x(yi+l - yt) In this case x is the incremented variable so a? t + 1 — a;,- will always equal one, while J/J+I —yt will be zero or one as follows : If > 0, then select U thus yi = + 1 and Ti+1 = Ti + 2{8y-8x). If Ti < 0 then select L thus yi = and Ti+i = Ti + 28y. (4.7) Ti then becomes 28y — 8x, and all subsequent T,- can be computed incrementally. Instead CHAPTER 4. IMPLEMENTATION 55 of evaluating the line equation and rounding the conditional variable, the same effect can be achieved using only addition, subtraction and left shift for multiplication by two. A l g o r i t h m 4.5: M a r k _ L i n e (x0,y0,xn,yn) { 8x = \xn — x0\. Sy = \yn - yo\-* Sx and 8y precomputed * If (g) > 1 1 Sy > { } Else { y is the incremented variable. T\ = 28x — 8y. decisionJncl = 28x.' * if < 0 * decision_inc2 = 2(8x — 8y). * if > 0 * Let x be the conditionaLvariable. If (yo < yn) { Let yo be initial value for incremented_variable. Let yn mark the end_of Jine. } Else { Let yn be initial value for the incremented_variable. Let yo mark the end_of Jine. } err = chamJmg[conditional_variable,incremented_variable]. * err is initialised to chamfer value for endpoint * x is the incremented variable. T x = 28y - Sx. decisionJncl = 2<5y. * if < 0 * decisionJnc2 = 2(8y — 8x). * if Ti_\ > 0 * Let y be the conditionaLvariable. CHAPTER 4. IMPLEMENTATION 56 If (x0 < xn) { Let xQ be initial value for the incrementecLvariable. Let xn mark the end_of Jine. } Else { Let xn be initial value for the incrementecLvariable. Let x 0 mark the end_of Jine. } err = chamJmg[incremented_variable,conditional_variable]. * err is initialised to chamfer value for endpoint * } * end initialising if * W h i l e ( incremented_var < end_of Jine) { increment incremented_variable. i f (r,- < o) { T,- = I\+ descisionJncl. } Else { increment conditionaLvariable. Tj = I\-f decision Jnc2. } If x is the conditionaLvariable, err = err + chamJmg[conditional_variable,incremented_variable]. Else err = err + chamJmg[incremented_variable, conditionaLvariable]. * add chamfer error value for currently marked point to error total * } re turn err. } * Mark Line * CHAPTER 4. IMPLEMENTATION 57 The process of scan converting lines seems quite serial, but each of the three model lines could be marked by separate processors, or to speed things further the model lines could be broken down into a number of smaller segments to utilise still more processors. 4 . 4 . 3 F i n d i n g E m i n Finding Emin is a single variable optimisation problem, where we want to minimise the error measure, subject to the variable 8 falling within the valid range 8min < 9 < 8max for the current joint angle. The areas of Numeric Methods and Operations Research offer a number of alternative strategies for sequential search which can be used when the objective function is not known analytically [18]. These include three point interval, golden section and fibonacci searches. In many cases however the range of 8 is quite large and contains many local minima. If the vision system is operating at high speed, the excavator arm will not have had a chance to move far from its last known position so it seems fruitless to search the entire joint angle range. Instead of these techniques then, a variation on the three point interval search was devised. Beginning with an estimate of 8, and a coarse increment 8, E is computed for 8est, a = 8est — 8 and b = 8est + 8. Initially then [a, b] is the interval of interest. If E(a) or E(b) is the minimum of the three values computed, we slide the interval along the range of 8 to the left or right so that a or & respectively becomes the new 8esi. For example if a has the lowest E, 8est becomes b, the upper end of the interval, a becomes 8esi and a — 8 the next a. When the current 8est has the minimum E on the interval, 8 is divided by two and the new E values for a = 8est — | and b = 8est + f are computed. From this point on whichever of a, 8est or b has the lowest E is chosen as the next 8est, and 8 is divided by two, until some minimum increment is reached. The current estimate is returned by the system, and the E value it produces is considered Emin-The routine G r a d which performs this optimisation is the heart of the vision system. CHAPTER 4. IMPLEMENTATION 58 It processes each component in turn, chamfering the component window and computing ' Emin, and in the case of the stick and bucket even computing the orientation filter. A l g o r i t h m 4.6: G r a d ( 8 parameter vector ) { Chamfer ( window for boom ). cur_parm = boom. first_pass = true. W h i l e ( cur_parm < bucket) { inc = initialJnc. mid = Compute_Error (curJ?, cur_parm). W h i l e (inc > mininc) { right = Compute_Error (cur_# + inc, cur.parm). left = Compute_Error (cur_0 - inc, cur_parm). If (right < mid) and (right < left) If first_pass W h i l e (mid > right) { left = mid. mid = right. cur_0 = curj? + inc. right = Compute_Error (cur_0 + inc, cur.parm). } Else { } mid = right. cur_0 = cur_0 + inc. If (left < mid) and (left < right) If first_pass W h i l e (mid > left) CHAPTER 4. IMPLEMENTATION 59 { right = mid. mid = left, curj? = cur_0 - inc. right = Compute_Error (cur_0 - inc, cur_parm). } Else { mid = left. cur_0 = cur_# - inc. } inc = i f . first_pass = false.. } * inc < mininc * 9pa,Tm — CUr_ 9. * set global 9parm for this parameter to curj? * increment cur_parm. Setup_Orientation_Map (cur_parm). Chamfer (cur_parm). TotalJErr = TotaLErr + mid. first_pass = true. } * While cur.parm < bucket * } * Grad * The error values used are actually not total error per component, but rather an average error per marked model point per component. This became necessary because the same line at two different angles can have very different numbers of pixels representing it in the synthetic image. Total error can vary correspondingly creating falsely low E values for some erroneous 9 and higher values relatively, for better matches. As previously mentioned, at video frame rates the arm cannot be expected to move far between iterations of the Grad algorithm. As well as having separate processors compute CHAPTER 4. IMPLEMENTATION 60 E(a), and E(b) to improve speed, the initial increment defining the initial search interval can be made quite small to improve accuracy and reduce search time. In attempting to use chamfer matching to correct the camera model the problem of finding Emin became a multivariable optimisation problem. Instead of using the G r a d routine then an implementation of the method of steepest descent was created. First a gradient vector, . _i \dE dE dE dE dE dE AE -is computed by computing E for the initial camera model estimate $ 0 then varying each parameter very slightly and computing the change in E. $ is improved iteratively by solving = $t- + AAE (4.9) for the positive scalar A which minimises 75($ s + 1). The process terminates when the change in E for two consecutive iterations drops below some tolerance. This technique for camera model adjustment has not yet been tested. (4.8) Chapter 5 Results The goal of the prototype vision system is to demonstrate the viability of passive vision techniques which could be used for joint angle sensing. Given the techniques selected there are three basic results to be illustrated to establish that the system is sufficiently reliable and accurate to be useful. This entails showing that for a reasonable range of initial estimates, it will converge to accurate values for the joint angles. A n argument can be made that the system is robust, although the limited amount of data available makes this difficult to verify. Finally speed is essential if the system is to run at frame rates, although the prototype uses serial techniques, some arguments for potential speed improvement through the use of equivalent parallel techniques will be presented. 5 . 1 P a r a m e t e r S p a c e Perhaps it is important first to examine what exactly the system is doing, and the type of information it uses. Basically chamfer matching is a single variable optimisation in E — 9 space, where 8 is adjusted to minimise E. Without the orientation filtering devised for this project, the parameter space is as illustrated in figure 5.1, with no clear minimum particularly at the true 83 value of 1.3 radians, found in table 5.1. The necessity and effectiveness of the filtering process is illustrated by comparing figure 5.1 with figure 5.2, 61 CHAPTER 5. RESULTS 62 image name 02 03 04 1 frame 5 0.4060 2.1099 2.0595 2 frame 6 0.3236 2.1099 2.0595 3 frame 7 0.2524 2.1099 2.0595 4 frame 11 0.4105 1.6717 2.0595 5 frame 12 0.4129 1.3024 2.0595 6 frame 13 0.4129 1.3024 3.3699 7 frame 14 0.4129 1.3024 3.5517 8 frame 15 0.4129 1.3024 3.5517 Table 5.1: True Joint Angles where there is a clearly defined minimum very close to the true 0 3 value. Unfortunately orientation filtering is not quite as effective for the 02 parameter space. Although it produces an improvement as illustrated in figures 5.3 and 5.4, 0 2 space remains very jagged. For the unfiltered E — 02 space the actual joint angle, 0.4129 radians, is again not the minimum, whereas after filtering a minimum appears at the correct value for 0 with E 4 error units lower than the next best local minimum. The region over which the optimisation will converge to the correct minimum however is much more limited than for the other parameters. The reason for the volatility of 02's parameter space is unclear. It may be partially that the boom model lines are more easily mistaken for one another because they are nearly parallel and two of them are close together, this causes the "picket fence problem" and resulting local minima. Another factor could be that the sample images have stronger horizontal lines such as the river, horizon and asphalt, than they do vertical and these would interfere more with the boom than the stick or the bucket. CHAPTER 5. RESULTS 63 Figure 5.1: Unfiltered Parameter Space - Image 6, 03 5 . 2 F r e e P a r a m e t e r s There are three arbitrary parameters which can affect the performance of the chamfer matching process. The first is the initial interval set for the search, which greatly affects whether the system will converge to some value close to the the true joint angle or be sidetracked into a local minimum. A second parameter is the orientation tolerance used to restrict how closely edgepoints must fit the orientation range from the orientation map. The last of the free parameters is the filter sampling rate, or the size of the increment used while sweeping the model through its 6 range to produce a filter. 5 . 2 . 1 I n i t i a l I n t e r v a l In order to observe the effect of various initial intervals on the range of initial estimates for which the system will converge to a nearly correct value for 9, the system was run CHAPTER 5. RESULTS 64 m CM ~ _ CSI 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9 2.1 2.3 2.5 2.7 THETA 3 Figure 5.2: Filtered Parameter Space - Image 6, 0-CHAPTER 5. RESULTS Figure 5.3: Unfiltered Parameter Space - Image 6, 02 CHAPTERS. RESULTS 66 Figure 5.4: Filtered Parameter Space - Image 6, $• CHAPTER 5. RESULTS 67 for a range of interval values from to radians for each of the eight images. The convergence ranges observed are laid out in figures 5.5,5.6 and 5.7. The tests run were relatively coarse, but the trends in the data seem clear nonetheless. Perhaps it is to be expected that the best convergence for Q<i occurs with a small initial increment of ^ , because of the jaggedness of its parameter space. For the stick and bucket on the other hand, large increments of ^ perform well. In future two separate increments could be used, but for the current implementation a value of ^ was chosen as a compromise. 5 . 2 . 2 O r i e n t a t i o n T o l e r a n c e Orientation tolerance determines the degree to which the image edgepoint orientations must fit predicted model ranges. The tradeoff in this case is that more valid edge points will be accepted for larger tolerance values, but they will also introduce more of the image noise we are trying to eliminate. This tolerance greatly affects the degree of smoothness in parameter space achieved by orientation filtering. To a large extent it determines the amount of noise in the chamfered image, and thus the characteristics of minima in parameter space. Naturally the higher the tolerance, the more noise is allowed until at the extreme no extraneous edgepoints are filtered out. At the other extreme a tolerance of zero allows only those edgepoints with orientations exactly conforming to the orientation map. Inaccuracy in image data and computed gradient orientations makes such a restrictive tolerance undesirable since insufficient image data is allowed to pass the filter. In a series of executions, a range of tolerance values from ^ to were tested on the eight images to determine their effect on accuracy and convergence. As expected, tolerance values affected both ranges of initial estimates over which the correct 8 could be computed, as well as the accuracy of the results. The higher tolerances, those greater CHAPTER 5. RESULTS 68 Figure 5.5: Boom Range of Convergence • Initial Increment (rad) Figure 5.7: Bucket Range of Convergence CHAPTER 5. RESULTS 69 than j ^ , were unsatisfactory in most repects since the range of convergence, as well as accuracy were reduced by added image noise. The best results occurred at different tolerances for the various images, but generally the range of — to ^ offered the best overall performance. A tolerance value of ^ produced quite accurate results for seven out of eight images, and so would be a reasonable choice for the system in general. 5 . 2 . 3 F i l t e r S a m p l i n g R a t e There are actually two sampling rates used by the system, one for the boom and another for the stick and bucket. The considerably higher sampling rate for the boom is necessary because of the geometry of the machine, which causes the imaged boom to move quite a large distance for small changes in 62. This effect is the result of the imaged portion of the boom being some distance from the actual joint causing its motion. The lower the sampling rate the greater the chance that the ranges in the orientation map will be imperfect, and that valid edgepoints will be eliminated. The higher the sam-pling rate the more expensive the filters are to compute, and the slower the optimisation step grad, becomes. In determining the sampling rates the goal is to choose the lowest sampling rate for which accuracy and convergence are preserved. Tests were run for boom sampling rates from to 1 8 p 0 0 . Accuracy of computed 0 values was not affected, but the range over which convergence occurred varied somewhat. The lowest sampling rate at which maximum convergence intervals occurred was For the stick and bucket the sampling rate did affect accuracy, although convergence was not greatly affected in most cases. Tested sampling rates were from ^ to ^j. A value of was chosen because it was the lowest rate with the best accuracy. CHAPTER 5. RESULTS 70 5 . 3 P e r f o r m a n c e 5 . 3 . 1 A c c u r a c y Having acquired a set of preferred free parameters, tests for accuracy based on the eight images and their computed joint angles were performed. The values obtained with their variation from the true angles are listed in table 5.2. The data used by the system, including both camera and component models is quite rough. Image acquisition is another possible source of error in computing joint angles. Naturally any inaccuracy in computed 0 2 affects the accuracy of the estimate for 93 and so on, so the greatest error often occurs for the estimated 0 4. Further, depending on the arm position the bucket is sometimes difficult to locate accurately, particulary when viewed nearly front on. Using the three joint angles for each of the eight images, we have twenty four sample points {Qtrue — 9est)- The sample mean of this set is 0.0030 radians, the standard deviation is 0.0709 radians. Figure 5.8 is the histogram of these difference values, and illustrates their roughly normal distribution. Another factor affecting accuracy was image size. The larger the image the greater the imaged distinction for small changes in 9. The tradeoff of course is that large images are slower to process. 5 . 3 . 2 R o b u s t n e s s Error has been introduced, albeit involuntarily at every level of data acquisition for the Caterpillar vision prototype. The images have been digitised from 35mm negatives, with the inherent introduction of noise this process involves. The model of the Caterpillar arm was produced from measurements taken with a tape measure, as were measurements used to compute the true joint angles. The camera models used for the various images were estimated based on adjusting camera parameters until the arm model, positioned with known joint angles, visually matched the image in question. The fact that results CHAPTER 5. RESULTS Image 1 Computed 0 9 — 9est 02 0.4047 0.0013 03 2.1118 -0.0019 2.0944 -0.0349 Image 2 Computed 0 0 - 0est 0.3237 -0.0001 03 2.0985 0.0114 04 2.0289 0.0305 Image 3 Computed 9 9 — 9est 02 0.2563 -0.0040 63 2.1107 -0.0008 04 2.0289 0.0305 Image 4 Computed 9 0 — 0est 02 0.4091 0.0014 03 1.6581 0.0136 04 2.0595 0.0000 Image 5 Computed 9 0 — 0est 02 0.4129 0.0000 03 1.2762 0.0262 04 2.0780 -0.0185 Image 6 Computed 9 0 — 0est 02 0.4126 0.0003 03 1.2817 0.0206 04 3.3816 -0.0117 Image 7 Computed 9 0 — 0est 02 0.4112 0.0017 03 1.3090 -0.0066 04 3.5430 0.0087 Image 8 Computed 9 0 — 0est 02 0.4199 -0.0070 03 1.2959 0.0065 04 3.5474 0.0044 Table 5.2: Computed #'s CHAPTER 5. RESULTS 72 1 11 # of measurements T-0.0314 -0.0314 0 (rad) e -e est Figure 5.8: Histogram of Error are as positive as they are is a testament to the system's robustness! Ideally of course the system ought to have been tested under a variety of changes in the imaged environment. The only data of this type in the data set are images 7 and 8, which show the arm with identical joint angle and camera positions. Image 8 in figure 5.10 is considerably brighter than image 7 in figure 5.9, with the accompanying increase in shadows and clarity which result from bright sunlight. From table 5.3 the accuracy of the estimated 6 for both images was quite high and their convergence was about equal as well. Evidently lighting changes of this magnitude have little effect on performance. 5 . 3 . 3 C o n v e r g e n c e The same series of executions which selected the initial search interval, indicated the convergence characteristics of the system on the available data. If we look at the selected initial interval — in the graphs in figures 5.5, 5.6 and 5.7, the. average range of convergence CHAPTER 5. RESULTS 73 Figure 5.9: Image 7 - Overcast CHAPTER 5. RESULTS Figure 5.10: Image 8 - Bright Sunlight CHAPTER 5. RESULTS 75 Image Processing Times ( C P U sees) Image Size Sun3/60 Sun4/260 1 510 x 343 236.10 82.51 2 510 X 339 235.20 81.19 3 510 X 338 231.18 81.18 4 510 x 340 235.50 81.59 5 510 x 339 236.17 81.79 6 510 x 339 233.39 81.11 7 510 x 340 230.36 81.75 8 510 x 342 240.00 82.92 Average Image Processing Time(CPU sees) 234.74 81.75 Table 5.3: Image Processing C P U times for the boom is 0.061 radians in total. As previously illustrated the parameter space for 0 2 is not smoothly sloped to the desired minimum, so the range over which it can be expected to find Emin is narrower than those for 03 and t94. Convergence is also affected to a lesser degree by the filter sampling interval for the boom, and the orientation tolerence since all three affect the slope and smoothness of parameter space. Low orientation tolerences are generally preferable all around since the less noise, the fewer local minima and the wider the range of convergence. 5 . 3 . 4 S p e e d The average times for the serial implementations of the various activities in the vision system are listed in tables 5.3 and 5.4. The boom, filter, which is the most costly to compute, needs only to be computed once for each camera model. If we can assume the camera will require infrequent adjustment, the boom filter can be precomputed, and not included in the time required to process each image. The entire process in the current implemenatation takes 288.42 C P U sees on a Sun3/60 with math coprocessor chip, or 92.16 sees on a Sun4/260, per image. Fortunately by far the largest part of this time is taken up by highly parallelisable CHAPTER 5. RESULTS 76 Average Chamfer Times Component Sun3/60 Sun4/260 boom 11.96 2.04 stick 6.58 1.23 bucket 6.11 1.12 Average Filter Computation Time boom 80.06 17.60 stick 14.91 3.01 bucket 11.87 2.64 Average Search Time (chamfering, stick and bucket filters) 53.68 10.41 Table 5.4: Average C P U times image processing tasks. V 2 G , VGX and V G y are applied independently to the neigh-bourhood around each pixel, and given parallel hardware such as a Datacube, can be computed at frame rates. Computing filters, even excluding the boom, is time consuming: 26.78 sees on the Sun3/60 and 5.65 on the Sun4/260. As mentioned in chapter 4, multiple processors would allow each to compute the filter ranges for a very small range of 9. Better still methods of interpolating between precomputed filters would eliminate the computation altogether. As for the chamfer matching process itself, chamfering requires about 24.65 sees on the Sun3 and 4.39 on the Sun4/260. These times can be reduced by parallel implementation so that for N the window dimension, the algorithm complexity is reduced from the serial complexity of A^2 as outlined in section 4.4.1. The parallel complexity is the greater of the maximum distance of any edgepoint to a neighbouring window limit or half the maximum distance between neighbouring edgepoints. Of course the maximum distance for any two such points will be N in the worst case. Since there is still considerable noise in the filtered edge image the number of iterations will in general be far less than N. CHAPTER 5. RESULTS 77 The optimisation step could be speeded up by computing E for each model line, or even segments of model lines in parallel. At frame rates the arm will not have moved far between images, and the search interval can be made small, which may also improve accuracy. In general, if all avenues for parallelising these techniques are exploited, near real time operation seems within reach. Software exists in other labs to find the zero crossings of V 2 G at video frame rates. For the optimisation algorithm assume N is the window size, where N < I the image dimension, R is the range of 0;, S is the sampling rate, M is the number of iterations required to converge to the true 6 value and L is the number of model feature points. Chamfering the component window will require A^ 2 time in serial, N in parallel. Computing the error measure is linear in model feature points L. Computing a filter requires projecting the model, also linear in L, into the filter ^ times. Thus the whole optimisation process requires time in the order of 3N2 + 2Lj + ZML for a given image. It can be argued here that N and L are about the same order of magnitude, and M is much smaller, whereas R and S are more or less constant, so that overall complexity is N2 in serial. Our results have illustrated however that the constant is huge and cannot in reality be neglected. If we exploit parallel techniques using P processors the complexity should be reduced to 3N + 2Ljr§ + 3Mj;. Again the theoretical argument can be made that the overall parallel complexity is A''. Using our experimental results then worst case chamfer time in parallel would be 3 x ^11.96 = 10.37 C P U sees although chamfer times should be some-what lower, since the parallel algorithm is one pass rather than two. Filter computation time would be divided by the number of processors available, for example with 5 proces-sors 2 x = 5.96 sees, similarly the time to perform the basic optimisation operations would become ^ j p = 0.45 sees. The average search time would thus have been reduced from 53.68 C P U sees as recorded in table 5.4 to about 17 sees. This figure of course CHAPTER 5. RESULTS 78 would be reduced by having more than 5 processors, in fact filter and optimisation times could be reduced to very small values if a sufficiently large number of processors were chosen. Frame rates may be achieveable given clever coding and sufficiently powerful hardware. Chapter 6 Conclusions and Future Work In the preceding chapters a method for joint angle sensing using passive model based vi-sion techniques has been described. Edge based orientation filtering and chamfer match-ing offer a simple, efficient method for determining the joint angles of an excavator arm. The goal of the Caterpillar vision system in the context of teleoperation is to achieve consistent, accurate, real time results. The statistics presented in Chapter 5 for the pro-totype system are quite encouraging. Relatively high accuracy and ranges of convergence have been achieved given the quality of measurements and data available to the system. Arguments have also been made that the techniques used are highly parallelisable, and that near real time processing rates can be achieved, given the necessary hardware. In this project we have focussed on a Caterpillar arm, but obviously these techniques could be extended to track many kinds of manipulators. A particularly interesting ap-plication would be manipulators in the space station [12]. The space station is a highly structured environment with harsh lighting which sometimes obscures special markings and bar codes used for visual recognition. Techniques such as those proposed here might be very useful in this situation. The main manipulators are very long and must handle huge payloads while being light enough themselves to transport into space, they are therefore quite flexible. Our matching techniques could be extended to allow for flex-79 CHAPTER 6. CONCLUSIONS AND FUTURE WORK 80 ing of a sensed manipulator, and thus produce more accurate positioning information than many conventional methods. For flexible structures the joint angles that would be obtained from joint angle encoders for example, do not determine the position of the endpoint. If our techniques located the flexed arm, the amount of flexing and the endpoint position would also be known. Another area for future work on the excavator itself is joint angle sensing for the excavator cab rotation 6\. This will require analysis of motion and interactions between arm and cab motion. Other desirable features in a full blown Caterpillar vision system would be recognition of common objects such as logs, trucks or people for avoiding collisions and improved manipulation. Bibliography [l] Gerald J . Agin. Calibration and Use of a Light Stripe Range Sensor Mounted on the Hand of a Robot. In IEEE 1985 International Conference on Robotics and Automation, pages 680-685, IEEE, St. Louis, Missouri, March 1985. [2] Gerald J . Agin. Computer Vision Systems for Industrial Inspection and Assembly. IEEE Computer, 13:11-20, May 1980. [3] Carlo Arcelli, Luigi P. Cordelia, and Stefano Levialdi. From Local Maxima to Con-nected Skeletons. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-3(2):134-143, March 1981. [4] H.S. Baird and M . Lurie. Precise robotic assembly using vision in the hand. In Alan Pugh, editor, Robot Sensors, Volume 1 : Vision, chapter 2.3, pages 85-94, Springer-Verlag, Berlin, 1986. [5] H .G. Barrow, J . M . Tenenbaum, R .C. Bolles, and H.C. Wolf. Parametric Corre-spondence and Chamfer Matching : Two New Techniques for Image Matching. In Proceedings of the 5th Anual International Joint Conference on Aritificial Intelli-gence, pages 659-663, IJCAI, August 1977. [6] Thomas O. Binford. Survey of Model-Based Image Analysis Systems. The Interna-tional Journal of Robotics Research, 1(1):18—64, Spring 1982. [7] Harry Blum and Roger N . Nagel. Shape Description Using Weighted Symmetric Axis Features. Pattern Recognition, 10:167-180, 1978. [8] Michael Brady. Artificial Intelligence and Robotics. Technical Report 756, MIT Artificial Intelligence Laboratory, Cambridge,Mass., February 1984. [9] Michael Brady. Parallelism in Vision. Artificial Intelligence, 21:271-283, 1983. [10] R . A . Brooks. Symbolic Reasoning Among 3-D Models and 2-D Images. Artificial Intelligence, 17:285-348, 1981. 81 BIBLIOGRAPHY 82 [11] John Francis Canny. Finding Edges and Lines in Images. Technical Report 720, MIT Artificial Intelligence Laboratory, Cambridge, Mass., June 1983. [12] Karl H . Doetsch and Richard C Hughes. Space Station- An Application for Com-puter Vision. In Vision Interface '88, page supplemental, Canadian Image Process-ing and Pattern Recognition Society, Edmonton Alta. , June 1988. [13] Richard O. Duda and Peter E. Hart. Use of the Hough Transformation To Detect Lines and Curves in Pictures. Communications of the ACM, 15:11-15, January 1972. [14] H.I. El-Zorkany, R. Liscano, and B . Tondu. A Sensor Based Approach for Robot Programming. In David P. Casasent, editor, Intelligent Robots and Computer Vision (Fourth in Series), pages 289-297, SPIE, 1985. [15] L . J . Everett, M . Driels, and B.W. Mooring. Kinematic Modelling for Robot Cal-ibration. In IEEE 1987 International Conference on Robotics and Automation, pages 183-189, IEEE, Raleigh, North Carolina, March 1987. [16] O.D. Faugeras. New Steps Toward a Flexible 3-D Vision System for Robotics. In H . Hanafusa and H. Inoue, editors, Robotics Research 2, pages 25-33, Kyoto, Japan, August 1984. [17] J.D. Foley and A . Van Dam. Fundamentals of Interactive Computer Graphics. Addison-Wesley Publishing Co., Reading, Mass, first edition, 1982. [18] George E. Forsythe, Michael A . Malcolm, and Cleve B . Moler. Computer Methods for Mathematical Computations. Prentice-Hall Series In Automatic Computation, Prentice-Hall, Inc., Englewood Cliffs, New Jersey, first edition, 1977. [19] D. Gauthier, M . D . Levine, A.S. Malowany, N . Begnoche, and G. Lefebvre. Mea-suring the Alignment Accuracy of Surface Mount Assembly Circuit Board Masks. In Proceedings Vision Interface '88, pages 1-6, Canadian Image Processing and Pattern Recognition Society, Edmonton, Alta., June 1988. [20] C. Goad. Special Purpose Automatic Programming for 3D Model-Based Vision. In Proceedings ARPA Image Understanding Workshop, pages 94-104, A R P A , Arling-ton,Virginia, 1983. [21] W.E .L . Grimson. Recognition of Object Families Using Parameterized Models. IEEE, 3:93-101, March 1987. BIBLIOGRAPHY 83 Samad A. Hayati. Robot Arm Geometric Link Parameter Estimation. In 22nd IEEE Conference on Decision and Control, pages 1477-1483, IEEE Control Systems Society, San Antonio, Texas, December 1983. Ellen C. Hildreth and John M . Hollerbach. Artificial intelligence: computational approach to vision and motor control. In Stephen R. Geiger and Fred Plum, editors, Handbook of Physiology, Section 1 : The Nervous System, chapter 15, pages 605-642, American Physiology Society, Bethesda, Maryland, 1987. John M . Hollerbach. Annotated Bibliography for Kinematic Calibration. July 1987. Submitted for publication - Robotics Review. John M . Hollerbach and David J . Bennett. Automatic Kinematic Calibration Using a Motion Tracking System. August 1987. Submitted for publication - 4th Interna-tional Symposium on Robotics Research. B. K . P . Horn. Robot Vision. The MIT Electrical Engineering and Computer Science Series, MIT Press, Cambridge, Mass., first edition, 1986. C. Loughlin and J . Morris. Line, edge and contour following with eye-in-hand vision system. In Alan Pugh, editor, Robot Sensors, Volume 1 : Vision, chapter 2.4, pages 95-102, Springer-Verlag, Berlin, 1986. D. G. Lowe. Three-Dimensional Object Recognition from Single Two Dimensional Images. Artificial Intelligence, 31:355-395, 1987. A . K . Mackworth. How to See a Simple World : An Exegesis of Some Computer Programs for Scene Analysis. In E .W. Elcock and Donald Michie, editors, Machine Intelligence 8, chapter 25, pages 510-537, Ellis Horwood Ltd. , Chichester, 1977. David Marr. Vision. W . H . Freeman and Co., New York, N . Y . , first edition, 1982. Amitabha Mukerjee and Dana H. Ballard. Self Calibration in Robot Manipulators. In IEEE International Conference on Robotics and Automation, pages 1050-1057, IEEE, St. Louis, Missouri, March 1985. William M . Newman and Robert F. Sproull. Principles of Interactive Computer Graphics. Computer Science Series, McGraw-Hill Book Co., New York, N . Y . , sec-ond edition, 1979. Richard P. Paul. Robot Manipulators: Mathematics, Programming, and Control. The MIT Press Series in Artificial Intelligence, The MIT Press, Cambridge, Mass., first edition, 1981. BIBLIOGRAPHY 84 [34] A . Pugh. Robot sensors - a personal view. In Alan Pugh, editor, Robot Sensors, Volume 1 : Vision, chapter 1.1, pages 3-14, Springer-Verlag, Berlin, 1986. [35] G. V . Puskorius and L. A . Feldkamp. Global Calibration of a Robot/Vision System. In IEEE International Conference on Robotics and Automation, pages 190-195, IEEE, Raleigh, N.C. , March 1987. [36] L . G . Roberts. Machine Perception of Three Dimensional Solids. In J.T. Tippett et al., editor, Optical and Electro-optical Information Processing, pages 159-197, MIT Press, Cambridge,Mass., 1965. [37] Wallace S. Rutkowski, Ronald Benton, and Ernest W. Kent. Model Driven Deter-mination of Object Pose for a Visually Servoed Robot. In IEEE 1987 Internaional Conference on Robotics and Automation, pages 1419-1428, IEEE , Raleigh, N .C. , March 1987. [38] Keneth D. Taylor, Francois M . Mottier, Dan W. Simmons, William Cohen, Ray-mond Pavlak, Donald P. Cornell, and G. Blair Hankens. A n Automated Mo-tion Measurement System for Clinical Gait Analysis. Journal of Biomechanics, 15(7):505-516, 1982. Appendix A Images 85 APPENDIX A. IMAGES A . l I m a g e 1 APPENDIX A. IMAGES A . 2 I m a g e 2 APPENDIX A. IMAGES A . 4 I m a g e 4 APPENDIX A. IMAGES A . 6 I m a g e 6 APPENDIX A. IMAGES A . 7 I m a g e 7 APPENDIX A. IMAGES A . 8 I m a g e 8 Appendix B Code B . l A r m M o d e l f l o a t model[3][6][3] = /* boom */ { -18.25, 32.4, -163.0, -18.25, 13.5, -343.0, -18.25, -13.2, -153.0, -18.25, -11.5, -343.0, 18.25, -12.5, -230.0, 18.25, -11.5, -343.0, /* stick */ -12.75, 40.0, -14.0, -12.75, 15.0, -139.0, -12.75, -7.0, -14.0, -12.75, -7.0, -160.0, 12.75, -7.0, -14.0, 12.75, -7.0, -160.0, /* bucket */ 39.5, 5.3, -9.64, -39.5, 5.3, -9.64, 39.5, 5.3, -9.64, 40.0, 11.08, -90.43, -39.5, 5.3, -9.64, -40.0, 11.08, -90.43 B . 2 C o m p u t e O r i e n t a t i o n F i l t e r setup_orientation_map ( mod_idx ) int mod_idx; { int i , j , imax, imin; projection proj; f l o a t denom, val; 94 APPENDIX B. CODE set_o = TRUE; i f (mod_idx == boom) { imin = ( i n t ) (((a_min[mod_idx]/pi)*180.0 - 1. 0 ) * ( f l o a t ) b f s c a l e ) ; imax = ( i n t ) (((a_max[mod_idx]/pi)*180.0 + 1. 0 ) * ( f l o a t ) b f s c a l e ) ; /* because of distance from pivot to image pos, imcrement f o r boom must be about 1/40 deg. */ } else { imin = (int)(((a_min[mod_idx]/pi)*180.0 - 1.0 ) * ( f l o a t ) o s c a l e ) ; imax = (int)(((a_max[mod_idx]/pi)*180.0 + 1.0)*(float)oscale); } /* reset array */ fo r ( i = 0 ; i < map_rdim;i++) •C f o r ( j = 0;j < map_cdim; j++) { orient_map[mod_idx-7][i][j][0] = o _ i n i t ; orient_map[mod_idx-7][i][j][1] = - o _ i n i t ; } }/* end outer f o r */ /* reset window l i m i t s */ wind[mod.idx-7][0] = 0; wind[mod_idx-7][l] = map_cdim; wind[mod.idx-7][2] = 0; f o r (i=imin ; i <= imax; i++) { /* run through the en t i r e range of v a l i d angles f o r the current component */ i f (mod.idx == boom) va l = ( ( f l o a t ) i / (18 0 . 0 * ( f l o a t ) b f s c a l e ) ) * p i ; APPENDIX B. CODE else v a l = ( ( f l o a t ) i / ( 1 8 0 . 0 * ( f l o a t ) o s c a l e ) ) * p i ; proj ect_model_points(mod_idx,val,proj); /* get projected model points */ f o r (j = 0; j<5; j+=2) { i f (fabs(denom = proj [j] [0]-proj [j+1] [0] ) > 0.001) { li n e _ o = ( f l o a t ) atan((double) (proj [j] [ l ] - p r o j [j + 1] [l])/(double)denom) ; /* compute angle of l i n e to x axis */ } /* end i f */ else •C l i n e _ o = h a l f p i ; /* slope i s i n f i n i t e , angle i s pi/2 */ } mark_line(proj [j] [0] ,proj [j] [1] , proj [j+1] [0] ,proj [j+1] [1] ,proj array) ; } /* end f o r */ } /* end f o r */ set.o = FALSE; } /* end setup o r i e n t a t i o n map */ mark_orientation (row, c o l , cur_o) i n t row, c o l ; f l o a t cur_o; { i n t real_row, r e a l _ c o l , indx; real_row = r o f f s e t - row; r e a l . c o l = coffset + c o l ; indx = parm - 7; APPENDIX B. CODE 97 i f (real.row > wind[indx][0]) { wind[indx][0] = real_row; } else { i f (real_col<wind[indx][1]) { wind[indx][1] = real_col; } else { i f (real.col > wind[indx][2]) •C wind[indx][2] = real_col; } } > /* maintain window dimensions */ i f ((cur_o<0.0) && (parm>boom)) cur_o = cur_o + p i ; i f (orient_map[indx][real_row][real_col][0]>cur_o) { orient_map[indx][real.row][real.col][0] = cur_o; /* i f orient map value i s less than current min for the current pixel, replace i t */ } i f (orient_map[indx][real_row][real_col][l]<cur_o) { orient_map[indx][real_row][real_col][1] = cur_o; /* i f orient map value i s less than current min for the current pixel, replace i t */ > } /* end mark orientation */ APPENDIX B. CODE B . 3 C h a m f e r f o r E d g e W i n d o w /* Procedure Chamfer - imputs an image array processed */ /* with a Laplacian of the Gaussian, outputs a */ /* "chamfered" image i n which each p i x e l contains a */ /* value roughly measuring i t s distance from the */ /* nearest edge. */ chamfer(edge_img, cham_img) ia r r a y edge_img, cham.img; { i n t row,col,rmax,rmin,cmax,cmin; f l o a t theta, thetamin, thetamax; i n t h a l f _ r e g , worig, moff = 0; i n t indx; indx = parm-7; ha l f _ r e g = map_cdim/2; /* compute h a l f number of cols f o r a l l possible positions of components worig = p i v o t [ l ] - half _ r e g ; i f ( ( r m i n = pivot[0] - top_wind)<=0) { rmin = 1; moff = top_wind - pivot[0] + 1; } rmax = wind[indx][0] + pivot[0] - top_wind + 1; if ( ( c m i n = worig + wind[indx][1])<0) cmin = 0; cmax = pivot[1] + (wind[indx][2] - half_reg) + 1; fo r (row=rmin-l;row<rmax+l;row++) { for(col=cmin-l;col<cmax+l;col++) { cham_img[row][col]= i n f i n ; } } /* i n i t i a l i s e chamfer array */ APPENDIX B. CODE f o r (row = rain; row < rmax; row ++) { f o r ( c o l = emin ; c o l < cmax; c o l ++) { thetamin = orient_map[indx][row-rmin+moff][col-worig][0] - o r i e n t . e r r ; thetamax = orient_map[indx][row-rmin+moff][col-worig][1] + o r i e n t _ e r r ; theta = ( f l o a t ) edge_img[row][col]/1000.0; i f ((parm >boom)&&(theta<0.0)) theta = theta +pi; i f ((edge_img[row][col] > -4500) && ((theta > thetamin) && (theta < thetamax))) { cham_img[row][col] = 0; /* set edge points to 0 */ > else { cham_img[row][col] = minval (cham_img[row][col], cham_img[row-l][col-1] +3, cham_img[row-l][col] +2, cham_img[row-l][col+1] +3, cham_img[row][col-1] +2) ; } /* endif != 0 */ } /* end f o r c o l */ } /* end f o r row (forward pass) */ f o r (row = rmax-1; row >= rmin; row — ) { f o r ( c ol = cmax-1; c o l >= emin; c o l — ) { APPENDIX B. CODE 100 cham_img[row][col] = minval ( cham_img[row][col], cham_img[row+l][col+1] +3, cham_img[row+l] [col] +2, cham_img[row+l][col-1] +3, cham_img[row][col+1] +2 ); } /* end f o r c o l */ } /* end f o r row (backward pass) */ }• /* end chamfer */ B . 4 C o m p u t i n g E r r o r M e a s u r e B . 4 . 1 C o m p u t e E r r o r long compute_error (alpha,model.index) f l o a t alpha; i n t model_index; { long e r r ; proj e c t i o n p r o j ; mark_count = 0; proj ect_model_points(model.index,alpha,proj); err = mark_line(proj[0][0],proj[0][1], proj[1][0],proj[1][1],projarray)+ mark_line(proj[2][0],proj proj[3][0],proj mark_line(proj[4][0],proj proj[5][0],proj [2] [1] , [3][1],projarray)+ [4] [1] , [5][1],projarray); if(mark_count == 0) err = 32000; return e r r ; /* t o t a l error f o r model component i s sum of error f o r each of three l i n e s of component */ APPENDIX B. CODE 101 y I* end compute error */ B . 4 . 2 P r o j e c t M o d e l P o i n t s parameter_setup (model_index,alpha_adj) i n t model_index; f l o a t alpha_adj; { switch (model_index) { case 0 : /* f o c a l length */ { persp_xform[0][0] = persp_xform[l][1] = persp_xform[2][2] = persp_xform[3][3] = alpha_adj; persp_xform[2][3] = -1.0; break; > case 1 : /* x pos of camera */ { cam_transl[3][0] = -alpha.adj; break; } case 2 : /* y pos of camera */ { cam.transl[3][1] = -alpha.adj; break; } case 3 : /* z pos of camera */ { cam.transl[3][2] = -alpha_adj; break; } case 4 : /* x r o t a t i o n of camera */ { setx_rotation(cam_xrot,alpha_adj); break; APPENDIX B. CODE 102 } case 5 : /* y r o t a t i o n of camera */ { sety_rotation(cam_yrot,alpha.adj); break; } case 6 : /* z r o t a t i o n of camera */ { setz_rotation(cam_zrot,alpha_adj); break; } case 7 : /* boom */ { x_rot = -(gam[0] + alpha_adj + gam[l] - p i ) ; trans_x =112; trans.y = (int)(-118.ll+(float)sin(alpha_adj+gam[0])*200.0); trans.z = (int)(68.58-(fIoat)cos(alpha_adj+gam[0])*200 . 0 ) ; break; } case 8 : /* s t i c k */ { x_rot = (p i - alpha_adj - alpha[boom.ang] + gam[2]); trans.x = 112; trans_y = (int)(-118.ll+(float)sin(alpha[boom_ang])*520.0); trans_z = (int)(68.58-(float)cos(alpha[boom_ang])*520.0); break; } case 9 : /* bucket */ { x_rot = (pi - alpha[stick.ang] - alpha[boom_ang] + gam[2]); trans_x = 112; trans_y = (int)(-118.ll+(float)sin(alpha[boom_ang])*520.0 - ( f l o a t ) sin(x_rot)*180.0); trans_z = (int)(68.58 - ((float)cos(alpha[boom_ang])*520.0 + ( f l o a t ) cos(x_rot)*180.0)); x_rot = x_rot + (p i - alpha.adj); break; } } /* end switch */ APPENDIX B. CODE 103 } /* end parameter setup */ proj ect_model_points(model_index,alpha_adj,proj) i n t model_index; proj e c t i o n p r o j ; f l o a t alpha_adj; { i n t i ; f l o a t xform[4][4] ; coord point,xpoint; parameter_setup(model.index,alpha_adj); /* i n i t i a l i s e f o r new estimate alpha adj */ setx_rotation(mod_xform,x_rot); mod_xform[3][0] = trans_x; mod_xform[3][1] = trans_y; mod_xform[3][2] = trans.z; /* set up t r a n s l a t i o n / r o t a t i o n matrix f o r model */ update.xform(mod_xform,cam_xform,xform); /* apply o v e r a l l camera t r a n s l a t i o n to s p e c i f i c model transform */ f o r (i=0;i<6;i++) { point[0] = model[model_index-7][i][0]; point[1] = model[model_index-7][i][1]; point[2] = model[model_index-7][i][2]; point[3] = 1.0; apply.xform(point,xform,xpoint); p r o j [ i ] [ 0 ] = ( i n t ) ((xpoint[0]/xpoint[3])*xscale); p r o j [ i ] [ 1 ] = ( i n t ) ( ( x p o i n t [ l ] / x p o i n t [ 3 ] ) * y s c a l e ) ; /* transform each model point and project i t into 2-D */ APPENDIX B. CODE 104 } /* end f o r */ y /* end project model points */ B . 4 . 3 M a r k L i n e Mark_line scan converts a l i n e using a method proposed by Bresenham. */ long mark_line (XI,Yl,X2,Y2, projection) long X1,Y1,X2,Y2; i a r r a y projection; { i n t dx, dy, prox, endline, i n c l , inc2, max.rows, max.cols, rmax, cmax, u n i t i n c , xcond, indinc, indcond; long e rr; u n i t i n c = 1; err = 0; xcond = FALSE; max_rows = or i g i n y ; max_cols - o r i g i n x ; /* o r i g i n i s set at middle of window thus value o r i g i n x = max cols etc*/ . dx = ABS(X2 - XI); dy = ABS(Y2 - Y l ) ; /* o v e r a l l change i n x and y */ APPENDIX B. CODE 105 i f ((dx==0)I I((dx>0)&&(((float)dy/(float)dx)>1.0))) { xcond = TRUE; /* the slope of the l i n e i s greater than one therefore x becomes the c o n d i t i o n a l l y incremented index */ prox = ((dx) << 1) - dy ; /* compute i n i t i a l proximity of f(Yl+1) to x + 1 vs y */ i n c l = dx << 1; inc2 = (dx-dy)«l; /* compute proximity increments */ i f ( Yl > Y2) { indinc = Y2; indcond = X2; endline = Y l ; i f (X2 > XI) u n i t i n c = -1; } else { indinc = Y l ; indcond = XI; endline = Y2; i f (XI > X2) u n i t i n c = -1; row = or i g i n y - indinc; c o l = indcond + ori g i n x ; i f ((ABS(indinc)<max_rows)&&(ABS(indcond)<max co l s ) ) { APPENDIX B. CODE 106 p r o j e c t i o n [row][col] = l i n e ; err = cham [row][col]; mark_count++; i f (set_o) { mark_orientation (indinc,indcond,line_o); /* creating o r i e n t a t i o n map */ } /* end i f set_o */ } else { err = offscreen_err; } /* mark leftmost endpoint i n image array */ > else { /* slope i s less than 1, x i s incremented index */ prox = ((dy) « 1) - dx ; /* compute i n i t i a l proximity of f(Xl+1) to y + 1 vs y */ i n c l = dy << 1; inc2 = (dy-dx)«l; /* compute proximity increments */ i f (XI > X2) indinc = X2; indcond = Y2; endline = XI; i f (Y2 > YI) u n i t i n c = -1; } else { indinc = XI; indcond = YI; APPENDIX B. CODE 107 endline = X2; i f ( Yl > Y2) u n i t i n c = - 1 ; } rbw = o r i g i n y - indcond; c o l = o r i g i n x + indinc; i f ((ABS(indinc)<max_cols)&&(ABS(indcond)<max_rows)) { pro j e c t i o n [row][col] = l i n e ; e r r = cham [row][col]; mark_count++; i f (set_o) •C mark_orientation (indcond,indinc,line_o); } /* end i f set_o */ } else •c err = offscreen.err; } /* mark leftmost endpoint i n image array */ } /* end else - y i s the conditional increment */ while ((indinc < endline)) -C indinc++; i f (prox < 0) { prox = prox + i n c l ; /* increase i n y f o r unit increase i n x i s less than h a l f */ } else { indcond = indcond + u n i t i n c ; prox = prox + inc2; APPENDIX B. CODE } i f (xcond) { row = o r i g i n y - indinc; c o l = indcond + ori g i n x ; i f ((ABS(indinc)<max_rows)&&(ABS(indcond)<max_cols)) { projection[row][col] = l i n e ; err = err + cham[row][col] ; mark_count++; i f (set_o) { mark_orientation (indinc,indcond,line_o); } /* end i f set_o */ /* mark l i n e at current indinc, indcond */ } else { err = err + offscreen.err; /* y incremented */ i f ( ( ( c o l > xmax) && (unitinc>0)) I| (row<0) I I ( ( c o l < 0) && (unitinc < 0)) I I (endline < -max_rows)) { return e r r ; } } } else { row = or i g i n y - indcond; c o l = o r i g i n x + indinc; i f ((ABS(indinc)<max_cols)&&(ABS(indcond)<max_rows)) { projection[row][col] = l i n e ; e r r = err + cham [row][col] ; mark_count++; APPENDIX B. CODE 109 i f (set_o) { mark_orientation (indcond,indinc,line_o); } /* end i f set_o */ } else { err = err + offscreen_err; i f ( ( c o l > xmax) I I (endline < -max_cols) I| ((row > ymax) && (unitinc < 0)) II ((row < 0) && (unitinc > 0))) return err; } /* mark li n e at current indinc, indcond */ } }/* end w h i l e * / return err; } /* end mark li n e */ B . 5 Optimisation grad(alphas) /*************************^ flo a t alphas [] ; { int camera_reset, first_pass = TRUE, min_limit=FALSE, max_limit=FALSE; double left,mid,right, tot_err = (double) 0.0; floa t inc, cur_alpha; long miderr, r t e r r , l f t e r r ; parm = angle_parm; cur_alpha = alphas[parm]; mark.count = 0; camera.reset = FALSE; APPENDIX B. CODE 110 set_offset(parra); /* set window l i m i t s f o r current component parm */ chamtime = clockO ; chamfer(zsedge, cham); chamtime = clockO - chamtime; chamtime /= 10000; f p r i n t f (stderr, "CPU sees f o r chamfer °/,d i s % f \ n " , parm,(float)chamtime/100.0); /* f o r new model component chamfer an appropriate region of the image, and time operation */ while (parirKlastparm) { l e f t = mid = r i g h t = (double)0.0; inc = i n i t i a l . i n c ; /* i n i t i a l i s e */ i f (cur_alpha>a_max[parm]) { cur_alpha = a_max[parm]; mid = (dpuble)compute_error(cur_alpha,parm); /* i f alpha exceeds maximum f o r parameter, set alpha to max */ } else { i f (cur_alpha<a_min[parm]) { cur_alpha = a_min[parm]; mid = (double)compute_error(cur_alpha,parm); /* i f alpha i s l e s s than minimum f o r parameter set to min */ } else { mid = (double) compute_error(cur_alpha,parm); /* compute error measure f o r APPENDIX B. CODE current estimate f o r alpha */ i f (mark_count==0) { i f (fabs(a_max[parm]-cur.alpha) < fabs(a_min[parm]-cur_alpha)) { while (mark_count==0) { cur_alpha = cur_alpha - i n c ; mid = (double) compute_error(cur_alpha,parm); mid = mid/(double)mark.count; } } } else mid = mid/(double)mark_count; /* set error to average error per marked p i x e l */ while (inc>mininc) { i f (alpha_max = (cur_alpha+inc > a_max[parm])) r i g h t = (double)compute_error(a_max[parm],parm); else r i g h t = (double)compute_error(cur_alpha+inc,parm); r i g h t = r i g h t / (double)mark_count; /* compute average error per p i x e l f o r cur_alpha + inc */ i f (alpha_min = (cur_alpha-inc<a_min[parm])) l e f t = (double)compute_error(a_min[parm],parm); else l e f t = (double)compute_error(cur_alpha-inc,parm); l e f t = left/(double)mark.count ; /* compute average error f o r cur_alpha - inc */ if((right<mid)&&(right<left)) APPENDIX B. CODE 112 i f ( f i r s t . p a s s ) { while ((mid>right)&&(!max_limit)) { l e f t = mid; mid = r i g h t ; /* s h i f t error values of alpha i n t e r v a l */ i f (alpha.max) { max_limit = TRUE; cur_alpha = a_max[parm]; /* alpha at max value -cannot increment further */ } else •c cur_alpha = cur_alpha + inc ; i f (alpha_max = (cur_alpha + inc > a_max[parm])) r i g h t = (double) compute_error(a_max[parm],parm); else r i g h t = (double) comput e_ erro r(cur_alpha+i nc,parm); r i g h t = r i g h t / (double) mark_count; } } /* end inner while 1 */ } else { cur.alpha = cur_alpha + inc ; mid = r i g h t ; /* a f t e r f i r s t pass E f o r i n t e r v a l endpoints has already been computed */ } } /* end outer i f */ if((left<mid)&&(left<right)) APPENDIX B. CODE 113 { i f ( f i r s t . p a s s ) { while ((mid>left)&&(!min_limit)) { /* lowest average error f o r cur_alpha - inc */ r i g h t = mid; mid = l e f t ; /* s h i f t error values of alpha i n t e r v a l */ i f (alpha_min) { min.limit = TRUE; cur_alpha = a_min[parm]; /* alpha value at minimum cannot decrement further */ } else •c cur_alpha = cur_alpha - inc ; i f (alpha_min = (cur_alpha-inc<a_min[parm])) l e f t = (double) compute_error(a_min[parm] ,parm); else l e f t = (double) compute_error(cur_alpha-inc,parm); l e f t = l e f t / (double)mark_count; /* compute E f o r new cur.alpha -inc */ } } /* end inner while 2 */ } else { cur_alpha = cur_alpha - inc ; mid = l e f t ; /* a f t e r f i r s t pass E f o r i n t e r v a l endpoints has already-been computed */ } APPENDIX B. CODE 114 } /* end outer i f */ alpha_min = alpha_max = max_limit = min_limit = FALSE; f i r s t . p a s s = FALSE; inc = i n c / 2 ; /* reduce search step */ } /* end while (inc>mininc)*/ alphas[parm] = cur_alpha; /* reset stored value f o r current parameter to newly computed cur.alpha */ /* NEXT PARAMETER */ parm ++; i f (parm<lastparm) { f i r s t _ p a s s = TRUE; set_offset(parm); f i l t i m e = c l o c k ( ) ; setup_orientation_map (parm); f i l t i m e = clock() - f i l t i m e ; f i l t i m e /= 10000; f p r i n t f ( s t d e r r , "CPU sees f o r computing °/,d f i l t e r i s °/,f\n", parm,(float)filtime/100.0); /* compute o r i e n t a t i o n f i l t e r s f o r s t i c k and bucket */ chamtime = c l o c k ( ) ; chamfer(zsedge, cham); chamtime = clock() - chamtime; chamtime /= 10000; f p r i n t f ( s t d e r r , " e l a p s e d chamfer time f o r %d i s % f \ n " , parm,(float)chamtime/100.0); /* f o r each new model component chamfer an appropriate region of the image */ APPENDIX B. CODE 115 t o t _ e r r = t o t _ e r r + mid; > /* match was good enough now match using next parameter */ alpha_min = alpha_max = max_limit = min_limit = FALSE; cur_alpha = alphas[parm] ; copy_mat(ident,mod_xform); /* i n i t i a l i s e transform matrix to i d e n t i t y matrix */ } /* end while (parm<lastparm) */ f p r i n t f ( s t d e r r / ' t o t a l error = °/,lf \n" , (tot_err+mid)/3.0) ; > /* end grad */ 


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items