Inertial Navigation System Assisted Visual Localization by Dennis Krys, BEng, University Of Victoria, 2006 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The College of Graduate Studies (Electrical Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Okanagan) March 2011 c© Dennis Krys, 2011 Abstract With recent advancements in Global Positioning Systems (GPS), localization systems are now typically equipped with a GPS. However, in a large variety of environments and real-world applications, GPS-based localization systems are not practical. This research tackles such a problem and illustrates the idea of fusing a camera and an inertial naviga- tion system (INS) to create a dead reckoning localization system. The original purpose of the localization system is for a pipe inspection robot, but the proposed concepts can be readily applied to any environment where there is a wall in close proximity. The proposed sensor system can determine motions with up to six degrees of freedom using a camera and an INS. The system must assume a geometry for the wall, such as a flat surface, a wall in a hallway, or the round surface of the inside of a pipe. If the geometry of the wall is unknown then another sensor, such as a laser rangefinder, can be added to measure the range and estimate the overall shape of the wall. The localization system uses a combination of optical flow and image registration to obtain information about six degrees of freedom from a wall with little to no features. The INS provides an estimated motion for the camera system. The estimation of the motion is used by the optical flow algorithm to reduce the computational load significantly and by the image registration to decrease the likelihood of the algorithm diverging. The system is validated using numerical simulation and experiments. The experiments were conducted on a test platform constructed specifically in this research project to simulate the motion of a free-swimming robot inside a pipe. The simulator uses accurate ii position sensors to measure the exact location of the proposed localization system and compare it with the results obtained from the latter. Both the numerical simulation results and the results from the simulator are in agreement with reading of the localization system developed in this research. iii Preface Parts of the research contained in this thesis was contained in three conference papers [1 3]. In all three papers, I gathered all the data and did all the processing and analysis. The first paper [1] mainly concentrated on characterizing the simulator, which is described in Section 4.1. It also described the performance of the camera, an INS and a laser ranger finder for localization purposes. It is noted that much later in my work the laser range finder would be removed, but at this point it was being used.. The second paper [2]concentrated on data fusion with the camera, INS and laser range finder. This paper focused on the use of the INS for block matching, see Section 3.2.1. All results in this paper were for 2D localization only. The third paper [3] concentrated on expanding what was learned in 2D experiments and enhancing it into 3D. By this time only the INS and camera were being used for navigation. Concepts like Lucas-Kanade algorithm, see Section 1.4.5 and 3.2.2, and image registration, see Section 3.3, were incorporated into the localization algorithm. iv Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
1.1 Motivations
1.2 Objectives
1.3 Methodology
1.4 Literature Review
1.4.1 Localization Problem
1.4.2 Sensor Fusion through Gaussian Filters
1.4.3 Dead Reckoning
1.4.4 Image Mosaicing and Optical flow
1.4.5 Lucas-Kanade
1.5 Contributions
2 Robot Model and Camera Model
2.1 Robot Modeling
2.2 Camera Modeling
3 Localization Based on Fusion of the INS and Vision System
3.1 Camera Model
3.2 Feature Matching
3.2.1 Block Find
3.2.2 Lucas-Kanade
3.2.3 Selecting Feature Points
3.3 Image Registration
3.4 EKF Technique
4 Experimental Setup and Results
4.1 Simulator
4.2 Sensors
4.3 Result Categories
4.4 Simulated Motions
4.5 Experiment Parameters
4.6 Computer Generated Results
4.7 Results from Real Experiments
5 Conclusions and Future Work
Bibliography

List of Tables
1.1 Kalman Filter
1.2 Extended Kalman Filter
1.3 Unscented Kalman Filter Prediction Step
1.4 Unscented Kalman Filter Update Step
1.5 Lucas-Kanade
1.6 Computation Cost
1.7 Limitations on Algorithms
4.1 Image Registration Results with Motion Prediction and No Noise
4.2 Image Registration Results with No Motion Prediction and No Noise
4.3 Image Registration Results with Motion Prediction
4.4 Image Registration Results with No Prediction
4.5 Frame Skip Number to Time Conversion
4.6 Computer Generated Results Summary
4.7 Simulator Results Summary

List of Figures
1.1 Gavia
1.2 Optical Flow Image
2.1 Underwater Robot Coordinate Frame
2.2 Unicycle
2.3 Convert Coordinate Frames
2.4 Pin hole Camera Model
2.5 Non-inverted Camera Model
3.1 Coordinate Frame Conversion
3.2 Block Matching
3.3 Image Registration
4.1 Schematic of the Simulator
4.2 Picture of the Simulator
4.3 INS Stationary Characteristics
4.4 Image Examples
4.5 Image Registration Results with Motion Prediction and No Noise (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)
4.6 Image Registration Results with No Motion Prediction and No Noise (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)
4.7 Image Registration Results with Motion Prediction (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)
4.8 Image Registration Results with No Prediction (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)
4.9 Simulated X Distance, Case 1, Frame Skip 1
4.10 Simulated X Velocity, Case 1, Frame Skip 1
4.11 Simulated Y Distance, Case 1, Frame Skip 1
4.12 Simulated Y Velocity, Case 1, Frame Skip 1
4.13 Simulated Z Distance, Case 1, Frame Skip 1
4.14 Simulated Z Velocity, Case 1, Frame Skip 1
4.15 Simulated Pitch, Case 1, Frame Skip 1
4.16 Simulated Roll, Case 2, Frame Skip 10
4.17 Simulated Yaw, Case 1, Frame Skip 1
4.18 Simulated X Distance, Case 1, Frame Skip 20
4.19 Simulated X Velocity, Case 1, Frame Skip 20
4.20 Simulated Y Distance, Case 1, Frame Skip 20
4.21 Simulated Y Velocity, Case 1, Frame Skip 20
4.22 Simulated Z Distance, Case 1, Frame Skip 20
4.23 Simulated Z Velocity, Case 1, Frame Skip 20
4.24 Pitch, Case 1, Frame Skip 20
4.25 Simulated Roll, Case 1, Frame Skip 20
4.26 Simulated Yaw, Case 1, Frame Skip 20
4.27 Simulated X Distance, Case 3, Frame Skip 1
4.28 Simulated X Velocity, Case 3, Frame Skip 1
4.29 Simulated Y Distance, Case 3, Frame Skip 1
4.30 Simulated Y Velocity, Case 3, Frame Skip 1
4.31 Simulated Z Distance, Case 3, Frame Skip 1
4.32 Simulated Z Velocity, Case 3, Frame Skip 1
4.33 Simulated Pitch, Case 3, Frame Skip 1
4.34 Simulated Roll, Case 3, Frame Skip 10
4.35 Simulated Yaw, Case 3, Frame Skip 1
4.36 Simulated X Distance, Case 3, Frame Skip 20
4.37 Simulated X Velocity, Case 3, Frame Skip 20
4.38 Simulated Y Distance, Case 3, Frame Skip 20
4.39 Simulated Y Velocity, Case 3, Frame Skip 20
4.40 Simulated Z Distance, Case 3, Frame Skip 20
4.41 Simulated Z Velocity, Case 3, Frame Skip 20
4.42 Pitch, Case 3, Frame Skip 20
4.43 Simulated Roll, Case 3, Frame Skip 20
4.44 Simulated Yaw, Case 3, Frame Skip 20
4.45 Block Matching Results
4.46 Block Matching Results (Zoom)
4.47 Block Matching Results Pixel Movement
4.48 X Distance, Case 1, Frame Skip 1
4.49 X Velocity, Case 1, Frame Skip 1
4.50 Y Distance, Case 1, Frame Skip 1
4.51 Y Velocity, Case 1, Frame Skip 1
4.52 Z Distance, Case 1, Frame Skip 1
4.53 Z Velocity, Case 1, Frame Skip 1
4.54 Pitch, Case 1, Frame Skip 1
4.55 Roll, Case 1, Frame Skip 1
4.56 Yaw, Case 1, Frame Skip 1
4.57 X Distance, Case 1, Frame Skip 20
4.58 X Velocity, Case 1, Frame Skip 20
4.59 Y Distance, Case 1, Frame Skip 20
4.60 Y Velocity, Case 1, Frame Skip 20
4.61 Z Distance, Case 1, Frame Skip 20
4.62 Z Velocity, Case 1, Frame Skip 20
4.63 Pitch, Case 1, Frame Skip 20
4.64 Roll, Case 1, Frame Skip 20
4.65 Yaw, Case 1, Frame Skip 20
4.66 X Distance, Case 3, Frame Skip 1
4.67 X Velocity, Case 3, Frame Skip 1
4.68 Y Distance, Case 3, Frame Skip 1
4.69 Y Velocity, Case 3, Frame Skip 1
4.70 Z Distance, Case 3, Frame Skip 1
4.71 Z Velocity, Case 3, Frame Skip 1
4.72 X Distance, Case 3, Frame Skip 20
4.73 X Velocity, Case 3, Frame Skip 20
4.74 Y Distance, Case 3, Frame Skip 20
4.75 Y Velocity, Case 3, Frame Skip 20
4.76 Z Distance, Case 3, Frame Skip 20
4.77 Z Velocity, Case 3, Frame Skip 20

Acknowledgments

I would like to express my gratitude to the faculty and staff at UBC as well as my fellow coworkers at NRC-IRC. Without their support this endeavour would not have been possible. In particular, I would like to thank Homayoun Najjaran for his support and dedication. He guided me through this stage of my life and was always there when I needed him. He was more than an academic adviser and mentor; he was also my friend. I would also like to thank my friends for their understanding and support, for being there when I needed them, and for providing well-needed breaks and moral support. Finally, I would like to thank my brother, for helping me keep things in perspective, and my parents, who raised me and guided me. Without their unconditional love I would not be the person I am today and none of this would have been possible.

1 Introduction

1.1 Motivations

Localization has been dubbed the most fundamental problem in providing a mobile robot with autonomous capabilities [4]. This holds true for a variety of environments, including autonomous underwater vehicles, which have long been desired for inspecting large water mains and high-pressure conduits without decommissioning the water service. Besides the numerous difficulties in the mechanical design and construction of such a robot, navigation has proved to be a major challenge [5]. Currently, an autonomous underwater vehicle (AUV) is being adapted at the National Research Council (NRC) Canada to carry non-destructive testing (NDT) sensors within in-service water mains [6]. The vehicle is an open-water, torpedo-shaped, modular AUV known as the Gavia AUV [7, 8]; see Figure 1.1.

Figure 1.1: Gavia

In its original form, the localization of the robot is adequate neither for autonomous navigation inside aqueducts and water pipes, nor for positioning the defects observed by the NDT sensors. The inspection robot operates in a dynamic environment, so localization must be carried out in real time and must be able to deal with the uncertainties associated with sensor measurements. The NDT system will include a variety of sensors, including a vision system for the detection of defects in the pipes' inner walls. This makes the Gavia an ideal candidate for a vision-based localization system.
1.2 Objectives The main objective of this research is to prepare an appropriate hardware and software sensor system for the Gavia that will allow the Gavia to localize itself in real time, i.e. as it travels within an in-service water main. The proposed system must be able to ad hoc sensors that would be required for pipe inspection as well as an existing inertial navigation sensor (INS) and rangefinder. This was to include a camera system, INS and a rangefinder. The sensor system also needs to provide a sufficiently accurate estimation of the velocity and position of the robot within the pipe in real time so that the Gavia can adjust its speed and heading required by the inspection sensors i.e., typically inspection sensors need positioning and velocity regulation for the optimal performance. More precisely, fluctuations of the velocity of the robot deteriorate the reading of the NDT sensors if an increase in water flow occurs the sensor system must be able to sense the change in velocity and adjust the speed of the robot accordingly. Since the objectives require robot state estimation in real time, an emphasis will be placed on the methods that are not only accurate but computationally efficient also. 2 1.3 Methodology The research was conducted in order to provide a sensor system for a pipe inspection robot. There is a wide variety of different techniques for localization. These techniques can use a variety of different sensors which include cameras, inertial navigation systems (INS), and global positioning systems (GPS). Typically, these techniques will not work underground or in a pipe because many rely on GPS. GPS sensors provide an absolute position, which is ideal for sensor fusion systems that are used for localization. This is different from an INS, which measures accelerations rather than an absolute position. A camera system can provide an absolute position, when a list of known objects and their absolute positions are known. This is often not the case, and will not work in a pipe. Most vision sensor systems are also computationally costly and have troubles identifying objects in environments that appear the same, i.e. environments that look the same and don't have distinctive features such as a pipe wall. INS sensors require small computational load and are excellent for short periods of time, but they have severe drifting problems when run for longer than a couple of seconds where there is no means of resetting the error. In this paper a new sensor system is proposed that fuses the output of an INS and a camera. This allows the low computational characteristics of the INS to benefit the vision system and the vision system to alleviate the drifting problem of the INS. The INS allows an estimation of the movement between frames to be sent into the Lucas-Kanade algorithm, which is a block matching technique for feature detection. Once the common features are found in the frames, the results are passed to the image registration. The image registration extracts the motion from the change in location of the features between the two images calculates a motion the camera performed between the images. This motion is then passed to an extended kalman filter (EKF) to update the state space variables. This thesis will show the evolution of this sensor system from a 2D system to a 3D system. The proposed localization system is suitable for real-time 3 operations. 
The implementation of the system into a real-time platform has not been within the scope of this research and is considered as future work. 1.4 Literature Review 1.4.1 Localization Problem The problem of localization is a fundamental problem in robotics. There are a variety of different localization techniques for different environments and situations. No one method is the answer to all situations. Most solutions for localization problems today use a GPS [911]. The GPS provides an excellent means for obtaining the absolute position data, but unfortunately the GPS will not work in all environments, including underground and inside pipes. Another localization technique that is useful in absence of a GPS is radio frequency (RF) localization [12]. This requires the user to either set up a series of towers in the environment or use pre-existing systems, such as WIFI or cellular. Once again it's not feasible to set up RF towers inside a pipe for navigation purposes. This leads to the use of other sensors to obtain the position information. Often it is not feasible to have a sensor that returns the absolute position data. Then the system must rely on dead reckoning, see section 1.4.3. When using a dead reckoning system, sensors that provide a change in state such as an INS and/or wheel encoders can work well. It depends on the robot's configuration and environment. However with all dead reckoning systems, errors will build up and the localization will deteriorate. Some sensors systems are far more sensitive to errors than others. For example, INS is not suitable for localization over an extended period of time because they determine displacement and velocity by integration of accelerations measured over time. More precisely, a small error in the measurement of acceleration at any instant will compromise 4 the calculated values of velocity and displacement after. This is why the INS only provide accurate measurements over short periods of time. This can be countered by fusing another sensor into the system that has different weaknesses and strengths. There are a variety of different methods for fusing different sensor data together, see Section 1.4.2. For years now, sensor systems have been using cameras to help them navigate [10, 1315]. There is a wide variety of ways to utilizing a camera in a sensor system. One method is to use optical flow [1518]. This method looks at the flow patterns of subsections of the image from one frame to the next. This pattern can then give insight for the motion the camera performed from one image to the next. The method does not give an absolute position, just a relative motion with respect to the environment. Also this method can have issues generating the flow patterns if there are large changes in motion between sequences of images, i.e. fast motion with respect to the field of view or changing angles of the camera. More often this method is used for navigation rather than localization. Another method is feature tracking. Watching a set of features over a set of images allows the motion of the camera to be calculated. The calculations require an accurate model of the camera system, see section 2.2. Feature tracking is different than optical flow because in optical flow the image is broken up into a number of blocks and then each block location is found. In feature tracking, a limited number of features are tracked from one image to one or more images. These feature locations can be calculated through a number of different ways, see section 3.2. 
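The INS drift issue raised earlier in this section can be made concrete with a short numerical sketch. This is not from the thesis: the 0.01 m/s² accelerometer bias, 100 s duration, and 100 Hz rate below are illustrative assumptions only, chosen to show how double integration turns a small constant bias into a quadratically growing position error.

```python
import numpy as np

# Hypothetical illustration of INS drift: a stationary sensor whose
# accelerometer reads a small constant bias instead of exactly zero.
dt = 0.01                             # 100 Hz sample period (assumed)
t = np.arange(0.0, 100.0, dt)         # 100 s of data (assumed)
bias = 0.01                           # accelerometer bias in m/s^2 (assumed)

accel = np.full_like(t, bias)         # measured acceleration (true value is zero)
vel_error = np.cumsum(accel) * dt     # first integration  -> velocity error
pos_error = np.cumsum(vel_error) * dt # second integration -> position error

print(f"velocity error after 100 s: {vel_error[-1]:.2f} m/s")  # ~1 m/s
print(f"position error after 100 s: {pos_error[-1]:.1f} m")    # ~50 m
# A bias of only 0.01 m/s^2 grows to tens of metres of position error within
# minutes, which is why the INS alone is only trusted over short intervals
# and is fused with another sensor, here the camera.
```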
In recent years scale-invariant feature transform (SIFT) has become a dominate technique. If the environment is distinctive enough, then SIFT will allow tracking of images through one or more frames. When the environment isn't very distinctive, then block matching techniques can be used. 5 1.4.2 Sensor Fusion through Gaussian Filters There are a large number of different kinds of filters for a variety of different tasks. Filters are classified as parametric or non-parametric. If a filter is non-parametric, then there is no assumption on the probability distribution and can often represent nonlinear transformations [19]. An example of a non-parametric filter in robotics is the particle filter [20, 21]. The particle filter makes no assumption of distribution and can more accurately represent the nonlinear equations. Particle filters rely on a set of randomly generated state samples particle, to represent the posterior. This differs from parametric filters where the posterior will be represented by a known parametric form such as a Gaussian distribution. A particle represents a hypothesis of what the true world state may be at time t [19]. During each iteration of the particle filter a number of the particles are removed and an equal number of particles will be randomly generated. The idea behind this is to remove the particles that are least likely to represent the true state of the filter and input a new set of random particles that represent the current state better than the current set of particles. It is noted that increasing the number of particles in the filter will increase the likelihood that the probability distribution will be more accurately represented. The problem with non-parametric filters such as the particle filter is that they need a great deal more processing power than their parametric alternatives. If the particles are decreased to limit the processing power needed, then they often don't perform as well. Sometimes parametric filters are used in combination with non-parametric filters [21]. This allows the advantages of non-parametric filters to be used when needed and then a parametric filter can be used for the rest of the problem. Particle filters have gained in popularity because of the advances in computer systems and their easy implementation [22]. Parametric filters can be broken down to Gaussian and non-Gaussian based filters. His- torically, Gaussian filters constitute the earliest tractable implementations of the Bayes 6 Filter for continuous spaces[19]. The Bayes algorithm calculates the belief distribution from measurement and control data. This allows for the estimation of the uncertainties involved in ones measurements. Gaussian techniques all share the basic idea that beliefs are represented by multivariate normal distributions [19].Gaussian based filters require a normal distribution for noise, while non Gaussian filters can use other distributions. Popular examples of non-Gaussian based filters are the particle filter and histogram filter. These generally require a lot more processing power and memory than a traditional Gaussian based filter such as the Kalman Filter. Kalman filters are one of the most extensively used filters in robotics, especially in au- tonomous navigation applications [23]. In 1960, R.E. Kalman published his first paper on the Kalman filter. His work describes a recursive solution for linear discrete data. This will become the basic level of the Kalman filter which was unable to handle nonlinear equations. 
The Kalman filter is a recursive estimator made up of two parts, the predic- tion step and the update step, see Table 1.1. The prediction step involves predicting the current state, x̂−k , from the previous state, xk−1. The Control vector, uk−1, is a matrix that represents some new information to help with the prediction of the current state. Matrix A is the state transition model and B is the control input model. The prediction step also involves the calculation of P−k , which is the error covariance, an estimate of the state accuracy. Pk−1 is the previous error covariance and Q is the covariance of the process noise, which is assumed to be Gaussian. The second step in a Kalman filter is the update step. This involves three steps. The first step is to calculate the kalman gain Kk. The kalman gain is used to adjust the current projected state in step two of the update step. It is calculated using the error covariance,P−k , the observation matrix, H, and the covariance of the observation noise, Rk. Then the update estimate is calculated using the kalman gain, Kk, the current state, 7 x̂−k , the observation matrix, H, and the observation or measurement zk. The final step is to update the error covariance Pk. Kalman Filter Prediction Step 1. Project the state ahead: x̂−k = Axk−1 +Buk−1 2. Project the error covariance ahead: P−k = APk−1A T +Q Kalman Filter Update Step 1. Compute the Kalman gain: Kk = P−k H T HP−k HT+R 2. Update estimate with measurement zk: x̂k = x̂ − k +Kk(zk −Hx̂−k ) 3. Update the error covariance: Pk = (I −KkH)P−k Table 1.1: Kalman Filter The Kalman filter has been improved in a variety of ways to address certain problems. Commonly used in the field of robotics, the extended Kalman filter (EKF) was developed to address the issue of nonlinear equations. The EKF is a well known and powerful algorithm for fusing data [14, 24]. Similar to the Kalman filter, the EKF is a recursive filter composed of two main parts: the prediction step and the update step [19, 25, 26], see algorithm 1.2. Unlike the kalman filter, the state transition functions can be nonlinear, but they have to be differentiable functions. Other changes are as follows: matrix A has been replaced by matrix F , which is the jacobian of a linearized set of equations of the state transition model. For the update step, function h is the observation prediction and the matrix H is the jacobian of the observation prediction. 8 Extended Kalman Filter Prediction Step 1. Project the state ahead: x̂−k = f(x̂k−1, uk−1) 2. Project the error covariance ahead: P−k = FkPk−1F T k +Qk−1 Extended Kalman Filter Update Step 1. Compute the Kalman gain: Kk = P−k H T k HkP − k H T k +R 2. Update estimate with measurement zk: x̂k = x̂ − k +Kk(zk − h(x̂−k )) 3. Update the error covariance: Pk = (I −KkHk)P−k Table 1.2: Extended Kalman Filter Another form of the Kalman filter that has been gaining popularity is the unscented Kalman filter (UKF). The UKF is an alternative to the EKF. The input and output are identical to the EKF, but it handles nonlinear equations differently than the EKF. Instead of having to calculate the Jacobian of the state space to get a linear approximation of the equations, which can be very difficult if large nonlinear equations are involve, it uses sigma points and propagates these through the model. The resulting mean and variance of the sigma points passing through the model are then computed. The results of the UKF are often equal or better then the EKF, but at a slight cost to speed. 
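As a concrete reference for the prediction and update steps of Table 1.1, the following is a minimal linear Kalman filter sketch; the constant-velocity model, noise covariances, and simulated measurements are illustrative assumptions, not parameters from this thesis. The EKF of Table 1.2 follows the same structure, with the nonlinear functions f and h and their Jacobians F and H taking the place of A and H.

```python
import numpy as np

# Minimal linear Kalman filter following Table 1.1 (illustrative values only).
# State x = [position, velocity]; only position is measured.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])   # state transition model
B = np.array([[0.5 * dt**2], [dt]])     # control input model (acceleration input)
H = np.array([[1.0, 0.0]])              # observation matrix
Q = 0.01 * np.eye(2)                    # process noise covariance (assumed)
R = np.array([[0.5]])                   # measurement noise covariance (assumed)

x = np.zeros((2, 1))                    # state estimate
P = np.eye(2)                           # error covariance

def kf_step(x, P, u, z):
    # Prediction step
    x_pred = A @ x + B @ u
    P_pred = A @ P @ A.T + Q
    # Update step
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)                    # measurement update
    P_new = (np.eye(2) - K @ H) @ P_pred                     # covariance update
    return x_new, P_new

# Example run: constant commanded acceleration with noisy position measurements.
rng = np.random.default_rng(0)
for k in range(50):
    u = np.array([[0.2]])                                    # commanded acceleration
    true_pos = 0.5 * 0.2 * (k * dt) ** 2
    z = np.array([[true_pos + rng.normal(0.0, 0.5)]])
    x, P = kf_step(x, P, u, z)
print("estimated position/velocity:", x.ravel())
```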
In practice the EKF is often faster than an UKF [19]. Just like the kalman filter and EKF, the UKF also has a prediction step, see Table 1.3, and an update step, see Table 1.4. The prediction step involves the calculation of sigma point x̂ik−1 and there respecting weights w i m and w i c , where i = 0 . . . 2L and L is the number of augmented states. It is noted that α and k are tuning parameters that determine how far the sigma are spread from the mean. Also parameter β can be used to encode more information about the distribution. If the distribution is an exact Gaussian then the number 2 is the optimal value for β. The output from the prediction step, x̂−k and P−k , is equivalent to that of the EKF. 9 1. Calculate the 2L+ 1 sigma points: x̂0k−1 = x̂k−1 x̂ik−1 = x̂k−1 + γ √ Pk−1 where i = 1 . . . L x̂ik−1 = x̂k−1 − γ √ Pk−1 where i = L+ 1 . . . 2L γ = √ n+ λ 2. Propagate the sigma points through the transition function: x̂ik = f(x̂ i k−1, uk−1) 3. Project the state ahead: x̂−k = 2n∑ i=0 wimx̂ i k 4. Project the error covariance ahead: P−k = 2n∑ i=0 wic(x̂ i k − x̂−k )(x̂ik − x̂−k )T 5. The weights for the state and covariance are given by: w0m = λ L+λ w0c = λ L+λ + (1− α2 + β) wim = w i c = λ 2(L+λ) where i = 1 . . . 2L λ = α2(L+ k)− L Table 1.3: Unscented Kalman Filter Prediction Step Just like the prediction step, the update step requires the calculations of sigma point x̂ik and uses the same weight equations as the prediction step. Also the observation function h is the same as in the EKF. The resulting output from the UKF is x̂k and Pk, which is equivalent to the EKF. 10 1. Calculate the 2L+ 1 sigma points: x̂0k = x̂ − k x̂ik = x̂ − k + γ √ P−k where i = 1 . . . L x̂ik = x̂ − k − γ √ P−k where i = L+ 1 . . . 2L 2. Propagate the sigma points through the observation function h: z̄ik = h(x̂ i k) where i = 0 . . . 2L 3. Compute the predicted measurement: ẑk = 2L∑ i=0 wimz i k where i = 0 . . . 2L 4. Compute the predicted measurement covariance: Pzk = 2l∑ i=0 wic(z̄ i k − zk)(z̄ik − zk)T where i = 0 . . . 2L 5. Compute the state-measurement cross-covariance matrix: Pxk = 2l∑ i=0 wic(x̂ i k − x̂−k )(z̄ik − zk)T where i = 0 . . . 2L 6. Compute the Kalman gain: Kk = Pxk Pzk 7. Update estimate with measurement: zk: x̂k = x̂ − k +Kk(zk − ẑk) 8. Update the error covariance: Pk = P − k −KkPzkKTk 9. The weights for the state and covariance are given by: w0m = λ L+λ w0c = λ L+λ + (1− α2 + β) wim = w i c = λ 2(L+λ) where i = 1 . . . 2L λ = α2(L+ k)− L Table 1.4: Unscented Kalman Filter Update Step 11 1.4.3 Dead Reckoning Dead reckoning is the process of keeping track of one's position without using an instru- ment that determines the absolute position. Usually one starts the dead reckoning process from a known position. Then, speed and heading measurements are usually taken, and the position is calculated. This method dates back hundreds of years and has been used in a variety of applications including marine, air and automotive navigation. For example dead reckoning was the primary method of determining position when sailors, such as Christopher Columbus, crossed the Atlantic Ocean. It wasn't until the innovation of the marine chronometer in 1675 that navigation using dead reckoning became obsolete. The major problem with dead reckoning is that each measurement depends on previous measurements. So, if an error occurs, then that error will affect the final result until a known position is reached where both the process and error can be reset. Depending on the nature of the errors, i.e. 
magnitude and distribution of the noise, the position will degrade at different rates. Today dead reckoning is rarely used for extended periods of time because of technologies such as GPS. GPS provides an inexpensive, yet effective way to locate one's position. The problem with GPS is that it does not work everywhere. So dead reckoning is still preferable in places where GPS doesn't work or to help augment position data from GPS readings. A popular dead reckoning device is the inertia navigation system (INS). This device is used widely from advanced aircraft systems to navigation systems in automobiles. Typically, the INS readings are fused with other instruments to provide reliable position data in a variety of conditions [9, 11, 27]. 12 1.4.4 Image Mosaicing and Optical flow Image Mosaicing is the process of overlapping 2D images to create larger images. This process can give the impression of global images covering a larger area of interest, which is actually comprised of many images. In order to combine images, it is required to know how these images fit together. This can be achieved in a variety of ways including external information such as the movement of the camera as well as feature matching. Image Mosaicing can be very useful in visual inspections, because it gives the illusion of a continuous image that can be viewed. Optical flow is the process of estimating motion within a set of camera images. This can be used to estimate an object moving within a camera image or estimate the motion of the camera itself. Usually a set of vectors are produced illustrating the motion within an image, see Figure 1.2. Optical flow has become very popular as a form of motion estimation in applications such as robotics [18, 28], and in tracking block movement to help in compressing video images [16, 29]. For now this paper will concentrate on the motion estimation. Figure 1.2: Optical Flow Image 13 If the camera is moving, and not the objects within the cameras view, then the process of image Mosaicing and optical flow to estimate the motion are basically the same while producing different results. In optical flow, sets of features are found and the flow within the images is tracked. Assuming that the camera is moving and not an object in the cam- era's view, then image Mosaicing and optical flow have the same solution, but different outcomes. One's purpose is to fit overlapping images together; the other is to estimate the motion within the image. To complete both tasks, features within the image must be matched. This can be performed in a variety of ways. Two common techniques for finding features within overlapping images are block match- ing and SIFT. SIFT has quickly become popular within computer vision field [3032]. SIFT allows for features to change, and is robust against changes that include image scale, noise, local geometric distortion and illumination. The problem with SIFT is that it requires uniqueness among the features, i.e. the image environment cannot look the same. The features must be distinct enough to allow for changes in the image and to prevent false matching. Block matching on the other hand is a much more rudimentary method, but still widely used [3335]. The concept is to take small portions (the size can be varied in order to achieve better results for different environments) of the image from the anchor frame and look for them in the target frame. 
It is noted that the anchor frame is the image where the features are taken from and the target frame is the image where the features are found. This method works very well when the prospective of the camera isn't changing and the lighting remains the same. When the prospective changes it will cause the target frame to change so that the blocks from the anchor frame no longer match the target frame. This can be overcome if the prospective change little from the anchor frame to the target frame or a motion estimation is known and the blocks can be deformed to match the target frame. Block matching can be a successful alternative to SIFT when 14 the image doesn't have distinctive features, such as a wall. Otherwise SIFT is a much more robust implementation. In order for block matching to work, the algorithm must find a match for the block. There are a variety of different search techniques. The ideal search technique is an exhaustive search. This is where every possible combination is tried and the best matching result is chosen. The problem with this method is that it requires a lot of processing time. The alternatives are to search a subset region of the target frame and/or use different search algorithms that don't require every combination to be tried. Some popular search algorithms are the 3-step search and the diamond search [36, 37]. The idea of all these searches is to quickly converge on the region where the block should be, without having to try all the combinations. This generally only works well if the image is dissimilar. If the image is more or less the same, then most search techniques apart from an exhaustive search will generally converge on the wrong region. 1.4.5 Lucas-Kanade The Lucas-Kanade algorithm is an optical flow method that is very popular in computer vision [3840]. Reference [41] says the Lucas-Kanade algorithm is easily applied to a sub- set of points in the input image and that it has become an important sparse technique. The basic Lucas-Kanade algorithm makes 3 assumptions, as follows: the brightness is consistent, temporal persistence small movements, and spacial coherence. Spacial co- herence means that similar neighboring points in the scene belong to the same surface and have a similar motion. One major problem with these assumptions is the temporal persistence. The field of view of the camera system on the AUV is small because of the relatively short distance the camera is away from the wall. This leads to large motion vectors with small movements of the camera. The motions are often close to half a frame width when the AUV is traveling at its usual speed. In the current block matching algo- 15 rithm this isn't a problem since the INS provides enough information to choose a block in the optimal location from the anchor frame and predict the location to find that block in the target frame. A similar system will be needed to predict the location of the block match so that the algorithm gets a close enough prediction to converge. Reference [41] describes a pyramid Lucas-Kanade algorithm that allows greater movement. References [42, 43] describe many variations of the Lucas-Kanade algorithm. The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template T and image I. The image I is warped onto the coordinate frame template to create a match. The warping of I requires interpolating the image at the sub-pixel locations. The basic Lucas-Kanade algorithm includes nine steps outlined in Table 1.5. 
This is the Forward Additive class of the algorithm. 1. Warp I with W (x; p) to compute I(W (x; p)) 2. Compute the error image T (x)− I(W (x; p)) 3. Warp the gradient ∇I with W (x; p) 4. Evaluate the Jacobian ∂W ∂p at (x; p) 5. Compute the steepest descent images ∇I ∂W ∂p 6. Compute the Hessian matrix H 7. Compute ∑ x[∇I ∂W∂p ]T [T (x)− I(W (x; p))] 8. Compute 4p = H−1∑x[∇I ∂W∂p ]T [T (x)− I(W (x; p))] 9. Update the parameters p← p+4p until ‖4p‖ ≤  Table 1.5: Lucas-Kanade Other varieties of the Lucas-Kanade are also described in [42, 43]. These varieties seem to be grouped into four classes. They are classified by the quantity approximation and the warp update rules. These include the Forward Additive Image Alignment, Forward Com- 16 position Image Alignment, Inverse Composition Image Alignment, and Inverse Additive Image Alignment. The Composition Image Alignment is an example of Forward Compositional algorithm. Instead of simply adding the additive updates to the current estimations of the param- eters as is done in the Lucas-Kanade algorithm, the Forward Compositional algorithm must update the warp in the compositional algorithm. This puts two limits on the warps, as follows: they must contain the identity, and the set of warps must be closed under composition. The set of warps must therefore be a semi-group. These limitations are not usually a problem because most computer vision techniques conform to these requirements. The Inverse Compositional Image Alignment is basically reformulated so that the Hessian matrix becomes a constant. This puts limitations on the warps because they have to be inverted before they can be composed with the current estimation. This is not usually a problem. The goal of the Inverse Additive is to remove the Hessian matrix from the Forward Additive algorithm, like it was done for the inverse composition with respect to the Forward Composition algorithm. This leads to a number of problems. First it is far more complex to derive and second it can only be applied to a very limited small set of warps. This includes mostly simple 2D linear warps such as translations and affine warps. The summary of each algorithm and their limitations as well as computational cost is summarized in Table 1.6 and Table1.7. Algorithm Pre-Computation Per Iteration Lucas-Kanade − O(n2N + n3) Compositional O(nN) O(n2N + n3) Inverse Composition O(n2N) O(nN + n3) Hager-Belhumeur Inverse Additive O(k2N) O(nN + kN + k3) Table 1.6: Computation Cost 17 Algorithm Efficient Can be Applied To Forward Additive No Any Warp Forward Compositional No Any Semi-Group Inverse Additive Yes Simple Linear 2D+ Inverse Compositional Yes Any group Table 1.7: Limitations on Algorithms The Forward Additive version of the algorithm was selected because it had the fewest restrictions. Further details of the implementation of the Lucas-Kanade algorithm is discussed in Section 3.2.2. 1.5 Contributions The main contributions of this research can be summarized in two major items. The first contribution is a test platform where data can be gathered to test localization algorithms. The test platform was designed to simulate different motions that will be experienced by an AUV inside a water main. The simulator is placed close to a wall with no distinct features. A cart capable of holding a variety of sensors, including a camera, INS and laser displacement sensor is moved along a wall in different paths. 
A set of control boards control two servo motors, one motor controls the speed and direction of the cart along a set of tracks, while the second motor controls the angle of the platform that the tracks are attached to. This allows the cart to change angle, distance and velocity with respect to the wall, providing a large variety of different motion patterns that can be used for testing purposes. For further details on the simulator see section 4.1. The second contribution is a sensor system that fuses an INS and camera together. This sensor system allows 3D movement with only a single camera and an INS. This thesis shows the short coming of a variety of different techniques used to try and complete 18 this task, as well as a successful implementation. The environment this sensor system has to perform is also difficult because the camera will not have any distinct features that it can use to navigate, i.e. the inside of a pipe all looks the same. For this reason the wall used in the simulator is also void of distinct features. This means that feature detection algorithms will have trouble finding meaningful information from the camera images. Also the field of view of the camera is small in comparison to the speeds at which the cart will be traveling. This means that the system must perform well when features only remain in the field of view for a couple of frames. This means that the same feature cannot be used for localization purposes over the course of the test, i.e. different features must be selected appropriately such that they will be in the next frame and as such can be used for localization purposes. Also the algorithm should have no problem being implemented in real time because a lot of effort was taken to minimize computational load and allow for parallel calculations, i.e. different parts of the algorithm can be computed at the same time allowing for a low lag between data input and result. It is also noted that inside real pipes there is water and this water could provide another set of challenges with bubbles and flow. The challenges of water in the camera images will not be address in this paper. 19 2 Robot Model and Camera Model In this chapter a simplified model of the underwater vehicle will be developed. 2.1 Robot Modeling In robotics it is desirable to know the orientation and location of the robot [9, 14, 44]. In order to represent the relative position and orientation of one rigid body with respect to another, a coordinate frame must be attached to each body [45]. For example a coordinate frame of the underwater robot is in 2.1. These coordinate frames must have a common reference frame in order to compare their relative position and orientation. This reference frame is typically a stationary coordinate frame. For the purpose of this thesis, the common reference frame will be called the world coordinate frame. Using the world coordinate frame, one can see the change in position of an object with respect to a fixed coordinate frame. Figure 2.1: Underwater Robot Coordinate Frame 20 Models are created to describe the robots interaction with the real world. Models are the mathematical representation of what is actually happening. One way of representing a model is with a set of state space variables. These variables represent a mathematical model of a physical system. It breaks the system down to a set of inputs, outputs and state variables which are related by a set of differential equations. 
Often these equations are put in a matrix with the variables expressed in vector form. This breaks up the set of equations in a form which is easy to manage [24, 44, 46, 47]. The output of a model is the variables that one would like to keep track of. For a robot this can be position, orientation, velocity etc. The input into a model is the information available about the changing state of the robot, i.e. sensor data or control inputs. For example a one wheeled robot such as a unicycle is displayed in Figure 2.2. The inputs to this model will be velocity v and rate of rotation θ̇. The state variables will be the x and y position and the orientation θ. The output will be the same as the state variables, i.e. the location displayed as x and y and the orientation θ . The output variables and state variables can be different depending on the requirements of the model. The equations for this model are 2.1. Using these equations, the orientation and position can be calculated with a given set of velocity and rate of rotation measurements. Since most robots are a lot more complicated than a simple one wheeled machine, more complex models are usually needed. 21 Figure 2.2: Unicycle  xt yt θt  =  1 0 0 0 1 0 0 0 1   xt−1 yt−1 θt−1 +  t · sin(θt−1) 0 t · cos(θt−1) 0 0 t   vt θ̇t  (2.1) To help build mathematical models, it is important to know how to describe rotation and translation changes. To describe a rotation and/or translation, a transformation matrix is used. A common method for specifying a rotation is in terms of three independent quantities called Euler angles [45]. Pitch, θ, describes the rotation about the y axis, see Equation 2.3. Roll, φ, describes the rotation in the x axis, see Equation 2.2 and yaw, ψ, describes the rotation about the z axis, see Equation 2.4. To describe a translation Equation 2.5 is used. This describes a position change in three dimensional space. 22 Rx(φ) =  1 0 0 0 0 cosφ −sinφ 0 0 sinφ cosφ 0 0 0 0 1  (2.2) Ry(θ) =  cosθ 0 sinθ 0 0 1 0 0 −sinθ 0 cosθ 0 0 0 0 1  (2.3) Rz(ψ) =  cosψ −sinψ 0 0 sinψ cosψ 0 0 0 0 1 0 0 0 0 1  (2.4) T =  1 0 0 X 0 1 0 Y 0 0 1 Z 0 0 0 1  (2.5) It is often desirable to be able to translate between different coordinate frames. This can be used to describe the movement from one coordinate frame to another, for example where a vehicle position changed and its desirable to know what that movement is with respect to the world coordinate frame. This can be accomplished by multiplying the vector with a combination of rotation and translation matrices. The resulting 4x4 matrix is called a transformation matrix and incorporates both translation and rotation. To perform more than one rotation and/or translation, the rotation and translation matrix 23 can be cascaded using one of two different methods. These methods are called post- multiply and pre-multiply. Post-Multiply is when the rotations and translations are preformed on an evolving ref- erence frame. This means that if a rotation about the x axis is performed, then in post-multiply the action will be performed on the current x axis and not the original, in the event that other rotations had been performed. In pre-multiply, the rotations and translations are performed on a fixed reference frame, i.e. the rotations and translations are performed on the original reference frame. 
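A minimal sketch (illustrative, not code from the thesis) of the homogeneous rotation matrices of Equations 2.2 to 2.4, the translation of Equation 2.5, and their composition by post-multiplication; the translation of (1, 2, 0) and the 90 degree rotation about z are arbitrary example values, matching the combination worked out in Equation 2.6 below.

```python
import numpy as np

# Homogeneous 4x4 rotation matrices about x, y, z (Eqs. 2.2-2.4)
# and a homogeneous translation matrix (Eq. 2.5).
def Rx(phi):
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[1, 0,  0, 0],
                     [0, c, -s, 0],
                     [0, s,  c, 0],
                     [0, 0,  0, 1]])

def Ry(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s, 0],
                     [ 0, 1, 0, 0],
                     [-s, 0, c, 0],
                     [ 0, 0, 0, 1]])

def Rz(psi):
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, -s, 0, 0],
                     [s,  c, 0, 0],
                     [0,  0, 1, 0],
                     [0,  0, 0, 1]])

def T(x, y, z):
    m = np.eye(4)
    m[:3, 3] = [x, y, z]
    return m

# Post-multiplication: translate, then rotate 90 degrees about the
# (evolving) z axis, as in Eq. 2.6 with an example offset of (1, 2, 0).
Q = T(1.0, 2.0, 0.0) @ Rz(np.pi / 2)

# Map a point expressed in the final (translated and rotated) frame
# back into original-frame coordinates.
p = np.array([1.0, 0.0, 0.0, 1.0])
print(Q @ p)   # -> [1., 3., 0., 1.]
```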
When a combination of rotations and translations are performed, the resulting matrix will represent the transformation from one coordinate frame to the next. For example, see Figure 2.3. This figure shows two different coordinate frames. The first coordinate frame is a pure translation from the original. This leads to the origin of the axis being in a different location. The second coordinate frame is a pure rotation from the last, which is a 90 degree rotation around the Z axis. Equation 2.6 shows the combination of rotation and translation matrix if performing a post-multiply for the conversion from the original coordinates to the final coordinate systems. Any number of translation and rotation matrix can be linked together to obtain the desired matrix that describes the change from two different coordinate frames. 24 Figure 2.3: Convert Coordinate Frames Q = TRz( pi 2 ) =  1 0 0 X 0 1 0 Y 0 0 1 Z 0 0 0 1   cospi 2 −sinpi 2 0 0 sinpi 2 cospi 2 0 0 0 0 1 0 0 0 0 1  (2.6) 2.2 Camera Modeling Cameras have become popular in robotics as a sensor [14, 1618, 40, 43]. A camera model is required to use a camera as a sensor. The model describes the location or change in objects with reference to the position of the camera, i.e. whenever a camera moves, the objects its viewing will also change. The movement of the objects can be described mathematically with the motion of the camera. For instance, if a camera in panning left, then the objects the camera sees will also pan and eventually exit the field of view. In order to use the camera as a navigation sensor, a model describing this relationship has to be created. 25 The most simple of camera models is the pin hole model, illustrated in Figure 2.4. A pin hole camera is a camera without a lens and a single small aperture. Cameras with lens can be modeled using a pin hole camera model. The pin hole camera model mathematically describes the relationship between the image plane and a point in the world, i.e. it describes the relationship of an object located in a three dimensional world and its location on a two dimension plane. It is a popular model to use in computer vision [48]. Figure 2.4: Pin hole Camera Model The camera model in Figure 2.4 causes the image on the image plane to be inverted. The non-inverting pinhole camera model, illustrated in Figure 2.5, can be used instead of the inverting pinhole camera model if the inverted calculations are not desirable. A set of equations can be derived using the non-inverting camera model, see Equation 2.7. 26 Figure 2.5: Non-inverted Camera Model x X = y Y = r R = f Z (2.7) X, Y, Z is the position of the feature with respect to the camera x, y is the position of the feature within the image f is the focal length of the camera From this the perspective projection equations can be derived. x = f Z X (2.8) y = f Z Y (2.9) From these equations the inverse prospective equations can be formulated. 27 X = Z f x (2.10) Y = Z f y (2.11) Unfortunately both the direct and inverse perspective equations have nonlinear rela- tionship with coordinates x and y that are dependent on f and Z. This leads to the assumption that the distance between two points in the camera image are much smaller than the average distance z. This leads to Equations 2.12 and 2.13. x ≈ f Z X (2.12) y ≈ f Z Y (2.13) The camera model can be integrated into a robot model through a base frame. 
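Before moving on, the perspective projection and inverse projection relations above (Equations 2.8 to 2.11) can be written out as a few lines of code. The sketch below uses an assumed focal length of 800 pixel units and an arbitrary feature position purely for illustration; note that inverting the projection requires the depth Z, which is why the proposed system must either assume a wall geometry or add a range sensor when the wall shape is unknown.

```python
import numpy as np

# Non-inverting pinhole camera model: projection (Eqs. 2.8-2.9) and
# inverse projection (Eqs. 2.10-2.11). f is the focal length expressed
# in pixel units (assumed value, for illustration only).
f = 800.0

def project(X, Y, Z):
    """Point (X, Y, Z) in the camera frame -> image coordinates (x, y)."""
    return f * X / Z, f * Y / Z

def back_project(x, y, Z):
    """Image point (x, y) plus a known depth Z -> camera-frame (X, Y)."""
    return Z * x / f, Z * y / f

# A feature 0.5 m to the side, 0.1 m below, and 2 m in front of the camera.
x, y = project(0.5, -0.1, 2.0)
print(x, y)                      # -> 200.0 -40.0
print(back_project(x, y, 2.0))   # -> (0.5, -0.1): X and Y recovered given Z
```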
Similar to a robot model, the base coordinate frame allows different coordinate frames to have a common point of reference. The camera model can use a base frame to relate more than one camera together which allows for depth measurements. This is called stereo vision [49, 50]. A base frame can also be used to relate the camera to the robot model. In this thesis the camera model will be used for relating the camera to the robot model. Detailed information on how this was implemented is in section 3.1. 28 3 Localization Based on Fusion of the INS and Vision System 3.1 Camera Model The camera model is a set of equations that describe the relationship between the feature movement in the camera's images and the robot's movement. These models allow the camera to be used as a motion sensor when tracking a set of features. The camera model describes the movement of the camera from k to k + 1. The model also relates the coordinate frame of the camera to the robot coordinate frame and then to the world coordinate frame. A diagram of this is shown in Figure 3.1. Figure 3.1: Coordinate Frame Conversion 29 The coordinate transformation from the world coordinate frame to the robot coordinate frame at time k is given by Equation 3.1. Q1 = QT1QRz1QRy1QRx1 (3.1) QT1is the translation matrix and QRz1 , QRy1 and QRx1 are the rotation matrix. The Equations for these are 3.2, 3.3, 3.4 and 3.5. QT1 =  1 0 0 xk 0 1 0 yk 0 0 1 zk 0 0 0 1  (3.2) xk, yk and zk represents the location of the robot in the world coordinate frame at time k. QRz1 =  cosψk −sinψk 0 0 sinψk cosψk 0 0 0 0 1 0 0 0 0 1  (3.3) ψk is the yaw orientation of the robot with respect to the world coordinate frame at time k. 30 QRy1 =  cosθk 0 sinθk 0 0 1 0 0 −sinθk 0 cosθk 0 0 0 0 1  (3.4) θk is the pitch orientation of the robot with respect to the world coordinate frame at time k. QRx1 =  1 0 0 0 0 cosφk −sinφk 0 0 sinφk cosφk 0 0 0 0 1  (3.5) φk is the roll orientation of the robot with respect to the world coordinate frame at time k. The coordinate transformation from the robot to the camera at time k is represented by Equation 3.6. Q2 = QRx2QT2 (3.6) QT2, see Equation 3.7, represents the offset of the camera coordinate frame with respect to the robot coordinate frame in millimeters. While QRx2, see Equation 3.8, represents the rotation about the x axis for the camera coordinate frame with respect to the robot coordinate frame. 31 QT2 =  1 0 0 −15 0 1 0 −12 0 0 1 −70 0 0 0 1  (3.7) QRx2 =  1 0 0 0 0 cos (−pi 2 ) −sin (−pi 2 ) 0 0 sin (−pi 2 ) cos (−pi 2 ) 0 0 0 0 1  (3.8) The coordinate conversion from the world coordinate frame to the robot coordinate frame at time k + 1 is given by Equation 3.9. Q3 = QT3QRz3QRy3QRx3 (3.9) QT3 is the translation matrix and QRz3 , QRy3 and QRx3 are the rotation matrix. The Equations for these are 3.10, 3.11, 3.12 and 3.13. QT3 =  1 0 0 xk+1 0 1 0 yk+1 0 0 1 zk+1 0 0 0 1  (3.10) xk+1, yk+1 and zk+1 represents the location of the robot in the world coordinate frame at time k + 1. 32 QRz3 =  cosψk+1 −sinψk+1 0 0 sinψk+1 cosψk+1 0 0 0 0 1 0 0 0 0 1  (3.11) ψk+1 is the yaw orientation of the robot with respect to the world coordinate frame at time k + 1. QRy3 =  cosθk+1 0 sinθk+1 0 0 1 0 0 −sinθk+1 0 cosθk+1 0 0 0 0 1  (3.12) θk+1 is the pitch orientation of the robot with respect to the world coordinate frame at time k + 1. 
QRx3 =  1 0 0 0 0 cosφk+1 −sinφk+1 0 0 sinφk+1 cosφk+1 0 0 0 0 1  (3.13) φk+1 is the roll orientation of the robot with respect to the world coordinate frame at time k + 1. The coordinate transformation from the robot to the camera at time k + 1 is Q4. It is noted that the Q4 is equal to Q2 because the camera is fixed to the robot and cannot move independent from the robot. 33 Q4 = Q2 (3.14) The coordinate transformation from camera at k to k + 1 is Q5, 3.15. This describes the motion of the camera between two images. Q5 must be calculated through the relationship of the movement of the features between camera images. Before Q5 can be calculated, a set of features need to be found in the image at time k. A variety of different methods can be used to detect features in the camera images. Further discussion of this is described in Section 3.2. For now it is only important that these features can be found. Once the features have been identified in an image at time k, they must also be found in the next image. The relationship of the position of the features with respect to the camera position is described in Equation 3.16. Q5 = Q −1 2 Q −1 1 Q3Q2 =  Q511 Q512 Q513 Q514 Q521 Q522 Q523 Q524 Q531 Q532 Q533 Q534 Q541 Q542 Q543 Q544  (3.15)  Xmit+1 Y mit+1 Zmit+1 1  = Q5  Xmit Y mit Zmit 1  (3.16) Xmik , Y m ik and Zmik is the distance the object i is away from the camera at time k Xmik+1 , Y m ik+1 and Zmik+1 is the distance the object i is away from the camera at time k + 1. 34 By combining the non-inverting pinhole camera model, see Equation 2.7, and Equation 3.16, Equations 3.17 and 3.18 can be derived. These equations describe the relationship between the camera movement and the movement of the features i in the camera images. xik+1 = f Zmik+1 Xmik+1 = f Q511X m ik +Q512Y m ik +Q513Z m ik +Q514 Q531X m ik +Q532Y m ik +Q533Z m ik +Q534 (3.17) yik+1 = f Zmik+1 Y mik+1 = f Q521X m ik +Q522Y m ik +Q523Z m ik +Q524 Q531X m ik +Q532Y m ik +Q533Z m ik +Q534 (3.18) xik+1 and yik+1 is the location of feature i in the camera image at time k. Given a minimum of six features allows the movement Q5 to be calculated. 3.2 Feature Matching There are many ways for finding and tracking features in a set of camera images. This chapter will describe the process of using block finding and the Lucas-Kanade algorithm for finding and tracking features from images captured during a 3D motion. The chapter will continue with how to select the features and process them using image registration. The final topic in the chapter will be a simple data fusion using the EKF to fuse the data from the camera with the data from the INS. 3.2.1 Block Find Block finding is a 2D grid search that takes the block from the anchor frame (the previous frame) and searches for it in the target frame (the current frame), see Figure 3.2. The inputsXstart and Y start are the location where the search begins looking for the block in 35 the target frame. The input search describes the size of the search window, i.e. the range that the block will be searched for from the Xstart and Y start locations. The outputs x and y is the location of the best match for the search in the specified window. The output result is the sum of the differences between the block and the match contained in the current frame, i.e. the target frame. It is noted that the block is just a block of pixels that was obtained from the anchor frame. 
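A minimal MATLAB sketch of such a windowed block search is given below, using a plain sum-of-absolute-differences cost on synthetic images; the variable names and image sizes are illustrative only, not the exact implementation used in this work. The formal definition of the routine follows.

```matlab
% Exhaustive block search within a small window around an INS-predicted
% start location, using a sum-of-absolute-differences (SAD) cost.
% Synthetic data; names and sizes are illustrative only.
anchor = rand(240, 320);                 % previous (anchor) frame
target = circshift(anchor, [3, -5]);     % current (target) frame, shifted

bs     = 64;                             % block size in pixels
block  = anchor(81:80+bs, 121:120+bs);   % block taken from the anchor frame
Xstart = 116; Ystart = 84;               % predicted top-left corner in target
search = 16;                             % search window size (pixels)

best = inf; bestX = Xstart; bestY = Ystart;
for dy = -search/2 : search/2
    for dx = -search/2 : search/2
        yy   = Ystart + dy;  xx = Xstart + dx;
        cand = target(yy:yy+bs-1, xx:xx+bs-1);
        cost = sum(sum(abs(cand - block)));   % SAD between block and candidate
        if cost < best
            best = cost; bestX = xx; bestY = yy;
        end
    end
end
fprintf('best match at (%d, %d), SAD = %.1f\n', bestX, bestY, best);
```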
[x, y, result] = blockfind(image1, block,Xstart, Y start, search) searchwindow : [∀(x, y), Xstart− search 2 ≤ x ≤ Xstart+ search 2 Y start− search 2 ≤ y ≤ Y start+ search 2 ] result = ∑ |image1ij − block| where image1ij is the best block match from the current frame This is the original method used to find features in the images. The INS predicts the location of the block and the search algorithm will search in the vicinity of the estimated location. The vicinity is described as a predetermined search range. This method works well when dealing with 2D movements without a rapid change in the direction the robot is moving, i.e. when the camera will move in the X and Y directions and the camera orientation with respect to the wall remains the same. A number of different fast search algorithms were examined including the diamond search [36] and three step search [37]. It was discovered that these fast search algorithms are not effective because the images are so similar that the algorithm cannot converge to find the correct location. Normally this is an issue if the entire image has to be searched, but since the INS narrows down the search domain, the computational load of the search algorithm is not demanding. 36 Figure 3.2: Block Matching Another improvement that was made to the block search algorithm with the use of the INS is that the block can be taken from an optimal location within the anchor frame. Originally the block was always taken from the center of the anchor frame and searched for in the target frame, but when the INS is used, there is an estimate of the motion and the block can be taken from a location that allows for the maximum movement. If this is not done, then the maximum movement the camera can experience when finding a single block taken from the center of the anchor frame is a little less than half of the field of view, but when the block is chosen from an optimal location, the movement can almost be the entire field of view. This basically doubles the movement allowed per frame. The benefit of this also continues when more than one block is searched for in a single image. For example when dealing with 3D motions, a total of five blocks are used to increase the number of features. Under these circumstances, selecting the optimal locations for the blocks from the anchor frame will still increase the allowed motion. This allows for quicker motions without having to increase the field of view (FOV) and lower your effective accuracy, i.e. when the FOV is increased, the size of each pixel in the image also increases. Therefore the accuracy of the camera is decreased when the FOV is increased, given that the camera maintains the same resolution. 37 3.2.2 Lucas-Kanade Similar to the block matching of a 2D motion, a 3D motion will also require an estimate for optimal performance. The main difference is that when the camera moves in the 3D space, the block being searched for can change, i.e. the block will change shape as well as location. This changing shape must be accounted for to minimize the number of iterations of the search algorithm and help prevent the algorithm from diverging. The Affine Warp is a 2 × 3 matrix that describes the shape and location change of a block between two camera images. It is one of the inputs into the Lucas-Kanade algorithm, explained in Section 1.4.5. To calculate an estimate for the affine warp, an estimate for the motion of the camera between frames is required. 
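To make the role of the warp concrete, the short MATLAB sketch below applies a 2 x 3 affine warp to the corner pixels of a block; the particular warp values are arbitrary and only illustrate how a block can translate, scale and shear between two frames.

```matlab
% Apply a 2x3 affine warp W to the four corner pixels of a 64x64 block.
% The warp values here are arbitrary illustrative numbers.
W = [ 1.02  0.01  -7.5;    % [a11 a12 a13]
     -0.01  0.98   3.2];   % [a21 a22 a23]

corners = [0  64   0  64;  % x coordinates of the block corners
           0   0  64  64;  % y coordinates of the block corners
           1   1   1   1]; % homogeneous 1s so the translation terms apply

warped = W * corners;      % 2x4 matrix of warped corner coordinates
disp(warped);
```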
Before the camera motion can be calculated, an estimate of the robot motion is required, see Equation 3.19. Equation 3.19 is the estimated displacement and orientation change of the robot from the anchor frame to the target frame. This is calculated using the change in the state space variables, i.e. location and orientation of the anchor frame, time k + 1, minus location and orientation of the target frame, time k. See Equations 3.20 to 3.25. motionest =  Xest Yest Zest θest ψest φest  (3.19) Xest = xk+1 − xk (3.20) Yest = yk+1 − yk (3.21) Zest = zk+1 − zk (3.22) 38 θest = θk+1 − θk (3.23) ψest = ψk+1 − ψk (3.24) φest = φk+1 − φk (3.25) By substituting these values of the motion estimation into matrix Q1 and Q2, see Equa- tions 3.1 and 3.6 , Q5 can be calculated, see Equation 3.15. Q5 now describes the estimated camera motions between the anchor frame and target frame. The initial warp matrix p is calculated by using Equations 3.26, 3.28 and 3.29. p =  a11 a12 a13 a21 a22 a23  (3.26)  Xp1k+1 Xp2k+1 Xp3k+1 Yp1k+1 Yp2k+1 Yp3k+1 Zp1k+1 Zp2k+1 Zp3k+1 1 1 1  = Q5 ∗  Xp1k Xp2k Xp3k Yp1k Yp2k Yp3k Zp1k Zp2k Zp3k 1 1 1  (3.27) X, Y and Z are the location of points 1, 2 and 3 with respect to the camera. The location of the points at time k + 1 are calculated by using the rotation and translation matrix Q5, see Equation 3.27. 39  a11 a12 a13 1  =  Xp1k Yp1k Zp1k f 0 Xp2k Yp2k Zp2k f 0 Xp3k Yp3k Zp3k f 0 0 0 0 1  −1  Xp1k+1 Xp2k+1 Xp3k+1 1  (3.28)  a21 a22 a23 1  =  Xp1k Yp1k Zp1k f 0 Xp2k Yp2k Zp2k f 0 Xp3k Yp3k Zp3k f 0 0 0 0 1  −1  Yp1k+1 Yp2k+1 Yp3k+1 1  (3.29) f is the focal length of the camera The estimated affine warp is sent to the Lucas-Kanade algorithm. The Lucas-Kanade algorithm used is the Forward additive version, see Section 1.4.5. This algorithm makes the least number of assumptions and therefore requires more calculations. In the future another version of the algorithm can be experimented on to increase the efficiency. All versions of the Lucas-Kanade attempt to minimize the error of the block match between the anchor frame and the target frame. Essentially the Lucas-Kanade algorithm warps the block until it fits the target frame. If the initial estimate is wrong, then the algorithm can diverge and incorrectly move/warp the block. The algorithm is very sensitive to movement and requires a good estimate on the location of the block. This is why an initial block search is used. The output of the Lucas-Kanade algorithm is the p matrix and rms error. If the rms error is greater than 4, it is assumed that the match is not good and hence the current target frame is not used. This means that the data will not be passed on to the image registration and the EKF will not be updated until the next frame. 40 3.2.3 Selecting Feature Points Once the Image is aligned from the affine image alignment, the p matrix is obtained from the Lucas-Kanade algorithm. The position of x and y pixel movement can be calculated by, xpix = astartX − P1,3 − 1 ypix = astartY − P2,3 − 1 The features selected are the four corners of the five blocks. This provides 20 feature points spread throughout the image. In the anchor frame the 64x64 pixel blocks are chosen from the optimal position to best represent the expected movement, see Section 3.2.1. For all experiments in this thesis the blocks had a separation of 100 pixels in the anchor frame. 
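One simple way to realize this block placement is sketched below in MATLAB, under the assumption that "optimal" means shifting the whole block pattern opposite to the predicted pixel motion so that the blocks stay in view as long as possible; the thesis does not spell out the exact placement rule, so treat this as an illustration. Only the block size, separation and image size follow the values used in the experiments.

```matlab
% Lay out five 64x64 blocks in the anchor frame: one central block plus
% four offset by the chosen separation, with the whole pattern shifted
% against the predicted pixel motion so the blocks remain in the next
% frame.  The shifting rule is an assumption made for illustration.
imgW = 496; imgH = 496;       % image size used in most experiments
bs   = 64;                    % block size (pixels)
sep  = 100;                   % separation between blocks (pixels)
pred = [180; -40];            % predicted pixel motion [dx; dy] from the INS

centre  = [imgW; imgH]/2 - pred/2;               % shift against the motion
offsets = [0 0; -sep 0; sep 0; 0 -sep; 0 sep]';  % cross-shaped layout (2x5)

topleft = round(repmat(centre - bs/2, 1, 5) + offsets);
% Clamp so every block stays inside the anchor frame.
topleft(1,:) = min(max(topleft(1,:), 1), imgW - bs + 1);
topleft(2,:) = min(max(topleft(2,:), 1), imgH - bs + 1);
disp(topleft');               % top-left corner of each block [x y]
```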
In the future the separation could be a variable that is tuned using the estimated motion. This will allow for faster motions, by decreasing the separations and increasing the allowed movement per frame, or increasing the sensitivity of rotation by increasing the separations. By increasing the separations, the features at the edges will change at a greater rate than the center features when rotations occur. The anchor points, appoints, are selected by taking the four corners of each block. astartX and astartY are the x and y positions of the top left corner of the current block. blocksizeX and blocksizeY are the x and y size of the blocks. For all experiments in this research, the block size was 64 pixels for both x and y. 41 appoints =  astartX astartX blocksizeX + astartX blocksizeX + astartX astartY blocksizeY + astartY blocksizeY + astartY astartY  (3.30) The target points are calculated by selecting the four corners of each target block match. Taking these points from the anchor block and multiplying them by M , see Equation 3.32, the warped points can be calculated. M is calculated using the p matrix from the output of the Lucas-Kanade algorithm. From here the target frame features can be extracted, see Equation 3.34. M =  P1,1 + 1 P1,2 P1,3 P2,1 P2,2 + 1 P2,3 0 0 1  (3.31) warp_points = M ∗  appoints1,1 appoints1,2 appoints1,3 appoints1,4 appoints2,1 appoints2,2 appoints2,3 appoints2,4 1 1 1 1  (3.32) image1_points =  astartX astartY astartX + blocksizeX astartY astartX astartY + blocksizeY astartX + blocksizeX astartY + blocksizeY  (3.33) 42 image2_points =  warp_points1,1 warp_points2,1 warp_points1,2 warp_points2,2 warp_points1,3 warp_points2,3 warp_points1,4 warp_points2,4  (3.34) 3.3 Image Registration Image registration is an iterative process based on feature matching between a model of the scene and points from it's image. It searches for a transformation Qdata/model that makes the data features match the model features within a given error. The model used for this experiment is derived from the first image. In order for this to work a number of assumptions have to be made. First, the location of the selected points in reference to the camera frame must be known. For this application the camera is taking pictures of a wall. The wall is assumed to be flat and the starting orientation of the wall with respect to the vehicle must be known. For this research the vehicle always starts parallel to the wall. For a pipe the assumption that the wall is flat isn't valid, but the general surface of the pipe wall is still known and the curved surface geometry has to be taken into account. If this cannot be done, then another sensor, such as a rangefinder is required to scan the surface of the pipe to obtain a range measurement for the wall. If the surface is relatively flat or uniform, then a set of low density range measurements will suffice. If the surface is complex with changing depths, then a high density scan will be needed. 43 Figure 3.3: Image Registration The Lucas-Kanade algorithm is used to match groups of points from the two images, allowing the algorithm to calculate the change in the camera position between the images. The difference between the points are caused by the difference of the camera position between these images, see Equation 3.35 and 3.36. The change in camera position can be estimated using Equations 3.37 and 3.38. 
Equations 3.37 and 3.38 require a minimum of three matching points to work, but 20 points were used in all the experiments. Because more than three matching points were available, a least squares method was used to calculate \Delta T and \Delta R, which minimizes the error caused by noise. \Delta T and \Delta R are then used to update the translation and rotation values for the next iteration. This is repeated until 5 iterations have passed. The final rotation and translation provided by the image registration are directly related to the camera's movement and are fused with the INS data using an EKF.

\delta x_i = x_i(R^0, T^0) - x_i^{actual} \quad (3.35)

\delta y_i = y_i(R^0, T^0) - y_i^{actual} \quad (3.36)

\delta x_i \approx \frac{f}{Z_i}\Delta T_1 - \frac{f X_i}{Z_i^2}\Delta T_3 - \frac{f X_i Y_i}{Z_i^2}\Delta R_1 + \frac{f (X_i^2 + Z_i^2)}{Z_i^2}\Delta R_2 - \frac{f Y_i}{Z_i}\Delta R_3 \quad (3.37)

\delta y_i \approx \frac{f}{Z_i}\Delta T_2 - \frac{f Y_i}{Z_i^2}\Delta T_3 - \frac{f (Y_i^2 + Z_i^2)}{Z_i^2}\Delta R_1 + \frac{f X_i Y_i}{Z_i^2}\Delta R_2 + \frac{f X_i}{Z_i}\Delta R_3 \quad (3.38)

\Delta T = \begin{bmatrix} \Delta X \\ \Delta Y \\ \Delta Z \end{bmatrix} \quad (3.39)

\Delta R = \begin{bmatrix} \Delta\theta \\ \Delta\phi \\ \Delta\psi \end{bmatrix} \quad (3.40)

The major problem with using image registration was providing a confidence on how accurate its results are. There are two places that affect the results of the image registration. The first is the feature matching. If the Lucas-Kanade algorithm fails to provide a correct match of the block, then the change in features will have little meaning and the results obtained from the image registration will be meaningless. For this reason a threshold was used to dictate whether the feature match is successful or not. The second place that errors can come from is the image registration itself. If the motion estimation is inadequate, then the algorithm may diverge and provide an incorrect motion. To examine the effects of both of these issues, a set of simulated experiments was created, see Section 4.4.

3.4 EKF Technique

The EKF is used to fuse the camera data and the INS data together. The state variables for the EKF are shown in Equation 3.41.

X_k = \begin{bmatrix} x_k & v_{x_k} & y_k & v_{y_k} & z_k & v_{z_k} & \theta_k & \phi_k & \psi_k \end{bmatrix}^T \quad (3.41)

where x, y, z are the displacements traveled in the world coordinate frame, v_x, v_y, v_z are the current velocities in the world coordinate frame, and \theta is the pitch, \phi the roll and \psi the yaw.

The prediction step is essentially dead reckoning with the INS sensor. Equation 3.42 describes the relationship for updating the state variables during the prediction step, i.e. projecting the state ahead, see Section 1.4.2.

\hat{X}_{k+1|k} = f(\hat{X}_{k|k}, U_k, k) \quad (3.42)

f(\hat{X}_{k|k}, U_k, k) = \begin{bmatrix} x_k + T v_{x_k} \\ v_{x_k} + T A_1 \\ y_k + T v_{y_k} \\ v_{y_k} + T A_2 \\ z_k + T v_{z_k} \\ v_{z_k} + T A_3 \\ \theta_k + T \dot{\theta}_{ins} \\ \phi_k + T \dot{\phi}_{ins} \\ \psi_k + T \dot{\psi}_{ins} \end{bmatrix} \quad (3.43)

A_1 = A_x\cos\theta\cos\psi + A_y(\sin\theta\sin\phi\cos\psi - \cos\phi\sin\psi) + A_z(\sin\phi\sin\psi + \sin\theta\cos\phi\cos\psi)

A_2 = A_x\cos\theta\sin\psi + A_y(\sin\theta\sin\phi\sin\psi + \cos\phi\cos\psi) + A_z(\sin\theta\cos\phi\sin\psi - \sin\phi\cos\psi)

A_3 = -A_x\sin\theta + A_y\cos\theta\sin\phi + A_z\cos\theta\cos\phi - A_g

where A_x, A_y, A_z are the measured accelerations from the INS with respect to the vehicle, A_g is the acceleration due to gravity with respect to the world coordinate system, \dot{\theta}_{ins}, \dot{\phi}_{ins} and \dot{\psi}_{ins} are the angular velocities of pitch, roll and yaw measured by the INS, and T is the time interval of the measurements coming from the INS.

Equation 3.44 describes the relationship used to calculate the error covariance, i.e. projecting the error covariance ahead.
Pk+1|k = FkPk|kF Tk + LkQkL T k (3.44) Fk = ∂f ∂X ∣∣∣∣∣ X=X̂k|k =  1 T 0 0 0 0 0 0 0 0 1 0 0 0 0 F2,7 F2,8 F2,9 0 0 1 T 0 0 0 0 0 0 0 0 1 0 0 F4,7 F4,8 F4,9 0 0 0 0 1 T 0 0 0 0 0 0 0 0 1 F6,7 F6,8 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1  (3.45) F2,7 = T (−Axsinθcosψ + Aycosθsinφcosψ + Azcosθcosφcosψ) F2,8 = T (Ay(sinθcosφcosψ + sinφsinψ) + Az(cosφsinψ − sinθsinψcosψ)) F2,9 = T (−Axcosθsinψ+Ay(−sinθsinφsinψ−cosφcosψ)+Az(sinφcosψ−sinθcosφsinψ)) F4,7 = T (−Axsinθsinψ + Aycosθsinφsinψ + Az(cosθcosφsinψ − sinφcosψ)) F4,8 = T (Ay(sinθcosφsinψ − sinφcosψ) + Az(−sinθsinφsinψ − cosφcosψ)) F4,9 = T (Axcosθcosψ +Ay(sinθsinφcosψ − cosφsinψ) +Az(sinθcosφcosψ + sinφsinψ)) 48 F6,7 = T (−Axcosθ − Aysinθsinφ− Azsinθcosφ) F6,8 = T (Aycosθcosφ− Azcosθsinφ) L = T  0 0 0 0 0 0 cosθcosψ sinθsinφcosψ − cosφcosψ sinφsinψ + sinθcosφcosψ 0 0 0 0 0 0 0 0 0 cosθsinψ sinθsinφsinψ + cosφcosψ sinθcosφsinψ − sinφcosψ 0 0 0 0 0 0 0 0 0 −sinθ cosθsinφ cosθcosφ 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1  (3.46) The second part of the EKF is the update step. The update Equations are 3.47, 3.48, 3.50 and 3.51. The purpose of the update step is to correct the error build up from the INS using data from the cameras. Equation 3.47 describes the update estimate using the measurement data, this is the equation that updates the state variables. X̂k+1|k+1 = X̂k+1|k +Kk+1Ỹk+1 (3.47) Ỹk+1 = zcamera − h(X̂k+1|k ) (3.48) 49 h(Xk) =  4X 4Y 4Z 4θ 4φ 4ψ  =  Xk+1|k + (Tcam − Tkal)Vxk+1|k −Xanchor Yk+1|k + (Tcam − Tkal)Vyk+1|k − Yanchor Zk+1|k −+(Tcam − Tkal)Vzk+1|kZanchor θk+1|k − θanchor φk+1|k − φanchor ψk+1|k − ψanchor  (3.49) 4X,4Y,4Z = the movement in the X, Y, Z from the anchor frame to the target frame 4θ,4φ,4ψ = the rotation about the X, Y, Z axis from the anchor frame to the target frame Tcam − Tkal = the times passed since the last prediction step of the kalman filter Equation 3.50 describes the update of the error covariance. Pk|k = (I −Kk+1Hk+1)Pk+1|k (3.50) Kk+1 = Pk+1|kHTk+1 Hk+1Pk+1|kHTk+1 +Rk+1 (3.51) 50 H =  1 0 0 0 0 0 (Tcam − Tkal) 0 0 0 0 0 0 1 0 0 0 0 0 (Tcam − Tkal) 0 0 0 0 0 0 1 0 0 0 0 0 (Tcam − Tkal) 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1  (3.52) Originally the EKF was to include the image registration equations so that the obser- vation matrix, h(Xk) will include the feature points and not the change in displacement and orientation. The problem with this is that it complicated the observation matrix. This became a major issue because the jacobine of the observation matrix, H had to be calculated. The complexity of the model meant that it was not possible to code by hand and the mathematical package Mupad hit a software limit of 25,000 characters when trying to write the code. Attempts were made to break up the code such that Mupad could then transfer the equations to code, but this was not successful. Attempts were also made to simplify the model making the image registration more local and then con- verting the results to the world coordinate frame from within the EKF. This led to other complications and as such the EKF model illustrated above was used. By removing the image registration out of the EKF it simplified the observation matrix, but also made it difficult to assign a value to the covariance of the sensor input, i.e. the change in displacement and orientation. 
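Putting the prediction and update steps together, a bare-bones MATLAB skeleton of the filter loop is shown below. The nine-element state matches Equation 3.41, but the process and measurement models here are crude placeholders (constant velocity, direct observation of the pose states) rather than the full expressions of Equations 3.43 to 3.52, and the fixed camera covariance R is exactly the value that is difficult to assign, as discussed above.

```matlab
% Bare-bones EKF loop in the spirit of Section 3.4.  Placeholder models
% only; the real f, F, L, h and H are given by Eqs. 3.43-3.52.
dt = 0.012;                    % INS update period (12 ms)
X  = zeros(9,1);               % [x vx y vy z vz theta phi psi]
P  = eye(9) * 1e-3;            % state covariance
Q  = eye(9) * 1e-4;            % process noise (placeholder value)
R  = eye(6) * 1e-2;            % fixed camera measurement noise (assumed)

H  = zeros(6,9);               % simplified observation of x, y, z and angles
H(1,1) = 1; H(2,3) = 1; H(3,5) = 1;
H(4,7) = 1; H(5,8) = 1; H(6,9) = 1;

for k = 1:100
    % --- prediction step: dead reckoning between camera frames ---
    F = eye(9);
    F(1,2) = dt; F(3,4) = dt; F(5,6) = dt;   % positions integrate velocities
    X = F * X;                               % stands in for f(X, U, k)
    P = F * P * F' + Q;

    % --- update step: runs only when a camera frame is processed ---
    if mod(k, 20) == 0                       % e.g. frame skip 20
        z = zeros(6,1);                      % measured pose change (stub)
        y = z - H * X;                       % innovation
        K = P * H' / (H * P * H' + R);       % Kalman gain
        X = X + K * y;
        P = (eye(9) - K * H) * P;
    end
end
```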
In future work this issue could be revisited and a different filtering algorithm can be used, such as a UKF or particle filter. This will bypass the need to calculate the jacobine of the observation matrix and as such a more complicated 51 model which includes the image registration can be used. 52 4 Experimental Setup and Results 4.1 Simulator The simulator was designed to represent the movement of a pipe inspection robot inside a water main. Figure 4.1 illustrates the dimensions of the simulator and displays the movement of both the cart and platform. This movement provides motion characteristics that will be experienced by free swimming motion of a pipe inspection robot. The platform can clearly be seen in Figure 4.2. Figure 4.1: Schematic of the Simulator The mechanical structure of the simulator consists of a main platform that holds a dual- rail track system for the cart. Two tall pillars, one at each end of the platform, support the track. These pillars can be adjusted to change the track length by sliding them in a set of slots cut into the platform. The two ends of the track can move along a slot located in the middle of each pillar so that the inclination of the track can vary. By adjusting 53 the pillars and the track, a variety of different lengths and angles can be achieved with very little effort. The track consists of two parallel rails that extend between the two tall pillars. This track is a low friction Ignus rail that keeps the cart in place using a set of Teflon-coated guides that provide a smooth sliding motion for the cart. Figure 4.2: Picture of the Simulator The cart is large enough to include a number of sensors including a camera, INS and laser rangefinder. The position and orientation of the cart is controlled with a set of JR Kerr servo control boards that provide multi-axis movement. The simulator consists of a linear belt-driven actuator and a rotation mechanism that are controlled using the JR Kerr motion boards. Another set of boards provide both digital and analog IO to communicate with a variety of sensors. Both the cart and platform contain limit switches that allow the device to auto level itself and adapt to any changes made by the user to the length and/or angle of the track. The cart is moved along the track using a linear belt-driven actuator. The belt goes around two idle pulleys that provide a simple tension system for adjusting the tension of the belt when the inclination and hence the length of the track changes. For this 54 reason, two shorter pillars are positioned between the taller pillars. Each short pillar holds one of the idle pulleys. The pillars are mounted on the main platform that can rotate about two steel shafts, which are aligned along the center axis. The platform rotates about the center axis using a pulley system. As a result of the three different motions mentioned above, the cart can move along the inclined rail and swing, simulating a variety of different motion patterns. The linear motion along the track and the rotation of the platform about the central axis are provided by two servomotors. The servomotors are controlled using the multi-axis motion controller in real time. The controller is modular, consisting of several indepen- dent boards that communicate through an RS485 network. The network is terminated by an RS232 card that is connected to a PC and controls all other boards in the network. 
The network can support up to 32 modules, but it currently consists of five boards in- cluding two servo controllers, two I/O boards each with 24 digital channels and six 8-bit resolution analog channels, and an RS232 board. The control board has it's own timer that allows time synchronization for all boards that are connected to it. The software of the simulator is divided into a Human Machine Interface (HMI), motion control, and data analysis. The HMI allows the user to have a wide range of simulation patterns, by changing the velocity, acceleration, and number of incremental movements along the track as well as the starting angle, angular velocity, and acceleration of the ro- tating platform. Since the length and angle of the track can vary, the program determines the track length and calibrates the simulator automatically before every simulation. This is accomplished by moving the cart along the track at a slow speed until a limit switch on the cart hits the pillars at each end of the track. Then, the software determines the position of the cart in the desired starting location. The next step is to level the main platform. The simulator uses the on/off signal of a mercury switch to level the main platform horizontally. Once the platform is leveled, the platform rotates to a desired 55 starting angle and the experiment starts. The motion control software simultaneously records the position data measured by the optical encoders on the servomotors as well as the localization and visualization sensors. The data is recorded into a computer file for post-processing. Matlab scripts are used to import and process this data. The program uses a Denavit-Hartenburg (DH) model to transform all recorded data into a common coordinate frame so that positions obtained from the simulator can be compared to those obtained from the localization sensors. 4.2 Sensors The localization system consists of an INS and a digital camera. The INS is capable of tracking the linear motion and rotation of an object with a very low processing power. This is very important in, for example, an AUV or a pipe inspection robot that runs on limited battery power. The INS is a MicroStrain 3DM-GX1 which utilizes a tri-axial DC accelerometer, a tri-axial magnetometer and a tri-axial gyroscope [51]. It uses an embedded microprocessor to fuse the data from the sensors to calculate the static and dynamic orientations. It is noted that the magnetometers will not be functional inside a pipe; hence, the use of an INS with gyroscopes is essential for the measurement of changes in orientation due to the rotation of the pipe inspection robot. For this reason the static measurements for angles are neglected and the angle rates are used instead. This represents the data coming from the gyros instead of a combination of the gyro and magnetometers. The accuracy of the gyros is reported as 0.7 degrees/sec. The accelerometers track accelerations along the x, y and z axes with an accuracy of 0.0981 m/s2. Theoretically, the velocity as well as the travel distance (and hence the new position of the sensor) can be computed by finding the integral of the measured accelerations with respect to time. However, due to the well-known drift problem of 56 the INS, a small error in acceleration can accumulate over time and corrupt the results. Therefore, the INS cannot be considered a reliable means for the measurement of position over an extended period of time. The behavior of the INS used was investigated through several experiments. 
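The scale of this drift is easy to see with a short numerical example: double-integrating a constant acceleration bias equal to the quoted accelerometer accuracy of 0.0981 m/s^2 over only two seconds already produces a noticeable position error. The MATLAB sketch below illustrates the mechanism and is not a model of the actual sensor.

```matlab
% Double-integrate a constant accelerometer bias to show how quickly an
% uncorrected INS drifts in position.  The bias equals the quoted
% accelerometer accuracy (0.0981 m/s^2); everything else is illustrative.
dt   = 0.012;                 % 12 ms update rate of the INS
t    = 0:dt:2;                % two seconds of stationary data
bias = 0.0981;                % constant acceleration error (m/s^2)

a = bias * ones(size(t));     % measured acceleration (true value is zero)
v = cumsum(a) * dt;           % first integration  -> velocity error
p = cumsum(v) * dt;           % second integration -> position error

fprintf('velocity error after %.1f s: %.3f m/s\n', t(end), v(end));
fprintf('position error after %.1f s: %.3f m\n',  t(end), p(end));
```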
Figure 4.3 shows 20 experiments with the INS at rest. The measurements of the INS were recorded for about 2 seconds for each experiment. Each of the 20 experiments yielded completely different results leading to unpredictable sensor behavior. Nevertheless, with an update rate of 12 milliseconds, the INS can be used to determine changes in position over very short periods of time and should make an excellent prediction of the movement of the camera [13]. Figure 4.3: INS Stationary Characteristics The camera is a Pixelink PL-A741 with a wide angle lens. The camera is used to take pictures of the featured wall, which represents the inside surface of a pipe and is often void of any easily-matched features. Figure 4.4 shows two pictures, one that is taken when the camera is close to the wall and another when the camera is far from the wall. This illustrates the kind of pictures that the algorithm has to deal with. 57 Figure 4.4: Image Examples The Pixelink PL-A741 is a Firewire (IEEE 1394) monochrome camera with a maximum resolution of 1280 x 1024. This camera is fully configurable, including the resolution. For most tests, a resolution of 496 x 496 was used because it provided a sufficient number of frames per second when the camera tests were being performed close to the wall. By having a high number of frames per second, this allowed for quick movements of the camera when it was close to the wall. It is noted that during fast motions, objects in the FOV can move almost the full camera view and as such don't remain in more than two camera images. An optional laser rangefinder provides a distance measurement of the camera and the wall. Since the laser rangefinder has a limited effective range, it was only usable in tests where the camera was close to the wall, i.e. the high speed tests. For the long three-dimensional tests the laser was not used because it was out of range. 4.3 Result Categories The results are broken up into three categories. This includes Simulated Motions, Section 4.4, Computer Generated Results, Section 4.6, and Simulator Experiments, Section 4.7. 58 Each category of results are different. The Simulated Motions are completely computer generated. So that means the motions as well as the sensor readings are all computer gen- erated. These experiments were preformed to test the image registration under varying simulated conditions. The Computer Generated Results represents data captured from the simulator for mo- tions of the cart. The sensor data for the INS and camera are then generated and noise is added to this data. The reason for using the cart trajectory data is because the results from this section can be directly compared to the third section, Real world results. The Simulator Experiments use real data for the camera and INS. This data is compared to the actual location of the cart which is gathered from the simulator. 4.4 Simulated Motions Over 100,000 test patterns were created to find a relationship between error in the image registration and the error of the motion estimation. The test patterns are made up of a variety of different motions, from single parameter motions to motions involving the full 6 degrees of freedom. Just over 1% of the motions involved are made up of predetermined motions, while the remaining motions are generated from a normal distribution. Of these randomly generated motions, 25% of them involve only translations, i.e. a combination of x, y and z movements, and another 25% involve only rotations, i.e. a combination of pitch, roll and yaw. 
The remaining 50% of the randomly generated motions are made up of all six motions, i.e. x, y, z, pitch, roll and yaw. Of the translations, they are made up of 3 equally-distributed sample sets that include standard deviations of 2mm, 2.66mm and 4 mm. All of the angles are made up of standard deviations of 0.33 degrees. The reasoning behind using small motions was that the motions the image registration will 59 most likely come across will be small because the time laps between images are small. So this sample set represents the most likely scenarios in real applications. The results of the simulated movements were processed in one of three ways. Case 1 represents the image registration trying to solve for all 6 motions. Case 2 represents the three translations as well as the pitch angle. The reason for using pitch instead of roll and yaw was because it would have the most dissimilar motion with respect to the camera. In a pitch change the features would rotate around the center axis of the picture. A roll movement would mostly look like a Z movement with small changes in the distance the features are away from each other and a yaw movement would mostly look like a X movement but with small changes in the separation of the features. Case 3 is the three translations. This eliminates many potential problems of divergence, but requires a good estimate of the angles to work properly. Each of the three cases were examined with four different scenarios using the same data. These four scenarios are different combinations of adding noise and motion prediction. The noise that was added had a normal distribution with a standard deviation of one third of a pixel. This represents a 99% confidence that the noise will be within one pixel. Since the feature matching algorithm used has sub-pixel accuracy, this is a reasonable range for noise. When a motion prediction is used, a random amount of noise from a normal distribution is added to the actual motion. This noise has standard deviations of two thirds of a mm and 0.033 degrees for rotations. This scenario represents an excellent estimation and is the best that can be expected. When no motion prediction is used, the motion estimation is assumed 0, i.e. the object did not move or rotate from the last image. This scenario represents a poor estimation. The first set of results represents the best case scenario of an excellent motion prediction and no noise, see Table 4.1 . Under these conditions Case 1 performed flawlessly, while Case 2 and Case 3 didn't perform very well. Figure 4.5 shows the relationship between 60 the image registration error and the motion error. Both Case 1 and Case 2 show a strong linear relationship between the process error and the actual error, i.e. as the process error increases so does the motion error. However Case 3 does not show a linear relationship like Case 1 and Case 2, instead the results are pooled in a large central area. This is most likely caused by errors in the rotations, since the results between Case 2 and Case 3, though different in figures, produced results in the table that are almost identical. 
Case   σ²x (mm)     σ²y (mm)     σ²z (mm)     σ²pitch (deg)   σ²roll (deg)   σ²yaw (deg)
1      5.2521e-20   8.4066e-20   2.3824e-20   1.0636e-23      1.7149e-23     5.2987e-25
2      5.1124       0.3359       5.3206       4.4471e-07      -               -
3      5.3253       0.3360       5.3609       -                -               -
Table 4.1: Image Registration Results with Motion Prediction and No Noise

Figure 4.5: Image Registration Results with Motion Prediction and No Noise (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)

The next set of test results, see Table 4.2, represents no noise and no motion prediction. Here the results are very similar to those obtained with motion prediction. Once again Case 1 performs flawlessly and Case 2 and Case 3 do not perform very well. Additionally, all three cases performed worse than when there was a prediction. This is to be expected: with a prediction, the estimate sent to the image registration is close to the solution, which decreases the likelihood of divergence. Another interesting observation is that the results in Figure 4.6 show a distinct separation between results in all three cases. This separation most likely indicates which motions were small and which were large, because the small motions have closer estimates when all estimates are assumed zero. In addition, the results that performed very well in Case 2 and Case 3 are most likely the motions that had little or no rotation. The reason this does not show up when there is an excellent prediction is that a good prediction can still have an error in the rotation, whereas a motion with no rotation and an estimate of 0 is a perfect estimate of the rotation. It should be noted that roughly 25% of the motions did not contain any rotation at all.

Case   σ²x (mm)     σ²y (mm)     σ²z (mm)     σ²pitch (deg)   σ²roll (deg)   σ²yaw (deg)
1      1.4019e-17   1.4547e-16   3.7700e-17   2.7106e-21      2.9764e-20      1.4102e-22
2      10.2344      0.05797      10.1722      3.3285e-07      -               -
3      10.2152      0.05780      10.2427      -                -               -
Table 4.2: Image Registration Results with No Motion Prediction and No Noise

The results in Table 4.3 represent what can be expected in practice when there is an excellent motion estimation, because here varying amounts of noise were added as described above. This noise can cause the image registration to diverge even when there is an excellent estimation. The results from this test clearly show the benefits of Case 2 and Case 3 over Case 1. When there is no noise, Case 1 performs flawlessly, but when noise is added, it adjusts the rotations to find a good fit to this noise and as a result degrades the solution. Case 2 and Case 3, on the other hand, are far less affected by this noise because the translations cause a distinct motion in the pixels that does not resemble noise. When there are 6 degrees of freedom, the fit can continue to improve even though the results are degrading, i.e. there is more freedom to adjust the movement to fit the data, but this movement is not necessarily the correct motion.

Figure 4.6: Image Registration Results with No Motion Prediction and No Noise (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)

Case   σ²x (mm)     σ²y (mm)     σ²z (mm)     σ²pitch (deg)   σ²roll (deg)   σ²yaw (deg)
1      11.3449      0.8301       11.8220      1.9147e-06      1.6652e-04      1.6597e-04
2      0.07170      0.08844      0.07479      9.3443e-07      -               -
3      0.07340      0.08689      0.07329      -                -               -
Table 4.3: Image Registration Results with Motion Prediction

Figure 4.7: Image Registration Results with Motion Prediction (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)

The final set of results is displayed in Table 4.4 and Figure 4.8. This represents a bad case scenario where there is noise and no estimation.
Here all three cases perform poorly. Case 1 seemed to perform the same as when there was a good estimation. This indicates that the problem is inherent to the noise and not the estimation; otherwise there would be a difference between these results.

Case   σ²x (mm)     σ²y (mm)     σ²z (mm)     σ²pitch (deg)   σ²roll (deg)   σ²yaw (deg)
1      11.3387      0.8230       11.8278      1.8962e-06      1.6664e-04      1.6585e-04
2      5.1149       0.4189       5.3248       1.3814e-06      -               -
3      5.3262       0.4192       5.3617       -                -               -
Table 4.4: Image Registration Results with No Prediction

Figure 4.8: Image Registration Results with No Prediction (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)

4.5 Experiment Parameters

For all 3D test examples, the same test was used. This test pattern starts with the platform in a level position. The cart quickly accelerates to a constant velocity. Once the cart starts to move, the platform rotates quickly towards the wall and then maintains its position. The velocity of the cart is very slow; this allows the localization to be tested over long periods. This test pattern evaluates the accuracy of the algorithm over different speeds, i.e. the start of the test is much faster than the rest of the test. It is noted that fast motions, which were tested in the 2D tests, were not possible in the 3D tests. The reason for this was that the iris of the camera lens had to be closed to increase the depth of focus in the absence of an auto-focus camera. In the future, further experiments could be performed with an auto-focus camera system.

For all of the 3D experiments, different cases were used for the configuration of the image registration, see Section 4.4. It is noted that for the real world and computer generated results, only Case 1 and Case 3 are displayed in this thesis. Case 2 was performed, but did not show any advantages over Case 1 and Case 3 and as such was removed from this thesis. For each case there is a frame skip category. The frame skip number describes how often a camera image is processed, see Table 4.5. The more time that passes between images, the more likely the motion estimation will degrade and the more likely the image registration will diverge. On the other hand, the greater the amount of time that passes, the larger the motion is and as such the greater the motion-to-noise ratio will be. For Case 1, Case 2 and Case 3, frame skips 1, 5, 10, 20 and 30 were performed and are summarized in Tables 4.6 and 4.7. Frame skips 1 and 20 were selected to be displayed in detail in the thesis. It is noted that frame skips 5 and 10 produced results that were in between frame skips 1 and 20. Also, frame skip 30 failed to complete in all instances because of the large time lapse. This was most likely caused by the motion estimation degrading to the point that the feature matching was not possible or the image registration diverged.

Frame Skip Number                   1       5       10      20      30
Delay between Camera Images (s)     0.021   0.105   0.210   0.42    0.63
Table 4.5: Frame Skip Number to Time Conversion

4.6 Computer Generated Results

All of the computer generated results use the same motion data captured from the actual simulator. Only the sensor (camera and INS) data is generated. This means the actual location data is collected from the simulator, and the camera, INS and EKF data are generated to represent the expected sensor data given the actual location of the cart. This allows a direct comparison of the computer generated results to the real world results. The results are separated into two sections, Case 1 and Case 3.
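Generating the synthetic sensor readings amounts to corrupting the true quantities with zero-mean Gaussian noise. A minimal MATLAB sketch of this step is shown below; the one-third-pixel feature noise follows Section 4.4, while the INS noise level is an assumed placeholder since the exact figure used is not restated here.

```matlab
% Corrupt ideal sensor quantities with Gaussian noise to create the
% computer-generated camera and INS data.  The 1/3-pixel feature noise
% follows Section 4.4; the INS noise level is an assumed placeholder.
rng(1);                              % reproducible example
nFeatures  = 20;
truePixels = rand(nFeatures, 2) * 496;        % ideal feature locations
trueAccel  = [0.05; -0.02; 9.81];             % ideal accelerations (m/s^2)

sigmaPix   = 1/3;                    % one third of a pixel (Section 4.4)
sigmaAccel = 0.0981;                 % assumed, matching the quoted accuracy

noisyPixels = truePixels + sigmaPix   * randn(size(truePixels));
noisyAccel  = trueAccel  + sigmaAccel * randn(size(trueAccel));

disp(noisyPixels(1:3, :));           % first few noisy feature locations
disp(noisyAccel');
```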
These cases are the same cases described in Section 4.4. For each case frame skip 1 and 20 were examined, see Section 4.5 for further details. It is noted that simulations were run for Cases 1,2 and 3 for frame skip 1,5,10,20 and 30. The summary of these tests are presented in Table 4.6. Frame Skip Case 1 Case 2 Case 3 1 Pass Pass Pass 5 Pass Pass Pass 10 Pass Pass Pass 20 Pass Pass Pass 30 Pass Pass Pass Table 4.6: Computer Generated Results Summary Case 1 Case 1 represents all 6 degrees of freedom within the image registration algorithm. In Section 4.4, the results indicated that this case performed flawlessly without noise, but quickly diverged when noise was introduced. The computer generated results support this as well. Figures 4.9, 4.11 and 4.13 show the displacement results with every frame being processed. These results demonstrate that the image registration produced a better result than the INS alone, but was very susceptible to the noise in the sensor data. Figures 4.10, 4.12 and 4.14 show the velocity results. These figures clearly show the effects of the noise over a small period of time. The velocity measurements are scattered and unusable. 67 The results for the orientations are displayed in Figures 4.15, 4.16 and 4.17. From these figures it is clear that the INS provides a far better representation of the true orientation than the camera. Figure 4.9: Simulated X Distance, Case 1, Frame Skip 1 Figure 4.10: Simulated X Velocity, Case 1, Frame Skip 1 68 Figure 4.11: Simulated Y Distance, Case 1, Frame Skip 1 Figure 4.12: Simulated Y Velocity, Case 1, Frame Skip 1 Figure 4.13: Simulated Z Distance, Case 1, Frame Skip 1 69 Figure 4.14: Simulated Z Velocity, Case 1, Frame Skip 1 Figure 4.15: Simulated Pitch, Case 1, Frame Skip 1 Figure 4.16: Simulated Roll, Case 2, Frame Skip 10 70 Figure 4.17: Simulated Yaw, Case 1, Frame Skip 1 The performance of the image registration with six degrees of freedom is greatly increased when larger motions occur between images. This is clearly illustrated with the frame skip 20 results, see Figure 4.18, 4.19, 4.20, 4.21, 4.22, 4.23, 4.24, 4.25 and 4.26. It's also important to note that the results are a large improvement, even though the motion estimation being passed to the image registration will be a lot worse. This is because the algorithm is relying on the INS for 20 times longer than in the above results. Figure 4.18: Simulated X Distance, Case 1, Frame Skip 20 71 Figure 4.19: Simulated X Velocity, Case 1, Frame Skip 20 Figure 4.20: Simulated Y Distance, Case 1, Frame Skip 20 Figure 4.21: Simulated Y Velocity, Case 1, Frame Skip 20 72 Figure 4.22: Simulated Z Distance, Case 1, Frame Skip 20 Figure 4.23: Simulated Z Velocity, Case 1, Frame Skip 20 Figure 4.24: Pitch, Case 1, Frame Skip 20 73 Figure 4.25: Simulated Roll, Case 1, Frame Skip 20 Figure 4.26: Simulated Yaw, Case 1, Frame Skip 20 Case 3 In the third case for the image registration, only the displacements are calculated. This means that the angles are assumed correct in the motion estimation. This case clearly shows that the noise introduced into the simulated camera data has a much smaller effect on the overall results. The velocity measurements in Figures 4.28, 4.30 and 4.32 aren't nearly as chaotic as Figures 4.10, 4.12 and 4.14. 
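Restricting the registration to translations is also easy to express directly: with the rotation terms of Equations 3.37 and 3.38 held at zero, each feature contributes two linear equations in ΔT, and the over-determined system is solved by least squares. The MATLAB sketch below illustrates this reduced Case 3 update for a handful of synthetic features; the focal length and feature values are arbitrary assumptions for the example.

```matlab
% Case 3 image-registration update: solve Eqs. 3.37-3.38 for the
% translation dT only (rotation terms set to zero) by least squares.
% Feature positions and residuals are arbitrary synthetic values.
f   = 1200;                          % focal length in pixels (assumed)
XYZ = [  50  -40  300;               % [Xi Yi Zi] of each feature (mm)
        -30   60  310;
         80   20  295;
        -60  -70  305];
dx  = [ 0.8; -0.3;  1.1;  0.2];      % measured x residuals (pixels)
dy  = [-0.4;  0.6;  0.1; -0.7];      % measured y residuals (pixels)

n = size(XYZ, 1);
A = zeros(2*n, 3);  b = zeros(2*n, 1);
for i = 1:n
    X = XYZ(i,1); Y = XYZ(i,2); Z = XYZ(i,3);
    A(2*i-1, :) = [ f/Z,   0,  -f*X/Z^2 ];   % x-row, translation part of Eq. 3.37
    A(2*i,   :) = [   0, f/Z,  -f*Y/Z^2 ];   % y-row, translation part of Eq. 3.38
    b(2*i-1)    = dx(i);
    b(2*i)      = dy(i);
end
dT = A \ b;                          % least-squares translation update
fprintf('dT = [%.2f  %.2f  %.2f] (same units as the feature coordinates)\n', dT);
```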
74 Figure 4.27: Simulated X Distance, Case 3, Frame Skip 1 Figure 4.28: Simulated X Velocity, Case 3, Frame Skip 1 Figure 4.29: Simulated Y Distance, Case 3, Frame Skip 1 75 Figure 4.30: Simulated Y Velocity, Case 3, Frame Skip 1 Figure 4.31: Simulated Z Distance, Case 3, Frame Skip 1 Figure 4.32: Simulated Z Velocity, Case 3, Frame Skip 1 76 Figure 4.33: Simulated Pitch, Case 3, Frame Skip 1 Figure 4.34: Simulated Roll, Case 3, Frame Skip 10 Figure 4.35: Simulated Yaw, Case 3, Frame Skip 1 Similar improvements for Case 3 are also seen when comparing the frame skip 20 data to the frame skip 1 data. This further supports the idea that the noise compromises 77 the results when using frame skip 1 with the current speeds. The Case 3 frame skip 20 produced the best results for all of the simulated tests. This supports the results seen in Section 4.4. Figure 4.36: Simulated X Distance, Case 3, Frame Skip 20 Figure 4.37: Simulated X Velocity, Case 3, Frame Skip 20 78 Figure 4.38: Simulated Y Distance, Case 3, Frame Skip 20 Figure 4.39: Simulated Y Velocity, Case 3, Frame Skip 20 Figure 4.40: Simulated Z Distance, Case 3, Frame Skip 20 79 Figure 4.41: Simulated Z Velocity, Case 3, Frame Skip 20 Figure 4.42: Pitch, Case 3, Frame Skip 20 Figure 4.43: Simulated Roll, Case 3, Frame Skip 20 80 Figure 4.44: Simulated Yaw, Case 3, Frame Skip 20 4.7 Results from Real Experiments 2D Modeling Results Early work was performed on testing the feasibility of a motion tracking system with a low feature wall. The motion for the simulator was a 2D motion where the platform was locked in a flat position and the cart moved parallel to the wall. During this test 4 blocks were used with a separation of 64 pixels. A low separation value was used because of the fast paced nature of this test. If a larger separation value was used, some of the blocks would not be in the target frame. This test was designed to see if the block matching algorithm was able to track objects under fast pace motion. Three scenarios were tested: INS only, camera only, and finally INS and camera together. For the Camera results, the entire target frame was searched for the matching blocks. The best matching location was used to calculate the motion between the anchor frame and the target frame. For the camera + INS, the INS was used to predict the locations of the matching blocks in the target frame. Also because there was an estimated motion, the blocks were taken from an optimal location within the anchor frame. It is noted that 81 the actual location is data captured from simulator on the location of the cart carrying the sensors. This data is obtained from optical encoders on the servo motors and will have an error less then 1 mm when including belt flex. Figures 4.45 and 4.46 clearly show that both the camera and the Camera + INS performed well. These figures also show that the Camera + INS combination performed better than just the camera. It is noted that the Camera + INS required only a fraction of the calculations then the camera alone because the whole frame doesn't have to be search and still returned a superior result. The reason for the superior results are clearly indicated in Figure 4.47. Frames 157, 168 and 172 show that the motions surpassed 216 pixels. This means that when the blocks were grabbed from the center of the anchor frame, some or the entire block was missing in the target frame. So even though the entire frame was searched, the ideal match was not possible and the next closest match was taken. 
This match essentially means nothing and the results show a decrease in the performance of the camera. It is also noted that the Camera + INS can detect motions of up to 432 pixels, Figure 4.47 shows that it was able to detect motions of close to 300 pixels. Figure 4.45: Block Matching Results 82 Figure 4.46: Block Matching Results (Zoom) Figure 4.47: Block Matching Results Pixel Movement 3D Model Results The results displayed in this section feature a single test processed in a number of ways. This allows different scenarios to be compared directly since the same data is used in each scenario. The results are separated into 2 different categories: Case 1 and Case 3. These two cases represent different configurations of the image registration algorithm. For further details see Section 3.3. This test builds on what was learned in the 2D motion 83 tests and applies it to a 3D motion. It also relates to the simulated results seen in Section 4.6. The results are categorized into the Actual, INS, EKF and Camera. The actual position data is obtained from the optical positioning sensors on the simulator and there total error should be less then 1 mm. This would include the flex of the belt and flex in the apparatus frame. The INS uses only the data obtained from the INS, while the EKF is a complete fusion of the INS and Camera data, and the camera is the data obtained from the camera with only a INS data being used for prediction. This allows the results of the camera to be compared to both the INS and the EKF. It should also be noted that the actual position is grabbed from the simulator and the results are not infallible. Only the encoders are used to track the motion, so if there is any belt flex or flex in the structure, as well as gear slack, these will not be reflected in the actual position. The most important part of the results is that they follow the trend of the actual position. Frame Skip Case 1 Case 2 Case 3 1 Pass Pass Pass 5 Pass Pass Pass 10 Pass Pass Pass 20 Fail Pass Pass 30 Fail Fail Fail Table 4.7: Simulator Results Summary Case 1 Case 1 with a skip frame 1 did not perform well, see Figures 4.48, 4.49, 4.50, 4.51, 4.52, 4.53, 4.54, 4.55 and 4.56. This was expected because both the simulated motions (see Section 4.4) as well as the computer generated results (see Section 4.6) predicted a poor performance. This is most likely caused by the small motions between camera images. These motions are so small that the noise from the camera is in the same order as the 84 motions. For instance, if the cart is traveling at a velocity of 6 mm/s in the X direction and the camera is approximately 300 mm from the wall, then with a frame skip 1, the motion seen in the X direction for the camera will be about 1.1 pixels. If it is assumed that the standard deviation for the feature detection is one third of a pixel, which is the value used for noise in the simulated results for Sections 4.4 and 4.6, then this will give a signal to noise ratio (SNR) of 3.67. The Rose criterion states that a SNR of at least 5 is needed to distinguish image features at 100% [52]. Additionally, the X directions will experience the greatest motions with respect to the other parameters. This means that both directions Y and Z as well as pitch, roll and yaw will have much smaller SNR. Another good indication of a lot of noise is the velocity measurements, see Figures 4.49, 4.51 and 4.53. The velocity measurements change consistently from one extreme to another for the camera. 
This was also shown in the computer generated results, see Figures 4.10, 4.12 and 4.14. It should also be noted that in the first 10 seconds of the experiment the results were good for the X, Y and Z displacements. The decline in performance after this is most likely caused by a buildup of error in the orientation. This error of orientation will degrade the performance of the INS because the removal of the force of gravity will be wrong and the INS will be indicating accelerations in the wrong direction. This means that the prediction of motion degrades leading to a greater chance of the divergence for the image registration algorithm. Figure 4.48: X Distance, Case 1, Frame Skip 1 85 Figure 4.49: X Velocity, Case 1, Frame Skip 1 Figure 4.50: Y Distance, Case 1, Frame Skip 1 Figure 4.51: Y Velocity, Case 1, Frame Skip 1 86 Figure 4.52: Z Distance, Case 1, Frame Skip 1 Figure 4.53: Z Velocity, Case 1, Frame Skip 1 Figure 4.54: Pitch, Case 1, Frame Skip 1 87 Figure 4.55: Roll, Case 1, Frame Skip 1 Figure 4.56: Yaw, Case 1, Frame Skip 1 Case 1 with a frame skip 20 produced much better results than a frame skip 1, see Figures 4.57, 4.58, 4.59, 4.60, 4.61, 4.62, 4.63, 4.64 and 4.65. The X displacement, see Figure 4.57 performed exceptionally well. This is most likely due to the X displacement having the highest SNR. It is also noted that the velocity results, see Figures 4.58, 4.60 and 4.62, are greatly improved and the erratic behavior that was most likely caused by noise has dissipated. In addition, the orientation measurements coming from the image registration are usually worse than the INS measurements, see Figures 4.63, 4.64 and 4.65. This was to be expected as both simulated motions and computer generated results indicated poor orientation results. It is also noted that the image registration algorithm failed after 130 seconds. This is usually caused by an-inability to obtain a match on the set of features caused by poor estimation. As more and more frames are skipped, the estimation from 88 the INS will only degrade further and this is shown in the frame skip 20 when compared to smaller frame skips. Figure 4.57: X Distance, Case 1, Frame Skip 20 Figure 4.58: X Velocity, Case 1, Frame Skip 20 Figure 4.59: Y Distance, Case 1, Frame Skip 20 89 Figure 4.60: Y Velocity, Case 1, Frame Skip 20 Figure 4.61: Z Distance, Case 1, Frame Skip 20 Figure 4.62: Z Velocity, Case 1, Frame Skip 20 90 Figure 4.63: Pitch, Case 1, Frame Skip 20 Figure 4.64: Roll, Case 1, Frame Skip 20 Figure 4.65: Yaw, Case 1, Frame Skip 20 91 Case 3 As described in Section 4.4, Case 3 limits the image registration to translations only. This means that the camera will only provide motion data in the X, Y , and Z directions. When frame skip is 1, Case 3 performs better than Case 1, see Figures 4.66, 4.67, 4.68, 4.69, 4.70, and 4.71, but the results degrade over time. For the first 20 seconds of the experiment, the camera performed exceptionally well. Even in the Y direction, see Figure 4.68, the results are excellent for the start of the test. When the platform was moving the camera is able to track the motion, but when the platform stops the SNR ratio greatly decreases and the results deteriorated. 
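The SNR argument used throughout this section can be reproduced with a few lines of arithmetic. The sketch below starts from the quoted 1.1 pixels of motion at frame skip 1 and a feature-detection noise of 0.3 pixels (the value that reproduces the stated SNR of 3.67), and shows how larger frame skips scale the ratio; the numbers are illustrative only.

```matlab
% Frame-skip versus SNR arithmetic from Section 4.7.  The 1.1-pixel
% motion at frame skip 1 is taken from the text; the noise level of
% 0.3 pixels is the value consistent with the quoted SNR of 3.67.
pixPerFrame = 1.1;        % X motion seen by the camera at frame skip 1
noiseSigma  = 0.3;        % feature-detection noise (pixels)

skips = [1 5 10 20];
for s = skips
    motion = pixPerFrame * s;            % larger gap -> larger pixel motion
    snr    = motion / noiseSigma;
    fprintf('frame skip %2d: %5.1f px motion, SNR = %5.1f\n', s, motion, snr);
end
```

The same arithmetic makes clear why frame skip 20 comfortably exceeds the Rose criterion of 5 while frame skip 1 does not, at the cost of relying on the INS for twenty times longer between corrections.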
Figure 4.66: X Distance, Case 3, Frame Skip 1 Figure 4.67: X Velocity, Case 3, Frame Skip 1 92 Figure 4.68: Y Distance, Case 3, Frame Skip 1 Figure 4.69: Y Velocity, Case 3, Frame Skip 1 Figure 4.70: Z Distance, Case 3, Frame Skip 1 93 Figure 4.71: Z Velocity, Case 3, Frame Skip 1 Case 3, frame skip 20 performs the best among all other cases, see Figures 4.72, 4.73, 4.74, 4.75, 4.76 and 4.77. Once again the results drift as time progresses, but the degree at which the drift occurs has been reduced with respect to Case 3 frame skip 1. Also the velocity measurements coming from the camera are excellent, see Figures 4.73, 4.75 and 4.77. Another observation is related to the Y displacement, see Figure 4.74. While the camera results show an increase in Y displacement with respect to the actual location, the results from the EKF show sharp spikes of increased Y displacement. The reverse also seems to appear. This must be caused by the INS and the most likely reason why the INS starts to show acceleration in this direction is an incorrect orientation. The force of gravity has not been accounted for correctly and the residual effects appear as a change in displacement. This means there is a possibility of correcting this error and increasing the accuracy with a better estimation of orientation. This can be a focus point for future work. 94 Figure 4.72: X Distance, Case 3, Frame Skip 20 Figure 4.73: X Velocity, Case 3, Frame Skip 20 Figure 4.74: Y Distance, Case 3, Frame Skip 20 95 Figure 4.75: Y Velocity, Case 3, Frame Skip 20 Figure 4.76: Z Distance, Case 3, Frame Skip 20 Figure 4.77: Z Velocity, Case 3, Frame Skip 20 Other observations worth noting are that the EKF usually produced worse results than 96 the camera. One reason for this may be related to the covariance values used for the camera in the EKF. A constant value was used all the time. Ideally this value should be affected by the results of the image registration and feature matching. Another ap- proach would be to incorporate the image registration equations into the Kalman filter. Originally this was attempted with an EKF, but problems occurred, see Section 3.4. In the future a more advanced filtering technique can be used to overcome the difficulties experienced with the EKF. 97 5 Conclusions and Future Work In this research, a localization system was developed and analyzed. The localization system involves a camera and an INS. The system was developed with a pipe inspection vehicle in mind, but can be applied to any environment where a wall is present. This thesis focuses on the different components of the localization system, as well as a testing platform referred to as the simulator that was constructed for the purpose of testing the system. The localization system is comprised of a camera and an INS. The INS is used to gather data on the motions that occur between camera images. This data is then used to predict the motion that occurred between camera images. This allows the machine vision algorithm to minimize its search and increase the chances that a successful match is found. The machine vision algorithm is divided into a number of components. First, a block matching algorithm is used to refine the motion estimation carried out by the camera. Next, an optical flow method based on Lucas-Kanade algorithm is used to refine the results from the block match. This will allow the blocks to change shape and account for the 3D motion that occurred between camera images. 
Once there are good matches for the blocks, feature points can be obtained. This method has one major advantage over the other processes: there is no need for distinctive features in the images to obtain feature matches. The feature points are then passed to the image registration algorithm. The image registration looks at the change in location of the different feature points and calculates the motion required to account for that change. The process continues recursively and determines the motion for as long as the localization system is moving.

A simulator was constructed to test the performance of the localization system. The simulator moves a cart containing the INS and camera along a wall from one known position to another along a given path. This allows the results from the algorithm to be compared directly with the actual location of the cart.

A number of different parameters were used when testing the algorithms. Tests included different configurations of the image registration and different intervals between camera frames. The results captured from these tests closely match the results gathered from simulated data.

When the image registration algorithm was used to determine both the translation and the orientation of the motion, it was very sensitive to noise but returned results that were superior to the INS alone. When the image registration was used to detect only the translation, the system was less sensitive to noise and the performance improved. The results also demonstrated that increasing the time between camera images increases the motion between images, which in effect increases the SNR. This made it easier to detect the motion and hence increased the accuracy of the algorithm.

Even though the algorithms showed promising results, there is a need for further improvement. One of the major shortcomings was the EKF data fusion. In most circumstances, the EKF performed worse than the camera data. This is most likely caused by the poor estimate of the accuracy of the image registration data used in the EKF update of the covariance matrix: as long as the feature detection and image registration error values were below preset thresholds, the accuracy of the results was always considered the same. If a more advanced filtering technique were used to replace the EKF, then the image registration calculation could be incorporated directly into the filter. This would allow the feature locations to be the sensor input, which in turn would allow for a more stable definition of the expected variances.

The Lucas-Kanade algorithm can also be modified to decrease the number of calculations required. This can be done by experimenting with other versions, including the compositional, inverse compositional and inverse additive approaches. In addition, a pyramid version of the algorithm can be examined for this purpose. If successful, this would alleviate the need for performing the block matching. If features were tracked over multiple frames, then the camera could be run at full speed, i.e. at a frame skip of 1. This would provide the maximum SNR while also limiting the time between updates from the camera, which would decrease the period of time the system relies on the INS and increase the accuracy of the motion predictions. It should be noted that this idea would greatly complicate the image registration and filtering algorithms.
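As a possible starting point for the pyramid variant mentioned above, OpenCV (see [41]) provides a pyramidal Lucas-Kanade tracker. The snippet below is only a sketch under assumed parameter values (corner-detection thresholds, window size, pyramid depth, re-detection count) and is not the localization algorithm developed in this work.

```python
import cv2

def track_features(prev_gray, next_gray, prev_pts=None):
    """Track sparse features between consecutive grayscale frames with
    pyramidal Lucas-Kanade; parameter values are assumed starting points."""
    if prev_pts is None or len(prev_pts) < 20:
        # Re-detect corners only when too few tracked points survive.
        prev_pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                           qualityLevel=0.01, minDistance=10)
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, prev_pts, None,
        winSize=(21, 21), maxLevel=3,
        criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    good = status.reshape(-1) == 1
    # Matched point pairs that could be fed to the image registration step.
    return prev_pts[good], next_pts[good]
```

Because each pyramid level roughly halves the apparent motion, larger displacements can be handled without a separate block-matching stage, and re-detecting corners only when too few survive is what would allow the same points to be tracked over several frames at a frame skip of 1.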
Another area on which future work can be concentrated is the use of an auto-focus camera. This would allow the wall to remain in focus over a much larger range of distances between the camera and the wall, and it would also allow for faster motions because the iris can open further, letting more light into the camera. If this algorithm is to be used in a real application, then this is a major priority.

Given the above shortcomings, and assuming an auto-focus camera does not have an ill effect on the performance of the system, the results for Case 3 with a frame skip of 20 can be used to extrapolate the outcome of a long-term test. Assuming that the results would be similar to those of this test, and that the required accuracy is to locate a pit to within one pipe length (6 m) in the X direction, then 400 m of pipe could be scanned before this threshold is surpassed. This is not accurate enough for a useful product in long water mains; ideally the localization system should achieve a range greater than 1000 m. With that said, the velocity measurements are more than accurate enough for reporting the speed of the vehicle. It should also be kept in mind that the 400 m figure is based only on the output of the camera and not on the EKF; if the data fusion were working better, this figure should have been much greater.

Bibliography

[1] Dennis Krys and Homayoun Najjaran. INS assisted visual object localization in an unstructured environment. In CIRA - Computational Intelligence in Robotics and Automation, 2007.
[2] Dennis Krys and Homayoun Najjaran. INS assisted vision-based localization in unstructured environments. In SMC - Systems, Man and Cybernetics, 2008.
[3] Dennis Krys and Homayoun Najjaran. INS-assisted monocular robot localization. In ASME International Mechanical Engineering Congress & Exposition, 2010.
[4] I. Cox. Blanche: An experiment in guidance and navigation for an autonomous robot vehicle. IEEE Transactions on Robotics and Automation, 7:193–204, 1991.
[5] M. Carlowicz. Between Iraq and a hard place. Woods Hole Currents, 10(1), 2003.
[6] H. Najjaran. Robot torpedo to inspect water mains. Canada Focus, 13(9):4, August 2005.
[7] H. Najjaran. IRC creating an underwater robotic vehicle to inspect in-service water mains. Construction Innovation, 10(2), June 2005.
[8] Hafmynd, Gavia: The Great Northern Diver, April 2009. [Online] Available: www.gavia.is.
[9] Lorinda Semeniuk and Aboelmagd Noureldin. Bridging GPS outages using neural network estimates of INS position and velocity errors. Measurement Science and Technology, 17, 2006.
[10] Jinling Wang, Matthew Garratt, Andrew Lambert, Jack Jianguo Wang, Songlai Han, and David Sinclair. Integration of GPS/INS/vision sensors to navigate unmanned aerial vehicles. In ISPRS - The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, volume XXXVII, part B1, pages 963–970, Beijing, July 2008.
[11] Dariusz Lapucha. Precise GPS/INS positioning for a highway inventory system. Master's thesis, Surveying Engineering, University of Calgary, 1990.
[12] N. Bulusu, J. Heidemann, and D. Estrin. GPS-less low-cost outdoor localization for very small devices. Personal Communications, 7(5):28–34, October 2000.
[13] M. Pachter. INS aiding using optical flow - theory. In ICNPAA 2004: Mathematical Problems in Engineering and Aerospace Sciences, 2004.
[14] Peter Lang, Miguel Ribo, and Axel Pinz. A new combination of vision-based and inertial tracking for fully mobile, wearable, and real-time operation. In 26th Workshop of the AAPR/OAGM, 2002.
[15] Sooyong Lee and Jae-Bok Song. Robust mobile robot localization using optical flow sensors and encoders. In International Conference on Robotics and Automation, 2004.
[16] G. Sudhir and John Lee. Video annotation by motion interpretation using optical flow streams. Technical report, Hong Kong University of Science and Technology, 1997.
[17] A. Giachetti, M. Campani, and V. Torre. The use of optical flow for road navigation. Robotics and Automation, 14:34–48, 1998.
[18] Espen Hagen and Eilert Heyerdahl. Navigation by optical flow. In Computer Vision and Applications, volume 1, pages 700–704, The Hague, Netherlands, 1992.
[19] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. The MIT Press, 2005.
[20] Dieter Fox, Sebastian Thrun, Wolfram Burgard, and Frank Dellaert. Particle filters for mobile robot localization. In Sequential Monte Carlo Methods in Practice, 2001.
[21] F. Gustafsson, F. Gunnarsson, N. Bergman, U. Forssell, J. Jansson, R. Karlsson, and P.-J. Nordlund. Particle filters for positioning, navigation, and tracking. Signal Processing, 50:425–437, 2002.
[22] Volkan Cevher and James H. McClellan. Fast initialization of particle filters using a modified Metropolis-Hastings algorithm: Mode-hungry approach. In International Conference on Acoustics, Speech, and Signal Processing, 2004.
[23] Greg Welch and Gary Bishop. An introduction to the Kalman filter. Technical report, Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175, July 24, 2006.
[24] J. Sasiadek and P. Hartana. Sensor data fusion using Kalman filter. ISIF, WeD5:19–25, 2000.
[25] Robert Brown and Patrick Hwang. Introduction to Random Signals and Applied Kalman Filtering. John Wiley & Sons, 1992.
[26] Howie Choset, Kevin Lynch, Seth Hutchinson, George Kantor, Wolfram Burgard, Lydia Kavraki, and Sebastian Thrun. Principles of Robot Motion. The MIT Press, 2005.
[27] Aboelmagd Noureldin, Ahmed Osman, and Naser El-Sheimy. A neuro-wavelet method for multi-sensor system integration for vehicular navigation. Measurement Science and Technology, 15, 2004.
[28] Jonghyuk Kim and Galen Brambley. Dual optic-flow integrated navigation for small-scale flying robots. In Australasian Conference on Robotics & Automation, 2007.
[29] M. Ezhilarasan and P. Thambidurai. Simplified block matching algorithm for fast motion estimation in video compression. Journal of Computer Science, 4:281–289, 2008.
[30] David Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004.
[31] Patricio Loncomilla and Javier Ruiz-del-Solar. Improving SIFT-based object recognition for robot applications. In Lecture Notes in Computer Science, pages 1084–1092. Springer Berlin / Heidelberg, 2005.
[32] Mårten Björkman and Danica Kragic. Combination of foveal and peripheral vision for object recognition and pose estimation. In IEEE International Conference on Robotics and Automation, 2004.
[33] A. Gyaourova, C. Kamath, and S. Cheung. Block matching for object tracking. 2003.
[34] Harvey Ho. 2D-3D block matching. Master's thesis, The University of Auckland.
[35] Steven L. Kilthau, Mark S. Drew, and Torsten Möller. Full search content independent block matching based on the fast Fourier transform. In IEEE ICIP, 2002.
[36] Shan Zhu and Kai-Kuang Ma. A new diamond search algorithm for fast block-matching motion estimation. IEEE Transactions on Image Processing, 2000.
[37] Xuan Jing and Lap-Pui Chau. An efficient three-step search algorithm for block motion estimation. IEEE Transactions on Multimedia, 2004.
[38] Andrés Bruhn, Joachim Weickert, and Christoph Schnörr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. International Journal of Computer Vision, 61:211–231, 2004.
[39] Iain Matthews and Simon Baker. Active appearance models revisited. International Journal of Computer Vision, 60:135–164, 2004.
[40] Li Zhang, Brian Curless, Aaron Hertzmann, and Steven Seitz. Shape and motion under varying illumination: Unifying structure from motion, photometric stereo, and multi-view stereo. In IEEE International Conference on Computer Vision, 2003.
[41] Gary Bradski and Adrian Kaehler. Learning OpenCV. O'Reilly, 2008.
[42] Simon Baker and Iain Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2004.
[43] S. Baker and I. Matthews. Equivalence and efficiency of image alignment algorithms. In Computer Vision and Pattern Recognition, 2001.
[44] Leopoldo Jetto and Sauro Longhi. Development and experimental validation of an adaptive extended Kalman filter for the localization of mobile robots. IEEE Transactions on Robotics and Automation, 15:219–229, 1999.
[45] Mark W. Spong, Seth Hutchinson, and M. Vidyasagar. Robot Modeling and Control. Wiley, 2006.
[46] L. Jetto, S. Longhi, and D. Vitali. Localization of a wheeled mobile robot by sensor data fusion based on a fuzzy logic adaptive Kalman filter. Control Engineering Practice, 7:763–771, 1999.
[47] Alberto Bemporad, Mauro Di Marco, and Alberto Tesi. Sonar-based wall-following control of mobile robots. Journal of Dynamic Systems, Measurement, and Control, 122:226–230, 2000.
[48] Yaqin Tao, Huosheng Hu, and Huiyu Zhou. Integration of vision and inertial sensors for home-based rehabilitation. In IEEE International Conference on Robotics and Automation, 2005.
[49] Bruce Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI - Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 674–679, Vancouver, British Columbia, August 1981.
[50] M. Agrawal and K. Konolige. Real-time localization in outdoor environments using stereo vision and inexpensive GPS. In ICPR - International Conference on Pattern Recognition, 2006.
[51] MicroStrain Inc., 3DM-GX1, April 2009. [Online] Available: http://www.microstrain.com/3dm-gx1.aspx.
[52] Jerrold T. Bushberg, J. Anthony Seibert, Edwin M. Leidholdt Jr., and John M. Boone. The Essential Physics of Medical Imaging. Lippincott Williams & Wilkins, 2001.
