Ultrasound Registration and Tracking for Robot-Assisted Laparoscopic Surgery

by

Michael Chak Luen Yip
B.A.Sc., University of Waterloo, 2009

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
Master of Applied Science
in
THE FACULTY OF GRADUATE STUDIES
(Electrical and Computer Engineering)

The University of British Columbia (Vancouver)
August 2011

© Michael Chak Luen Yip, 2011

Abstract

In the past two decades, there has been considerable research interest in medical image registration during surgery. The overlay of medical images over the images from a surgical camera allows the surgeon to see sub-surface features such as tumor boundaries and vasculature. Ultrasound imaging is a prime candidate for medical image registration, as it is a real-time imaging modality and is therefore commonly used for intraoperative surgical guidance. Prior technologies that attempted ultrasound-based registration have used external trackers to establish a geometric correspondence between the surgical cameras and the ultrasound probes; this requires probe and camera calibration, which is time-consuming, requires additional equipment, and adds additional sources of error to the registration. Another problem is how to maintain a registration between the ultrasound image and the underlying tissues, since tissues will move and deform from patient breathing and heartbeat, and from surgical instrument interaction with tissues. To overcome this, the underlying tissue should be tracked, and previously acquired ultrasound images should be registered and moved with the tracked tissue. Prior work has had limited success in providing a real-time solution for estimating local tissue deformation and movement; furthermore, there has been no work in estimating the accuracy of maintaining a registration, that is, the accuracy of the registration after it has been moved with the tracked tissue.

In this work, we establish an image registration method between ultrasound images and endoscopic stereo cameras using a novel registration tool; this method does not require external tracking or ultrasound probe calibration, thus providing a simple method for performing a registration. In order to maintain an image registration over time, we developed a tissue tracking framework. Its key innovation is in achieving real-time tracking of a dense tissue surface map. We use the STAR detector and Binary Robust Independent Elementary Features and compare their performance to prior tissue feature tracking methods, showing that they perform significantly faster while still managing to track the tissue at high densities. Experiments are performed on ex-vivo bovine heart, kidney, and porcine liver tissues, and initial results show that registrations can be maintained within 3 mm.

Preface

This thesis was prepared under the guidance of Dr. Tim Salcudean and Dr. Robert Rohling. They provided the research topic of intraoperative ultrasound registration in robot-assisted laparoscopic surgery, and have provided ongoing suggestions and feedback throughout the course of work described in this thesis. This thesis has been written in a manuscript-based style, and therefore includes contributions from a number of co-authors.

A version of Chapter 2 has been published by Springer Verlag in the Lecture Notes in Computer Science [79], titled "3D Ultrasound to Stereoscopic Camera Registration through an Air-Tissue Boundary." The paper is co-authored with Troy Adebar, Dr. Robert Rohling, Dr.
Tim Salcudean, and Dr. Chris Nguan. The author and Troy Adebar are credited equally with the design and fabrication of the instrumentation, algorithm development, experimental validation, testing, and data analysis. The author wrote the manuscript, with assistance and editing from Troy Adebar. Dr. Tim Salcudean and Dr. Robert Rohling are credited with developing the overall idea and assisting with suggestions and editing of the manuscript. Dr. Chris Nguan provided guidance for the development of the idea from a clinical and surgical standpoint. A version of Chapter 3 has been submitted for publication. In particular, only the introductory section has been modified and significantly shortened from the publication version such that there is reduced overlap of background material with that covered in Chapter 2, as this is a continuation of the work from that Chapter. The paper was co-authored with Troy Adebar, Dr. Robert Rohling, Dr. Tim Salcudean, and Dr. Chris Nguan. The author and Troy Adebar are credited evenly with the design and fabrication of the instrumentation, algorithm development, experiiv  mental validation, testing, and data analysis. Troy Adebar wrote the manuscript, with assistance and editing from the author. Dr. Tim Salcudean and Dr. Robert Rohling are credited with developing the overall idea and assisting with suggestions and editing of the manuscript. Dr. Chris Nguan provided guidance for the development of the idea from a clinical and surgical standpoint. A version of Chapter 4 has been submitted for publication. This manuscript is co-authored with Dr. David Lowe, Dr. Robert Rohling, and Dr. Tim Salcudean. The author is credited with the algorithm development, experimental validation, testing, and data analysis. The author also wrote the manuscript. Dr. David Lowe, Dr. Tim Salcudean and Dr. Robert Rohling are credited with providing guidance for the development of the algorithms, and assisting with suggestions and editing of the manuscript. Algorithms were coded by the author in C++ and used two opensource packages for basic data structures, the computer vision libraries OpenCV (http://opencv.willowgarage.com) and Vlfeat(http://www.vlfeat.org). The work described was performed under the approval of the University of British Columbia Clinical Research Ethics Board (828 West 10th Avenue, Vancouver, BC V5Z 1L8), under “Pilot Study: Real-time image Guidance for RobotAssisted Laparoscopic Partial Nephrectomy.” The UBC CREB Number was H0802798.  v  Table of Contents Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  ii  Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  iv  Table of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  vi  List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  ix  List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  x  List of Acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  xv  Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  xvi  1  2  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1  1.1  Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  1  1.2  Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  3  1.2.1  Image Registration Techniques . . . . . . . . . . . . . . .  3  1.2.2  Maintaining an Image Registration . . . . . . . . . . . . .  5  1.3  Thesis Objectives . . . . . . . . . . . . . . . 
. . . . . . . . . . .  8  1.4  Thesis Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . .  10  3D Ultrasound to Stereoscopic Camera Registration through an AirTissue Boundary: An Initial Feasibility Study . . . . . . . . . . . . .  12  2.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  12  2.2  Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  14  2.2.1  15  Experimental Setup . . . . . . . . . . . . . . . . . . . . .  vi  3  4  2.3  Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  18  2.4  Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  19  2.5  Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  21  Registration of 3D Ultrasound to Laparoscopic Stereo Cameras through an Air-Tissue Boundary: Tests with the da Vinci Stereo Cameras . .  22  3.1  Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  24  3.1.1  Registration Concept . . . . . . . . . . . . . . . . . . . .  24  3.1.2  Apparatus . . . . . . . . . . . . . . . . . . . . . . . . . .  24  3.1.3  Registration Procedure . . . . . . . . . . . . . . . . . . .  26  3.1.4  Validation Procedure . . . . . . . . . . . . . . . . . . . .  29  3.2  Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  30  3.3  Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  32  3.4  Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  35  Real-time 3D Tissue Tracking for Image-Guided Surgery . . . . . .  36  4.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  36  4.1.1  Clinical Problem . . . . . . . . . . . . . . . . . . . . . .  36  4.1.2  Related Research on Feature Detection in Natural Scenes .  37  4.1.3  Related Research on Feature Detection in Endoscopy . . .  39  4.1.4  Contributions . . . . . . . . . . . . . . . . . . . . . . . .  42  4.1.5  Outline of Chapter . . . . . . . . . . . . . . . . . . . . .  42  Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  44  4.2.1  Choice of Feature Detectors and Extractors . . . . . . . .  44  4.2.2  Temporal Feature Matching . . . . . . . . . . . . . . . .  50  4.2.3  3D Depth Estimation . . . . . . . . . . . . . . . . . . . .  54  4.2.4  Region Tracking and Registration . . . . . . . . . . . . .  54  4.2.5  Selection of Parameter Values . . . . . . . . . . . . . . .  58  Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  59  4.3.1  Temporal Tracking . . . . . . . . . . . . . . . . . . . . .  59  4.3.2  3D Depth Estimation . . . . . . . . . . . . . . . . . . . .  61  4.3.3  Region Tracking and Registration . . . . . . . . . . . . .  62  4.3.4  Apparatus and Test Data . . . . . . . . . . . . . . . . . .  63  4.2  4.3  vii  4.4  Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  64  4.4.1  Temporal Tracking Results . . . . . . . . . . . . . . . . .  64  4.4.2  3D Depth Estimation Results . . . . . . . . . . . . . . . .  74  4.4.3  Region Tracking and Registration Results . . . . . . . . .  77  4.5  Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  81  4.6  Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  84  Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  85  5.1  Summary of Contributions . . . . . . . . . . . . . . . . . . . . .  85  5.2  Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  87  Bibliography . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . .  89  Appendix A Other Materials . . . . . . . . . . . . . . . . . . . . . . .  99  5  A.1 Two-sample t-test  . . . . . . . . . . . . . . . . . . . . . . . . . 100  A.2 Random Sample Consensus - RANSAC . . . . . . . . . . . . . . 100 A.3 Camera Calibration for a da Vinci Camera System . . . . . . . . . 101 A.4 Camera Capture System . . . . . . . . . . . . . . . . . . . . . . 103  viii  List of Tables Table 2.1  Mean, standard deviation and median of errors associated with localizing bead fiducials at air-tissue boundaries. * = Significantly different from control case . . . . . . . . . . . . . . . .  Table 2.2  Mean errors (n = 12) between points in a registered 3DUS volume and its location in the stereo-camera frame. . . . . . . . .  Table 3.1  19 19  Registration accuracy imaging through PVC phantom. (Asterisks indicate statistically significant improvements over the single tool result.) . . . . . . . . . . . . . . . . . . . . . . . . . .  31  Table 3.2  Registration accuracy imaging through ex-vivo liver tissue. . .  31  Table 4.1  Parameters for temporal tracking, stereoscopic matching, and object tracking . . . . . . . . . . . . . . . . . . . . . . . . . .  ix  72  List of Figures Figure 1.1  Localizing fiducials across an air tissue-boundary for image registration. . . . . . . . . . . . . . . . . . . . . . . . . . . .  Figure 1.2  8  Tracking tissue surface feature locations from frame to frame allows spatial estimation of underlying feature movement. In the same way, a registered medical image can be transformed and maintain its spatial correspondence to the underlying tissue features. . . . . . . . . . . . . . . . . . . . . . . . . . . .  9  Figure 2.1  a) Schematic of the registration method, b) Experimental Setup  16  Figure 2.2  a) Fiducial localization test plate, b) Registration tool, c) Registration accuracy test tool . . . . . . . . . . . . . . . . . . .  Figure 2.3  17  Example images of an air-tissue boundary (left) and a 3 mm fiducial pressed against an air-tissue boundary (right). . . . . .  18  Figure 3.1  Registration concept. . . . . . . . . . . . . . . . . . . . . . .  23  Figure 3.2  Robotic TRUS imaging system. . . . . . . . . . . . . . . . .  25  Figure 3.3  Schematic of registration tool. Dimensions are shown in millimeters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  26  Figure 3.4  Experimental setup. . . . . . . . . . . . . . . . . . . . . . . .  27  Figure 3.5  3DUS and da Vinci stereo endoscope arranged to image the ex-vivo air-tissue boundary (a) and da Vinci camera view of the tool pressed against the surface of the porcine liver (b). . .  Figure 3.6  28  Schematic of cross wire phantom. Dimensions are shown in millimeters. . . . . . . . . . . . . . . . . . . . . . . . . . . .  x  29  Figure 3.7  Surface fiducial against an air-tissue boundary and imaged through a PVC phantom (a) and an ex-vivo liver tissue sample (b). Illustration of method for localizing fiducial tip (c): The axis of the reverberation is identified, and the tip is selected along that line. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  Figure 3.8  31  Overlay of TRUS information on da Vinci camera view based on registration. Segmented urethra (red) and seminal vesicles (blue) are shown. . . . . . . . . . . . . . . . . . . . . . . . .  Figure 4.1  32  Repeatedly identifying and tracking tissue locations in endoscopy (top), and maintaining an image registration based on tracked movement (bottom). . . . . . . . . . . . . . . . . 
. . . . . .  38  Figure 4.2  Typical photometric properties of a in-vivo scene during surgery. 43  Figure 4.3  Two bi-level center-surround kernels, one in a square configuration, and one in a diamond configuration, are constructed using integral images, and added together to produce STARshaped kernel used in STAR detector. . . . . . . . . . . . . .  Figure 4.4  47  The extraction of a BRIEF vector. Pairs of pixels are shown to be chosen in correspondence with a isotropic Gaussian probability function about the image patch center. . . . . . . . . . .  Figure 4.5  48  Flowchart depicting the proposed feature tracking framework on a single image.Features extracted in the current frame, and matched to features extracted in previous frames. Matched features are updated, and new features are saved. Stability of the list of features are assessed, and those which are not stable are deleted. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  xi  49  Figure 4.6  Flowchart depicting the proposed region tracking and registration framework. A region is defined in frame 0, and features within the region are saved as object features. Features tracked in subsequent frames are matched to the object features to keep the object features up to date, and the new features in the tracked region are appended to the object feature list. All object features, registered images, and region outlines are kept in the frame 0 coordinate system to avoid drift from successive transformations. . . . . . . . . . . . . . . . . . . . . . . . . .  Figure 4.7  56  Screen captures of frame 20 of the Series and Heartbeat video. Circle represent feature locations. Row 1 is the original image, Row 2 is using the proposed BRIEF+STAR method, Row 3 is using the SURF method, and Row 4 is using the SIFT. . . . .  65  Figure 4.8  Number of Features found per frame. . . . . . . . . . . . . .  66  Figure 4.9  Percentage of these features that are matched to a previously detected feature. . . . . . . . . . . . . . . . . . . . . . . . .  67  Figure 4.10 Number of features found previously that is kept in a historypreserving feature list. . . . . . . . . . . . . . . . . . . . . .  67  Figure 4.11 Percentage of features from the saved list that are removed every frame due to a low frequency of finding a suitable temporal match. (*) = not statistically significant: pBRIEF,SURF = 0.1174, pBRIEF,SIFT = 0.1758, pSURF,SIFT = 0.8325. . . . . . . . . . . .  68  Figure 4.12 Histogram of depicting the percentage of features that are found in a certain percentage of subsequent frames. The graph is cumulative such that feature numbers drop off as the ratio between times found and total lifetime increases. The figure insets reveal the zoomed version of the higher percentages. . . .  xii  69  Figure 4.13 A sample feature location is chosen for each feature tracking algorithm, and the feature’s pixel-location is tracked from the beginning to the end of the video (red). Subsequently, starting at the end of the video, the feature is tracked backwards towards the start of the video (blue). Due to frames where a feature is momentarily lost, there are gaps within the tracking in both directions. . . . . . . . . . . . . . . . . . . . . . . . .  70  Figure 4.14 Example of a BRIEF feature being tracked over time. . . . . .  71  Figure 4.15 Speed of the feature tracking framework for BRIEF+STAR, SURF, and SIFT feature types. . . . . . . . . . . . . . . . . .  73  Figure 4.16 A sample image of matched left and right channel feature. 
The image above is the left channel, populated by features (•) found in the current frame. They are connected to the locations on which the matching right features ( ) would be located. . .  74  Figure 4.17 Number of features matched between the stereo camera channels. 76 Figure 4.18 Percent of features matched between the stereo camera channels. (*) = not statistically significant: pBRIEF,SURF = 0.3629, pBRIEF,SIFT = 0.8839, pSURF,SIFT = 0.3654. . . . . . . . . . . .  76  Figure 4.19 Time required to process a pair of stereo frames for feature matching. (*) = not statistically significant, pSURF,SIFT = 0.871.  77  Figure 4.20 Depth estimation of in-vivo stereo video. (a) shows the stereo matches in the left frame, (b) shows the stereo-triangulated points and estimated camera position, (c) shows the interpolated depth map, and (d) shows the reprojected image. . . . .  79  Figure 4.21 Registration and tracking combination. Circles represent starting location of a surface fiducial, and triangles represents the final position of the surface fiducial after tracking through the videos. Tests were performed on (a) kidney, (b) heart, and (c) liver. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure A.1  80  Checkerboard pattern as seen in the left camera image (top row) and the right camera image (bottom row). . . . . . . . . 102  xiii  Figure A.2  NVIDIA Digital Video Pipeline, containing HD capture, GPU, and output card. Images courtesy of http://www.nvidia.com. . 105  Figure A.3  3D Vision Pro Kit: Radio frequency emitter and a pair of 3D glasses shown. Images courtesy of http://www.nvidia.com. . . 105  xiv  List of Acronyms ANOVA  Analysis of Variance  BRIEF  Binary Robust Independent Elementary Features [11]  SURF  Speeded Up Robust Features [7]  CENSURE  Center Surrounded Extremas for Real-time Feature Detection [2]  3 DUS  3D Ultrasound  TRUS  Transrectal Ultrasound  LUS  Laparoscopic Ultrasound  TRE  Target Registration Error  FRE  Fiducial Registration Error  RALRP  Robot-Assisted Laparoscopic Radical Prostatectomy  DOG  Difference of Gaussians  RANSAC  Random Sample Consensus  xv  Acknowledgments First and foremost, I would like to thank Dr. Tim Salcudean and Dr. Robert Rohling, for their valuable guidance and feedback through the course of my research; without their support, this thesis would not have been possible. Furthermore, throughout the entire research program, they have given me the freedom to explore different avenues of research that I was particularly interested in, for which I am very grateful. I would like to thank Dr. David Lowe for his support and feedback of the tissue tracking research efforts. Thank you to Chris Nguan, M.D., who has been very supportive of the work and has offered many useful suggestions and advice for clinical applications. Thank you to Dr. Guang-Zhong Yang, Dr. Peter Mountney, and Dr. Danail Stoyanov of the Hamlyn Center at Imperial College of London for their help with their endoscopic in-vivo image data sets. There are many colleagues in the Robotics and Controls Lab at the University of British Columbia that have helped me during my research program. Most of the work presented in this thesis would not have been possible without Troy Adebar, the main co-author of two chapters of this thesis, with whom I was working closely with since the beginning of my research program until the completion of this thesis. Many thanks to Caitlin Schneider for her assistance in numerous experiments and research discussions. 
I would also like to thank John Bartlett, Raoul Kingma, Hedyeh Rafii-Tari and Jeff Abeysekera for making the lab such a fun and supportive place to work. Finally, I would like to thank my family for their love and support throughout my research program.

Chapter 1: Introduction

1.1 Motivation

In the past two decades, minimally invasive surgery (MIS) has become a standard alternative to a number of traditionally open procedures. During minimally invasive surgery, thin surgical instruments and an endoscopic camera are passed through small incisions in the patient into the body cavity. This reduces blood loss, decreases the chance of infection, and limits the amount of trauma to the patient when compared to open surgery, leading to quicker recovery times and shorter hospital stays. However, there are certain drawbacks to minimally invasive surgery. First, passing surgical instruments through small ports severely restricts their ability to move. Secondly, due to the fulcrum effect of the ports on laparoscopic instruments, non-axial instrument motion within the body is reversed in the surgeon's perspective. Finally, without the ability to feel the tissues, surgeons are required to rely on video displays from endoscopic cameras for surgical guidance, which typically have a small field of view and provide poor lighting conditions.

In order to improve the ergonomics and visual feedback of minimally invasive surgery, robot-assisted surgical systems have been developed. The da Vinci Surgical System (Intuitive Surgical Inc., Sunnyvale, California, USA) provides a teleoperated surgical system for robot-assisted laparoscopy [23]. The system comprises a surgeon's console and a bedside robot, which effectively represent a master and slave system. The surgeon's controllers are mapped such that the laparoscopic tool motions are no longer reversed in the surgeon's perspective, and the system allows the surgeon to scale his or her motion in order to perform micro-surgical procedures. In addition, the surgical console provides a 3D view of the scene to the surgeon by utilizing a stereoscopic laparoscope, effectively placing the surgeon's eyes inside the body, at the tip of the endoscopic cameras.

The da Vinci Surgical System is an enabling platform for integrating augmented reality techniques into the operating room. Registering a medical image, such as an ultrasound (US), computed tomography (CT), or magnetic resonance (MR) image, to the surgical camera view can potentially allow the surgeon to visualize subsurface features of the underlying tissues such as tumours or vasculature. The problem of registration of medical images to the surgical cameras is of increasing interest because of the growing popularity of minimally invasive surgery and robot-assisted interventions.

Partial nephrectomy is an example of a surgical procedure where minimally invasive techniques have become the de facto standard [28] and for which image registration can play a critical role. During laparoscopic partial nephrectomy, the kidney is mobilized from the surrounding tissues and the renal artery is clamped such that blood flow to the kidney is halted. The surgeon then resects the part of the kidney with the tumour, leaving the adjacent renal tissues intact in order to preserve kidney function. Since there is a lack of blood flow to the kidney during tumour resection, the kidney is subject to a warm ischemia time. After 30 minutes, this will result in permanent tissue damage.
Providing a medical image registration will allow the visualization of the tumour boundaries overlaid on the kidney, and will provide image guidance to the surgeon for the excision of the tumour. Better guidance can potentially reduce the warm ischemia time, improve surgical margins and allow more tissue to be left intact, thus improving renal function [49]. A suitable imaging technology in this case is ultrasound, as 2D images or 3D volumes can be acquired intraoperatively using either an external ultrasound probe, a laparoscopic ultrasound probe, or a drop-in ultrasound probe [60]. Most recently, Ukimura and Gill [70] have provided the first clinical results of an image-guided partial nephrectomy using a registered CT image, as well as results of an image-guided laparoscopic radical prostatectomy using a trans-rectal US registration. They overlaid a 3D-segmented tumor from the medical images into the stereoscopic cameras and drew a radius around the tumor boundaries that represented areas in which the surgeon needs to cut in order to achieve negative margins. The study showed that augmented reality was feasible and that surgeons found the medical image overlays useful for tumor resection guidance.

1.2 Background

1.2.1 Image Registration Techniques

There are numerous strategies that have been proposed in the literature on how to acquire an image registration between an ultrasound volume and a surgical camera. The use of external tracking equipment is one of the most common methods. Dedicated tracking equipment such as the Micron Tracker system (Claron Technologies Inc., Toronto, Ontario, CA) and the Optotrak system (Northern Digital Inc., Waterloo, Ontario, CA) use optical markers that are attached to the ultrasound probe and the surgical cameras in order to track their relative position to an accuracy of between 0.1 mm and 0.35 mm [6]. Another common tracking technology is the use of magnetic markers, such as the Aurora system (Northern Digital Inc.) and the Flock of Birds tracker (Ascension Technology Corporation, Milton, Vermont, USA), that do not require a line of sight but have an accuracy within 0.9 mm and 1.6 mm [6].

In order to acquire the transformation between the ultrasound images' frame of reference and the markers secured to the ultrasound probe, a calibration is required. Typically, ultrasound probe calibration involves imaging a calibration phantom with the ultrasound probe, and determining a correspondence between features in the ultrasound image (such as metal bead fiducials or thin cross wires) and their known geometric configuration in the phantom. Then, given that the calibration phantom has tracking markers in known locations relative to the ultrasound features, an ultrasound probe calibration can be achieved. Bergmeir et al. [8], Hsu et al. [26], and Mercier et al. [39] provide an extensive overview of calibration techniques for 3D and tracked 2D ultrasound probes. Another common ultrasound calibration method involves the use of a tracked stylus with an ultrasound fiducial. Target registration errors (TRE), the distance after registration between corresponding points not used to calculate the registration transformation, have been reported to be between 0.81 mm and 1.48 mm for the phantom-based methods, and between 1.52 mm and 8.22 mm for the stylus-based methods, respectively [27].
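In practice, the TRE defined above is simply the mean Euclidean distance at held-out target points after applying the estimated transform. The following is a minimal C++ sketch of that computation; the function name, use of OpenCV types, and overall structure are illustrative assumptions rather than any of the cited implementations.

```cpp
// Illustrative sketch: mean target registration error over held-out points,
// given an estimated rigid registration T mapping source frame to destination frame.
#include <vector>
#include <opencv2/core/core.hpp>

double targetRegistrationError(const cv::Matx44d& T,
                               const std::vector<cv::Vec3d>& targetsSrc,  // held-out targets, source frame
                               const std::vector<cv::Vec3d>& targetsDst)  // same targets, destination frame
{
    double sum = 0.0;
    for (size_t i = 0; i < targetsSrc.size(); ++i) {
        // Homogeneous coordinates so the 4x4 transform applies directly.
        cv::Vec4d p(targetsSrc[i][0], targetsSrc[i][1], targetsSrc[i][2], 1.0);
        cv::Vec4d q = T * p;
        cv::Vec3d err(q[0] - targetsDst[i][0],
                      q[1] - targetsDst[i][1],
                      q[2] - targetsDst[i][2]);
        sum += cv::norm(err);                     // Euclidean distance for this target
    }
    return targetsSrc.empty() ? 0.0 : sum / targetsSrc.size();
}
```

The same measurement evaluated at the points that were used to compute the transform corresponds to the fiducial registration error (FRE) listed in the acronyms.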
A camera calibration also must be performed in order to relate the tracked markers to the camera’s frame of reference, typically performed by imaging a checkerboard with optical/magnetic trackers at known locations. An image registration between a laparoscopic US probe and stereoscopic laparoscope has been demonstrated in [14] and [13] showing calibration accuracies of 2.38 mm and 2.43 mm, respectively. There are several shortcomings of using an external tracking system. For optical tracking, a line of sight must be maintained from the ultrasound probe and the surgical camera markers to the tracking system, which is not easily accomplished in the cluttered operating room environment. Furthermore, ultrasound and camera calibrations add another source of error into the registration procedure. Finally, these calibration procedures require time and extra equipment to perform, which can take valuable time and space in the operating room. Another method for identifying the position and orientation of the ultrasound probe and the surgical camera is through robot kinematics. In this method, the robot picks up an ultrasound probe (such as the one described in [60] in one of its manipulators and relies on kinematics to determine the relative movement of the ultrasound probe. Again, a calibration must be performed to relate the coordinate frames of the camera and the ultrasound image to the robot end-effector’s state. Leven et al. [31] used the daVinci robot API in order to attain the relative positions between the tip of the endoscope and a laparoscopic ultrasound end effector; comparing the robot kinematics method to an optical tracking method using the Optotrak system showed target registration errors of 2.16 mm and 2.83 mm, respectively. The issue with using robot kinematics is that errors from encoder readings will accumulate from forward kinematics; the manipulators have been estimated to produce up to 1 mm of localization error, and will increase significantly due to instrument bending at laparoscopic ports [29]. This poses a problem since the ultrasound probe must be held in a repeatable position by the robot. Furthermore, a calibrated ultrasound probe as well as a calibration between the probe and 4  the robot’s end effector is still necessary, adding additional potential sources of error. Recently, there have been some techniques proposed that do not rely on external trackers or robot kinematics. In [31], the authors examined the feasibility of using the endoscopic camera to track a calibrated, marked laparoscopic US probe, showing the ability to overlay the LUS image in the correct position and orientation in the surgical cameras; however, no accuracy measurements were provided regarding marker-based tracking. Su et al. [64] presented a CT to stereoscopic video registration method that utilizes iterative closest point (ICP) registration algorithm to find the transformation between manually selected points in a CT volume and manually selected points in the stereoscopic camera scene. However, it requires a recursive registration method to improve the manual alignment of the registration, which was often found not to converge. Another method by Teber et al. [65] involved securing navigational aids (needles with an optical marker) to an in-vitro kidney and imaging the kidney using both a C-arm and stereoscopic cameras. The navigational aid locations as seen in the camera could then be identified in the Carm images to provide a registration. 
They reported navigational aid localization accuracies of 0.5 mm, but did not provide any target registration errors between the CT and the camera images. The fundamental problems of previously investigated image registration techniques necessitates the investigation of other methods that do not require time consuming calibration techniques or additional equipment in the operating room in order to achieve higher registration accuracies with a high rate of repeatability, and impact to the operating room workflow.  1.2.2  Maintaining an Image Registration  Assuming that image registration can be achieved, there needs to be consideration of how the registration is maintained over time after intra-operative image acquisition is complete. For instance, during partial nephrectomy, the kidney is mobilized prior to the excision of the tumour, and therefore is allowed to move and deform relatively freely. Therefore, if the kidney moves after the initial ultrasound registration, the image is no longer registered to the kidney, and a new ultrasound image  5  and registration must be performed. This is a non-trivial task since the kidney is mobilized and persistent contact with the ultrasound probe can no longer be maintained. Therefore, the movement and deformation of the tissue must be tracked such that the registered US image can be transformed in a similar manner in order to keep its spatial correspondence to the underlying tissues. Endoscopic images of tissue can be considered a particularly difficult case for image-based tracking, as the tissues themselves are often visually limited in their distinctiveness from neighbouring regions, the environment is subject to illumination and shadowing effects, and the field of view of the endoscopic camera is especially narrow. Furthermore, the wet tissue surfaces exhibit a high degree of specular reflection, which obscures the tissue textures from the tracking algorithms and presents specular regions that mask the true movement and deformation of tissue. Tissue tracking is a relatively new area for image-guided surgery; Nakamura et al. [48] and Ginhoux [20] presented the first work for tracking a beating heart using artificial markers secured to the surface of the beating heart. However, since the deposition and securing of artificial markers onto tissue surfaces is a time-consuming task, especially for minimally invasive procedures, and they do not necessarily provide a dense profilometry of the surface due to minimum separation distances between markers [20], the research community has quickly moved towards methods which only rely on the images themselves to track the movement of the tissue. Image-based tracking of tissue can be divided into two main categories: modelbased characterization and tracking, and salient feature detection and tracking. Model-based methods involve an explicit knowledge of the tissue that is being imaged, establishing a model to fit the structure of the tissue, and updating model location and parameters over time. Lau et al. [30] presented a method that allows the motion recovery from a beating heart region using a B-spline surface. A thin plate spline model was used for estimating the motion of the beating heart in [55][57][56][58]; however, these models are only suitable for estimating deformation for small patches with low-order warping. 
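Whatever tracking method is used, the way a tracked tissue pose is turned back into a maintained registration (the goal stated at the start of this subsection) is a composition of transforms. The sketch below treats the local patch motion as rigid, which is a simplification of the deformation handled by the feature-based methods discussed next and in Chapter 4, and it keeps poses relative to the first frame, as Chapter 4 does, so that errors do not accumulate from chaining frame-to-frame updates. Names are illustrative assumptions, not the thesis implementation.

```cpp
// Illustrative sketch: move a previously registered ultrasound image with the
// tracked tissue by composing the tracked pose with the original registration.
#include <opencv2/core/core.hpp>

cv::Matx44d updatedRegistration(const cv::Matx44d& T_cam_tissue_0,  // tissue pose when the US was acquired
                                const cv::Matx44d& T_cam_tissue_t,  // tissue pose tracked at the current frame
                                const cv::Matx44d& T_cam_us_0)      // US-to-camera registration at acquisition
{
    // Rigid motion of the tissue between frame 0 and frame t, in camera coordinates.
    cv::Matx44d motion = T_cam_tissue_t * T_cam_tissue_0.inv();
    // Apply the same motion to the registered ultrasound volume.
    return motion * T_cam_us_0;
}
```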
Other methods include motionbased modeling techniques specifically for the beating heart, such that an expected periodic motion can predict the motion of the tissue in the endoscopic cameras and provide a prediction of the tracked tissue position over time [22][51]. The ma6  jor issue with model-based tracking is that it relies on computationally expensive and complex methods for fitting the models to the tissue geometry, and therefore must use low-order models that are unable to capture the complex deformation and movement of soft tissue. In order to acquire a dense profilometry and tracking of soft tissue surfaces in the surgical endoscopes, salient feature detection and tracking is required [46]. Salient features are locations within an image that are 1. Detectable 2. Localizable 3. Sufficiently Distinct in Texture 4. Repeatable from frame to frame. There are a wide range of salient features that have been developed in the computer vision community, from ones based on corner detectors that are susceptible to illumination, scale, orientation, and viewpoint changes, to those that are illumination invariant, scale-invariant, orientation-invariant, and resistant to viewpoint change. Mikolajczyk, Schmid and Tuytelaars have provided several evaluations of popular salient features in the computer vision community [41][42][68]. Mountney et al. [45] implemented 21 different salient feature detectors on endoscopic images, and found that the most accurately tracked features were those that relied on intensity gradient-based methods such as the ScaleInvariant Feature Transform (SIFT [35]), Speeded Up Robust Features (SURF [7]), Gradient Location Orientation Histogram (GLOH [41]) and Geodesic-Intensity Histogram (GIH [33]). These methods are computationally expensive as they require significant image processing in every frame to identify the locations of image patches that are scale invariant, rotation invariant and illumination invariant. Preliminary work in increasing the accuracy of salient feature techniques for endoscopic surgery has been presented in [19]; however, these methods require additional image processing and therefore are more suited for offline processing. There is a clear trade-off in the tissue tracking literature between tracking ability and speed. Specifically, if one is to acquire highly-distinct features that can be matched and tracked from frame to frame, considerable image processing is required in order to achieve the higher degree of saliency. Corners and edges from 7  Stereo Camera  air-tissue boundary Air Fiducial Tissue  Ultrasound  Figure 1.1: Localizing fiducials across an air tissue-boundary for image registration. salient feature detectors such as the Harris corner detector and the Canny edge detector are fast but are easily lost or mismatched in subsequent frames; blob features such as SIFT are highly descriptive of a region of tissue and therefore are much better localized and tracked over time, but cannot be processed in real-time with present hardware platforms. Therefore, a compromise must be found.  1.3  Thesis Objectives  In this thesis, we present two novel methods: one for acquiring an image registration between intraoperative ultrasound and surgical stereo cameras, and another for maintaining the image registration using salient feature tracking from the camera images. 
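Concretely, the detect-describe-match loop that such salient-feature tracking builds on can be sketched in a few lines of OpenCV. STAR (a CenSurE variant) and BRIEF are the detector and extractor pair adopted in Chapter 4; the class names below are those of OpenCV 2.x, and the parameter choices are illustrative assumptions.

```cpp
// Minimal sketch of per-frame feature detection, binary description, and
// matching against the previous frame; not the thesis implementation.
#include <vector>
#include <opencv2/core/core.hpp>
#include <opencv2/features2d/features2d.hpp>

void matchFrames(const cv::Mat& prevGray, const cv::Mat& currGray,
                 std::vector<cv::DMatch>& matches)
{
    cv::StarDetector detector;                   // CenSurE/STAR keypoints
    cv::BriefDescriptorExtractor extractor(32);  // 32-byte binary descriptors

    std::vector<cv::KeyPoint> kpPrev, kpCurr;
    cv::Mat descPrev, descCurr;

    detector.detect(prevGray, kpPrev);
    detector.detect(currGray, kpCurr);
    extractor.compute(prevGray, kpPrev, descPrev);
    extractor.compute(currGray, kpCurr, descCurr);

    // Hamming distance is the natural metric for binary descriptors and is
    // what makes BRIEF matching fast compared to floating-point descriptors.
    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    matcher.match(descPrev, descCurr, matches);
}
```

Because the descriptors are binary strings compared with the Hamming distance, both extraction and matching are much cheaper than SIFT or SURF, which is the speed versus saliency compromise referred to above.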
The hypothesis is that by pressing ultrasound bead fiducials against the tissue surface, the fiducials can be seen in both the camera images and the ultrasound  8  Stereo Camera  tissue surface at time t2  tissue surface at time t1  surface features at time t2  surface features at time t1  Underlying features, no longer seen by US Underlying features  Figure 1.2: Tracking tissue surface feature locations from frame to frame allows spatial estimation of underlying feature movement. In the same way, a registered medical image can be transformed and maintain its spatial correspondence to the underlying tissue features. images. Using these images, we are able to accurately determine their 3D locations in both camera and ultrasound coordinate systems, thus providing the means for acquiring a registration between the camera and the ultrasound frames (Figure 1.1). The second hypothesis is that by using camera-based tracking of the tissue surfaces of a medical image registration, we are able to maintain the registration of a medical image to the imaged tissues in spite of movement and deformation of the underlying tissues (Figure 1.2). In order to explore these hypotheses, the following objectives are defined: 1. Studying the accuracy of localizing fiducials across an air tissue-boundary. 9  2. Developing a method for acquiring an ultrasound image registration using fiducials on the air tissue boundary, and comparing its registration accuracies with other registration methods. 3. Developing a camera-based tissue tracking method for dense 3D tracking of tissue profiles, and comparing its tracking ability to state of the art algorithms. 4. Incorporating a framework for maintaining an image registration to underlying tissues using the tissue tracking method, and investigating the accuracy of the registrations after periods of tracking.  1.4  Thesis Outline  This thesis is presented in a manuscript-based thesis format, as permitted by the Faculty of Graduate Studies at the University of British Columbia, with slight modifications that have been mentioned in the Preface. Each chapter contains an introduction and background of chapter-specific content, methodologies, experiment results, discussions, and conclusions. Since each chapter is presented as a standalone manuscript, the chapters can be read individually. The only exception is in Chapter 3, which represents an extension of the work presented in Chapter 2; As such, the majority of the introduction and review of prior work is presented in Chapter 2. The contributions of this thesis are summarized below: 1. Chapter 2: We present a registration method that involves only the use of a registration tool that can be imaged by both the 3D ultrasound probe and stereo cameras in order to obtain an accurate registration. The registration tool comprises of a set of optical markers and a set of ultrasound bead fiducials in a known geometric configuration. We evaluate the accuracy of identifying bead fiducials in a 3D ultrasound volume under various conditions as the registration tool is pressed against a tissue phantom (termed the air-tissue boundary), achieving sub-millimeter fiducial localization accuracy. We then provide a closed-form solution for acquiring the transformation between the stereoscopic camera frame to the ultrasound frame through the registration 10  tool optical markers and the ultrasound fiducials. 
Evaluations of target registration accuracies are carried out on an ultrasound phantom using a stereocamera system with a large camera disparity as a proof of concept. 2. Chapter 3: We extend the registration tool method from Chapter 2 for registering volumes acquired from a trans-rectal ultrasound probe to the da Vinci Surgical System stereo laparoscopic cameras. We modify the registration tool such that it can fit through a trocar, and use a least squares registration method that provides a higher degree of registration accuracy than the previous method. We present experimental results of registration accuracies on in-vitro tissues using the da Vinci stereoscopic laparoscope (which has a small camera disparity), and register an ultrasound image into the stereo camera frames, showing an overlay of sub-surface features boundaries (i.e. urethra) of a prostate phantom. 3. Chapter 4: We will present a framework for persistent tracking of tissue using an endoscopic stereo camera in real-time. We use recently developed, efficient salient feature detectors (the STAR detector and the Binary Robust Independent Elementary Features extractor) to show that dense image feature representation of 3D tissue deformation and motion tracking is possible at high frame rates, and compare its performance to other standard feature detectors in literature such as SIFT and SURF. We then demonstrate that, given an image registration from a method such as the one suggested in Chapter 2 and 3, we can successfully maintain a registration based on the tracking of the underlying tissues.  11  Chapter 2  3D Ultrasound to Stereoscopic Camera Registration through an Air-Tissue Boundary: An Initial Feasibility Study 2.1  Introduction  Augmented reality in surgery often involves the superposition of medical images of a patient’s anatomy onto a camera-based image of the overlying tissues. With sufficient registration accuracy, the surgeon is then able to localize the internal anatomy (subsurface features such as lesions or nerves) for improved surgical guidance. Prior work in augmented reality for surgical applications has been applied to x-ray, computed tomography (CT), magnetic resonance imaging (MRI) and ultrasound (US) [21]–[34]. This paper explores the ability to display an external three-dimensional ultrasound (3DUS) volume in a laparoscopic camera view during minimally invasive surgery. To date, registration between an US volume and camera images has generally involved three tasks: calibrating the stereo cameras, calibrating the ultrasound volume to the pose of the US transducer, and tracking both the US transducer and  12  the cameras in 3D space. The first task uses well-known techniques for modelling the relative camera poses, focal point, principal points, and lens distortion. For the second task, numerous techniques for US transducer calibration have been previously investigated; Lindseth [32] and Mercier [39] offer extensive reviews on the subject of US transducer calibration. The third task, tracking of the cameras and an US transducer, can be performed using magnetic trackers, optical trackers, or robot kinematics [14]–[34]. While these techniques have proven useful in the operating room, they still have their shortcomings: time-consuming transducer calibrations, additional tracking equipment, line-of-sight issues, modifications to the US transducers and cameras for tracking, and consequently the consumption of valuable time and space in the operating room. 
In addition, cumulative errors in transducer calibration and equipment tracking contribute to errors in registration that may be amplified by a lever-arm effect.

We address the above issues by introducing a new technique for registering stereoscopic cameras and 3DUS using a registration tool. Fiducials attached to the registration tool are held against the air-tissue boundary and imaged through the tissue. Their locations in the US volume are determined, and through the known geometry of the tool and the tracking of the tool by the stereoscopic cameras, a 3DUS to camera registration can be found. This provides a direct transformation from the US to the stereoscopic cameras, thus eliminating errors related to calibrating the US transducer and tracking the US transducer and the cameras. Registration of 3DUS to the cameras directly across the air-tissue boundary is the key innovation of this paper.

We envisage using the novel registration system in laparoscopic surgery, where structures being operated on can be imaged more effectively with external or endocavity US, or with laparoscopic US from a different direction than the camera view. For example, during partial nephrectomy, the kidney can be imaged with external US through the abdomen, or with laparoscopic US from one side of the kidney while the camera and laparoscopic instruments operate from the other. Similarly, in laparoscopic radical prostatectomy, endorectal US can image the prostate as in prostate brachytherapy, while the surgery proceeds with the trans-abdominal or trans-perineal approach. In both cases, registration is required to overlay the US image onto the endoscopic camera view. We propose that it be carried out with the technique described in this paper.

2.2 Methods

Our study had two goals: to determine the accuracy of locating US fiducials on an air-tissue boundary, and to determine the feasibility of using these fiducials to register 3DUS to stereoscopic cameras. We first examined the accuracy of localizing spherical fiducials on an air-tissue boundary in US. Air-tissue boundaries exhibit high reflection at their surfaces that may make it difficult to accurately localize fiducials. We considered five variables that could affect the accuracy of fiducial localization: (1) fiducial size, (2) lateral position in US image, (3) angle of air-tissue boundary, (4) boundary depth, and (5) stiffness of tissue.

Next, we implemented a direct closed-form registration method between 3DUS and a stereoscopic camera by localizing surface fiducials in both the 3DUS volume and the stereo camera (Figure 2.1a). This method is described below. We begin by defining four coordinate systems: the stereo camera system {o_0, C_0}, the optical marker system {o_1, C_1}, the US fiducial system {o_2, C_2} and the 3DUS system {o_3, C_3}. The transformation from {o_1, C_1} to {o_0, C_0}, ^0T_1, is found by stereo-triangulating the optical markers on the registration tool. The transformation from {o_2, C_2} to {o_1, C_1}, ^1T_2, is constant and known from the tool geometry. The transformation from {o_3, C_3} to {o_2, C_2}, ^2T_3, is found by localizing three fiducials that define {o_2, C_2} in the 3DUS system {o_3, C_3}. The three fiducial locations in coordinate system {o_3, C_3}, ^3x_0, ^3x_1 and ^3x_2, define two perpendicular vectors with coordinates

{}^{3}v_1 = {}^{3}x_1 - {}^{3}x_0   (2.1)
{}^{3}v_2 = {}^{3}x_2 - {}^{3}x_0   (2.2)

that can be used to define the unit vectors of frame C_2 in system {o_3, C_3}:

{}^{3}i_2 = {}^{3}v_1 / \|{}^{3}v_1\|   (2.3)
{}^{3}k_2 = ({}^{3}v_1 \times {}^{3}v_2) / \|{}^{3}v_1 \times {}^{3}v_2\|   (2.4)
{}^{3}j_2 = {}^{3}k_2 \times {}^{3}i_2   (2.5)

The origin o_2 has coordinates ^3x_0 in {o_3, C_3}. The homogeneous transformation from the 3DUS system {o_3, C_3} to the US fiducial system {o_2, C_2}, ^3T_2, is then

{}^{3}T_2 = \begin{bmatrix} {}^{3}i_2 & {}^{3}j_2 & {}^{3}k_2 & {}^{3}x_0 \\ 0 & 0 & 0 & 1 \end{bmatrix}   (2.6)

and ^2T_3 = (^3T_2)^{-1}. The overall transformation between the stereo camera system {o_0, C_0} and the 3DUS system {o_3, C_3} is then

{}^{0}T_3 = {}^{0}T_1 \, {}^{1}T_2 \, {}^{2}T_3   (2.7)

A homogeneous transformation can then be constructed to register the 3DUS frame to the stereo camera frame. Lastly, with known camera parameters (focal length, image center, distortion coefficients, etc.), the registered US volume in the camera frame can be projected onto the two stereoscopic images.

2.2.1 Experimental Setup

Figure 2.1b shows the experimental setup used in this study. 3DUS volumes were captured using a Sonix RP US machine (Ultrasonix Medical Corp., Richmond, Canada) with a mechanical 3D transducer (model 4DC7-3/40). A three-axis mechanical micrometer stage was used for accurate positioning of registration tools relative to the fixed US transducer and stereo cameras.

Figure 2.1: a) Schematic of the registration method, b) Experimental setup.

Surface Fiducial Localization

Sets of steel spherical fiducials arranged in precisely known geometry were pressed against tissue-mimicking phantoms, imaged and localized in the 3DUS volumes. The steel plates contained three sets of fiducials spaced 10 cm apart, with each set consisting of a center fiducial and eight surrounding fiducials at a radius of 10 mm (Figure 2.2a). The fiducials were seated in holes cut into the plate on a water jet cutter with dimensional accuracy of 0.13 mm. Fiducial diameters of 2 mm, 3 mm and 4 mm were imaged through phantoms with thicknesses of 3 cm, 6 cm and 9 cm, stiffnesses of 12 kPa, 21 kPa and 56 kPa, and boundary angles of 0 degrees, 20 degrees and 40 degrees. The phantoms were made from polyvinyl chloride (PVC) using ratios of liquid plastic to softener of 1:1 (12 kPa, low stiffness), 2:1 (21 kPa, medium stiffness), and 1:0 (56 kPa, high stiffness) to create phantoms that mimicked tissue properties [3]. To create fully developed speckle, one percent (by mass) cellulose was added as a scattering agent. The five independent variables evaluated were varied independently about a control case (3 mm fiducials, 6 cm depth, 21 kPa stiffness, 0 degree angle, and central location).

The surface fiducial plates were pressed lightly into the PVC tissue phantoms, and imaged through the phantoms. The fiducials were then manually localized in the US volume, and the Euclidean distances between the outer fiducials and the center fiducials were compared to the known geometry to determine the accuracy of localization. For every variable level, 10 tests with 8 error measurements were performed (n = 80). The focal depth was set to the boundary depth in all tests.

Registration

For the registration experiments, a Micron Tracker H3-60 optical tracking system (Claron Technology, Toronto, Canada) was used as the stereoscopic cameras.
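Returning briefly to the derivation above, the construction in Eqs. (2.1) to (2.7) is small enough to write out directly. The following C++ sketch uses OpenCV's fixed-size matrix types; the function and variable names are illustrative assumptions, not the thesis code.

```cpp
// Illustrative sketch of Eqs. (2.1)-(2.7).
#include <opencv2/core/core.hpp>

// Build ^3T_2 from the three fiducial positions ^3x_0, ^3x_1, ^3x_2
// localized in the 3DUS frame {o_3, C_3}.
cv::Matx44d fiducialFrameInUS(const cv::Vec3d& x0, const cv::Vec3d& x1, const cv::Vec3d& x2)
{
    cv::Vec3d v1 = x1 - x0;                       // Eq. (2.1)
    cv::Vec3d v2 = x2 - x0;                       // Eq. (2.2)
    cv::Vec3d i2 = v1 * (1.0 / cv::norm(v1));     // Eq. (2.3)
    cv::Vec3d n  = v1.cross(v2);
    cv::Vec3d k2 = n * (1.0 / cv::norm(n));       // Eq. (2.4)
    cv::Vec3d j2 = k2.cross(i2);                  // Eq. (2.5)

    // Columns of the rotation are the unit vectors of C_2 expressed in
    // {o_3, C_3}; the translation is the origin ^3x_0 (Eq. 2.6).
    return cv::Matx44d(i2[0], j2[0], k2[0], x0[0],
                       i2[1], j2[1], k2[1], x0[1],
                       i2[2], j2[2], k2[2], x0[2],
                       0.0,   0.0,   0.0,   1.0);
}

// Compose the overall transform ^0T_3 of Eq. (2.7), with ^2T_3 = (^3T_2)^-1.
cv::Matx44d cameraToUS(const cv::Matx44d& T_0_1,   // optical markers, stereo-triangulated
                       const cv::Matx44d& T_1_2,   // fixed registration-tool geometry
                       const cv::Matx44d& T_3_2)   // output of fiducialFrameInUS()
{
    return T_0_1 * T_1_2 * T_3_2.inv();
}
```

The only inputs are the three fiducial positions localized in the ultrasound volume and the tool pose triangulated by the stereo cameras, which is what allows the method to avoid probe calibration and external tracking.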
This provided a stable and accurate pre-calibrated camera system and allowed the analysis of registration accuracy to focus mainly on localizing the registration tool in US. The registration tool was also built on a steel plate cut with a waterjet cutter (Figure 2.2b). On the top surface are three Micron Tracker markers spaced 20 mm and 15 mm apart forming an "L" shape; on the bottom surface are three surface fiducials (3 mm) seated in holes cut directly in line with the Micron Tracker markers. Registration accuracy was measured using the test tool shown in Figure 2.2c. The test tool consists of a steel frame, a nylon wire cross which can be accurately localized by US in a water bath, and three Micron Tracker markers which allow the tracker to determine the location of the wire cross in the camera frame.

We first determined the homogeneous transformation relating points in the US frame to the camera frame using the registration tool. We then evaluated the accuracy of the transformation using the test tool placed in a water bath. The crosswire location in US, registered into the camera frame, was compared to the location of the crosswire in the camera coordinates. This was done by saving a US volume of the crosswire in a water bath, and then draining the water to track the optical markers on the test tool in the stereoscopic cameras. Registration error was defined as the Euclidean distance between the position predicted by registration and the tracked position of the crosswire in the cameras. The errors were transformed into the US frame so that they could be specified in the lateral, elevational and axial directions. To ensure accurate scales, the US volumes were generated using the correct speeds of sound for the phantoms and for water.

Figure 2.2: a) Fiducial localization test plate, b) Registration tool, c) Registration accuracy test tool.

2.3 Results

The results of the fiducial localization experiments are shown in Table 2.1. Hypothesis testing was used to determine the statistical significance of variables. Given two groups x and y, unpaired Student t-tests determine the probability p that the null hypothesis (µ_x = µ_y) was true. A one-way analysis of variance (ANOVA) is a generalization of the Student t-test for more than two groups and produces probability statistics {F, p}. For both the Student t-test and ANOVA, a generally accepted threshold for suggesting statistical significance is p < 0.05.

The ANOVA results showed that the size of the fiducial, the depth of the boundary, and the angle at which the boundary was imaged affect the accuracy of fiducial localization (ANOVA: F_size = 7.34, p_size = 9.96E-04; F_depth = 15.5, p_depth = 1.11E-06; F_angle = 8.49, p_angle = 3.61E-04). However, the tissue stiffness does not significantly change the accuracy of fiducial localization (ANOVA: F_stiffness = 0.0414, p_stiffness = 0.960). The t-test results showed that the lateral position of the fiducials on the boundary plays a significant role in the accuracy of localization (t-test: p = 7.10E-18).

Figure 2.3: Example images of an air-tissue boundary (left) and a 3 mm fiducial pressed against an air-tissue boundary (right).

In our registration experiment, four unique poses of the registration tool were
* = Significantly different from control case Variable Fiducial Size  Boundary Depth  Tissue Stiffness  Boundary Angle  Lateral Position On Boundary  Value 2 mm 3 mm 4 mm Long (9 cm) Med. (6 cm) Short (3 cm) High (12kPa) Med. (21kPa) Low (56kPa) 0◦ 20◦ 40◦ Center Offset (10 cm)  Mean ± Std Dev. (mm) 0.94 ± 0.34* 0.82 ± 0.28 0.70 ± 0.20 0.54 ± 0.18* 0.82 ± 0.28 0.66 ± 0.20* 0.81 ± 0.30 0.82 ± 0.28 0.80 ± 0.19 0.82 ± 0.28 0.78 ± 0.28 1.04 ± 0.35* 0.82 ± 0.28* 0.60 ± 0.28*  Median (mm) 0.89 0.78 0.67 0.55 0.78 0.64 0.78 0.78 0.80 0.78 0.75 0.97 0.78 0.59  RMS Error 1.00 0.87 0.73 0.57 0.87 0.69 0.86 0.87 0.82 0.87 0.83 1.10 0.87 0.66  Table 2.2: Mean errors (n = 12) between points in a registered 3DUS volume and its location in the stereo-camera frame. Registration 1 Registration 2 Registration 3 Registration 4 Average  eLateral (mm) 0.90 ± 0.44 1.02 ± 0.45 0.65 ± 0.43 0.57 ± 0.40 0.78 ± 0.45 mm  eElevational (mm) 0.77 ± 0.33 0.60 ± 0.32 0.76 ± 0.33 0.82 ± 0.30 0.74 ± 0.32 mm  eAxial (mm) 1.08 ± 0.75 1.14 ± 0.99 1.01 ± 0.63 1.03 ± 0.79 1.07 ± 0.78 mm  eTotal (mm) 1.75 ± 0.56 1.83 ± 0.74 1.55 ± 0.53 1.60 ± 0.58 1.69 ± 0.60 mm  used for a physical configuration of the camera and US transducer. The registration errors were computed at 12 different locations in the US volume. To test the repeatability of this method, the registration was repeated four times on the same images. Table 2.2 shows that the average error among all the transformed points for all transformations was 1.69 mm, with a minimum error of 1.55 mm and a maximum error of 1.84 mm. The time required to perform a registration was approximately equal to the acquisition time of a 3DUS volume (2 sec).  2.4  Discussion  The fiducial localization tests showed that errors associated with localizing surface fiducials at an air-tissue boundary ranged from 0.54 mm to 1.04 mm. Several vari19  ables had a significant effect on accuracy. The smaller fiducials (2 mm) produced higher localization errors, suggesting that the fiducials became lost in the boundary reflection. The larger fiducials presented larger features that were easier to detect. Boundary depths farther away from the US transducer produced lower localization errors, as fiducial centers were more difficult to localize when approaching the near field [52]. Two results from the localization error analysis that have practical implications are that tissue stiffness does not significantly affect the accuracy of fiducial localization and that only large angles (e.g. 40 degrees) significantly affect the localization accuracy. Our registration method should therefore remain accurate for tissues with a wide variety of stiffnesses and shapes. The lateral location of the fiducials on the air-tissue boundary, however, was significant to the localization accuracy. The air-tissue boundary exhibited greater specular reflection near the axis of the US transducer, and thus fiducials offset laterally from the axis were less obscured by specular reflection and could be more accurately localized. The registration experiment showed that using fiducials on an air-tissue boundary for direct registration between 3DUS and stereo cameras is feasible with an accuracy of 1.69 ± 0.60 mm. The largest errors were in the axial directions since the tail artifacts of the surface fiducials obscured the true depth at which the fiducials were located in the US volume (Figure 2.3). 
Repeated registrations on the same data and registrations using different physical locations of the registration tool all gave consistent overall and component errors, suggesting a model of the reverberation tail could improve localization and registration accuracy further. Nevertheless, based on the overall errors, our registration method is a promising alternative to using tracking equipment, where errors for similar US-to-camera registration systems are within 3.05 ± 0.75 mm [14] for magnetic tracking and 2.83 ± 0.83 mm [31] for optical tracking. It is clear that the main source of error for the new registration method is the localization of registration tool fiducials, as any localization errors would be amplified by a lever-arm effect. The proposed registration is ideal for situations where the camera and the US transducer are fixed. However, if the US transducer or the camera is moved, a new registration can simply be acquired. Alternatively, in the case of robot-assisted surgery, the robot kinematics can be used to determine the new locations of the 20  camera or the US transducer and maintain continuous registration to within the accuracy of robot kinematic calculations from joint angle readings. A few practical issues with the proposed registration method should be considered. First, stereo-camera disparity plays a significant role in the accuracy of registration. The registrations presented in this paper were performed using the Claron Micron Tracker; this represents an ideal case, as the cameras have a large disparity (12 cm) and a tracking error of ± 0.2 mm. In minimally invasive surgery, laparoscopic stereo-cameras having much smaller disparities would be used, possibly resulting in higher errors (although the cameras are imaging a much shallower depth so that the effect of disparity is lessened). This can be compensated for by maximizing the size of the registration tool, producing a well-conditioned system for computing the transformations. Such a registration tool could be designed to fold and fit through a trocar for laparoscopic surgery. Another way to improve registration accuracy is to introduce redundancy into the registration data. Our registration tool featured only the minimum three fiducials required to extract the six degrees of freedom transformation between the US volume and the stereoscopic cameras; with more fiducials on the registration tool, averaging could be used to reduce errors. In addition, higher accuracies can be achieved by considering different poses of the registration tool in both the US and the camera frame [53].  2.5  Conclusions  In this study, we evaluated the accuracy of localizing fiducials pressed against an air-tissue boundary in ultrasound. We have shown that this method can be used to perform 3D ultrasound to stereo camera registration for augmented reality in surgery. This method provides a direct closed-form registration between a 3DUS volume and a stereoscopic camera view, does not require calibration of the US transducer or tracking of cameras or US transducers, and provides improved accuracies over tracking-based methods. Chapter 3 will investigate the use of laparoscopic stereo-cameras in the registration technique.  
21  Chapter 3  Registration of 3D Ultrasound to Laparoscopic Stereo Cameras through an Air-Tissue Boundary: Tests with the da Vinci Stereo Cameras Robotic-assisted surgery using the da Vinci Surgical System is potentially an excellent platform for augmented reality, because the da Vinci surgeon already views the surgical field through a 3D computer display. Robotic-Assisted Radical Prostatectomy (RALRP) is a common application of the da Vinci system. Previous studies have found that intraoperative transrectal ultrasound (TRUS) is useful for surgical navigation in prostatectomy, and may improve oncological and functional outcomes[69, 71–73]. This suggests that an augmented reality system for RALRP based on intraoperative TRUS might also be useful for the surgeon. During the initial feasibility study explored in Chapter 2 that used a registration tool for acquiring an 3D ultrasound image registration, we found the accuracy of that method to be comparable to or higher than methods using external tracking systems. However, that study used stereo cameras with much greater disparity  22  Figure 3.1: Registration concept. than the stereo endoscope of the da Vinci Surgical System, making it uncertain that the high accuracy we found would translate into a practical clinical system. In this study, we examine the accuracy of registering a 3D ultrasound volume from intraoperative TRUS to the stereo endoscope of the da Vinci Surgical System using a registration tool pressed against an air-tissue boundary. We also redesign the registration tool from [79] such that it fits within a standard trocar and therefore can be dropped into the body cavity through an instrument port. We also investigate a closed-form least-squares solution to the registration problem in order to improve the accuracy of registration. Direct US to camera registration, without any external tracking system, requires common features that can be identified in both modalities and used as fiducials. Unfortunately, US and camera data only overlap at boundaries between air and tissue. Therefore, common features must be located at the air-tissue boundary in order to be used as registration targets. This paper describes a method for registering three-dimensional ultrasound (3DUS) to stereoscopic cameras based on this concept. In our method, a registration tool with three optical markers and three ultrasound fiducials is pressed against the air-tissue boundary so that it can be imaged by both the cameras and the 3DUS, thus providing common points in the two frames. By eliminating the US transducer calibration and the external tracking systems, this method reduces the possible sources of error in the registration. In an initial feasibility study, we found the accuracy of this method to be comparable to  23  or higher than methods using external tracking systems [79]. However, that study used stereo cameras with much greater disparity than the stereo endoscope of the da Vinci Surgical System, making it uncertain that the high accuracy we found would translate into a practical clinical system. In this study, we examine the accuracy of registering 3DUS to the stereo endoscope of a da Vinci Surgical System using a registration tool pressed against an air-tissue boundary.  3.1  Method  3.1.1  Registration Concept  Figure 3.1 depicts our registration concept. We define three coordinate systems: the stereo camera coordinate system {o0 ,C0 }, the optical marker coordinate system {o1 ,C1 }, and the 3DUS coordinate system {o2 ,C2 }. 
The goal of the registration is to determine the homogeneous transformation 0 T2 from {o0 ,C0 } to {o2 ,C2 }. The coordinates of the three camera markers in {o0 ,C0 }, 0 xc0 , 0 xc1 , and 0 xc2 , are determined by stereo triangulation. Likewise, the coordinates of the three US fiducials in {o2 ,C2 }, 2 xus0 , 2 xus1 , and 2 xus2 , are determined by segmenting the fiducials out of the 3DUS volume. The offset between the camera markers and the ultrasound fiducials, 1 vuc , is known from the geometry of the tool. The offset is applied to yield the position of the ultrasound fiducials in {o0 ,C0 }, 0 xus0 , 0 xus1 , and 0 xus2 . There are then three common points known in both {o0 ,C0 } and {o2 ,C2 }, which means that a standard least squares approach can be used to solve for the transformation 0T . 2  Multiple positions of the registration tool can be imaged and incorporated to  increase the number of fiducials, and thus the accuracy of registration.  3.1.2  Apparatus  Laparoscopic Stereo Cameras A 12-mm 0-degree da Vinci stereo endoscope was used for camera imaging. A da Vinci Standard model was used for phantom testing, and a da Vinci Si model was used for ex-vivo tissue testing. The stereo camera images were captured using two Matrox Vio cards (Matrox Electronic Systems, Dorval, QC), with the left and right 24  Figure 3.2: Robotic TRUS imaging system. channel DVI outputs from the da Vinci surgical console streamed to separate cards. The capture system ran on an Intel PC with 10 GB memory running Windows XP 32-bit Edition. The images were captured synchronously using the native Matrox API at 60 frames per second and a resolution of 720 by 486 pixels. Three-dimensional Ultrasound System All ultrasound data for this study were captured using a biplane parasagittal/transverse TRUS transducer in combination with a PC-based ultrasound console (Sonix RP; Ultrasonix Medical Corp., Richmond, BC). The 128-element, linear parasagittal array was used for all imaging, with an imaging depth of 55 mm. Focus depth was adjusted before testing to produce the best possible image, and remained constant. A robotic TRUS imaging system (shown in Figure 3.2) based on a modified brachytherapy stepper [1] was used to capture 3D data by rotating the TRUS transducer around its axis and recording ultrasound images at angular increments of 0.3 degrees. The range of angles was adjusted according to the position of the registration tool.  25  Figure 3.3: Schematic of registration tool. Dimensions are shown in millimeters. Registration Tool Figure 3.3 shows the registration tool used in this experiment. It consists of a machined stainless steel plate, with angled handles designed to be grasped by da Vinci needle drivers. Optical markers on the top surface are arranged directly above stainless steel spherical fiducials on the opposite face. The spherical fiducials are 3 mm in diameter, and are seated in 1-mm circular holes machined into the plate by a water jet cutter with dimensional accuracy of 0.13 mm, in order to locate them accurately. The tool was designed to fit through the 10-mm inner diameter of the da Vinci cannulas. It is approximately 9.5 mm wide, with an overall length of approximately 54 mm.  3.1.3  Registration Procedure  In this study, we applied our method to two different tissue phantoms. A custommade PVC prostate phantom was used to register our 3DUS system to a da Vinci Standard system in a research lab at The University of British Columbia. 
An exvivo porcine liver was used to register our 3DUS system to a da Vinci Si system in a research lab at Vancouver General Hospital. Both tests followed the same experimental procedure, described below. Figure 3.4 shows an overview of the experimental setup used in this study. 26  Figure 3.4: Experimental setup. The TRUS transducer was installed on a standard operating room table using a brachytherapy positioning arm (Micro Touch; CIVCO Medical Solutions, Kalona, IA). The da Vinci stereo endoscope was positioned so that it could view the parasagittal imaging array of the TRUS transducer. An ultrasound imaging phantom was installed over the TRUS probe, with the top surface visible in the da Vinci camera view. The registration tool was applied to the top surface of the phantom using the da Vinci manipulators. The tool was positioned so that the three ultrasound fiducials were visible in the 3DUS and the three optical markers were visible in the da Vinci camera view. The left and right camera images and a 3DUS volume were captured. The registration tool was moved to a new position, and reimaged. A total of twelve positions were imaged for each ultrasound phantom. A standard stereo camera calibration was performed using Bouguet’s camera calibration toolbox for Matlab [9]. The registration tool optical markers were selected in the left and right camera images, and the initial selection was automatically refined to sub-pixel precision using a Harris corner detector. The left and right image points were then used to triangulate the positions of the registration tool optical markers in the 3D camera frame. Similarly the tips of the US fiducials were manually localized in the 3DUS volumes. As described above, the common fiducial points on the registration tool were used to solve the homogeneous transformation between the camera and ultrasound frames using a standard least-squares  27  (a)  (b)  Figure 3.5: 3DUS and da Vinci stereo endoscope arranged to image the exvivo air-tissue boundary (a) and da Vinci camera view of the tool pressed against the surface of the porcine liver (b). algorithm [66] minimizing the sum of squared distance error between the common points. Fiducial points from multiple positions of the registration tool were incorporated in order to increase the accuracy of the registration. Between one and four positions of the registration tool were used, with the registration tool translated and 28  25  32.7  A  20  A  SECTION A-A  Figure 3.6: Schematic of cross wire phantom. Dimensions are shown in millimeters. rotated at random across the portion of the phantom’s surface that could be imaged by both the TRUS and the stereo endoscope (approximately 30 mm by 30 mm). Fiducial registration error (FRE) was defined as the average residual error between the camera markers and ultrasound fiducials.  3.1.4  Validation Procedure  Figure 3.6 shows a cross wire phantom used to evaluate the accuracy of our registration method. The phantom was designed to provide points that could be precisely localized by both 3DUS and stereo cameras. It consists of 8 intersection points of 0.2-mm nylon wire arranged in a grid approximately 35 mm by 25 mm by 20 mm. The wire grid is supported by a custom-built stainless steel frame. After the registration was determined, without moving either the US transducer or the stereo endoscope the ultrasound phantom was removed and the transducer was immersed in a waterbath. 
The cross wire phantom was installed in the waterbath and a 3DUS volume of the phantom containing all eight cross wire points was captured. Again without disturbing any of the apparatus, the water in the bath was  29  drained and left and right camera images of the cross wire phantom were captured. This process was repeated for a second position of the cross wire phantom, yielding sixteen target points in all. The cross wire points were localized in the camera frame using stereo triangulation, and in the US frame by manually localizing the points in the 3DUS volumes. (All ultrasound data were corrected for differences in speed of sound (cPVC =1520 m/s; ctissue =1540 m/s; cwater =1480 m/s).) To determine registration error, we used the previously found registrations to transform the positions of the cross wire intersections from the 3DUS frame into the stereo camera frame. These transformed points were compared to the triangulated positions of the cross wires, with Target Registration Error (TRE) defined as the distance between the transformed US points and the triangulated camera points. We measured the error for registrations incorporating between one and four positions of the registration tool (i.e. between three and twelve fiducials). A one-way analysis of variance (ANOVA) is a generalization of the student t-test for more than two groups. We used an ANOVA to determine whether registrations incorporating between two and four positions of the tool were significantly more accurate than the single tool case. The registration was also used to create examples of images that could be used by the da Vinci surgeon for guidance during surgery. The da Vinci camera images and a TRUS volume were captured while the stereo endoscope and the TRUS both imaged a prostate elastography phantom (CIRS, Norfolk, VA). The simulated anatomic features in the phantom were then overlaid in the correct position and orientation on both images.  3.2  Results  Figure 3.7 shows the appearance of spherical fiducials pressed against the surface of a PVC ultrasound phantom and an ex-vivo liver sample. Table 3.1 lists TRE and FRE results when registering through the PVC prostate phantom. Between one and four positions of the registration tool were used to determine the transformation, so results are averaged over multiple possible combinations (e.g. 12 choose 4 combinations, 12 choose 3 combinations, etc). Table 3.2 lists registration accuracy results when imaging through the ex-vivo  30  (a)  (b)  (c)  Figure 3.7: Surface fiducial against an air-tissue boundary and imaged through a PVC phantom (a) and an ex-vivo liver tissue sample (b). Illustration of method for localizing fiducial tip (c): The axis of the reverberation is identified, and the tip is selected along that line. Table 3.1: Registration accuracy imaging through PVC phantom. (Asterisks indicate statistically significant improvements over the single tool result.) tool poses  fiducials  targets  FRE (mm)  TRE (mm)  1  3  16  0.20 ± 0.09  3.85 ± 1.76  2  6  16  0.75 ± 0.38  2.16 ± 1.16*  3  9  16  0.81 ± 0.55  1.96 ± 1.08*  4  12  16  0.85 ± 0.62  1.82 ± 1.03*  Table 3.2: Registration accuracy imaging through ex-vivo liver tissue. tool poses  fiducials  targets  FRE (mm)  TRE (mm)  1  3  16  0.54 ± 0.20  2.36 ± 1.01  2  6  16  0.82 ± 0.29  1.67 ± 0.75*  3  9  16  0.91 ± 0.32  1.57 ± 0.72*  4  12  16  0.95 ± 0.34  1.51 ± 0.70*  porcine liver. 
Figure 3.8 shows an example of an overlay image produced using our registra31  Figure 3.8: Overlay of TRUS information on da Vinci camera view based on registration. Segmented urethra (red) and seminal vesicles (blue) are shown. tion method. The left da Vinci camera image, captured while the stereo endoscope imaged the prostate elastography phantom, is shown. The segmented simulated urethra and seminal vesicles are shown superimposed in the correct position and orientation on the camera image.  3.3  Discussion  In our previous feasibility study, we found an average TRE of 1.69 ± 0.6 mm using a single registration tool position and imaging through a PVC tissue phantom [79]. In that experiment we used stereo cameras with 120-mm disparity, and a relatively large registration tool. In this experiment we used a da Vinci stereo endoscope with 3.8-mm disparity, and a smaller registration tool designed to fit through a 10-mm cannula. Based on the differences in camera and tool geometry, it is not surprising that we found the equivalent accuracy measure in this study to be greater. In this study, using a single registration tool position, we found an average TRE of 3.85 ± 1.76 mm imaging through PVC and 2.36 ± 1.01 mm imaging through exvivo liver. To improve the results, we compensated for the geometry changes by  32  adding one or more additional positions of the tool to the registration. For the exvivo liver testing, two positions of the tool produced an average TRE of 1.67 ± 0.75 mm. This is comparable to our previous result and represents a statistically significant reduction in TRE over a single tool position. Previous studies have reported accuracies of 3.05 ± 0.75 mm based on magnetic tracking [14] and 2.83 ± 0.83 mm based on optical tracking [31], although these studies used different accuracy measures. Adding more registration tool positions further reduced the average TRE. Incorporating four tool positions, for example, produced an average TRE of 1.51 ± 0.70 mm for the ex-vivo liver test. While the increase from one tool position to two tool positions produced a statistically significant improvement, no other additional tool position produced an incremental improvement that was statically significant. In this experiment the registration tool was randomly repositioned over a section of the phantom surface approximately 30 mm by 30 mm, so adding a second tool position to the first would be equivalent to using a larger registration tool. Incorporating two registration tool positions would likely be the best choice for a clinical system, as this provides equivalent accuracy to more tool positions without the time needed for additional ultrasound scans. Based on the previous studies that have considered intraoperative TRUS in RALRP, an AR system based on TRUS would potentially be useful for identifying the correct planes of dissection at the prostate base, at the prostate apex, and medial to the neurovascular bundles [69, 72, 73]. Identifying the correct plane between the prostate and the NVB is the most critical step, and requires the highest level of accuracy. Ukimura et al. [69] found that the mean distance between the NVB and the lateral edge of the prostate ranged from 1.9 ± 0.8 mm at the prostate apex to 2.5 ± 0.8 mm at the prostate base. This is suggestive of the required accuracy for an AR system in RALRP. Our system approaches this accuracy when incorporating two positions of the registration tool. 
In our previous feasibility study in Chapter 2, we measured the accuracy of manually localizing fiducials to be approximately 1 mm [79]. The appearance of the spherical fiducials in US, and the ability to localize them accurately, clearly has an important effect on the overall accuracy of our method. Because our 3D TRUS system uses sweeps of 2D images to construct 3D data, the resolution is lowest in the elevational direction of the array. Altering the incremental angle between 33  images might thus affect registration accuracy significantly. Altering the depth of the US focus relative to the fiducials might also affect the overall accuracy, as the boundaries of the fiducials become less clear. The boundaries of the fiducials raise another issue, as it is uncertain whether the high-intensity response at the top of the fiducials represents the actual edge of the sphere or reflections from within the metal sphere. Hacihaliloglu et al. also considered this problem when using a stylus pointer with a spherical edge in US. They imaged a row of spheres with different known diameters, and compared the differences between the edges in the ultrasound images with the known differences in diameter. They concluded that the edge of the high-intensity response was in fact the edge of the sphere, perhaps with a small constant offset [24]. The force applied to hold the registration tool against the air-tissue boundary would also appear to be important to the appearance of the fiducials, but we qualitatively observed that varying the applied force did not greatly affect the appearance of the fiducials, as long as the fiducials were in contact with the phantom or tissue. As we have discussed, performing a registration currently requires manual segmentation and triangulation of registration tool fiducials. Once we have identified the common points, finding the transformation itself requires only a simple leastsquares calculation that takes less than a second on a typical PC. If fiducial detection and localization in both the ultrasound and camera frames could be made automatic, the registration process would take no longer than the time required to capture a 3DUS volume. Automatic detection of markers in a camera image is a well studied problem in computer vision (many commercial optical tracking systems are based on this) so we do not believe this step would present an obstacle. Automatically detecting and localizing ultrasound surface fiducials has not, to our knowledge, been previously accomplished. Based on the regular appearance of the fiducials, and the fact that their distinctive comet-tail reverberations (see Figure 3.7) distinguish them from surrounding features, we believe standard computer vision algorithms should be able to automatically detect and localize the fiducials without much alteration. Our initial investigations into this problem suggest that a detection algorithm based the Adaboost algorithm [18], similar to common face detectors [74], might be successful in detecting surface fiducials at nearly real-time speeds. Boosting algorithms have previously been applied successfully to detect 34  features in ultrasound [12, 37, 54]. Once the fiducials are detected, their tips can be localized using an edge-detection algorithm, producing accurate positions of the fiducials in the ultrasound frame. 
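As an illustration of the least-squares step referred to above, the following sketch estimates a rigid transformation from corresponding 3D fiducial positions using the SVD-based absolute-orientation solution, one standard alternative to the quaternion formulation; the function and variable names are placeholders and are not part of the system described in this chapter.

```python
import numpy as np

def rigid_transform_3d(pts_src, pts_dst):
    """Least-squares rigid transform mapping Nx3 points pts_src onto pts_dst.

    Requires N >= 3 non-collinear correspondences; returns a 4x4 homogeneous
    matrix minimizing the sum of squared distances ||R p + t - q||^2.
    """
    c_src, c_dst = pts_src.mean(axis=0), pts_dst.mean(axis=0)
    H = (pts_src - c_src).T @ (pts_dst - c_dst)     # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                        # guard against a reflection solution
        Vt[2, :] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

# Example use (hypothetical arrays): T_cam_us = rigid_transform_3d(fiducials_us, fiducials_cam)
```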
It is worth noting that while other methods for ultrasound to camera registration provide updated registrations as the transducer or camera is translated or rotated, our method provides a one-time registration that is invalidated by any movement of either the camera or transducer. For several reasons, we do not believe this is a critical disadvantage. First, because our method could be made to provide registrations very quickly (i.e. in 5 seconds or less), surgeons could simply reapply our registration tool every time they moved either the camera or transducer. Alternatively, because our main application area is robotic surgery using the da Vinci Surgical System, the kinematics of the robot manipulators could be used to provide updated registrations based on the movement of the da Vinci camera arm. In the case of robotic surgery, if a stepper or positioning arm or robotic system such as our TRUS robot is used to hold the ultrasound transducer in place, the registration might only need to be performed once for the entire surgery, with the robot sensors used to update the registration.  3.4  Conclusion  We have presented a new method for registering 3DUS data to a stereoscopic camera view. Compared to existing methods, our use of a registration tool with both camera and ultrasound fiducials eliminates the need for external tracking systems in the operating room. It also does not require any modifications to existing ultrasound or camera systems. The only additional equipment required is a simple, inexpensive tool which can be made sufficiently compact to fit through a cannula. Validation shows an average TRE of 3.85 ± 1.76 mm and 1.51 ± 1.01 mm when imaging through PVC phantom and liver tissue respectively, which are considered ideal conditions. Incorporating two poses of the registration tool significantly improves TRE to 2.16 ± 1.16 mm and 1.67 ± 0.75 mm for PVC and liver respectively. After further developing methods for automatic ultrasound fiducial localization and optical marker triangulation, we plan to apply our method to augmented reality systems in clinical trials.  35  Chapter 4  Real-time 3D Tissue Tracking for Image-Guided Surgery 4.1 4.1.1  Introduction Clinical Problem  Image-guided surgery has received a lot of interest in recent years due to the rapid progression in the fields of robot assisted surgery, medical imaging technology, and augmented reality. Augmented reality enables the identification and augmentation of sub-surface tissue (e.g. lesions or vasculature) in the surgical cameras and can provide stable and persistent medical image registrations, improve surgical margins and reduce the time required in the operating room [70, 71]. Partial nephrectomy and radical prostatectomy are two surgical procedures that would benefit from providing a persistent medical image registration to the surgeon’s view. During partial nephrectomy, the kidney is dissected from the surrounding tissue in order to allow a surgeon to clamp the renal artery and stop blood flow prior to tumor resection. A laparoscopic ultrasound (LUS) image or an external US image can be acquired intraoperatively and registered to the kidney. Since the kidney is mobile, a method for maintaining a registration would allow for subsurface tumor boundaries to be maintained in the camera images during resection.  36  In radical prostatectomy, an exposed prostate is imaged with a transrectal ultrasound (TRUS) and then disected from surrounding tissues prior to resection. 
Since contact is lost between the TRUS and the prostate after mobilization and new images cannot be attained, it is critical to track the prostate surface in the surgical cameras in order to maintain an image registration. Therefore, the acquisition, identification, updating and tracking of local tissues are essential for maintaining registration for the in-vivo surgical environment (see Figure 4.1). The following are key components to successfully track tissue surfaces within an endoscopic scene: 1. Tissue tracking should provide a dense set of locations on the tissue surfaces that can describe local deformation and movement. 2. These locations must be repeatedly found in the endoscopic images over time. 3. Locations that are being tracked should not drift over time relative to the anatomy. 4. Tracking of these locations must be resistant to scale changes, illumination changes, and viewpoint changes that may occur from tissue and camera movement. 5. Tracking must be performed in real-time (greater than 10 Hz).  4.1.2  Related Research on Feature Detection in Natural Scenes  Many visual tracking techniques have been developed in the computer vision community for tracking natural scenes and urban environments [78]. These techniques rely on a combination of color information, edges, pixel intensities, and texture characteristics of an image to identify locations, or features, in order to track the motion of the environment within a sequence of images [68]. Feature-based visual tracking is comprised of two main elements: first, feature detection and extraction to identify image locations and descriptors within the image that should be tracked, and second, the matching algorithms that compare features in one frame to features in the next frame in order to determine their movement. Feature detection and extraction should actually be considered as a two-step process: first is the detection of a pixel location within an image (or frame) that rep37  frame 0  frame n  frame 0  frame n  Figure 4.1: Repeatedly identifying and tracking tissue locations in endoscopy (top), and maintaining an image registration based on tracked movement (bottom). resents a point of interest, and second is the extraction of a feature descriptor vector that describes this point and its surrounding area. With a library of feature descriptors describing their respective locations, their motions from one frame to the next frame can be determined by finding similar features based on proximal locations as well as similarities between descriptor vectors. There are many image feature detectors and extractors that offer highly-descriptive feature vectors that promote stability and high matching accuracy even in the presence of affine deformation, specular reflection and poor lighting conditions. Evaluations by Mikolajczyk et al. [41, 42], have shown that the most stable and distinct (i.e. salient) features 38  are the Scale Invariant Feature Transform (SIFT) [35] and SIFT-like features, such as Speeded Up Robust Features (SURF)[7]. These features rely on identifying locations with high intensity gradients across different image scales, extracting descriptors of the image patch surrounding each identified location, and normalizing the descriptors such that they are invariant to scale, orientation, and pixel intensity. 
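For reference, the detect-and-describe interface that such algorithms expose is illustrated below with a minimal OpenCV sketch, assuming a build of OpenCV in which SIFT is available; it shows only the generic two-step process of keypoint detection and descriptor extraction discussed above, not the tracking framework proposed later in this chapter, and the image path is a placeholder.

```python
import cv2

# Illustrative file name for a single endoscopic frame.
frame = cv2.imread("frame_0000.png", cv2.IMREAD_GRAYSCALE)

# Detect scale-space interest points and extract 128-element SIFT descriptors.
sift = cv2.SIFT_create(nfeatures=1000)
keypoints, descriptors = sift.detectAndCompute(frame, None)

# Each keypoint carries a location, characteristic scale and orientation, which is
# what allows the descriptor to be normalized for scale and rotation invariance.
for kp in keypoints[:3]:
    print(kp.pt, kp.size, kp.angle)
```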
One of the greatest challenges in developing real-time, dense feature tracking algorithms is achieving real-time performance, meaning that the processing time for feature identification and tracking should be 100 msec or lower for the images of interest. With the increased resolution of current high-definition video systems, this means that processing rates of 20 megapixels per second are required. On the one hand, the feature detectors that exhibit the highest degree of matching accuracy and proficiency require a great deal of processor and memory overhead to identify, extract, and match its salient feature descriptors. On the other hand, there are many fast and efficient feature detection algorithms that can perform in realtime environments, but also have much lower accuracies in matching of features from frame to frame. Because of this trade-off between efficiency of computation and feature saliency, different environments require different features. Although SIFT and SURF are very stable and offer high matching accuracies, Mikolajczyk and Schmid [41] have shown that they require a great deal of image processing and therefore may be too slow for real-time tracking techniques. Light-weight interest point detectors (such as corner and edge detectors [68]) and correlation-based matching of image patches [61] have been used successfully within the Simultaneous Localization and Motion (SLAM) community [4, 16] to achieve reasonable processing speeds. However, their success is deeply tied to the assumptions that the scene is perfectly rigid, and therefore camera positions can be estimated with respect to the scene, and iterative reprojection of estimated feature points can lead to a quick convergence for feature tracking purposes.  4.1.3  Related Research on Feature Detection in Endoscopy  Tracking algorithms developed for natural scenes and urban environments rely on the assumption that the environment exhibit strong edge features, low shadowing and lighting effects, visually distinctive textures, and is generally not deformable.  39  Such assumptions can not generally be made for endoscopic scenes due to their unique photometric properties (see Figure 4.2). Tissue surfaces are deformable and are constantly affected by patient motion due to breathing and heartbeat; furthermore, interactions with surgical instruments cause significant tissue deformation. Tissue images exhibit visually non-distinctive tissue textures and therefore image patches are difficult to distinguish from their local environment. Finally, tissue images exhibit a high degree of specular reflection caused by the wet tissue surfaces, and endo-cavity lighting creates large shadow effects and dynamic lighting conditions. Work by Oda et al. [50] provided the first implementations of using frame-byframe tracking of visual features in endoscopic images by utilizing image patches and performing correlation-based matching; the inaccuracy of correlation-based matching was noted as the the major drawback to the approach. Nonetheless, Gr¨oger et al.[22] and Ortmaier et al. [51] developed the correlation-based methods for tracking tissue within a beating heart. Sauv´ee et al. [59] improved the correlation methods by defining an 8-element vector, which serves as a image patch descriptor for improving the saliency of the patches in the correlation-based methods. 
However, with these methods, only general heartbeat motion could be captured due to the small number of image patches used in order to maintain performance and reduce the search space. Stoyanov et al. [63] used the SLAM concepts to perform beating heart tissue tracking, using the Maximally Stable Extrema Regions detector[38] in order to capture the visually homogeneous regions of tissue images into a feature vector. In order to track the MSER regions in 3D, they used the Kanade-Lucas-Tomasi (KLT) Tracker [5, 61] for each stereo-endoscope channel in order to estimate the movement of features temporally in each frame. After extracting feature location and descriptors from each channel, the left and right stereo-endoscope image features were matched using a descriptor correlation method, and successful feature matches were triangulated in 3D space. Since the KLT optical flow algorithm and the MSER region detector are fairly fast to calculate, they were able to achieve 11 frames per second (fps) on 320 × 288 images. However, they reported that correlation-based outlier rejection of the MSER regions significantly reduced the number of features being stereoscopically matched. Furthermore, op40  tical flow algorithms such as the KLT tracker are susceptible to drift and are less stable in poorly-textured and poorly-lit environments due to intensity-based correlation matching [61]. Mountney et al. [45, 47] investigated the use of different salient feature detectors for endoscopic images; they observed that the features that exhibited the highest density and and temporal persistency in the images was SIFT and SIFT-like features such as SURF and Gradient Location and Orientation Histograms (GLOH) [42]. However, these methods were unable to achieve the real-time speeds required for an operating room setting. Therefore, a number of strategies have since been proposed in order to maintain real-time performance; in [75], eye-tracking techniques were implemented such that only a small region of the endoscopic image that the user is focusing on was tracked. Other methods such as the ones described in [43] used GPU acceleration to achieve SIFT and SURF feature tracking in realtime to perform depth estimation and dense tissue reconstruction. However, even with GPU acceleration, real-time performance on high-resolution, feature-dense images is difficult [62], and therefore speed is still an important issue. There has been a gap in the literature between the application of tissue tracking in surgical environments and the efforts in medical image registration. Fiducialbased systems [65, 67] and magnetic tracking systems [44] provide registration and tracking techniques, but rely on the depositing of artificial markers onto the tissue, which severely limits their application in surgical situations. Burschka et al. [10] provided an initial proof of concept that showed that endoscopic image features could be used to maintain registrations in nasal surgery; however, in-vitro tissues were marked using ink in order to create distinguishable landmarks for tracking in the endoscopic camera. We propose that by using appropriate salient feature detectors and extractors to identify and track stable feature points in endoscopic images, accurate and dense tissue profiling in 3D can be achieved at real-time speeds, and it can facilitate the maintenance of spatial-correlation of a medical image registration to the underlying tissue.  41  4.1.4  Contributions  In this chapter, the following contributions are presented: 1. 
We develop a history-preserving framework for tracking tissue in stereoscopic endoscopy using salient features. 2. We present the first use of the STAR detector and the Binary Robust Independent Elementary Features (BRIEF) feature extractor for dense tissue tracking. 3. We provide evaluations of performance measures (speed, stability, and accuracy) to compare STAR+BRIEF, SURF, and SIFT for both 2D and 3D tissue tracking. 4. We extend the history-preserving framework to support region detection and tracking for maintaining a medical image registration over time. 5. We present preliminary data on maintaining a registration using STAR+BRIEF tissue tracking for various tissue types.  4.1.5  Outline of Chapter  This chapter is structured as follows. Section 4.2 describes the methods: Section 4.2.1 describes the salient feature algorithms compared in this paper in some detail, Section 4.2.2 describes the temporal tracking methods, Section 4.2.3 describes the 3D depth estimation methods, Section 4.2.4 describes region tracking and registration methods, and Section 4.2.5 describes the selection of parameters for the methods. Section 4.3 presents the experimental setup and evaluation methods for comparing the different tracking algorithms. Section 4.4 presents experimental results, and Section 4.5 offer discussions of the results and the proposed framework.  42  Figure 4.2: Typical photometric properties of a in-vivo scene during surgery.  43  4.2 4.2.1  Methods Choice of Feature Detectors and Extractors  In order to determine suitable locations for feature extraction, we would like to find points within an image that can be repeatably detected in subsequent frames and that can be identified quickly. We also would like to identify all of these interest points over a large scale-space. Studies [41, 42] have shown that one of the most stable and reliable feature detection/extraction algorithm is SIFT [35]. The SIFT algorithm relies on a Difference of Gaussians operator (DoG) over pyramidal scale space to identify salient feature points, and extracts a large feature descriptor vector based on the local gradients in the surrounding region. Despite the high matching accuracies reported in [41], the computational complexity of the algorithm means that it may be too slow for real-time performance. There have been numerous efforts to speed up the SIFT algorithm through certain techniques such as graphics-acceleration (GPU-SIFT, [62]), which has been shown to be able to extract approximately 800 features on 640 × 480 images at 10 Hz. SURF [7] is another popular feature detection/extraction method that maintains a high degree of saliency. The method uses a rough approximation of the Hessian operator. The Hessian, similar to the Difference of Gaussians, is used to estimate the local curvatures within the image and identify local extrema for feature extraction. Second order partial derivatives of Gaussian kernels are convolved with the images to perform the Hessian operation. To significantly improve efficiency, the partial derivative Gaussians are approximated as a ternary operator composed of 1, 0, and 1 multipliers in order to reduce floating point operations. The determinant of the approximated Hessian will produce local maxima that are then used as the central locations for descriptor extraction. The entire feature detection and extraction of SURF features are made more efficient through the use of integral images (summed-area tables). 
Conceptually, SURF is very similar to SIFT as it also uses pyramidal searching based on multi-octave multi-level blurring to identify features and extract gradient-based feature vectors. Studies on natural scenes have generally revealed SURF to achieve high matching accuracies but slightly less than SIFT, while achieving approximately 5 times  44  speed gains [41, 68]. However, despite the speed increases of SURF, it still has yet to reach the real-time 10 Hz performance on CPUs. Similar to SIFT, there have been efforts to speed up SURF detection and extraction using various modifications to the original algorithm [15, 80] that require specific hardware. Approaching real-time performance are the Center Surrounded Extremas for Real-time Feature Detection (CenSuRE) feature detector [2] and the Binary Robust Independent Elementary Feature (BRIEF) extractor [11]. The CenSuRE feature detector and the BRIEF feature extractor are especially fast salient feature algorithms as they are both intensity-based, binary methods that evaluate square patches of an image. Furthermore, binarized methods reduce floating point operations and can use boolean operations for feature definition and feature matching. In the following section, we will describe both the CenSuRE/STAR feature detector and the BRIEF extractors and descriptors in more detail. STAR Feature Detector The STAR detector is a slight modification of the CenSuRE. The approach of the CenSuRE detector is to estimate the Laplacian of an image by making certain simplifying approximations to the kernel operator. Whereas SIFT uses a DoG operator to simplify the detection of local maxima/minima, CenSuRE estimates the Laplacian by building a bi-level center surround square kernel, where pixel values within the kernel are multiplied by either 1 or -1. These kernels are applied over a pyramidal scale space at all locations and all scales of an image, and local extrema are identified at each scale. The extrema are then filtered by computing the scaleadapted Harris measure [40] to determine the “cornerness” of a location, and those locations with a weak corner response are eliminated. A final step that is performed is line suppression. Features that lie along an edge or a line will exhibit high principal curvatures in only one direction, which causes them to be poorly localized along the gradient edge. Therefore, a second moment matrix H is computed at each location, and features that have a large ratio of principal curvatures are eliminated. The bi-level center-surround kernel that is used by the STAR detector is slightly modified to represent a star shape, which is a better estimation of a circular Lapla-  45  cian kernel and is shown to improve the results of the feature detector at little cost to performance [77]. In order to compute the star kernels efficiently, a combination of regular and rotated integral images are used to add and subtract pixel intensities. The square integral images N and 45 degrees rotated integral images N are computed as: y  N(x, y) =  x  ∑ ∑ I(i, j)  (4.1)  j=0 i=0  and  x+y− j x  N (x, y) =  ∑ ∑ I(i, j).  (4.2)  j=0 i=0  I(i, j) represents the image intensity at location {i, j}, and where x and y represent a location in the integral image N. 
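A small numpy sketch of the upright summed-area table of equation (4.1), together with the constant-time box sum that makes bi-level kernels cheap to evaluate, is given below; the rotated table of equation (4.2) is built analogously along 45-degree diagonals and is omitted for brevity. The function names are illustrative only.

```python
import numpy as np

def integral_image(img):
    """Upright summed-area table: N[y, x] = sum of img[0..y, 0..x] (equation 4.1)."""
    return img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)

def box_sum(N, x0, y0, x1, y1):
    """Sum of pixels in the inclusive rectangle [x0..x1] x [y0..y1] in O(1).

    Uses the four-corner rule on the integral image; the guards handle the image border.
    """
    total = N[y1, x1]
    if x0 > 0:
        total -= N[y1, x0 - 1]
    if y0 > 0:
        total -= N[y0 - 1, x1]
    if x0 > 0 and y0 > 0:
        total += N[y0 - 1, x0 - 1]
    return total

# A bi-level (center-surround) response is then a signed combination of such box sums.
```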
At each location, the response R is computed from the STAR kernel through two successive center-surround bi-level kernel operations according to the configuration presented in Figure 4.3:

R = [(N1 + N4 − N2 − N3) − 2(N5 + N8 − N6 − N7)] + [(N2' + N4' − N1' − N3') − 2(N6' + N8' − N5' − N7')],   (4.3)

which results in a star-shaped pattern.

Figure 4.3: Two bi-level center-surround kernels, one in a square configuration (regions N1–N8 of the integral image N) and one in a diamond configuration (regions N1'–N8' of the rotated integral image N'), are constructed using integral images and added together to produce the star-shaped kernel used in the STAR detector.

BRIEF Feature Extraction

The BRIEF feature extractor takes the feature locations from the STAR feature detector as inputs, and extracts the corresponding feature vector that describes a characteristically-scaled patch around each location. We begin with the location of a feature, Q = {xq, yq}, for which we want to build a BRIEF descriptor. Let ℜ represent a rectangular feature patch centered about Q in its native scale space, with a width w = 2·r + 1 and height h = 2·r + 1, where r is the half-kernel size of the image patch. To build a feature vector v[i] of length N where i = 0..N, we randomly select a pair of points within the region ℜ, pi = {xi, yi} and pi' = {xi', yi'}, with the only constraint that the probability of choosing the selected point locations corresponds to the probability of an isotropic Gaussian distribution about the center of region ℜ. This provides the highest degree of saliency, as shown in [11]. We build the vector v by evaluating the pairs of pixel locations pi and pi' for all i = 0..N, where

v[i] = 0 if I(pi) ≤ I(pi'), and v[i] = 1 if I(pi) > I(pi'),

and where I(p) is the pixel intensity at location p = {x, y} within the region ℜ. This results in v being a vector of N binary values. One feature vector is built for each feature location. Figure 4.4 provides a visual depiction of the BRIEF feature extractor.

Figure 4.4: The extraction of a BRIEF vector. Pairs of pixels are shown to be chosen in correspondence with an isotropic Gaussian probability function about the image patch center.

Feature matching is performed by comparing one feature vector to another feature vector and measuring the Hamming distance between them. Given feature vectors vm and vn, the Hamming distance d between them is

dm,n = ∑ i=1..N |vm[i] − vn[i]|.   (4.4)

If we let v0 be a feature vector to be matched, and let vk be the kth feature vector from a list of matching feature candidates, we look for the smallest Hamming distance to identify the best match within the list. The Hamming distance d is a summation of exclusive-or (XOR) operations in binary logic, and can therefore be computed extremely quickly. Furthermore, newer Intel processors (i7 and later) support the SSE4.2 instruction set, which includes a population count (POPCNT) instruction that counts the number of ones in a bit string in a single instruction, driving the computational cost of matching to insignificant levels. We follow the formulation of the BRIEF feature extractor described in [11], choosing a feature vector length of N = 256 and a patch size of S = 25. We apply the BRIEF techniques to a pyramidal scale space such that features are extracted from the scale level that matches the STAR point's characteristic scale. This effectively makes the BRIEF features scale-invariant.
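The STAR/BRIEF pipeline described above maps directly onto existing library implementations. The sketch below assumes an OpenCV build that includes the contrib xfeatures2d module and illustrates only how detection, binary description, and Hamming-distance matching fit together; it is not the history-preserving tracking framework of Section 4.2.2, and the image paths are placeholders.

```python
import cv2

# Illustrative file names; in practice these would be consecutive endoscopic frames.
prev_gray = cv2.imread("frame_0000.png", cv2.IMREAD_GRAYSCALE)
curr_gray = cv2.imread("frame_0001.png", cv2.IMREAD_GRAYSCALE)

star = cv2.xfeatures2d.StarDetector_create(maxSize=45, responseThreshold=30)
brief = cv2.xfeatures2d.BriefDescriptorExtractor_create(bytes=32)   # 32 bytes = 256 bits

def detect_and_describe(gray):
    """STAR keypoints with 256-bit BRIEF descriptors for one grayscale frame."""
    keypoints = star.detect(gray, None)
    keypoints, descriptors = brief.compute(gray, keypoints)
    return keypoints, descriptors

kp0, des0 = detect_and_describe(prev_gray)
kp1, des1 = detect_and_describe(curr_gray)

# Hamming distance is the natural metric for binary descriptors (equation 4.4);
# cross-checking keeps only mutually-best matches as a simple outlier filter.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des0, des1)
```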
Figure 4.5: Flowchart depicting the proposed feature tracking framework on a single image. Features are extracted in the current frame and matched to features extracted in previous frames. Matched features are updated, and new features are saved. The stability of the list of features is assessed, and those which are not stable are deleted. (Stages A–E: image capture and Gaussian smoothing; keypoint detection and feature extraction; matching against the previous feature list based on Θ, κ, δx, δy, λ, γ, ε; updating matched features and appending new features to the list; deleting old keypoints.)

4.2.2 Temporal Feature Matching

We present a visual flowchart of the proposed tissue tracking framework in Figure 4.5, depicting a single iteration of feature detection and matching. We begin by capturing an image from the surgical camera, and perform a pre-processing filter step with a 3 × 3 Gaussian smoothing kernel (Figure 4.5A). We extract the image features from the current frame (Figure 4.5B), and compare them to features found in previous frames. We update the descriptors and locations of features that are matched, save features that were not matched and are considered new, and increase the timestep. In the following sections, we describe these methods in more detail.

Filters

In order to perform frame-by-frame tracking, we need to look at how positions within an image move locally over time. If the feature detection algorithms used are successful, they will produce a cloud of points that will populate the image and can be used as local landmarks for tracking. Therefore, given a list of feature locations and descriptors from a frame in a video channel, we would like to match each feature on the list to a list of features extracted from the subsequent frame, in order to determine their movement in the scene. Since we expect that the features will not move considerably, a number of techniques can be applied to narrow down the number of possible matches, thus reducing the need to perform unnecessary descriptor comparisons and improving performance, as well as reducing the possibility of incorrect matches [78] (Figure 4.5C).

Given the ith feature from list 1, fi(xi, yi, ki, θi), and the jth feature from list 2, fj(xj, yj, kj, θj), where xi, yi, xj, and yj are their pixel locations in the original images, ki and kj are their characteristic scales, and θi and θj represent their orientations, we will only include within the set of matching possibilities those pairs of features that satisfy the criteria listed below.

1. Similar scales: Since all features identified by SIFT, SURF, and BRIEF have a characteristic scale in order to make them scale-invariant, we can set a threshold κ on the maximum ratio of scales:

log(ki / kj) < κ.   (4.5)

2. Similar orientations: Both SURF and SIFT have a characteristic orientation for each feature, which refers to the eigenvector direction of the maximum eigenvalue of the second moment matrix. This makes the feature vectors invariant to rotation, whereas BRIEF features are not. Therefore, for SIFT and SURF, we define a maximum allowable orientation difference Θ, where

|θi − θj| < Θ.   (4.6)
3. Within local area: Given that we do not expect to see large motions between consecutive frames, we can limit our search of possible temporal feature matches to a small rectangular region with a width δx and height δy centered about a feature's last known location:

|xi − xj| < δx and |yi − yj| < δy.   (4.7)

4. Descriptor distance ratio: After calculating all the descriptor vector distances for the subset of possible matching features, we should perform a confidence check. By keeping track of the descriptor distances for the best and the second best matches to a feature, we can estimate a level of confidence for the match. If we let dfirst represent the descriptor distance for the best match and dsecond represent the descriptor distance for the second best match, then we can set a confidence threshold λ, where a match is only considered valid if

dsecond / dfirst < λ.   (4.8)

Checking with Neighboring Matches

One of the characteristics of in-vivo surgical video is that the soft tissue within the scene deforms with the adjacent tissues, and therefore the spatial location and movement of a feature location within the image will be similar to those of neighboring feature locations. Furthermore, unlike in natural scenes, depth discontinuities within the image are sparse. Therefore, from a list of matched features, we can look at the nearest neighbors of each feature in 2D image space to identify whether the features are moving in similar directions. Given two neighboring feature locations in frame n, Q1,n = {x1,n, y1,n} and Q2,n = {x2,n, y2,n}, and their matched locations in the previous frame, Q1,n−1 = {x1,n−1, y1,n−1} and Q2,n−1 = {x2,n−1, y2,n−1}, we consider their movements to be significantly different if

log[ (δx1² + δy1²) / (δx2² + δy2²) ] > γ,   (4.9)

where {δx1, δy1} = {x1,n − x1,n−1, y1,n − y1,n−1} and {δx2, δy2} = {x2,n − x2,n−1, y2,n − y2,n−1}, and where γ represents the maximum allowable ratio of squared distances of movement between the neighboring features. Furthermore, we consider the dot product of their movement vectors, and evaluate the direction of movement of neighboring features to be significantly different if

∆θ = acos[ (δx1·δx2 + δy1·δy2) / ( √(δx1² + δy1²) · √(δx2² + δy2²) ) ] > ε,   (4.10)

where ε is the maximum allowable difference in the direction of movement. In practice, we only check against γ and ε if there is a temporal displacement of at least 5 pixels for each matched feature, as smaller displacements result in coarser distance and orientation values and would therefore reduce the efficacy of these checks.

Management of Feature Lists

When identifying matches between feature lists of consecutive frames, it is expected that there will be a large number of matched features, but also a set of features that will not be detected in the new frame. This can be caused by motion in the scene, and can also arise from noise in the images, causing features to flicker in and out. Therefore, without keeping a history of features, all the features from a single frame will eventually be lost in the consecutive frames. In contrast, if one were to keep all new and old features from all frames for temporal tracking, then the lists would grow indefinitely in size, increasing memory requirements and decreasing speed. Management of feature lists is therefore required in order to identify and keep only the features that are the most stable in spite of scene motion and camera noise.
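Before turning to list management, the following sketch shows one possible encoding of the gating criteria (equations 4.5 to 4.7) and the neighbor-consistency checks (equations 4.9 and 4.10); the threshold values are illustrative placeholders rather than the tuned parameters of Section 4.2.5, and the descriptor distance ratio check of equation (4.8) is applied separately after descriptor comparison.

```python
import math

def is_match_candidate(fi, fj, kappa=0.35, theta_max=0.4, dx=40, dy=40):
    """Cheap gating tests applied before any descriptor comparison.

    fi, fj: (x, y, scale, orientation) tuples for a feature in the previous and
    current frame. Orientation wrap-around is ignored here for brevity.
    """
    xi, yi, ki, thi = fi
    xj, yj, kj, thj = fj
    if abs(math.log(ki / kj)) > kappa:              # similar characteristic scales (symmetric form of eq. 4.5)
        return False
    if abs(thi - thj) > theta_max:                  # similar orientations (SIFT/SURF only, eq. 4.6)
        return False
    return abs(xi - xj) < dx and abs(yi - yj) < dy  # within the local search window (eq. 4.7)

def neighbours_consistent(d1, d2, gamma=0.7, eps=0.8):
    """Neighbor motion checks (eqs. 4.9 and 4.10) on two displacement vectors d1, d2."""
    s1 = d1[0] ** 2 + d1[1] ** 2
    s2 = d2[0] ** 2 + d2[1] ** 2
    if s1 == 0 or s2 == 0:
        return True                                  # very small motions carry little information
    if abs(math.log(s1 / s2)) > gamma:               # similar magnitude of motion
        return False
    cos_dt = (d1[0] * d2[0] + d1[1] * d2[1]) / math.sqrt(s1 * s2)
    return math.acos(max(-1.0, min(1.0, cos_dt))) < eps   # similar direction of motion
```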
In practice, the overwhelming majority of the work that has been done on managing endoscopic image features is with SLAM techniques. In these cases, only the previous frame’s features and the current frame’s features are saved in order to identify camera motion [36, 47]. However, if features are not only kept track of between consecutive frames, but also when they are not found in consecutive frames, it provides a history preserving framework for which each image feature can be considered a unique position on the tissue surface, and therefore can be used effectively for registering and maintaining medical image correspondence to the underlying tissues. Wengert et al. [76] maintains a database of tracked and untracked features, and uses camera position estimates and bundle adjustment methods common in SLAM to preserve the feature sets in subsequent frames. Our feature tracking architecture independently keeps track of the features over time by maintaining a list of features that have been previously found (Figure 4.5D). For a set of features in a newly captured frame, we perform feature matching to identify matches to features found in previous frames. All previous features that were matched to the new frame’s features are updated with the feature location and new descriptor vector, and all features from the new frame that were not matched are appended to the list. For each feature on the list, the frame number in which the feature was first detected is recorded, as well as the number of frames the feature has been found in (i.e. matched) since it was first detected. Whenever the ratio between the number of times found and the number of frames since first detection falls beneath a threshold, empirically set to 40%, the feature within the list is deleted (Figure 4.5E). This ensures that the list only maintains features that are deemed to be persistent within the scene. Since we do not want to throw away brand new features as quickly as they appear, we set a count to prevent any new features from being deleted until 10 frames after their first detection. These values 53  were chosen from preliminary experiments, which showed a significant drop-off of the stability of points at approximately 40%.  4.2.3  3D Depth Estimation  Given the feature tracking solution we proposed, we will extend its function into stereo depth estimation and dense 3D point localization for reconstructing depth maps of the scene. Given that we have two dense feature populations for the left and right stereoscopic channels, we will attempt to match features between the channels to establish a stereo-triangulated point. The strategies described for temporal feature matching can be used again for stereo matching during depth estimation: 1. Similar scales (equation (4.5)) 2. Similar orientations (equation (4.6)) 3. Within local area (equation (4.7)) 4. Ratio of matching distances (equation (4.8)) Since stereo-matching does not involve temporal movement, equations (4.9) and (4.10) are not considered here.  4.2.4  Region Tracking and Registration  We first assume that an image registration is already acquired within a localized region in the stereoscopic images. This can be achieved using a simple method such as the one described in [79]. A region is then selected on the tissue surfaces that appears closest to the center of the registered volume, and stereoscopicallymatched features that are found within this region are saved to a separate feature list, as are their 3D locations. 
For the sake of clarity, we will call this list the object feature list, as the methods we propose are similar to those used for object detection and tracking. The region selection is performed manually, although it would be possible to automatically choose the set of feature points that have the least Euclidean distance from the 3D center of the registered image. We present a visual flowchart of our proposed region tracking and registration framework in Figure 4.6, depicting a single iteration. We begin with the updated feature list from Figure 4.5; we then compare these features to the object features to find matches. Given these matches, we compute the least-squares transformation required to transform the object features to the matched scene features in the 3D coordinate system. We update the descriptors and locations of features that are matched, save new features that fall within the tracked region, and increase the timestep. We describe these methods in more detail below.

Temporal Tracking of a Tissue Region

In order to match the selected region's features to the features in the scene, two strategies can be used (Figure 4.6A). First, we can rely on the matching parameters described in Section 4.2.2 to match object features (saved from the left stereoscopic frame) to current features in the left frame that have been matched stereoscopically. A second strategy is to give each feature a unique ID that it retains as it is tracked from frame to frame. When the original anchor points are identified for object tracking, their unique IDs can then be used to identify their locations in subsequent frames (if they appear). This method does not require feature match filtering or feature descriptor comparisons, and is therefore much more efficient. It is an effective method for tracking as long as the tracked patches do not stray from the visible camera scene. If they do, then the feature matching filters described in Section 4.2.2 can be used to reacquire lost matches. In these situations, we suggest matching the object features to the stereo-matched scene features based on the following parameters:

1. Similar scales (equation (4.5))
2. Similar orientations (equation (4.6))
3. Within local area (equation (4.7))
4. Ratio of matching distances (equation (4.8))
5. Similar movement (distance) with neighboring features (equation (4.9))
6. Similar movement (direction) with neighboring features (equation (4.10))

We choose to revert to location/descriptor-based matching if the tissue region being tracked exits the scene, since its feature points will eventually be deleted from the feature list history and their feature IDs will not be maintained. Therefore, a tissue object can be rediscovered regardless of the length of time spent off-screen.
Figure 4.6: Flowchart depicting the proposed region tracking and registration framework. A region is defined in frame 0, and features within the region are saved as object features. Features tracked in subsequent frames are matched to the object features to keep the object features up to date, and new features in the tracked region are appended to the object feature list. All object features, registered images, and region outlines are kept in the frame 0 coordinate system to avoid drift from successive transformations.

Registration

The object feature points and their 3D locations act as 'anchors' by finding temporal stereoscopic matches in subsequent frames. The movement of the object features is used to define a 3D transformation matrix between the two time steps that can be applied to a registered volume in order to keep the registration fixed to the tissue as it moves in the scene. This makes the simplifying assumption that the tissue moves rigidly, which is a reasonable approximation in some surgical cases such as radical prostatectomy. Below we describe the method by which we solve for the registration. A temporal registration T requires a set of object feature locations in 3D, denoted by X, to be matched to another set of 3D feature locations in the current frame, denoted by Y, such that Y = T X. Due to the tight disparity of the cameras, any outliers in these matching pairs will result in 3D point locations that fall completely outside the regular model boundaries and can significantly affect registration accuracy. We found that simply calculating the least-squares estimate over all the points was extremely unstable from frame to frame due to outlier effects. Therefore, in order to reject outlier matches, we use Random Sample Consensus (RANSAC) to identify the best registration using only a subset of the N points [17]. With RANSAC, we can systematically consider only points that fall within the bounds of a registration model, while omitting outlier influence from the estimation of the model parameters. This is performed iteratively such that the model improves over iterations to better match the inlier points. We consider outliers to be those feature points that fall outside the registration model by over 1 mm, as this represents greater than approximately 5 pixels of error in stereo matching. The least-squares estimate of T is computed using Horn's quaternion-based method [25] (Figure 4.6C). See the Appendix for a more detailed description of the RANSAC algorithm.

Management of Object Feature Lists

Given a set of object features that are selected in frame 0 and their locations X₀, we can identify their locations in the current frame n, Xₙ, such that Xₙ = ⁿT₀ · X₀ (Figure 4.6D). We do not update the feature locations based on the matches to Xₙ; rather, we only update their feature descriptor vectors and the transformation ⁿT₀. By doing so, we are able to maintain a base frame of reference for all subsequent frames. Therefore, for the following frame at n + 1, we first transform the features from the original object frame to frame n using ⁿT₀, in order to estimate their positions in the last frame. Then we calculate the incremental transformation ⁿ⁺¹Tₙ to transform the object features to the new frame. Therefore, ⁿ⁺¹T₀ = ⁿ⁺¹Tₙ · ⁿT₀.
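To make one tracking step concrete, the sketch below estimates the incremental rigid transform ⁿ⁺¹Tₙ from matched 3D anchor points with a simplified RANSAC loop and a quaternion-based absolute-orientation solver in the style of Horn, and then composes it with the running ⁿT₀. This is an illustrative NumPy implementation under the same rigid-motion assumption as the text; the function names, iteration count, and inlier threshold (1 mm) are our own choices.

```python
import numpy as np

def horn_rigid_transform(X, Y):
    """Least-squares rigid transform (R, t) with Y ≈ R @ X + t,
    via Horn's quaternion method. X, Y are (N, 3) arrays."""
    cx, cy = X.mean(axis=0), Y.mean(axis=0)
    A, B = X - cx, Y - cy
    S = A.T @ B                                   # cross-covariance matrix
    N = np.array([
        [S[0,0]+S[1,1]+S[2,2], S[1,2]-S[2,1],        S[2,0]-S[0,2],        S[0,1]-S[1,0]],
        [S[1,2]-S[2,1],        S[0,0]-S[1,1]-S[2,2], S[0,1]+S[1,0],        S[2,0]+S[0,2]],
        [S[2,0]-S[0,2],        S[0,1]+S[1,0],       -S[0,0]+S[1,1]-S[2,2], S[1,2]+S[2,1]],
        [S[0,1]-S[1,0],        S[2,0]+S[0,2],        S[1,2]+S[2,1],       -S[0,0]-S[1,1]+S[2,2]]])
    w, V = np.linalg.eigh(N)
    q0, qx, qy, qz = V[:, np.argmax(w)]           # quaternion of largest eigenvalue
    R = np.array([
        [1-2*(qy*qy+qz*qz), 2*(qx*qy-q0*qz),   2*(qx*qz+q0*qy)],
        [2*(qx*qy+q0*qz),   1-2*(qx*qx+qz*qz), 2*(qy*qz-q0*qx)],
        [2*(qx*qz-q0*qy),   2*(qy*qz+q0*qx),   1-2*(qx*qx+qy*qy)]])
    t = cy - R @ cx
    return R, t

def ransac_rigid(X, Y, n_iter=200, inlier_mm=1.0):
    """Estimate a rigid transform from X to Y while rejecting outlier matches."""
    best_inliers = None
    rng = np.random.default_rng(0)
    for _ in range(n_iter):
        idx = rng.choice(len(X), size=3, replace=False)   # minimal sample
        R, t = horn_rigid_transform(X[idx], Y[idx])
        err = np.linalg.norm((X @ R.T + t) - Y, axis=1)
        inliers = err < inlier_mm
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on all inliers of the best hypothesis.
    return horn_rigid_transform(X[best_inliers], Y[best_inliers])

def step(T_n0, X_obj_frame_n, Y_frame_n1):
    """One tracking step: X_obj_frame_n are object features already mapped
    into frame n, Y_frame_n1 are their matches in frame n+1, and T_n0 is the
    running 4x4 transform from frame 0 to frame n."""
    R, t = ransac_rigid(X_obj_frame_n, Y_frame_n1)
    T_inc = np.eye(4)
    T_inc[:3, :3], T_inc[:3, 3] = R, t            # incremental transform
    return T_inc @ T_n0                           # composed transform to frame n+1
```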
The reason for maintaining this base frame of reference becomes clear when we begin to add new object features, described below. As time continues, features that were originally used as anchors will eventually be lost due to image noise and erroneous matches; therefore, the outline of the image patch that originally identified the anchor features is also transformed from frame to frame, and any new feature that appears within the outline is added as an anchor point. Since these new feature locations are within the new n + 1 frame of reference and we would like to establish their positions relative to all the other object points saved in the object feature list, we calculate the inverse transform ⁰Tₙ₊₁ = (ⁿ⁺¹T₀)⁻¹ in order to reproject them into the original object frame. Although this requires more computational effort than simply moving all the object features to an updated location (either based on their temporal matches or, if they are not matched, based on frame-by-frame transformations), it avoids drift of previously saved object features caused by successive transforms. Finally, since the list of object points would otherwise grow indefinitely, we set a threshold number of anchor points to track and delete those that have the least likelihood of appearance (based on the percentage of frames matched since detection, see Figure 4.6E). Since three feature point matches are needed for a six degree-of-freedom rigid transform, and each additional correct feature match improves the accuracy with diminishing returns, only a small number of the most stable features need to be saved. In practice, we only keep the most prevalent 500 features and their 3D locations.

4.2.5 Selection of Parameter Values

Table 4.1 summarizes the parameters used for tissue tracking, stereoscopic matching, and object tracking. The scale threshold κ is chosen in accordance with the octaves at which features are selected by the feature algorithms. The threshold value of κ = log(2.0) is chosen such that features are at most one octave apart; feature scales are not expected to change significantly between the left and right stereo channels, and therefore κ = log(√2) is used for stereo matching. The orientation threshold Θ only affects the SURF and SIFT features; feature orientations are not expected to change more than π/18 radians during a single timestep; however, since viewpoint changes between the left and right cameras may influence the feature orientation slightly, we chose a higher threshold value of π/12 for stereo matching. Although the scale and orientation thresholds are effective at reducing the number of necessary feature vector matches to those of certain scale levels and orientations, the distance between features is the parameter that most influences the matching accuracy. Features are allowed to move 20% of the total width of the image (δx, δy = 0.2 × image width) in order to capture large tissue motions during temporal tracking. When stereo matching, we can use the epipolar constraint to limit the feature matches to a narrow band (δx = 0.5 × image width, δy = 0.05 × image height). For situations where the tracked region leaves the image scene, we cannot predict the re-entry location of the region in the image, and therefore perform a wide search (δx, δy = 0.5 × image width). The descriptor distance ratio λ = 0.5 is chosen such that the best feature match is 100% better than the second best match, as this represents a high confidence of a correct match.
The difference-in-movement threshold γ = 2·log(1.5) and the difference-in-angle threshold ε = π/18 were chosen to restrict the movement of feature bundles to smooth and consistent motion, which should be characteristic of in-vivo tissue.

4.3 Experiments

4.3.1 Temporal Tracking

To evaluate the proposed method for tissue tracking, we compare the STAR+BRIEF detection and extraction algorithm to two of the de facto standards for salient feature detectors in the literature, SIFT and SURF. We follow the implementations of SIFT and SURF specified in their original papers, [35] and [7] respectively. We use 3 octaves with 3 levels each for the SIFT and SURF implementations, with the first octave being the original image resolution. We chose a Hessian threshold value of 50 for the SURF implementation. For the STAR detector, we constructed the two successive center-surrounded bi-level kernels to have an outer edge length of 8 pixels and an inner edge length of 4 pixels, with a scale-space pyramid of 9 levels between 1.0 and 5.0 at increments of 0.5. The operating parameters of the algorithm were experimentally chosen as a Harris corner response threshold of 2, a non-maximal extrema suppression of 5, and a line suppression ratio of 10. Evaluation of the salient feature algorithms is based on several criteria:

1. Speed of algorithm: calculated as the time required for feature detection, feature extraction, and matching with previously found features (including location, scale and orientation filtering) as well as neighboring match consistency.

2. Average number of features found in each frame: this can vary significantly depending on video characteristics (e.g. size and resolution, noise, motion blurring, shading, etc.).

3. Percent of features matched between consecutive frames: calculated as the ratio between the number of features found and the number of features matched to previously saved features in each frame.

4. Average lifetime of a feature: the number of frames between a feature's first detection and the moment the feature is considered lost and its history is deleted.

5. Percentage of time features are found: evaluated as the ratio of the number of frames in which a certain feature is found to the number of frames since its first detection. Given that features will often flicker in and out of an image due to video artifacts and noise, this provides a measure of the temporal stability of features.

6. Average size of the static feature list: given the history-preserving framework for previously found features, this evaluates the average number of saved features against which features extracted from a new frame will be matched.

7. Localization accuracy and drift of select features: since there is no ground truth available for the in-vivo video sequences, we performed a forward-backward time comparison of select features. Features are tracked in frames as they move forwards in time, and at time n, the video frame sequence is reversed and the features are subsequently tracked backwards in time until the beginning of the video. Performing this forward-backward tracking allows us to investigate how likely a feature is to drift within the image sequences due to feature mismatches in frames. A perfect feature correspondence would be achieved when the positions of a feature moving forwards and backwards in time line up at every time step.
At any point in time, if a feature is mismatched to a different location, it is likely that this error carries into subsequent frames, resulting in drift.

4.3.2 3D Depth Estimation

In order to evaluate the feature tracking framework for stereoscopic depth estimation, we evaluate the following parameters:

1. Speed of the stereo matching: the time required to identify matches between features found in the left and right frames, given the two lists of features.

2. Number of features matched across channels: this value will differ depending on the photometric properties of the video sequence.

3. Percentage of features stereo-matched from a single frame: evaluated as the ratio of the number of features that were matched between the left and right camera frames to the number of features found in the left camera frame.

We performed a t-test analysis to determine whether we can reject the null hypothesis, which is that the means of the measured values are the same (see the Appendix for details). The null hypothesis was rejected if p < 0.05.

4.3.3 Region Tracking and Registration

For the tissue region tracking experiments, we set up an in-vitro experiment on three different tissue types: kidney (bovine), heart (bovine), and liver (porcine). On each tissue, we placed a 2 mm diameter steel bead fiducial. These fiducials represent locations within the tissue that can be segmented clearly in the camera images and are used to test the accuracy of registration. We placed the camera approximately 5-10 cm away from the tissue such that the fiducials could be seen in both stereo channels. We then moved the tissue, with rotation and translation in all three dimensions, taking care to keep the fiducials continuously visible in both channels. We used the STAR+BRIEF tissue tracking framework to acquire and track features within the scene. Since we do not want the fiducials to provide any help in determining a registration, we automatically remove all extracted features whose templates would overlap with the fiducial locations (based on their patch location and size). This ensures that the fiducials, which represent strong trackable locations, do not improve the registration. We assumed that the tissue surfaces remained rigid and extracted a six degree-of-freedom rigid registration for the feature points from frame to frame. In the first frame of tracking, we manually identified the locations of the fiducial points in the left and right camera images and performed stereo-triangulation to acquire their locations in 3D coordinate space. In subsequent frames, we apply the chain of transformations to the original fiducial locations in order to predict the motion of the fiducials over time. In order to acquire an estimate of the true movement of the fiducials over time, we locate the fiducials in each frame for both the left and right channels and perform stereo-triangulation to acquire their 3D coordinates. This frame-by-frame fiducial identification and tracking was performed using a 31 × 31 normalized cross-correlation window over the fiducial locations, which estimates their locations in each stereo channel; stereo-triangulation then gives their 3D coordinate points.

4.3.4 Apparatus and Test Data

The system we used was a PC with an Intel® Core™ i7 CPU 960 at 3.20 GHz with 12 GB RAM, on the Windows 7 64-bit platform.
No GPU acceleration was used; all processing was kept on the system memory and processor, and video sequences were read frame by frame from hard disk. For evaluating temporal tracking and 3D depth estimation, we used a range of videos of intraoperative laparoscopic porcine studies from an Imperial College in-vivo dataset, which can be found online, along with associated camera calibration files, at http://vip.doc.ic.ac.uk/vision. These videos represent a wide array of scene motions that cover camera translation, scale change, camera rotation, multiple viewpoints (i.e. affine transformation), and tissue deformation. The videos are:

1. Translation: Abdominal cavity just after inflation. The surgeon moves the endoscopic camera to approximate translation. Image size is 320 × 240. Due to the relatively small video resolution, we upscale the images during algorithm runtime to 640 × 480 (with linear interpolation) such that the feature algorithms can detect small features.

2. Rotation: Abdominal cavity just after inflation. The surgeon moves the endoscopic camera to approximate rotation. Image size is 640 × 480.

3. Series: Abdominal cavity just after inflation. The surgeon moves the endoscopic camera to approximate a series of movements involving translation and scaling. Image size is 640 × 480.

4. Heartbeat: Open-chest procedure with an exposed heart. There is a significant instrument footprint in the images, as instruments are present to hold open the chest. A stationary camera images a rapid heartbeat. Image resolution is 360 × 288. Again, we upscale the images during algorithm runtime to 720 × 576 in order to detect small features.

The videos used for the region tracking and registration evaluation were captured using a da Vinci SI laparoscopic stereo camera. The videos that were used are of the kidney (250 frames, ≈8 s), heart (650 frames, ≈20 s), and liver (700 frames, ≈23 s) undergoing translation, rotation, and scaling. Videos were cropped to a resolution of 560 × 352. These datasets can be found online at http://rcl.ece.ubc.ca/. We aim to keep these videos online as a repository in order to allow other researchers to compare their algorithms using common data. In order to verify the statistical significance of the results, we performed a two-sample t-test for each measured evaluation criterion to compare STAR+BRIEF to SURF, STAR+BRIEF to SIFT, and SURF to SIFT. The null hypothesis is rejected if the outcome p < 0.05.

One of the major issues with performing stereo-triangulation using laparoscopic or endoscopic cameras is that the camera disparities are extremely small, and therefore any errors in the 2D pixel locations of two matched features can cause significant changes in depth. We examine this behavior by computing the camera calibration of a da Vinci SI Surgical System laparoscopic stereo camera. The camera calibration was performed using the Matlab calibration toolbox by Bouguet [9]. This allows us to estimate the camera distortion, the focal points and principal points, the skew of the camera images, and the relative 3D Cartesian transformation between the left and right cameras. The camera calibration of the da Vinci SI cameras was found to have approximately a 5 mm disparity and identical left and right camera orientations.

4.4 Results

4.4.1 Temporal Tracking Results

Figure 4.7 shows samples of each video test performed with an overlay of extracted and matched features.
This is shown in order to provide a sense of the feature densities for each video, which differ from one another due to differences in cameras, resolution, aspect ratios, lighting and image noise, and tissue photometric properties. It can be seen that features densely populate the image for all feature types. For the following results, the t-test analyses showed that the null hypothesis was rejected in nearly all cases with p ≈ 0.00, since there was more than enough data from the high number of frames in the video sequences to support the claims of statistical significance. Therefore, the following results will be considered statistically significant unless otherwise specified.

Figure 4.7: Screen captures of frame 20 of the Series and Heartbeat videos. Circles represent feature locations. Row 1 is the original image, Row 2 uses the proposed STAR+BRIEF method, Row 3 uses the SURF method, and Row 4 uses the SIFT method.

Figure 4.8: Number of features found per frame.

Figure 4.8 shows the average number of features that are found by SIFT, SURF, and the STAR+BRIEF algorithm, Figure 4.9 shows the percentage of these features that are matched temporally to a previously found feature, and Figure 4.10 shows the running size of the previously matched feature lists (the size is dynamic due to the addition of new features and removal of old features in each frame, using the criteria presented in Section 4.2.2). SIFT features are shown to be the most numerously detected, and although the percentage of features matched temporally is lower than for the other feature types, SIFT still offers a higher density of features within each frame. Both BRIEF and SURF have similar feature densities, but BRIEF is seen to have a slightly higher percentage of temporal matches. This result is reflected in the average running sizes of the feature lists, where a higher percentage of matches implicitly encourages smaller feature lists. Smaller feature lists reduce the amount of matching required between features from the new frame and the list of saved features from previous frames, thereby improving the efficiency of the process. Figure 4.11 shows the average percentage of features that are deleted at every frame. A higher percentage represents the loss of history for a previously saved feature; if the same feature is found in subsequent frames, it will be considered a brand new feature. A reduction of this value will improve the long-term persistency of features within a video sequence.

Figure 4.9: Percentage of these features that are matched to a previously detected feature.

Figure 4.10: Number of previously found features that are kept in the history-preserving feature list.

Figure 4.11: Percentage of features from the saved list that are removed every frame due to a low frequency of finding a suitable temporal match. (*) = not statistically significant: p(BRIEF,SURF) = 0.1174, p(BRIEF,SIFT) = 0.1758, p(SURF,SIFT) = 0.8325.
BRIEF is shown to have the lowest percentage of features deleted, followed by SURF and then SIFT. It should be noted that the persistency of features in no way represents the accuracy of tracking: if a feature is accidentally matched incorrectly within a series of frames, subsequent frames can lead to drift, which is unidentifiable without ground truth. To elucidate the prevalence of features among consecutive frames, we need to investigate just how often they are found and subsequently matched in the frames following their first detection and extraction. To do this, we generate the normalized histogram shown in Figure 4.12, depicting the percentage of features during runtime that are found in at least a certain percentage of the frames following initial detection. Since features are deleted from the list if they are found in less than 40% of all frames, a large majority of the features found are deleted at 40% and below. However, it is interesting to see the trend of features that occur frequently within the frames, shown in Figure 4.12's insets. BRIEF features are found to be consistently the most prevalent over time compared to both SIFT and SURF features. Since the number of BRIEF features found per frame is less than for the other feature detectors, this indicates that an individual BRIEF feature is more likely to be tracked for a long period of time, which is consistent with the findings of Figure 4.9.

Figure 4.12: Histogram depicting the percentage of features that are found in a certain percentage of subsequent frames. The graph is cumulative, such that feature numbers drop off as the ratio between times found and total lifetime increases. The figure insets show zoomed versions of the higher percentages.

Figure 4.13 shows an example of a feature tracked both forwards and backwards in time for each tracking algorithm. The feature chosen to be tracked is one that appeared at the highest frequency moving forward in time, and is therefore suggestive of the persistency of the feature in the video from beginning to end. Figure 4.13 shows that each tracking algorithm manages to track the feature well; matching forward and backward motions fall on the exact same pixel locations in both forward and backward time.

Figure 4.13: A sample feature location is chosen for each feature tracking algorithm, and the feature's pixel location is tracked from the beginning to the end of the video (red). Subsequently, starting at the end of the video, the feature is tracked backwards towards the start of the video (blue).
Due to frames where a feature is momentarily lost, there are gaps within the tracking in both directions.

Figure 4.14 shows the proposed STAR+BRIEF feature tracked in Figure 4.13 over the entire sequence of the videos at 100-frame intervals. It can be seen that the feature remains true to its original location over the duration of tracking.

Figure 4.14: Example of a BRIEF feature being tracked over time.

Table 4.1: Parameters for temporal tracking, stereoscopic matching, and object tracking

Symbol  Parameter                   Temporal Tracking    Stereo Matching                                   Object Tracking*
κ       Scale threshold             log(2.0)             log(√2)                                           log(2.0)
Θ       Orientation threshold       π/18                 π/12                                              N/A
δ       Local area threshold        0.2 × image width    0.5 × image width (x), 0.05 × image height (y)    0.5 × image width
λ       Descriptor distance ratio   0.5                  0.5                                               0.5
γ       Difference in movements     2·log(1.5)           N/A                                               2·log(1.5)
ε       Difference in angles        π/18                 N/A                                               π/18

* Object tracking only uses these parameters when trying to re-establish a registration after it has been lost; otherwise feature IDs are used.

Figure 4.15: Speed of the feature tracking framework for STAR+BRIEF, SURF, and SIFT feature types.

Figure 4.15 shows the speed of the algorithm for the different feature types. Unsurprisingly, using BRIEF features is far more efficient than SURF and SIFT, achieving nearly real-time frame rates. SIFT is much more memory-demanding and processing-intensive than the other feature detectors and extractors, which is reflected in its significant increase in computation time compared to the other algorithms. One can presume that the relative speed differences between the algorithms will remain fairly consistent despite differences in system hardware, image size, and resolution. It is clear that the standard salient feature detectors can offer dense, trackable feature locations within an in-vivo image scene during surgery. However, one of the requirements for bringing these technologies into an operating room is that they perform at real-time speeds in order to provide the surgeon with augmented reality guidance during surgical procedures. Whereas SIFT features offer higher feature densities as well as stable tracking, the analysis of a single frame requires a couple of seconds. SURF only shows a 2-3 times speed improvement over SIFT, not enough to reach real-time performance. The combined STAR detector and BRIEF feature extractor is able to identify an acceptable number and density of features to populate an in-vivo surgical scene, while maintaining accurate tracking at approximately 15-20 Hz for most video sequences.

Figure 4.16: A sample image of matched left and right channel features. The image above is the left channel, populated by features (•) found in the current frame. They are connected to the locations at which the matching right features would be located.

4.4.2 3D Depth Estimation Results

In order to visualize the 3D depth estimation, Figure 4.16 shows an example of the matching that occurs between left and right camera frames.
The features within the left channel are matched to the features in the right channel (using the STAR+BRIEF method), and both locations, as well as an interconnecting line, are overlaid on top of the left channel frame. This image is a typical representation of the feature density as well as the disparities between the left and right channel features. Due to the curvature of the camera lens, the images experience slight warping near the frame edges, and therefore feature locations experience a gradual change in matching angles. This is accounted for by allowing features to be matched within 0.05 × (image width) pixels along their epipolar lines. Distortion effects become more significant as the scene moves closer to the camera, and less significant as the scene moves further away.

Figure 4.17 shows the number of features that are matched between the left and right stereo camera frames. BRIEF is seen to match the fewest features compared to SURF, and SIFT is seen to match many more features than the other feature tracking algorithms. However, since BRIEF and SURF both find fewer features within each frame than SIFT, this is to be expected. The percentage of left channel features that are matched between the left and right channels is shown in Figure 4.18. Here, we see that BRIEF features exhibit the highest percentage of matching (approximately 50%), followed by SURF and SIFT. This number will vary slightly depending on the depth at which the camera is imaging, since the closer the imaged scene, the fewer the features that can be seen in both channels. One interesting effect that is immediately noticeable is that the video of the porcine heartbeat has significantly fewer features that are stereo-matched from frame to frame. This can be attributed to the fact that during systole there are high frequencies contributing to the motion (100 Hz and above) that cannot be accurately captured with a 30 fps camera. In particular, since we are matching between two channels, any minute differences in the triggering times of the left and right camera shutters will cause a significant shift of features in the images. Furthermore, motion blur results in less saliency among features. When the heart is in diastole, we are able to see a great deal more stereo-matched features.

Figure 4.19 shows the time required to perform feature matching between left and right channel features. Due to the heavy filtering that is applied prior to performing descriptor vector matching, the processing time for each algorithm is significantly less than that of temporal tracking. We can see that stereo matching for BRIEF is extremely fast and can be added to temporal tracking at little cost to performance. Although SURF and SIFT exhibit longer processing times, their performance in stereo matching is also within a range that can be optimized for high frame rates (assuming that the feature lists are pre-computed). These results can be added to the time required for single-channel feature tracking with little change to the overall computation time. In order to maintain the performance of single-channel feature tracking (Figure 4.15) while also including depth estimation and 3D point triangulation, multi-threading techniques can be employed.

Figure 4.17: Number of features matched between the stereo camera channels.
Figure 4.18: Percentage of features matched between the stereo camera channels. (*) = not statistically significant: p(BRIEF,SURF) = 0.3629, p(BRIEF,SIFT) = 0.8839, p(SURF,SIFT) = 0.3654.

Figure 4.19: Time required to process a pair of stereo frames for feature matching. (*) = not statistically significant: p(SURF,SIFT) = 0.871.

We provide two visualizations of the 3D feature depth maps in Figure 4.20, one of the beating heart and one of the inflated abdomen. We first take the feature pairs matched in Figure 4.20a and perform stereo-triangulation in order to acquire a 3D cloud of points (Figure 4.20b; camera locations are shown in the figure). We present a cubic interpolation over these points in Figure 4.20c to create a contiguous depth map, and we reproject the points onto the depth map to visualize their depth consistency. We can see that the 3D point locations closely follow the smoothed contiguous depth map. Finally, we reproject the image onto the depth map in Figure 4.20d, showing that the extracted contours are congruent with the perceived contours of the tissue images.

4.4.3 Region Tracking and Registration Results

Figure 4.21 shows fiducials registered and tracked over time for the kidney, the heart, and the liver. Tracking from the first frame until the final frame, the Euclidean error between the estimated true fiducial position and the fiducial position from registration is 3.31 mm (heart), 2.02 mm (kidney), and 1.27 mm (liver), i.e. within a few millimeters of accuracy. We provide the frame-by-frame tracking of the fiducials in Figure 4.21 to show the progression of the tracking over time. For the most part, the registration was maintained accurately and was stable. We noticed that the registration to the heart tissue began to drift as the registration fell towards the outer edges of the images; we believe that this is the result of the camera distortion model from the camera calibration, suggesting that depth estimation of endoscopic images is sensitive to the camera model used for camera calibration and stereo-triangulation. The accuracy of the registration is affected by a number of assumptions and simplifications in the tracking. First, we assumed a rigid homogeneous transformation from frame to frame. Second, we grouped all the tissues in the field of view in order to estimate this transformation. The errors are exacerbated by the sensitivity of the camera calibration due to the short camera disparities of the da Vinci laparoscopic stereo cameras. These results by no means give a full picture of the accuracy of keeping a registration intact over time; rather, they are meant to provide a proof of concept to show that maintaining a registration is possible to within a few millimeters.

Figure 4.20: Depth estimation of in-vivo stereo video. (a) shows the stereo matches in the left frame, (b) shows the stereo-triangulated points and estimated camera position, (c) shows the interpolated depth map, and (d) shows the reprojected image.
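As a concrete illustration of the depth-map construction summarized in Figure 4.20, the sketch below triangulates matched left/right pixel coordinates with the calibrated projection matrices and then interpolates a continuous depth surface over the scattered 3D points. It is a minimal example rather than the thesis code: the OpenCV and SciPy calls shown (cv2.triangulatePoints, scipy.interpolate.griddata) are one reasonable way to implement the steps described in the text, and the cubic interpolation mirrors Figure 4.20c.

```python
import numpy as np
import cv2
from scipy.interpolate import griddata

def triangulate(P_left, P_right, pts_left, pts_right):
    """Triangulate matched pixel coordinates into 3D points.

    P_left, P_right: 3x4 projection matrices from the stereo calibration.
    pts_left, pts_right: (N, 2) arrays of matched pixel coordinates.
    """
    X_h = cv2.triangulatePoints(P_left, P_right,
                                pts_left.T.astype(np.float64),
                                pts_right.T.astype(np.float64))
    return (X_h[:3] / X_h[3]).T          # (N, 3) Euclidean points

def interpolate_depth_map(points_3d, grid_step=1.0):
    """Cubic interpolation of scattered 3D points into a regular depth map,
    analogous to the smoothed surface of Figure 4.20c."""
    x, y, z = points_3d.T
    xi = np.arange(x.min(), x.max(), grid_step)
    yi = np.arange(y.min(), y.max(), grid_step)
    XI, YI = np.meshgrid(xi, yi)
    ZI = griddata((x, y), z, (XI, YI), method='cubic')
    return XI, YI, ZI
```

Because the stereo cameras have only roughly 5 mm of baseline, small pixel errors in the matched coordinates translate into large depth errors, which is why the outlier rejection of Section 4.2.4 is applied before any registration is computed from such points.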
Figure 4.21: Registration and tracking combination. Circles represent the starting location of a surface fiducial, and triangles represent the final position of the surface fiducial after tracking through the videos. Tests were performed on (a) kidney, (b) heart, and (c) liver.

4.5 Discussion

The ability to localize the tissues in 3D world-frame coordinates with respect to the cameras and the surgical tools can lead to techniques in robot-assisted instrument guidance, implementation of virtual fixtures, etc. A 3D anatomically correct depth mapping of tissue surfaces can provide reconstructions of the in-vivo surgical scene, reducing the time and invasiveness required for explorative operations such as colonoscopy, and an accurately reconstructed organ can be used for medical image surface-to-surface registration. Furthermore, stereoscopic cameras are becoming more and more common within the surgical environment due to the increasing use of surgical robotics that provide stereoscopic vision to the surgeon through a surgical console.

Our simulations have shown that it is possible to provide real-time tracking of in-vivo tissue surfaces using efficient salient feature detectors and descriptors. Whereas previous tracking techniques in computer vision have focused on natural scenes and urban environments, we have specifically focused on tracking tissue surfaces as they appear during a surgical procedure. The photometric properties of tissue surfaces make them particularly difficult to track compared to natural scenes and urban environments: tissue textures are often visually non-unique compared to their neighboring tissues, there are few strong edge or corner features of the kind that commonly form the basis for tracking in conventional computer vision techniques, there are strong specular reflections due to wet surfaces, and poor lighting causes dynamic shadowing effects. However, using more descriptive features, we have shown that it is possible to attain densely populated feature maps that can be accurately stereo-matched and triangulated into 3D space. Furthermore, by using efficient feature detectors and descriptors such as the STAR detector and the BRIEF feature extractor, tracking and stereo-triangulation can be performed in real time. Given that salient features are suitable for object detection, we have developed a preliminary system for tracking a region of tissue using salient features, and have shown experimental tests for a simulated registration case. We believe that by using camera-based registration methods such as the ones described in [79], we may be able to acquire and maintain a registration with no extra equipment and little effort on the part of the surgeon. For partial nephrectomy, where the kidney is mobilized and clamped and therefore does not experience significant motion effects from patient heartbeat and breathing, the use of rigid-body transformations may be enough to effectively improve surgical guidance.
During radical prostatectomy, a mobilized prostate experiences negligible motion from patient breathing and heartbeat, and therefore a rigid registration would likely suffice.

There are a number of key points that need to be kept in mind. All of the simulations and test cases using the feature tracking framework were performed offline. However, since we would like to apply this to real-time video streams, there are several important things to consider. First, offline processing requires extra time for hard-disk access when extracting frames from the videos and loading them into system memory. Second, almost all video files are encoded for compression, and therefore every frame extracted requires a decompression step, adding more time. In a real-time system that does not write to storage, the video streams can stay in memory as uncompressed images and be accessed directly, and are therefore slightly more efficient to process. Another issue to consider is that the proposed tracking framework does not identify and handle the effects of instrument occlusions; a next step would therefore be to investigate possible methods to track the instruments and mask their effects from the salient feature trackers. This could be done through color filtering and other techniques such as straight edge detection.

Currently, the STAR+BRIEF framework provides nearly real-time speeds for standard-definition video streams. There are several obvious techniques that can be applied in order to significantly improve the frame rates. First, if only a certain region of the tissue needs to be tracked, a region-of-interest (ROI) window can be used to focus on only that tissue region and reduce the amount of feature detection, extraction, and matching that is required. Second, unique feature IDs can be used during stereoscopic matching in order to identify matching features between the left and right frames without needing to compute the descriptor vector Hamming distances. Hamming distances can be computed using the Intel SSE4.2 population count (POPCNT) instruction to drive descriptor matching times even lower. Finally, implementing multithreading in the algorithm (such as assigning each level of the pyramidal scale space to a separate thread for feature extraction) will significantly improve its speed.

The application of stereoscopic triangulation and 3D depth estimation techniques to in-vivo tissue scenes will enable the development of new augmented reality and image-guidance techniques for surgery. For robotic surgeries, the ability to localize the tissues in 3D world-frame coordinates with respect to the cameras and the surgical tools can lead to techniques in robot-assisted instrument guidance, implementation of virtual fixtures, etc. A 3D anatomically correct mapping of tissue surfaces can provide reconstructions of the in-vivo surgical scene, reducing the time and invasiveness required for explorative operations such as colonoscopy, and an accurately reconstructed organ can be used for medical image surface-to-surface registration techniques. Furthermore, stereoscopic cameras are becoming more and more common within the surgical environment due to the increasing use of surgical robotics that provide stereoscopic vision to the surgeon through a surgical console.
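To illustrate the binary-descriptor matching cost discussed above, the sketch below computes Hamming distances between packed BRIEF descriptors with a vectorized population count; hardware POPCNT or multithreading, as suggested in the text, would reduce this further. The NumPy-based formulation is our own illustrative choice, not the thesis implementation.

```python
import numpy as np

# BRIEF descriptors are bit strings packed into bytes, e.g. 32 bytes = 256 bits.
_POPCOUNT = np.array([bin(i).count("1") for i in range(256)], dtype=np.uint8)

def hamming_matrix(desc_a, desc_b):
    """Pairwise Hamming distances between two sets of packed binary descriptors.

    desc_a: (N, B) uint8 array, desc_b: (M, B) uint8 array.
    Returns an (N, M) matrix of differing-bit counts.
    """
    xor = np.bitwise_xor(desc_a[:, None, :], desc_b[None, :, :])
    return _POPCOUNT[xor].sum(axis=2, dtype=np.int32)

def match_brief(desc_a, desc_b, ratio=0.5):
    """Nearest-neighbour matching with the distance-ratio test of equation (4.8)."""
    d = hamming_matrix(desc_a, desc_b)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(d))
    ok = d[rows, best] < ratio * d[rows, second]
    return [(i, best[i]) for i in np.flatnonzero(ok)]
```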
Our evaluation of maintaining a medical image registration based on the object-tracking techniques defined in this paper has been presented as a proof of concept rather than a definitive method for maintaining a registration. We used a rigid-body registration based on RANSAC model estimation and have shown that, for the various tracking cases, the error fluctuated within approximately 3 mm; however, since tissue experiences localized deformations from patient breathing and heartbeat as well as from surgical instrument interactions, it can be expected that errors will increase in a real surgical environment. Accurate medical image registration is already a very difficult problem, and maintaining such accuracies in a deformable environment using a static image is by no means a trivial case. Therefore, more complex registration techniques may need to be applied in order to capture the true movement and deformation of the tissue and maintain a higher registration accuracy over time.

However, even without medical image registration, object tracking can still be very useful during a surgical procedure. For example, due to the narrow view of endoscopes and laparoscopes, tumors may be constantly moving in and out of the scene. The use of object tracking can provide persistent cues of tumor boundaries throughout the operation. In cases of visual biopsy such as during colonoscopy, the region tracking techniques presented in this paper can give medical professionals the ability to create and track boundaries of suspect tissue regions throughout the biopsy, and to automatically find and revisit them with relative ease. Beating-heart operations would benefit greatly from tissue region tracking in that the regular heartbeat motion can be localized over regions of interest and tracked in 3D, providing a means for enabling motion compensation of surgical instruments. Another open area for research is the use of SLAM techniques for scene reconstruction, such as camera-based 3D organ modeling or offline virtual endo-cavity exploration.

4.6 Conclusion

In summary, we have presented an overall framework for tracking tissue in 3D within an in-vivo surgical environment and maintaining a medical image registration over time. In particular, the work can be divided into the following three categories:

1. Dense 2D tracking for in-vivo tissue scenes using efficient salient features
2. Stereo matching of salient features for depth estimation
3. Region tracking of tissue surfaces for maintaining a registration

We described a dense feature tracking solution based on the STAR feature detector and the BRIEF feature extractor, and showed that it can acquire dense feature maps at real-time speeds in 2D and 3D tracking scenarios. We compared the results to other popular salient features (SURF and SIFT) for a variety of surgical scenes, tissue deformations, and camera motions, showing that the STAR+BRIEF algorithm, unlike SURF and SIFT, can reach real-time tracking speeds while still maintaining feature tracking density and accuracy. We extended the salient feature tracking algorithm to support tracking of tissue regions within a surgical scene using successive rigid registrations. Finally, we performed a series of experiments on different tissue types to show how a registered image moves over time. We show that within these test cases, we are able to keep the registration within 1 to 3 mm of its true motion.
Chapter 5

Conclusion

In this thesis, a novel medical image registration method for registering intraoperative ultrasound to surgical cameras has been presented. We also developed a framework for 3D tissue tracking in the surgical stereoscopic cameras such that a medical image registration can be maintained over time. These two methods are complementary in nature and can be integrated to form an efficient method for obtaining and maintaining an intraoperative ultrasound registration in the surgical camera views. The target surgical procedures for testing these methods have been partial nephrectomy and radical prostatectomy; however, these methods are not dependent on either surgical procedure and can therefore be applied to a wide array of surgeries for which persistent medical image registration would be beneficial. In the following sections, we summarize the contributions and describe future work.

5.1 Summary of Contributions

• Registration of endoscopic stereo cameras with ultrasound through an air-tissue boundary: We have presented a new method for acquiring a registration between a 3DUS probe or a transrectal US probe and a laparoscopic stereo camera using a registration tool. The registration tool uses ultrasound fiducials that, when pressed against the imaged tissue, can be found and localized in an ultrasound image, and optical markers that can be seen and triangulated in the stereo camera frames. This provides a direct transformation from the ultrasound coordinate system to the camera coordinate system, thereby avoiding the need for external tracking equipment or the use of robot kinematics. Since no calibration is required for the ultrasound probe and no extra tracking equipment is required, this provides a highly efficient method for acquiring a fast, repeatable registration in the operating room. This represents a novel registration method, where prior methods have relied on external tracking sources, robot kinematics, and complicated probe calibrations in order to achieve an image registration. Using our method with a 3DUS probe and the transrectal US probe with the da Vinci Surgical System stereoscopic laparoscope, we are able to achieve average target registration errors (TRE) of 1.69 ± 0.60 mm and 1.51 ± 0.71 mm respectively. These results represent a higher registration accuracy than previous work involving external trackers [14] and robot kinematics [31].

• Dense feature tracking in endoscopic stereo cameras for maintaining a medical image registration: We have developed a framework for tracking tissue in a laparoscopic stereo camera in order to maintain an image registration as the underlying tissues move and deform. Prior work in tissue tracking had been limited either to complex salient feature algorithms that could not reach real-time processing rates when constructing and tracking a dense tissue surface profile, or to model-based tracking techniques that were specific to a particular surgical application and could only capture generalized tissue motion and low-order deformations. Other previous work utilized fast corner detectors with correlation techniques for tracking, but these were not accurate (due to the poor texturization of tissue) and were prone to drift. Our tracking system uses the STAR detector and BRIEF extractor as the salient-feature algorithms for identifying distinct feature points. These algorithms can be performed extremely quickly due to the use of binary kernels during feature detection and extraction.
This is the first time that STAR and BRIEF have been used on endoscopic images; we successfully showed their capability of acquiring and tracking a dense tissue surface profile in 3D at up to 20 Hz. We compared their performance, based on speed, feature density, and tracking persistency, with two other common feature detectors used on endoscopic images, SIFT and SURF, showing that STAR+BRIEF provides comparable results at much higher frame rates. We then extended the work to support the maintenance of a medical image registration by anchoring the medical image to local tissue surfaces and moving the registered image with the underlying tissue surfaces. Preliminary in-vitro results on an excised bovine heart, bovine kidney, and porcine liver showed that registration accuracies were maintained within 3.31 mm, 2.02 mm, and 1.27 mm respectively.

5.2 Future Work

The primary focus of future work will be towards the clinical evaluation of the registration methods and further experimental validation of registration accuracies. In addition, there is significant room for improvement in the tissue tracking and registration maintenance framework, such as improving the efficiency and accuracy of maintaining a registration. In particular, the following areas of research should be investigated:

• Extensive validation of camera-to-ultrasound registration in a clinical setting: In this thesis, we validated the proposed registration method using ultrasound phantoms and in-vitro tissues. Future work should involve the validation of the registration method in animal studies, and potentially clinical validations of the registration method.

• Automating fiducial localization in the ultrasound and camera images: The registration method presented in this thesis involves a semi-automatic selection of the fiducial locations in the camera frame utilizing corner detectors; however, the localization of fiducials in the ultrasound images is performed manually. Future work involving the automatic detection and segmentation of fiducials from the ultrasound images will provide a seamless registration method that does not require any manual input in order to acquire a registration.

• Instrument tracking and masking: The tissue tracking method presented does not take into account surgical instrument occlusions. In order to prevent instrument motion from affecting the estimation of tissue motion, the instruments must be identified and tracked in the stereo camera frame, and then masked out during feature detection and tracking.

• Incorporating optical flow methods for tissue tracking: Optical flow can be useful for predicting the motion of a scene and, if used properly, can provide estimates of high-resolution local movement of tissues. This can reduce the search space required during temporal matching of salient features, thereby increasing the speed and precision of identifying feature movement from frame to frame.

• GPU-accelerated 3D tissue tracking: The tissue tracking method presented in this thesis has been catered towards standard PCs; that is, the real-time tracking methods have been selected based on a compromise between speed and the density and accuracy of feature tracking. By using dedicated graphics hardware for image processing, more complex salient feature detectors and extractors such as SIFT can provide a higher density of tracked features, thereby improving the tracking resolution of local deformation and movement.
Furthermore, GPU acceleration will likely be required in order to overlay visually appealing semi-transparent image registrations such as ray-traced or ray-casted ultrasound volumes. • System integration: In order for the presented registration acquisition and maintenance techniques to be useful in a clinical setting, these methods must be integrated into a single system in order to provide a seamless transition between registration of an ultrasound volume and maintaining the registration in the surgical cameras. The benefit of the proposed methods is that they do not rely on any specific hardware and therefore can be easily integrated together and used in a surgical setting. In addition, a way to measure and display tracking accuracy, and display reliability of tracking needs to be investigated, as it is important for the surgeon to know when failure has likely occurred.  88  Bibliography [1] T. Adebar, S. E. Salcudean, S. Mahdavi, M. Moradi, C. Nguan, and S. L. Goldenberg. A robotic system for intra-operative trans-rectal ultrasound and ultrasound elastography in radical prostatectomy. In Information Processing in Computer-Assisted Interventions - IPCAI 2011, volume 6689. Springer-Verlag, 2011. → pages 25 [2] M. Agrawal, K. Konolige, and M. Blas. Censure: Center surround extremas for realtime feature detection and matching. In Computer Vision ECCV 2008, volume 5305 of Lecture Notes in Computer Science, pages 102–115. Springer Berlin / Heidelberg, 2008. → pages xv, 45 [3] A. Baghani, H. Eskandari, S. Salcudean, and R. Rohling. Measurement of viscoelastic properties of tissue-mimicking material using longitudinal wave excitation. IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, 56(7):1405 –1418, july 2009. → pages 16 [4] T. Bailey and H. Durrant-Whyte. Simultaneous localization and mapping (slam): part ii. IEEE Robotics Automation Magazine, 13(3):108 –117, sept. 2006. → pages 39 [5] S. Baker, D. Scharstein, J. Lewis, S. Roth, M. Black, and R. Szeliski. A database and evaluation methodology for optical flow. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1 –8, Oct. 2007. → pages 40 [6] M. Baumhauer, M. Feuerstein, H.-P. Meinzer, and J. Rassweiler. Navigation in endoscopic soft tissue surgery: Perspectives and limitations. Journal of Endourology, 22(4):751–766, 2008. → pages 3 [7] H. Bay, T. Tuytelaars, and L. Van Gool. Surf: Speeded up robust features. In Computer Vision - ECCV 2006, volume 3951 of Lecture Notes in Computer Science, pages 404–417. Springer Berlin / Heidelberg, 2006. → pages xv, 7, 39, 44, 59 89  [8] C. Bergmeir, M. Seitel, C. Frank, R. Simone, H.-P. Meinzer, and I. Wolf. Comparing calibration approaches for 3d ultrasound probes. International Journal of Computer Assisted Radiology and Surgery, 4:203–213, 2009. → pages 3 [9] J. Bouguet. Camera calibration toolbox for matlab, 2010. → pages 27, 64, 103 [10] D. Burschka, M. Li, M. I. A, R. H. Taylor, and G. D. H. B. Scale-invariant registration of monocular endoscopic images to ct-scans for sinus surgery. In Medical Image Computing and Computer Assisted Intervention - MICCAI 2004, pages 413–421, 2004. → pages 41 [11] M. Calonder, V. Lepetit, C. Strecha, and P. Fua. BRIEF: Binary Robust Independent Elementary Features. In Computer Vision - ECCV 2010, volume 6314 of Lecture Notes in Computer Science, pages 778–792, Berlin, Heidelberg, 2010. Springer Berlin / Heidelberg. → pages xv, 45, 47, 48 [12] G. Carneiro, B. Georgescu, S. Good, and D. Comaniciu. 
Automatic fetal measurements in ultrasound using constrained probabilistic boosting tree. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2007, volume 4792, pages 571–579. Springer-Verlag, 2007. → pages 35

[13] C. Cheung, C. Wedlake, J. Moore, S. Pautler, and T. Peters. Fused video and ultrasound images for minimally invasive partial nephrectomy: A phantom study. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2010, volume 6363 of Lecture Notes in Computer Science, pages 408–415. Springer Berlin / Heidelberg, 2010. → pages 4

[14] C. L. Cheung, C. Wedlake, J. Moore, S. E. Pautler, A. Ahmad, and T. M. Peters. Fusion of stereoscopic video and laparoscopic ultrasound for minimally invasive partial nephrectomy. Volume 7261, page 726109. Society for Photonics and Instrumentation Engineers (SPIE) Medical Imaging 2009: Visualization, Image-Guided Procedures, and Modeling, 2009. → pages 4, 13, 20, 33, 86

[15] N. Cornelis and L. Van Gool. Fast scale invariant feature detection and matching on programmable graphics hardware. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition - CVPR 2008, pages 1–8, June 2008. → pages 45

[16] H. Durrant-Whyte and T. Bailey. Simultaneous localization and mapping: part I. IEEE Robotics Automation Magazine, 13(2):99–110, June 2006. → pages 39

[17] M. A. Fischler and R. C. Bolles. Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM, 24:381–395, June 1981. → pages 57

[18] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory, volume 904, pages 23–37. Springer-Verlag, 1995. → pages 34

[19] S. Giannarou, M. V. Scarzanella, and G.-Z. Yang. Affine-invariant anisotropic detector for soft tissue tracking in minimally invasive surgery. In IEEE International Symposium on Biomedical Imaging - ISBI 2009, pages 1059–1062, 2009. → pages 7

[20] R. Ginhoux, J. Gangloff, M. de Mathelin, L. Soler, M. M. A. Sanchez, and J. Marescaux. Beating heart tracking in robotic surgery using 500 Hz visual servoing, model predictive control and an adaptive observer. In IEEE International Conference on Robotics and Automation - ICRA 2004, pages 274–279, 2004. → pages 6

[21] E. Grimson, M. Leventon, G. Ettinger, A. Chabrerie, F. Ozlen, S. Nakajima, H. Atsumi, R. Kikinis, and P. Black. Clinical experience with a high precision image-guided neurosurgery system. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 1998, volume 1496 of Lecture Notes in Computer Science, pages 63–73. Springer Berlin / Heidelberg, 1998. → pages 12

[22] M. Groeger, T. Ortmaier, W. Sepp, and G. Hirzinger. Tracking local motion on the beating heart. Volume 4681, pages 233–241. Society for Photonics and Instrumentation Engineers (SPIE) Medical Imaging 2002: Visualization, Image-Guided Procedures, and Display, 2002. → pages 6, 40

[23] G. Guthart and J. K. Salisbury. The Intuitive (TM) telesurgery system: overview and application. In IEEE International Conference on Robotics and Automation - ICRA 2000, volume 1, pages 618–621, 2000. → pages 1

[24] I. Hacihaliloglu, R. Abugharbieh, A. Hodgson, P. Guy, and R. Rohling. Bone surface localization in ultrasound using image phase based features. Ultrasound in Medicine & Biology, 35(9):1475–1487, 2009. → pages 34

[25] B. K. P. Horn. Closed-form solution of absolute orientation using unit quaternions.
Journal of the Optical Society of America A, 4(4):629–642, Apr. 1987. → pages 57

[26] P. Hsu, G. Treece, R. Prager, N. Houghton, and A. Gee. Comparison of freehand 3-D ultrasound calibration techniques using a stylus. Ultrasound in Medicine and Biology, 34(10):1610–1621, Oct. 2009. → pages 3

[27] P. W. Hsu, R. W. Prager, A. H. Gee, and G. M. Treece. Freehand 3D ultrasound calibration: A review. Advanced Imaging in Biology and Medicine, pages 47–84, 2009. → pages 4

[28] G. Janetschek. Laparoscopic partial nephrectomy: how far have we gone? Current Opinion in Urology, 17(5):316–321, Sept. 2007. → pages 2

[29] D. Kwartowitz, S. Herrell, and R. Galloway. Toward image-guided robotic surgery: determining intrinsic accuracy of the da Vinci robot. International Journal of Computer Assisted Radiology and Surgery, 1:157–165, 2006. → pages 4

[30] W. Lau, N. Ramey, J. Corso, N. Thakor, and G. Hager. Stereo-based endoscopic tracking of cardiac surface deformation. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2004, volume 3217 of Lecture Notes in Computer Science, pages 494–501. Springer Berlin / Heidelberg, 2004. → pages 6

[31] J. Leven, D. Burschka, R. Kumar, G. Zhang, S. Blumenkranz, X. Dai, M. Awad, G. Hager, M. Marohn, M. Choti, C. Hasser, and R. Taylor. DaVinci Canvas: A telerobotic surgical system with integrated, robot-assisted, laparoscopic ultrasound capability. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2005, volume 3749 of Lecture Notes in Computer Science, pages 811–818. Springer Berlin / Heidelberg, 2005. → pages 4, 5, 20, 33, 86

[32] F. Lindseth, G. A. Tangen, T. Langø, and J. Bang. Probe calibration for freehand 3-D ultrasound. Ultrasound in Medicine & Biology, 29(11):1607–1623, 2003. → pages 13

[33] H. Ling and D. W. Jacobs. Deformation invariant image matching. In IEEE International Conference on Computer Vision - ICCV 2005, pages 1466–1473, 2005. → pages 7

[34] C. A. Linte, J. Moore, A. D. Wiles, C. Wedlake, and T. M. Peters. Virtual reality-enhanced ultrasound guidance: A novel technique for intracardiac interventions. Computer Aided Surgery, 13(2):82–94, 2008. → pages 12, 13

[35] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91–110, 2004. → pages 7, 39, 44, 59

[36] X. Luó, M. Feuerstein, T. Reichl, T. Kitasaka, and K. Mori. An application driven comparison of several feature extraction algorithms in bronchoscope tracking during navigated bronchoscopy. In Proceedings of the 5th International Conference on Medical Imaging and Augmented Reality, MIAR'10, pages 475–484, Berlin, Heidelberg, 2010. Springer-Verlag. → pages 53

[37] A. Madabhushi, P. Yang, M. Rosen, and S. Weinstein. Distinguishing lesions from posterior acoustic shadowing in breast ultrasound via non-linear dimensionality reduction. In Engineering in Medicine and Biology Society - EMBS 2006, pages 3070–3073. IEEE, 2006. → pages 35

[38] J. Matas, O. Chum, M. Urban, and T. Pajdla. Robust wide baseline stereo from maximally stable extremal regions. In British Machine Vision Conference, volume 1, pages 384–393, 2002. → pages 40

[39] L. Mercier, T. Lango, F. Lindseth, and L. D. Collins. A review of calibration techniques for freehand 3-D ultrasound systems. Ultrasound in Medicine and Biology, 31(2):143–165, 2005. → pages 3, 13

[40] K. Mikolajczyk and C. Schmid. Indexing based on scale invariant interest points. In IEEE International Conference on Computer Vision - ICCV 2001, volume 1, pages 525–531, 2001. → pages 45

[41] K. Mikolajczyk and C.
Schmid. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27:1615–1630, 2005. → pages 7, 38, 39, 44, 45

[42] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. A comparison of affine region detectors. International Journal of Computer Vision, 65:43–72, 2005. → pages 7, 38, 41, 44

[43] M. Moll, H.-W. Tang, and L. Van Gool. GPU-accelerated robotic intra-operative laparoscopic 3D reconstruction. In Information Processing in Computer-Assisted Interventions, volume 6135 of Lecture Notes in Computer Science, pages 91–101. Springer Berlin / Heidelberg, 2010. → pages 41

[44] K. Mori, D. Deguchi, K. Akiyama, T. Kitasaka, C. Maurer, Y. Suenaga, H. Takabatake, M. Mori, and H. Natori. Hybrid bronchoscope tracking using a magnetic tracking sensor and image registration. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2005, volume 3750 of Lecture Notes in Computer Science, pages 543–550. Springer Berlin / Heidelberg, 2005. → pages 41

[45] P. Mountney, B. Lo, S. Thiemjarus, D. Stoyanov, and G.-Z. Yang. A probabilistic framework for tracking deformable soft tissue in minimally invasive surgery. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2007, volume 4792 of Lecture Notes in Computer Science, pages 34–41. Springer Berlin / Heidelberg, 2007. → pages 7, 41

[46] P. Mountney, D. Stoyanov, and G.-Z. Yang. Three-dimensional tissue deformation recovery and tracking. IEEE Signal Processing Magazine, 27(July):14–24, 2010. → pages 7

[47] P. Mountney and G.-Z. Yang. Soft tissue tracking for minimally invasive surgery: Learning local deformation online. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2008, volume 5242 of Lecture Notes in Computer Science, pages 364–372. Springer Berlin / Heidelberg, 2008. → pages 41, 53

[48] Y. Nakamura, K. Kishi, and H. Kawakami. Heartbeat synchronization for robotic cardiac surgery. In IEEE International Conference on Robotics and Automation - ICRA 2001, volume 2, pages 2014–2019, 2001. → pages 6

[49] R. G. Nascimento, S. B. Solomon, and J. Coleman. Current and future imaging for urologic interventions. Current Opinion in Urology, 18(1):116–121, Jan. 2008. → pages 2

[50] N. Oda, J. Hasegawa, T. Nonami, M. Yamaguchi, and N. Ohyama. Estimation of the surface topography from monocular endoscopic images. Optics Communications, 109(3-4):215–221, 1994. → pages 40

[51] T. Ortmaier, M. Groger, D. Boehm, V. Falk, and G. Hirzinger. Motion estimation in beating heart surgery. IEEE Transactions on Biomedical Engineering, 52(10):1729–1740, Oct. 2005. → pages 6, 40

[52] T. C. Poon and R. N. Rohling. Tracking a 3-D ultrasound probe with constantly visible fiducials. Ultrasound in Medicine and Biology, 33(1):152–157, 2007. → pages 20

[53] R. W. Prager, R. N. Rohling, A. H. Gee, and L. Berman. Rapid calibration for 3-D freehand ultrasound. Ultrasound in Medicine & Biology, 24(6):855–869, July 1998. → pages 21

[54] O. Pujol, M. Rosales, P. Radeva, and E. Nofrerias-Fernández. Intravascular ultrasound images vessel characterization using AdaBoost. In Functional Imaging and Modeling of the Heart, volume 2674, pages 1006–1006. Springer-Verlag, 2003. → pages 35

[55] R. Richa, A. P. L. Bo, and P. Poignet. Motion prediction for tracking the beating heart. In Engineering in Medicine and Biology Society - EMBS 2008, 30th Annual International Conference of the IEEE, pages 3261–3264, Aug. 2008.
→ pages 6

[56] R. Richa, P. Poignet, and C. Liu. Deformable motion tracking of the heart surface. In IEEE/RSJ International Conference on Intelligent Robots and Systems - IROS 2008, pages 3997–4003, Sept. 2008. → pages 6

[57] R. Richa, P. Poignet, and C. Liu. Efficient 3D tracking for motion compensation in beating heart surgery. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2008, volume 5242 of Lecture Notes in Computer Science, pages 684–691. Springer Berlin / Heidelberg, 2008. → pages 6

[58] R. Richa, P. Poignet, and C. Liu. Three-dimensional motion tracking for beating heart surgery using a thin-plate spline deformable model. International Journal of Robotics Research, 29:218–230, February 2010. → pages 6

[59] M. Sauvee, A. Noce, P. Poignet, J. Triboulet, and E. Dombre. Three-dimensional heart motion estimation using endoscopic monocular vision system: From artificial landmarks to texture analysis. Biomedical Signal Processing and Control, 2(3):199–207, 2007. → pages 40

[60] C. Schneider, J. Guerrero, C. Y. Nguan, R. Rohling, and S. E. Salcudean. Intra-operative “pick-up” ultrasound for robot assisted surgery with vessel extraction and registration: A feasibility study. In Information Processing in Computer Assisted Interventions - IPCAI 2011, pages 122–132, 2011. → pages 2, 4

[61] J. Shi and C. Tomasi. Good features to track. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition - CVPR 1994, pages 593–600, June 1994. → pages 39, 40, 41

[62] S. N. Sinha, J.-M. Frahm, M. Pollefeys, and Y. Genc. GPU-based video feature tracking and matching. Technical report, 2006. → pages 41, 44

[63] D. Stoyanov, G. Mylonas, F. Deligianni, A. Darzi, and G. Yang. Soft-tissue motion tracking and structure estimation for robotic assisted MIS procedures. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2005, volume 3750 of Lecture Notes in Computer Science, pages 139–146. Springer Berlin / Heidelberg, 2005. → pages 40

[64] L.-M. Su, B. P. Vagvolgyi, R. Agarwal, C. E. Reiley, R. H. Taylor, and G. D. Hager. Augmented reality during robot-assisted laparoscopic partial nephrectomy: Toward real-time 3D-CT to stereoscopic video registration. Urology, 73(4):896–900, 2009. → pages 5

[65] D. Teber, S. Guven, T. Simpfendorfer, M. Baumhauer, E. O. Guven, F. Yencilek, A. S. Gozen, and J. Rassweiler. Augmented reality: A new tool to improve surgical accuracy during laparoscopic partial nephrectomy? Preliminary in vitro and in vivo results. European Urology, 56(2):332–338, 2009. → pages 5, 41

[66] The Math Works, Inc. MATLAB Statistics Toolbox User's Guide. 24 Prime Park Way, Natick, MA 01760-1500, 2008. → pages 28

[67] R. U. Thoranaghatte, G. Zheng, F. Langlotz, and L. P. Nolte. Endoscope based hybrid-navigation system for minimally invasive ventral-spine surgeries. Computer Aided Surgery, pages 351–356, 2005. → pages 41

[68] T. Tuytelaars and K. Mikolajczyk. Local invariant feature detectors: a survey. Foundations and Trends in Computer Graphics and Vision, 3:177–280, July 2008. → pages 7, 37, 39, 45

[69] O. Ukimura, I. Gill, M. Desai, A. Steinberg, M. Kilciler, C. Ng, S. Abreu, M. Spaliviero, A. Ramani, J. Kaouk, et al. Real-time transrectal ultrasonography during laparoscopic radical prostatectomy. Journal of Urology, 172(1):112–118, 2004. → pages 22, 33

[70] O. Ukimura and I. S. Gill. Imaging-assisted endoscopic surgery: Cleveland Clinic experience. Journal of Endourology, 22(4):803–810, 2008.
→ pages 2, 36

[71] O. Ukimura and I. S. Gill. Augmented reality for computer-assisted image-guided minimally invasive urology, pages 179–184. Contemporary Interventional Ultrasonography in Urology. Springer-Verlag, 2009. → pages 22, 36

[72] O. Ukimura, C. Magi-Galluzzi, and I. Gill. Real-time transrectal ultrasound guidance during laparoscopic radical prostatectomy: impact on surgical margins. Journal of Urology, 175(4):1304–1310, 2006. → pages 33

[73] H. van der Poel, W. de Blok, A. Bex, W. Meinhardt, and S. Horenblas. Peroperative transrectal ultrasonography-guided bladder neck dissection eases the learning of robot-assisted laparoscopic prostatectomy. British Journal of Urology International, 102(7):849–852, 2008. → pages 22, 33

[74] P. Viola and M. J. Jones. Robust real-time face detection. International Journal of Computer Vision, 57(2):137–154, 2004. → pages 34

[75] M. Visentini-Scarzanella, G. Mylonas, D. Stoyanov, and G.-Z. Yang. i-Brush: A gaze-contingent virtual paintbrush for dense 3D reconstruction in robotic assisted surgery. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2009, volume 5761 of Lecture Notes in Computer Science, pages 353–360. Springer Berlin / Heidelberg, 2009. → pages 41

[76] C. Wengert, P. Cattin, J. Duff, C. Baur, and G. Székely. Markerless endoscopic registration and referencing. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2006, volume 4190 of Lecture Notes in Computer Science, pages 816–823. Springer Berlin / Heidelberg, 2006. → pages 53

[77] Willow Garage. Star detector. → pages 46

[78] A. Yilmaz, O. Javed, and M. Shah. Object tracking: A survey. ACM Computing Surveys, 38, Dec. 2006. → pages 37, 50

[79] M. Yip, T. Adebar, R. Rohling, S. Salcudean, and C. Nguan. 3D ultrasound to stereoscopic camera registration through an air-tissue boundary. In Medical Image Computing and Computer-Assisted Intervention - MICCAI 2010, volume 6362 of Lecture Notes in Computer Science, pages 626–634. Springer Berlin / Heidelberg, 2010. → pages iv, 23, 24, 32, 33, 54, 81

[80] N. Zhang. Computing optimised parallel speeded-up robust features (P-SURF) on multi-core processors. International Journal of Parallel Programming, 38:138–158, 2010. → pages 45

[81] Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330–1334, Nov. 2000. → pages 101, 102

Appendix A

Other Materials

A.1 Two-sample t-test

A two-sample t-test is used to determine whether it is possible to reject the hypothesis that two independent random variables with normal distributions have equal means (the null hypothesis). The t-test produces a confidence value 0 < p < 1.0, where p represents the probability of incorrectly rejecting the null hypothesis, that is, the probability of incorrectly assuming that the means of the two variables are statistically different. Given n pairs of measured values xi and yi in a sample population, let x̂i = (xi − x̄) and ŷi = (yi − ȳ), where x̄ and ȳ are the means of x and y respectively. A t value is calculated as

    t = (x̄ − ȳ) √( n(n − 1) / Σᵢ₌₁ⁿ (x̂ᵢ − ŷᵢ)² ).

Given t, a look-up table of Student t-distribution confidence intervals will provide the value for p. A value of p = 0.05 is often chosen as the threshold for suggesting statistical significance between the two variables.
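The calculation above corresponds to the paired form of the test (the de-meaned differences x̂i − ŷi play the role of the paired differences), so it can be cross-checked against a standard statistics library. The following is a minimal NumPy/SciPy sketch; the two measurement arrays are placeholders, not data from this thesis.

    import numpy as np
    from scipy import stats

    # Paired measurements (placeholder values only)
    x = np.array([3.3, 3.1, 3.4, 3.0, 3.2])   # e.g. errors of one method (mm), assumed
    y = np.array([2.1, 2.3, 2.0, 2.2, 2.4])   # e.g. errors of another method (mm), assumed
    n = len(x)

    # t value as defined above, using the de-meaned samples
    x_hat, y_hat = x - x.mean(), y - y.mean()
    t = (x.mean() - y.mean()) * np.sqrt(n * (n - 1) / np.sum((x_hat - y_hat) ** 2))

    # Two-sided p value from the Student t-distribution with n - 1 degrees of freedom
    p = 2 * stats.t.sf(abs(t), df=n - 1)

    # The same result from SciPy's paired t-test
    t_ref, p_ref = stats.ttest_rel(x, y)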
A.2 Random Sample Consensus - RANSAC

Below is the pseudo-code for RANSAC, an iterative technique for estimating the parameters of a model that best fits the data while rejecting the influence of outliers. In summary, RANSAC selects N_model points at random to define a model (in this case a registration between frames, T_model). It then counts how many other points fit this model based on an error measurement threshold (consensus_threshold). Points that fall within this threshold are considered to be possible inliers (consensus_list). Given enough points (N_consensus) in the list of possible inliers, a model T_consensus is derived from the possible inlier set, and the total error of the possible inliers under this model (T_error) is computed. This process is repeated max_iterations times, and the model with the lowest total error, T_best, is returned. We set the number of iterations to 20, N_consensus to 0.5 × the number of features, and N_model to 0.3 × the number of features.

    max_iterations      := number of iteration attempts
    model_list          := a subset of matched features used for a model
    N_model             := number of features to use for the model
    consensus_list      := a subset of matched features in consensus with the model
    N_consensus         := number of features required for a consensus
    T_model             := transformation based on model_list (from object frame to current frame)
    T_consensus         := transformation based on consensus_list (from object frame to current frame)
    T_error             := total error between transformed points and true locations
    T_best              := best transformation (from object frame to current frame)
    T_least_error       := total error of the transformed points for the best transformation
    consensus_threshold := distance threshold between transformed and true locations

    for (iter = 0; iter < max_iterations; iter += 1) {
        while (size of consensus_list < N_consensus) {
            while (size of model_list < N_model) {
                randomly select a feature, add to model_list
            }
            find T_model from model_list
            for all other features not in model_list {
                apply T_model to feature
                if (transformed location - true location < consensus_threshold) {
                    add feature to consensus_list
                }
            }
        }
        find T_consensus from consensus_list
        for each feature in consensus_list {
            error = euclidean distance between transformed and true location
            T_error += error
        }
        if (T_error < T_least_error) {
            T_best = T_consensus
        }
    }
    return T_best
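For the rigid registration case, the pseudo-code above can be made concrete. The following is a minimal NumPy sketch under the parameter choices listed above (20 iterations, N_model = 0.3 × n, N_consensus = 0.5 × n); fit_rigid() is an illustrative SVD-based least-squares fit standing in for the closed-form absolute orientation solution, and the threshold value is an assumed placeholder rather than the value used in this work.

    import numpy as np

    def fit_rigid(P, Q):
        # Least-squares R, t such that R @ P_i + t ~= Q_i (SVD-based fit)
        cP, cQ = P.mean(axis=0), Q.mean(axis=0)
        H = (P - cP).T @ (Q - cQ)
        U, _, Vt = np.linalg.svd(H)
        D = np.diag([1.0, 1.0, np.linalg.det(Vt.T @ U.T)])  # guard against reflections
        R = Vt.T @ D @ U.T
        return R, cQ - R @ cP

    def ransac_rigid(P, Q, max_iterations=20, consensus_threshold=2.0):
        # P, Q: (n, 3) arrays of matched 3D feature locations (object frame, current frame)
        n = len(P)
        N_model, N_consensus = int(0.3 * n), int(0.5 * n)
        T_best, least_error = None, np.inf
        for _ in range(max_iterations):
            sample = np.random.choice(n, N_model, replace=False)
            R, t = fit_rigid(P[sample], Q[sample])                # candidate model
            residual = np.linalg.norm(P @ R.T + t - Q, axis=1)
            inliers = np.flatnonzero(residual < consensus_threshold)
            if len(inliers) < N_consensus:                        # not enough support
                continue
            R, t = fit_rigid(P[inliers], Q[inliers])              # refit on the consensus set
            error = np.linalg.norm(P[inliers] @ R.T + t - Q[inliers], axis=1).sum()
            if error < least_error:
                T_best, least_error = (R, t), error
        return T_best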
A.3 Camera Calibration for a da Vinci Camera System

Camera calibration is necessary for any augmented reality system, and an accurate camera calibration is essential for stereo cameras. The da Vinci Surgical System's stereo laparoscope has a left-to-right camera disparity of approximately 5 mm; therefore, it is important that the camera calibration is performed properly and accurately so that the effects of imprecise camera parameters on depth estimation are limited. Without knowledge of any camera parameters, it is possible to acquire the camera properties using a series of test images. Zhang [81] provides a flexible technique for camera calibration that has been implemented in both MATLAB (Math Works, Inc., Natick, MA) and OpenCV (Willow Garage, Menlo Park, CA). This method uses a flat, black and white checkerboard pattern with known sizes in order to correlate the 2D camera points to 3D world coordinates (Figure A.1). This is done by moving the checkerboard pattern around in the stereo cameras such that it is seen in both left and right channels.

Figure A.1: Checkerboard pattern as seen in the left camera image (top row) and the right camera image (bottom row).

If calibration is performed using a video capture system, the left and right video streams should ideally be genlocked to ensure that the left and right captured image frames are acquired at exactly the same time. If this is not possible, individual images of the checkerboard pattern can be captured instead of video streams. In practice, this is done by moving the checkerboard in the cameras, securing it so that it remains still, and capturing the left and right images. This should be done at least 20 times, with a checkerboard with at least 100 corners (e.g. a 10 × 10 grid), such that there are 2000 points with which to estimate the camera parameters.

We use a standard PC with an OpenCV framework in order to analyze still-frame images of the checkerboard patterns. An iterative approach is used as described in [81] to estimate the following camera parameters for each camera:

1. focal length, f
2. principal point, c
3. camera distortion, k
4. skew, α

Camera distortion is represented as a fifth-order polynomial function, where {k0, k1, k2, k3, k4, k5} are the polynomial coefficients. In practice, k5 is often set to 0, as including it increases the sensitivity of the camera model to errors in the parameter estimation.

Given that a calibration is acquired for each individual channel of the cameras, a stereoscopic calibration can be acquired to relate the two cameras' frames of reference to one another. The transformation between the left and the right camera frames is computed as

    Pright = R · Pleft + T,    (A.1)

where Pright and Pleft are 3D point coordinates in the right and left frames respectively, and T and R are the translation and rotation from the left camera frame to the right camera frame respectively.

We performed a camera calibration on the da Vinci SI system using the OpenCV framework, and the stereo camera parameters are

    T = [ −5.43   0.152   0.00182 ]ᵀ

and

    R = [  0.999     −0.00652    0.0242
           0.000651   0.999      0.000565
           0.0242    −0.000723   0.999   ].

For a detailed description and tutorials for performing camera calibration, refer to Bouguet's MATLAB Calibration Toolbox Guide [9].
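The same procedure can be scripted with OpenCV's built-in calibration routines. The following is a minimal Python (cv2) sketch of the per-channel and stereo calibration steps; the image file names and the square size are assumptions for illustration, and the 10 × 10 corner grid follows the example above rather than a specific board used in this work.

    import glob
    import cv2
    import numpy as np

    pattern = (10, 10)        # inner-corner grid, as in the 10 x 10 example above
    square_size = 10.0        # checker square size in mm (assumed)

    # 3D corner coordinates in the checkerboard frame (z = 0)
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_size

    obj_pts, left_pts, right_pts = [], [], []
    for lf, rf in zip(sorted(glob.glob("left_*.png")), sorted(glob.glob("right_*.png"))):
        gray_l = cv2.cvtColor(cv2.imread(lf), cv2.COLOR_BGR2GRAY)
        gray_r = cv2.cvtColor(cv2.imread(rf), cv2.COLOR_BGR2GRAY)
        ok_l, corners_l = cv2.findChessboardCorners(gray_l, pattern)
        ok_r, corners_r = cv2.findChessboardCorners(gray_r, pattern)
        if ok_l and ok_r:     # keep only views where the full board is found in both images
            obj_pts.append(objp)
            left_pts.append(corners_l)
            right_pts.append(corners_r)

    image_size = gray_l.shape[::-1]

    # Intrinsics for each channel: focal length, principal point, distortion, skew
    _, K_l, dist_l, _, _ = cv2.calibrateCamera(obj_pts, left_pts, image_size, None, None)
    _, K_r, dist_r, _, _ = cv2.calibrateCamera(obj_pts, right_pts, image_size, None, None)

    # Stereo calibration: R and T of Equation (A.1), Pright = R * Pleft + T
    ret, _, _, _, _, R, T, E, F = cv2.stereoCalibrate(
        obj_pts, left_pts, right_pts, K_l, dist_l, K_r, dist_r, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)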
A.4 Camera Capture System

The da Vinci Surgical Systems capture two independent video streams from the left and right channels of a stereo laparoscope. The videos are in an uncompressed Serial Digital Interface (SDI) format; the original da Vinci Surgical System provides standard definition 720i SDI video streams, whereas the newer da Vinci SI Surgical System provides high definition 1080i HD-SDI video streams. The video streams are then routed from the bedside video cart to the surgical console in order to provide the surgeon with a 3D view of the scene. The SDI and HD-SDI formats are specific to professional video streaming applications and are not supported by commercial capture devices. For this reason, we have put together a stereoscopic camera capture system for high-definition video that is able to capture two-channel SDI and HD-SDI video streams in real time (60 Hz), and output two stereoscopic image streams in HD-SDI format in real time (60 Hz). Our system has the following key components:

• 1 × NVIDIA Quadro® SDI Capture Card
• 1 × NVIDIA Quadro® 6000 Card
• 1 × NVIDIA Quadro® SDI Output Card
• 1 × NVIDIA 3D Vision Pro Kit
• 1 × PC system running an Intel Core i7 960 processor (4 cores) and 12 GB RAM, with 3 PCI-Express slots.

The NVIDIA Quadro Video Capture Pipeline (Quadro SDI Capture Card, Quadro 6000 Card, Quadro SDI Output Card), shown in Figure A.2, has a four-channel HD-SDI input in order to capture up to four genlocked 1080p video streams and save them directly to GPU memory on the Quadro 6000 card. This allows for much faster transfer speeds and avoids any timing and buffering issues arising from transferring data through the motherboard. The system also allows a direct stream from GPU memory to two channels of HD-SDI outputs. Therefore, any modified video streams (such as a medical image overlay) can be streamed from our capture system's GPU memory to the two video stream inputs on the surgical console, such that the surgeon is presented with a real-time augmented reality view of the laparoscopic scene.

Figure A.2: NVIDIA Digital Video Pipeline, containing HD capture, GPU, and output card. Images courtesy of http://www.nvidia.com.

In addition to providing the surgeon with an augmented reality view of the endoscopic cameras, the NVIDIA 3D Vision Pro Kit allows up to 8 surgical staff to view the stereo laparoscopic images on a 3D-enabled monitor using 3D glasses (Figure A.3). The NVIDIA 3D Vision Pro Kit uses active shutter glasses, and relies on radio-frequency signals rather than conventional infra-red signals to communicate with the glasses in order to minimize crosstalk and ghosting issues. Using a 3D monitor display of the stereoscopic video from the laparoscopic camera provides the surgical staff with the same level of depth perception as in the surgical console. Providing depth perception to the surgical assistants during robot-assisted laparoscopic surgery will improve their understanding of the surgical environment within the body cavity. To the best of our knowledge, these technologies have not been used in an operating room setting.

Figure A.3: 3D Vision Pro Kit: Radio-frequency emitter and a pair of 3D glasses shown. Images courtesy of http://www.nvidia.com.

We have tested this system using two 1080i video streams from a da Vinci SI surgical camera system, and have shown that we are able to capture and display the two stereo channels at real-time speeds (60 frames per second).
