UBC Theses and Dissertations

Effective image registration for motion estimation in medical imaging environments. Miao, Shun. 2016.
Effective Image Registration for Motion Estimation in Medical Imaging Environments

by

Shun Miao

B.Sc., Zhejiang University, 2009
M.Sc., North Carolina State University, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

The University of British Columbia
(Vancouver)

August 2016

© Shun Miao, 2016

Abstract

Motion estimation is a key enabler for many advanced medical imaging and image analysis applications, and hence is of significant clinical interest. In this thesis, we study image registration for motion estimation in medical imaging environments, focusing on two clinically important problems: 1) deformable respiratory motion estimation from dynamic Magnetic Resonance Images (MRIs), and 2) rigid-body object motion estimation (e.g., of surgical devices and implants) from fluoroscopic images.

Respiratory motion is a major complicating factor in many image acquisition applications and image-guided interventions. Existing respiratory motion estimation methods typically rely on motion models learned from retrospective data, and are therefore vulnerable to unseen respiratory motion patterns. To address this limitation, we propose a dynamic MRI acquisition protocol to monitor respiratory motion, together with a scatter-to-volume registration method that recovers dense motion fields directly from the dynamic MRI data without explicitly modeling the motion. The proposed method achieves significantly higher motion estimation accuracy than state-of-the-art methods when respiratory motion patterns vary.

Object motion estimation from fluoroscopic images is an enabling technology for advanced image guidance in Image-Guided Therapy (IGT). Complex and time-critical clinical procedures typically require motion estimation that is simultaneously accurate, robust and real-time, which existing methods cannot deliver.
We study 2-D/3-D registration for rigid-body object motion estimation to address these challenges, and propose two new approaches that significantly improve the robustness and computational efficiency of 2-D/3-D registration. We first propose to use pre-generated canonical-form Digitally Reconstructed Radiographs (DRRs) to accelerate DRR generation during intensity-based 2-D/3-D registration, which boosts computational efficiency ten-fold with little degradation in registration accuracy and robustness. We further demonstrate that the widely adopted intensity-based formulation of 2-D/3-D registration is ineffective, and propose a more effective regression-based formulation solved using a Convolutional Neural Network (CNN). The proposed regression-based approach achieves significantly higher robustness, capture range and computational efficiency than state-of-the-art intensity-based approaches.

Preface

The research presented herein is primarily based on 4 conference papers and 3 journal papers, resulting from collaboration between multiple researchers. In all publications, the Ph.D. candidate is the first author, and his contributions include developing, implementing and evaluating the proposed methods.

Studies described in Chapter 2 have been published in:

• S. Miao, R. Liao, G. Moran, J. Butler, L. Pan, and Z. J. Wang. Dynamic MR-based respiratory motion compensation for hybrid PET/MR system. In Industrial Electronics and Applications (ICIEA), 2014 IEEE 9th Conference on, pages 1915–1920. IEEE, 2014

• S. Miao, Z. J. Wang, and R. Liao. Non-parametric orthogonal slice to volume deformable registration: Application to PET/MR respiratory motion compensation. In Signal and Information Processing (ChinaSIP), 2014 IEEE China Summit & International Conference on, pages 530–534. IEEE, 2014

• S. Miao, Z. J. Wang, and R. Liao. MRI-based motion estimation via scatter to volume registration.
In Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, pages 596–600. IEEE, 2015

• S. Miao, Z. J. Wang, L. Pan, J. Butler, G. Moran, and R. Liao. MRI-based respiratory motion estimation via scatter to volume registration. In Press, Computerized Medical Imaging and Graphics, 2016. doi:10.1016/j.compmedimag.2016.03.002

The contribution of the author was in developing, implementing and evaluating the method. Li Pan helped with designing the dynamic MRI acquisition protocol. John Butler and Gerald Moran helped with acquiring data from canine studies for the research. Rui Liao and Z. Jane Wang helped with their valuable suggestions for improving the methodology.

Studies described in Chapter 3 have been published in:

• S. Miao, A. Tuysuzoglu, Z. J. Wang, and R. Liao. Real-time 6DoF pose recovery from X-ray images using library-based DRR and hybrid optimization. In Press, International Journal of Computer Assisted Radiology and Surgery, 2016. ISSN 1861-6429. doi:10.1007/s11548-016-1387-2

The contribution of the author was in developing, implementing and evaluating the method. Ahmet Tuysuzoglu helped with the adaptation of trust region optimizers for hybrid optimization problems. Rui Liao and Z. Jane Wang helped with their valuable suggestions for improving the methodology.

Studies described in Chapter 4 have been published in:

• S. Miao, Z. J. Wang, Y. Zheng, and R. Liao. Real-time 2D/3D registration via CNN regression. In Biomedical Imaging (ISBI), 2016 IEEE 13th International Symposium on. IEEE, 2016

• S. Miao, Z. J. Wang, and R. Liao. A CNN regression approach for real-time 2D/3D registration. IEEE Transactions on Medical Imaging, 35(5):1352–1363, May 2016. ISSN 0278-0062. doi:10.1109/TMI.2016.2521800

The contribution of the author was in developing, implementing and evaluating the method. Rui Liao, Z. Jane Wang and Yefeng Zheng helped with their valuable suggestions for improving the methodology.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments

1 Introduction
  1.1 Motivation
  1.2 Problems Investigated
    1.2.1 Deformable Respiratory Motion Estimation from Dynamic MRIs
    1.2.2 Rigid-body Object Motion Estimation from Fluoroscopic Images
  1.3 Related Works
    1.3.1 Respiratory Motion Estimation from Dynamic MRIs
    1.3.2 Rigid-body Object Motion Estimation from Fluoroscopic Images
  1.4 Thesis Objective, Outline and Main Contributions
    1.4.1 Objective
    1.4.2 Outline and Main Contributions

2 Deformable Respiratory Motion Estimation: Scatter-to-Volume Registration of Dynamic MRIs
  2.1 Overview
  2.2 Problem Formulation
    2.2.1 MRI-based Motion Estimation
    2.2.2 Scatter-to-Volume Registration
    2.2.3 Parameter Selection
  2.3 Numerical Solution and Algorithm
    2.3.1 Minimization of the Dissimilarity Energy
    2.3.2 Minimization of the Regularization Energy
    2.3.3 Numerical Solution for Smoothing Spline
    2.3.4 Pseudo Code of the Proposed Method
    2.3.5 Multi-resolution Strategy
  2.4 Experiment Setup
    2.4.1 Smoothing Spline Approximation
    2.4.2 Synthetic MRI Dataset
    2.4.3 Compared Methods
    2.4.4 Evaluation Metrics
    2.4.5 Validation on Real MRI Dataset
    2.4.6 Motion Corrected PET Imaging
  2.5 Results
    2.5.1 Smoothing Spline Approximation
    2.5.2 Results on Synthetic MRI Dataset
    2.5.3 Results on Real MRI Dataset
    2.5.4 Motion Corrected PET Imaging
  2.6 Discussions

3 Rigid-body Object Motion Estimation: Real-time 2-D/3-D Registration using Library-based DRR
  3.1 Overview
  3.2 Background
    3.2.1 X-ray Imaging Model
    3.2.2 2-D/3-D Registration
  3.3 Problem Formulation
  3.4 Library-based Accelerated DRR Generation
    3.4.1 Rigid-body Transformation Parameters
    3.4.2 Canonical DRR
    3.4.3 Encoding of Canonical DRR
  3.5 2-D/3-D Registration using Library-based DRR
    3.5.1 Hill Climbing
    3.5.2 Nelder-Mead
    3.5.3 BOBYQA
  3.6 Experiment Setup
  3.7 Results
    3.7.1 Comparison with Other DRR Generation Methods
    3.7.2 Comparison with Other Accelerated 2-D/3-D Registration Methods
    3.7.3 Validation of 6-DoF Tracking
  3.8 Discussion

4 Rigid-body Object Motion Estimation: Real-time 2-D/3-D Registration via CNN Regression
  4.1 Overview
  4.2 Problem Formulation
    4.2.1 Revisit Intensity-based 2-D/3-D Registration
    4.2.2 Regression-based 2-D/3-D Registration
  4.3 Pose Estimation via Hierarchical Learning
    4.3.1 Parameter Space Partitioning
    4.3.2 Local Image Residual
    4.3.3 Hierarchical Parameter Regression
    4.3.4 CNN Regression Model
  4.4 Experiment Setup
    4.4.1 Datasets
    4.4.2 Synthetic Training Data Generation
    4.4.3 Evaluation Metrics
    4.4.4 Performance Analysis
    4.4.5 Comparison with State-of-the-Art Methods
    4.4.6 Experiment Environment
  4.5 Results
    4.5.1 Performance Analysis
    4.5.2 Comparison with State-of-the-Art Methods
  4.6 Discussion

5 Conclusions and Future Works
  5.1 Conclusions
  5.2 Future Works
    5.2.1 Clinical Applications of Dynamic-MRI based Respiratory Motion Estimation
    5.2.2 Extensions of CNN Regression-based 2-D/3-D Registration
    5.2.3 Global 6-DoF Pose Estimation and Tracking via CNN Regression
    5.2.4 Deep Learning-based 3-D/3-D Registration of Medical Images

Bibliography

List of Tables

Table 1.1 Thesis outline.
Table 2.1 Summary of breathing patterns from 5 volunteers.
Table 2.2 Quantitative comparison of respiratory motion estimation accuracies on synthetic human MRI data.
Table 2.3 Quantitative comparison of respiratory motion estimation accuracies on real canine MRI data.
Table 2.4 Quantitative comparison of PET motion correction methods.
Table 3.1 Results of 6-DoF tracking.
Table 4.1 Transformation parameter distributions used for generating training data.
Table 4.2 RMSE of the 6 transformation parameters yielded by PEHL and CLARET on the training data.
Table 4.3 Quantitative experimental results of PEHL and baseline methods.

List of Figures

Figure 1.1 Motion-induced artifacts in medical images.
Figure 1.2 Motion estimation via image registration.
Figure 1.3 Object motion estimation from fluoroscopic image.
Figure 1.4 Respiratory motion estimation from MRI. Note: figures are reproduced from [12].
Figure 2.1 Flowchart of the proposed MRI-based motion estimation system.
Figure 2.2 Synthetic pattern for evaluating smoothing spline approximation.
Figure 2.3 Example of respiratory motion patterns.
Figure 2.4 Example of lung segmentations and surfaces.
Figure 2.5 Examples of dynamic MRIs from canine with and without artificially added noise.
Figure 2.6 Samples of MRIs acquired from canine study.
Figure 2.7 RMSE results of the smoothing spline approximation.
Figure 2.8 Comparison of registration results on synthetic data.
Figure 2.9 Image residual after registration on canine data.
Figure 2.10 Comparison of PET motion correction methods.
Figure 3.1 X-ray imaging geometry.
Figure 3.2 Optimization-based 2-D/3-D registration framework.
Figure 3.3 Perspective geometry of the X-ray imaging system.
Figure 3.4 Demonstration of rotation effects caused by in-plane translations using fixed axes.
Figure 3.5 Demonstration of rotation effects caused by in-plane translations using moving axes.
Figure 3.6 Extraction of the canonical DRR from an original DRR.
Figure 3.7 2-D/3-D registration using libDRR.
Figure 3.8 Example 2-D/3-D registration data used for validation.
Figure 3.9 Comparison of DRR generation methods.
Figure 3.10 Comparison with the state-of-the-art real-time 2-D/3-D registration methods.
Figure 4.1 Formulation of intensity-based 2-D/3-D registration.
Figure 4.2 Effects of the 6 rigid-body transformation parameters.
Figure 4.3 Workflow of LIR feature extraction.
Figure 4.4 The structure of the CNN regression model.
Figure 4.5 Structure of the CNN applied for each input channel.
Figure 4.6 Example 2-D/3-D registration data used for validation.
Figure 4.7 Examples of local ROIs and LIRs.
Figure 4.8 Example synthetic X-ray images used for training.
Figure 4.9 HAAR features used in the experiment “Without CNN”.
Figure 4.10 Success rates of PEHL with 1 to 10 iterations.
Figure 4.11 RMSDproj from the registered locations of each target to their centroid.
Figure 4.12 mTREproj before and after registration.

Glossary

BOBYQA: Bound Optimization By Quadratic Approximation
CC: Cross Correlation
CNN: Convolutional Neural Network
CT: Computed Tomography
DoF: Degree of Freedom
DRR: Digitally Reconstructed Radiograph
DSC: Direct Splatting Correlation
EE: End of Expiration
EI: End of Inspiration
FOV: Field of View
FPS: Frames Per Second
GC: Gradient Correlation
GD: Gradient Difference
HPR: Hierarchical Parameter Regression
ICE: Intracardiac Echocardiography
IGT: Image-Guided Therapy
IVUS: Intravascular Ultrasound
libDRR: Library-Based DRR
LIR: Local Image Residual
MI: Mutual Information
MRI: Magnetic Resonance Imaging
mTREproj: Mean Target Registration Error in the Projection Direction
NCC: Normalized Cross Correlation
PCA: Principal Component Analysis
PEHL: Pose Estimation via Hierarchical Learning
PET: Positron Emission Tomography
PSP: Parameter Space Partitioning
RAM: Random Access Memory
rcDRR: Ray Casting DRR
RTA: Reconstruct Transform Average
RMSDproj: Root Mean Squared Distance in the Projection Direction
SGD: Stochastic Gradient Descent
SHMI: Sparse Histogramming MI
SNR: Signal to Noise Ratio
TEE: Transesophageal Echocardiography
TKA: Total Knee Arthroplasty
VIPS: Virtual Implant Planning System
XEF: X-ray Echo Fusion

Acknowledgments

I would like to express my special appreciation and thanks to my supervisor, Dr. Z. Jane Wang, and co-supervisor, Dr. Rui Liao, for supervising me throughout my doctoral studies. You have been tremendous mentors to me. I would also like to thank Dr. J. Martin McKeown, Dr. Rabab Ward, Dr. Tim Little, Dr. Roger Tam and Dr. Carl Michal for serving as my committee members and for their brilliant comments and suggestions. I owe particular thanks to Dr. Christophe Chefd’hotel, Dr. Charles Florin, Dr. Tommaso Mansi and Dr. Dorin Comaniciu at Siemens Medical Solutions for their sponsorship of and support for my doctoral studies. I would also like to thank Dr. Li Pan, Dr. Yefeng Zheng, Dr. Ahmet Tuysuzoglu, and many other colleagues at Siemens Medical Solutions for the insightful discussions on my doctoral studies.

A special thanks to my family. Words cannot express how grateful I am to my parents and parents-in-law for all of the sacrifices that you have made on my behalf. Your prayers for me were what sustained me this far. I would also like to thank all of my friends who supported me in writing, and encouraged me to strive towards my goal. Finally, I would like to express appreciation to my beloved wife, Jialun Guo, who spent sleepless nights with me and was always my support in the moments when there was no one to answer my queries.

Chapter 1

Introduction

1.1 Motivation

Motions of internal organs, surgical instruments and imaging devices are constantly present during medical image acquisitions for both diagnosis and therapy. For instance, respiratory and cardiac motions cause internal organs in the thorax and abdomen to continuously move and deform. Surgical instruments such as implants and catheters are manipulated by the physician throughout the surgical procedure.
Interventional imaging devices, such as ultrasound probes and endoscopes, are operated by the imaging specialist during the imaging session to acquire images at the target site. Since motions are constantly present and can significantly impact the outcome of medical imaging, it is of great clinical interest to effectively estimate motions in medical imaging environments so that they can be accounted for during image processing and analysis.

Motions of anatomical objects (e.g., internal organs) are very common and almost inevitable during most medical imaging procedures, and the geometric uncertainties they cause are a significant source of error in medical imaging. In particular, for imaging modalities with an acquisition time longer than a few seconds, e.g., Computed Tomography (CT), Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET), motions of the patient’s internal organs caused by breathing and/or heart beating lead to motion-induced artifacts in the reconstructed image (Fig. 1.1). Motion artifacts, such as blurring, ghosting and distortion, can significantly increase the difficulty and error rate of image-based diagnosis and therapy planning. For example, motion artifacts can blur the boundary of lesions and reduce their contrast level in diagnostic images, which makes them more difficult to identify and hence increases the chance of missing a cancer diagnosis. In image-based planning for radiotherapy treatments, motions of the patient’s internal organs can reduce the precision of lesion localization, which can lead to inaccurate radiation therapy plans. Therefore, it is of significant clinical interest to estimate the motions of internal organs during medical imaging, so that they can be compensated for in the image reconstruction process.

Besides the motions of internal organs, motions of non-anatomical objects, e.g., surgical instruments and imaging devices, are also widely present during many medical procedures.
Motion estimation of these objects is of significant clinical interest because the motion information can be utilized to facilitate the interpretation of the images and to provide advanced visualization. In particular, for internal fixation of bone fractures guided by X-ray fluoroscopy, motion estimation of the fixation devices (e.g., screws and plates) and the bone anatomy can bring them into a virtual environment for computer-aided planning of screw placement to achieve maximum stability [11]. In addition, motion estimation of interventional imaging devices, e.g., the Transesophageal Echocardiography (TEE) probe, Intracardiac Echocardiography (ICE) probe and Intravascular Ultrasound (IVUS) probe, can bring different imaging modalities into the same coordinate system so that they can be processed and visualized in an integrated manner. Such multi-modality image fusion can greatly facilitate minimally invasive Image-Guided Therapy (IGT) procedures that require multiple imaging modalities to be used simultaneously to provide complementary information for guidance.

In the rest of this chapter, we first describe the two problems investigated in this thesis (Section 1.2). We then give literature reviews of the two problems (Section 1.3). In the last section of this chapter, we provide the objectives and an outline of this thesis with the main contributions (Section 1.4).

Figure 1.1: Motion-induced artifacts in medical images. (a) CT with little motion artifact. (b) MRI with little motion artifact. (c) PET image reconstructed with respiratory gating, which leads to little motion artifact but has a low SNR. (d) CT with motion artifacts. (e) MRI with motion artifacts. (f) PET image reconstructed directly, showing blurring artifacts caused by respiratory motion. Note: (a) and (d) are reproduced from [8]; (b) and (e) are reproduced from [9]; (c) and (f) are reproduced from [10].

1.2 Problems Investigated

The above clinical context led us to investigate image-based motion estimation in medical imaging environments. In particular, we are interested in recovering motion from imaging data using image registration methods, as illustrated in Fig. 1.2. Due to the wide range of imaging modalities employed and the various types of motion present in medical environments, there are many different but related image registration scenarios. For example, the imaging modalities include CT, MRI, PET, X-ray fluoroscopy, ultrasound and endoscopy; the types of motion include cardiac motion, respiratory motion, patient movement and surgical device manipulation. In our effort to tackle image-based motion estimation problems, we investigated two clinically interesting and representative scenarios, and developed corresponding image registration algorithms. While specific imaging modalities and types of motion were studied in each scenario, our research was carried out with the goal of solving more general image-based motion estimation problems, and we aim to develop image registration algorithms that can be applied to multiple imaging modalities and types of motion.

Figure 1.2: Motion estimation via image registration.

In the two scenarios, we studied the two most commonly present and impactful types of motion: 1) deformable respiratory motion and 2) rigid-body object motion (e.g., of surgical devices and imaging devices). We investigated image registration methods to recover these two types of motion from 1) dynamic MRI data (Fig. 1.4) and 2) X-ray fluoroscopy data (Fig. 1.3), respectively.

The two investigated problems are detailed below.

Figure 1.3: Object motion estimation from fluoroscopic image.

Figure 1.4: Respiratory motion estimation from MRI.
Note: figures are reproduced from [12].

1.2.1 Deformable Respiratory Motion Estimation from Dynamic MRIs

Respiratory motion is the most common and dominant source of anatomical motion in the thorax and abdomen, and it is therefore of significant clinical interest to estimate it reliably and accurately. The first problem we investigated is deformable respiratory motion estimation from dynamic MRIs, where we aim to use dynamic MRI data to monitor the underlying motion, and to use image registration methods to recover the dense motion field from the MRIs. Existing model-driven motion estimation methods typically measure the motion indirectly via surrogate signals (e.g., a respiratory belt, an MR navigator, etc.) and fit a motion model to the signals. A major disadvantage of model-driven methods is that respiratory motion models are mostly based on typical respiratory motion patterns, which are often not valid in real medical imaging environments, especially for patients with pathological abnormalities that affect their breathing pattern. As a result, model-driven methods are typically unable to produce accurate motion estimates when abnormal breathing patterns are present. In contrast, our effort is directed toward directly measuring the underlying motion via dynamic MRIs and recovering the motion from the images, to achieve accurate motion estimation across various breathing patterns. The method can be applied to provide detailed lung motion paths for studying pulmonary compliance, where the subjects studied often have abnormal breathing patterns. It can also be applied for motion compensation in MRI-guided interventions, and for motion-compensated PET-MRI imaging.

1.2.2 Rigid-body Object Motion Estimation from Fluoroscopic Images

The next problem we investigated is rigid-body object motion estimation from the most widely used interventional imaging modality, X-ray fluoroscopy.
In IGT procedures guided by X-ray fluoroscopy, objects of interest such as catheters, implants and other imaging devices are often visible in the fluoroscopic images, so their motion can be estimated from the images using image-based motion estimation methods. However, since many IGT procedures require high precision (e.g., deploying a prosthetic heart valve) and are time-critical, it is a common requirement that object motion estimation be highly accurate (e.g., sub-millimeter), robust (e.g., above 95% success rate) and computationally efficient (e.g., real-time). While image-based motion estimation has been studied for decades and many methods have been proposed, it remains a major challenge for a single method to meet all of these requirements simultaneously. Motivated by this clinical demand, our effort is directed toward developing an image-driven motion estimation method that meets all of the above requirements, so that it can be applied to facilitate a wide range of IGT procedures.

1.3 Related Works

1.3.1 Respiratory Motion Estimation from Dynamic MRIs

Because of its clinical importance, respiratory motion estimation has been an actively researched topic over the past decade, and many attempts have been made to tackle this challenging problem [13]. A major difficulty in estimating deformable 3-D anatomical motion is that only limited information about the underlying motion can be acquired dynamically. In particular, modern diagnostic 3-D imaging techniques, e.g., CT, MRI and PET, all have relatively long image acquisition times and hence are not able to acquire real-time 3-D images with sufficient Field of View (FOV) and quality for estimating respiratory motion. Therefore, most existing respiratory motion estimation methods use surrogate signals, e.g., MRI navigator echo, bellows, spirometer and chest/abdomen displacement, as indirect measurements of the underlying motion.
To recover the dense 3-D deformation field of the respiratory motion, which has a very high Degree of Freedom (DoF), from surrogate signals that typically have a low DoF, strong prior knowledge of the respiratory motion is required. Such a prior is often expressed as a motion model, which can be a simple cyclic model, a statistical model or even a bio-mechanical model.

The basic idea of a respiratory motion model is to exploit the commonality of human respiratory motion, so that the motion can be effectively represented by a small number of parameters. While it is intuitive that human respiratory motions share significant commonalities, they also exhibit relatively large variations across subjects, and even for the same subject across time. These variations come from many different sources. For example, different subjects naturally have their own specific breathing patterns. Pathology can also affect a patient’s breathing pattern, as it can cause discomfort to the patient and change the physical properties of the patient’s soft tissue. Even anxiety can have a significant impact on one’s breathing pattern, as people tend to breathe faster and more shallowly under stress. Given the different sources and types of variation in respiratory motion, it is extremely challenging to construct a model that represents all possible respiratory motions with a small number of parameters by exploiting their commonality, while still handling all of the variations. Indeed, most existing respiratory motion modeling methods oversimplify the problem by making too strong an assumption about the breathing pattern, and hence tend to produce inaccurate motion estimates when the assumption is violated.

A common assumption made by existing respiratory motion modeling methods is that the respiratory motion is “repeatable” [13].
Several different assumptions on the repeatability of respiratory motion have been reported in the literature for assisting respiratory motion estimation. The strongest assumption is that the motion paths are the same from one breathing cycle to the next, known as inter-cycle repeatability, and that the motion during expiration is the same as the motion during inspiration, known as intra-cycle repeatability. Motion models based on this strong assumption construct a patient-specific motion model in which a 1-D scalar value uniquely indicates the breathing state between the End of Expiration (EE) and End of Inspiration (EI) [14]. As the model has only one DoF, applying it for motion estimation only requires a dynamically acquired 1-D surrogate signal (e.g., from a spirometer). However, the assumption of intra-cycle repeatability does not always hold, and research has found that the motion during expiration is in general different from that during inspiration [15]. To address the intra-cycle variation of respiratory motion, many proposals have been made to model the inspiration path and expiration path separately, i.e., the breathing states during inspiration are considered to be different from those during expiration. While these methods allow the motion to differ between inspiration and expiration, they still assume inter-cycle repeatability [16], which is also not always valid. Significant variations in breathing motion paths for the same subject across breathing cycles and imaging sessions have been reported [17]. There have been several attempts to address the inter-cycle variation. Low et al. and Yang et al. proposed motion models with 5 DoF to capture the variability of breathing patterns [18, 19]. Zhang et al. proposed a Principal Component Analysis (PCA) model to cover the variations of respiratory motion [20].
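To make the 1-DoF model concrete: at run time, the surrogate value is used to look up (and interpolate between) deformation fields sampled along a precomputed, patient-specific motion path between EE and EI. The sketch below is illustrative only, not the implementation of [14]; the function and array names are ours, and the deformation fields are assumed to be stored as numpy arrays.

```python
import numpy as np

def estimate_motion_1dof(surrogate, path_states, path_fields):
    """Estimate a dense deformation field from a 1-D surrogate signal.

    path_states: (K,) breathing states in [0, 1] (0 = EE, 1 = EI),
                 sampled along a precomputed patient-specific motion path.
    path_fields: (K, 3, X, Y, Z) deformation fields at those states.
    surrogate:   scalar breathing state measured at run time.

    Returns the linearly interpolated (3, X, Y, Z) deformation field.
    """
    s = np.clip(surrogate, path_states[0], path_states[-1])
    k = np.searchsorted(path_states, s)  # first path state >= s
    if k == 0:
        return path_fields[0]
    w = (s - path_states[k - 1]) / (path_states[k] - path_states[k - 1])
    return (1.0 - w) * path_fields[k - 1] + w * path_fields[k]
```

The single interpolation weight `w` is exactly the model's one DoF; the intra- and inter-cycle variations discussed above are precisely what this lookup cannot express.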
Although these methods could capture inter-cycle variations in theory, they all require 4-D CT data to construct the motion model, which are acquired using respiratory-gated imaging based on the assumption of inter-cycle repeatability of respiratory motions. More specifically, respiratory-gated data acquired across multiple breathing cycles are combined to reconstruct a motion-free 3-D image. This imaging strategy itself requires the motion to be the same from cycle to cycle at the motion modeling stage.

Only a few methods deal with inter-cycle variation at both the motion modeling and data acquisition stages. King et al. introduced a PCA model for respiratory motion, formed using dynamic 3-D MRI data acquired from the same subject [21]. While this method does not assume repeatable respiration in either the PCA modeling or the dynamic MR imaging, there is an implicit assumption that the PCA model learned from the training data can cover the variations present during model application. However, over a long imaging or treatment session, the patient's respiratory pattern can change significantly [22]. For example, it is common for a patient to breathe more shallowly at the beginning of the session because of anxiety, and more deeply as he/she becomes more relaxed. In this case, a PCA model learned from a portion of the session may not provide sufficient coverage of the motion variation. In [21], to ensure sufficient coverage, the subjects are instructed to perform different breathing patterns during the acquisition of the training data. In addition, a model applicability test is conducted after every model application to check whether the model is applicable to the observed data. If the model is not applicable, more 3-D MRI data are acquired to update the model.
In general, the image acquisition strategy proposed in [21] is cumbersome in clinical settings and may not be practical for patients suffering from painful diseases.

Besides cyclic models and statistical models, some groups have also employed physics-driven bio-mechanical models to address the intra- and inter-cycle variations of respiratory motions, e.g., using the Finite Element Method (FEM) to model the lung and/or liver during respiration [23–25]. Due to the high computational cost of FEM, these methods typically use simple anatomical models. For example, in [25], a patient's thorax and abdomen are modeled as four components, i.e., two lungs, the thorax and the sub-diaphragm area. While such simplified and general anatomical models may be able to represent breathing patterns of normal and healthy subjects, they are typically not applicable to patients with diseases of the thorax and/or the abdomen, which can create unique tissue properties (e.g., lesions) that cannot be represented by the general anatomical model.

1.3.2 Rigid-body Object Motion Estimation from Fluoroscopic Images

Estimation of the rigid-body motion of objects from fluoroscopic images is a subject that has been actively researched for decades, and many 2-D/3-D registration methods have been proposed to solve this problem. However, real-time pose recovery with high accuracy remains a very challenging problem. In existing methods, accurate rigid-body pose estimation is typically achieved using intensity-based 2-D/3-D registration methods [26–29]. In these methods, a simulated fluoroscopic image, referred to as a Digitally Reconstructed Radiograph (DRR), is generated from the 3-D X-ray attenuation map by simulating the attenuation of virtual X-rays. An optimizer is employed to maximize an intensity-based similarity measure between the DRR and fluoroscopic images.
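This scheme can be sketched in miniature as follows. In this toy sketch, a parallel projection of a random volume stands in for perspective ray casting, integer in-plane shifts stand in for the 6-DoF pose, normalized cross-correlation is the similarity measure, and a brute-force search stands in for the iterative optimizer; none of these simplifications belong to the methods cited above.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 3-D attenuation volume; a parallel projection along z stands in
# for perspective ray casting of a DRR.
vol = rng.random((16, 16, 16))

def drr(tx, ty):
    # "DRR" of the volume after an integer in-plane shift (the pose).
    return np.roll(vol, (tx, ty), axis=(0, 1)).sum(axis=2)

def ncc(a, b):
    # Normalized cross-correlation, a common intensity similarity.
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b))

# "Fluoroscopic image" observed at an unknown pose (3, -2).
target = drr(3, -2)

# Registration: find the pose maximizing the similarity between the
# rendered DRR and the observed image (brute force replaces the
# iterative optimizer used by real intensity-based methods).
best = max((ncc(drr(tx, ty), target), tx, ty)
           for tx in range(-5, 6) for ty in range(-5, 6))
```

Even in this tiny example, each candidate pose requires rendering a full DRR, which illustrates why the similarity evaluations dominate the running time of intensity-based methods.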
Intensity-based methods are known to be able to achieve high registration accuracy [30], but at the same time they suffer from two major drawbacks: 1) long computation time, and 2) small capture range. Specifically, because intensity-based methods involve a large number of evaluations of the similarity measure, each requiring a heavy computation in rendering the DRR, they typically require above 1 second of running time, and therefore are not suitable for real-time applications. In addition, because the similarity measures to be optimized in intensity-based 2-D/3-D registration methods are often highly non-convex, the optimizer has a high chance of getting trapped in local maxima, which gives these methods a small capture range.

The small capture range of intensity-based methods is often addressed by employing initialization methods before registration to bring the starting position close enough to the target [31, 32]. However, initialization methods typically utilize dominant features of the target object for pose estimation, and therefore are very application-specific. For example, Varnavas et al. [31] applied the Generalized Hough Transform (GHT) for initial pose estimation of spine vertebrae. This method is specific to applications where spine vertebrae edges are clearly visible in both the fluoroscopic and CT images. Miao et al. [32] proposed to use shape encoding combined with template matching for initial pose estimation. This method can only be applied to metal implants, which are highly X-ray-opaque objects that can be reliably segmented from fluoroscopic images for shape encoding.

In order to improve the computational efficiency of intensity-based 2-D/3-D registration, some efforts have been made toward accelerating DRR generation. One strategy for faster DRR generation is sparse sampling, where a randomly selected subset of the pixels is used for DRR rendering and similarity measure calculation [33, 34].
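A minimal illustration of the sparse-sampling idea (using NCC on synthetic images rather than MI or SRC on real DRRs): the similarity evaluated on a small random pixel subset closely tracks the full-image value, so only the sampled DRR pixels would need to be rendered.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two toy images: a "DRR" and a noisy "fluoroscopic" counterpart.
drr = rng.random((256, 256))
fluoro = drr + 0.05 * rng.normal(size=drr.shape)

def ncc(a, b):
    # Normalized cross-correlation of two flattened intensity arrays.
    a, b = a - a.mean(), b - b.mean()
    return (a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b))

# Sparse sampling: evaluate the similarity on a random 5% pixel subset.
idx = rng.choice(drr.size, size=drr.size // 20, replace=False)
full = ncc(drr, fluoro)
sparse = ncc(drr.ravel()[idx], fluoro.ravel()[idx])
```

The subset estimate is a stochastic approximation of the full similarity, which is why, as noted next, only similarity measures that remain well-behaved under random subsampling are suitable for this strategy.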
However, only a few similarity measures are suitable for calculation on a random subset of the image, i.e., Mutual Information (MI) [33] and Stochastic Rank Correlation (SRC) [34]. Another strategy is splatting DRR, a voxel-based volume rendering technique that directly projects single voxels onto the imaging plane [35, 36]. One advantage of splatting DRR is that, since only voxels above a certain threshold are used for rendering, it significantly reduces the number of voxels to be visited during the rendering process. However, one inherent problem of splatting is that, due to aliasing artifacts, the image quality of the generated DRR is significantly degraded compared to a DRR generated by the standard ray-casting algorithm [37], which can subsequently degrade the registration accuracy. In addition, the efficient GPU implementation of splatting proposed in [36] is suitable only for specific types of similarity measures, such as Cross-Correlation (CC).

Another strategy for accelerating DRR generation is mesh-based rendering, which approximates DRRs from segmentations of the target objects using mesh rendering and hence is more computationally efficient than volume rendering. Combinations of binary masks have been used as approximate DRRs for 2-D/3-D registration [38]. While high computational efficiency can be achieved, the appearance of binary masks and their combinations is very different from that of standard DRRs, which leads to different similarity profiles that can negatively affect the registration accuracy [39]. A more accurate mesh-based DRR rendering method simulates X-ray attenuation by ray casting on a mesh model of the object [39]. Although faster than ray casting on the whole volume, ray casting on a mesh model is still computationally heavy, especially when the object has a complex structure that requires a large number of triangles for an accurate representation.
Furthermore, this type of method requires a segmentation step, and errors in the segmented model can propagate into the final registration error.

Besides 2-D/3-D registration methods, template matching methods have also been exploited for solving pose recovery problems, aiming at high computational efficiency and robustness [32, 40–42]. The basic idea of template matching methods is to pre-generate DRRs of the target object with known transformation parameters that cover the parameter space with sufficient density, known as templates; the pose of the object in the fluoroscopic image is then estimated by matching the image with the templates. Since the number of templates increases exponentially with the number of DoF, it is not feasible to generate and store templates to cover the 6-DoF rigid-body transformation. Therefore, many existing template matching methods only generate rotation templates and use them to recover rotation [32, 41, 42]. In order to match the rotation templates without knowing the translation, these methods require segmentation of the target object from the image and encode the shape using translation-invariant descriptors, e.g., Fourier descriptors [41, 42] and Shape Context [32]. Due to the requirement of segmentation, these methods have a very limited range of applications, i.e., they are only suitable for applications where the target object has high contrast in fluoroscopic images so that it can be reliably segmented (e.g., metal implants).

Supervised learning has also been explored for 2-D/3-D registration. Several metric learning methods have been proposed to learn similarity measures using supervised learning [43, 44].
While learned metrics can have a better capture range and/or accuracy than general-purpose similarity measures on specific applications or image modalities, 2-D/3-D registration methods using learned metrics still fall into the category of intensity-based methods with a high computational cost. As a new direction, several attempts have been made recently toward learning regressors to solve 2-D/3-D registration problems in real-time [45, 46]. Gouveia et al. [45] extracted a handcrafted feature from the fluoroscopic image and trained a Multi-Layer Perceptron (MLP) regressor to estimate the 3-D transformation parameters. However, the reported accuracy is much lower than that achievable using intensity-based methods, suggesting that the handcrafted feature and MLP are unable to accurately recover the underlying complex transformation. Chou et al. [46] computed the residual between the DRR and fluoroscopic images as a feature and trained linear regressors to estimate the transformation parameters that reduce the residual. Since the residual is a low-level feature, the mapping from it to the transformation parameters is highly non-linear, which makes it difficult to achieve accurate registration using linear regressors.

In recent years, promising results on object matching for computer vision tasks have been reported using machine learning methods [47–50]. While these methods are capable of reliably recovering an object's location and/or pose for computer vision tasks, they are unable to meet the accuracy requirement of 2-D/3-D registration tasks in medical imaging, which often target a very high accuracy (i.e., sub-millimeter) for diagnosis and surgery guidance purposes. For example, Wohlhart et al. [47] proposed to train a Convolutional Neural Network (CNN) to learn a pose-differentiating descriptor from range images, and used k-Nearest Neighbor for pose estimation.
While global pose estimation can be achieved using this method, its accuracy is relatively low, i.e., the success rate for an angle error of less than 5 degrees is below 60% for k = 1. Dollár et al. [48] proposed to train cascaded regressors for pose estimation on a pose-indexed feature that is only affected by the difference between the ground-truth and initial pose parameters. This method solves 2-D pose estimation from RGB images with a large number of iterations, but is not applicable to 2-D/3-D registration problems. However, we found the idea of using a pose-indexed feature for pose regression enlightening for regression-based 2-D/3-D registration, as it reduces the complexity of the regression problem.

Table 1.1: Thesis outline.

Problem Investigated:
  (a) Deformable respiratory motion estimation from dynamic MRI
  (b) Rigid-body object motion estimation from fluoroscopic images
Challenges:
  (a) Only limited dynamic MRI can be acquired to represent the underlying motion
  (b) IGT procedures require high accuracy, robustness and speed for motion estimation
Objectives:
  (a) Development of a dynamic MRI acquisition scheme and an image registration method to recover the respiratory motion via dynamic MR imaging
  (b) Development of an all-round 2-D/3-D registration method that enables new image guidance applications in IGT procedures
Proposed Methods:
  (a) Chapter 2 - Recover respiratory motions from sparsely spaced dynamic 2-D MRIs using a scatter-to-volume registration method
  (b) Chapter 3 - Use a pre-generated library of canonical DRRs to achieve real-time 2-D/3-D registration; Chapter 4 - Formulate 2-D/3-D registration as a regression problem and solve it using CNN regression to achieve high accuracy, robustness and real-time performance

1.4 Thesis Objective, Outline and Main Contributions

1.4.1 Objective

The global objective of this thesis is to develop effective image registration methods for motion compensation in medical imaging environments. As stated above, two practical motion estimation problems are clinically interesting but remain technically challenging: 1) respiratory motion estimation from dynamic MRI data, and 2) object motion estimation from fluoroscopic images. The main challenge in the first problem is that the dynamic information associated with the underlying motion acquired using MRI is often insufficient for recovery of the dense motion field. Therefore, our first objective is to develop a dynamic MRI acquisition scheme and an image registration method to recover the respiratory motion via dynamic MR imaging. The main challenge associated with the second problem is that high accuracy, robustness and real-time performance are all required for object motion estimation in many IGT procedures. This is particularly challenging and cannot be achieved by the state-of-the-art methods. Therefore, the second objective of this thesis is to develop an all-round 2-D/3-D registration method that enables new image guidance applications for IGT procedures.

1.4.2 Outline and Main Contributions

The rest of this thesis is sub-divided into four chapters as outlined below:

Chapter 2 - Deformable Respiratory Motion Estimation: Scatter-to-Volume Registration of Dynamic MRIs

In this chapter, we present an image-driven dynamic MRI-based respiratory motion estimation approach, aiming to robustly estimate the respiratory motion from dynamic MRI data in the presence of large variations of the respiratory motion pattern.
Previous methods rely on learned respiratory motion models for motion estimation, which are often vulnerable to intra-subject and/or inter-subject variations of respiratory motion. In this work, we estimate the motion purely from dynamic MRI data using an image registration approach, without employing any motion model. We propose a dynamic MRI acquisition protocol to acquire MRIs that contain rich information about the underlying respiratory motion. We also propose a scatter-to-volume deformable registration algorithm to register the dynamic MRIs with a reference MRI to recover dense deformation fields. The performance of the proposed method has been evaluated on both synthetic and real MRI datasets, and the results show significant improvements over the state-of-the-art model-based methods. In addition, we also demonstrate a potential application of the proposed method to MRI-based motion-corrected PET imaging using hybrid PET/MRI via simulation studies.

The main contributions of this chapter include:

• We have designed a novel dynamic MRI acquisition protocol that is effective for monitoring respiratory motion. It is able to acquire 2-D MRIs in both sagittal and coronal planes in real-time to capture the respiratory motion.

• We have developed a scatter-to-volume registration method that can register sparsely sampled 3-D intensity data.
This method enables a new possibility of motion estimation from dynamic imaging data, which can often be acquired only sparsely in real-time.

• In contrast to previous model-based motion estimation methods, our method does not rely on any retrospectively learned motion model and directly recovers the patient's respiratory motion from dynamic 2-D MRI data, which makes it robust to variations of respiratory motion patterns.

Chapter 3 - Rigid-body Object Motion Estimation: Real-time 2-D/3-D Registration using Library-based DRRs

This chapter introduces a real-time intensity-based 2-D/3-D registration method that can be applied to efficiently estimate and track an object's pose from its fluoroscopic image. A major limitation of intensity-based 2-D/3-D registration is its high computational cost, due to the iterative calculation of DRRs and image similarity measures. To increase the computational efficiency, we propose a library-based DRR generation approach. We first parameterize the 6-DoF rigid-body transformation with 2 geometry-relevant and 4 geometry-irrelevant parameters. For all DRRs with the same geometry-relevant parameters, we show that there is a canonical DRR, from which DRRs with arbitrary geometry-irrelevant parameters can be efficiently generated. This allows efficient DRR generation during registration from a pre-generated library of canonical DRRs. Since the DRRs reconstructed from the canonical DRRs can only have discrete geometry-relevant parameters, 2-D/3-D registration becomes a hybrid optimization problem, i.e., the parameter space is continuous in some dimensions but discrete in others. To solve this problem, we describe a guideline for adapting existing optimization methods to hybrid optimization, and give three examples, i.e., adaptations of the Hill Climbing, Nelder-Mead and BOBYQA algorithms.
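One such adaptation can be sketched on a synthetic cost function (not a real DRR similarity): hill climbing moves between neighboring discrete samples of a geometry-relevant-like parameter, while a continuous optimizer refines the remaining parameter at each discrete candidate. The grid spacing and cost function below are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Discrete dimension: a library of samples of one parameter, as with
# canonical DRRs pre-generated on a grid. Continuous dimension: x.
grid = np.linspace(-1.0, 1.0, 21)            # discrete samples, step 0.1
def cost(d_idx, x):
    d = grid[d_idx]
    return (d - 0.32) ** 2 + (x - 1.5) ** 2  # optimum near d=0.32, x=1.5

def refine(d_idx):
    # For a fixed discrete sample, optimize the continuous dimension.
    res = minimize_scalar(lambda x: cost(d_idx, x),
                          bounds=(-5.0, 5.0), method="bounded")
    return res.fun, res.x

# Hill climbing on the discrete dimension: move to a neighboring grid
# sample whenever its (continuously refined) cost is lower.
d = 0
best, x = refine(d)
while True:
    moved = False
    for nb in (d - 1, d + 1):
        if 0 <= nb < len(grid):
            f, xn = refine(nb)
            if f < best:
                best, x, d, moved = f, xn, nb, True
                break
    if not moved:
        break
```

The search terminates at the grid sample nearest the true optimum, with the continuous parameter refined to optimizer precision; the discretization of the library thus bounds the achievable accuracy in the discrete dimension.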
Our experiments show that using the above efficient library-based DRR generation with hybrid optimization, the computational efficiency of 2-D/3-D registration can be significantly improved with little degradation in robustness and accuracy.

The main contributions of this chapter include:

• We have developed a parameterization scheme for rigid-body transformations using 2 geometry-relevant (i.e., affecting the object's shape in the 2-D projection) and 4 geometry-irrelevant parameters. By taking the perspective geometry into consideration, the geometry-relevant and -irrelevant parameters can be completely decoupled.

• We propose to pre-compute a library of canonical DRRs to cover the shape variations corresponding to the 2 geometry-relevant parameters, which can be used to efficiently generate DRRs during 2-D/3-D registration. The speed-ups gained by our method compared to standard ray-casting DRR rendering enable real-time 2-D/3-D registration.

• We have adapted three well-studied optimization methods, Hill Climbing, Nelder-Mead and BOBYQA, for hybrid optimization, i.e., continuous in the geometry-irrelevant parameters while discrete in the geometry-relevant parameters, and combined them with library-based DRRs for real-time 2-D/3-D registration. The proposed method is significantly faster (up to 10-fold) than the state-of-the-art methods.

Chapter 4 - Rigid-body Object Motion Estimation: Real-time 2-D/3-D Registration via CNN Regression

This chapter presents a CNN regression approach to address the two major limitations of existing intensity-based 2-D/3-D registration methods: 1)
We propose several strategies to simplify themapping to be revealed by regression. We introduce an automatic featureextraction method to calculate 3-D pose-indexed features that are sensitiveto the variables to be regressed while robust to other factors. We also showthat by marginalizing the transformation parameters to be regressed andregressing them hierarchically, we can break down the complex regressiontask into multiple simpler sub-tasks that can be solved more effectively.We finally propose a CNN regression model that is suitable for the 2-D/3-D registration problem. Our approach has been quantitatively evaluatedon 3 potential clinical applications, demonstrating its significant advantagein providing highly accurate real-time 2-D/3-D registration with a signifi-cantly enlarged capture range compared to previous intensity-based meth-ods.The main contributions of this chapter include:• We have introduced a novel formulation of the 2-D/3-D registra-tion problem as a regression problem. In contrast to previousoptimization-based formulation, which optimizes transformation pa-rameters over a 1-D energy function, the proposed regression-basedformulation is able to fully exploit the information from images forthe estimation of the transformation parameters.• We have proposed a 3-D pose index feature and a hierarchical regres-sion strategy to reduce the complexity of the regression problem in2-D/3-D registration, making the regression learning more effectiveand the registration results more accurate.• We have developed a novel CNN architecture with patch-basedweight-sharing for the regression using the proposed 3-D pose in-dexed feature. The proposed architecture retains the expressive18power of standard CNN, while significantly reducing the size of thenetwork, so that it can be computed with a high memory and com-putational efficiency.Chapter 5 - Conclusions and Future WorksThis chapter concludes this thesis by summarizing the addressed problemsand presented results. 
Some perspectives for future work are listed and discussed.

Chapter 2

Deformable Respiratory Motion Estimation: Scatter-to-Volume Registration of Dynamic MRIs

2.1 Overview

In this chapter, we propose a method to estimate the respiratory motion from dynamic 2-D MRIs without employing any motion model, to truly address both inter-cycle and intra-cycle variations of respiratory motion. The work described in this chapter has been published in [4]. As stated in Chapter 1, existing respiratory motion estimation methods typically rely on motion models that assume a certain respiratory motion pattern, and are therefore vulnerable to variations and abnormalities in respiratory motions. In the proposed method, the respiratory motion is directly recovered from dynamic 2-D MRIs that are acquired in sparsely-spaced sagittal and coronal planes in real-time. The dynamic 3-D motion field is recovered by registering the dynamic 2-D MRIs to a reference 3-D MRI. As the intensities in sparsely-spaced 2-D MRIs can be regarded as scattered 3-D data, we propose a scatter-to-volume registration algorithm to register the 2-D MRIs and the reference 3-D MRI.

This chapter is structured as follows. Section 2.2 introduces the problem formulation for MRI-based motion estimation. Section 2.3 provides a numerical solution for the proposed problem formulation. Section 2.4 describes the simulation and experiment methods. Section 2.5 reports the results and compares the proposed method with the state-of-the-art methods. Section 2.6 discusses the strengths and potential limitations of the proposed method, as well as future works.

2.2 Problem Formulation

2.2.1 MRI-based Motion Estimation

As discussed in Section 1.3.1, state-of-the-art respiratory motion estimation methods are model-based, where a motion model (e.g., a cyclic model) is fit to a surrogate signal representing the state of the underlying motion.
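As a contrast to the proposed image-driven approach, the model-based scheme can be caricatured in a few lines: assume each voxel's displacement is linear in a 1-D surrogate signal and fit the per-voxel slopes by least squares. This is a deliberately simplistic stand-in for the cyclic and statistical models of Section 1.3.1, with synthetic data throughout.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy "ground truth": each voxel's displacement is linear in a 1-D
# surrogate signal g(t) (e.g., a spirometry trace).
n_voxels, n_times = 50, 30
g = np.sin(np.linspace(0.0, 4.0 * np.pi, n_times))     # surrogate
slope = rng.normal(size=n_voxels)                       # per-voxel motion
disp = np.outer(slope, g) + 0.01 * rng.normal(size=(n_voxels, n_times))

# Model fitting: per-voxel least-squares slope against the surrogate.
fit = disp @ g / (g @ g)

# The fitted model predicts a dense field from a single surrogate
# value per time point, which is exactly what fails when the true
# motion departs from the assumed (here linear) pattern.
pred = np.outer(fit, g)
```

The fit is accurate only because the synthetic motion really is linear in the surrogate; any breathing pattern outside the model's assumption leaves a systematic residual that no amount of surrogate data can remove.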
A foundational shortcoming of model-based methods is the lack of flexibility to address varying or unique motion patterns. In this chapter, an MRI-based motion estimation method is proposed, which recovers the motion field by directly registering MRIs. As the flexibility of the estimated motion field is not limited by a motion model, the proposed method enjoys a potential advantage in addressing variations of respiratory motion patterns.

The proposed MRI-based motion estimation method consists of two parts: 1) an acquisition strategy for static 3-D and dynamic 2-D MRIs; and 2) motion estimation via image registration. A flow-chart of the proposed system is shown in Fig. 2.1. It first takes a static motion-free 3-D MRI as input, denoted as M : Ω → R, where Ω ⊆ R³ is the imaging domain, typically covering the patient's thorax and/or abdomen. The static motion-free 3-D MRI provides image information at the reference state for motion estimation, and can be acquired with breath holding. During the period in which motion estimation needs to be performed, dynamic 2-D MRIs are repeatedly acquired from multiple sagittal and coronal slices to provide real-time image information. The dynamic 2-D MRIs acquired at a time point i (referred to as a Dynamic 2-D MRI Set) are denoted as Fi : Φ → R, where Φ ⊆ R³ is the dynamic imaging domain formed by the union of the 3-D coordinates of pixels from the dynamic 2-D MRIs.

Figure 2.1: Flowchart of the proposed MRI-based motion estimation system.

The dynamic 2-D MRI sets need to be acquired at a reasonably fast rate in order to capture the motion. We propose acquiring a dynamic 2-D MRI set every 0.5 seconds, consisting of 2-D MRIs in both sagittal and coronal slices.
For respiratory motion estimation, we heuristically choose to use 4 sagittal slices and 2 coronal slices to capture motion in all directions. Such dynamic 2-D MRIs can be acquired within 72 ms per slice using a product MR pulse sequence that is widely available on Siemens MR systems. Details of the MR pulse sequence and example images can be found in Section 2.4.5.

With a static 3-D MRI M(·) and a series of dynamic 2-D MRI sets Fi(·), a motion field si : Ω → R³ is estimated for each time point i by registering M(·) and Fi(·) using the proposed scatter-to-volume registration method. The dynamic 2-D MRI sets can be regarded as scattered 3-D data, meaning that the MRI intensity information is only observed at scattered locations. Therefore, the proposed image registration method is referred to as a scatter-to-volume registration.

2.2.2 Scatter-to-Volume Registration

We propose a scatter-to-volume registration method to register the scattered 3-D data Fi : Φ → R at each time point i with the 3-D volume M : Ω → R to estimate the underlying motion field. As image registration is performed on Fi(·) at each time point separately, the subscript i is omitted in the rest of this chapter.

Non-parametric deformable registration is typically solved by minimizing an energy function over the deformation field s : Ω → R³:

    D(s) + λR(s),    (2.1)

where D(s) is the dissimilarity term and R(s) is the regularization term weighted by λ. The above energy minimization problem is typically solved using gradient-based optimization methods (e.g., Gradient Descent) for the registration of densely observed data. In our case, the image data are only observed at sparse locations, and hence the deformation field in unobserved areas is purely driven by the regularization term to follow the data-driven deformation field estimated at observed locations.
Since the regularization term typically measures local smoothness, with gradient-based optimization the influence of the deformation field driven by observed locations is too local, leading to very inefficient and unreliable propagation to unobserved areas.

Inspired by Cachier et al. [51], we calculate the dissimilarity term and the regularization term on separate deformation fields, and couple them with an auxiliary term:

    D(c) + σA(c, s) + σλR(s),    (2.2)

where c : Φ → R³ and s : Ω → R³ are defined on different domains, and are driven by the dissimilarity term and the regularization term, respectively. The two deformation fields are coupled via the auxiliary term A(c, s), weighted by σ. The dissimilarity term is defined as the Sum of Squared Differences (SSD):

    D(c) = (1/|Φ|) Σ_{x∈Φ} ‖F(x) − M(x + c(x))‖².    (2.3)

The auxiliary term is defined as:

    A(c, s) = (1/|Φ|) Σ_{x∈Φ} ‖c(x) − s(x)‖².    (2.4)

The regularization term is defined as the second-order smoothness:

    R(s) = ∫_Ω ‖∇²s‖² dx = Σ_{i,j} ∫_Ω (∂²s / ∂x_i∂x_j)² dx.    (2.5)

Minimization of Eqn. (2.2) can be solved by alternating optimization over c and s, which leads to two separate optimization problems:

    min_c D(c) + σA(c, s),    (2.6)

    min_s A(c, s) + λR(s).    (2.7)

The main advantage of this formulation is that Eqn. (2.7) can be solved in closed form using a smoothing spline (as shown in Section 2.3.2), which allows the data-driven deformation field estimated at observed locations to be effectively extrapolated to unobserved areas in every iteration.

2.2.3 Parameter Selection

The formulation in Eqn. (2.2) contains two registration parameters, σ and λ, which represent the confidence we have in the image intensity similarity and in the smoothness of the recovered transformation, respectively. Both parameters need to be selected based on prior knowledge of the image and the real transformation. σ influences the contribution of the dissimilarity term D(c), i.e., a higher value of σ reduces the impact of the image dissimilarity term D(c). Therefore, it is typically set to the noise level of the image.
For images acquired using the proposed dynamic MR imaging protocol, we set the noise level to 10% of the highest intensity of the image. In diffusion-based deformable registration methods, the smoothness weight λ is directly related to the size of the convolution kernel, which is usually selected heuristically. In our experiments on respiratory motion estimation, we heuristically chose λ to be 5 × 10⁻³.

2.3 Numerical Solution and Algorithm

2.3.1 Minimization of the Dissimilarity Energy

The minimization of the dissimilarity energy in Eqn. (2.6) is solved using the gradient-descent method. It can be shown that the gradient of the energy function E_d is

    ∇_c E_d = 2(M(x + c) − F(x))∇M(x + c) + 2σ(c − s).    (2.8)

Using the gradient-descent method, c is updated iteratively along the gradient direction:

    c_{i+1} = c_i − γ∇_c E_d,    (2.9)

where c_i denotes c in the i-th iteration. The gradient-descent optimization is initialized with c_0 = s.

2.3.2 Minimization of the Regularization Energy

The minimization of the regularization energy in Eqn. (2.7) amounts to approximating a vector field s(x), x ∈ Ω, that is close to c(x) at x ∈ Φ, with a smoothness constraint enforced by λR(s); this can in general be solved using smoothing splines [52]. The regularization term in Eqn. (2.5) can be calculated as the summation of the smoothness of the three components of s:

    R(s) = Σ_{k=1}^{3} H(s_k) = Σ_{k=1}^{3} Σ_{i,j} ∫_Ω (∂²s_k / ∂x_i∂x_j)² dx,    (2.10)

where s_k denotes the k-th component of s. Therefore, Eqn. (2.7) can be decomposed into three scalar smoothing spline problems that can be solved independently as follows:

    min_{s_k} (1/n) Σ_{i=1}^{n} (c_k(x_i) − s_k(x_i))² + λH(s_k),  k = 1, 2, 3,    (2.11)

where x_i, i = 1, ..., n, denotes the pixels in the domain Φ. The above three optimization problems have the same form, which can be rewritten as:

    min_η (1/n) Σ_{i=1}^{n} (Y_i − η(x_i))² + λH(η),    (2.12)

where Y_i and η correspond to c_k(x_i) and s_k in Eqn. (2.11), respectively.

We note that H(·) is a special case (i.e., m = 2) of

    H_m(f) = Σ_{i_1,...,i_m} ∫ (∂^m f / ∂x_{i_1} ··· ∂x_{i_m})² dx,    (2.13)

which is known as a quadratic differential.
It has been shown in [52] that for H_m(·), Eqn. (2.12) has a closed-form solution:

    η(x) = Σ_{ν=1}^{M} α_ν φ_ν(x) + Σ_{i=1}^{n} δ_i E(|x_i − x|),    (2.14)

where {φ_ν}_{ν=1}^{M} spans the null space of H_m(f), which is of dimension M = C(m+2, 3); the δ_i's are subject to the constraint Sᵀδ = 0, where S is the n × M matrix with S_{ij} ≡ φ_j(x_i); | · | is the Euclidean distance; and

    E(x) = Γ(3/2 − m) / (2^{2m} π^{3/2} (m − 1)!) · x^{2m−3}.    (2.15)

Therefore, we need to solve for α and δ and insert them into Eqn. (2.14) to obtain the minimizer.

For δ_i's satisfying Sᵀδ = 0, it can be shown that

    H_m(Σ_{i=1}^{n} δ_i E(|x_i − x|)) = Σ_i Σ_j δ_i δ_j E(|x_i − x_j|).    (2.16)

Defining the matrix K with K_{ij} ≡ E(|x_i − x_j|) and combining Eqn. (2.14) and Eqn. (2.16) with Eqn. (2.12), we can express the smoothing spline problem in the following matrix form:

    min_{α,δ} ‖Y − Sα − Kδ‖² + nλδᵀKδ    (2.17)
    s.t. Sᵀδ = 0.

The solution of this minimization problem involves O(n³) operations, where n is the number of voxels observed in F(·) (i.e., the number of pixels in the dynamic 2-D MRIs used for registration in our application). When n becomes large, i.e., on the order of 10⁵ in our application, Eqn. (2.17) cannot be solved with a reasonable amount of time and memory. To address this complexity issue, an approximate solution of Eqn. (2.17) is employed here. This approximation method constructs a low-rank smoother using a parameter basis of a given rank that perturbs the original problem as little as possible [53]. The truncated basis with rank k is denoted as Γ_k, the columns of which form a k-dimensional basis of the δ parameter space, so that δ = Γ_k δ_k, where δ_k is a k-vector. Substituting δ by Γ_k δ_k in Eqn. (2.17), the minimization problem becomes:

    min_{α,δ_k} ‖Y − Sα − KΓ_k δ_k‖² + nλδ_kᵀ Γ_kᵀ K Γ_k δ_k,    (2.18)
    s.t. Sᵀ Γ_k δ_k = 0.

It is shown in [53] that the optimal basis is the truncated eigenbasis of K. Let the eigendecomposition of K be UDUᵀ, where D is the diagonal matrix of eigenvalues of K with |D_{i,i}| ≥ |D_{i+1,i+1}|, i = 1, ..., n − 1, and U is the matrix whose i-th column is the eigenvector corresponding to D_{i,i}.
The optimal basis of rank k is formed by the first k columns of U, denoted as U_k. Let D_k be the upper-left k × k submatrix of D; it can easily be shown that K U_k = U_k D_k and U_k^T K U_k = D_k. Plugging these into Eqn. (2.18), the approximation to Eqn. (2.17) becomes:

    \min_{\alpha, \delta_k} \|Y - S\alpha - U_k D_k \delta_k\|^2 + n\lambda \delta_k^T D_k \delta_k,   (2.19)
    s.t. \; S^T U_k \delta_k = 0,

which can be solved with O(k³) time complexity.

To approximate U_k without computing the complete matrix K, the improved Nyström method [54] is used. Let {z_i}_{i=1}^{k}, k < n, denote the k centroids of x ∈ Φ obtained using k-means clustering. The matrix K can be approximated as

    \hat{K} = [A^T U \hat{D}_k] \, \hat{D}_k \, [\hat{D}_k^T U^T A],   (2.20)

where A_{i,j} ≡ E(|z_i - x_j|), and U and \hat{D}_k form the eigensystem of B_{i,j} ≡ E(|z_i - z_j|), i.e., B = U Λ U^T. This approximation provides an approximate diagonalization, but the eigenvectors are not orthogonal. The orthogonalization is carried out as follows. Let Z = [B^T U \hat{D}_k] \hat{D}_k^{1/2}, and let F Σ F^T denote the diagonalization of Z^T Z. The matrix containing the k orthonormalized eigenvectors of \hat{K} with the largest absolute eigenvalues can then be computed as

    \hat{U}_k = Z F \Sigma^{-1/2}.   (2.21)

2.3.3 Numerical Solution for the Smoothing Spline

To solve the minimization problem in Eqn. (2.19), we first find an orthogonal column basis Z_k of the null space of S^T U_k using QR factorization. δ_k is restricted to this null space, spanned by the columns of Z_k, by writing

    \delta_k = Z_k \tilde{\delta}.   (2.22)

Eqn. (2.19) is then converted to an unconstrained problem:

    \min_{\alpha, \tilde{\delta}} \|Y - S\alpha - U_k D_k Z_k \tilde{\delta}\|^2 + n\lambda \tilde{\delta}^T Z_k^T D_k Z_k \tilde{\delta}.   (2.23)

The solution of the above unconstrained minimization problem can be computed by solving the linear system

    A^T Y - (A^T A + B) \begin{bmatrix} \alpha \\ \tilde{\delta} \end{bmatrix} = 0,   (2.24)

where A = [S \;\; U_k D_k Z_k] and B is the block-diagonal matrix with blocks 0 and n\lambda Z_k^T D_k Z_k, leading to the solution

    \begin{bmatrix} \alpha \\ \tilde{\delta} \end{bmatrix} = (A^T A + B)^{-1} A^T Y.   (2.25)

Since the matrices A and B only depend on the imaging domain of the dynamic 2-D MRIs, the matrix (A^T A + B)^{-1} A^T can be pre-computed and re-used in the registration of each dynamic 2-D MRI set, as the imaging domain is fixed. After solving for α and δ_k, they are plugged into Eqn. (2.14) to compute s. During the iterations, s only needs to be computed at the coordinates involved in the computation of Eqn. (2.8) to save computation time. Only in the last iteration does s need to be computed over the whole FOV to generate the dense motion fields.

2.3.4 Pseudo Code of the Proposed Method

Pseudo code of the proposed algorithm is shown in Algorithm 1.

Algorithm 1 MRI-based Motion Estimation
 1: compute U_k and D_k following Eqns. (2.20) and (2.21)
 2: δ ← U_k δ_k
 3: s(x) ← 0, ∀x ∈ Ω
 4: repeat
 5:   c(x) ← s(x), ∀x ∈ Φ
 6:   repeat
 7:     c ← c − γ∇_c E_d, ∀x ∈ Φ
 8:   until convergence
 9:   solve Eqn. (2.19) to obtain α and δ_k
10:   compute s(x), ∀x ∈ Φ, following Eqn. (2.14)
11: until convergence
12: compute s(x), ∀x ∈ Ω, following Eqn. (2.14)
13: return s(x), ∀x ∈ Ω

2.3.5 Multi-resolution Strategy

A multi-resolution strategy is widely used in image registration techniques for improving computational efficiency and reducing the chance of being trapped in local minima. In our scatter-to-volume registration, a multi-resolution strategy is implemented by building a scattered-data pyramid. At the i-th level, the 2-D slices are down-sampled by 2^{i−1}, and therefore the number of data points is reduced by a factor of 4 at each level of the pyramid. At each level of the pyramid, Algorithm 1 is applied with s initialized with the result obtained from the previous level (substituting Line 3 in Algorithm 1).

2.4 Experiment Setup

2.4.1 Smoothing Spline Approximation

The proposed scatter-to-volume registration algorithm requires approximation of the smoothing spline, which consists of two steps: 1) approximating the eigensystem of the kernel matrix; and 2) using the approximated eigensystem to approximate the smoothing spline. We evaluated the error introduced by the smoothing spline approximation to demonstrate that its accuracy is sufficient for image registration.
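Because A and B in Eqns. (2.24) and (2.25) depend only on the fixed imaging domain, the regularized normal matrix can be formed once and reused for every incoming dynamic 2-D MRI set; only AᵀY changes from frame to frame. The following toy pure-Python sketch illustrates this precompute-and-reuse pattern; the 3 × 2 design matrix, the ridge block and the Gaussian-elimination solver are illustrative stand-ins, not the actual spline system:

```python
def solve(mat, b):
    """Gaussian elimination with partial pivoting (pure Python, small systems)."""
    n = len(mat)
    aug = [row[:] + [b[i]] for i, row in enumerate(mat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= f * aug[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x

def matT(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Toy system: A is a 3x2 design matrix; B adds a ridge penalty on the second coefficient.
A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
B = [[0.0, 0.0], [0.0, 0.3]]
At = matT(A)
# N = A^T A + B is fixed by the imaging domain and is formed only once.
N = [[ata + bv for ata, bv in zip(r1, r2)] for r1, r2 in zip(matmul(At, A), B)]

def fit(Y):
    """Per-frame work: only A^T Y changes; the normal matrix N is reused."""
    AtY = [sum(a * y for a, y in zip(col, Y)) for col in At]
    return solve(N, AtY)
```

In the actual method the precomputation goes one step further, caching (AᵀA + B)⁻¹Aᵀ itself so that each frame costs only a matrix-vector product.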
To make the experiment as close to the real application as possible, the control points for the smoothing spline used in this experiment are pixels from 4 sagittal slices and 2 coronal slices, similar to our target application, as shown in Fig. 2.2. Each sagittal slice has 64 × 32 pixels, and each coronal slice has 64 × 64 pixels, with 8 mm spacing between neighboring slices. Therefore, there are in total 2 × 64² + 4 × 64 × 32 = 16384 control points. We did not use more control points in this experiment because the smoothing spline without approximation is also computed for comparison, which makes the O(N²) space complexity and O(N³) time complexity unaffordable for a large number of pixels.

Figure 2.2: Synthetic pattern for evaluating the smoothing spline approximation. (a) Control points used for analyzing the accuracy of the smoothing spline approximation. (b) A coronal slice passing through the center of the synthetic pattern.

The smoothing splines with and without approximation were fitted to a synthetically generated sinusoid pattern for comparison. The control point at x_i is assigned the value Y_i = cos(|x_i − x_0|), where x_0 is a selected center point. An example of the synthetic pattern is shown in Fig. 2.2(b). We fitted the smoothing thin-plate spline model to the data by solving the minimization problem in Eqn. (2.12). We first solved the smoothing spline without approximation to obtain the "ground truth" for the approximation. We then applied the approximation method described in Eqn. (2.18) and evaluated the approximation error via the root-mean-squared error (RMSE) between the approximated result and the ground truth. In our experiment, we used λ = 5 × 10⁻³, as this is the value used for the motion estimation experiments shown in this chapter.
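The experimental setup above can be sketched as follows. The slice layout (plane orientations and origins) is a simplified, hypothetical stand-in that only reproduces the pixel counts and the 8 mm plane spacing described in the text:

```python
import math

def slice_points(n_sag=4, n_cor=2, sag_shape=(64, 32), cor_shape=(64, 64), gap=8.0):
    """Control points: pixels of 4 sagittal and 2 coronal planes (hypothetical layout)."""
    pts = []
    for s in range(n_sag):                      # sagittal planes: fixed x
        for j in range(sag_shape[0]):
            for k in range(sag_shape[1]):
                pts.append((s * gap, float(j), float(k)))
    for c in range(n_cor):                      # coronal planes: fixed y
        for i in range(cor_shape[0]):
            for k in range(cor_shape[1]):
                pts.append((float(i), c * gap, float(k)))
    return pts

def synthetic_pattern(points, x0):
    """Assign Y_i = cos(|x_i - x0|) to each control point."""
    return [math.cos(math.dist(p, x0)) for p in points]

def rmse(a, b):
    """Root-mean-squared error between the approximated and exact fits."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))
```

With the default arguments, `slice_points` produces exactly the 2 × 64² + 4 × 64 × 32 = 16384 control points used in the experiment.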
We computed the approximation error for different k (i.e., the number of eigenvectors of K to retain) to evaluate the impact of k on the approximation accuracy.

2.4.2 Synthetic MRI Dataset

In order to compare different motion estimation methods that require different MRI acquisition protocols, we synthetically generated the required MRI data for evaluation from a public dataset, "Open Access Simulated PET-MR Datasets Based on Real MR Acquisitions", provided by King's College London [55]. It consists of dynamic 4-D MRIs from 5 volunteers, each 4-D MRI comprising 70 dynamic 3-D MRIs of 336 × 336 × 45 voxels, with a voxel size of 1.48 × 1.48 × 5.50 mm and a temporal resolution of 0.7 s.

The 4-D MRIs are divided into two segments: Segment 1 consists of the first 35 frames, and Segment 2 consists of the last 35 frames. For 3 volunteers (i.e., volunteers A, B and E), the breathing patterns in the two segments are noticeably different, with Segment 2 having a deeper breathing depth than Segment 1. This simulates a common scenario in which, during the PET-MRI acquisition, the patient breathes more shallowly at the beginning because of anxiety and breathes more deeply later on as he/she becomes more relaxed. To measure how different the breathing patterns are between Segment 1 and Segment 2, we computed a metric, named the out-of-range (OOR) rate, which represents the percentage of frames in Segment 2 that have a diaphragm movement outside the range of the diaphragm movement presented in Segment 1. Table 2.1 summarizes the OOR rates for the 5 volunteers. Three volunteers (A, B and E) have significantly deeper breathing in Segment 2 than in Segment 1; their OOR rates are 48.5%, 31.4% and 65.7%, respectively. Volunteers C and D have much smaller variations in breathing depth across the two segments, with OOR rates of 0% and 14.2%, respectively.
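The OOR rate can be computed directly from the two sequences of per-frame diaphragm displacements; a small sketch (the function name and plain-list inputs are illustrative):

```python
def oor_rate(segment1, segment2):
    """Percentage of Segment-2 frames whose diaphragm displacement falls
    outside the [min, max] range observed in Segment 1."""
    lo, hi = min(segment1), max(segment1)
    out = sum(1 for d in segment2 if d < lo or d > hi)
    return 100.0 * out / len(segment2)
```

For example, if Segment 1 covered displacements between 0 and 10 mm and Segment 2 contained frames at 5, 11, -1 and 3 mm, two of the four frames would be out of range, giving an OOR rate of 50%.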
The closest frame to the middle of the diaphragm movement range in Segment 1 was used to simulate the static 3-D MRI. From each frame of the 4-D MRI, 4 sagittal and 2 coronal slices were extracted to simulate the dynamic 2-D MRIs.

Table 2.1: Summary of breathing patterns from 5 volunteers.

Volunteer           A           B           C           D           E
Segment 1 (mm)^a    14.8        20.7        42.9        17.7        13.3
Segment 2 (mm)^b    16.8±12.4   15.5±14.6   17.0±12.8   9.46±7.28   28.4±20.4
OOR rate (%)^c      48.5        31.4        0           14.2        65.7

^a This row shows the maximum diaphragm movement from the end-expiration (EE) in Segment 1. The maximum range represents the breathing patterns that a motion model trained on Segment 1 can ideally cover.
^b This row shows the mean and standard deviation of the diaphragm movement from the EE in Segment 2. The standard deviation represents the variation of breathing patterns that needs to be recovered.
^c This row shows the percentage of frames in Segment 2 with diaphragm movement outside the range covered by Segment 1.

Figure 2.3: Respiratory motion of volunteer B. (a) A coronal slice, where the yellow line indicates the column used to extract the diaphragm position. (b) Movement of the diaphragm across time.

2.4.3 Compared Methods

We carefully selected 3 motion estimation methods from the literature that represent different categories of motion estimation methods, and compared the proposed method with them. Previous respiratory motion estimation methods are typically based on motion models learned from 3-D images acquired a priori. In fact, to the best of the author's knowledge, the proposed method is the first image-based motion estimation method that recovers motion fields purely from dynamically acquired MRIs.
In order to simulate the impact of varying breathing patterns, the motion models used by all 3 compared methods were built from the MRI data in Segment 1, and motion estimation was performed and evaluated on the MRI data in Segment 2.

• Surrogate driven, average cycle (SD-AC): This method is similar to a number of average-cycle motion models in the literature [56][57]. In this method, the 3-D motion is represented as an average breathing cycle. The model consists of dense 3-D motion fields at 5 uniformly distributed breathing states, obtained by registering the corresponding static 3-D MRIs using the Demons algorithm. A 1-D breathing surrogate signal, generated by measuring the position of the diaphragm (as shown in Fig. 2.3), is used to interpolate the model. For input signal values outside the range of the model, the motion field at the closest breathing state is used.

• Data driven, average cycle (DD-AC): This method uses the same average-cycle motion model as described above. However, the motion model is driven by the dynamic 2-D MRIs instead of the 1-D breathing surrogate signal [58]. The input signal t to the motion model is optimized to maximize the similarity between the dynamic 2-D MRIs and the reference image warped using the interpolated motion field:

    t = \arg\max_t \; Sim(M_t, F),   (2.26)

where M_t(·) is the reference image warped using the interpolated motion field, and Sim(·, ·) is the negative SSD between the images. The 1-D optimization of t is performed using a Hill Climbing optimizer.

• Data driven, PCA model (DD-PCA): This method was proposed in [21] and is the work most relevant to the proposed method. It constructs a patient-specific PCA motion model using a series of free-breathing dynamic 3-D MRIs acquired from the target patient. We used the first 5 PCA modes in our experiments, as suggested by the original paper [21].
To apply the motion model, dynamic 2-D MRIs are acquired and the parameters of the PCA model are optimized using a Hill Climbing optimizer to minimize the SSD between the 2-D MRIs and the reference 3-D MRI warped using the PCA model. In our experiment, the 4-D MRI in Segment 1 was used to construct the motion model, and, for a fair comparison, the dynamic 2-D MRIs were extracted at the same 6 slices used by the proposed method.

2.4.4 Evaluation Metrics

The performance of the proposed algorithm and the 3 compared algorithms was validated by comparing the original 3-D MRIs from which the dynamic 2-D MRIs were extracted (referred to as the ground truth image) with the static 3-D MRI (referred to as the target image) warped using the estimated motion field, via the following three metrics:

• Mean lung surface distance (LSD) between the warped target image and the ground truth image, computed as:

    LSD = \sqrt{ \frac{1}{|A|} \sum_{a \in A} \min_{b \in B} \|a - b\|^2 },   (2.27)

where A and B denote point sets from the two surfaces.

• Dice Similarity Coefficient (DSC) between the lung segmentations. DSC is defined as the overlap between two segmentations [59] (i.e., from the warped target image and the ground truth image, respectively):

    DSC = \frac{2|P \cap Q|}{|P| + |Q|},   (2.28)

where P and Q denote the segmentations.

• Normalized Cross Correlation (NCC) between the warped target image and the ground truth image, computed as [60]:

    NCC = \frac{ \sum_{x \in \Phi} (I(x) - \bar{I})(J(x) - \bar{J}) }{ \sqrt{\sum_{x \in \Phi} (I(x) - \bar{I})^2} \, \sqrt{\sum_{x \in \Phi} (J(x) - \bar{J})^2} },   (2.29)

where I and J denote the warped target image and the ground truth image, respectively, and Φ ⊆ R³ denotes the 3-D locations covered by the dynamic MRIs.

The lung segmentation was performed semi-manually. We first binarized an ROI surrounding the lung using a heuristically selected threshold, and then cleaned the binary mask by manually removing leakages and filling holes. The lung surface meshes were generated using the marching cubes algorithm. An example of the segmentation is shown in Fig. 2.4. The NCC is calculated within a region of interest (ROI) covering the thoracic area to eliminate the impact of the background.

Figure 2.4: Example of lung segmentations and surfaces used for computing LSD and Dice's coefficient. (a) Coronal slice overlaid with its lung segmentation. (b) Rendering of the surface of the segmentation from (a).

Figure 2.5: Examples of dynamic MRIs from a canine with and without artificially added noise. (a) Without noise. (b) With noise.

2.4.5 Validation on Real MRI Dataset

We also validated the efficacy of the proposed MR acquisition strategy and motion estimation method on MRI datasets acquired using the MR imaging strategy described in Section 2.2.1 in 2 animal studies on canines. The canine study was approved by the Animal Care Committee of the University of Western Ontario (Animal Use Protocol 2011-67). Two adult female, bred-for-research hounds (19-24 kg) were used. Anesthesia was induced using propofol and maintained with isoflurane (2%). In each animal study, a static 3-D MR imaging protocol was used to acquire a static 3-D MRI under breath holding with the following settings:

• T1-weighted 3D spoiled gradient echo (turbo FLASH) sequence, GRAPPA PAT mode with an acceleration factor of 4; TR/TE = 3 ms/1.42 ms; flip angle = 8 degrees; bandwidth = 1200 Hz/Px; field of view = 420×420×182 mm³; acquired image resolution = 4.5×1.3×3.5 mm³; reconstructed image resolution = 1.3×1.3×3.5 mm³; acquisition time = 5.5 s per dataset.

From each animal study, 50 sets of dynamic 2-D MRIs were consecutively acquired and divided into 2 segments, i.e., the first 25 frames and the last 25 frames. The breathing magnitudes were observed to be different in the 2 segments because of the variability of the breathing pattern. On the two segments, we performed the proposed scatter-to-volume registration separately.
We implemented this 2-D MR imaging protocol using a product MR pulse sequence available on a 3.0 Tesla Biograph mMR system (Siemens Healthcare, Erlangen, Germany):

• T1-weighted 2D spoiled gradient echo (turbo FLASH) sequence, GRAPPA PAT mode with an acceleration factor of 4; TR/TE = 2.1 ms/1.12 ms; flip angle = 8 degrees; bandwidth = 1200 Hz/Px; field of view = 420×420×8 mm³; acquired image resolution = 1.3×4.5×8 mm³; reconstructed image resolution = 1.3×1.3×8 mm³; acquisition time = 72 ms per slice.

To evaluate the robustness of the proposed methods, we also artificially added Rician noise [61] to the 2-D MRIs and performed registration on the noisy data. We used σ = 5% of the maximum intensity for the Rician noise, which led to a ∼16 dB signal-to-noise ratio (SNR), defined as the ratio between the average image intensity (excluding background) and the root-mean-squared error introduced by the noise.

In addition to the 4 sagittal and 2 coronal slices used for motion estimation, one additional coronal slice in the middle position was also acquired for validation purposes only. The evaluation metrics described in Section 2.4.4 were evaluated solely on this validation image in order to test the performance of the proposed method in recovering the motion field in unobserved areas.

Example dynamic 2-D and static 3-D MRIs acquired from canines using this protocol are shown in Fig. 2.6(a)(b). Because of the selective excitation in MR imaging, acquiring intersecting sagittal and coronal slices within a short time (i.e., less than the T1 relaxation time) can lead to saturation artifacts (i.e., darker bands) in the intersecting areas, which can be observed in Fig. 2.6(b). From the acquired 2-D MRIs, we observed that the intensity difference caused by the saturation artifact is relatively small. Depending on the registration algorithm used, such mild image artifacts may only have a limited impact on the registration results.
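The Rician noise model used for the robustness experiment above can be sketched as follows: a Rician-distributed magnitude arises from adding independent Gaussian noise to the real and imaginary channels before taking the magnitude. The function name and plain-list input are illustrative; as in the text, σ would be set to 5% of the maximum image intensity:

```python
import math
import random

def add_rician_noise(pixels, sigma, seed=0):
    """Rician noise [61]: noisy magnitude sqrt((v + n1)^2 + n2^2),
    with n1, n2 ~ N(0, sigma^2) added to the two complex channels."""
    rng = random.Random(seed)
    return [math.hypot(v + rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
            for v in pixels]
```

A useful property of this model is that the output magnitudes are always non-negative, and for σ = 0 the image is returned unchanged.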
Nevertheless, for the integrity of our method, we excluded pixels from the impacted areas (i.e., 8 mm bands centered at the intersection lines) in the 2-D MRIs and only used the remaining pixels for registration (as shown in Fig. 2.6(c)). Please note that such input data, with sparsity and irregularity in the space domain, are naturally supported by the proposed scatter-to-volume registration method, as discussed in Section 2.2.2. This is one of the advantages of the proposed framework.

Figure 2.6: Samples of MRIs acquired from canine studies. (a) Sagittal and coronal slices from the dynamic 3-D MRI. (b) Sagittal and coronal dynamic 2-D MRIs, in which vertical darker bands caused by saturation artifacts can be observed. (c) Red bands represent the intersection areas that are excluded from registration.

2.4.6 Motion Corrected PET Imaging

We applied the proposed motion estimation method to one potential application, i.e., MRI-based motion corrected PET imaging, to demonstrate its efficacy. For comparison, we also evaluated DD-PCA and another widely adopted PET respiratory motion artifact reduction method, Respiratory Gated PET (RGPET) [62]. In RGPET, only the PET data acquired at a specific breathing state (e.g., EE) is used for PET reconstruction, to avoid breathing motion artifacts.

We used the PET phantom from the "Open Access Simulated PET-MR Datasets", which was simulated from MRI data using the method described in [55]. With the simulated PET-MRI data, the motion estimation methods were applied to the MRI data to extract motion fields, which were then fed into the PET reconstruction pipeline to obtain motion corrected PET images using the Reconstruct Transform Average (RTA) method [63]. In the RTA method, the PET data in different time bins are first reconstructed independently.
The reconstructed PET images are then transformed to a reference frame using the estimated deformation fields and averaged to obtain the motion corrected PET image.

Figure 2.7: RMSE of the smoothing spline approximation as the number of retained eigenvectors, k, increases (RMSE, on the order of 10⁻³, plotted against the k/n ratio from 0% to 40%).

The following 4 metrics were computed for each lesion to quantitatively evaluate the PET reconstruction quality: 1) full width at half maximum (FWHM) of the lesion; 2) the lesion's maximum reconstructed intensity (SUVmax); 3) the lesion's contrast (the ratio between SUVmax and the mean intensity in a region of interest in the surrounding tissue); and 4) SNR (the ratio between SUVmax and the standard deviation of intensities in a region of interest in the surrounding tissue).

2.5 Results

2.5.1 Smoothing Spline Approximation

Fig. 2.7 shows the approximation error as k increases from 2% to 40% of n. The approximation error decreases quickly as k increases for k < 0.1n, and much more slowly for k > 0.2n. At k = 0.2n, the RMSE of the approximation is less than 3 × 10⁻³, which is less than 0.3% of the maximum magnitude of the synthetic sinusoid pattern.
Based on this observation, we set k = 0.2n in all the experiments presented in this chapter.

Table 2.2: Means and standard deviations of LSD (mm), DSC and NCC using different motion estimation methods, calculated over all frames from each volunteer.

Metric  Method            Volunteer A    Volunteer B    Volunteer C    Volunteer D    Volunteer E
LSD     Original          4.071±1.184    2.637±1.093    2.671±1.419    2.357±1.206    4.270±2.549
        SD-AC [56][57]    2.571±1.022    1.784±0.914    1.712±0.516    1.548±0.468    3.232±1.629
        DD-AC [58]        2.485±1.052    1.782±0.917    1.610±0.426    1.492±0.408    3.135±1.670
        DD-PCA [21]       2.331±1.070    1.745±0.934    1.606±0.402    1.423±0.499    3.083±1.669
        Proposed          2.067±0.262    1.643±0.212    1.580±0.290    1.484±0.181    1.944±0.827
DSC     Original          0.871±0.049    0.915±0.033    0.928±0.037    0.931±0.034    0.885±0.065
        SD-AC [56][57]    0.927±0.048    0.940±0.043    0.966±0.019    0.960±0.018    0.892±0.083
        DD-AC [58]        0.930±0.045    0.939±0.043    0.967±0.019    0.964±0.014    0.895±0.081
        DD-PCA [21]       0.932±0.048    0.942±0.043    0.970±0.019    0.971±0.018    0.897±0.083
        Proposed          0.960±0.009    0.962±0.008    0.970±0.013    0.970±0.007    0.961±0.022
NCC     Original          0.878±0.047    0.935±0.027    0.941±0.031    0.967±0.016    0.883±0.082
        SD-AC [56][57]    0.925±0.037    0.952±0.037    0.969±0.013    0.982±0.006    0.893±0.097
        DD-AC [58]        0.926±0.041    0.952±0.038    0.973±0.013    0.984±0.007    0.895±0.099
        DD-PCA [21]       0.930±0.041    0.953±0.037    0.979±0.014    0.984±0.009    0.896±0.099
        Proposed          0.942±0.011    0.972±0.010    0.975±0.011    0.987±0.006    0.964±0.023

Figure 2.8: Comparison of sample registrations using the DD-PCA method and the proposed method for volunteer E. (a) DD-PCA. (b) Proposed. The reference image is overlaid with the lung segmentation of the target image warped by the estimated motion field.

2.5.2 Results on Synthetic MRI Dataset

A summary of the motion estimation results of the 4 methods is given in Table 2.2. The overall performance of the proposed method on the 5 volunteer datasets exceeded that of all other methods on all 3 metrics. DD-PCA achieved the second best performance, and consistently outperformed SD-AC and DD-AC.
Therefore, we mainly focus on the comparison between the proposed method and DD-PCA. We observed that on the datasets with a deeper breathing pattern in Segment 2 (i.e., volunteers A, B and E), the proposed method achieved significantly better motion estimation accuracy than the other methods, demonstrating its advantage in addressing varying breathing patterns. For example, on the volunteer with the largest breathing pattern variation (i.e., volunteer E), the proposed method achieved a 37% improvement in LSD over the second best method (i.e., DD-PCA). We also observed that the proposed method achieved performance similar to DD-PCA on the datasets with little difference in breathing patterns between Segments 1 and 2 (i.e., volunteers C and D).

2.5.3 Results on Real MRI Dataset

A summary of the evaluation metrics before and after motion compensation (MoCo) when using the proposed motion estimation method on the 2 real MRI datasets is reported in Table 2.3. For all test cases, the improvements in all 3 evaluation metrics (i.e., LSD, DSC and NCC) gained by the proposed method are similar to those on the synthetic data. For both animal studies, the breathing magnitudes in the two segments are different. In particular, before registration, the differences in the average LSDs between the two segments are 11.8% and 9.6% for the two animal studies, respectively. After registration, the average LSDs in the two segments became much closer to each other (i.e., within 4%), demonstrating the robustness of the proposed method to the breathing magnitude. With artificially added ∼16 dB Rician noise, the proposed method achieved only slightly degraded accuracy (i.e., less than a 4% increase in LSD, less than a 1% drop in DSC and almost identical NCC), demonstrating its robustness to noise. Fig. 2.9 shows an example of motion estimation on the additionally acquired validation image (which is not used in motion estimation), clearly demonstrating the ability of the proposed method to recover motion information in unobserved areas.

Table 2.3: Means and standard deviations of LSD (mm), DSC and NCC before and after motion compensation on 4 test cases from 2 animal studies. Test cases A.1 and A.2 are the two segments from one animal study; test cases B.1 and B.2 are from the other.

                          A.1          A.2          B.1          B.2
LSD (before)              2.64±1.63    2.95±1.84    3.63±1.33    3.98±1.41
LSD (after w.o. noise)    1.37±0.22    1.42±0.22    1.89±0.38    1.95±0.40
LSD (after w. noise)      1.43±0.25    1.49±0.26    1.91±0.42    1.98±0.47
DSC (before)              0.86±0.06    0.85±0.07    0.85±0.05    0.84±0.06
DSC (after w.o. noise)    0.96±0.00    0.96±0.00    0.94±0.02    0.94±0.02
DSC (after w. noise)      0.95±0.01    0.95±0.01    0.94±0.02    0.93±0.02
NCC (before)              0.94±0.03    0.93±0.04    0.91±0.03    0.90±0.03
NCC (after w.o. noise)    0.96±0.01    0.96±0.01    0.94±0.02    0.94±0.02
NCC (after w. noise)      0.96±0.01    0.96±0.01    0.94±0.02    0.94±0.02

Figure 2.9: The difference (via subtraction) between the validation dynamic 2-D MRI (not used in motion estimation) and the corresponding slice from the static 3-D MRI. (a) Before MoCo. (b) After MoCo without artificially added noise. (c) After MoCo with artificially added noise.

2.5.4 Motion Corrected PET Imaging

Fig. 2.10 shows an example of PET reconstructions using different motion correction methods. The ground truth reconstruction is shown in Fig. 2.10(b), obtained by applying the real motion fields used in the PET data simulation for motion corrected reconstruction. Please note that this image is only used as the "Reference Standard" to show how a motion corrected image would ideally look; it cannot be achieved in real scenarios. Fig. 2.10(c) shows a directly reconstructed PET image without motion correction, where the boundaries of the liver, lungs and heart are significantly blurred.
The lesions are also shifted and blurred, which significantly degrades the accuracy of the measurement of their sizes and locations. Fig. 2.10(d) shows a motion corrected PET reconstruction using DD-PCA. While the sharpness of the anatomy is better than in the reconstruction without motion correction, the boundaries of the lesions and the liver are still blurred due to the limitation of DD-PCA in estimating motion with varying breathing patterns. Fig. 2.10(e) shows a reconstructed PET image using RGPET. While there are minimal motion artifacts, the reconstructed image has a lower signal-to-noise ratio (SNR) than the other methods because only a portion of the acquired PET data is used for reconstruction. Finally, we applied the proposed method for PET motion correction and show the reconstructed PET image in Fig. 2.10(f). The proposed method clearly resulted in far fewer motion artifacts than DD-PCA, and a much higher SNR than RGPET.

Table 2.4 summarizes the percentage change of the 4 evaluation metrics described in Section 2.4.6 using the different MoCo methods, compared to the "Reference Standard" image. All values given are means over the 3 artificially added lesions shown in Fig. 2.10. RGPET was able to greatly reduce the motion artifacts and yielded the best SUVmax (+1.09%) and contrast (+0.51%) among all evaluated methods. However, because of the reduced amount of PET data used for reconstruction, it also significantly lowered the SNR (-25.85%). DD-PCA cannot correct motion artifacts caused by varying breathing motion and therefore led to a significant drop in SUVmax (-16.67%). While the SNR drop using DD-PCA appears significant (-18.91%) as well, it is mainly caused by the drop in SUVmax, and the actual noise level using DD-PCA is close to that of the "Reference Standard". In comparison, the proposed method achieved the best overall performance among all evaluated methods.
Although its SUVmax (-2.45%) and contrast (-1.13%) are slightly worse than those of RGPET, the differences are minimal. On the other hand, the proposed method achieved a significantly better SNR (-1.27%) than RGPET.

Table 2.4: Percentage change of the evaluation measurements of the PET reconstructions compared to the "Reference Standard".

Method       FWHM      SUVmax    Contrast   SNR
Uncorrected  -23.33%   -19.99%   -43.15%    -42.93%
RGPET        -4.27%    +1.09%    +0.51%     -25.85%
DD-PCA       -5.47%    -16.67%   -5.37%     -18.91%
Proposed     -1.2%     -2.45%    -1.13%     -1.27%

Figure 2.10: Demonstration of MRI-based motion correction in PET imaging. (a) The emission image used for PET simulation. (b) A motion corrected PET reconstruction using the ground truth motion. (c) A direct PET reconstruction without MoCo. (d) A motion corrected PET reconstruction using DD-PCA. (e) A motion corrected PET reconstruction using respiratory gating. (f) A motion corrected PET reconstruction using the proposed method.

2.6 Discussions

In this chapter, we proposed a dynamic MRI-based respiratory motion estimation method. Compared to existing model-based methods, the major advantage of the proposed method is that it does not employ any motion model and is therefore able to address time-varying respiratory motion patterns. Our results on synthetic data demonstrate that the performance of model-based methods degrades significantly on data with large variations in motion patterns (e.g., volunteer E) compared to data with small variations (e.g., volunteer D). In comparison, the proposed method significantly outperformed model-based methods on data with large variations not observed in the training phase.

While the proposed method has a clear advantage over model-based methods in addressing time-varying respiratory motion, we also need to validate that it can achieve a similar performance to model-based methods when there is only minimal variation in the motion patterns.
In principle, since the proposed method uses much less data for motion measurement, it could potentially produce less accurate registration results than methods based on gated 3-D MRIs. We posit that large-scale motion, such as respiratory motion, can be reliably and accurately recovered from sparsely observed data with appropriate regularization, such as the smoothness constraint imposed by the smoothing spline. To validate this, our experiment on synthetic data included 2 volunteers with small to no variation (i.e., C and D), and our results suggest that the proposed method performs comparably to methods based on 3-D images in this case.

We also applied the proposed dynamic MRI acquisition protocol in two canine studies, and evaluated the proposed respiratory motion estimation method on real MRI data. We validated that, using the proposed protocol, it is possible to acquire MRIs in real time to capture the canine's respiratory motion. The motion estimation results on the real canine MRI data show that the proposed method is able to effectively estimate the canines' respiratory motion purely from the MRI data without employing any motion model, which is consistent with our findings from the experiment using synthetic human MRI data. Given that the proposed method does not assume any respiratory motion pattern, the results of the canine studies can ideally be expected to be reproducible on human subjects.

A direct application of the proposed methods is motion corrected PET imaging using hybrid PET/MRI systems. Using PET/MRI, PET data and MRI data can be acquired simultaneously, which allows the MRI data to be used for monitoring the patient's motion during the PET data acquisition. The estimated motion fields can then be used for motion corrected PET image reconstruction. We have demonstrated, via experiments on synthetic data, the advantage of the proposed method for this application compared to other widely adopted respiratory motion compensation methods.
As stated above, since the proposed method is robust to variations in the patient's respiratory motion patterns, it can produce more accurate motion estimates and therefore result in PET images with fewer motion artifacts. Compared to respiratory gating methods, which use only the PET data at a specific breathing state for image reconstruction, the proposed method fully utilizes all PET data and hence produces PET images with a higher SNR.

It is worth noting that artifacts in MRIs can negatively affect the accuracy of intensity-based image registration methods, including the proposed scatter-to-volume registration method. Such artifacts include the non-standardness of the MR image intensity gray scale, a slowly varying background component of image inhomogeneity, and saturation artifacts (i.e., darker bands) at the intersections of the sagittal and coronal slices acquired using the proposed dynamic MRI protocol. In the proposed method, since the MRIs to be registered are acquired in the same imaging session with the MRI coils placed at the same locations, the two commonly presented MRI artifacts, intensity non-standardness and inhomogeneity, are consistent across all images to be registered and hence have a small impact on image registration. However, the saturation artifacts (i.e., darker bands) in the dynamic 2-D MRIs, which result from the sequential 2-D MRI acquisition from intersecting planes, can potentially impact the proposed scatter-to-volume registration because the static 3-D MRI does not contain the same artifact. To eliminate the impact of the saturation artifact, we exclude the affected areas from registration, taking advantage of the flexibility of the proposed scatter-to-volume registration method, which can register scattered data that do not form complete images.
Other image processing techniques may also be employed to minimize the impact of the saturation artifact in image registration, such as intelligent histogram mapping.

Compared to model-based methods, one potential weakness of the proposed method is its high computational complexity. Most cyclic model-based methods are computationally efficient in the application stage, which simply interpolates the model using surrogate data. In comparison, the proposed method does not employ any model and requires the computationally intensive scatter-to-volume registration to be performed at all time points to produce dynamic motion estimates. Currently, using a purely Matlab-based implementation without any optimization for speed, the proposed image registration algorithm takes 284 seconds for a single registration on an i7-4700MQ CPU @ 2.40GHz. While computational efficiency is not critical for our main potential application, i.e., motion-corrected PET reconstruction, which is performed offline, it is important for other potential real-time applications. Since the proposed algorithm is fully parallelizable, the computation speed can be significantly improved by an implementation using parallelized computation on a GPU. Another potential approach to improve the computational efficiency of the proposed scatter-to-volume registration method is to use the finite element method (FEM) to approximate the solution of the smoothing spline, which has the advantages of higher computational efficiency and more flexible regularizations (i.e., anatomically and physically based).

Chapter 3

Rigid-body Object Motion Estimation: Real-time 2-D/3-D Registration using Library-based DRR

3.1 Overview

In this chapter, we introduce a new approach to achieve real-time intensity-based 2-D/3-D registration. The described work has been published in [5]. As stated in Chapter 1, the main cause of the computational inefficiency of intensity-based 2-D/3-D registration is the iterative generation of DRRs, which imposes a heavy computation.
The proposed approach utilizes a pre-computed library of canonical DRRs in conjunction with intensity-based 2-D/3-D registration, to avoid the high computational cost of online DRR rendering. Through a sensible decomposition of the registration parameters, the pre-computed library only needs to cover a subspace and can therefore be reasonably compact and feasible to pre-load into Random Access Memory (RAM) for efficient access. The resulting registration accuracy using the library can still be relatively high and comparable to conventional 2-D/3-D registration using Ray Casting DRRs (rcDRRs), because the optimization space is continuous in the 4 geometry-irrelevant parameters (expressed in a closed-form formulation) and densely sampled in the remaining 2 geometry-relevant parameters (covered by the library).

This chapter is structured as follows. Section 3.2 describes the background of X-ray imaging and 2-D/3-D registration. Section 3.3 gives a brief overview of the formulation of intensity-based 2-D/3-D registration. Section 3.4 describes the proposed efficient Library-Based DRR (libDRR) generation method. We first introduce the decomposition of the 6-DoF transformation parameters into geometry-relevant and -irrelevant parameters, and then propose a canonical form for DRRs that can be pre-generated offline and used online for efficient library-based DRR generation. Section 3.5 adapts well-established optimization methods for hybrid optimization (i.e., the parameter space is continuous in some dimensions, but discrete in others), so that they can be employed with the library-based DRRs to achieve real-time 2-D/3-D registration. Section 3.7 describes the experiment setups and presents results validating the proposed approach. Section 3.8 discusses our findings in accelerating 2-D/3-D registration using library-based DRRs.

3.2 Background

3.2.1 X-ray Imaging Model

Fig. 3.1 shows the geometry of X-ray imaging.
Assuming that the X-ray imaging system corrects the beam divergence and the X-ray sensor has a logarithmic static response, X-ray image generation can be described by the following model:

I(p) = ∫ µ(L(p, r)) dr,   (3.1)

where I(p) is the intensity of the X-ray image at point p, L(p, r) is the ray from the X-ray source to point p, parameterized by r, and µ(·) is the X-ray attenuation coefficient. Denoting the X-ray attenuation map of the object to be imaged as J : R³ → R, and the 3-D transformation from the object coordinate system to the X-ray imaging coordinate system as T : R³ → R³, the attenuation coefficient at point x in the X-ray imaging coordinate system is

µ(x) = J(T⁻¹ ◦ x).   (3.2)

Figure 3.1: X-ray Imaging Geometry.

Combining Eqn. (3.1) and Eqn. (3.2), we have

I(p) = ∫ J(T⁻¹ ◦ L(p, r)) dr.   (3.3)

Note that given J, L and T, a synthetic X-ray image I(·) (i.e., a DRR) can be computed following Eqn. (3.3) using the Ray-Casting algorithm [37].

3.2.2 2-D/3-D Registration

The task of 2-D/3-D registration is to estimate an object's 3-D pose from its 2-D fluoroscopic image. Most 2-D/3-D registration methods follow an optimization framework that consists of three major components: transformation model, similarity measure and optimization strategy, as shown in Fig. 3.2. The transformation model describes how the 3-D data is transformed in the registration process by a certain number of transformation parameters; it can be categorized into rigid-body transformation and deformable transformation. Rigid-body transformation can be described by 6 parameters (3 for translation and 3 for rotation). Deformable transformation can be modeled in a variety of ways, including free-form deformation fields or statistical shape models [64–66]. The similarity measure quantitatively measures the quality of the registration (transformation).

Figure 3.2: Optimization-based 2-D/3-D registration framework.
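As a concrete illustration of the ray-casting DRR generation of Eqn. (3.3), the following is a minimal sketch (our illustration, not the thesis implementation; it assumes an identity object pose T, a voxel grid with unit spacing, and nearest-neighbour sampling along each ray):

```python
import numpy as np

def ray_casting_drr(volume, source, det_origin, det_u, det_v,
                    det_shape, n_samples=64):
    """Minimal DRR per Eqn. (3.3): integrate attenuation J along each
    ray L(p, r) from the X-ray source to detector pixel p."""
    h, w = det_shape
    drr = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            p = det_origin + j * det_u + i * det_v   # detector pixel in world
            ts = np.linspace(0.0, 1.0, n_samples)    # ray parameter r
            pts = source[None, :] + ts[:, None] * (p - source)[None, :]
            idx = np.round(pts).astype(int)          # nearest-neighbour voxel
            ok = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
            # discrete approximation of the line integral of J
            drr[i, j] = volume[idx[ok, 0], idx[ok, 1], idx[ok, 2]].sum()
    return drr
```

A practical implementation would trace rays on the GPU with trilinear interpolation and proper step lengths; this loop only illustrates the model.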
Given a transformation model and a similarity measure, optimization-based 2-D/3-D registration is the process of looking for the optimal transformation that maximizes the similarity measure using an optimization strategy. The optimization strategy can be iterative or heuristic, depending on the formulation of the similarity measure.

The above 2-D/3-D registration framework can be mathematically formulated as follows. The similarity measure for the 3-D data I3d and 2-D data I2D under rigid-body transformation T is denoted as M(T ◦ I3d, I2D). The registration is then performed by optimizing the similarity measure:

T = arg max_T M(T ◦ I3d, I2D).   (3.4)

Depending on the similarity measure used, most 2-D/3-D registration methods fall into two main categories: intensity-based and feature-based [26].

Feature-based Registration

Feature-based 2-D/3-D registration methods use similarity measures that are concerned with the distance between 3-D geometrical features, extracted from the pre-operative 3-D data, and the corresponding 2-D features. The common geometrical features include point sets, curves and surfaces, based on which feature-based methods can be further classified into point-to-point, curve-to-curve and surface-to-curve registrations.

For blood vessels, a typical feature is the centerline of the vessel, and the registration methods are commonly curve-to-curve [67–73]. Many curve-to-curve methods use the Iterative Closest Point (ICP) algorithm, where the point-to-point correspondence between the 3-D and 2-D curves is established in order to determine the distance between them [68, 69]. For point sets A and B, the corresponding point of a ∈ A is the point with minimum distance from B:

f(a) = arg min_{b∈B} D(a, b),   (3.5)

where D(·, ·) is a distance defined for points in A and B. To find f(a), the distances from a to all points in B need to be computed.
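A minimal sketch of the correspondence search in Eqn. (3.5), with D assumed to be the Euclidean distance (an illustration only, not the cited ICP implementations):

```python
import numpy as np

def closest_points(A, B):
    """Eqn. (3.5): for every a in A, find f(a) = argmin_{b in B} D(a, b),
    here with D the Euclidean distance.  Returns the matched points of B
    and the corresponding distances (brute force, O(|A||B|))."""
    # pairwise distance matrix between the two point sets
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    j = np.argmin(d, axis=1)                  # index of f(a) for each a
    return B[j], d[np.arange(len(A)), j]
```

Because this search is repeated at every ICP iteration, its cost motivates the 2-D distance-map shortcut discussed next.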
The point-to-point correspondence f(a) needs to be established for all points in A, and needs to be updated iteratively because the distance D(·, ·) changes during registration. The heavy computation of establishing point-to-point correspondences in 3-D and 2-D can be avoided using a 2-D distance map [68, 72, 73]. In 2-D/3-D curve-to-curve registration, the 2-D curve is fixed, and therefore the distance from any point in R² to the curve can be pre-computed to construct a distance map D : R² → R. The 3-D curve is projected to 2-D to directly retrieve the distance map value without computing the point correspondence.

For larger objects like bones, surfaces and curves can be used as geometrical features for 2-D/3-D surface-to-curve registration [74–77]. Some surface-to-curve methods use a projection strategy, where the 3-D surface is projected to 2-D to generate an occluding contour or curve, which is used to compute the similarity measure against the corresponding curves in the 2-D image [74, 75]. There is also a back-projection strategy, in which virtual rays are created by connecting the 2-D feature points and the X-ray source, and the similarity measure is computed in 3-D by measuring the distance between the 3-D surface and the virtual rays [76, 77].

By utilizing geometric features, feature-based registration methods typically have high computational efficiency and a large capture range. The downside of feature-based methods lies in the fact that registration relies purely on the extracted geometric features. Therefore, error in the feature detection step is inevitably propagated into the registration result, and in some cases can be magnified [30]. As a result, feature-based methods are not suitable for applications where feature detection itself is a challenging problem.

Intensity-based Registration

Intensity-based 2-D/3-D registration methods produce simulated X-ray images from the 3-D data and measure the similarity between the simulated X-ray image and the true X-ray image.
Simulated X-ray images are generated by Digitally Reconstructed Radiography (DRR), which accumulates Hounsfield Units (HUs) of the CT data along simulated X-rays to approximate the intensity of X-ray images. The similarity is then measured on two 2-D images (DRR and X-ray) using general-purpose similarity measures, e.g., MI [78], CC [79], Gradient Correlation (GC) [80], Gradient Difference (GD) [60], etc. The histogram-based similarity measure MI was first applied in registering multi-modality 3-D images [81], e.g., magnetic resonance (MR), positron emission tomography (PET) and CT, because it does not assume a linear relationship between the intensities of the two images. MI was later adopted by 2-D/3-D registration to compare DRRs and fluoroscopic images [33]. However, for 2-D images, the joint histogram tends to be more sparsely populated than for 3-D images, resulting in an inaccurate MI and failed registration. Parzen-window density estimation is a widely used technique for estimating a histogram from a limited number of samples. Another way to increase the population in each bin of the histogram is to use fewer bins, but this in turn sacrifices the sensitivity of MI. Similar to MI, Entropy of Difference Image (EDI) is another histogram-based similarity measure [82]. EDI only computes the 1-D histogram of the difference image, and therefore has a much more densely populated histogram. In addition, EDI assumes a linear relationship between the intensities of the two target images. Because the DRR is created based on the attenuation theory of X-rays, its relationship with the fluoroscopic image is close to linear, and therefore EDI is suitable for intensity-based 2-D/3-D registration. CC is another similarity measure that is widely used in 2-D/3-D registration [79]. In CC, the contribution of a structure strongly depends on the intensity and the size of the structure. Therefore, it mainly responds to large and high-contrast structures.
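The EDI idea can be sketched in a few lines (a hedged illustration; the bin count and the use of base-2 entropy are our choices, not taken from [82]):

```python
import numpy as np

def entropy_of_difference(img1, img2, n_bins=32):
    """Entropy of Difference Image (EDI): histogram the difference image
    and compute its Shannon entropy.  A well-aligned, linearly related
    pair gives a concentrated difference histogram, hence low entropy."""
    diff = img1.astype(float) - img2.astype(float)
    hist, _ = np.histogram(diff, bins=n_bins)
    p = hist / hist.sum()
    p = p[p > 0]                       # ignore empty bins
    return -np.sum(p * np.log2(p))     # lower is better
```

For identical images the difference is constant, so all mass falls in a single bin and the entropy is zero; misalignment spreads the difference histogram and raises the entropy.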
Some similarity measures are computed from the gradient of the image, such as GC and GD. These gradient-based similarity measures are typically sensitive to edges and therefore respond very well to thin and edge-like structures, such as vertebra edges. However, the profile of gradient-based similarity measures usually has many local maxima, leading to a small capture range of the resulting registration algorithm when these similarity measures are used. There are many other image similarity measures applied in 2-D/3-D registration, such as Pattern Intensity [83] and Correlation Ratio [84]. A comprehensive review of all image similarity measures is beyond the scope of this chapter, and readers are referred to [67] and [26] for more detailed discussion.

Compared to feature-based registration methods, intensity-based counterparts tend to be more accurate and more robust to poor image quality, because the similarity measure is directly computed from the DRR and the fluoroscopic image. However, the speed of intensity-based methods is typically slower than that of feature-based methods because of the high computational cost of iterative DRR rendering. In addition, intensity-based similarity measures, especially gradient-based ones, typically have many more local maxima and a smaller capture range than feature-based counterparts. In this chapter, we focus on improving the computational efficiency of intensity-based 2-D/3-D registration to make it capable of real-time registration.

3.3 Problem Formulation

Intensity-based 2-D/3-D registration can be formulated as an optimization problem:

T = arg max_T M(P(T ◦ I3d), I2D),   (3.6)

where I2D denotes the X-ray image, I3d denotes the 3-D model, T denotes the rigid-body transformation, P(·) denotes the DRR generator, and M(·, ·) denotes the image similarity measure to be optimized, e.g., MI, CC.
The above optimization problem is often solved by derivative-free optimization methods, e.g., Hill Climbing [85], Nelder-Mead [86], and Bound Optimization By Quadratic Approximation (BOBYQA) [87]. Since I3d is constant for a given 2-D/3-D registration problem, we denote the DRR generated with transformation T as Idrr(T) = P(T ◦ I3d) for notational simplicity. The low computational efficiency of intensity-based 2-D/3-D registration is mainly because DRR generation is a heavy computation and needs to be performed iteratively. In the next section, we describe an acceleration approach for DRR generation using a pre-generated library of DRRs.

3.4 Library-based Accelerated DRR Generation

3.4.1 Rigid-body Transformation Parameters

We first study the parameters for rigid-body transformations, and introduce a parameterization method that allows us to efficiently generate a group of DRRs from a single template. The rigid-body transformation T represents the transformation from the 3-D model to the world coordinate system (i.e., where the X-ray imaging system resides). As shown in Fig. 3.3, in the world coordinate system, the center of the X-ray detector is denoted as c, and the horizontal, vertical and normal directions of the X-ray detector are denoted as (ex, ey, ez), respectively. The rigid-body transformation T can be represented by a translation

Figure 3.3: Perspective geometry of the X-ray imaging system. s, v and t are the X-ray source, the center of the 3-D model and its projection on the X-ray detector, respectively. (ex, ey, ez) are the three axes of the X-ray detector. (e′x, e′y, e′z) are the three moving axes computed from s and t.

Figure 3.4: Demonstration of rotation effects caused by in-plane translations using fixed axes (ex, ey, ez). (A) is the original position of the object. (B) is translated from (A) by an in-plane translation (∆x, ∆y). (C) is rotated from (A) by an in-plane rotation ∆rz = 180◦. A reference vector is drawn on the object to show its orientation.
It is shown that both in-plane translation and rotation cause an out-of-plane rotation effect, although the out-of-plane rotation parameters remain unchanged.

Figure 3.5: Demonstration of rotation effects caused by in-plane translations using moving axes (e′x, e′y, e′z). (A) is the original position of the object. (B) is translated from (A) by an in-plane translation (∆x, ∆y). (C) is rotated from (A) by an in-plane rotation ∆rz = 180◦. A reference vector is drawn on the object to show its orientation. It is shown that the moving axes correct the out-of-plane rotation effect of in-plane translation and rotation.

vector v and an orientation matrix F:

T ◦ p = F · p + v,   (3.7)

where v is the center of the 3-D model in the world coordinate system, and F = [f_x, f_y, f_z] contains the three axes of the 3-D model in the world coordinate system. We denote the projection of v on the X-ray detector as t (as shown in Fig. 3.3). Since the coordinate of the X-ray source is given in our 6-DoF pose estimation problem (denoted as s), the coordinate v can be uniquely determined by its projection t and its distance to the X-ray source, ‖v − s‖2. The projection t can be parameterized by its 2-D coordinate in the X-ray image, denoted as (x, y), such that t = x · ex + y · ey. The distance ‖v − s‖2 can be parameterized by a magnification factor m = ‖v − s‖2 / ‖t − s‖2. Therefore, the translation vector v can be parameterized by the 3 parameters (x, y, m):

v = ((m − 1)/m) s + (1/m)(x · ex + y · ey).   (3.8)

The orientation matrix F can be defined by three orthogonal axes in the world coordinate system, e.g., the three axes of the X-ray detector (ex, ey, ez), and a rotation matrix R(rx, ry, rz) parameterized by 3 angles:

F = [ex, ey, ez] · R(rx, ry, rz),   (3.9)

where

R(r_x, r_y, r_z) =
\begin{pmatrix} \cos r_z & -\sin r_z & 0 \\ \sin r_z & \cos r_z & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} \cos r_y & 0 & -\sin r_y \\ 0 & 1 & 0 \\ \sin r_y & 0 & \cos r_y \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos r_x & -\sin r_x \\ 0 & \sin r_x & \cos r_x \end{pmatrix}.   (3.10)

According to Eqn.
(3.7)–(3.9), the rigid-body transformation T can be parameterized by 6 parameters (x, y, m, rx, ry, rz). The translation, magnification and in-plane rotation parameters (x, y, m, rz) largely affect the position, size and orientation of the object in the DRR, and hence are referred to as geometry-irrelevant transformations/parameters. The two out-of-plane rotation parameters (rx, ry) largely affect the shape and appearance of the object in the DRR, and hence are referred to as geometry-relevant transformations/parameters. Such a decoupling of the transformation parameters into geometry-irrelevant and -relevant subgroups has been adopted in template matching methods to reduce the DoF covered by the template [88] and in a C-Arm calibration method to reduce the number of variations [89]. It is also critical in our DRR library generation, because with such decoupling, the library only needs to cover the subgroup of 2 geometry-relevant parameters, as will be discussed in the next section.

It is worth noting that, using the three axes of the X-ray detector (ex, ey, ez) for calculating the orientation matrix, the in-plane translation (x, y) and rotation rz are not strictly geometry-irrelevant. As shown in Fig. 3.4, changing (x, y) and rz can cause a slight out-of-plane rotation effect, although the out-of-plane rotation parameters (rx, ry) remain unchanged. Therefore, in order to completely decouple geometry-relevant and -irrelevant transformations, we calculate the orientation matrix using three corrected axes, denoted as (e′x, e′y, e′z), which are defined based on (x, y) to compensate for the out-of-plane rotation effects introduced by in-plane translation and rotation.
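The parameterization of Eqns. (3.8)–(3.10) can be sketched as follows (a minimal illustration, not the thesis implementation; angles are in radians, and the sign convention of Ry is copied verbatim from Eqn. (3.10)):

```python
import numpy as np

def translation_from_params(x, y, m, s, ex, ey):
    """Eqn. (3.8): recover the object centre v from its 2-D projection
    (x, y) on the detector and the magnification factor m."""
    t = x * ex + y * ey                 # projection of v on the detector
    return (m - 1.0) / m * s + t / m

def rotation_matrix(rx, ry, rz):
    """Eqn. (3.10): R = Rz(rz) @ Ry(ry) @ Rx(rx)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, -sy], [0.0, 1.0, 0.0], [sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rz @ Ry @ Rx
```

Note that, by construction, v always lies on the ray from the source s through t, so (x, y, m) determine the translation uniquely.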
We first define the in-plane rotation axis e′z as the unit vector in the direction from the X-ray source to the center of the 3-D model:

e′z = (s − t) / ‖s − t‖2 = (s − (x · ex + y · ey)) / ‖s − (x · ex + y · ey)‖2.   (3.11)

Rotation around e′z leads to a pure 2-D rotation in DRRs without any out-of-plane rotation effect, as illustrated in Fig. 3.5. We then constrain the out-of-plane rotation axes e′x and e′y to be orthogonal to e′z. Axis e′y is obtained by projecting ey into the plane that is orthogonal to e′z, calculated as:

e′y = (ey − ⟨ey, e′z⟩ e′z) / ‖ey − ⟨ey, e′z⟩ e′z‖2,   (3.12)

where ⟨·, ·⟩ denotes the inner product of two vectors. We further calculate axis e′x as the cross product of e′z and e′y:

e′x = e′z × e′y.   (3.13)

Since the out-of-plane rotation axes e′x and e′y remain orthogonal to the viewing direction e′z, changing the in-plane translation (x, y) does not affect the out-of-plane appearance of the object in the DRR, as illustrated in Fig. 3.5.

3.4.2 Canonical DRR

Since the 4 geometry-irrelevant parameters (x, y, m, rz) only cause approximately 2-D transformations (i.e., in-plane 2-D rigid-body + scaling), we define a canonical form for DRRs in which these 4 parameters are normalized by the corresponding 2-D transformation. In particular, the canonical DRR is extracted from a region of interest (ROI) in the original DRR, and is re-sampled to a pre-defined size. The ROI is centered at (x, y), with an orientation of rz and a size of (m · w, m · h), where w and h are heuristically selected width and height. With the above ROI, the effects of the 4 geometry-irrelevant parameters are normalized by the position, size and orientation of the ROI, and the canonical DRR is only affected by the 2 geometry-relevant parameters (as shown in Fig. 3.6). We denote the mapping from an original DRR to a canonical DRR as

Xc(rx, ry) = Hϑ ◦ X(ϑ),   (3.14)

where ϑ denotes the 6 transformation parameters.
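The corrected axes of Eqns. (3.11)–(3.13) above translate directly into code (a minimal sketch with assumed inputs):

```python
import numpy as np

def moving_axes(s, ex, ey, x, y):
    """Eqns. (3.11)-(3.13): corrected axes (e'x, e'y, e'z) so that
    in-plane translation and rotation cause no out-of-plane effect."""
    t = x * ex + y * ey                            # projection of v
    ez_p = (s - t) / np.linalg.norm(s - t)         # Eqn. (3.11)
    ey_p = ey - np.dot(ey, ez_p) * ez_p            # project ey off e'z
    ey_p /= np.linalg.norm(ey_p)                   # Eqn. (3.12)
    ex_p = np.cross(ez_p, ey_p)                    # Eqn. (3.13)
    return ex_p, ey_p, ez_p
```

The returned triple is orthonormal by construction, so it can be used directly as the column axes when forming the orientation matrix of Eqn. (3.9).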
The ROI extraction operator Hϑ is uniquely determined by the 4 geometry-irrelevant parameters. Correspondingly, the reconstruction of an original DRR from its canonical DRR can be calculated using the inverse mapping H⁻¹ϑ:

X(ϑ) = H⁻¹ϑ ◦ Xc(rx, ry).   (3.15)

With Hϑ and H⁻¹ϑ, we only need to pre-compute and store canonical DRRs with rx and ry spanning the parameter space in the DRR library. For a given T, the DRR can then be generated by applying Eqn. (3.15) to the corresponding canonical DRR retrieved from the DRR library, referred to as a Library-based DRR (libDRR). The reconstruction of DRRs from canonical DRRs is very computationally efficient and fully parallelizable, because the operation is simply an image interpolation. Since only a finite number of combinations of rx and ry can be stored in the canonical DRR library, libDRRs can only be produced with the discrete rx and ry that the library was generated for. Nevertheless, since only 2 parameters need to be covered in the library, we can choose a relatively dense grid that meets the accuracy requirement of many medical applications. In our experiments, we generate the DRR library with rx and ry at integer values (i.e., a step size of 1 degree).

Figure 3.6: Extraction of canonical DRRs from original DRRs. By placing the ROI according to the geometry-irrelevant parameters, DRRs with the same geometry-relevant parameters but different geometry-irrelevant parameters have the same canonical DRR.

3.4.3 Encoding of Canonical DRR

A canonical DRR with a size of 200×200 pixels requires ∼40 kB if stored in 8 bits. Storing a library with 360×360 (i.e., 129,600) canonical DRRs requires 4.94 GB. Since the whole library needs to be pre-loaded into RAM for fast access during registration, the large memory footprint could be cumbersome in clinical setups. In order to manage the memory footprint, we encode the canonical DRRs to preserve only the information most critical for calculating GC, the similarity measure used in our method.
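A minimal sketch of such an encoding (thresholding weak gradients and run-length encoding the resulting zero runs, as detailed below in this section; the real library stores the runs in a compact binary layout, which we skip here):

```python
import numpy as np

def encode_sparse(grad, keep_percentile=90):
    """Zero out gradients below the given percentile of |grad|, then
    run-length encode the result: each entry is (zeros_before, value)."""
    g = grad.ravel().copy()
    g[np.abs(g) < np.percentile(np.abs(g), keep_percentile)] = 0.0
    runs, zeros = [], 0
    for v in g:
        if v == 0.0:
            zeros += 1                 # count the current run of zeros
        else:
            runs.append((zeros, v))    # emit run length + nonzero value
            zeros = 0
    return runs, g.size

def decode_sparse(runs, size):
    """Invert encode_sparse back to a flat array."""
    out = np.zeros(size)
    pos = 0
    for zeros, v in runs:
        pos += zeros
        out[pos] = v
        pos += 1
    return out
```

With ~90% of the elements zeroed, the run list is roughly a tenth the size of the dense image, which is the effect exploited by the library.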
GC is defined as the CC of the gradients of the two images:

GC(I1, I2) = (1/2) CC(∂I1/∂x, ∂I2/∂x) + (1/2) CC(∂I1/∂y, ∂I2/∂y),   (3.16)

where CC(·, ·) denotes the CC of the two inputs. Since GC is largely dominated by a small number of image gradients with high magnitudes, we can approximate it by eliminating the gradients in the DRR with magnitudes lower than a threshold. We empirically found that setting the threshold to the 90th percentile of gradient magnitudes in the DRR has a negligible impact on the behavior of 2-D/3-D registration using GC. Therefore, in the canonical DRR library, instead of storing raw images, we store the two gradient images of each canonical DRR thresholded at the 90th percentile of gradient magnitudes. Such images have 90% zero-valued elements that are largely consecutive. We then encode the images by using an integer to represent the number of upcoming consecutive zeros. This strategy reduces the memory consumption for each canonical DRR by 70∼80%, leading to a 1∼1.5 GB memory footprint to cover the full 360×360 degrees of rx and ry.

Figure 3.7: 2-D/3-D registration using libDRR.

3.5 2-D/3-D Registration using Library-based DRR

The library of encoded canonical DRRs allows very efficient DRR generation during 2-D/3-D registration. In this section we show how to apply libDRRs in a hierarchical registration scheme to achieve 2-D/3-D registration with real-time performance. The registration scheme consists of two sequential steps:

1. 3-DoF registration for estimating (x, y, rz).
2. 6-DoF registration for estimating (x, y, m, rx, ry, rz).

Among the 6 transformation parameters, the in-plane translation (x, y) and rotation rz parameters cause dominant yet relatively simple 2-D transformations, while the other parameters cause more subtle and/or complex transformations.
Therefore, we perform the 3-DoF registration in the first step to estimate (x, y, rz), and the 6-DoF registration in the second step to estimate (m, rx, ry) and fine-tune (x, y, rz) to achieve a high accuracy. In order to provide timely and accurate 6-DoF pose estimation, the registration needs to be performed at a speed higher than the fluoroscopic frame rate, which is 10∼15 frames per second. We use libDRRs for registration to avoid the computationally expensive online DRR rendering, and hence achieve a high registration speed. We use GC as the similarity measure for both registration steps, as its effectiveness has been demonstrated in a wide range of 2-D/3-D registration problems.

The workflow of the proposed 6-DoF pose recovery framework is shown in Fig. 3.7. Both the 3-DoF and the 6-DoF registration steps are solved by optimizing the target function (i.e., the similarity measure) over the parameters to be estimated. Because libDRRs can only be generated with discrete (rx, ry), during the optimization processes for both the 3-DoF and the 6-DoF registration steps, the target function can also only be evaluated at discrete (rx, ry). Since the 3-DoF registration only estimates the three geometry-irrelevant parameters (x, y, rz), we round the initial (rx, ry) to discrete values, and solve the 3-DoF registration using continuous optimization methods. Since the 6-DoF registration estimates all parameters, a hybrid optimization is needed, i.e., optimizing over a continuous space for (x, y, m, rz) and a discrete space for (rx, ry). The hybrid optimization can be achieved by adapting existing continuous heuristic optimization methods. In principle, the search policy of the continuous optimization method needs to be adapted so that it only searches points with discrete (rx, ry). We demonstrate adaptations for hybrid optimization on 3 well-studied derivative-free optimization methods, i.e., Hill Climbing, Nelder-Mead and BOBYQA.
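To make this hybrid search policy concrete before the individual optimizers are described, here is a minimal Hill-Climbing-style sketch (our illustration, not the thesis implementation; `f` stands in for the GC target function, and the flagged parameters play the role of rx and ry):

```python
import numpy as np

def hybrid_hill_climb(f, theta0, steps0, discrete_mask, tol=1e-3, max_iter=500):
    """Hill Climbing with a hybrid search policy: parameters flagged in
    discrete_mask start rounded, use power-of-two steps, and their step
    size is never halved below 1, so f is only ever evaluated at integer
    values of those parameters (as the libDRR library requires)."""
    theta = np.asarray(theta0, dtype=float).copy()
    theta[discrete_mask] = np.round(theta[discrete_mask])
    steps = np.asarray(steps0, dtype=float).copy()
    best = f(theta)
    for _ in range(max_iter):
        improved = False
        for i in range(theta.size):            # probe +/- step on each axis
            for sign in (1.0, -1.0):
                cand = theta.copy()
                cand[i] += sign * steps[i]
                val = f(cand)
                if val > best:
                    best, theta, improved = val, cand, True
        if not improved:                       # no better neighbour: shrink
            if not np.any(steps[~discrete_mask] > tol):
                break                          # continuous steps converged
            steps = steps / 2.0
            steps[discrete_mask] = np.maximum(steps[discrete_mask], 1.0)
    return theta, best
```

Starting the discrete steps at a power of 2 (e.g., 4) guarantees that halving lands exactly on 1, after which those parameters move on the integer grid only.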
The first two methods, Hill Climbing and Nelder-Mead, have already been widely applied to image registration problems. The last method, BOBYQA, a state-of-the-art trust-region optimization method, has not previously been reported for solving 2-D/3-D registration. We will show later that BOBYQA significantly outperforms the other two optimizers in terms of convergence speed.

3.5.1 Hill Climbing

The Hill Climbing algorithm [85] starts with initial parameters ϑ and initial step sizes ∆ϑ. In every iteration, the 2×N neighbors of the current ϑ, obtained by changing a single parameter in ϑ by ±∆ϑ at a time, are evaluated, where N is the length of ϑ (i.e., 6). If a better solution is found, the current ϑ moves to the best neighbor. Otherwise, the step sizes ∆ϑ are reduced by half. This process repeats until the convergence criteria are met. To extend the Hill Climbing algorithm for hybrid optimization, we first round the initial rx and ry so that the optimization starts with integer values. We then set the initial step sizes ∆rx and ∆ry to a power of 2 (i.e., 4 degrees in our experiment). During the optimization, the step sizes ∆rx and ∆ry are not halved further once they reach 1 degree, ensuring that the target function is only evaluated at integer-valued rx and ry.

3.5.2 Nelder-Mead

The Nelder-Mead algorithm [86] starts with an initial simplex in the parameter space, denoted as {ϑi}, i = 1, …, N+1, and iteratively modifies it toward a better solution of Eqn. (3.6) by performing the following 5 operations: 1) reflection, 2) expansion, 3) outside contraction, 4) inside contraction and 5) shrinking. To customize the Nelder-Mead algorithm for optimizing over discrete rx and ry, we apply rounding of rx and ry to all vertices of the initial simplex, as well as to all outputs of the 5 operators during the optimization process.
By applying the rounding operator in all steps, the Nelder-Mead algorithm only evaluates the target function at integer-valued rx and ry.

3.5.3 BOBYQA

Although BOBYQA [87] is designed for continuous optimization problems, the structure of the algorithm allows for hybrid optimization. BOBYQA optimizes the target function by utilizing a surrogate quadratic function that is iteratively improved and minimized using a trust-region approach. The initial quadratic function is constructed from m samples, {ϑ1, . . . , ϑm}, generated by perturbing the initial parameters. The initial quadratic function, Q1(·), is built to minimize the approximation error ∑_{j=1}^{m} ‖Q1(ϑj) − f(ϑj)‖2. We only generate integer-valued perturbations of rx and ry in {ϑ1, . . . , ϑm}, so that the initial quadratic can be built exclusively using libDRRs. In later iterations, the trust-region optimization step yields the minimizer of the quadratic function, and the sample with the worst target function value is replaced by the new point. Since it is not guaranteed that the rx and ry generated by the minimizer are integer-valued, we round the generated rx and ry in all iterations, so that the target function is only evaluated at integer-valued rx and ry.

Figure 3.8: Example data used for validation, including 5 fluoroscopic images and a 3-D model of the object.

3.6 Experiment Setup

We evaluated the proposed method on a potential clinical application, i.e., recovering the 6-DoF pose of the transesophageal echocardiography (TEE) probe from X-ray images, which enables the fusion of X-ray and TEE images [90]. We used 5 fluoroscopic sequences with in total 246 X-ray images acquired during an animal study using a Siemens Artis Zeego C-Arm system. The size of the X-ray images is 1024×1024 with a pixel spacing of 0.154 mm. A micro-CT scan of the Siemens TEE probe was used for registration. Examples of the 2-D and 3-D data are shown in Fig. 3.8. The ground
The groundtruth transformation parameters for the X-ray images were manually anno-tated by experts. For each X-ray image, a perturbation of the ground truthparameters was generated to create a mis-registration for the evaluated 2-67D/3-D registration method to recover. The perturbation was chosen froma zero-mean uniform distribution over ranges of 3.0 mm, 3.0 mm, 0.004,10◦, 10◦, 10◦ for the parameters x, y, m, rz, rx, ry, respectively. Given thatthe source to object distance is roughly 750 mm, m = 0.004 is equivalent toapproximately 3 mm in translation along axis e′z. We on purposely madethe perturbation large enough to cover the possible motion between twoneighboring frames, so that our experiment tests the evaluated method’scapability of 6-DoF tracking.The registration accuracy was assessed with the Mean Target Regis-tration Error in the Projection Direction (mTREproj) [91], calculated atthe 4 corners of the ultrasound cone at 60 mm depth. Following [92],mTREproj<2.5 mm was regarded as the criteria for successful registration.We calculated the mean mTREproj among all successful cases as a mea-surement of the accuracy of the algorithm. We calculated the success rateover all tested cases as a measurement of the robustness of the algorithm.We also recorded the time spent per registration as a measurement of thecomputational efficiency.The experiments were conducted on a workstation with an Intel Corei7-4790k CPU, 16GB RAM and a Nvidia GeForce GTX 980 GPU. For allevaluated methods, rcDRR generation (if needed) was implemented usinghardware-accelerated 3-D texture lookups on GPU. 
Other parts of the implementation, including the generation of libDRRs and the computation of similarity measures, were implemented on the CPU and executed on a single CPU core.

3.7 Results

3.7.1 Comparison with Other DRR Generation Methods

We first performed the 2-step 2-D/3-D registration using GC calculated on both conventional rcDRRs and the proposed libDRRs to compare their robustness, accuracy and speed. The effectiveness of 2-D/3-D registration using GC and rcDRRs has been demonstrated in many applications, and it is still being reported as the state-of-the-art method in recent literature [28]. We also performed registration using uncorrected libDRRs (uclibDRRs), i.e., libDRRs computed using the three axes of the X-ray detector (ex, ey, ez), as another comparison. For libDRR-based methods, the GC was calculated with a thresholded gradient, and is therefore referred to as tGC (thresholded GC). All evaluated DRR generation methods were combined with each of the three optimizers, i.e., Hill Climbing (HC), Nelder-Mead (NM) and BOBYQA (BO), for 2-D/3-D registration. To evaluate libDRR and uclibDRR, the adaptations of HC, NM and BO for hybrid optimization were used. To evaluate rcDRR, the original continuous optimization methods were used.

Figure 3.9: Comparison of DRR generation methods. DRR-GC uses GC calculated on rcDRRs. ucDRR-tGC uses tGC calculated on libDRRs computed using the three axes of the X-ray detector (ex, ey, ez). libDRR-tGC is the proposed method.

Results comparing the different DRR generation methods are summarized in Fig. 3.9. It is shown that registrations using rcDRR and libDRR achieved almost the same success rate. In particular, libDRR achieved success rates of 94.7%, 96.8% and 97.8% with HC, NM and BO, while rcDRR achieved 94.7%, 96.8% and 98.9%, respectively. These results suggest that the use of libDRR with hybrid optimization causes little degradation in robustness compared to rcDRR with continuous optimization.
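For reference, the (thresholded) gradient correlation used in these comparisons can be sketched as the average normalized cross-correlation of the x- and y-gradient images, with small gradients suppressed. This is a simplified illustration of the standard GC definition under our own thresholding assumption, not the thesis code.

```python
import numpy as np

def _ncc(a, b):
    """Normalized cross-correlation of two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def tgc(fixed, moving, thresh=0.0):
    """Thresholded Gradient Correlation (sketch): NCC of the gradient
    images along each axis, averaged; gradient magnitudes below
    `thresh` are zeroed, which mimics the thresholding that turns GC
    into tGC."""
    sims = []
    for axis in (0, 1):
        gf = np.gradient(np.asarray(fixed, dtype=float), axis=axis)
        gm = np.gradient(np.asarray(moving, dtype=float), axis=axis)
        gf = np.where(np.abs(gf) < thresh, 0.0, gf)
        gm = np.where(np.abs(gm) < thresh, 0.0, gm)
        sims.append(_ncc(gf, gm))
    return 0.5 * (sims[0] + sims[1])

img = np.random.default_rng(1).random((32, 32))
# tgc(img, img) == 1.0; misaligned images score lower
```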
On the other hand, the use of uclibDRR significantly degrades the success rate, suggesting that correcting the out-of-plane rotation effects introduced by in-plane translation and rotation is critical for the success of libDRR. This is because the appearance of uclibDRRs can be systematically biased by the out-of-plane rotation effects introduced by in-plane translation and/or rotation, which can be up to 5 degrees if the object is near the boundary of the X-ray detector. Therefore, the similarity measure calculated using uclibDRRs can differ significantly from that calculated using rcDRRs, leading to inaccurate registration results. Among successful cases, the mean mTREprojs achieved by libDRR-based methods are slightly higher than those of their rcDRR-based counterparts. The degradation in mean mTREproj is mainly because libDRR-based methods only allow integer-valued rx and ry, which leads to inherent quantization errors. However, the accuracy achieved by libDRR-based methods is still clinically relevant and significantly higher than that of other real-time 2-D/3-D registration methods, as will be shown in the next experiment.

Fig. 3.9 also shows that methods using libDRRs achieved significantly higher registration speed than methods using rcDRRs. All three rcDRR-based methods resulted in a registration frame rate lower than 1 Frame Per Second (FPS), while the three libDRR-based methods ran at 7.1 to 15.4 FPS. Therefore, a ∼10 times speedup can be achieved using libDRR-based methods compared to rcDRR-based methods. Among the three libDRR-based methods, libDRR-tGC-BO achieved the highest efficiency, demonstrating the significant advantage of BOBYQA in fast convergence compared to the other two optimization methods.
libDRR-tGC-BO achieved a registration frame rate above 15 FPS (i.e., 15.4 FPS), which is the upper bound of the fluoroscopy frame rate for most medical applications, and is therefore suitable for 6-DoF pose recovery in a wide range of real-time medical applications.

3.7.2 Comparison with Other Accelerated 2-D/3-D Registration Methods

We conducted another experiment to compare the robustness and accuracy of the proposed method with two accelerated intensity-based 2-D/3-D registration methods, Sparse Histogramming MI (SHMI) [33] and Direct Splatting Correlation (DSC) [36]. These two methods achieve fast 2-D/3-D registration because they use sub-sampled [33] and splatting [36] DRRs to quickly compute approximations of MI and CC, respectively. Because of the approximation, SHMI and DSC theoretically achieve the same or degraded accuracy compared to using the original MI and CC. In our experiment, we evaluated the same 2-step 2-D/3-D registration scheme using the three continuous optimization methods with MI and CC calculated on rcDRRs without approximation, referred to as rcDRR-MI and rcDRR-CC. These two compared methods show the upper bound of the robustness and accuracy that SHMI and DSC are able to achieve.

Figure 3.10: Comparison with the state-of-the-art real-time 2-D/3-D registration methods. rcDRR-MI and rcDRR-CC represent the highest accuracies that the two state-of-the-art methods [33] and [36] can possibly achieve. libDRR-tGC is the proposed method.

Fig. 3.10 summarizes the robustness and accuracy of rcDRR-MI, rcDRR-CC and the proposed method. It is shown that the proposed method consistently and significantly outperforms rcDRR-MI and rcDRR-CC in both success rate and mean mTREproj with all three optimizers. In particular, the proposed method achieved a success rate above 97% with any of the three optimizers, while in most scenarios rcDRR-MI and rcDRR-CC achieved success rates lower than 85%.
The proposed method consistently achieved a mean mTREproj lower than 1 mm, while rcDRR-MI and rcDRR-CC achieved mean mTREprojs between 1.36 mm and 1.54 mm. The significant improvement is due to the high sensitivity of gradient-based similarity measures (e.g., GC and tGC) to small mis-registrations, which enables them to achieve very high registration accuracy. Since the fast DRR and similarity measure approximations in SHMI and DSC are designed specifically for MI and CC, the achievable registration accuracy of these two methods is limited by the sensitivity of MI and CC themselves. In terms of computational efficiency, while all three methods are capable of real-time registration, DSC reported the highest registration speed (i.e., 23.6∼92.3 FPS), achieved with an efficient GPU implementation.

3.7.3 Validation of 6-DoF Tracking

We also conducted an experiment to validate the proposed method in a real 6-DoF tracking scenario. In this experiment, fluoroscopic frames were supplied at a frame rate of 15 FPS. A rough alignment was manually provided for the first frame of every fluoroscopic sequence, and the following 6-DoF tracking mechanism was employed to obtain registrations for all the remaining frames. To apply the proposed 2-D/3-D registration method for tracking, we repeatedly perform 2-D/3-D registration on incoming fluoroscopic images. Once a registration is finished, a new registration is performed on the next incoming frame using the result of the previous registration as initial parameters. If new frames come in before the previous registration is finished, these frames are skipped. The result of the most recent registration was used for computing mTREprojs for skipped frames. Since methods using rcDRRs are not fast enough for 6-DoF tracking, we only validated the proposed method using libDRRs with the three optimization methods.

Results of 6-DoF tracking are summarized in Table 3.1.
It is shown that, with all three optimization methods, the proposed method is able to robustly track the 6-DoF pose from fluoroscopic sequences and achieves a 100% success rate. The best performance of the proposed method was achieved using BOBYQA, in terms of all three metrics, i.e., success rate, mean mTREproj and registration speed. The registration frame rate achieved using BOBYQA is over twice that achieved using the other two optimization methods, owing to the advantage of BOBYQA in fast convergence. BOBYQA also led to the lowest mean mTREproj among the three optimization methods, mainly because of its high registration frame rate, which allows it to perform 2-D/3-D registration more frequently and reduces the lag in the registration results.

Table 3.1: Results of 6-DoF tracking. Methods using rcDRRs are not evaluated because their speed is not sufficient for 6-DoF tracking.

Method          Success Rate    Mean mTREproj (mm)    Speed (FPS)
libDRR-tGC-HC   100%            0.95                  10.9
libDRR-tGC-NM   100%            1.09                   9.4
libDRR-tGC-BO   100%            0.81                  23.1

It is also worth noting that the registration frame rate achieved in 6-DoF tracking is in general higher than that achieved by the same method in 6-DoF pose recovery from perturbation. In particular, libDRR-tGC-BO achieved 15.4 FPS in 6-DoF pose recovery from perturbation, and 23.1 FPS in 6-DoF tracking. This is mainly because the motion between neighboring frames to be captured by 6-DoF tracking is smaller than the perturbation applied for 6-DoF pose recovery. With a smaller initial registration error, the optimizer requires fewer iterations (function evaluations), which leads to an improved registration frame rate.

3.8 Discussion

In this chapter, we introduced a real-time 2-D/3-D registration method using libDRRs. In order to achieve real-time performance, instead of rendering DRRs online, a library of canonical DRRs is pre-computed for fast libDRR generation during registration.
Since libDRRs can only be produced with discrete geometry-relevant parameters, we extended three continuous optimization methods for hybrid optimization in order to perform 2-D/3-D registration using libDRRs.

The key strategy of the proposed method is to pre-generate DRRs to avoid the computationally expensive online DRR generation during 2-D/3-D registration. Since rigid-body transformations have 6 DoF, it is impractical to pre-generate DRRs to densely cover the 6-dimensional parameter space. We have shown that the decomposition of the 6 transformation parameters into two geometry-relevant and four geometry-irrelevant parameters allows the reconstruction of DRRs with arbitrary geometry-irrelevant parameters from a canonical DRR with the same geometry-relevant parameters. The reconstruction of DRRs from canonical DRRs is computationally very efficient, because it only involves 2-D image transformation and interpolation. Since the canonical DRRs only have 2 DoF, it is more affordable to pre-generate and store them with a relatively dense coverage of their 2-D parameter space. In our experiments, we pre-generated canonical DRRs at 1-degree intervals in the two geometry-relevant parameters, which leads to a library of 360 × 360 = 129,600 canonical DRRs. This is still a relatively large number of images, considering that they all need to be pre-loaded into RAM for fast access during 2-D/3-D registration. Without data compression, the library of canonical DRRs with a size of 200×200 pixels takes about 5 GB. Since large RAM (e.g., above 32 GB) has become very affordable in recent years, the requirement of 5 GB of RAM is manageable. In scenarios where such large RAM consumption is not affordable, e.g., running 2-D/3-D registration on devices with small RAM, data compression techniques can be applied to reduce the memory footprint of the proposed method.
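The library-size arithmetic above can be checked directly; storing one byte per pixel is our assumption, chosen because it is consistent with the quoted ~5 GB figure.

```python
# Canonical DRRs cover the two geometry-relevant rotations at 1-degree
# spacing, so the library holds 360 x 360 images of 200 x 200 pixels.
n_images = 360 * 360            # 129,600 canonical DRRs
pixels_per_image = 200 * 200
bytes_per_pixel = 1             # assumption: 8-bit intensities
total_bytes = n_images * pixels_per_image * bytes_per_pixel
total_gb = total_bytes / 1e9    # ~5.2 GB, matching the text
```

At 32-bit float precision the same library would instead occupy roughly four times as much, which is why compact pixel encodings (or the compression schemes discussed next) matter here.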
All standard image compression techniques can be applied to compress the canonical DRRs, but it is worth noting that the decompression needs to be performed during 2-D/3-D registration and hence increases the computational load. Therefore, image compression methods with computationally efficient decompression are preferred. In addition, if the image similarity measure used in 2-D/3-D registration is known, the canonical DRRs can also be encoded specifically for the target similarity measure, to both reduce the memory footprint and offload part of the similarity measure calculation to the pre-generation stage. In this chapter, we gave an example of encoding the images' gradients in the library for the calculation of GC.

The proposed libDRR-based methods achieved a ∼10 times speedup compared to their rcDRR-based counterparts, mainly because of the fast generation of libDRRs. In our experiment environment, the CPU-based implementation of libDRR generation achieved 0.76 ms per generation, while the GPU-based implementation of rcDRR required 6.4 ms per generation. With libDRR and BOBYQA, the 2-D/3-D registration can be accomplished within 40 to 100 ms, resulting in a registration speed of 10 to 25 FPS. It is worth noting that the above speed is achieved with our purely CPU-based implementation without any parallelization. Since both the libDRR generation and the similarity measure calculation are highly parallelizable by nature, a significant further speedup (e.g., >5 times) of the proposed method could be achieved simply by a parallelized GPU implementation.

Compared to methods using rcDRRs, the proposed method using libDRRs achieved a significant improvement in computational efficiency without degradation in registration success rate. This suggests that the proposed hybrid optimization schemes retain the strength of their continuous counterparts in finding the correct parameters to maximize the energy function.
The degradation in mean mTREproj using libDRR is more noticeable, i.e., 9.1∼16.2%, which is expected because two dimensions of the optimization space are constrained to be discrete when using libDRR. Nevertheless, the proposed libDRR-based methods still retain the high registration accuracy of intensity-based 2-D/3-D registration, and resulted in significantly higher accuracy than other real-time 6-DoF pose recovery methods, as demonstrated by our experiments. This is owing to our hybrid optimization scheme, which is continuous in the majority (i.e., 4) of the optimization dimensions and uses a dense sampling (i.e., 1 degree) of the remaining 2 optimization dimensions.

We have also shown that the proposed method achieves the best performance using the adapted BOBYQA for hybrid optimization. In particular, the registration frame rate achieved by BOBYQA is 1.5∼2 times that achieved by Hill Climbing and Nelder-Mead. To achieve similar accuracy, BOBYQA only requires 50∼60% of the function evaluations required by the other two optimization methods. This demonstrates that BOBYQA has a significant advantage in rapid convergence, and it is therefore the recommended optimization method to be used with the proposed method.

Chapter 4

Rigid-body Object Motion Estimation: Real-time 2D/3D Registration via CNN Regression

4.1 Overview

In this chapter, we introduce a new paradigm for 2-D/3-D registration, referred to as Pose Estimation via Hierarchical Learning (PEHL), which uses CNNs to achieve real-time 2-D/3-D registration with a large capture range and high accuracy. The work described in this chapter has been published in [7]. The key to this approach is to train CNN regressors to recover the mapping from the DRR and fluoroscopic images to the difference of their underlying transformation parameters. Such a mapping is highly complex, and training regressors to recover it is far from trivial.
In the proposed method, we achieve this by first simplifying the non-linear relationship using the following three algorithmic strategies, and then capturing the mapping using CNN regressors with a strong non-linear modeling capability.

• Local Image Residual (LIR): To simplify the underlying mapping to be captured by the regressors, we introduce an LIR feature for regression, which is approximately 3-D pose-indexed, i.e., it is only affected by the difference between the initial and ground truth transformation parameters.

• Parameter Space Partitioning (PSP): We partition the transformation parameter space into zones and train CNN regressors in each zone separately, to break down the complex regression task into multiple simpler sub-tasks.

• Hierarchical Parameter Regression (HPR): We decompose the transformation parameters and regress them in a hierarchical manner, to achieve highly accurate parameter estimation in each step.

The remainder of this chapter is organized as follows. Section 3.2 provides background on 2-D/3-D registration. Section 4.2 discusses formulations of 2-D/3-D registration and describes the proposed regression-based formulation. Section 4.3 presents the proposed PEHL approach. Section 4.4 describes the validation datasets, evaluation metrics and experiment configurations. Section 4.5 presents the experimental results. Section 4.6 discusses the strengths and limitations of the proposed CNN regression-based 2-D/3-D registration approach.

4.2 Problem Formulation

4.2.1 Revisit Intensity-based 2-D/3-D Registration

Revisiting the formulation of intensity-based 2-D/3-D registration reveals a fundamental shortcoming that leads to its low computational efficiency and poor robustness. As shown in Fig. 4.1, intensity-based 2-D/3-D registration formulates the problem as an optimization problem, where the transformation parameters are optimized to maximize the similarity (measured by a matching metric like MI) of the DRR and the fluoroscopic image.
In this formulation, the information provided by the DRR and the fluoroscopic image is compressed into a 1-D signal indicating the similarity of the two images, while the higher dimensional information in the images is not exploited at all. The unexploited information includes the appearance of the mismatch, the direction of the offset, etc., which could potentially indicate how the pose should be adjusted to align the two images. In addition, the matching metrics employed are often not very effective in measuring the alignment of the target object in the two images, as they are typically highly non-convex (i.e., have local maxima at false positions), which makes their optimization a challenging task. Due to the above shortcomings, intensity-based 2-D/3-D registration has an ineffective formulation that leads to its low computational efficiency (iterative optimization over a 1-D metric) and small capture range (optimization over a highly non-convex metric).

Figure 4.1: Formulation of intensity-based 2-D/3-D registration.

4.2.2 Regression-based 2-D/3-D Registration

Given the ineffectiveness of the intensity-based formulation of 2-D/3-D registration described above, we propose a more effective regression-based formulation, where a CNN regressor takes the DRR and the fluoroscopic image as input, and produces an update of the transformation parameters. Instead of converting the two images into a 1-D metric, the regression-based formulation takes advantage of the CNN to learn representations from the image intensities, and to reveal the mapping from the representations to the parameter updates. Using this formulation, the information in the DRR and the fluoroscopic image can be fully exploited to guide the update of the transformation.

Figure 4.2: Effects of the 6 rigid-body transformation parameters.
The regression-based formulation is mathematically described as follows.

A rigid-body 3-D transformation T can be parameterized by a vector t with 6 components. In our approach, the transformation is parameterized by 3 in-plane and 3 out-of-plane transformation parameters [92], as shown in Fig. 4.2. In particular, the in-plane transformation parameters include 2 translation parameters, tx and ty, and 1 rotation parameter, tθ. The effects of the in-plane transformation parameters are approximately 2-D rigid-body transformations. The out-of-plane transformation parameters include 1 out-of-plane translation parameter, tz, and 2 out-of-plane rotation parameters, tα and tβ. The effects of out-of-plane translation and rotation are scaling and shape changes, respectively.

Based on Eqn. (3.3), we denote the X-ray image with transformation parameters t as It, where the variables L and J are omitted for simplicity because they are non-varying for a given 2-D/3-D registration task. The inputs for 2-D/3-D registration are: 1) a 3-D object described by its X-ray attenuation map J, 2) an X-ray image Itgt, where tgt denotes the unknown ground truth transformation parameters, and 3) initial transformation parameters tini. The 2-D/3-D registration problem is formulated as a regression problem, where a set of regressors f(·) are trained to reveal the mapping from a feature X(tini, Itgt) extracted from the inputs to the parameter residuals, tgt − tini, as long as they are within a capture range e:

tgt − tini ≈ f(X(tini, Itgt)), ∀ tgt − tini ∈ e.  (4.1)

An estimation of tgt is then obtained by applying the regressors and incorporating the estimated parameter residuals into tini:

t̂gt = tini + f(X(tini, Itgt)).  (4.2)

It is worth noting that the range e in Eqn. (4.1) is equivalent to the capture range of optimization-based registration methods. Based on Eqn.
(4.1), our problem formulation can be expressed as designing a feature extractor X(·) and training regressors f(·), such that

δt ≈ f(X(t, It+δt)), ∀ δt ∈ e.  (4.3)

In the next section, we will discuss in detail 1) how the feature X(t, It+δt) is calculated, and 2) how the regressors f(·) are designed, trained and applied.

4.3 Pose Estimation via Hierarchical Learning

4.3.1 Parameter Space Partitioning

We aim at training regressors to recover the mapping from the feature X(t, It+δt) to the parameter residuals δt. Since the feature naturally depends on t, the target mapping could vary significantly as t changes, which makes it highly complex and difficult to recover accurately. Ideally, we would like to extract a feature that is sensitive to the parameter residuals δt and insensitive to the parameters t. Such features are referred to as pose-indexed features, and this property can be expressed as:

X(t1, It1+δt) ≈ X(t2, It2+δt), ∀ (t1, t2).  (4.4)

As we will show in Section 4.3.2, we use ROIs to make X(t, It+δt) invariant to the in-plane and scaling parameters, (tx, ty, tz, tθ). However, we are unable to make X(t, It+δt) insensitive to tα and tβ, because they cause complex appearance changes in the projection image. To solve this problem, we partition the parameter space spanned by tα and tβ into an 18×18 grid (empirically selected in our experiment). Each square in the grid covers a 20×20-degree area and is referred to as a zone. We will show in Section 4.3.2 that for (tα, tβ) within each zone, our LIR feature is approximately pose-indexed, i.e.,

Xk(t1, It1+δt) ≈ Xk(t2, It2+δt), ∀ (t1, t2) ∈ Ωk,  (4.5)

where Xk(·, ·) denotes the LIR feature extractor for the k-th zone, and Ωk denotes the area covered by the k-th zone.
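The 18×18 partition amounts to a simple lookup from (tα, tβ) to a zone index. The sketch below makes the assumption, ours for illustration, that the two angles lie in [−180°, 180°).

```python
def zone_index(t_alpha, t_beta, zone_deg=20.0, grid=18):
    """Map out-of-plane rotations in degrees (assumed in [-180, 180))
    to the (row, col) of the zone grid; each zone spans 20x20 degrees."""
    row = int((t_alpha + 180.0) // zone_deg)
    col = int((t_beta + 180.0) // zone_deg)
    # clamp the upper boundary so exactly +180 would stay in the grid
    return min(row, grid - 1), min(col, grid - 1)
```

At application time, such a lookup selects which zone's 3-D points and regressors to retrieve for the current pose estimate.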
The regressors are therefore trained separately for each zone, to recover the simplified mapping that is insensitive to t.

4.3.2 Local Image Residual

Calculation of Local Image Residual

The LIR feature is calculated as the difference between the DRR rendered using transformation parameters t, denoted by It, and the X-ray image It+δt, computed in local patches. To determine the locations, sizes and orientations of the local patches, a number of 3-D points are extracted from the 3-D model of the target object following the steps described in Section 4.3.2. Given a 3-D point p and parameters t, a square local ROI is uniquely determined in the 2-D imaging plane, described by a triplet, (q, w, φ), denoting the ROI's center, width and orientation, respectively. The center q is the 2-D projection of p using transformation parameters t. The width w = w0 · D/tz, where w0 is the size of the ROI in mm and D is the distance between the X-ray source and the detector. The orientation φ = tθ, so that the ROI is always aligned with the object. We define an operator Htp(·) that extracts the image patch in the ROI determined by p and t, and re-samples it to a fixed size (52×52 in our applications). Given N 3-D points, P = {p1, ..., pN}, the LIR feature is then computed as

X(t, It+δt, P) = {Htpi(It) − Htpi(It+δt)}, i = 1, ..., N.  (4.6)

Figure 4.3: Workflow of LIR feature extraction, demonstrated on X-ray Echo Fusion data. The local ROIs determined by the 3-D points P and the transformation parameters t are shown as red boxes. The blue box shows a large ROI that covers the entire object, used in compared methods as will be discussed in Section 4.4.4.

In a local area of It, the effect of varying tα and tβ within a zone is approximately a 2-D translation. Therefore, by extracting local patches from ROIs selected based on t, the effects of all 6 transformation parameters in t are compensated, making Htp(It) approximately invariant to t.
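The ROI triplet (q, w, φ) can be computed per extracted point as below. The values of w0 and D are hypothetical examples (ROI size in mm and source-to-detector distance in mm), and the 2-D projection of p is assumed to be precomputed; this is a sketch of the geometry described above, not the thesis code.

```python
import numpy as np

def local_roi(q_proj, t_z, t_theta, w0=20.0, D=1000.0):
    """ROI triplet for one extracted 3-D point: the center q is the 2-D
    projection of p under t (assumed given), the width scales the metric
    size w0 by the perspective factor D / t_z, and the orientation
    follows the in-plane rotation t_theta."""
    q = np.asarray(q_proj, dtype=float)
    w = w0 * D / t_z
    phi = t_theta
    return q, w, phi

q, w, phi = local_roi([256.0, 300.0], t_z=800.0, t_theta=5.0)
# w = 20 * 1000 / 800 = 25.0; phi = 5.0
```

Each ROI's image content is then resampled to the fixed 52×52 patch size before the residual in Eqn. (4.6) is formed.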
Since the difference between Htp(It+δt) and Htp(It) is merely an additional 2-D transformation caused by δt, Htp(It+δt) is also approximately invariant to t. The workflow of LIR feature extraction is shown in Fig. 4.3.

Extraction of 3-D Points

The 3-D points used for calculating the LIR feature are extracted separately for each zone in two steps. First, 3-D points that correspond to 2-D edges are extracted as candidates. Specifically, the candidates are extracted by thresholding pixels with high gradient magnitudes in a synthetic X-ray image (i.e., generated using DRR) with tα and tβ at the center of the zone, and then back-projecting them to the corresponding 3-D structures. The formation model of gradients in X-ray images has been shown in [93] to be:

g(p) = ∫ η(L(p, r)) dr,  (4.7)

where g(p) is the magnitude of the X-ray image gradient at the point p, and η(·) can be computed from µ(·) and the X-ray perspective geometry [93]. We back-project p to L(p, r0), where

r0 = arg max_r η(L(p, r)),  (4.8)

if

∫ from r0−σ to r0+σ of η(L(p, r)) dr ≥ 0.9 · g(p).  (4.9)

The condition in Eqn. (4.9) ensures that the 3-D structure around L(p, r0) "essentially generates" the 2-D gradient g(p), because the contribution of η(·) within a small neighborhood (i.e., σ = 2 mm) of L(p, r0) accounts for the majority (i.e., ≥ 90%) of the magnitude of g(p). In other words, we find the dominant 3-D structure corresponding to the gradient in the X-ray image.

Second, the candidates are filtered so that only the ones leading to LIRs satisfying Eqn. (4.4) and not significantly overlapping are kept. To achieve this, we randomly generate M parameter vectors {tj} with tα and tβ within the zone, and M residuals {δtk} within the capture range e (M = 1000 in our applications). The intensity of the n-th pixel of Htjpi(Itj) − Htjpi(Itj+δtk) is denoted as hn,i,j,k.

Algorithm 2 PEHL: Application Stage
 1: procedure REGISTER(t, I, k)
 2:   repeat
 3:     Retrieve P for the zone covering (tα, tβ)
 4:     Retrieve f(·) for the zone covering (tα, tβ)
 5:     Calculate X(t, It+δt, P)        ▷ Eqn. (4.6)
 6:     t{x,y,θ} ← t{x,y,θ} + f{x,y,θ}(X)
 7:     Calculate X(t, It+δt, P)        ▷ Eqn. (4.6)
 8:     t{α,β} ← t{α,β} + f{α,β}(X)
 9:     Calculate X(t, It+δt, P)        ▷ Eqn. (4.6)
10:     tz ← tz + fz(X)
11:   until reaching k iterations
12:   return t

The following two measurements are computed for all candidates:

Ei = ⟨(hn,i,j,k − ⟨hn,i,j,k⟩j)²⟩n,j,k,  (4.10)

Fi = ⟨(hn,i,j,k − ⟨hn,i,j,k⟩k)²⟩n,j,k,  (4.11)

where ⟨·⟩ is an averaging operator with respect to all indexes in the subscript. Since Ei and Fi measure the sensitivity of Htpi(It) − Htpi(It+δt) with respect to t and δt, respectively, an ideal LIR should have a small Ei to satisfy Eqn. (4.4) and a large Fi for regressing δt. Therefore, the candidate list is filtered by picking the candidate with the largest Fi/Ei, and then removing other candidates whose ROIs overlap it by more than 25%. This process repeats until the list is empty.

4.3.3 Hierarchical Parameter Regression

Instead of regressing the 6 parameters together, which makes the mapping to be regressed more complex as multiple confounding factors are involved, we divide them into the following 3 groups, and regress them hierarchically:

• Group 1: In-plane parameters: δtx, δty, δtθ
After a group of parameters are regressed, the fea-ture X(t, It+δt) is re-calculated using the already-estimated parameters forthe regression of the parameters in the next group. This way the mappingto te regressed for each group is simplified by limiting the dimension andremoving the compounding factors coming from those parameters in theprevious groups.The above HPR process can be repeated for a few iterations to achievethe optimal accuracy, and the result of the current iteration is used as thestarting position for the next iteration (Algorithm 2). The number of itera-tions can be empirically selected (e.g. 3 times in our application), as will bedescribed in Section CNN Regression ModelIn this section, we introduce a CNN regression model designed for 2-D/3-D registration task. During the last decade, impressive results in computervision task have been achieved using CNN, a feed-forward artificial neu-ral network that mimics the neuron connectivity of the animal visual cor-tex. CNN has a strong capability in modeling highly non-linear mapping,which has been widely recognized and testified on various challengingcomputer vision tasks. For this reason, CNN is particularly suitable for ourregression problem, where the mapping to be revealed is highly complexand non-linear.85There are two challenges in designing the CNN model: 1) it needs to beflexible enough to capture the complex mapping from X(t, It+δt) to δt, and2) it needs to be light-weighted enough to be forwarded in real-time andstored in RAM. Managing the memory footprint is particularly importantbecause regressors for all zones (in total 324) need to be loaded to RAM foroptimal speed. We employ the following CNN regression model to addressthese 2 challenges.Network StructureA CNN [94] regression model with the architecture shown in Fig. 4.4 istrained for each group in each zone. According to Eqn. (4.6), the input ofthe regression model consists of N channels, corresponding to N LIRs. TheCNN shown in Fig. 
4.5 is applied on each channel for feature extraction. The CNN consists of five layers: two 5×5 convolutional layers (C1 and C2), each followed by a 2×2 max-pooling layer (P1 and P2) with stride 2, and a fully-connected layer (F1) with 100 Rectified Linear Unit (ReLU) activation neurons. The feature vectors extracted from all input channels are then concatenated and connected to another fully-connected layer (F2) with 250 ReLU activation neurons. The output layer (F3) is fully connected to F2, with each output node corresponding to one parameter in the group. Since the N input channels have the same nature, i.e., they are LIRs at different locations, the weights of the N CNNs are shared to reduce the memory footprint by a factor of N.

In our experiment, we empirically selected the size of the ROI, which led to N ≈ 18. Using the CNN model shown in Fig. 4.4 with weight sharing, there are in total 660,500 weights for each group in each zone, excluding the output layer, which only has 250 × Nt weights, where Nt is the number of parameters in the group. If the weights are stored as 32-bit floats, around 2.5 MB is required for each group in each zone. Given 3 groups and 324 zones, there are in total 972 CNN regression models, and pre-loading all of them into RAM requires 2.39 GB, which is manageable for modern computers.

Figure 4.4: The structure of the CNN regression model.

Figure 4.5: Structure of the CNN applied for each input channel.

Training

The CNN regression models are trained exclusively on synthetic X-ray images, because they provide reliable ground truth labels with little need for laborious manual annotation, and because the number of real X-ray images can be limited. For each group in each zone, we randomly generate 25,000 pairs of t and δt. The parameters t follow a uniform distribution with tα and tβ constrained to the zone. The parameter errors δt also follow a uniform distribution, while 3 different distribution ranges are used for the 3 groups, as shown in Table 4.1.
The distribution ranges of δt for Group 1 are the target capture range that the regressors are designed for. The distribution ranges of δtx, δty and δtθ are reduced for Group 2, because they are close to zero after the regressors in the first group are applied. For the same reason, the distribution ranges of δtα and δtβ are reduced for Group 3. For each pair of t and δt, a synthetic X-ray image It+δt is generated and the LIR feature X(t, It+δt) is calculated following Eqn. (4.6).

The objective function to be minimized during training is the Euclidean loss, defined as:

\Phi = \frac{1}{K} \sum_{i=1}^{K} \| y_i - f(X_i; W) \|_2^2,   (4.12)

where K is the number of training samples, y_i is the label for the i-th training sample, W is a vector of weights to be learned, and f(X_i; W) is the output of the regression model parameterized by W on the i-th training sample. The weights W are learned using Stochastic Gradient Descent (SGD) [94], with a batch size of 64, momentum m = 0.9 and weight decay d = 0.0001. The update rule for W is:

V_{i+1} := m \cdot V_i - d \cdot \kappa_i \cdot W_i - \kappa_i \cdot \left\langle \frac{\partial \Phi}{\partial W} \Big|_{W_i} \right\rangle_{D_i},   (4.13)

W_{i+1} := W_i + V_{i+1},   (4.14)

where i is the iteration index, V is the momentum variable, \kappa_i is the learning rate at the i-th iteration, and \langle \partial\Phi/\partial W |_{W_i} \rangle_{D_i} is the derivative of the objective function with respect to W, computed on the i-th batch D_i and evaluated at W_i. The learning rate \kappa_i is decayed in each iteration following

\kappa_i = 0.0025 \cdot (1 + 0.0001 \cdot i)^{-0.75}.   (4.15)

The derivative \partial\Phi/\partial W is calculated using back-propagation. For weights shared in multiple paths, their derivatives in all paths are back-propagated separately and summed up for the weight update. The weights are initialized using the Xavier method [95], and mini-batch SGD is performed for 12,500 iterations (32 epochs).

Table 4.1: Distributions of randomly generated δt. U(a, b) denotes the uniform distribution between a and b.
The units for translations and rotations are mm and degrees, respectively.

        Group 1            Group 2            Group 3
δtx  ∼ U(−1.5, 1.5)     ∼ U(−0.2, 0.2)     ∼ U(−0.15, 0.15)
δty  ∼ U(−1.5, 1.5)     ∼ U(−0.2, 0.2)     ∼ U(−0.15, 0.15)
δtz  ∼ U(−15, 15)       ∼ U(−15, 15)       ∼ U(−15, 15)
δtθ  ∼ U(−3, 3)         ∼ U(−0.5, 0.5)     ∼ U(−0.5, 0.5)
δtα  ∼ U(−15, 15)       ∼ U(−15, 15)       ∼ U(−0.75, 0.75)
δtβ  ∼ U(−15, 15)       ∼ U(−15, 15)       ∼ U(−0.75, 0.75)

4.4 Experiment Setup

4.4.1 Datasets

We evaluated PEHL on datasets from the following 3 clinical applications to demonstrate its wide applicability for real-time 2-D/3-D registration:

1. Total Knee Arthroplasty (TKA) Kinematics: In the study of the kinematics of TKA, the 3-D kinematics of a knee prosthesis can be estimated by matching the 3-D model of the prosthesis with the fluoroscopic video of the prosthesis using 2-D/3-D registration [96]. We evaluated PEHL on a fluoroscopic video consisting of 100 X-ray images of a patient's knee joint taken at phases from full extension to maximum flexion after TKA. The size of the X-ray images is 1024×1024 with a pixel spacing of 0.36 mm. A 3-D surface model of the prosthesis was acquired by a laser scanner and converted to a binary volume for registration.

2. Virtual Implant Planning System (VIPS): VIPS is an intraoperative application that was established to facilitate the planning of implant placement in terms of orientation, angulation and length of the screws [11]. In VIPS, 2-D/3-D registration is performed to match the 3-D virtual implant with the fluoroscopic image of the real implant. We evaluated PEHL on 7 X-ray images of a volar plate implant mounted onto a phantom model of the distal radius. The size of the X-ray images is 1024×1024 with a pixel spacing of 0.223 mm. A 3-D CAD model of the volar plate was converted to a binary volume for registration.

3.
X-ray Echo Fusion (XEF): 2-D/3-D registration can be applied to estimate the 3-D pose of a transesophageal echocardiography (TEE) probe from X-ray images, which brings the X-ray and TEE images into the same coordinate system and enables the fusion of the two modalities [90]. We evaluated PEHL on 2 fluoroscopic videos with in total 94 X-ray images acquired during an animal study using a Siemens Artis Zeego C-Arm system. The size of the X-ray images is 1024×1024 with a pixel spacing of 0.154 mm. A micro-CT scan of the Siemens TEE probe was used for registration.

Example datasets of the above 3 clinical applications are shown in Fig. 4.6. Examples of local ROIs and LIRs extracted from the 3 datasets are shown in Fig. 4.7.

Ground truth transformation parameters used for quantifying registration error were generated by first manually registering the target object and then applying an intensity-based 2-D/3-D registration method using Powell's method combined with Gradient Correlation (GC) [36]. Perturbations of the ground truth were then generated as initial transformation parameters for 2-D/3-D registration. For TKA and XEF, 10 perturbations were generated for each X-ray image, leading to 1,000 and 940 test cases, respectively. Since the number of X-ray images for VIPS is limited (i.e., 7), 140 perturbations were generated for each X-ray image to create 980 test cases. The perturbation of each parameter followed a normal distribution with a standard deviation equal to 2/3 of the training range of the same parameter (i.e., Group 1 in Table 4.1). In particular, the standard deviations for (tx, ty, tz, tθ, tα, tβ) are 1 mm, 1 mm, 10 mm, 2 degrees, 10 degrees and 10 degrees.

Figure 4.6: Example data for (a) TKA, (b) VIPS and (c) XEF, including a 3-D model and a 2-D X-ray image of the object.

Figure 4.7: Examples of local ROIs and LIRs for (a) TKA, (b) VIPS and (c) XEF.
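Because each perturbed parameter is normal with a standard deviation of 2/3 of its training range, the fraction of perturbations landing entirely inside the training range follows from standard normal-distribution arithmetic: each parameter lies within ±1.5 standard deviations with probability erf(1.5/√2), and the six parameters are drawn independently. A small stdlib sketch (the 2/3 ratio and the Group 1 ranges are from the text; independence across parameters is implied by the sampling scheme):

```python
import math
import random

# Each perturbed parameter ~ N(0, sigma) with sigma = (2/3) * training range,
# so it lies inside [-range, range] with probability P(|Z| <= 1.5) = erf(1.5/sqrt(2)).
p_single = math.erf(1.5 / math.sqrt(2.0))
p_all_six = p_single ** 6          # six independent parameters, ~0.42

# Monte Carlo confirmation with the Group 1 ranges (mm / degrees).
ranges = [1.5, 1.5, 15.0, 3.0, 15.0, 15.0]
rng = random.Random(0)
n, hits = 100_000, 0
for _ in range(n):
    if all(abs(rng.gauss(0.0, 2.0 * r / 3.0)) <= r for r in ranges):
        hits += 1

print(f"analytic: {p_all_six:.4f}, simulated: {hits / n:.3f}")
```

Both estimates come out near 0.42, consistent with the proportion reported in the text.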
With this distribution, 42.18% of the perturbations have all 6 parameters within the training range, while the other 57.82% have at least one parameter outside of the training range.

Figure 4.8: Example synthetic X-ray images used for training: (a) TKA, (b) VIPS and (c) XEF.

4.4.2 Synthetic Training Data Generation

The synthetic X-ray images used for training were generated by blending a DRR of the object with a background from real X-ray images:

I = I_{Xray} + \gamma \cdot G_\sigma * I_{DRR} + \mathcal{N}(a, b),   (4.16)

where I_{Xray} is the real X-ray image, I_{DRR} is the DRR, G_\sigma denotes a Gaussian smoothing kernel with a standard deviation σ simulating the X-ray scattering effect, f * g denotes the convolution of f and g, γ is the blending factor, and \mathcal{N}(a, b) is random noise uniformly distributed between [a, b]. The parameters (γ, σ, a, b) were empirically tuned for each object (i.e., the implants and the TEE probe) to make the appearance of the synthetic X-ray images realistic. These parameters were also randomly perturbed within a neighborhood for each synthetic X-ray image to increase the variation in appearance, so that the regressors trained on them generalize well to real X-ray images. The background image used for a given synthetic image was randomly picked from a group of real X-ray images irrespective of the underlying clinical procedure, so that the trained network is not over-fitted to any specific type of background, which can vary significantly from case to case clinically. Examples of synthetic X-ray images of TKA, VIPS and XEF are shown in Fig. 4.8.

Figure 4.9: HAAR features used in the experiment "Without CNN".

4.4.3 Evaluation Metrics

The registration accuracy was assessed with the mean Target Registration Error in the projection direction (mTREproj) [91], calculated at the 8 corners of the bounding box of the target object. We regard an mTREproj of less than 1% of the size of the target object (i.e., the diagonal of its bounding box) as a successful registration.
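The corner-based error metric can be illustrated with a small sketch. The version below is a simplified stand-in rather than the exact formulation of [91]: it maps the 8 bounding-box corners through the ground-truth and estimated rigid transforms (here 3×4 matrices), projects both through an assumed pinhole model, and averages the in-plane distances. The focal length, example transforms and threshold values are illustrative only.

```python
import itertools
import math

def project(p, focal=1000.0):
    # Simple pinhole projection onto the detector plane (assumed model).
    x, y, z = p
    return (focal * x / z, focal * y / z)

def transform(T, p):
    # Apply a 3x4 homogeneous rigid transform to a 3-D point.
    return tuple(sum(T[i][j] * q for j, q in enumerate((*p, 1.0))) for i in range(3))

def mtre_proj(T_gt, T_est, bbox_min, bbox_max):
    # Mean in-plane distance between projections of the 8 bounding-box corners.
    corners = list(itertools.product(*zip(bbox_min, bbox_max)))
    dists = []
    for c in corners:
        (u1, v1), (u2, v2) = project(transform(T_gt, c)), project(transform(T_est, c))
        dists.append(math.hypot(u1 - u2, v1 - v2))
    return sum(dists) / len(dists)

# Hypothetical 37 mm object placed ~800 mm from the source, with a 0.2 mm
# in-plane translation error in the estimate.
identity = [[1, 0, 0, 0.0], [0, 1, 0, 0.0], [0, 0, 1, 800.0]]
shifted  = [[1, 0, 0, 0.2], [0, 1, 0, 0.0], [0, 0, 1, 800.0]]
err = mtre_proj(identity, shifted, (-18.5, -18.5, -18.5), (18.5, 18.5, 18.5))
success = err < 0.01 * math.dist((-18.5,) * 3, (18.5,) * 3)  # 1% of bbox diagonal
```

With this setup a sub-millimetre in-plane error stays below the 1%-of-diagonal success threshold, mirroring the success criterion defined above.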
For TKA, VIPS and XEF, the sizes of the target objects are 110 mm, 61 mm and 37 mm, respectively. Therefore, the success criteria for the three applications were set to an mTREproj of less than 1.10 mm, 0.61 mm and 0.37 mm, which are equivalent to 2.8 pixels, 3.7 pixels and 3.5 pixels on the X-ray image, respectively. Success rate was defined as the percentage of successful registrations. Capture range was defined as the initial mTREproj for which 95% of the registrations were successful [91]. Capture range is only reported for experiments where there are more than 20 samples within the capture range.

4.4.4 Performance Analysis

We conducted the following experiments for a detailed analysis of the performance and properties of PEHL. The dataset from XEF was used for the performance analysis because the structure of the TEE probe is more complex than the implants in TKA and VIPS, leading to an increased difficulty for accurate registration. As described in Section 4.3.3, PEHL can be applied for multiple iterations. We demonstrate the impact of the number of iterations on performance by applying PEHL for 10 iterations and showing the registration success rate after each iteration. We also demonstrate the importance of the individual core components of PEHL, i.e., the CNN regression model and the 3 algorithmic strategies, LIR, HPR and PSP, by disabling them and demonstrating the detrimental effects on performance. The following 4 scenarios were evaluated for 10 iterations to compare with PEHL:

• Without CNN: We implemented a companion algorithm using HAAR features [97] with a Regression Forest as an alternative to the proposed CNN regression model. We extracted 8 HAAR features, as shown in Fig. 4.9, from the same training data used for training the CNNs. We mainly used edge and line features because δt largely corresponds to lines and edges in the LIR.
On these HAAR features, we trained a Regression Forest with 500 trees.

• Without LIR: A global image residual covering the whole object was used as the input for regression (shown in Fig. 4.3 as blue boxes). The CNN regression model was adapted accordingly. It has five hidden layers: two 5×5 convolutional layers, each followed by a 3×3 max-pooling layer with stride 3, and a fully-connected layer with 250 ReLU activation neurons. For each group in each zone, the network was trained on the same dataset used for training PEHL.

• Without HPR: The proposed CNN regression model shown in Fig. 4.5 was employed, but the output layer has 6 nodes, corresponding to the 6 parameters of the rigid-body transformation. For each zone, the network was trained on the dataset used for training PEHL for Group 1.

• Without PSP: For each group of parameters, one CNN regression model was applied for the whole parameter space. Because LIR cannot be applied without PSP, the CNN regression model described in "Without LIR" was employed in this scenario. The network was trained on 500,000 synthetic training samples with tα and tβ uniformly distributed in the parameter space.

We also conducted an experiment to analyze the precision of PEHL (i.e., its ability to generate consistent results starting from different initial parameters). To measure the precision, we randomly selected an X-ray image from the XEF dataset, and generated 100 perturbations of the ground truth following the same distribution described in Section 4.4.1. PEHL and the best-performing intensity-based method, MI GC Powell (detailed in Section 4.4.5), were applied starting from the 100 perturbations. The precision of each registration method was then quantified by the Root Mean Squared Distance in the Projection Direction (RMSDproj) from the registered locations of each target to their centroid.
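The precision statistic reduces to a few lines: collect the registered location of a target point from each repeated run, take the centroid, and report the root-mean-squared distance to it. A minimal sketch (shown here on 2-D in-plane coordinates; the sample points are made up for illustration):

```python
import math

def rmsd_to_centroid(points):
    # points: registered locations of one target across repeated runs.
    # A perfectly repeatable registration would give RMSD = 0.
    n = len(points)
    centroid = tuple(sum(c) / n for c in zip(*points))
    sq = [sum((a - b) ** 2 for a, b in zip(p, centroid)) for p in points]
    return math.sqrt(sum(sq) / n)

# Four registered locations scattered symmetrically around (1, 0):
runs = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
print(rmsd_to_centroid(runs))  # 1.0
```

A lower value means the method lands on nearly the same solution regardless of the starting perturbation, which is exactly what the precision experiment measures.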
Smaller RMSDproj indicates higher precision.

4.4.5 Comparison with State-of-the-Art Methods

We first compare PEHL with several state-of-the-art intensity-based 2-D/3-D registration methods. An intensity-based method consists of two core components: an optimizer and a similarity measure. A recent study compared four popular optimizers (Powell's method, Nelder-Mead, BFGS and CMA-ES) for intensity-based 2-D/3-D registration, and concluded that Powell's method achieved the best performance [98]. Therefore, in all evaluated intensity-based methods, we used Powell's method as the optimizer. We evaluated three popular similarity measures, MI, CC and GC, which have also been reported to be effective in the recent literature [27][28][36]. For example, MI has been adopted in [27] for monitoring tumor motion during radiotherapy. CC computed on splatting DRRs has been adopted in [36] for 5-DoF pose estimation of a TEE probe. GC has been adopted in [28] for assessing the positioning and migration of bone implants. The above three intensity-based methods are referred to as MI Powell, CC Powell and GC Powell, indicating the adopted similarity measure and optimization method.

In addition to the above three methods, we implemented another intensity-based method combining MI and GC to achieve improved robustness and accuracy to compete with PEHL. MI focuses on matching the histograms at a global scale, which leads to a relatively large capture range but lacks fine accuracy. GC focuses on matching image gradients, which leads to high registration accuracy but limits the capture range.
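For readers unfamiliar with GC, one common formulation averages the normalized cross-correlation of the horizontal and vertical gradient images of the DRR and the X-ray image. The stdlib sketch below uses forward-difference gradients and an equal weighting of the two terms; these choices are illustrative assumptions, not the thesis's C++ implementation.

```python
import math

def _ncc(a, b):
    # Zero-mean normalized cross-correlation of two equally sized images.
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    ma, mb = sum(flat_a) / len(flat_a), sum(flat_b) / len(flat_b)
    num = sum((x - ma) * (y - mb) for x, y in zip(flat_a, flat_b))
    den = math.sqrt(sum((x - ma) ** 2 for x in flat_a)
                    * sum((y - mb) ** 2 for y in flat_b))
    return num / den

def _gradients(img):
    # Forward-difference gradients (last row/column dropped for simplicity).
    h, w = len(img), len(img[0])
    gx = [[img[i][j + 1] - img[i][j] for j in range(w - 1)] for i in range(h - 1)]
    gy = [[img[i + 1][j] - img[i][j] for j in range(w - 1)] for i in range(h - 1)]
    return gx, gy

def gradient_correlation(a, b):
    # GC = mean of the NCCs of the two gradient images.
    ax, ay = _gradients(a)
    bx, by = _gradients(b)
    return 0.5 * (_ncc(ax, bx) + _ncc(ay, by))

# A textured image compared against itself yields the maximal score of ~1.
img = [[(i * 7 + j * 13) % 5 + (i * j) % 3 for j in range(8)] for i in range(8)]
print(gradient_correlation(img, img))
```

Because the measure only looks at gradients, it rewards precise edge alignment (high accuracy) but gives little guidance when the images are far apart, which matches the capture-range limitation described above.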
The combined method, referred to as MI GC Powell, first applies MI Powell to bring the registration into the capture range of GC, and then applies GC Powell to refine the registration.

We also compared PEHL with CLARET, a linear regression-based 2-D/3-D registration method introduced in [46], which is closely related to PEHL, as it iteratively applies regressors to the image residual to estimate the transformation parameters. In [46], the linear regressors were reported to be trained on X-ray images with fixed ground truth transformation parameters, and therefore can only be applied to X-ray images with poses within a limited range. Since the input X-ray images used in our experiment do not have such a limitation, we applied the PSP strategy to train linear regressors separately for each zone. For each zone, the linear regressor was trained on the dataset used for training PEHL for Group 1.

4.4.6 Experiment Environment

The experiments were conducted on a workstation with an Intel Core i7-4790k CPU, 16 GB RAM and an Nvidia GeForce GTX 980 GPU. For the intensity-based methods, the most computationally intensive component, the DRR renderer, was implemented using the Ray Casting algorithm with hardware-accelerated 3-D texture lookups on the GPU. Similarity measures were implemented in C++ and executed on a single CPU core. Both DRRs and similarity measures were only calculated within an ROI surrounding the target object, for better computational efficiency. In particular, ROIs of size 256×256, 512×512 and 400×400 were used for TKA, VIPS and XEF, respectively. For PEHL, the neural network was implemented with cuDNN acceleration using an open-source deep learning framework, Caffe [99].

4.5 Results

4.5.1 Performance Analysis

Fig. 4.10 shows the success rate as the number of iterations increases from 1 to 10 for the five analyzed scenarios.
The results show that the success rate of PEHL increased rapidly over the first 3 iterations (i.e., from 44.6% to 94.8%), and kept rising slowly until 9 iterations (i.e., to 99.6%). The computation time of PEHL is linear in the number of iterations, i.e., each iteration takes ∼34 ms. Therefore, applying PEHL for 3 iterations is the optimal setting for the trade-off between accuracy and efficiency, achieving close to the optimal success rate and real-time registration at ∼10 frames per second (fps). In the rest of the experiments, PEHL was therefore tested with 3 iterations unless stated otherwise.

Figure 4.10: Success rates of PEHL with 1 to 10 iterations. Four individual core components of PEHL, i.e., CNN, LIR, HPR and PSP, were disabled one at a time to demonstrate their detrimental effects on performance. Haar feature (HF) + Regression Forest (RF) was implemented to show the effect on performance without CNN. These results were generated on the XEF dataset.

The results show that the 3 proposed strategies, LIR, HPR and PSP, and the use of CNN all noticeably contributed to the final registration accuracy of PEHL. In particular, if the CNN regression model is replaced with HAAR features + Regression Forest, the success rate at the 3rd iteration dropped to 70.7%, indicating that the strong non-linear modeling capability of CNN is critical to the success of PEHL. If the LIR is replaced with a global image residual, the success rate at the 3rd iteration dropped significantly to 52.2%, showing that LIR is a necessary component to simplify the target mapping so that it can be robustly regressed with the desired accuracy.
When HPR and PSP are disabled, the system almost completely failed, dropping the success rate at the 3rd iteration to 19.5% and 14.9%, respectively, suggesting that HPR and PSP are key components that make the regression problem solvable using the proposed CNN regression model.

Figure 4.11: RMSDproj from the registered locations of each target to their centroid using MI GC Powell and PEHL with 1 to 10 iterations. At Number of Iterations = 0, the RMSDproj at the perturbed positions without registration is shown. These results were generated on the XEF dataset.

Fig. 4.11 shows the RMSDproj from the registered target points to their corresponding centroid using both MI GC Powell and PEHL with 1 to 10 iterations. The results show that as the number of iterations increases, the RMSDproj of PEHL approaches zero, indicating that with a sufficient number of iterations, PEHL can reliably reproduce the same result starting from different positions (e.g., 6 iterations leads to RMSDproj = 0.005 mm). At the 3rd iteration, the RMSDproj of PEHL is 0.198 mm, which is 62% smaller than that of MI GC Powell, i.e., 0.52 mm. These results suggest that PEHL has a significant advantage over MI GC Powell in terms of precision.

Table 4.2: RMSE of the 6 transformation parameters yielded by PEHL and CLARET on the training data for XEF.

          tx (mm)  ty (mm)  tz (mm)  tθ (°)  tα (°)  tβ (°)
Start      0.86     0.86     8.65     1.71    8.66    8.66
PEHL       0.04     0.04     0.32     0.06    0.18    0.18
CLARET     0.51     0.88    34.85     2.00   19.41   17.52

Table 4.3: Quantitative experimental results of PEHL and the baseline methods. Success rate is the percentage of successful registrations in each experiment. Capture range is the initial mTREproj for which 95% of the registrations were successful.
The 10th, 25th, 50th, 75th and 90th percentiles of mTREproj are reported. Running time records the average and standard deviation of the computation time per registration in each experiment. Capture range is only reported for experiments where there are more than 20 samples within the capture range.

Application  Method        Success  Capture     mTREproj Percentile (mm)               Running
                           Rate     Range (mm)  10th   25th   50th   75th    90th      Time (s)
TKA          Start         N/A      N/A         3.285  4.627  6.979  10.050  12.667    N/A
             MI Powell     36.2%    N/A         0.437  0.746  1.693  6.238   8.421     1.37±0.44
             CC Powell     43.8%    1.88        0.348  0.637  1.359  6.321   8.398     0.92±0.27
             GC Powell     45.2%    2.14        0.330  0.588  1.313  7.615   9.765     2.52±1.22
             MI GC Powell  51.8%    2.83        0.299  0.521  1.048  6.408   8.614     3.11±0.94
             PEHL          79.6%    7.23        0.333  0.444  0.593  0.903   6.733     0.11±0.00
VIPS         Start         N/A      N/A         1.180  1.521  2.003  2.594   3.101     N/A
             MI Powell     75.1%    N/A         0.156  0.234  0.375  0.604   0.917     1.66±0.60
             CC Powell     57.7%    0.89        0.187  0.303  0.535  0.851   1.293     0.91±0.31
             GC Powell     78.7%    1.12        0.121  0.207  0.325  0.543   2.283     3.91±1.55
             MI GC Powell  92.7%    2.77        0.106  0.170  0.259  0.367   0.535     4.71±1.59
             PEHL          99.7%    5.51        0.151  0.181  0.244  0.389   0.451     0.10±0.00
XEF          Start         N/A      N/A         1.048  1.369  1.826  2.307   2.790     N/A
             MI Powell     69.7%    N/A         0.165  0.207  0.280  0.403   0.598     0.79±0.29
             CC Powell     54.8%    N/A         0.117  0.168  0.321  0.893   1.173     0.40±0.10
             GC Powell     56.9%    N/A         0.071  0.135  0.279  1.055   3.150     2.06±1.05
             MI GC Powell  89.1%    0.84        0.047  0.098  0.174  0.273   0.380     2.03±0.69
             PEHL          94.5%    3.33        0.082  0.113  0.148  0.195   0.243     0.10±0.00

Figure 4.12: mTREproj before and after registration using MI GC Powell and PEHL on the TKA, VIPS and XEF datasets.

4.5.2 Comparison with State-of-the-Art Methods

We first observed that the linear regressor in CLARET completely failed in our experiment setup. Table 4.2 shows the root mean squared error (RMSE) of the 6 parameters yielded by PEHL and CLARET on the synthetic training data for XEF. The linear regression resulted in very large errors on the training data (i.e., larger than the perturbation), indicating that the mapping from the global image residual to the underlying transformation parameters is highly non-linear, and therefore cannot be reliably captured by a linear regressor. In comparison, PEHL employs the 3 algorithmic strategies to simplify the non-linear relationship and captures it using a CNN with a strong non-linear modeling capability. As a result, PEHL yielded very small errors on the synthetic training data. Its ability to generalize to unseen testing data was then assessed on real data from the three clinical applications.

Table 4.3 summarizes the success rate, capture range, percentiles of mTREproj and average running time per registration for PEHL and the four intensity-based methods on the three applications. The results show that the four intensity-based methods, MI Powell, GC Powell, CC Powell and MI GC Powell, all resulted in relatively small capture ranges and speeds that are incapable of real-time registration. The small capture range is owing to the limitation of non-convex optimization: because the intensity-based similarity measures are highly non-convex, the optimizer is likely to get trapped in local maxima if the starting position is not close enough to the global maximum. The relatively slow speed is owing to the large number of DRR renderings and similarity measure calculations during the optimization.
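The success rate and capture range statistics in Table 4.3 can be computed from per-case (initial mTREproj, final mTREproj) pairs. The sketch below uses 1-mm bins of initial error for the 95% criterion, which is one common reading of the definition in [91]; the binning choice and the synthetic numbers are illustrative assumptions, not the thesis's evaluation code.

```python
def success_rate(cases, threshold):
    # cases: list of (initial_mtre, final_mtre) pairs;
    # a registration succeeds when its final error is below the threshold.
    return sum(final < threshold for _, final in cases) / len(cases)

def capture_range(cases, threshold, bin_mm=1.0):
    # Largest initial-error value such that every 1-mm bin up to it,
    # starting from zero, has a >= 95% success rate (assumed reading of [91]).
    bins = {}
    for init, final in cases:
        bins.setdefault(int(init // bin_mm), []).append(final < threshold)
    reach = 0.0
    for b in sorted(bins):
        if b != reach / bin_mm:          # bins must be contiguous from zero
            break
        oks = bins[b]
        if sum(oks) / len(oks) < 0.95:
            break
        reach = (b + 1) * bin_mm
    return reach

# Toy data: starts below ~2.5 mm always succeed, starts above 3.5 mm all fail.
cases = ([(0.5 + 0.01 * k, 0.1) for k in range(200)]
         + [(3.5 + 0.01 * k, 5.0) for k in range(100)])
print(success_rate(cases, 0.37), capture_range(cases, 0.37))
```

On the toy data, two thirds of the cases succeed and the 95% criterion holds through the first three 1-mm bins, giving a 3 mm capture range.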
The fastest intensity-based method is CC Powell, which took 0.4∼0.9 s per registration and is still significantly slower than typical fluoroscopic frame rates (i.e., 10∼15 fps). The success rates of MI Powell, GC Powell and CC Powell are also very low, mainly for two different reasons: 1) MI and CC are unable to resolve small mismatches; and 2) GC is unable to recover large mismatches. By employing MI Powell to recover large mismatches and GC Powell to resolve small mismatches, MI GC Powell achieved much higher success rates, which is in line with our discussion in Section 4.4.5.

The results show that PEHL achieved the best success rate and capture range among all evaluated methods on all three applications, and is capable of real-time registration. The advantage of PEHL in capture range compared to the 2nd best-performing method, MI GC Powell, is significant. In particular, on the three applications, PEHL resulted in a 155% (TKA), 99% (VIPS) and 306% (XEF) larger capture range than MI GC Powell, respectively. The success rates of PEHL are also higher than those of MI GC Powell, by 27.8% (TKA), 5% (VIPS) and 5.4% (XEF). The advantage of PEHL in capture range and robustness is primarily owing to the learning of a direct mapping from the LIR to the residual of the transformation parameters, which eliminates the need to optimize over a highly non-convex similarity measure.
PEHL resulted in a running time of ∼0.1 s per registration for all three applications, which is 20∼45 times faster than that of MI GC Powell and leads to real-time registration at ∼10 fps. In addition, because the computation involved in PEHL is fixed for each registration, the standard deviation of the running time of PEHL is almost zero, so that PEHL can provide real-time registration at a stable frame rate. In comparison, intensity-based methods require a different number of iterations for each registration, depending on the starting position, which leads to a relatively large standard deviation of the running time. The mTREproj percentiles show that at the lower percentiles (e.g., 10th and 25th), the mTREproj of PEHL is in general larger than that of MI GC Powell. This is partially owing to the fact that the ground truth parameters were generated using GC, which could carry a slight bias toward intensity-based methods using GC as the similarity measure. At the higher percentiles (e.g., 75th and 90th), the mTREproj of PEHL becomes smaller than that of MI GC Powell, showing that PEHL is more robust than MI GC Powell. The distributions of mTREproj before and after registration using MI GC Powell and PEHL on the three applications are shown in Fig. 4.12.

4.6 Discussion

In this chapter, we presented a CNN regression approach, PEHL, for real-time 2-D/3-D registration. To successfully solve 2-D/3-D registration problems using regression, we introduced 3 novel algorithmic strategies, LIR, HPR and PSP, to simplify the underlying mapping to be regressed, and designed a CNN regression model with a strong non-linear modeling capability to capture the mapping. We furthermore validated that all 3 algorithmic strategies and the CNN model are important to the success of PEHL, by disabling them from PEHL and showing the detrimental effects on performance.
We empirically found that applying PEHL for 3 iterations is the optimal setting, leading to close to the optimal success rate and a real-time registration speed of ∼10 fps. We also demonstrated that PEHL has a strong ability to reproduce the same registration result from different initial positions, by showing that the RMSDproj of registered targets approaches zero (i.e., 0.005 mm) as the number of iterations of PEHL increases to 6. In comparison, the RMSDproj using the best-performing intensity-based method, MI GC Powell, is 0.52 mm. On three potential clinical applications, we compared PEHL with 4 intensity-based 2-D/3-D registration methods and a linear regression-based method, and showed that PEHL achieved much higher robustness and a larger capture range. In particular, PEHL increased the capture range by 99%∼306% and the success rate by 5%∼27.8% compared to MI GC Powell. We also showed that PEHL achieves significantly higher computational efficiency than intensity-based methods, and is capable of real-time registration.

The significant advantage of PEHL in robustness and computational efficiency over intensity-based methods is mainly owing to the fact that the CNN regressors are trained to capture the mapping from LIRs to the underlying transformation parameters. In every iteration, PEHL fully exploits the rich information embedded in the LIR to make an informed estimation of the transformation parameters, and therefore it is able to achieve highly robust and accurate registration with only a minimal number of iterations. In comparison, intensity-based methods always map the DRR and X-ray images to a scalar-valued merit function, where the information about the transformation parameters embedded in the image intensities is largely lost.
The registration problem is then solved by heuristically optimizing this scalar-valued merit function, which leads to inefficient iterative computation and a high chance of getting trapped in local maxima.

The results also show that PEHL is more accurate and robust than two accelerated intensity-based 2-D/3-D registration methods, Sparse Histogramming MI (SHMI) [33] and Direct Splatting Correlation (DSC) [36], which employ sub-sampled DRRs and splatting DRRs to quickly compute approximated MI and CC, respectively. Because of the approximation, SHMI and DSC theoretically achieve the same or degraded accuracy compared to using the original MI and CC. As shown in Table 4.3, all reported mTREproj percentiles of PEHL are lower than those of MI Powell and CC Powell, and the differences at the mid-range percentiles (i.e., 25th, 50th and 75th) are quite significant. In particular, at the 50th percentile, the mTREproj of PEHL is 25%∼65% lower than that of MI Powell and CC Powell on all three applications. These results suggest that PEHL significantly outperforms SHMI and DSC in terms of robustness and accuracy. In terms of computational efficiency, while all three methods are capable of real-time registration, with an efficient GPU implementation, DSC reported the highest registration speed (i.e., 23.6∼92.3 fps) [36].

PEHL employs HPR and PSP to break the complex regression problem into several much simpler problems. The PSP strategy divides the parameter space of out-of-plane rotations into small zones, and trains regressors for each zone separately. Smaller zones make the regression task simpler, but at the same time increase the training effort and the memory consumption at runtime. In our experiments, we empirically selected the size of each zone in the PSP strategy to be 20×20 degrees, which leads to satisfactory registration accuracy and robustness and keeps the number of zones manageable (i.e., 324).
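The PSP zone layout amounts to a simple lookup from the two out-of-plane rotations to a zone index. The 20×20 degree zone size and the 324-zone total are from the text; the assumption that tα and tβ each cover a full 360° range (18 bins per axis, 18×18 = 324) is ours, made only to match the reported count.

```python
ZONE_DEG = 20
BINS = 360 // ZONE_DEG             # 18 bins per out-of-plane rotation axis

def zone_index(t_alpha, t_beta):
    # Map the two out-of-plane rotations (degrees) to one of the 324 zones.
    a = int((t_alpha % 360) // ZONE_DEG)
    b = int((t_beta % 360) // ZONE_DEG)
    return a * BINS + b

n_zones = BINS * BINS              # 324 zones, as reported in the text
n_regressors = 3 * n_zones         # 3 HPR groups per zone -> 972 regressors
```

At runtime, the current estimates of tα and tβ select the zone, and only that zone's three group regressors are evaluated, which is what makes the per-zone mappings simple enough to regress accurately.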
The HPR strategy divides the 6 parameters into 3 groups, and trains regressors for each group separately. Therefore, there are in total 324×3 = 972 regressors to be trained and loaded at runtime. In order to make the memory footprint required by the 972 regressors manageable at runtime, we use a weight-sharing mechanism in the CNN model, which leads to 2.39 GB of total memory consumption. In comparison, without weight sharing, pre-loading the same CNN regression models used in PEHL would require 13.7 GB of RAM, which could be impractical in a clinical setup.

Like many machine learning-based methods, an important factor in the success of PEHL is the quantity and quality of the training data. For PEHL, it has been a challenge to obtain a sufficient amount of annotated real X-ray images for training, because accurate annotation of a 3-D transformation on an X-ray projection image is very difficult, especially for the out-of-plane parameters. We have shown that by generating well-simulated synthetic data and training the CNN on synthetic data only, we can achieve high performance when applying PEHL to real X-ray images. However, it is worth noting that if the object to be registered is a device or implant manufactured with a fixed design, it is also possible to have a factory setup to massively acquire real X-ray images with known ground truth for training PEHL.

Chapter 5

Conclusions and Future Works

In this thesis, we studied image registration for motion estimation in medical imaging environments, and investigated two problems: 1) deformable respiratory motion estimation from dynamic MRI, and 2) rigid-body object motion estimation from fluoroscopic images.
In this chapter, we summarize the findings presented in this thesis and discuss potential future works.

5.1 Conclusions

For respiratory motion estimation, we showed that previous motion modeling methods produce inaccurate estimates when unexpected motion patterns are presented, because the motion model cannot fully cover all variations of respiratory motion patterns. To address this limitation, we proposed a dynamic MRI data acquisition protocol to acquire imaging data for monitoring the patient's respiratory motion in real-time, and a scatter-to-volume registration method to recover the 3-D motion field purely from the dynamic MRI data. Since the proposed method does not rely on any respiratory motion model, it is robust to variations in respiratory motion patterns. We presented validation results of the proposed method on synthetic MRI data of human subjects with varying breathing patterns, as well as on real MRI data acquired from canine studies. The results on synthetic MRI data demonstrated the advantage of the proposed method in addressing variations of respiratory motion patterns over the state-of-the-art model-based methods. We also showed that the proposed method achieves comparable accuracy to model-based methods even when there is minimal or no variation in the subject's breathing pattern. The efficacy of the proposed MRI data acquisition protocol and scatter-to-volume registration method was also validated on real MRI data acquired in two canine studies. We further demonstrated the proposed method on the most directly related application, i.e., motion-corrected PET imaging using hybrid PET/MRI systems, where the MRI data is used for monitoring the patient's motion during PET data acquisition.
It was shown that motion-corrected PET images produced using the proposed method have fewer motion artifacts and higher SNR compared to previous methods.

For non-anatomical object motion estimation, we first investigated using pre-generated DRRs to accelerate DRR generation during 2-D/3-D registration. We found that by parameterizing the 6-DoF rigid-body transformation using 2 geometry-relevant and 4 geometry-irrelevant parameters, all DRRs with the same geometry-relevant parameters can be generated from the same canonical-form DRR using simple and computationally efficient 2-D transformations. Based on this finding, we proposed a real-time 2-D/3-D registration method using a library of canonical DRRs with hybrid optimization methods. The proposed method has been validated on 2-D/3-D registration and 6-DoF pose tracking of a TEE probe. We showed that using the proposed library-based DRR generation, the computation time for intensity-based 2-D/3-D registration can be reduced significantly (up to 10-fold), achieving a real-time registration speed above 25 FPS, while the degradation of registration accuracy is negligible. We also showed that using the proposed method, 2-D/3-D registration can be performed frame-by-frame on fluoroscopic sequences to achieve 6-DoF object tracking with very high robustness and accuracy.

Finally, we proposed a new paradigm for 2-D/3-D registration to address the two major limitations of state-of-the-art 2-D/3-D registration methods: 1) slow computation, and 2) small capture range. We showed that by formulating 2-D/3-D registration as a regression problem, it can be solved without iterative optimization and therefore performed very efficiently. Since the mapping from image residuals to parameter residuals can be highly non-linear, we found it critical to properly formulate the regression problem to simplify the mapping.
One important finding is that by using the 3-D pose-indexed feature as input for regression, the complexity of the regression problem can be significantly reduced. This is because the 3-D pose-indexed feature is only affected by the parameter residuals to be regressed, and is invariant to the absolute parameter values. Another important finding is that by marginalizing the 6 parameters of the rigid-body transformation and regressing them hierarchically, the complex regression problem can be subdivided into multiple simpler problems that are easier to solve. We showed that the properly formulated regression problem can be effectively solved using a CNN, and we introduced a customized CNN architecture suitable for our 2-D/3-D registration problem. We validated via experiments that the algorithmic strategies we employed to simplify the regression problem and the CNN regression model are all necessary and critical for achieving accurate and robust 2-D/3-D registration via regression. We demonstrated on three potential applications that the proposed CNN regression-based 2-D/3-D registration method achieves a significantly larger capture range, higher robustness and higher computational efficiency compared to state-of-the-art intensity-based 2-D/3-D registration methods.

5.2 Future Works

5.2.1 Clinical Applications of Dynamic-MRI-based Respiratory Motion Estimation

One important future work is to evaluate and apply the proposed methods in real clinical applications. The most closely related application is respiratory motion-corrected PET imaging using hybrid PET/MRI systems. While the proposed method can recover a 4-D dense vector field of the respiratory motion from dynamic MRI data, incorporating a 4-D dense vector field into PET image reconstruction for effective motion correction remains an open research problem.
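For orientation, the simpler post-processing style of motion correction (warping each gated reconstruction to a common reference phase with an estimated displacement field, then averaging) can be sketched as follows. This is an illustrative sketch, not the thesis implementation; the function name and the pull-back field convention are assumptions.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def correct_gated_pet(gated_frames, dvfs):
    """Warp each gated PET reconstruction to a common reference
    phase with its displacement field, then average the results.

    gated_frames : list of 3-D arrays (one reconstruction per gate)
    dvfs         : list of displacement fields of shape
                   (3, *volume_shape); for each reference-phase voxel
                   they give the offset of the coordinate to sample
                   in the corresponding gated frame (pull-back).
    """
    grid = np.indices(gated_frames[0].shape).astype(np.float64)
    warped = []
    for frame, dvf in zip(gated_frames, dvfs):
        coords = grid + dvf  # sampling coordinates in the gated frame
        # linear interpolation; outside-volume samples read as zero
        warped.append(map_coordinates(frame, coords, order=1))
    return np.mean(warped, axis=0)  # motion-corrected average
```

With an accurate displacement field, activity that moves between gates is mapped back onto the same reference-phase voxels before averaging, so the averaged image is not blurred by the motion.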
In this thesis, we demonstrated the application of the proposed respiratory motion estimation method to motion-corrected PET imaging by combining it with a simple motion-corrected PET reconstruction method, the RTA method, which corrects motion in reconstructed PET images as a post-processing step. However, most existing motion-corrected PET reconstruction methods, including the RTA method, are designed for gated 3-D motion fields generated from motion models, and cannot fully utilize the information in the 4-D motion field provided by the proposed method. A more effective way to correct motion in PET data using a 4-D motion field is to incorporate motion correction into the list-mode processing of the raw PET data instead of applying it as a post-processing step. For example, the line of response in list-mode processing can be deformed using the 4-D motion field, so that motion is compensated for every single annihilation event.

5.2.2 Extensions of CNN Regression-based 2-D/3-D Registration

One possible future research direction for CNN regression-based 2-D/3-D registration is to investigate the possibility of sharing the CNN weights across groups and/or zones, so that the memory footprint can be further reduced, making more complex and deeper network structures affordable. The difficulty of sharing the CNN across groups and/or zones lies in the fact that the training data for different groups and zones are different, which makes the training of the shared CNN a non-trivial task.

Another possible future research direction is to extend the proposed method to multi-plane 2-D/3-D registration, i.e., registering one 3-D model to multiple X-ray images acquired from different angles. This is currently a limitation of our method compared to intensity-based methods, which can be straightforwardly extended from mono-plane to multi-plane setups by simply combining the similarity measures from all the planes into one value.
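The mono-plane-to-multi-plane extension of intensity-based methods described above amounts to no more than a sum (or average) over planes. A minimal sketch, with `similarity` standing in for any pairwise image metric:

```python
def multiplane_similarity(drr_per_plane, xray_per_plane, similarity):
    """Combine per-plane similarity measures into a single value,
    as intensity-based methods do in multi-plane 2-D/3-D
    registration; `similarity` is any pairwise image metric."""
    return sum(similarity(drr, xray)
               for drr, xray in zip(drr_per_plane, xray_per_plane))
```

The combined value can then be optimized over the 6-DoF parameters exactly as in the mono-plane case, which is why this extension is straightforward for intensity-based methods.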
One possible way to achieve multi-plane registration is to perform regression on each plane separately to obtain motion estimates, and then aggregate them into one estimate.

5.2.3 Global 6-DoF Pose Estimation and Tracking via CNN Regression

The CNN regression-based 2-D/3-D registration method described in Chapter 4 needs a relatively close starting position before registration. A possible future research direction is to extend the method to global 6-DoF pose estimation, so that the requirement for a starting position can be avoided. This extension would make the method applicable to more clinical applications, where it is difficult to obtain a starting position. The challenge in regressing the pose globally is the high complexity of the mapping to be revealed by the regression. To overcome this problem, one possible solution is to increase the complexity (e.g., depth) of the CNN model to increase its expressive power, and train it with a larger dataset. However, training a complex CNN model is known to be a non-trivial task. Recently, some techniques for training very deep neural networks have been reported, e.g., Deep Residual Learning, and they can be investigated and explored for solving the global 6-DoF pose estimation problem. Besides increasing the complexity of the CNN, a possible alternative solution is to employ a multi-resolution strategy to regress the pose from coarse to fine levels, which breaks down the complex regression problem into several simpler problems and thus makes it easier to solve.

Another possible future research direction is to combine pose estimation from single fluoroscopic images with tracking techniques for 6-DoF object tracking in fluoroscopic sequences.
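One simple way to chain the two, assuming a hypothetical per-frame estimator `register(frame, init_pose) -> pose` with a limited capture range, is to propagate each frame's estimate as the next frame's initialization:

```python
def track_sequence(frames, initial_pose, register):
    """Frame-by-frame 6-DoF tracking: each frame is registered
    starting from the previous frame's estimate. `register` is a
    hypothetical estimator with a limited capture range, which is
    sufficient because inter-frame motion is small."""
    poses = []
    pose = initial_pose
    for frame in frames:
        pose = register(frame, pose)  # refine from previous estimate
        poses.append(pose)
    return poses
```

A global pose estimator would only be needed to supply `initial_pose` for the first frame (or to recover after a tracking failure); the per-frame regressor handles the rest.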
Since the motion of the object between two neighboring frames is relatively small, a possible tracking strategy is to use the registration result from the previous frame as the starting position for 2-D/3-D registration of the current frame, and apply the regression frame by frame to achieve tracking. Combining the above extensions of global pose estimation and pose tracking, the CNN regression-based method would be able to fully automatically recover and track an object's pose in fluoroscopic sequences.

5.2.4 Deep Learning-based 3-D/3-D Registration of Medical Images

Motivated by the success of CNN regression on 2-D/3-D registration explored in this thesis, a possible direction of future research is to investigate CNN regression or other deep learning methods for 3-D/3-D registration of medical images. Rigid-body 3-D/3-D registration aims at estimating a rigid-body transformation that aligns two 3-D images. The target objects to be registered in 3-D/3-D registration are often organs or other anatomical structures, for which it is impractical to perform training on a patient-specific 3-D model in advance. Therefore, a learning-based solution for 3-D/3-D registration needs to handle unseen patients, which makes the CNN regression framework described in this thesis not directly applicable. To address this challenge, one possible strategy is to construct a generative model (e.g., a statistical model or a bio-mechanical model) of the target object, and train the deep neural network on data generated from the model. The large variation in the training data could potentially make the trained network generalizable to unseen patients.
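The model-driven training strategy could take the following shape; `generate_volume` and `apply_rigid` are placeholders for the generative model and a rigid resampler, and the parameter ranges are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_training_pairs(generate_volume, apply_rigid, n_samples):
    """Generate (moving, fixed, parameters) training triples from a
    generative model of the target anatomy, so a regressor can be
    trained without patient-specific data.

    generate_volume : draws a synthetic 3-D volume from the model
    apply_rigid     : resamples a volume under a 6-DoF transformation
    """
    triples = []
    for _ in range(n_samples):
        fixed = generate_volume(rng)             # synthetic anatomy
        params = rng.uniform(-1.0, 1.0, size=6)  # tx, ty, tz, rx, ry, rz
        moving = apply_rigid(fixed, params)      # known ground truth
        triples.append((moving, fixed, params))
    return triples
```

Because the transformation parameters are sampled rather than annotated, every training pair comes with exact ground truth, and the anatomical variation in the training set is controlled by the generative model rather than by the available patients.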