Automatic Vertebrae Localization, Identification, and Segmentation Using Deep Learning and Statistical Models

by

Amin Suzani

B.Sc. Computer Engineering, Sharif University of Technology, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Applied Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

The University of British Columbia
(Vancouver)

October 2014

© Amin Suzani, 2014

Abstract

Automatic localization and identification of vertebrae in medical images of the spine are core requirements for building computer-aided systems for spine diagnosis. Automated algorithms for segmentation of vertebral structures can also benefit these systems for diagnosis of a range of spine pathologies. The fundamental challenges associated with the above-stated tasks arise from the repetitive nature of vertebral structures, restrictions in field of view, presence of spine pathologies or surgical implants, and poor contrast of the target structures in some imaging modalities.

This thesis presents an automatic method for localization, identification, and segmentation of vertebrae in volumetric computed tomography (CT) scans and magnetic resonance (MR) images of the spine. The method makes no assumptions about which section of the vertebral column is visible in the image. An efficient deep learning approach is used to predict the location of each vertebra based on its contextual information in the image. Then, a statistical multi-vertebrae model is initialized by the localized vertebrae from the previous step. An iterative expectation maximization technique is used to register the statistical multi-vertebrae model to the edge points of the image in order to achieve a fast and reliable segmentation of vertebral bodies.
State-of-the-art results are obtained for vertebrae localization in a public dataset of 224 arbitrary-field-of-view CT scans of pathological cases. Promising results are also obtained from quantitative evaluation of the automated segmentation method on volumetric MR images of the spine.

Preface

This thesis is primarily based on several manuscripts resulting from collaboration between multiple researchers. All publications have been modified to make the thesis coherent. Ethical approval for conducting this research has been provided by the UBC Research Ethics Board, certificate number H13-02361.

A version of Chapter 2 has been published in:

• Amin Suzani, Abtin Rasoulian, Sidney Fels, Robert N. Rohling, and Purang Abolmaesumi. Semi-automatic segmentation of vertebral bodies in volumetric MR images using a statistical shape+pose model. In SPIE Medical Imaging, pages 90360P–90360P. International Society for Optics and Photonics, 2014.

The contribution of the author was in developing, implementing, and evaluating the presented framework. The statistical model of the vertebral bodies, along with the implementation of the model registration, was provided by Dr. Rasoulian. Profs. Abolmaesumi, Rohling, and Fels helped with their valuable suggestions for improving the methodology. All co-authors contributed to the editing of the manuscript.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments

1 Introduction
1.1 Clinical Background
1.2 Thesis Objectives
1.3 Thesis Structure

2 Semi-automatic Segmentation in MRI
2.1 Introduction
2.2 Materials
2.3 Methods
2.3.1 Intensity Correction
2.3.2 Anisotropic Diffusion
2.3.3 Canny Edge Detection
2.3.4 Segmentation Using the Multi-vertebrae Model
2.4 Results and Discussions
2.5 Summary

3 Deep Learning for Automatic Vertebrae Localization in CT
3.1 Introduction
3.2 Materials
3.3 Simultaneous Localization and Labeling
3.3.1 Intensity-based Features
3.3.2 Parametrizing Localization Problem As a Regression
3.3.3 Deep Neural Networks for Regression
3.3.4 Kernel Density Estimation for Vote Aggregation
3.4 Hyper-Parameters Optimization
3.4.1 Parameters in Feature Extraction
3.4.2 Parameters in Training Deep Neural Network
3.4.3 Parameters in Aggregating Votes
3.5 Results and Discussion
3.5.1 Separate Network for Z Coordinate
3.5.2 Point Selection Using Canny Edge Detector
3.5.3 Refinement by Local Vote Aggregation
3.5.4 Capability of Our Shape+pose Model for Further Refinement
3.5.5 Final Results
3.5.6 Comparison to State-of-the-art
3.6 Summary

4 Automated Localization, Identification, and Segmentation in MRI
4.1 Introduction
4.2 Materials
4.3 Methods
4.3.1 Automatic Localization and Identification
4.3.2 Segmentation
4.4 Results and Discussion
4.5 Summary
5 Speedup by Vectorization and GPU Acceleration
5.1 Vectorized Feature Extraction
5.1.1 Description of Features
5.1.2 Sequential Implementation
5.1.3 Vectorized Implementation
5.1.4 Comparison and Computation Analysis
5.2 GPU-accelerated Model Registration
5.2.1 Spine Segmentation Method
5.2.2 Parallel Computing Approach
5.2.3 Experiment
5.2.4 Results
5.2.5 Discussion
5.3 Summary

6 Conclusion and Future Work
6.1 Contributions
6.2 Future Work

Bibliography

List of Tables

Table 2.1: Mean error and maximum distance (Hausdorff) of 3D segmentation in each multi-slice image are reported (in mm) for each vertebral body.
Table 2.2: Mean error and maximum distance (Hausdorff) of 2D segmentation in the mid-sagittal slice of each image are reported (in mm) for each vertebral body.

Table 3.1: Localization errors (in mm) for using a single neural network for all coordinates vs. using a separate neural network for the z coordinate.

Table 3.2: Localization errors (in mm) for using a Canny edge detector for point selection vs. performing the analysis on the votes of all voxels without any type of point selection.

Table 3.3: Localization errors (in mm) before and after using local vote aggregation to refine the predictions.

Table 3.4: Localization results for different regions of the vertebral column. Mean error and Standard Deviation (STD) are in mm.

Table 3.5: Comparison of the localization results of our method and the method presented in [15], which uses regression forests followed by a refinement using Hidden Markov Models. The same training and test sets are used in evaluations of both methods. Mean error and STD are in mm.

Table 3.6: Comparison of the localization results of our method and another method based on classification forests, which is considered state-of-the-art for this dataset [16]. The same training and test sets are used in evaluations of both methods. Mean error and STD are in mm.

Table 4.1: Localization and identification results for lumbar vertebral bodies in nine volumetric Magnetic Resonance (MR) images.

Table 4.2: Mean error and maximum distance (Hausdorff) of segmentation error (in mm) for each vertebral body.

Table 5.1: Computation time and speedup of the GPU-accelerated program for different problem sizes.
Table 5.2: Computation time and speedup of the multicore-accelerated program for different problem sizes.

List of Figures

Figure 1.1: Left: Main parts of a typical vertebra. Right: The regions of the human vertebral column. (From http://www.compelvisuals.com, http://www.studyblue.com)

Figure 2.1: Flowchart of the semi-automatic framework for segmentation of vertebral bodies.

Figure 2.2: (a) Mid-sagittal slice of original image, (b) after intensity correction, (c) after intensity correction and anisotropic diffusion.

Figure 2.3: Edge extraction results in the mid-sagittal slice of three different volumes.

Figure 2.4: Examples of segmentation results in three different subjects. White contours show our segmentation results, while manual segmentation is shown with red contours.

Figure 2.5: Summary of the parameters of the semi-automatic framework for segmentation of vertebral bodies. The values of the main parameters of each step are shown in the boxes at the rightmost column.

Figure 3.1: The mid-sagittal slice of several images from the dataset. The dataset consists of 224 three-dimensional Computed Tomography (CT) scans of patients with varying types of pathologies. In many cases, severe image artifacts are caused by metal implants. Different regions of the spine are visible in different images.

Figure 3.2: The value of each feature is the mean intensity over a three-dimensional cuboid displaced with respect to the reference voxel position. From [10].

Figure 3.3: A 2D example of the integral image approach. Each pixel in the integral image contains the sum of intensities above and to the left of itself, inclusive.
Therefore, the sum of intensities in rectangle D of the original image can be computed quickly, by two subtractions and one addition on the integral image: II(4) − II(3) − II(2) + II(1) = ∑I(A∪B∪C∪D) − ∑I(A∪C) − ∑I(A∪B) + ∑I(A) = ∑I(D).

Figure 3.4: The vertebrae localization problem is parametrized as a regression problem. The targets of the regression are the vectors from the reference voxel to the center of each vertebral body (the orange arrow). The reference voxel is described by 500 intensity-based features from the area around itself (shown in red).

Figure 3.5: The landmarks used for training and evaluation are the centers of the vertebral bodies. For the training set, we make a rough guess of the location of the vertebrae that are not visible in the image. Rigid registration of a simple model is used for this purpose.

Figure 3.6: The deep neural network consists of six layers that are trained using layerwise pre-training and stochastic gradient descent. Hidden units are shown in orange, while input and output layers are shown in blue.

Figure 3.7: Rectifier functions are used as the activation functions for hidden units.

Figure 3.8: Linear functions are used for the output layer activation.

Figure 3.9: A sample visualization of the cost function with respect to two arbitrary parameters of the network. Stochastic Gradient Descent (SGD) aims to converge to the global minimum, although the cost function is not guaranteed to be convex.

Figure 3.10: Thousands of voxels of the image vote for the location of each vertebra. The centroid of these votes is estimated using Kernel Density Estimation (KDE). Based on the votes, a vertebra may be localized inside or outside of the field of view of the image.
Figure 3.11: Kernel density estimation of a sample set of points. The global maximum of this estimation is used as the centroid of the votes. In multimodal distributions like this sample, the estimated centroid from KDE may significantly differ from the mean, median, or mode of the distribution.

Figure 3.12: The effect of downsample rate on accuracy, precision, and identification rate is plotted. Downsampling with a rate of 12 mm provided sufficient points for robust estimation. Further decrease of the downsampling rate shows no additional accuracy gain.

Figure 3.13: Increasing the maximum box size up to 32 mm improved the performance. A larger value for this parameter causes it to go out of bounds in several small images of the dataset. Going out of bounds means that all votes are disregarded and no prediction is made.

Figure 3.14: Using Principal Component Analysis (PCA) whitening as an additional preprocessing step caused overfitting. The cost function is better minimized at training time, but poor performance is observed on test data. The results are slightly better in the lumbar region, where there are more training examples.

Figure 3.15: The accuracy and identification rates obtained from using different numbers of hidden layers. Increasing the number of hidden layers from 4 to 5 did not have a large effect on the final results.

Figure 3.16: The accuracy and identification rates obtained from using different numbers of neurons in the first hidden layer. The system showed high sensitivity to this hyper-parameter.

Figure 3.17: The accuracy and identification rates obtained from using different activation functions for hidden layers.
Using the Rectified Linear Unit (RELU) was faster and also led to better results.

Figure 3.18: Convergence of the deep neural network. SGD is used along with layerwise pre-training. Adding a layer in each step led to further minimization of the cost function.

Figure 3.19: The accuracy and identification rates obtained from using different approaches for aggregating votes.

Figure 3.20: Visual results of the local vote aggregation step are shown on the mid-sagittal plane of three example images. The localization points before refinement are shown in red, while the points after refinement by local vote aggregation are shown in cyan. Refined points better represent the spine curvature.

Figure 3.21: Visual results of the shape+pose model refinement step are shown on the mid-sagittal plane of three example images. The localization points before model refinement are shown in cyan, while the points after model refinement are shown in yellow. The improvement is minimal and may not be worth the computational expense.

Figure 3.22: The edge map obtained by the Canny edge detector in the mid-sagittal plane of three example images. Image artifacts and unclear boundaries adversely affect the quality of the extracted edge maps.

Figure 4.1: Flowchart of the automatic framework for segmentation of vertebral bodies.

Figure 4.2: The value of each feature is the difference between the mean intensities over two cuboids displaced with respect to the reference voxel position. From [10].

Figure 4.3: Refining localization points by replacing them with the center of the closest large component, obtained from local thresholding. Left: Localized points before refinement.
Middle: Refinement using components obtained from local thresholding. Right: Localized points after refinement.

Figure 4.4: Localization and labeling results on the mid-sagittal slice of bias-field corrected images.

Figure 4.5: Examples of segmentation results in the mid-sagittal slice of five different patients. Our segmentation results are shown with white contours, while red contours show the manual segmentation.

Figure 4.6: Summary of the parameters of the automatic framework for segmentation of vertebral bodies. The values of the main parameters of each step are shown in the boxes at the rightmost column.

Figure 5.1: The proposed vectorization approach is shown on a 2D example. A feature box with size Si and displacement vector Di can be computed for all pixels of an image by element-wise matrix addition and subtraction on the integral image according to Si (left), and then cropping the product matrix according to Di (right). In the left figure, a number of element-wise matrix operations (purple − orange − green + blue) on the integral image results in a matrix in which each element contains the sum of boxes with size Si starting from the corresponding element in the original image. Out-of-bound parts of the matrices can be handled by zero-padding or cropping.

Figure 5.2: Computation time of the vectorized version and the sequential version of the algorithm on a sample image in MATLAB.

Figure 5.3: Computation time of the CUDA program for different block sizes. M = 7,000 and N = 20,000.

Figure 5.4: Computation time of the multicore-accelerated program for different numbers of processors. M = 7,000 and N = 20,000.
Glossary

CT: Computed Tomography
MR: Magnetic Resonance
PACS: Picture Archiving and Communication System
SVM: Support Vector Machine
SVD: Singular Value Decomposition
PCA: Principal Component Analysis
HMM: Hidden Markov Model
RF: Regression Forest
GPU: Graphics Processing Unit
SGD: Stochastic Gradient Descent
RELU: Rectified Linear Unit
HF: Hessian-free
KDE: Kernel Density Estimation
STD: Standard Deviation
CF: Classification Forest
CUDA: Compute Unified Device Architecture
CPU: Central Processing Unit
TP: True Positive
TN: True Negative
FP: False Positive
FN: False Negative
ROI: Region Of Interest

Acknowledgments

First and foremost, I would like to thank my supervisor, Purang Abolmaesumi, for guiding me throughout this journey. He has always been a wonderful example for me both in research and professional behaviour.

Special thanks to Robert Rohling, Abtin Rasoulian, and Weiqi Wang for their insightful feedback and for sharing their knowledge and experience in the fields of medical image analysis and GPU programming.

I also would like to thank all my colleagues and friends in the Robotics and Controls Lab for providing a very friendly and pleasant working environment during my two-year stay.

Finally, I would not be where I am today without the unconditional love and support from my parents, Tahereh and Hasan.

Chapter 1: Introduction

1.1 Clinical Background

The human spine (also referred to as the vertebral column) normally consists of 33 vertebrae. The upper 24 vertebrae are articulating and separated from each other by intervertebral discs. The lower nine are fused in the sacrum and the coccyx. The articulating vertebrae are grouped into three regions: 1) seven cervical vertebrae of the neck, 2) twelve thoracic vertebrae of the middle back, and 3) five lumbar vertebrae of the lower back. The order of the regions and also the numbering within each region are from top to bottom. The number of vertebrae in each region is slightly variable in the population.
Figure 1.1 illustrates the regions of the vertebral column and their conventional numbering notation [71].

Figure 1.1: Left: Main parts of a typical vertebra. Right: The regions of the human vertebral column. (From http://www.compelvisuals.com, http://www.studyblue.com)

The cervical region is located in the neck. Its vertebrae are generally smaller than those of the other regions. The shape of the first two cervical vertebrae (called the atlas and the axis) differs from the rest; these two vertebrae are used for rotating the neck. The thoracic vertebrae, on the other hand, are connected to the ribs and have limited movement. This section forms the steady mid-back of the human body. The lumbar vertebrae are generally larger than the vertebrae of other regions. This section bears most of the weight and also the movements (bending, twisting, etc.) of the body. Lower back pain, one of the most common types of pain, usually occurs in the lumbar region [62].

A typical vertebra includes a vertebral body in the front and a vertebral arch in the back, which encloses the vertebral foramen. One spinous process, two transverse processes, and a pair of laminae are notable parts of the vertebral arch. Figure 1.1 shows the main parts of a typical vertebra.

The spine is commonly viewed by X-rays, Computed Tomography (CT), and Magnetic Resonance (MR) imaging. X-rays and CT are more affordable and generally provide better contrast of bony structures. However, their capability in viewing soft tissues is very limited. On the other hand, MR depicts much better contrast of soft tissues such as nerve roots and intervertebral discs [54]. In addition, unlike CT and X-rays, MR imaging does not expose the patient to ionizing radiation [12, 55].

1.2 Thesis Objectives

The global objective of this thesis is to facilitate the building of computer-aided systems for spinal disease diagnosis, surgical planning, and post-operative assessments.
To this end, we investigate and develop techniques for automatic vertebrae detection, localization, identification, and segmentation within a clinically acceptable time-frame.

Localization and identification of the vertebrae and discs are essential steps in the analysis of spine images. Radiologists report the diagnosis after detecting and identifying the vertebrae present in the image [1]. In particular, lower back pain, one of the most common types of pain, can be caused by a wide range of spine diseases and pathologies (herniated disc, degenerative disc, and spondylolisthesis, to name a few). Reliable computer-aided diagnosis systems can greatly help physicians find the cause of lower back pain. One of the core requirements for building such systems is a reliable method for automated localization and identification of vertebrae. Once the target vertebrae are localized, the system can perform any subsequent diagnosis on them.

Segmentation of vertebral structures in medical images of the spine can also benefit computer-aided systems for measuring the deformations caused by different spinal pathologies and diseases. These measurements can be used for diagnosis, pre-operative planning, and post-operative assessments. For instance, scoliosis is a spine pathology in which the patient's spine is curved from side to side. This pathology can be diagnosed from the shapes and orientations of the vertebrae; therefore, a reliable segmentation of vertebral structures is necessary.
In addition, automatic spine segmentation, as well as vertebrae localization, can benefit a Picture Archiving and Communication System (PACS) by allowing the system to store more meaningful data than the raw images.

1.3 Thesis Structure

The rest of this thesis is subdivided into five chapters, as outlined below:

Chapter 2 describes a semi-automatic method for segmenting vertebral bodies in typical volumetric MR images of the spine, where the slice spacing is large. Using a graphical user interface, the user clicks on each vertebra that needs to be segmented. The method takes advantage of the correlation between the shape and pose of different vertebrae in the same patient by registering a statistical multi-vertebrae model to the image.

Chapter 3 presents an automatic method based on deep learning for simultaneous localization and identification of vertebrae in general CT scans, where the field of view is unknown and spine pathologies may be present.

Chapter 4 combines the previous two chapters to propose a fully automatic approach for localization, labeling, and segmentation of vertebral bodies in multi-slice MR images. The need for user interaction in Chapter 2 is replaced with a modified version of the automatic localization method presented in Chapter 3.

Chapter 5 describes two main contributions for speeding up the methods proposed in the previous chapters by vectorization and GPU acceleration.
A vectorization approach for the feature extraction algorithm and a GPU-accelerated method for the statistical model registration are proposed and compared against their sequential versions.

Chapter 6 concludes the thesis with a short summary, followed by a discussion of the major contributions and directions for future work on the subject.

Chapter 2: Semi-automatic Segmentation in MRI

2.1 Introduction

Segmentation of vertebral structures in volumetric medical images of the spine is an essential step in the design of computer-aided diagnosis systems for measuring the deformations caused by different spinal pathologies such as osteoporosis, spinal stenosis, spondylolisthesis, and scoliosis. When making decisions for diagnosis and therapy of these pathologies, physicians often use MR images for the assessment of soft spinal structures such as intervertebral discs and nerve roots. However, the established methods for segmentation of vertebral structures have been mainly developed for CT. CT, in turn, requires radiation exposure and does not depict soft tissue as well as MR images. The availability of a fast and reliable spine segmentation method for MR images may eliminate the need for CT and benefit patient care.

Segmentation of vertebral structures in MR images is challenging, mainly due to poor contrast between bone surfaces and surrounding soft tissue. Another challenge is the relatively large inter-slice gap (more than 3 mm) in typical clinical MR images compared to CT, which is normally sub-millimeter in recent generations of scanners. In addition, spatial variations in surrounding soft tissue contrast and magnetic field inhomogeneities increase the complexity of the segmentation task in MR images.

For spine segmentation in MR images, several methods have been proposed in the literature. Most of them are 2D approaches that are applied on manually identified cross-sections which contain the target vertebrae [8, 12, 23, 60]. Peng et al.
[48] search for the best slice automatically and then perform a 2D segmentation approach. A smaller number of methods have been proposed for MR segmentation of vertebral structures in 3D. The method of Hoad et al. [22] has a labour-intensive initialization step and also a post-processing step for correcting the segmentation. Štern et al. [64, 65] perform the segmentation by optimizing the parameters of a geometrical model of vertebral bodies. The method of Neubert et al. [41] uses statistical shape models and registration of grey-level intensity profiles for vertebral body segmentation. Zukic et al. [78] use multiple-feature boundary classification and mesh inflation. The methods of Hoad et al. and Štern et al. have long execution times (more than five minutes for high-resolution volumetric images). All of the above-stated 3D approaches segment each vertebra independently; consequently, the correlation between the shapes of different vertebrae in a patient is disregarded. In addition, the methods of Hoad et al., Štern et al., and Neubert et al. are evaluated on volumetric images with an inter-slice gap of at most 1.2 mm, which is not commonly used in clinical practice. Recently, a method was proposed by Kadoury et al. [28] for spine segmentation using manifold embeddings. Multiple vertebrae are treated as a whole shape in this work. However, it is still evaluated on MR images with less than 1.2 mm slice spacing, and no computation time was reported.

In a previous work of our research group [50], we successfully segmented lumbar vertebrae in volumetric CT scans using a statistical multi-vertebrae anatomical shape+pose model. In this thesis, we aim to find simple and fast pre-processing steps to overcome the MR segmentation challenges explained above, and consequently obtain edge points of vertebral bodies from the widely spaced slices of routine volumetric MR images.
Thereafter, we register our multi-vertebrae anatomical model to the extracted edge points with the goal of achieving a fast and reliable segmentation of lumbar vertebral bodies in MR images.

Figure 2.1: Flowchart of the semi-automatic framework for segmentation of vertebral bodies: Intensity Correction → User Interaction → Specifying Region of Interest (ROI) → Anisotropic Diffusion → Edge Detection → Model Registration.

2.2 Materials

The performance of the proposed method was evaluated on T1-weighted MR images of nine patients (via SpineWeb [72]). The in-plane resolution is 0.5 × 0.5 mm² with a slice spacing between 3.3 and 4.4 mm. Each series of images consists of slices with 512 × 512 pixels. The number of slices for each patient ranges from 12 to 18. The multi-vertebrae anatomical model was constructed from manually segmented volumetric CT scans of an independent group of 32 subjects [50, 52]. Ground-truth segmentations were obtained manually from the raw images using ITK-SNAP [74].

2.3 Methods

2.3.1 Intensity Correction

The presence of intensity inhomogeneity in spinal MR images can adversely influence the ability to extract edges. Hence, we first apply an intensity correction algorithm to the MR images to reduce this inhomogeneity. The bias field correction function provided in the Statistical Parametric Mapping software package (SPM12b, Wellcome Department of Cognitive Neurology, London, UK) is used for this purpose. Figures 2.2(a) and 2.2(b), respectively, show the mid-sagittal slice of a sample MR image before and after intensity correction.

2.3.2 Anisotropic Diffusion

We then apply a conventional 3D anisotropic diffusion [49] filter to the intensity-corrected image. Anisotropic diffusion is an image smoothing algorithm that attempts to preserve edges. Figures 2.2(b) and 2.2(c), respectively, show the image before and after this step.
The function below (proposed by Perona and Malik [49]) is used for the diffusion coefficient in order to privilege high-contrast edges over low-contrast ones:

c(x, y, z, t) = e^{−(‖∇I‖/κ)²},   (2.1)

where ∇I denotes the image gradient, and the conductance parameter, κ, controls the sensitivity to edges. Weak edges can block the conductance when κ is too low, and strong edges may not be preserved when κ is too high. In all experiments, κ and the integration constant, Δt, were set to 50 and 1/7, respectively. The variables x, y, and z represent the spatial coordinates, while the variable t enumerates the iteration steps. The algorithm was run for 20 iterations on each volumetric image.

2.3.3 Canny Edge Detection

At this step, the intensity-corrected mid-sagittal slice of the image is shown to the user. The user clicks on each of the lumbar vertebrae (L1 to L5) in the image (five points in total). For some irregular image acquisitions where those vertebrae are not visible in the mid-sagittal slice, the user can select a slice manually in order to click on the vertebrae. However, we did not observe any sensitivity to the slice selection. Only the clicked points are used for the rest of the procedure. After user interaction, a 2D Canny edge detection algorithm [7] is applied to each sagittal slice to find the edge map of vertebral bodies in potential regions of the volumetric image. These regions are five cubic bounding boxes centered at the clicked points. The scales of the cubes are obtained from the average distance between consecutive user clicks. The sensitivity threshold of the Canny edge detector is obtained automatically based on the maximum value of the gradient magnitude of the region. Figure 2.3 shows the result of edge detection in the mid-sagittal slice of three different volumetric MR images.
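As a concrete illustration of the diffusion step of Section 2.3.2, the following is a minimal 2D numpy sketch of Perona-Malik smoothing with the coefficient of Eq. (2.1) and the parameter values stated above (κ = 50, Δt = 1/7, 20 iterations). The thesis applies the 3D variant; the function name is ours.

```python
import numpy as np

def perona_malik_2d(img, kappa=50.0, dt=1.0 / 7, n_iter=20):
    """Edge-preserving smoothing: each iteration moves intensity along the
    four neighbour differences, weighted by the Perona-Malik coefficient
    c(d) = exp(-(|d| / kappa)^2) of Eq. (2.1)."""
    img = img.astype(float).copy()

    def c(d):  # ~1 on flat regions, ~0 across strong edges
        return np.exp(-(np.abs(d) / kappa) ** 2)

    for _ in range(n_iter):
        dn = np.roll(img, -1, axis=0) - img  # difference toward each
        ds = np.roll(img, 1, axis=0) - img   # of the four neighbours
        de = np.roll(img, -1, axis=1) - img
        dw = np.roll(img, 1, axis=1) - img
        img += dt * (c(dn) * dn + c(ds) * ds + c(de) * de + c(dw) * dw)
    return img
```

With κ = 50, a noise fluctuation of a few intensity units diffuses almost freely (c ≈ 1), while a strong bone/soft-tissue step is left nearly untouched (c ≈ 0). The 3D version adds the two remaining axial neighbours. Note that `np.roll` wraps at the image borders; a full implementation would use replicated boundaries instead.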
The statistical model is registered to the edge map obtained from this step.

2.3.4 Segmentation Using the Multi-vertebrae Model

Variations of the shapes and poses of lumbar vertebrae were independently extracted from 32 previously acquired, manually segmented volumetric CT images [50]. This analysis was performed on all five lumbar vertebrae combined, taking into account the correlation between the shapes of different vertebrae. The extracted variations are represented in a statistical model. An iterative expectation maximization registration technique presented by Rasoulian et al. [50] is used for aligning the model to the edge points extracted from the MR images in the previous step. The structure of the model and the registration technique are described in more detail in [50].

2.4 Results and Discussions

Our segmentation results are quantitatively evaluated using the manual segmentation as the ground truth. The mean error and Hausdorff distance are reported for each vertebra in a volumetric image (Table 2.1). For each point of the 3D surface mesh of the manual segmentation, we find its distance to the closest point in the registered model. The average of these distances is considered the mean error, and the maximum of them is considered the Hausdorff distance. In order to make our results comparable with other approaches that only focus on 2D segmentation [8, 12, 23, 60], we also report the segmentation errors for the 2D mid-sagittal slice of each image in Table 2.2.

Figure 2.2: (a) Mid-sagittal slice of the original image, (b) after intensity correction, (c) after intensity correction and anisotropic diffusion.

Figure 2.3: Edge extraction results in the mid-sagittal slice of three different volumes.

Figure 2.4: Examples of segmentation results in three different subjects. White contours show our segmentation results, while the manual segmentation is shown with red contours.

Some examples of our segmentation results are shown in Figure 2.4.
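The mean-error and Hausdorff measures defined in Section 2.4 amount to a nearest-neighbour query from the ground-truth surface points to the registered model points. A small SciPy sketch (the function name is ours, and plain point sets stand in for the surface meshes):

```python
import numpy as np
from scipy.spatial import cKDTree

def surface_errors(gt_points, model_points):
    """For each vertex of the manual (ground-truth) surface, find the
    distance to the closest point of the registered model.  The average is
    the mean error; the maximum is the one-sided Hausdorff distance, as
    defined in Section 2.4.  Both inputs are (N, 3) arrays of points."""
    distances, _ = cKDTree(model_points).query(gt_points)
    return distances.mean(), distances.max()
```

Building the k-d tree once makes each query O(log N), so the metric scales to dense surface meshes.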
Although the slice spacing in our MR images is relatively high (between 3.3 and 4.4 mm), the quantitative results show that our method can segment the lumbar vertebral bodies in MR images with a mean error of ∼3 mm and a standard deviation of ∼0.8 mm. The overall segmentation of each image takes less than 2 minutes using our implementation in MATLAB (The MathWorks, Inc., Natick, MA) on a 2.5 GHz Intel Core i5 machine.

Table 2.1: Mean error and maximum distance (Hausdorff) of the 3D segmentation in each multi-slice image, reported in mm for each vertebral body.

Subject | L1 Mean/Max | L2 Mean/Max | L3 Mean/Max | L4 Mean/Max | L5 Mean/Max | All: Mean ± std / Max
1       | 2.6/6.5  | 2.2/6.7  | 2.1/5.9  | 2.2/7.7  | 2.6/8.3  | 2.3 ± 0.2 / 8.3
2       | 2.9/9.8  | 2.8/8.1  | 2.4/7.3  | 3.1/11.3 | 2.9/9.1  | 2.8 ± 0.3 / 11.3
3       | 2.8/7.1  | 2.1/5.7  | 2.0/6.5  | 2.8/6.0  | 3.0/8.5  | 2.5 ± 0.4 / 8.5
4       | 2.7/7.5  | 3.7/9.6  | 3.8/9.6  | 4.0/10.3 | 3.7/12.4 | 3.6 ± 0.5 / 12.4
5       | 2.6/7.4  | 2.4/8.5  | 2.5/8.8  | 2.6/9.3  | 3.3/11.1 | 2.7 ± 0.4 / 11.1
6       | 1.9/6.4  | 2.3/7.0  | 2.9/10.2 | 3.2/12.3 | 3.0/12.1 | 2.7 ± 0.5 / 12.3
7       | 3.6/11.6 | 4.1/12.0 | 3.7/11.1 | 4.4/11.4 | 5.1/13.2 | 4.2 ± 0.6 / 13.2
8       | 2.5/6.7  | 3.3/11.7 | 5.9/16.3 | 4.0/12.8 | 2.6/7.2  | 3.7 ± 1.4 / 16.3
9       | 2.0/6.3  | 2.3/6.6  | 3.0/9.5  | 3.3/11.7 | 2.7/8.9  | 2.7 ± 0.5 / 11.7
Average | 2.6/7.7  | 2.8/8.4  | 3.1/9.4  | 3.3/10.3 | 3.2/10.1 | 3.0 ± 0.8 / 11.7

Table 2.2: Mean error and maximum distance (Hausdorff) of the 2D segmentation in the mid-sagittal slice of each image, reported in mm for each vertebral body.

Subject | L1 Mean/Max | L2 Mean/Max | L3 Mean/Max | L4 Mean/Max | L5 Mean/Max | All: Mean ± std / Max
1       | 1.6/3.7 | 1.6/4.2 | 1.4/2.9 | 1.6/2.8 | 1.3/2.5 | 1.5 ± 0.1 / 4.2
2       | 1.6/3.6 | 2.1/4.3 | 1.7/3.5 | 2.3/4.2 | 2.0/4.1 | 2.0 ± 0.3 / 4.3
3       | 1.4/2.5 | 1.4/2.4 | 1.6/2.6 | 2.1/4.4 | 3.0/5.8 | 1.9 ± 0.7 / 5.8
4       | 1.7/4.4 | 1.8/4.3 | 1.7/4.6 | 2.1/4.8 | 2.3/5.3 | 1.9 ± 0.3 / 5.3
5       | 2.2/5.4 | 2.4/5.3 | 2.3/7.2 | 2.7/9.5 | 2.8/7.1 | 2.5 ± 0.3 / 9.5
6       | 1.6/4.0 | 1.5/2.6 | 1.5/3.6 | 1.8/5.0 | 1.7/3.6 | 1.6 ± 0.1 / 5.0
7       | 2.1/4.1 | 1.7/4.6 | 1.6/3.6 | 1.9/4.7 | 2.5/7.1 | 2.0 ± 0.3 / 7.1
8       | 1.9/4.9 | 2.3/5.7 | 1.9/5.2 | 2.8/7.4 | 2.2/5.1 | 2.2 ± 0.4 / 7.4
9       | 1.9/4.5 | 1.5/2.7 | 1.5/3.1 | 2.5/5.5 | 1.5/2.7 | 1.8 ± 0.4 / 5.5
Average | 1.8/4.1 | 1.8/4.0 | 1.7/4.0 | 2.2/5.4 | 2.2/4.8 | 1.9 ± 0.4 / 6.0

Figure 2.5: Summary of the parameters of the semi-automatic framework for segmentation of vertebral bodies. The values of the main parameters of each step are:
• Intensity Correction
• User Interaction: one click per vertebra
• Specifying Region of Interest (ROI): ROI size = mean(D) + 3 std(D), where D = distances between consecutive clicks
• Anisotropic Diffusion: conductance κ = 50; integration constant Δt = 1/7; number of iterations n = 20
• Edge Detection: algorithm: Canny
• Model Registration: only vertebral body points; number of vertebrae = 5; number of registered points = 104; outlier rate = 0.01

2.5 Summary

A semi-automatic segmentation algorithm based on registering a statistical model was applied to nine 3D MR images of the spine. Adding simple MR pre-processing steps allows the multi-vertebrae shape+pose model, developed previously for CT scans, to work on volumetric MR images. The statistical model can accommodate large inter-slice gaps (about 4 mm). In addition, it is fast because it exploits the fact that neighbouring vertebrae have similar shapes and poses.
The segmentation error will determine the range of clinical applications of this method.

Chapter 3
Deep Learning for Automatic Vertebrae Localization in CT

3.1 Introduction

Automatic vertebra localization and identification (sometimes referred to as labeling) in spinal imaging is a crucial component of image-guided diagnosis, surgical planning, and follow-up assessment of spine disorders such as disc/vertebra degeneration, vertebral fractures, scoliosis, and spinal stenosis.

The main challenges associated with building an automated system for robust localization and identification of vertebrae in spine images arise from: 1) restrictions in field of view; 2) the repetitive nature of the spinal column; 3) high inter-subject variability in spine curvature and shape due to spine disorders and pathologies; and 4) image artifacts caused by metal implants.

Several methods have been proposed in the literature for automatic vertebra localization and labeling in CT [15, 16, 20, 32, 36, 39, 53], MR [1, 44, 48, 58], and X-ray [63, 75] images. Most of the previous works either concentrate on a specific region or make an assumption about the visible part of the vertebral column in the image. A few studies claimed to handle arbitrary-field-of-view scans in a fully-automatic system [15, 16, 32, 53]. The methods proposed in [15] and [53] rely on a generative model of the shape and/or appearance of vertebrae. This may cause these methods to struggle with pathological subjects, especially when metal implants produce artifacts in the image. The computational cost for handling general scans is high in [32] and [53]. The recently proposed method in [16] has led to state-of-the-art results on the dataset we are using in this work. The key idea in that work is that points of the image are probabilistically classified as being a particular vertebra centroid. Based on the predicted probabilities, the centroids of these points are obtained using mean-shift.
Although promising results are obtained on a challenging dataset, their method requires an additional post-processing step for removing false positives. This step adds to the computation time and may not be robust when the spine images are not tightly cropped. In addition, their mean-shift-based approach for centroid estimation requires certain parameters and thresholds which may need manual tuning. Most importantly, the fastest of the above-stated methods [16] has a computation time of about a minute. This computation time limits the application of these methods as preprocessing steps for a wide range of image-guided tasks on a clinical basis.

In this work, we aim to find a faster and more robust solution to the problem of vertebra localization in general clinical CT scans by using deep neural networks [3, 57]. No assumptions are made about which and how many vertebrae are visible in the images. Our method is extensively evaluated on a public dataset, which consists of 224 three-dimensional CT scans, and is compared to recently proposed state-of-the-art approaches.

3.2 Materials

The performance of our method is evaluated on a publicly available large dataset consisting of 224 spine-focused three-dimensional CT scans of patients with varying types of pathologies. The pathologies include fractures, high-grade scoliosis, and kyphosis. Many cases are post-operative scans where severe image artifacts are caused by surgical implants. Different sections of the spine are visible in different images. The field of view is mostly limited to 5-15 vertebrae, while the whole spine is visible in a few cases. It is the same dataset used in [16]. Figure 3.1 illustrates several cases of this challenging dataset.

The statistical multi-vertebrae shape+pose model mentioned in Section 3.5 was constructed from manually segmented three-dimensional CT images of an independent group of 32 subjects, which were mostly non-pathological [50]. The manual segmentation was done using ITK-SNAP [74].

Figure 3.1: The mid-sagittal slice of several images from the dataset. The dataset consists of 224 three-dimensional CT scans of patients with varying types of pathologies. In many cases, severe image artifacts are caused by metal implants. Different regions of the spine are visible in different images.

3.3 Simultaneous Localization and Labeling

3.3.1 Intensity-based Features

Hundreds of intensity-based features are extracted from each voxel of the volumetric image. The value of each feature is the mean intensity over a three-dimensional cuboid displaced with respect to the reference voxel position. The cuboid dimensions and the displacement of each feature are chosen randomly. For the reference voxel p, the feature vector v(p) = (v_1, ..., v_j, ..., v_n) is computed as follows:

v_j = (1/|F_{p;j}|) Σ_{q ∈ F_{p;j}} I(q),   (3.1)

where I(q) is the image intensity at position q in the image, and q ∈ F_{p;j} are the image voxels within the cuboid. F_{p;j} denotes the feature cuboid j displaced with respect to voxel p. Figure 3.2 shows a visualization of our intensity-based features. Similar types of features are used in [9, 10, 14, 61].

The mean intensity over cuboidal regions can be computed very quickly using the idea of the integral image (introduced in [69]). The integral image is an intermediate representation of the original image. By definition,

ii(x, y, z) = Σ_{x′≤x, y′≤y, z′≤z} i(x′, y′, z′),   (3.2)

where i(x, y, z) is the original image and ii(x, y, z) is the integral image. A 2D representation of the integral image is shown in Figure 3.3. Each pixel in this integral image contains the sum of intensities above and to the left of itself, inclusive. For computing the sum of intensities in box D, we only need to compute II4 − II2 − II3 + II1, which is only three operations (in 3D it is seven operations). The integral image can be constructed recursively from the original image using a few operations per voxel.
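The construction and the seven-operation corner lookup can be sketched in numpy. The function names and the zero-padded layout (which removes bounds checks at the corner lookups) are choices of this sketch:

```python
import numpy as np

def integral_image_3d(vol):
    """Build ii such that ii[x, y, z] = sum of vol over
    [0..x-1, 0..y-1, 0..z-1]; a zero slab is padded in front of each axis
    so cuboid lookups need no special cases at the volume border."""
    ii = vol.astype(np.int64)
    for axis in range(3):
        ii = ii.cumsum(axis=axis)
    return np.pad(ii, ((1, 0), (1, 0), (1, 0)))

def cuboid_sum(ii, lo, hi):
    """Sum of vol[lo0:hi0, lo1:hi1, lo2:hi2] via inclusion-exclusion on
    the 8 cuboid corners: 7 additions/subtractions, i.e. O(1) per feature
    regardless of cuboid size."""
    x0, y0, z0 = lo
    x1, y1, z1 = hi
    return (ii[x1, y1, z1] - ii[x0, y1, z1] - ii[x1, y0, z1] - ii[x1, y1, z0]
            + ii[x0, y0, z1] + ii[x0, y1, z0] + ii[x1, y0, z0] - ii[x0, y0, z0])
```

Dividing `cuboid_sum` by the number of voxels in the cuboid yields the mean-intensity feature of Eq. (3.1).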
Once constructed, any of our cuboidal features (regardless of location and size) can be computed in constant time.

Extracting hundreds of the above-mentioned features from an image voxel provides a meaningful description of the area around the reference voxel. This description is then used to train a learning system for vertebra localization.

Figure 3.2: The value of each feature is the mean intensity over a three-dimensional cuboid displaced with respect to the reference voxel position. From [10].

Figure 3.3: A 2D example of the integral image approach. Each pixel in the integral image contains the sum of intensities above and to the left of itself, inclusive. Therefore, the sum of intensities in rectangle D of the original image can be computed quickly by two subtractions and one addition using the integral image: II(4) − II(3) − II(2) + II(1) = Σ I(A∪B∪C∪D) − Σ I(A∪C) − Σ I(A∪B) + Σ I(A) = Σ I(D).

Before training, we randomly generate the parameters of hundreds of random features. The same parameters are then used in all steps of training and testing. Generating feature parameters involves randomly choosing the values of the cuboid dimensions vector S and the displacement vector D, where 0 < S_i ≤ max_size and 0 ≤ D_i ≤ max_displace for i ∈ {1, 2, 3}. For the experiments of this chapter, we used max_size = 32 mm and max_displace = 12 mm to extract 500 features from each voxel.

3.3.2 Parametrizing the Localization Problem as a Regression

The vertebra localization problem is parametrized as a multi-variate non-linear regression. As explained above, hundreds of cuboidal intensity-based features are extracted from each voxel. The targets of the regression are the relative distances between the center of each vertebral body and the reference voxel. In other words, the vector from the reference voxel to the center of the vertebral body is considered one target of the regression.

The number of observations (samples) in our regression problem is equal to the number of voxels that are present in the image.
The number of features is 500 for the experiments of this chapter. Since the images of our CT dataset are labeled with 26 landmarks (26 vertebral bodies), the target vector of our regression includes 26 three-dimensional vectors per observation. Therefore, the vertebra localization problem is parametrized as a multi-variate regression with 500 features and 26 × 3 = 78 targets.

For the vertebrae that are not present in a training-set image, we make a rough guess of their location outside the field of view of the image. This rough guess is made by rigid registration of a simple model to the last three landmarks present in the image. The model is built by averaging the positions of the same vertebrae across all images of the dataset after they are rigidly registered together. Figure 3.5 shows a rough guess of the location of the non-visible cervical vertebral bodies in a training-set image.

3.3.3 Deep Neural Networks for Regression

For decades, shallow neural networks (with zero or one hidden layer) have been used for machine learning. In the 1990s, the introduction of newer machine learning models, such as the Support Vector Machine (SVM) and Random Forests, caused a loss of popularity for neural networks among researchers. However, since 2006, techniques have been developed that enable effective training of deep neural networks (with multiple hidden layers). Recent research shows that these deep networks can outperform other learning models in several applications.
During lastfew years, state-of-the-art results have been produced by applying deep learning tovarious tasks in computer vision [33, 67].Network structureIn the experiments of this chapter, a deep feed-forward neural network with sixlayers is used for solving the multi-variate non-linear regression problem. Thestructure of our neural network is demonstrated in Figure 3.6. The intensity-based22Figure 3.5: The landmarks that are used for training and also evaluation arethe center of vertebral bodies. For the training set, we make a roughguess of the location of the vertebrae that are not visible in the image.Rigid registration of a simple model is used for this purpose.23features are given to the network via the first layer (input layer). Then, layersare activated consecutively from left to right. After activation of all units in thenetwork, the outputs will be present in the last layer (output layer). Any layerbetween input layer and output layer is called a hidden layer. Training a neuralnetwork involves using training data to assign values to the connection weights be-tween layers. Once the weights are determined, units of each layer can be activatedfor test data as follows:a(l+1)i = g(Nl∑j=0Θ(l)i j alj), (3.3)where a(l)i denotes the activation of unit i in layer l, and Θ(l)i j is the weight ofconnection from unit j of layer l to unit i of layer l + 1. g(x) is the activationfunction. We have used Rectified Linear Unit, ghidden(x) = max(0,x), for hiddenlayers and linear function, gout put(x) = x, for output layer. A visualization of thesefunctions is brought in Figures 3.7 and 3.8.Cost functionFor training the network, we define a cost function as a measure of how far awayour solution is from the optimal solution. Our cost function is defined as:J(Θ) =−1m[m∑i=1K∑k=1y(i)k loghΘ(x(i))k +(1− y(i)k ) log(1−hΘ(x(i))k)], (3.4)where hΘ(x(i))k is the hypothesis for the kth output using the input x from sample iof the training data. 
y_k^{(i)} denotes the ground truth for the kth output of the ith sample. K is the number of outputs (equal to 78 in the experiments of this chapter). m is the number of training samples in the mini-batch over which we are computing the cost function. In particular, the cost function measures how far each hypothesis h_Θ(x^{(i)}) is from its corresponding ground truth y_k^{(i)}. The above-stated function is preferred over simpler cost functions such as the sum of squared errors, (1/m) Σ_{i=1}^{m} Σ_{k=1}^{K} (h_Θ(x^{(i)})_k − y_k^{(i)})², because it is known to be more convex and consequently less prone to falling into local minima during the optimization process. However, the cost function is still not guaranteed to be a purely convex function. Figure 3.9 illustrates a sample cost function with respect to two parameters (two connection weights of the network).

Figure 3.6: The deep neural network consists of six layers that are trained using layerwise pre-training and stochastic gradient descent. Hidden units are shown in orange, while the input and output layers are shown in blue.

Figure 3.7: Rectifier functions are used as the activation functions for hidden units.

Figure 3.8: Linear functions are used for the output layer activation.

A typical approach for training neural networks is to iteratively use the back-propagation algorithm [34, 70] to compute the partial derivatives of the cost function with respect to each connection weight of all layers, ∂J(Θ)/∂Θ_{ij}^{(l)}, and then use Stochastic Gradient Descent (SGD) to optimize the cost function by changing the connection weights. Nevertheless, this approach may lose effectiveness when we have a deep network with two or more hidden layers. The main reason is the significant propagation of error when computing partial derivatives using conventional back-propagation over the whole network.

Layerwise pre-training

In recent years, it has been shown that deep neural networks can be trained much more effectively using layerwise pre-training [4, 17].
This method involves breaking down the deep network into several shallow networks and training each of them greedily.

Assume that we have K hidden layers (K = 4 in the experiments of this chapter). First, we disregard all hidden layers except the first one. That is, we build a network comprising the input layer, the first hidden layer, and the output layer. We train this shallow network using conventional back-propagation and stochastic gradient descent. Once the weights of the first hidden layer are trained, we add the next layer. The second network is therefore built from the input layer, the first hidden layer, the second hidden layer, and the output layer. We train the weights of the second hidden layer while re-using the trained weights of the previous layer, and so forth. We continue this approach until hidden layers 1 to K−1 are trained. The output layer is present in all steps because we need to compare it against the ground truth to compute the cost function. Once hidden layers 1 to K−1 are trained, we use back-propagation and SGD over the whole network to determine the weights of the last hidden layer as well as to fine-tune the pre-trained weights of the previous layers.

Second neural network for the z coordinates of the outputs

As mentioned earlier, the outputs of our network are the relative distance vectors between the centers of the vertebral bodies and the reference voxel. As we have 26 landmarks, 26 of the 26 × 3 = 78 outputs are the z coordinates of these distance vectors. Due to the asymmetric structure of the spine (a tall and thin structure), the range of the z outputs is much wider than the range of the x and y outputs. The z outputs vary in the range [−800, 800] mm, which is on the order of the spine height, whereas the x and y outputs vary in the range [−200, 200] mm, which is on the order of the maximum width of the CT images of the spine. However, all these outputs are scaled linearly to the same range [−1, 1].
This means that the network will emphasize optimizing the x and y outputs over the z outputs, while the z outputs are more important for vertebra identification (adjacent vertebrae have similar x and y coordinates but different z coordinates). To alleviate this issue, we train a separate feed-forward neural network for the z outputs only. The structure of this additional network is similar to the main network, except that it has only 26 units in its output layer.

Figure 3.9: A sample visualization of the cost function with respect to two arbitrary parameters of the network. SGD aims to converge to the global minimum, while the cost function is not guaranteed to be convex.

3.3.4 Kernel Density Estimation for Vote Aggregation

Procedure on a test image

On a 3D test image, we first extract hundreds of features from each voxel. Then, we use the deep neural network to predict the relative distance vector of each vertebral body with respect to the reference voxel. Knowing the location of the reference voxel and the relative distance vector to the center of a specific vertebral body, we can compute the predicted absolute location of that vertebral body in the test image. This absolute location is considered the vote of that specific voxel for the location of that specific vertebral body. Note that these votes might lie either inside or outside the scope of the image. Each voxel in the image votes for the location of each vertebral body, so for each specific vertebral body (say, L1) we end up with thousands of votes acquired from different voxels. We need to aggregate these votes to obtain a single prediction for the location of the vertebral body. The mean, median, or histogram mode of the votes could be used for aggregation, but a more robust and accurate method is Kernel Density Estimation (KDE).

Figure 3.10: Thousands of voxels of the image vote for the location of each vertebra. The centroid of these votes is estimated using KDE.
Based on the votes, a vertebra may be localized inside or outside the field of view of the image.

Kernel density estimation

Kernel Density Estimation (KDE) is a non-parametric approach to estimating the probability density function of a random variable. The simplest and most familiar non-parametric density estimator is the histogram. However, there are several issues with histograms in certain applications: 1) histograms are not smooth; 2) they depend on the end points of the bins; and 3) they depend on the width of the bins. Kernel density estimators, on the other hand, are smooth and have no end points. Figure 3.11 shows an example of a kernel density estimation versus a histogram.

Figure 3.11: Kernel density estimation of a sample set of points. The global maximum of this estimation is used as the centroid of the votes. In multimodal distributions like this sample, the estimated centroid from KDE may significantly differ from the mean, median, or mode of the distribution.

Kernel density estimators still depend on a parameter called the bandwidth. A density estimation is said to be undersmoothed when the chosen bandwidth is too small. Conversely, an overly large value of the bandwidth causes oversmoothing. An automatic data-driven bandwidth selection approach, as well as an adaptive kernel density estimator based on linear diffusion processes, is presented by Botev et al. in [6]. Their approach is less sensitive to outliers than the popular Gaussian kernel density estimator because it is locally adaptive. It has also shown better performance in terms of speed and reliability than most other adaptive approaches [19, 26, 47, 68].
The advantage of their automatic data-driven bandwidth selection over the other most popular approaches [25, 59] is that it does not use reference rules [27] and consequently does not make assumptions about the normality of the input data.

In our experiments, we used the kernel density estimation method presented by Botev et al. to obtain a fast and reliable density function for the voxel votes for each vertebral body. The global maximum of this density function is considered the predicted location of the vertebral body in the image.

3.4 Hyper-Parameter Optimization

One of the main challenges of building a deep learning system is the large number of hyper-parameters that need to be adjusted. Since the training time for deep networks is usually long (several hours to several weeks), extensive evaluation of all parameters can be impractical. Consequently, the learning system may end up being a sub-optimal solution. In this section, we briefly discuss the effect of some of the most important parameters in each phase of our deep learning system in the context of vertebra localization and identification.

3.4.1 Parameters in Feature Extraction

Downsample rate

When we described the method in Section 3.3, we explained how we extract features from all voxels of the image. While it is possible to do this for every voxel, it would require a training time on the order of weeks and a testing time on the order of a few minutes. However, we want to decrease the testing time of the deep network to less than a second. In our experiments, we extract these features across the 3D image with a downsample rate of 12 mm in each direction. Note that this is different from downsampling the image at this rate and then extracting features from all voxels. In the experiments of this chapter, we first resample the images to 1 mm in each direction. Then, we extract high-quality features from voxels that are 12 mm apart in each direction.
The number of these voxels is still over a thousand on average per image. This approach brings down the training time from weeks to two days, and the testing time from a few minutes to less than a second. The feature extraction implementation is explained in more detail in Chapter 5. Figure 3.12 shows that smaller downsample rates do not improve the localization results. In other words, it shows that the number of voxels obtained with a downsample rate of 12 mm is sufficient for the localization framework.

Figure 3.12: The effect of the downsample rate on accuracy, precision, and identification rate. Downsampling at a rate of 12 mm provides sufficient points for robust estimation. Further decreasing the downsample rate shows no additional accuracy gain.

Size and displacement of feature boxes

As explained in Section 3.3, each feature is a 3D box with a displacement from the reference voxel. Each dimension of the box is randomly chosen from the interval [1, max_size]. Each dimension of the displacement vector is randomly chosen from the interval [−max_displace, max_displace]. The parameters max_size and max_displace need to be large enough to provide a meaningful description of the area around the reference voxel. On the other hand, the value of a feature is undefined when the distance of the voxel to the edge of the image is less than max_displace + max_size/2. We disregard the votes of voxels that have such meaningless feature values. Hence, values that are too large for these parameters may cause meaningless feature values for a significant number of voxels. In our experiments, we used max_displace = 12 mm and max_size = 32 mm. These values were obtained experimentally. Figure 3.13 shows the effect of the maximum box size on the performance of the system.

Figure 3.13: Increasing the maximum box size up to 32 mm improved the performance. A larger value for this parameter causes the boxes to go out of bounds in several small images of the dataset. Going out of bounds means that all votes are disregarded and no prediction is made.

PCA whitening

Principal Component Analysis (PCA) is a dimensionality reduction algorithm that aims to convert a set of possibly correlated variables into a set of linearly uncorrelated variables [24]. In machine learning, PCA is widely used to enhance unsupervised feature learning by reducing the dimensionality and removing the correlation between features [2, 30, 56]. A closely related step, called whitening, aims to reduce the redundancy of the input data. More formally, PCA uses the Singular Value Decomposition (SVD) of the data and keeps the most significant singular vectors to project the data into a lower-dimensional space. With PCA whitening, the principal components are divided by the singular values to ensure uncorrelated outputs with unit variance. It has been shown that whitening can improve the accuracy of the downstream estimator in several applications [35, 42].

In our experiments, using PCA whitening to remove correlation and redundancy among the features led to better convergence of the training (better minimization of the cost function). However, on test data it showed poor performance. This observation may be a sign of overfitting. In machine learning, overfitting occurs when a function is fit too closely to a limited set of data points. It means that the model is describing the noise and random error of the training set instead of the underlying relationship that is expected to be seen in the test data.
Figure 3.14 demonstrates the performance of the deep neural network for different regions of the spine when whitening is used as a preprocessing step. The results show a slightly better identification rate for the lumbar region (where we have more training examples), but poor performance in the other sections.

Deep neural networks are prone to overfitting because of their complex structure and their high number of parameters. However, several methods, such as regularization and dropout, have been proposed to alleviate the overfitting issue in deep learning [11, 21, 73]. Future work may involve trying to improve the localization results by using PCA whitening along with regularization or dropout techniques that avoid overfitting.

Figure 3.14: Using PCA whitening as an additional preprocessing step caused overfitting. The cost function is better minimized at training time, but poor performance is observed on test data. The results are slightly better in the lumbar region, where there have been more training examples.

3.4.2 Parameters in Training the Deep Neural Network

Neural network structure

One of the main challenges of training a deep neural network is to optimize the number of hidden layers and the number of hidden units in each layer. More layers and more hidden units increase the ability of the network to estimate more complex relationships in the data, while making the network more prone to overfitting. Our network structure (shown in Figure 3.6) was obtained experimentally. Because of the long training time (about two days in our experiments), sweeping through all parameters was not feasible. Other than trial and error, no reliable shortcuts were found in the literature for this purpose.
Informal tutorials suggest that one should start with one hidden layer with a number of units on the order of the number of inputs, and then use trial and error to optimize these parameters. Figures 3.15 and 3.16 illustrate the effect of the number of layers and the number of neurons per layer in our experiments, respectively.

Figure 3.15: The accuracy and identification rates obtained from using different numbers of hidden layers. Increasing the number of hidden layers from 4 to 5 did not have a large effect on the final results.

Activation function

The logistic sigmoid and the hyperbolic tangent were widely used as the activation function of the hidden units in traditional neural networks [34]. In recent years, it has been shown that rectifier functions can perform better than these conventional functions in terms of estimation accuracy and speed of convergence [11, 40, 76]. A rectifier function is simply defined as f(x) = max(0, x), where x is the input of the hidden unit. A neuron that uses this function for activation is called a rectified linear unit. Training a neural network of rectified linear units is computationally efficient because of 1) the simplicity of the function, and 2) the sparse activation (in a randomly initialized network, about half of the hidden units have zero output). It is argued in [18] that this sparsity also makes these units more biologically plausible.

Figure 3.16: The accuracy and identification rates obtained from using different numbers of neurons in the first hidden layer. The system showed high sensitivity to this hyper-parameter.

Using rectified linear units in our deep neural networks led to higher accuracy and also faster convergence.
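The rectifier definition and its sparse-activation property can be demonstrated in a few lines. This is an illustrative sketch, not the thesis code; the sigmoid is included only for contrast:

```python
import numpy as np

def relu(x):
    """Rectifier: f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Logistic sigmoid, for comparison: outputs are never exactly zero."""
    return 1.0 / (1.0 + np.exp(-x))

# With zero-mean random pre-activations (as in a randomly initialized
# network), roughly half of the rectified linear units output exactly
# zero, which is the sparse activation discussed above.
rng = np.random.default_rng(0)
pre = rng.normal(size=10000)
sparsity = np.mean(relu(pre) == 0.0)
```

The `sparsity` value comes out near 0.5, matching the claim that about half of the units in a randomly initialized ReLU network are inactive.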
Figure 3.17 shows a comparison of the activation functions in the context of the experiments of this chapter.

Figure 3.17: The accuracy and identification rates obtained from using different activation functions for the hidden layers. Using ReLU was faster and also led to better results.

Optimization method

Stochastic Gradient Descent (SGD) [31] is widely used for optimizing the cost function when training neural networks. In our experiments, the cost function is defined as in Equation 3.4 for each sub-network that is trained greedily (explained in Section 3.3). The gradient of the cost function with respect to each connection weight is computed using the backpropagation algorithm [70], and SGD is used for optimization. In early experiments, when not using layerwise pre-training, Hessian-free (HF) optimization [37] performed significantly better than SGD for training the whole network at once on a small subset of the data. However, this approach was too time-consuming and consequently impractical for training on a large training set. On the other hand, SGD combined with layerwise pre-training led to desirable convergence on larger datasets.

Stochastic gradient descent has several parameters that need to be tuned for best convergence. The learning rate is the step size when changing the parameters. Momentum is used to increase speed when the surface of the cost function is highly non-spherical. Batch size is the size of the subset of training data used to optimize the cost function at each iteration. The above-stated hyper-parameters were obtained empirically. A typical approach is to plot the cost function versus SGD iterations, and choose the parameters that lead to rough consistency in decreasing the cost function over time.
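The roles of the learning rate and momentum can be sketched with a generic SGD-with-momentum update on a toy quadratic cost. This is an illustrative sketch, not the thesis training code; the function name and the toy cost are our own:

```python
import numpy as np

def sgd_momentum_step(theta, velocity, grad, lr=0.01, momentum=0.9):
    """One SGD update: the velocity accumulates a decaying sum of
    past gradients, and lr sets the step size."""
    velocity = momentum * velocity - lr * grad
    return theta + velocity, velocity

# Toy example: minimize J(theta) = 0.5 * ||theta||^2, whose gradient
# is simply theta itself.
theta = np.array([4.0, -2.0])
velocity = np.zeros_like(theta)
for _ in range(200):
    theta, velocity = sgd_momentum_step(theta, velocity, grad=theta)
```

After 200 iterations the parameters have decayed close to the minimum at the origin; in real training the gradient would instead come from backpropagation over a mini-batch.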
Figure 3.18 plots the convergence of SGD over time for each step of the deep network training.

Figure 3.18: Convergence of the deep neural network. SGD is used along with layerwise pre-training. Adding a layer at each step led to further minimization of the cost function.

3.4.3 Parameters in Aggregating Votes

Once we have the votes of thousands of voxels of the image for the location of a specific vertebra, we aggregate them using the Kernel Density Estimation (KDE) approach presented in [6] to obtain a single prediction for the location of the vertebra. Figure 3.19 demonstrates the importance of vote aggregation by comparing the accuracy of the KDE approach against basic measures of central tendency such as the mean, trimmed mean, and median. The trimmed mean removes the first and last 20% of the data and then takes the mean over the rest.

Figure 3.19: The accuracy and identification rates obtained from using different approaches for aggregating votes.

3.5 Results and Discussion

Two-fold cross-validation is performed on two non-overlapping sets of volumetric CT scans with 112 images each. The results on data from fold 1 were obtained after training the deep neural network on fold 2, and vice versa. The folds are selected exactly as in [16] to enable comparing the results.

After aggregating the votes of the voxels, we may conclude that a specific vertebra is outside of the scope of the image. If expert annotation (ground truth) confirms that the vertebra is not visible in the image, we consider it a True Negative (TN). Otherwise, if the image contains the vertebra, it is a False Negative (FN). Similarly, if our system concludes that the vertebra is inside the scope of the image and the expert annotations confirm it, it is considered a True Positive (TP). Otherwise, it is a False Positive (FP).
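The four measures defined in Equations 3.5–3.8 follow directly from these counts. A minimal sketch (the counts used below are hypothetical, for illustration only):

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, sensitivity, and specificity
    from true/false positive/negative counts (Equations 3.5-3.8)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return accuracy, precision, sensitivity, specificity

# Hypothetical counts, not results from the thesis dataset.
acc, prec, sens, spec = classification_metrics(tp=90, fp=5, tn=80, fn=10)
```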
Based on these observations, the predictions are evaluated in terms of accuracy, precision, sensitivity, and specificity, which are defined as follows:

Accuracy = (TP + TN) / (TP + FP + TN + FN),   (3.5)
Precision = TP / (TP + FP),   (3.6)
Sensitivity = TP / (TP + FN),   (3.7)
Specificity = TN / (TN + FP).   (3.8)

Localization error is defined as the Euclidean distance between the estimated centroid of a vertebral body and its expert annotation. A vertebra is considered correctly identified if the localization error for that vertebra is less than 2 cm, and the closest expert annotation to its predicted location is the correct one. The same evaluation criteria are used in [16] and [15].

3.5.1 Separate Network for Z Coordinate

In the coordinate system of our experiments, the z coordinate is the direction along the length of the vertebral column. In a single image, the centroids of adjacent vertebrae have almost similar x and y, but different z coordinates. Therefore, the range of the z coordinate of the distance vectors (which are the targets of the deep neural network) is much larger than that of the x and y coordinates. On the other hand, for an efficient optimization, we need to scale all the targets of the neural network to the same range. This causes the optimization to concentrate more on minimizing the error in the x and y coordinates, whereas the z coordinate is more important for vertebra identification. Hence, we added another deep neural network for predicting only the z coordinate of the targets. The effect of adding this additional neural network is summarized in Table 3.1.

3.5.2 Point Selection Using Canny Edge Detector

In our method, each voxel of the image contributes equally to the predicted location of a certain vertebra. However, some images contain dark areas from which no information about the vertebrae can be obtained. The presence of these points in an image may reduce the accuracy of the method.
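A simplified stand-in for selecting only informative voxels is to threshold the gradient magnitude, so that flat, dark regions contribute no votes. This is a sketch only: the thesis uses a full Canny detector, which additionally performs non-maximum suppression and hysteresis thresholding, and the fraction `frac` here is our own illustrative parameter:

```python
import numpy as np

def select_edge_voxels(volume, frac=0.1):
    """Return coordinates of voxels whose gradient magnitude exceeds a
    fraction of the maximum; votes from flat, featureless regions
    (e.g., dark background) are thereby dropped."""
    gz, gy, gx = np.gradient(volume.astype(float))
    mag = np.sqrt(gz ** 2 + gy ** 2 + gx ** 2)
    mask = mag > frac * mag.max()
    return np.argwhere(mask)
```

Applied to a volume containing a bright block, this keeps only voxels near the block's boundary, while both the flat background and the flat interior are excluded.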
Hence, we attempt to alleviate this issue by only using the voxels that are close to detected edges. Only the votes of these voxels are aggregated when predicting the locations of the vertebrae. Table 3.2 summarizes the effect of this point selection step on the localization results. We expect this step to be more effective when the images are not tightly cropped.

Table 3.1: Localization errors (in mm) for using a single neural network for all coordinates vs. using a separate neural network for the z coordinate.

Region     Single neural network       Separate network for z coord.
           Mean   Std    Id. rate      Mean   Std    Id. rate
All        23.0   12.6   37.1%         20.7   11.9   44.8%
Cervical   22.6   11.2   26.7%         17.9   9.3    41.4%
Thoracic   22.9   13.2   37.3%         20.8   12.5   45.3%
Lumbar     23.4   13.0   46.3%         23.2   12.3   44.9%

Table 3.2: Localization errors (in mm) for using a Canny edge detector for point selection vs. performing the analysis on the votes of all voxels without any point selection.

Region     With point selection        Without point selection
           Mean   Std    Id. rate      Mean   Std    Id. rate
All        20.7   11.9   44.8%         21.5   12.5   40.1%
Cervical   17.9   9.3    41.4%         19.3   9.9    35.0%
Thoracic   20.8   12.5   45.3%         20.8   12.9   45.1%
Lumbar     23.2   12.3   44.9%         23.8   12.4   41.6%

3.5.3 Refinement by Local Vote Aggregation

Once we have the predictions for each vertebra in the image, we refine the predictions by aggregating only the votes that are close to the predicted location. For each vertebra predicted to be visible in the image, the points around the vertebra itself and its adjacent vertebrae are aggregated using Kernel Density Estimation. The previously predicted point is then replaced by the point obtained from this step.

Table 3.3: Localization errors (in mm) before and after using local vote aggregation to refine the predictions.

Region     Before Refinement           After Refinement
           Mean   Std    Id. rate      Mean   Std    Id. rate
All        20.7   11.9   44.8%         18.2   11.4   54%
Cervical   17.9   9.3    41.4%         17.1   8.7    43.1%
Thoracic   20.8   12.5   45.3%         17.2   11.8   55.3%
Lumbar     23.2   12.3   44.9%         20.3   12.2   56.9%

Figure 3.20: Visual results of the local vote aggregation step, shown on the mid-sagittal plane of three example images. The localization points before refinement are shown in red, while the points after refinement by local vote aggregation are shown in cyan. The refined points better represent the spine curvature.

Table 3.3 and Figure 3.20 demonstrate the effect of this refinement step on our results. The visual results show that after this step the predictions usually fit better to the curvature of the spinal column. This is mostly because the curvature in one part of the spine cannot be predicted from points on other parts of the column.

Figure 3.21: Visual results of the shape+pose model refinement step, shown on the mid-sagittal plane of three example images. The localization points before model refinement are shown in cyan, while the points after model refinement are shown in yellow. The improvement is minimal and may not be worth the computational expense.

3.5.4 Capability of Our Shape+pose Model for Further Refinement

In this step, we attempt to initialize our shape+pose model of the lumbar spine with the predictions of our method, and register the model to the edges of the image in order to refine the localization and also provide a vertebrae segmentation. Although this approach improved the identification rate by 1%, the results were not satisfactory (see Figure 3.21). The main reason is that the presence of image artifacts (mostly caused by surgical implants) does not allow us to obtain a high-quality edge map for model registration. Due to the presence of these artifacts, the method fails to automatically find a suitable sensitivity threshold for the Canny edge detector. Figure 3.22 illustrates three examples of Canny edge detector output.
The other reason is that the model (trained on mostly healthy subjects [50]) cannot accommodate the pathological spines in this dataset. We do not include this step in our final results, since it yields little improvement at the expense of a high computation cost (about two minutes per image).

Figure 3.22: The edge maps obtained by the Canny edge detector in the mid-sagittal plane of three example images. Image artifacts and unclear boundaries adversely affect the quality of the extracted edge maps.

3.5.5 Final Results

Our final localization and identification results are presented in Table 3.4. The results show the capability of our method for determining the visible part of the spine and also identifying each visible vertebra. Since the widths of vertebrae are generally smaller in the cervical region, a similar mean error may cause a lower identification rate in this region. Because our method is computationally fast (potentially less than 3 seconds on a desktop computer), it can be used as a preprocessing step in a wide range of clinical procedures.

Table 3.4: Localization results for different regions of the vertebral column. Mean error and STD are in mm.

Region     Accuracy   Precision   Sensitivity   Specificity   Mean   STD    Id. rate
All        96.0%      94.4%       97.2%         95.0%         18.2   11.4   53.6%
Cervical   96.0%      91.2%       97.8%         95.0%         17.1   8.7    43.1%
Thoracic   95.1%      93.9%       95.9%         94.5%         17.2   11.8   55.3%
Lumbar     98.1%      97.5%       99.4%         96.1%         20.3   12.2   56.9%

3.5.6 Comparison to State-of-the-art

Our results are compared to [15] and [16], which are the state-of-the-art methods for vertebra localization and identification in CT scans. All three approaches are evaluated on the same dataset with two-fold cross-validation with exactly the same fold separation.

Our approach is most similar to [15]. In that work, they first use a voting framework based on a Regression Forest (RF) to obtain a rough localization, and then they use a graphical model based on a Hidden Markov Model (HMM) to refine the predictions.
The results of their method on this public dataset are provided in [16]. They achieved state-of-the-art results on a dataset of non-pathological cases. However, the performance of their method degrades significantly on this public pathological dataset (which we also used). A possible reason is that the graphical model cannot accommodate the high variation in pathological cases and consequently fails to refine the predictions on this dataset. The results of our method are compared to theirs in Table 3.5. The comparison demonstrates that using deep learning can eliminate the need for model-based refinement, which has a relatively high computational cost and is not as robust in pathological cases.

Table 3.5: Comparison of the localization results of our method and the method presented in [15], which uses regression forests followed by a refinement using Hidden Markov Models. The same training and test sets are used in the evaluations of both methods. Mean error and STD are in mm.

Region     Mean (Ours / RF+HMM)   STD (Ours / RF+HMM)   Identification (Ours / RF+HMM)
All        18.2 / 20.9            11.4 / 20.0           54% / 51%
Cervical   17.1 / 17.0            8.7 / 17.7            43% / 54%
Thoracic   17.2 / 19.0            11.8 / 20.5           55% / 56%
Lumbar     20.3 / 26.6            12.2 / 19.7           57% / 42%

On this dataset, the state-of-the-art localization results were recently presented in [16] using a method based on a Classification Forest (CF). A comparison between our results and theirs is summarized in Table 3.6. While their approach has a higher identification rate, our method outperforms theirs in terms of accuracy, precision, sensitivity, and specificity. In addition, our algorithm can be computationally faster.

Table 3.6: Comparison of the localization results of our method and another method, based on classification forests, which is considered the state-of-the-art for this dataset [16]. The same training and test sets are used in the evaluations of both methods. Mean error and STD are in mm.

        Accuracy   Precision   Sensitivity   Specificity   Mean   STD    Id. rate
Ours    96.0%      94.4%       97.2%         95.0%         18.2   11.4   53.6%
CF      93.9%      93.7%       92.9%         94.7%         12.5   12.6   70.6%

The algorithms presented in [15] and [16] take about 2 minutes and 1 minute per image, respectively. In [15], a joint shape and appearance model at several scales is used to refine the predictions. In [16], centroid density estimates based on vertebra appearance are combined with a shape support term to remove false positive predictions from all over the image. Our method does not require such computationally intensive steps. Our reported results are generated using a vectorized MATLAB implementation for feature extraction, which takes about 1 minute per image. However, an efficient version of the feature extraction step has recently been implemented in C++ (Bill Liu, Vancouver, BC), which runs in a fraction of a second per image. The descriptions and complexity analyses of both implementations are provided in Section 5.1. After integration with the C++ implementation of the feature extraction step, the computation time of our method will be reduced to less than 3 seconds per image. This makes our method a better candidate than the methods proposed in [15] and [16] for a range of image-guided procedures that cannot tolerate long execution times.

For the deep neural network part, we used the Theano-nets package (Leif Johnson, Austin, TX), built on top of the Theano library [5] in Python. The other parts of the algorithm are implemented in MATLAB. Testing time for the two deep neural networks is less than a second per image. The two levels of vote aggregation (first prediction and then local refinement) take about one second per image. Feature extraction takes about one minute using the MATLAB implementation; the C++ implementation of the feature extraction step runs in less than one second.

3.6 Summary

A fully automatic method based on deep learning is developed for simultaneous localization and identification of vertebrae in three-dimensional CT scans.
The results show that the method can handle arbitrary-field-of-view scans where it is initially unknown which part of the vertebral column is visible in the image and to what extent. State-of-the-art results are achieved on a large public dataset of pathological cases. The fact that the proposed method is computationally faster than previous work makes it potentially more appealing for use in computer-aided systems for spine diagnosis and therapy.

Chapter 4

Automated Localization, Identification, and Segmentation in MRI

4.1 Introduction

Localization and labeling of vertebrae is a crucial step in the diagnosis and therapy of spinal diseases such as slipped vertebra, disc/vertebra degeneration, and spinal stenosis. Vertebra segmentation is also a pre-processing step in the diagnosis of spine pathologies like scoliosis and osteoporosis. Hence, building a computer-based system for spine diagnosis and therapy requires automatic localization, labeling, and segmentation of vertebral structures.

Lower back pain, one of the most common types of pain, usually originates in the lumbar region of the spine. Lumbar vertebrae are mostly imaged by X-ray, MR, or CT. Among these three, MR imaging shows soft tissue better and also does not expose the patient to ionizing radiation. This has led to increased interest in MR technologies for imaging the spine in recent years. Automatic labeling and segmentation of vertebral bodies in MR images is challenging because of: 1) variation among images in terms of field of view, 2) the repetitive nature of the vertebral column, 3) lower contrast between bony structures and soft tissue compared to CT, 4) larger inter-slice gaps in clinical MR images compared to CT, and 5) the presence of field inhomogeneities in MR images.

Several researchers have investigated the localization and labeling problem in CT [15, 20, 32, 36, 39, 51] and MR [1, 44, 48, 58] images. Most methods make assumptions about which vertebrae are visible in the scan. Alomari et al.
[1] assumed approximate alignment between scans. Klinder et al. [32], Glocker et al. [15], and Rasoulian et al. [53] claimed to handle general scans. However, all three of these methods are applied to CT scans, not MR images. In addition, Glocker et al. only provide labeling and do not provide segmentation. Klinder et al. and Rasoulian et al. provide segmentation, but their algorithms have a high computation time for arbitrary-field-of-view scans.

In Chapter 2 [66], we proposed a semi-automatic segmentation approach for MR images based on a multi-vertebrae statistical shape+pose model. It required one user click for each vertebra to be segmented. In this work, we propose a method based on deep learning for simultaneous localization and identification of vertebral bodies in MR images. We then initialize our previous segmentation method with the localized points in order to obtain a fully automatic segmentation.

4.2 Materials

The proposed method was evaluated on the same dataset used in Chapter 2. The dataset consists of nine T1-weighted multi-slice MR images. All images contain the lumbar region, but the extent of visibility of the thoracic and sacral regions varies. No pathology was observed in this dataset. Slice thickness ranges between 3.3 and 4.4 mm among the images. All slices are 512 × 512 pixels with an in-plane resolution of 0.5 mm. The number of slices in each volumetric image varies from 12 to 18. The statistical model (used for segmentation) was constructed from manually segmented multi-slice CT scans of 32 patients [50]. Segmentation ground truth meshes (also the same as in Chapter 2) were prepared manually from the raw images using ITK-SNAP [74].
The center of gravity of each manually segmented vertebral body is used as the localization landmark.

Figure 4.1: Flowchart of the automatic framework for segmentation of vertebral bodies: intensity correction, automatic vertebrae localization and identification, specifying a region of interest (ROI), anisotropic diffusion, edge detection, and model registration.

4.3 Methods

4.3.1 Automatic Localization and Identification

Pre-processing: bias field correction

The presence of intensity inhomogeneities in MR images can adversely influence the quality of intensity-based feature extraction. Hence, we first apply a bias field correction algorithm to the MR images to reduce this inhomogeneity. The Statistical Parametric Mapping package is used for bias field correction (SPM12b, Wellcome Department of Cognitive Neurology, London, UK).

Localization and identification by deep learning

The localization task is parametrized as a multi-variate regression problem. Each voxel of the image is described by hundreds of intensity-based features. Each feature is the difference between the mean intensities over two cuboids displaced with respect to the reference voxel position. The dimensions and displacement of each feature are generated randomly. These features are fast to compute using the integral image idea proposed by Viola et al. [69]. The targets of the multi-variate regression are the relative distances between the reference voxel and the centers of the lumbar vertebral bodies. (The parametrization of the image localization task as a regression problem, and the randomly generated cuboid features, are explained in more detail by Criminisi et al. [10].) A feed-forward neural network with three hidden layers is trained to solve the multi-variate regression problem. Stochastic gradient descent along with layerwise pre-training is used for optimization. On a test image, we first extract features from all voxels, then use the neural network to predict the relative distance of the labels with respect to each voxel.
Each of these relative distances is converted to absolute positions of the labels and considered the vote of that specific voxel for the position of each vertebral body. The votes of all voxels are aggregated using Kernel Density Estimation [6] to obtain a robust prediction of the center of each lumbar vertebral body. No assumptions are made about the presence of the target vertebrae in the volumetric image; the voxels may vote outside of the scope of the image if the target vertebrae are not visible.

Refinement by local thresholding

In most cases, the points predicted by the deep learning approach are located inside the target vertebral bodies, but not exactly at their centers. We use a simple approach to bring the predicted points to the centers of the vertebral bodies in order to improve identification and the initialization of our statistical model for segmentation. By Otsu thresholding [46] in the region of the predicted points, we find large components that may be vertebral bodies. If a large component is found close to a prediction, we refine the prediction by replacing it with the center of that component. Figure 4.3 illustrates an example of the refinement step.

Figure 4.2: The value of each feature is the difference between the mean intensities over two cuboids displaced with respect to the reference voxel position. From [10].

Figure 4.3: Refining localization points by replacing them with the center of the closest large component, obtained from local thresholding. Left: localized points before refinement. Middle: refinement using components obtained from local thresholding. Right: localized points after refinement.

4.3.2 Segmentation

Pre-processing: anisotropic diffusion

A three-dimensional anisotropic diffusion filter [49] is applied to the bias-field-corrected image to smooth the image while preserving edges. This pre-processing step greatly improves the quality of edge detection.
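The edge-preserving behavior of this filter can be sketched with a 2D Perona–Malik-style diffusion step, using the parameter values listed in Figure 4.6 (conductance κ = 50, integration constant ∆t = 1/7, 20 iterations). This explicit 2D scheme is our own illustrative sketch of [49], not the 3D filter used in the thesis:

```python
import numpy as np

def perona_malik_step(u, kappa=50.0, dt=1.0 / 7.0):
    """One explicit iteration of 2D anisotropic diffusion.

    The conduction coefficient g = exp(-(|grad u| / kappa)^2) is small
    across strong edges, so smoothing is suppressed there while noise
    in flat regions is averaged out."""
    # Differences to the four neighbours (zero flux at the borders).
    dn = np.zeros_like(u); dn[1:, :] = u[:-1, :] - u[1:, :]
    ds = np.zeros_like(u); ds[:-1, :] = u[1:, :] - u[:-1, :]
    de = np.zeros_like(u); de[:, :-1] = u[:, 1:] - u[:, :-1]
    dw = np.zeros_like(u); dw[:, 1:] = u[:, :-1] - u[:, 1:]
    g = lambda d: np.exp(-(d / kappa) ** 2)
    return u + dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)

def diffuse(u, n_iter=20, kappa=50.0, dt=1.0 / 7.0):
    for _ in range(n_iter):
        u = perona_malik_step(u, kappa, dt)
    return u
```

On a noisy image containing a strong step edge, the noise in the flat regions is smoothed while the edge itself survives almost untouched, which is exactly what benefits the subsequent Canny step.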
For speed, we only apply the filter to the region of the spinal column detected in the previous steps. More details of the pre-processing steps are described in Section 2.3.

Statistical model registration

We use the Canny edge detection algorithm to obtain the edges in the detected spinal column region. The maximum gradient magnitude of the region is used to automatically set the sensitivity threshold of the edge detector. Then, we use an iterative Expectation Maximization registration method (introduced by Rasoulian et al. [50]) to register a statistical model to the Canny edges in order to obtain a robust segmentation. The statistical multi-vertebrae model integrates variations of the shapes and poses of the lumbar vertebrae, extracted separately from 32 manually segmented 3D CT scans [50]. The multi-vertebrae registration allows us to avoid mis-segmentation in the areas between vertebrae, by simultaneously registering all visible vertebrae and taking into account the correlation between the shapes and poses of different vertebral bodies.

4.4 Results and Discussion

Localization and identification results on nine subjects are reported in Table 4.1. The centroids of the five lumbar vertebral bodies (L1 to L5) are predicted using leave-one-out cross-validation, and the distances to the ground truth landmarks are computed. The average and standard deviation of these distances, as well as the identification rate, are reported. We consider a vertebral body to be identified when its distance to the ground truth landmark is less than 2 cm and the closest landmark is the correct one. The results show 100% identification after the refinement step for all 45 lumbar vertebral bodies.

Table 4.1: Localization and identification results for lumbar vertebral bodies in nine volumetric MR images.

                              Mean error   Standard deviation   Identification
Deep learning localization    11.9 mm      6.3 mm               91%
After refinement              3.0 mm       2.4 mm               100%

Three-dimensional segmentation results are shown in Table 4.2. The closest distance to the registered model is found for each point in the manual segmentation. The average and maximum of these distances are reported as the mean error and the Hausdorff distance, respectively. These results demonstrate that we can automatically segment the lumbar vertebral bodies in volumetric MR images with a mean error below 2.8 mm, an improvement over our previous semi-automatic method. Examples of our identification and segmentation results are shown in Figures 4.4 and 4.5.

Figure 4.4: Localization and labeling results on the mid-sagittal slice of bias-field corrected images.

The localization and identification step (including refinement) takes less than 20 seconds on a desktop computer. The whole process takes less than 3 minutes using an unoptimized MATLAB (The MathWorks, Inc., Natick, MA) implementation for segmentation. Training a deep neural network takes about one day, which can be adjusted by downsampling the training data. The deep learning part is implemented in Python using the Theano-nets package (Leif Johnson, Austin, TX), built on top of the Theano library [5].

Figure 4.5: Examples of segmentation results in the mid-sagittal slice of five different patients. Our segmentation results are shown with white contours, while red contours show the manual segmentation.

Table 4.2: Mean error and maximum distance (Hausdorff) of segmentation error (in mm) for each vertebral body.

Subject   L1          L2          L3          L4          L5          All lumbar vertebrae
          Mean  Max   Mean  Max   Mean  Max   Mean  Max   Mean  Max   Mean ± std   Max
1         2.0   6.1   1.7   5.2   2.8   9.5   4.4   15.7  4.2   16.4  3.0 ± 1.2    16.4
2         2.2   6.5   2.2   8.0   2.0   7.3   1.8   9.6   3.4   16.6  2.3 ± 0.6    16.6
3         3.7   8.6   3.4   8.7   3.6   9.6   4.0   13.6  5.8   19.9  4.1 ± 1.0    19.9
4         2.6   8.1   1.9   7.1   2.6   10.5  1.6   8.3   2.0   9.1   2.1 ± 0.4    10.5
5         2.2   8.0   1.6   5.5   1.6   6.4   2.2   11.0  4.6   17.7  2.5 ± 1.2    17.7
6         2.7   10.3  2.6   9.7   2.8   9.3   2.5   10.4  4.0   16.1  2.9 ± 0.6    16.1
7         1.8   5.5   1.7   5.6   2.2   7.9   2.7   11.9  3.5   15.2  2.3 ± 0.7    15.2
8         3.2   8.4   3.3   9.5   3.2   10.4  3.6   12.7  2.6   11.0  3.2 ± 0.4    12.7
9         2.3   7.3   1.7   6.5   2.0   8.0   1.9   8.8   3.1   12.4  2.2 ± 0.5    12.4
Average   2.5   7.7   2.2   7.3   2.5   8.8   2.7   11.3  3.7   14.9  2.7 ± 0.9    19.9

Figure 4.6: Summary of the parameters of the automatic framework for segmentation of vertebral bodies. The main parameters of each step are: automatic vertebrae localization and identification (described in the previous chapter; feature type: two cuboids; number of features = 500; number of layers = 5); specifying the region of interest (ROI size = mean(D) + 3 std(D), where D is the set of distances between consecutive clicks); anisotropic diffusion (conductance κ = 50, integration constant ∆t = 1/7, number of iterations n = 20); edge detection (Canny algorithm); and model registration (only vertebral body points; number of vertebrae = 5; number of registered points = 104; outlier rate = 0.01).
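The mean error and Hausdorff distance reported in Table 4.2 can be computed as sketched below. This is a brute-force point-to-point version for illustration; evaluating real meshes would use closest-point-on-surface distances:

```python
import numpy as np

def surface_distances(ground_truth_pts, model_pts):
    """For each ground-truth point (n x 3 array), find the distance to
    the closest registered-model point (m x 3 array); report the
    average (mean error) and the maximum (one-sided Hausdorff)."""
    diff = ground_truth_pts[:, None, :] - model_pts[None, :, :]
    closest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return closest.mean(), closest.max()
```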
The multi-vertebrae model can handle the large inter-slice gap (about 4 mm) in clinical MR images with low computation cost. The results demonstrate that our method can automatically localize, label, and segment the vertebral bodies in MR images with sufficient accuracy and speed for a wide range of clinical applications.

Chapter 5

Speedup by Vectorization and GPU Acceleration

5.1 Vectorized Feature Extraction

While compiled programming languages like C/C++ are widely used for software release, interpreted languages like MATLAB and Python are more popular in research. This is mainly because their tools and built-in math functions enable the researcher to explore multiple approaches and reach a solution faster than with traditional compiled languages. MATLAB, in particular, is optimized for operations involving matrices and vectors. On the other hand, it performs much slower than compiled languages when executing loops over multi-dimensional data. This is one of the reasons that code vectorization is highly recommended when using such languages. In this context, vectorization means revising loop-based, scalar-oriented code into vector and matrix operations. Besides better performance, vectorized code is easier to read and less error-prone.

As explained in Chapters 3 and 4, for the task of vertebrae localization, we extract a set of features from image voxels to feed our deep learning system. In this section, we explain the key points of our contribution in providing a vectorized implementation of this feature extraction step. The research and development benefits of this approach are analyzed by a comparison against a basic sequential implementation.

5.1.1 Description of Features

For the vertebrae localization methods presented in Chapters 3 and 4, hundreds of intensity-based features are extracted from each voxel of the image (with some downsampling for computational speedup).
The features describe a short-range area around the reference voxel.

In CT scans, the value of each feature is the mean intensity over a three-dimensional cuboid displaced with respect to the reference voxel position. In MR images, because of the presence of intensity inhomogeneities, the mean intensities over two cuboids are subtracted and the difference is used as the value of each feature (see Figures 3.2 and 4.2). The features used in CT and MR are explained in more detail in Sections 3.3 and 4.3. For simplicity, in this section we focus on the implementation alternatives for CT features. MR features are implemented similarly by using two cuboids instead of one.

Computing the sum of intensities over cuboids can be done efficiently using an intermediate representation of the image, called the integral image [69]. The mathematical definition is given in Equation 3.2 and a 2D representation is shown in Figure 3.3. Given the integral image, the sum of intensities over a cuboid in the original image can be computed in constant time.

The first step is to generate specifications for M features. Each feature box Fi is defined by its displacement Di ∈ R³ and its dimensions Si ∈ R³. The displacement is defined as the center of the feature box with respect to the reference voxel position. Di and Si are selected randomly from the ranges [−max_displace, max_displace] and [1 mm, max_size], respectively. The integral image II_{N1×N2×N3} has the same size as the original image, denoted I_{N1×N2×N3}. In our experiments, M = 500 features are specified. Note that this number of features needs to be extracted for each image voxel. Therefore, for the whole image we may extract billions of features.

5.1.2 Sequential Implementation

In the sequential approach, we simply iterate over image voxels for each feature. Using the integral image, extracting the mean intensity of a feature box can be done in constant time.
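The constant-time lookup can be illustrated with a short NumPy sketch (illustrative names, not the thesis's MATLAB code): the integral image is a triple cumulative sum, and any box sum reduces to eight corner lookups with alternating signs (3D inclusion-exclusion).

```python
import numpy as np

def integral_image(vol):
    """Zero-padded 3D integral image: ii[x, y, z] = vol[:x, :y, :z].sum()."""
    ii = vol.cumsum(axis=0).cumsum(axis=1).cumsum(axis=2)
    return np.pad(ii, ((1, 0), (1, 0), (1, 0)))

def box_sum(ii, lo, hi):
    """Sum of vol[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] in constant time:
    eight lookups combined by 3D inclusion-exclusion."""
    x0, y0, z0 = lo
    x1, y1, z1 = hi
    return (ii[x1, y1, z1]
            - ii[x0, y1, z1] - ii[x1, y0, z1] - ii[x1, y1, z0]
            + ii[x0, y0, z1] + ii[x0, y1, z0] + ii[x1, y0, z0]
            - ii[x0, y0, z0])
```

Dividing the returned sum by the box volume gives the mean intensity used as the feature value.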
Assuming M features and an N×N×N image, the computational cost for feature extraction per image will be MN³.

Algorithm 1 Sequential feature extraction
  for i = 1 → M do                          ▷ Loop over all features
    for x = 1 → N1 do                       ▷ Loop over x axis
      for y = 1 → N2 do                     ▷ Loop over y axis
        for z = 1 → N3 do                   ▷ Loop over z axis
          Fi ← mean intensity of the feature box   ▷ Constant time

5.1.3 Vectorized Implementation

In the vectorized version, we quantize the ranges of allowed box sizes and displacements, respectively, into the sets

S = {{2,2,2}, {2,2,7}, {2,2,12}, ..., {max_size, max_size, max_size}},  (5.1)

and

D = {−max_displace, ..., −8, −4, 0, 4, 8, ..., max_displace}.  (5.2)

The values in the above sets are in mm. For each Si ∈ S, we extract the mean intensity of feature boxes with size Si from all voxels of the image. Using element-wise matrix operations on the integral image, we create a 3D matrix with the same size as the original image, where the value of each element equals the mean intensity of the box of size Si centred on the corresponding voxel of the original image. Quantizing displacements allows us to downsample these computed matrices for memory efficiency. Thereafter, the value of each feature Fj for all voxels can be obtained rapidly by picking the already-computed 3D matrix for Sj and reshaping it according to Dj. Figure 5.1 visually illustrates the steps of this algorithm on a 2D example.

Algorithm 2 Vectorized feature extraction
  for all Si ∈ S do
    Bi ← mean intensity of boxes of size Si centred on all pixels
  for j = 1 → M do                          ▷ Loop over all features
    Fj ← cropped Bj according to Dj

Figure 5.1: The proposed vectorization approach is shown on a 2D example. A feature box with size Si and displacement vector Di can be computed for all pixels of an image by element-wise matrix addition and subtraction on the integral image according to Si (Left), and then cropping the product matrix according to Di (Right).
In the left figure, a number of element-wise matrix operations (purple − orange − green + blue) on the integral image results in a matrix in which each element contains the sum of the box of size Si starting from the corresponding element in the original image. Out-of-bound parts of the matrices can be handled by zero-padding or cropping.

5.1.4 Comparison and Computation Analysis

In the experiments of Chapter 4, the number of features, M, equals 500. Considering the downsampling rate of [4 mm, 4 mm, 4 mm] used in Chapter 4, the dimension of the images in each direction, N, is mostly less than 80 pixels. The proposed sequential algorithm is O(MN³), which we expect a compiled programming language like C++ to execute in the order of 500 × 80³ ≈ 2.6 × 10⁸ operations, which should take less than a second on a desktop machine (with a commonplace 3-GHz processor).

Nevertheless, MATLAB is mostly an interpreted language and is relatively slow in running loops over multi-dimensional data. On the other hand, MATLAB is optimized for matrix operations. In our vectorized implementation, instead of for-loops over the dimensions of the image, we use element-wise matrix operations (addition, subtraction, and reshaping). Figure 5.2 illustrates the substantial speedup that we get from the vectorized feature extraction algorithm in MATLAB with the settings of the experiments of Chapter 4.

Figure 5.2: Computation time of the vectorized version (15 seconds) and the sequential version (3 minutes) of the algorithm on a sample image in MATLAB.

Note that the computation time reported for the vectorized version is not affected by downsampling: features are computed for all voxels, and the downsampling rate is only applied at the end to discard the features of excluded voxels. On the other hand, the computation time of the sequential approach scales cubically with the image dimension retained after downsampling (O(N³) when M is constant).
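The element-wise step of Figure 5.1 can be sketched in NumPy (an illustration of the idea rather than the MATLAB implementation; sizes and displacements are quantized in voxels here, not in mm, and the function names are ours):

```python
import numpy as np

def box_sum_map(vol, size):
    """Sum over a box of shape `size` anchored at every voxel, for all
    voxels at once (Figure 5.1, left): eight shifted views of the
    zero-padded integral image combined by element-wise + and -."""
    sx, sy, sz = size
    ii = np.pad(vol.cumsum(0).cumsum(1).cumsum(2), ((1, 0),) * 3)
    nx = vol.shape[0] - sx + 1          # number of valid anchor positions
    ny = vol.shape[1] - sy + 1
    nz = vol.shape[2] - sz + 1
    def v(dx, dy, dz):                  # shifted view of the integral image
        return ii[dx:dx + nx, dy:dy + ny, dz:dz + nz]
    return (v(sx, sy, sz) - v(0, sy, sz) - v(sx, 0, sz) - v(sx, sy, 0)
            + v(0, 0, sz) + v(0, sy, 0) + v(sx, 0, 0) - v(0, 0, 0))

def feature_map(vol, size, disp):
    """Feature value for all voxels: the box-mean map shifted by `disp`
    (the cropping step of Figure 5.1, right). Non-negative `disp` only;
    negative displacements would need zero-padding, as noted above."""
    b = box_sum_map(vol, size) / np.prod(size)
    dx, dy, dz = disp
    return b[dx:, dy:, dz:]
```

Each box-sum map is computed once per quantized size Si and then reused for every feature sharing that size, which is where the speedup over the per-voxel loop comes from.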
Other than the performance gain, the vectorized implementation also makes the code easier to read, maintain, and modify. Being unaffected by the downsampling rate also lets the researcher explore the effect of different downsampling rates without recomputing the features.

5.2 GPU-accelerated Model Registration

Computation time is a principal factor in medical image analysis: algorithms need to meet clinical time requirements to be used in clinics and operating rooms. Many accurate and robust algorithms cannot be used in clinical settings because they do not meet these requirements in terms of speed; a method that does not run within a clinically acceptable time frame has little practical value. Parallel computing can be used to make such algorithms meet these time requirements.

In recent years, there has been a significant surge of interest in parallelizing medical image analysis algorithms on Graphics Processing Units (GPU) [29, 45, 77]. A GPU typically has a much higher number of processing elements than traditional single-core or multicore machines. Unlike conventional Central Processing Units (CPU), which have only a handful of cores, GPUs include massively parallel arrays of processing units that can greatly increase the throughput of parallel programs. Using this technology may allow a wide range of image processing algorithms, which previously did not meet the hard time requirements of clinical usage, to fit within these constraints.

The work presented in this section, in particular, aims to parallelize a statistical model registration method on the processing units of a GPU. The method, proposed in [50], is used in Chapters 2 and 4 for spine segmentation. This algorithm is parallelized by two different methods: 1) Compute Unified Device Architecture (CUDA) programming on a GPU, and 2) multicore programming using shared memory.
In the first method, we take advantage of the hundreds of processing elements available in a GPU, while in the second method parallelization is based on the 32 processing units available in a multicore machine.

5.2.1 Spine Segmentation Method

Our segmentation algorithm aims to find the boundaries of each vertebra of the spinal column in CT or MR images. For this purpose, first, a statistical shape model of the spine is generated based on a number of manually segmented images. Then, given a new, unsegmented image, we register our previously generated model to the new image to obtain a spine segmentation. In other words, the boundaries of each vertebra in the input image are found by aligning a statistical model to the detected edge points [50]. The concept of statistical models is described in detail in [13].

Improving the performance of this method up to real-time limits can massively expand its application for spine segmentation in CT, MR, and ultrasound images. To this end, we select the most computationally intensive portion of the algorithm and aim to efficiently parallelize its execution.

Computationally intensive part

In the most computationally intensive part of the algorithm, we have a set of N points in D dimensions, X_{N×D}. We also have M Gaussian distributions with standard deviation σ and different means. The means of these M Gaussian distributions are gathered in the rows of the matrix Y_{M×D}. We need to build the probability matrix P_{M×N}. The element P_{mn} of this matrix is set to the probability that point n in X_{N×D} is generated by the m-th Gaussian distribution, whose mean is the m-th row of Y_{M×D}. Each element of P can be computed as follows:

P_{mn} = exp(−½‖(Y_m − X_n)/σ‖²) / ((∑_{k=1}^{M} exp(−½‖(Y_k − X_n)/σ‖²)) + ε),  (5.3)

where Y_m is the m-th row of Y, X_n is the n-th row of X, ε is a constant, and the operator ‖A‖ is defined as follows:

‖A‖ = √(A Aᵀ).
(5.4)

The next step is to find the best transformation that aligns the model (the above-mentioned Gaussian distributions) to the data points (the matrix X_{N×D}). To solve this optimization problem, a loss function is defined, and the goal is to minimize it. This function differs for different types of transformations (e.g., rigid, rotation, affine, etc.); the different loss functions are described in [38]. In all cases, when we take the derivative of the loss function (for minimization), the terms PX, PI_c, and PᵀI_c appear, where I_c is a column vector whose elements are all ones. PX is P multiplied by X, PI_c is the vector of row sums of P, and PᵀI_c is the vector of column sums of P. The final outputs of this portion are these three matrices.

In each run of the algorithm, this portion is called about 120 times. In all calls M = 7,000, but N may vary between 5,000 and 20,000. The computation time of each call in the sequential C++ implementation is about 7 seconds for N = 20,000. Our goal is to reduce this computation time to below 0.5 seconds in order to meet clinical requirements.

5.2.2 Parallel Computing Approach

Two different parallel computing approaches are used: (1) GPU acceleration using CUDA programming, and (2) multicore CPU acceleration using shared memory. In CUDA programming, we take advantage of the parallel nature of GPU architectures, which have hundreds of cores capable of running thousands of parallel threads, while in the shared memory implementation a multicore CPU architecture is used with 32 cores and at most 32 threads. Our objective is to efficiently parallelize the method on each of these platforms and compare the results.

GPU acceleration using CUDA

CUDA is a parallel computing platform created by NVIDIA. It gives the programmer access to the instruction set and memory of the parallel computational elements in NVIDIA GPUs. When programming in CUDA, one can use many thread blocks that are scheduled independently.
A thread block contains a number of threads (usually between 32 and 512). A kernel function is run in all threads. The threads in each block are executed in parallel, but multiple blocks may be executed in parallel or one after another depending on the resources of the GPU. The threads of each block have fast access to a shared memory and can also be synchronized at low cost. The number of blocks and the number of threads per block should be adjusted based on the data dependencies of the program and the available hardware resources to achieve the highest possible performance.

Multicore CPU acceleration using shared memory

The Pthreads library is used to parallelize our algorithm on a multicore CPU. This library defines a set of C programming language types, functions, and constants for parallelizing tasks over multiple cores of a CPU. This enables us to evaluate the effectiveness of GPU acceleration by comparing its speedup gains against multicore CPU parallelization.

5.2.3 Experiment

In this subsection, we explain how the program is parallelized and how tasks are distributed among the processing elements on each platform.

Parallelization on GPU

The most computationally intensive part of the algorithm is parallelized over the processing elements of a GPU. In this part, X_{N×D} and Y_{M×D} are given. We would like to build the probability matrix P_{M×N} using Eq. 5.3 and derive the outputs PX, PI_c, and PᵀI_c. For building the probability matrix P, we need to allocate an M×N matrix in the GPU's memory. A kernel function is executed on each processing element to compute the numerator of Eq. 5.3. We assign M×N threads to execute this kernel function for each m and n, where 0 ≤ m < M and 0 ≤ n < N. After this step, the numerator matrix is as follows:

numerator_{mn} = exp(−½‖(Y_m − X_n)/σ‖²),  (5.5)

where each element is computed by an independent thread in parallel. Thereafter, we need to compute the denominator of Eq. 5.3.
Note that the denominator has the same value for all elements of each column of P. Also note that the first term of the denominator of each column is the sum of the corresponding column in numerator_{M×N}, which we have already computed. Hence, to compute the denominator value of column i, we sum the values of column i in numerator_{M×N} and then add the constant ε to the sum. Since numerator_{M×N} has N columns, this can be done using N parallel threads. After this step, the vector denominator of size N is built, where:

denominator_n = (∑_{k=1}^{M} exp(−½‖(Y_k − X_n)/σ‖²)) + ε.  (5.6)

Now we again use M×N threads to divide each element of numerator_{M×N} by the corresponding element of denominator_N:

P_{mn} = numerator_{mn} / denominator_n.  (5.7)

Once P_{M×N} is built, we can obtain PX, PI_c, and PᵀI_c, where PI_c is the vector of row sums of P and PᵀI_c is the vector of column sums of P. Based on this description, PI_c and PᵀI_c can be implemented as P I_{N×1} and Pᵀ I_{M×1}, where all elements of I_{N×1} and I_{M×1} are set to one. Therefore, each of the output matrices PX, PI_c, and PᵀI_c can be computed with one matrix multiplication. An efficient parallel matrix multiplication procedure is implemented in a CUDA library called CUBLAS [43]. The matrix multiplication routine of this library is used for computing the above-stated matrix products.

Parallelization on a multicore CPU

The computationally intensive part of the algorithm is also implemented using shared memory on a multicore machine. Assume that we use K parallel processing cores, where K is at most 32, based on the available resources. For building the probability matrix P_{M×N}, we distribute the N columns of P among the K processors. In other words, each processor is assigned to compute N/K columns of P, and the processors work in parallel until all elements of P are computed. Consider column j as one of the columns assigned to processor i.
This processor traverses the elements of column j and computes the numerator value (Eq. 5.5) of each element. It also accumulates a sum of the elements while traversing the column. This way, when it reaches the bottom of the column, all numerator values of the column as well as their sum are computed. We obtain the denominator of that column (Eq. 5.6) by adding the constant ε to the sum of the numerators. At this step, we have both the numerators and the denominator of the column. Hence, the processor traverses the column again to divide the numerator of each element by the denominator of the column. During this traversal, it also sums the quotients to obtain the j-th element of PᵀI_c. Now the j-th column of P_{M×N} as well as the j-th element of the vector PᵀI_c are computed. Then, processor i moves on to its next assigned column (possibly j+1) to repeat these steps. The processors work in parallel until all elements of P and PᵀI_c are computed. At this point, a synchronization using a barrier is necessary to make sure all processors are done before moving on to the next step. For obtaining PX, we distribute the rows of P among the processors to perform a parallelized matrix multiplication. This procedure includes traversing each row of the first matrix, P. During this traversal, we also sum the elements of the row to obtain each element of PI_c. By the end of this step, when all processors are done, we have all the outputs PX, PI_c, and PᵀI_c.

Figure 5.3: Computation time of the CUDA program for different block sizes. M = 7,000 and N = 20,000.

5.2.4 Results

Block size in GPU programming

Deciding on the optimal number of threads per block is a challenging step in GPU acceleration. In this experiment, we varied the block size for the problem size N = 20,000 and recorded the observed performance. Results are shown in Figure 5.3.
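The per-column procedure described above maps onto a few lines of Python (a sequential NumPy sketch with illustrative names, not the Pthreads implementation); each column is an independent work unit, which is what makes the column-wise distribution over cores possible:

```python
import numpy as np

def column_outputs(X, Y, sigma, eps):
    """Sequential reference for the per-column procedure: for each column n,
    compute the numerators (Eq. 5.5), one shared denominator (Eq. 5.6),
    the column of P (Eq. 5.7) and its sum (the n-th element of P^T I_c),
    then form PX and the row sums P I_c."""
    M, N = Y.shape[0], X.shape[0]
    P = np.empty((M, N))
    col_sums = np.empty(N)
    for n in range(N):                     # independent work unit per column
        numer = np.exp(-0.5 * ((Y - X[n]) ** 2).sum(axis=1) / sigma ** 2)
        denom = numer.sum() + eps          # shared by the whole column
        P[:, n] = numer / denom
        col_sums[n] = P[:, n].sum()
    return P @ X, P.sum(axis=1), col_sums  # PX, P·I_c, P^T·I_c
```

In the shared memory version, the `range(N)` loop is simply split into N/K chunks, one per core; such a reference is also handy for validating the CUDA port against known-good values.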
The results illustrate that the best performance is obtained with a block size of 64 threads. Following this observation, all further experiments are performed with a block size of 64 threads.

Speedup gain from GPU acceleration

The parallelized program is called about 120 times in each run of the segmentation algorithm. In all these calls M = 7,000, but N can vary between 5,000 and 20,000. For this reason, we tested our GPU-accelerated program for different problem sizes in this range. The results are compared with an efficient sequential program run on the same machine. Computation times and speedups are reported in Table 5.1. All timings are obtained from a 2.67 GHz Intel Xeon machine with an NVIDIA Tesla C2050 GPU.

Table 5.1: Computation time and speedup of the GPU-accelerated program for different problem sizes.

Problem size         N=5,000   N=10,000   N=15,000   N=20,000
Sequential program   1400 ms   3700 ms    4900 ms    7200 ms
CUDA program         81 ms     163 ms     241 ms     317 ms
Speedup              17×       22×        20×        22×

The GPU-accelerated part of the algorithm is integrated into the implementation of the whole statistical model registration used in Chapters 2 and 4. The processing time on a sample CT scan is reduced from 15 minutes to 76 seconds. The timings reported for model registration in Chapters 2 and 4 used an approximate implementation of the algorithm. On the same sample CT scan, our GPU-accelerated method performed 1.45× faster than the approximate implementation, and its accuracy is not affected by approximation.
The results demonstrate that increasing the number of cores up to 8 may improve the performance, but increasing it further reduces the performance due to the overhead of shared memory accesses and synchronization. Table 5.2 shows the execution times and obtained speedups of our shared memory implementation for different problem sizes. The reported results for the sequential program are obtained by running the same shared memory implementation on a single core. The reported results for the multicore-accelerated program are the best results obtained by varying the number of cores for each problem size.

Figure 5.4: Computation time of the multicore-accelerated program for different numbers of processors. M = 7,000 and N = 20,000.

Table 5.2: Computation time and speedup of the multicore-accelerated program for different problem sizes.

Problem size            N=5,000   N=10,000   N=15,000   N=20,000
Sequential              7 s       14 s       21 s       26 s
Multicore accelerated   5 s       11 s       16 s       19 s
Speedup                 1.4×      1.3×       1.3×       1.3×

5.2.5 Discussion

The speedup gained by the GPU-accelerated program (more than 17× in all cases) is much larger than the speedup gained by the conventional multicore-accelerated one (less than 1.5× in all cases). This observation shows that the highly parallel structure of GPUs makes them more effective than general-purpose multicore CPUs for our spine segmentation algorithm. This speedup makes the registration algorithm more acceptable for clinical settings.

5.3 Summary

In the first section of this chapter, a vectorized algorithm is proposed for extracting the intensity-based features used in previous chapters, in parallel, from all voxels of a 3D image. Quantitative results showed about a 12× speedup over a sequential approach in MATLAB.

In the second section of this chapter, we parallelized our segmentation approach, used in previous chapters, over the processing elements of a GPU.
The speedup gained by this approach is compared against a conventional multicore-accelerated approach. A speedup of around 17× was obtained from the GPU-accelerated method, significantly higher than the speedup gained from multicore acceleration. This speedup demonstrates the potential of GPU acceleration for fitting the proposed automatic segmentation method within the acceptable time frames of various clinical tasks.

Chapter 6
Conclusion and Future Work

Designing an effective computer-aided system for spine diagnosis and therapy requires a method for automatic vertebra localization and identification. An automatic segmentation method can also directly benefit such a system in diagnosing spine pathologies. To be accepted in clinical practice, these methods need to be fast, accurate, and robust.

In this thesis, we proposed and evaluated several methods for vertebra localization, identification, and segmentation. In Chapter 2, we successfully adapted a semi-automatic segmentation approach, previously proposed for CT scans, to volumetric MR images. We showed that this method can handle intrinsic MR segmentation challenges such as intensity inhomogeneities and large inter-slice gaps. In Chapter 3, we introduced a method for automatic localization and labeling of vertebrae in three-dimensional CT scans. An extensive evaluation showed that this method can efficiently solve the localization problem in general scans where the field of view is restricted and spine pathologies may be present. The methods of Chapters 2 and 3 are combined in Chapter 4 to build an integrated automatic system for all tasks of vertebra localization, labeling, and segmentation. Evaluation on MR images of the lumbar region shows promising performance for the integrated system.
In Chapter 5, the potential of some parallel computing approaches is evaluated for improving the performance of the methods presented in the previous chapters.

6.1 Contributions

The major contributions of this thesis are summarized as follows:

• A semi-automatic method is developed for extracting the edges of vertebral bodies in volumetric MR images of the spine. A bias-field correction algorithm is applied to reduce the intensity inhomogeneities. Then, a simple user interaction is used to determine the Region Of Interest (ROI). A 3D anisotropic diffusion filter is applied to the ROI to smooth the image while preserving the edges. Then, a Canny edge detector is used to extract the desired edges. The edge map is then used for statistical model registration in order to segment the vertebral bodies.

• A novel technique is developed for automatic vertebra localization and identification based on training deep neural networks. Separate estimators are used for different coordinates. The voxels of the image that are far from Canny edges are disregarded in the estimation. A recently proposed Kernel Density Estimation [6] method is used for fast and robust vote aggregation.

• A novel localization refinement approach is proposed and integrated into the automatic localization system. The initial prediction for the location of a vertebra is obtained from the votes of voxels from all over the image. The prediction is then refined by aggregating the votes of the voxels around the initial prediction. Votes that are far from the initial prediction are considered outliers and disregarded.
This additional step improves the results of the method on a large dataset of pathological cases, where unexpected curvatures may be present in different parts of the spine, while not adding significant computation time to the localization system.

• A fully-automatic system is developed by integrating an automatic vertebrae localization algorithm and a semi-automatic segmentation approach. A novel technique, based on local thresholding, is developed for refining the vertebrae localization predictions in MR images. This technique is used as an intermediate step for integrating the above-stated systems.

• The registration of a multi-vertebrae statistical model of the spine is parallelized over the computing elements of a Graphics Processing Unit (GPU). The computation time is compared against the pre-existing sequential implementation.

• A vectorization technique is developed for computing the mean intensity of hundreds of displaced cuboids for all pixels of a 3D image in parallel. These values are used as features for building an automatic vertebrae localization system for CT scans and MR images.

6.2 Future Work

As a continuation of the work presented in this thesis, a number of interesting research projects can be suggested:

• In Chapters 2 and 4, we focused on segmenting the vertebral body of each vertebra in MR images. Because of the large slice spacing in typical MR images of the spine, the other parts, such as the transverse processes, are not depicted as well as the vertebral bodies. However, a sub-anatomical labeling of the vertebral arch may be possible by registering the statistical model of the whole vertebra after convergence of the vertebral body parts.

• The segmentation methods presented in Chapters 2 and 4 are evaluated on a small dataset of MR images. In addition, the ground truth meshes are provided by a single non-expert.
Future work may involve a better evaluation of the method by applying it to a larger dataset manually segmented by multiple experts. Since manual segmentation of MR images is a subjective task, accounting for the variability of manually segmented meshes provided by multiple experts can significantly increase the reliability of the reported segmentation results.

• The vertebrae localization approach presented in Chapter 3 is extensively evaluated on a large dataset of arbitrary-field-of-view CT scans, most of which have spine pathologies. However, the evaluation on MR images has been done on a small set of images, all captured from the lumbar region. Future work may involve a more extensive evaluation on MR images.

• In the vertebrae localization approach proposed in Chapter 3, we used hundreds of intensity-based features to feed deep neural networks. Future work may involve including the feature extraction step in the deep network, using convolutional neural networks or Boltzmann machines.

• The vertebrae localization approach proposed in Chapter 3 does not rely on the image modality or the nature of the spine anatomy. Future work may involve using this method for localizing other anatomies, like the prostate or liver, in different image modalities such as CT scans, MR images, and ultrasound images.

Bibliography

[1] R. S. Alomari, J. J. Corso, and V. Chaudhary. Labeling of lumbar discs using both pixel- and object-level features with a two-level probabilistic model. Medical Imaging, IEEE Transactions on, 30(1):1–10, 2011. → pages 3, 16, 50

[2] P. Baldi and K. Hornik. Neural networks and principal component analysis: Learning from examples without local minima. Neural Networks, 2(1):53–58, 1989. → pages 33

[3] Y. Bengio. Learning deep architectures for AI. Foundations and Trends® in Machine Learning, 2(1):1–127, 2009. → pages 17

[4] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. Greedy layer-wise training of deep networks.
Advances in Neural Information Processing Systems, 19:153–168, 2007. → pages 26

[5] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: a CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010. → pages 47, 56

[6] Z. Botev, J. Grotowski, D. Kroese, et al. Kernel density estimation via diffusion. The Annals of Statistics, 38(5):2916–2957, 2010. → pages 30, 39, 52, 76

[7] J. Canny. A computational approach to edge detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, (6):679–698, 1986. → pages 9

[8] J. Carballido-Gamio, S. J. Belongie, and S. Majumdar. Normalized cuts in 3-D for spinal MRI segmentation. Medical Imaging, IEEE Transactions on, 23(1):36–44, 2004. → pages 6, 9

[9] A. Criminisi, J. Shotton, and S. Bucciarelli. Decision forests with long-range spatial context for organ localization in CT volumes. Proc MICCAI Workshop on Probabilistic Models for Medical Image Analysis, pages 69–80, 2009. → pages 19

[10] A. Criminisi, D. Robertson, O. Pauly, B. Glocker, E. Konukoglu, J. Shotton, D. Mateus, A. M. Möller, S. Nekolla, and N. Navab. Anatomy detection and localization in 3D medical images. In Decision Forests for Computer Vision and Medical Image Analysis, pages 193–209. Springer, 2013. → pages xi, xiii, 19, 20, 52, 53

[11] G. E. Dahl, T. N. Sainath, and G. E. Hinton. Improving deep neural networks for LVCSR using rectified linear units and dropout. In Acoustics, Speech and Signal Processing, IEEE International Conference on, pages 8609–8613. IEEE, 2013. → pages 34, 36

[12] J. Egger, T. Kapur, T. Dukatz, M. Kolodziej, D. Zukić, B. Freisleben, and C. Nimsky. Square-cut: a segmentation algorithm on the basis of a rectangle shape. PloS One, 7(2):e31064, 2012. → pages 2, 6, 9

[13] A. Field. Discovering statistics using SPSS. Sage Publications Limited, 2009. → pages 66

[14] J. Gall and V. Lempitsky.
Class-specific Hough forests for object detection. In Decision Forests for Computer Vision and Medical Image Analysis, pages 143–157. Springer, 2013. → pages 19

[15] B. Glocker, J. Feulner, A. Criminisi, D. R. Haynor, and E. Konukoglu. Automatic localization and identification of vertebrae in arbitrary field-of-view CT scans. In Medical Image Computing and Computer-Assisted Intervention, pages 590–598. Springer, 2012. → pages viii, 16, 41, 45, 46, 47, 50

[16] B. Glocker, D. Zikic, E. Konukoglu, D. R. Haynor, and A. Criminisi. Vertebrae localization in pathological spine CT via dense classification from sparse annotations. In Medical Image Computing and Computer-Assisted Intervention, pages 262–270. Springer, 2013. → pages ix, 16, 17, 40, 41, 45, 46, 47, 48

[17] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In International Conference on Artificial Intelligence and Statistics, pages 249–256, 2010. → pages 26

[18] X. Glorot, A. Bordes, and Y. Bengio. Deep sparse rectifier networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics, volume 15, pages 315–323, 2011. → pages 36

[19] P. Hall, T. C. Hu, and J. S. Marron. Improved variable window kernel estimates of probability densities. The Annals of Statistics, pages 1–10, 1995. → pages 30

[20] S. Hanaoka, K. Fritscher, B. Schuler, Y. Masutani, N. Hayashi, K. Ohtomo, and R. Schubert. Whole vertebral bone segmentation method with a statistical intensity-shape model based approach. In SPIE Medical Imaging, pages 796242–796242. International Society for Optics and Photonics, 2011. → pages 16, 50

[21] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012. → pages 34

[22] C. Hoad and A. Martel. Segmentation of MR images for computer-assisted surgery of the lumbar spine.
Physics in Medicine and Biology, 47(19):3503,2002. → pages 6[23] S.-H. Huang, Y.-H. Chu, S.-H. Lai, and C. L. Novak. Learning-basedvertebra detection and iterative normalized-cut segmentation for spinal MRI.Medical Imaging, IEEE Transactions on, 28(10):1595–1605, 2009. → pages6, 9[24] I. Jolliffe. Principal component analysis. Wiley Online Library, 2005. →pages 33[25] C. Jones, J. Marron, and S. Sheather. Progress in data-based bandwidthselection for kernel density estimation. Computational Statistics, (11):337–381, 1996. → pages 30[26] M. Jones, I. McKay, and T.-C. Hu. Variable location and scale kernel densityestimation. Annals of the Institute of Statistical Mathematics, 46(3):521–535, 1994. → pages 30[27] M. C. Jones, J. S. Marron, and S. J. Sheather. A brief survey of bandwidthselection for density estimation. Journal of the American StatisticalAssociation, 91(433):401–407, 1996. → pages 3081[28] S. Kadoury, H. Labelle, and N. Paragios. Spine segmentation in medicalimages using manifold embeddings and higher-order MRFs. MedicalImaging, IEEE Transactions on, 32(7):1227–1238, 2013. → pages 6[29] S. Khallaghi, P. Abolmaesumi, R. H. Gong, E. Chen, S. Gill, J. Boisvert,D. Pichora, D. Borschneck, G. Fichtinger, and P. Mousavi. GPU acceleratedregistration of a statistical shape model of the lumbar spine to 3D ultrasoundimages. In SPIE Medical Imaging, pages 79642W–79642W. InternationalSociety for Optics and Photonics, 2011. → pages 66[30] K. I. Kim, K. Jung, and H. J. Kim. Face recognition using kernel principalcomponent analysis. Signal Processing Letters, IEEE, 9(2):40–42, 2002. →pages 33[31] K. C. Kiwiel. Convergence and efficiency of subgradient methods forquasiconvex minimization. Mathematical Programming, 90(1):1–25, 2001.→ pages 37[32] T. Klinder, J. Ostermann, M. Ehm, A. Franz, R. Kneser, and C. Lorenz.Automated model-based vertebra detection, identification, and segmentationin CT images. Medical Image Analysis, 13(3):471–482, 2009. → pages 16,17, 50[33] A. 
Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification withdeep convolutional neural networks. In Advances in Neural InformationProcessing Systems, pages 1097–1105, 2012. → pages 22[34] Y. A. LeCun, L. Bottou, G. B. Orr, and K.-R. Mu¨ller. Efficient backprop. InNeural Networks: Tricks of The Trade, pages 9–48. Springer, 2012. → pages26, 36[35] H. Lee, P. Pham, Y. Largman, and A. Y. Ng. Unsupervised feature learningfor audio classification using convolutional deep belief networks. InAdvances in Neural Information Processing Systems, pages 1096–1104,2009. → pages 33[36] J. Ma, L. Lu, Y. Zhan, X. Zhou, M. Salganicoff, and A. Krishnan.Hierarchical segmentation and identification of thoracic vertebra usinglearning-based edge detection and coarse-to-fine deformable model. InMedical Image Computing and Computer-Assisted Intervention, pages19–27. Springer, 2010. → pages 16, 5082[37] J. Martens. Deep learning via hessian-free optimization. In Proceedings ofthe 27th International Conference on Machine Learning, pages 735–742,2010. → pages 37[38] A. Myronenko and X. Song. Point set registration: coherent point drift.Pattern Analysis and Machine Intelligence, IEEE Transactions on, 32(12):2262–2275, 2010. → pages 67[39] B. Naegel. Using mathematical morphology for the anatomical labeling ofvertebrae from 3D CT-scan images. Computerized Medical Imaging andGraphics, 31(3):141–156, 2007. → pages 16, 50[40] V. Nair and G. E. Hinton. Rectified linear units improve restrictedboltzmann machines. In Proceedings of the 27th International Conferenceon Machine Learning, pages 807–814, 2010. → pages 36[41] A. Neubert, J. Fripp, C. Engstrom, R. Schwarz, L. Lauer, O. Salvado, andS. Crozier. Automated detection, 3D segmentation and analysis of highresolution spine MR images using statistical shape models. Physics inMedicine and Biology, 57(24):8357–8376, 2012. → pages 6[42] J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Y. Ng. Multimodaldeep learning. 
In Proceedings of the 28th International Conference onMachine Learning, pages 689–696, 2011. → pages 33[43] C. Nvidia. Cublas library. NVIDIA Corporation, Santa Clara, California,15, 2008. → pages 70[44] A. B. Oktay and Y. S. Akgul. Simultaneous localization of lumbar vertebraeand intervertebral discs with SVM-based MRF. Biomedical Engineering,IEEE Transactions on, 60(9):2375–2383, 2013. → pages 16, 50[45] Y. Otake, M. Armand, R. S. Armiger, M. D. Kutzer, E. Basafa,P. Kazanzides, and R. H. Taylor. Intraoperative image-based multiview2D/3D registration for image-guided orthopaedic surgery: incorporation offiducial-based C-arm tracking and GPU-acceleration. Medical Imaging,IEEE Transactions on, 31(4):948–962, 2012. → pages 66[46] N. Otsu. A threshold selection method from gray-level histograms.Automatica, 11(285-296):23–27, 1975. → pages 52[47] B. Park, S.-O. Jeong, M. Jones, and K.-H. Kang. Adaptive variable locationkernel density estimators with good performance at boundaries. Journal ofNonparametric Statistics, 15(1):61–75, 2003. → pages 3083[48] Z. Peng, J. Zhong, W. Wee, and J.-h. Lee. Automated vertebra detection andsegmentation from the whole spine MR images. In Engineering in Medicineand Biology Society, 2005. IEEE-EMBS 2005. 27th Annual InternationalConference of the, pages 2527–2530. IEEE, 2006. → pages 6, 16, 50[49] P. Perona and J. Malik. Scale-space and edge detection using anisotropicdiffusion. Pattern Analysis and Machine Intelligence, IEEE Transactions on,12(7):629–639, 1990. → pages 8, 54[50] A. Rasoulian, R. Rohling, and P. Abolmaesumi. Lumbar spine segmentationusing a statistical multi-vertebrae anatomical shape+pose model. MedicalImaging, IEEE Transactions on, 32(10):1890–1900, 2013. → pages 6, 7, 9,18, 44, 50, 54, 66[51] A. Rasoulian, R. Rohling, and P. Abolmaesumi. A statistical multi-vertebraeshape+pose model for segmentation of CT images. In SPIE MedicalImaging, volume 8671, 2013. → pages 50[52] A. Rasoulian, R. N. Rohling, and P. Abolmaesumi. 
A statisticalmulti-vertebrae shape+ pose model for segmentation of CT images. In SPIEMedical Imaging, pages 86710P–86710P. International Society for Opticsand Photonics, 2013. → pages 7[53] A. Rasoulian, R. N. Rohling, and P. Abolmaesumi. Automatic labeling andsegmentation of vertebrae in ct images. In SPIE Medical Imaging, pages903623–903623. International Society for Optics and Photonics, 2014. →pages 16, 17, 50[54] L. Remonda, A. Lukes, and G. Schroth. [spinal stenosis: current aspects ofimaging diagnosis and therapy]. Schweizerische MedizinischeWochenschrift, 126(6):220–229, 1996. → pages 2[55] P. J. Richards, J. George, M. Metelko, and M. Brown. Spine computedtomography doses and cancer induction. Spine, 35(4):430–433, 2010. →pages 2[56] T. D. Sanger. Optimal unsupervised learning in a single-layer linearfeedforward neural network. Neural Networks, 2(6):459–473, 1989. →pages 33[57] J. Schmidhuber. Deep learning in neural networks: An overview. arXivpreprint arXiv:1404.7828, 2014. → pages 1784[58] S. Schmidt, J. Kappes, M. Bergtholdt, V. Pekar, S. Dries, D. Bystrov, andC. Schno¨rr. Spine detection and labeling using a parts-based graphicalmodel. In Information Processing in Medical Imaging, pages 122–133.Springer, 2007. → pages 16, 50[59] S. J. Sheather and M. C. Jones. A reliable data-based bandwidth selectionmethod for kernel density estimation. Journal of the Royal StatisticalSociety, pages 683–690, 1991. → pages 30[60] R. Shi, D. Sun, Z. Qiu, and K. L. Weiss. An efficient method forsegmentation of MRI spine images. In Complex Medical Engineering,IEEE/ICME International Conference on, pages 713–717. IEEE, 2007. →pages 6, 9[61] J. Shotton, J. Winn, C. Rother, and A. Criminisi. Textonboost for imageunderstanding: Multi-class object recognition and segmentation by jointlymodeling texture, layout, and context. International Journal of ComputerVision, 81(1):2–23, 2009. → pages 19[62] P. Shwaluk. 
Clinical anatomy and management of low back pain.Australasian Chiropractic & Osteopathy, 6(1):24, 1997. → pages 1[63] P. P. Smyth, C. J. Taylor, and J. E. Adams. Automatic measurement ofvertebral shape using active shape models. Image and Vision Computing, 15(8):575–581, 1997. → pages 16[64] D. Sˇtern, B. Likar, F. Pernusˇ, and T. Vrtovec. Parametric modelling andsegmentation of vertebral bodies in 3D CT and MR spine images. Physics inMedicine and Biology, 56(23):7505, 2011. → pages 6[65] D. Sˇtern, T. Vrtovec, F. Pernusˇ, and B. Likar. Segmentation of vertebralbodies in CT and MR images based on 3D deterministic models. In SPIEMedical Imaging, pages 79620D–79620D. International Society for Opticsand Photonics, 2011. → pages 6[66] A. Suzani, A. Rasoulian, S. Fels, R. N. Rohling, and P. Abolmaesumi.Semi-automatic segmentation of vertebral bodies in volumetric MR imagesusing a statistical shape+ pose model. In SPIE Medical Imaging, pages90360P–90360P. International Society for Optics and Photonics, 2014. →pages 50[67] C. Szegedy, A. Toshev, and D. Erhan. Deep neural networks for objectdetection. In Advances in Neural Information Processing Systems, pages2553–2561, 2013. → pages 2285[68] G. R. Terrell and D. W. Scott. Variable kernel density estimation. TheAnnals of Statistics, pages 1236–1265, 1992. → pages 30[69] P. Viola and M. J. Jones. Robust real-time face detection. InternationalJournal of Computer Vision, 57(2):137–154, 2004. → pages 19, 52, 62[70] P. J. Werbos. Backpropagation through time: what it does and how to do it.Proceedings of the IEEE, 78(10):1550–1560, 1990. → pages 26, 37[71] P. L. Williams, M. Dyson, L. Bannister, P. Collins, M. Berry, M. Ferguson,and J. Dussek. Gray’s anatomy: The anatomical basis of medicine andsurgery. 1995. → pages 1[72] J. Yao, J. E. Burns, H. Munoz, and R. M. Summers. Detection of vertebralbody fractures based on cortical shell unwrapping. In Medical ImageComputing and Computer-Assisted Intervention, pages 509–516. 
Springer,2012. → pages 7[73] K. Yu, W. Xu, and Y. Gong. Deep learning with kernel regularization forvisual recognition. In Advances in Neural Information Processing Systems,pages 1889–1896, 2009. → pages 34[74] P. A. Yushkevich, J. Piven, H. Cody Hazlett, R. Gimpel Smith, S. Ho, J. C.Gee, and G. Gerig. User-guided 3D active contour segmentation ofanatomical structures: Significantly improved efficiency and reliability.Neuroimage, 31(3):1116–1128, 2006. → pages 7, 18, 50[75] G. Zamora, H. Sari-Sarraf, and L. R. Long. Hierarchical segmentation ofvertebrae from x-ray images. In Medical Imaging 2003, pages 631–642.International Society for Optics and Photonics, 2003. → pages 16[76] M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. V. Le,P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, et al. On rectified linear unitsfor speech processing. In Acoustics, Speech and Signal Processing, IEEEInternational Conference on, pages 3517–3521. IEEE, 2013. → pages 36[77] J. Zhang, L. Lv, X. Shi, F. Guo, Y. Zhang, and H. Li. Houghtransform-based approach for estimating 3D rotation angles of vertebraefrom biplanar radiographs using GPU-acceleration. International Journal ofImaging Systems and Technology, 23(3):272–279, 2013. → pages 66[78] D. Zukic, A. Vlasa´k, T. Dukatz, J. Egger, D. Horı´nek, C. Nimsky, andA. Kolb. Segmentation of vertebral bodies in MR images. In Vision,86Modeling & Visualization, pages 135–142. The Eurographics Association,2012. → pages 687
Item Metadata

Title | Automatic vertebrae localization, identification, and segmentation using deep learning and statistical models |
Creator | Suzani, Amin |
Publisher | University of British Columbia |
Date Issued | 2014 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2014-10-14 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166073 |
URI | http://hdl.handle.net/2429/50722 |
Degree | Master of Applied Science - MASc |
Program | Electrical and Computer Engineering |
Affiliation | Applied Science, Faculty of; Electrical and Computer Engineering, Department of |
Degree Grantor | University of British Columbia |
Graduation Date | 2014-11 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
Aggregated Source Repository | DSpace |