UBC Theses and Dissertations

Automatic characterization of developmental dysplasia of the hip in infants using ultrasound imaging Quader, Niamul 2018

Automatic Characterization of Developmental Dysplasia of the Hip in Infants using Ultrasound Imaging

by

Niamul Quader

B.Sc. in Engineering, Islamic University of Technology, 2010
M.Sc. in Engineering, Islamic University of Technology, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

The University of British Columbia
(Vancouver)

February 2018

© Niamul Quader, 2018

Abstract

Developmental dysplasia of the hip (DDH) is the most common pediatric hip condition, representing a spectrum of hip abnormalities ranging from mild dysplasia to irreducible hip dislocation. Thirty-three years ago, the introduction of the Graf method revolutionized the use of ultrasound (US) and replaced radiography for DDH diagnoses. However, it has been shown that current US-based assessments suffer from large inter-rater and intra-rater variabilities which can lead to misdiagnosis and inappropriate treatment for DDH.

In this thesis, we propose an automatic dysplasia metric estimator based on US and hypothesize that it significantly reduces the subjective variability inherent in the manual measurement of dysplasia metrics. To this end, we have developed an intensity invariant feature to accurately extract bone boundaries in US images, and have further developed an image processing pipeline to automatically discard US images which are inadequate for measuring dysplasia metrics, as defined by expert radiologists. If found adequate, our method automatically measures clinical dysplasia metrics from the US image. We validated our method on US images of 165 hips acquired through clinical examinations, and found that automatic extraction of dysplasia metrics improved the repeatability of diagnoses by 20%.

We extended our automatic metric extraction method to three-dimensional (3D) US to increase robustness against operator dependent transducer placement and to better capture the 3D morphology of an infant hip. We present a new random forests-based method for segmenting the femoral head from a 3D US volume, and a method for automatically estimating a 3D femoral head coverage measurement from the segmented head. We propose an additional 3D hip morphology-derived dysplasia metric for identifying an unstable acetabulum. On 40 clinical hip examinations, we found our methods significantly improved the reproducibility of diagnosing femoral head coverage by 65% and acetabular abnormalities by 75% when compared to current standard methods.

Lay Summary

Many babies are born with unstable hips which can cause severe mobility issues as they grow older. This condition is known as developmental dysplasia of the hip (DDH), and although it is the most common hip disorder in infants, its diagnosis has been shown to be prone to large variability. The goal of this thesis is to develop methods to automatically identify DDH in infants, using ultrasound (US) imaging. We demonstrate two-dimensional (2D) and three-dimensional (3D) ultrasound systems that automatically extract clinical measurements which can be used by clinicians to diagnose DDH. We show that our automatic 2D measurements improve reliability in repeated diagnoses by 20% on a dataset of 165 clinical hip examinations. We demonstrate a much more considerable improvement of 70% in repeated diagnoses with 3D US-based measurements, evaluated on 40 infant hip examinations.

Preface

The research presented herein was approved by the UBC Clinical Research Ethics Board (CREB), certificate numbers: H14–01448 and H17–01904.
This thesis is primarily based on the following articles, resulting from collaboration of multiple researchers.

Studies described in Chapter 3 have been published in:

[P1] N. Quader, A. Hodgson, and R. Abugharbieh. Confidence weighted local phase features for robust bone surface segmentation in ultrasound. In Workshop on Clinical Image-Based Procedures, pages 76–83. Springer, 2014. Oral presentation. [120]

[P2] N. Quader, A. Hodgson, and R. Abugharbieh. Assessing the feasibility of downsampling and wavelet resizing for real-time extraction of bone surfaces from 3D ultrasound. In 15th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, 2015. Poster presentation. [122]

[P3] N. Quader, A. Hodgson, K. Mulpuri, T. Savage, and R. Abugharbieh. Automatic assessment of developmental dysplasia of the hip. In Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, pages 13–16. IEEE, 2015. Poster presentation. [124]

[P4] N. Quader, A. Hodgson, K. Mulpuri, A. Cooper, and R. Abugharbieh. Towards reliable automatic characterization of neonatal hip dysplasia from 3D ultrasound images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 602–609. Springer, 2016. Poster presentation. [125]

Studies described in Chapter 4 have been published in:

[P5] N. Quader, A. J. Hodgson, K. Mulpuri, E. Schaeffer, and R. Abugharbieh. Automatic evaluation of scan adequacy and dysplasia metrics in 2-D ultrasound images of the neonatal hip. Ultrasound in Medicine & Biology, 43(6):1252–1262, 2017. [129]

[P3] N. Quader, A. Hodgson, K. Mulpuri, T. Savage, and R. Abugharbieh. Automatic assessment of developmental dysplasia of the hip. In Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, pages 13–16. IEEE, 2015. Poster presentation. [124]

[P6] N. Quader, A. Hodgson, K. Mulpuri, and R. Abugharbieh. Improving diagnostic accuracy of hip dysplasia measures in 2D ultrasound scans of infants to guide decisions regarding need for surgery. In 15th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, 2015. Oral presentation. [123]

[P7] N. Quader, A. Hodgson, K. Mulpuri, A. Cooper, E. Schaeffer, and R. Abugharbieh. A reliable automatic 2D measurement for developmental dysplasia of the hip. In 16th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, 2016. Oral presentation. [126]

Studies described in Chapter 5 have been published in:

[P4] N. Quader, A. Hodgson, K. Mulpuri, A. Cooper, and R. Abugharbieh. Towards reliable automatic characterization of neonatal hip dysplasia from 3D ultrasound images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 602–609. Springer, 2016. Poster presentation. [125]

[P8] N. Quader, A. Hodgson, K. Mulpuri, A. Cooper, and R. Abugharbieh. A 3D femoral head coverage metric for enhanced reliability in diagnosing hip dysplasia. In International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec-Canada, pp. 100-107, September 2017. Poster presentation. [127]

All published articles were revised and edited by all co-authors. I was the primary author of all of these publications.

[P1, P2]: I contributed to the article’s idea, method implementation and validation scheme under the supervision of Drs. Rafeef Abugharbieh and Antony J. Hodgson.

[P3]: I contributed to the article’s idea, implementation of method and validation scheme under the supervision of Drs. Rafeef Abugharbieh and Antony J. Hodgson. Dr. Thomas Savage contributed by retrieving ultrasound data and providing annotations for the retrieved ultrasound images. Dr. Kishore Mulpuri provided clinical feedback for the study.

[P5, P7]: I contributed to the article’s idea, design of the data collection protocol, implementation of method and validation scheme under the supervision of Drs. Rafeef Abugharbieh and Antony J. Hodgson. Drs. Kishore Mulpuri and Emily Schaeffer contributed by retrieving 2D ultrasound data from infants at British Columbia Children’s Hospital and providing clinical feedback.

[P6]: I contributed to the article’s idea, design of the data collection protocol, implementation of method and validation scheme under the supervision of Drs. Rafeef Abugharbieh and Antony J. Hodgson. Dr. Kishore Mulpuri contributed by retrieving 2D ultrasound data from infants at British Columbia Children’s Hospital and providing clinical feedback.

[P4, P8]: I contributed to the article’s idea, design of the data collection protocol, implementation of method and validation scheme under the supervision of Drs. Rafeef Abugharbieh and Antony J. Hodgson. Drs. Anthony Cooper and Kishore Mulpuri contributed by acquiring 2D and 3D ultrasound data from infants at British Columbia Children’s Hospital and providing clinical feedback, including annotations for the scanned ultrasound images.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
Dedication
1 Introduction
  1.1 Thesis Motivation
  1.2 Thesis Objectives
  1.3 A Clinical Evaluation of Developmental Dysplasia of the Hip
    1.3.1 US-based Diagnosis
  1.4 Towards Minimal User-Interaction in Extracting Dysplasia Metrics
    1.4.1 US Bone Imaging
    1.4.2 US Adequate Plane Detection
    1.4.3 Dysplasia Metric Extraction
  1.5 Research Questions Addressed
  1.6 Thesis Overview
2 A Systematic Review and Meta-analysis of Variability in Dysplasia Metrics
  2.1 Introduction
  2.2 Study Identification
    2.2.1 Search Strategy
    2.2.2 Results
    2.2.3 Discussion
  2.3 Methodologic Quality Appraisal
    2.3.1 Results
    2.3.2 Discussion
  2.4 Meta-analysis of variability in dysplasia metrics
    2.4.1 Extracting Variability and Agreement Statistics
    2.4.2 Summarizing Variability Statistics
    2.4.3 Summarizing Agreement Statistics
    2.4.4 Comparing Dysplasia Metrics
    2.4.5 Estimating Trend of Variability over Time
    2.4.6 Results
    2.4.7 Discussion
  2.5 Summary
3 Symmetry and Attenuation-based Ultrasound Bone Imaging
  3.1 Introduction
  3.2 Confidence-Weighted Local Phase Features
    3.2.1 Local Phase Symmetry Feature
    3.2.2 Attenuation-related Features
    3.2.3 Combined Feature for Bone Surface Localization
    3.2.4 Experiment
    3.2.5 Discussion
  3.3 Structured Phase Symmetry
    3.3.1 2D Filters-based Structured Phase Symmetry
    3.3.2 3D Filters-based Structured Phase Symmetry
    3.3.3 Discussion
  3.4 Summary
4 Characterizing Hip Joint using 2D Ultrasound
  4.1 Introduction
  4.2 Localizing Bone and Cartilage Boundaries
    4.2.1 Ilium, Labrum and Acetabulum
    4.2.2 Femoral Head
  4.3 Scan Adequacy Classification
    4.3.1 Experiment
    4.3.2 Discussion
  4.4 Extraction of 2D Dysplasia Metrics
    4.4.1 Experiment
    4.4.2 Discussion
  4.5 Summary
5 3D Ultrasound-based Hip Dysplasia Assessment
  5.1 Introduction
  5.2 Defining 3D Hip Morphology-derived Dysplasia Metrics
  5.3 Extracting α3D
    5.3.1 Localizing I and A Across the Coronal Plane
    5.3.2 Extracting 3D Bone Boundary
    5.3.3 Approximating Planes of I and A
    5.3.4 Calculating α3D
  5.4 Extracting FHC3D
    5.4.1 Femoral Head Segmentation
    5.4.2 Enhancing Femoral Head Segmentation
    5.4.3 Estimating FHC3D
  5.5 Experiment, Results and Discussion
    5.5.1 Discussion
  5.6 Summary
6 Conclusions
  6.1 Thesis Contributions
    6.1.1 A Systematic Review and Meta-analysis of Variability in Dysplasia Metrics
    6.1.2 Symmetry and Attenuation-based US Bone Imaging
    6.1.3 Characterizing Hip Joint using 2D US
    6.1.4 A 3D US-based Hip Dysplasia Assessment
    6.1.5 Summary of Technical Contributions
  6.2 Future Work
Bibliography
A Examples of Automatic Dysplasia Metric Extraction from 2D US Images
B Examples of Automatic Dysplasia Metric Extraction from 3D US Images

List of Tables

Table 2.1 Initial logic grid aligned with the population-intervention-comparison-outcome (PICO) elements of the review question: For infants at risk of DDH, are US imaging-based diagnoses reproducible? Note that although the comparison intervention is left empty here, we will be comparing all the US imaging-based diagnoses with each other.
Table 2.2 Logic grid with identified keywords added in each of the PICO columns.
Table 2.3 Logic grid with keywords and index terms or MeSH headings in each of the PICO columns [mh = MeSH headings, kw = keyword, exp = explode].
Table 2.4 Final search strategy using keywords and MeSH [mh = MeSH headings, kw = keyword, exp = explode].
Table 2.5 Criteria for evaluating normal and dysplastic hips along with ranges between normal and hip dysplasia for each of the dysplasia metrics.
Table 2.6 Meta-analysis result for the total variability (σT) in α angle (in °), β angle (in °) and FHC (in %).
Table 2.7 Meta-analysis result for the total variability (σT) in ACA angle (in °) and H angle (in °) and FHC (in %).
Table 4.1 Features used in adequacy classifier.
Table 4.2 Graf-DDH classification of the neonatal hips. Here, type-I (α > 60°, mature hip), type-II (43° < α < 60°, physiological immature hip) and type-III (α < 43°, eccentric hip). The agreement between automatic and manual classification is fair (Cohen’s kappa coefficient = 0.61 (CI: 0.48 and 0.75)).
Table 5.1 Success rates in acquiring adequate 3D US images and 2D US image set.

List of Figures

Figure 1.1 A flow-chart outlining the thesis objectives. The red box identifies the 3D US adequacy step, not being addressed in this thesis, necessary for an automatic processing of a 3D US image.
Figure 1.2 The propagation of an ultrasound pulse (white) along one particular scanline (dotted red line). A strong echo is reflected from the ilium bone boundary resulting in a high brightness measure on the image at the bone boundary point.
Figure 1.3 (a) and (b) display the anatomy surrounding a hip joint, with (a) illustrating the femur and (b) the hip bone. Note that the femoral head is not ossified. (c) A B-mode US image of an infant hip acquired from the angle of the coronal plane. This is also an example of an adequate US image since this image includes the labrum, ischium, femoral head, acetabulum and flat horizontal ilium [79]. (d), (e) and (f) illustrate example measurements of α, β and FHC.
Figure 1.4 A flow-chart outlining the research questions and core blocks of this thesis. P1 to P10 are publications associated with this thesis (note that P1 has been submitted and P10 is currently being prepared for submission). The vertical arrow bars to the right provide a visual direction to the components of this thesis described in the various chapters.
Figure 2.1 Flow diagram demonstrating study identification process. The reasons for excluding a study were based on the inclusion and exclusion criteria.
Figure 2.2 Variability in α angle. Variability is expressed as standard deviation of discrepancy between repeated measurements.
Figure 2.3 Variability in β angle. Variability is expressed as standard deviation of discrepancy between repeated measurements.
Figure 2.4 Variability in FHC. Variability is expressed as standard deviation of discrepancy between repeated measurements.
Figure 2.5 Variability in ACA angle. Variability is expressed as standard deviation of discrepancy between repeated measurements.
Figure 2.6 Variability in PFD. Variability is expressed as standard deviation of discrepancy between repeated measurements.
Figure 2.7 Variability in M measure. Variability is expressed as standard deviation of discrepancy between repeated measurements.
Figure 2.8 Variability in H angle. Variability is expressed as standard deviation of discrepancy between repeated measurements.
Figure 2.9 Variability in dysplasia metrics based on our meta-analysis. Here, the variability of each dysplasia metric was standardized by dividing σT by the range of that dysplasia metric between normal and dysplastic hips. The inter-exam, inter-rater variability is the most clinically relevant variability.
Figure 2.10 Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for α.
Figure 2.11 Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for β.
Figure 2.12 Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for FHC.
Figure 2.13 Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for AROC.
Figure 2.14 Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for AL.
Figure 2.15 Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for PFD.
Figure 2.16 Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for M.
Figure 2.17 Agreement in dysplasia metrics, expressed as Kappa coefficient between repeated measurements.
Figure 2.18 Forest plot showing correlation coefficients between variability of α and year of study. Top rows show mean and 95% confidence intervals of the correlation between intra-exam, intra-image, intra-user variability and year of study for α. The last row shows the total random effects correlation coefficient estimating the effective change in variability with year of study for α. Here, the summary measure is the center of the diamond, and the associated confidence intervals of the summary measure are the lateral tips of the diamond.
Figure 2.19 Forest plot showing correlation coefficients between variability of β and year of study. Top rows show mean and 95% confidence intervals of the correlation between intra-exam, intra-image, intra-user variability and year of study for β. The last row shows the total random effects correlation coefficient estimating the effective change in variability with year of study for β.
Figure 2.20 Forest plot showing correlation coefficients between variability of FHC and year of study. Top rows show mean and 95% confidence intervals of the correlation between intra-exam, intra-image, intra-user variability and year of study for FHC. The last row shows the total random effects correlation coefficient estimating the effective change in variability with year of study for FHC.
Figure 3.1 Flowchart for local phase analysis for a 3D volume collected around the iliac crest area of an adult trauma patient (blue box on segmented CT representing approximate location of US transducer). Here, 3D log-Gabor filtering on the 3D US image results in real and imaginary components, which are used with Equation 3.4 for computing PS response. CPS feature is computed using Equation 3.7. White arrows point to false positive bone surface points.
Figure 3.2 Qualitative results: segmented bone surfaces around bovine femur (first column) and around in-vivo pelvis in human trauma patients. First row shows segmented CT with box representing approximate location of US transducer. Second row shows corresponding US volume. Third row shows PS based on empirical parameters [57]. Last row shows the CPS response.
Figure 3.3 Quantitative results: (a) Bovine phantom. Note that our proposed CPS based segmentation resulted in a 0.302 mm reduction in error compared to PS. (b) In-vivo pelvic data across all subjects (C-1 to C-18). Note that our proposed CPS resulted in a reduction in error which is significant at (p < 0.01) based on Wilcoxon signed rank test compared to PS.
Figure 3.4 Example qualitative results. (a) B-mode US image of the femoral head, (b) monogenic signal based PS, (c) directional filter based PS, (d) SPS. Red circles point to false negatives, and green circles point to falsely straightened structures, on the labrum, ilium, acetabulum, and triradiate cartilage. Blue circles point to falsely connected structures. Along the bone and cartilage contours of labrum, ilium, acetabulum, and triradiate cartilage, SPS has fewer false positives and false negatives.
Figure 3.5 Example qualitative results. (a), (h) B-mode US volumes. (b), (i) US volume segments along the posterior-anterior axis that contain the ilium, acetabulum and labrum. (c), (j) OPS responses of (b) and (i), respectively. (d), (k) CSPS responses of (b) and (i), respectively. (e), (l) US slices near the middle of the femoral head chosen by the orthopaedic surgeon for the US volumes (b) and (i), respectively. Yellow, green and red contours are manually labeled ilium, acetabulum and labrum, respectively. (f) and (m) show OPS-based contours of ilium, acetabulum and labrum. (g) and (n) show CSPS-based contours of ilium, acetabulum and labrum. White lines point to false positive bone boundaries.
. . . . . . . 64Figure 3.6 Cumulative distribution function of absolute discrepancy be-tween manually-labeled ilium and OPS- and CSPS-based il-ium boundaries. . . . . . . . . . . . . . . . . . . . . . . . . . 65Figure 3.7 Cumulative distribution function of absolute discrepancy be-tween manually-labeled labrum and OPS- and CSPS-based labrumboundaries. . . . . . . . . . . . . . . . . . . . . . . . . . . . 65Figure 3.8 Cumulative distribution function of absolute discrepancy be-tween manually-labeled acetabulum and OPS- and CSPS-basedacetabulum boundaries. . . . . . . . . . . . . . . . . . . . . . 66Figure 4.1 Example coronal US images of the neonatal hip. (a) AdequateUS image showing key hip joint structures. (b) Inadequate USimage (missing ilium and labrum). (c), (d) The α , β anglesand FHC measurements extracted manually from the adequateUS image in (a). . . . . . . . . . . . . . . . . . . . . . . . . 71Figure 4.2 A flow-chart outlining the steps involved in our computer-assisted2D US-based DDH diagnosis. . . . . . . . . . . . . . . . . . 71xixFigure 4.3 Overview of bone and cartilage extraction: (a) Example B-mode US image showing the acetabular roof, ilium, labrum,ischium and femoral head. (b) SPS capturing ridges of theB-mode US image. (c) Relative signal strength map or con-fidence map of the B-mode US image. (d) Segmented boneboundary from SPS, with white arrow pointing to inferior edgeof ilium, i. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73Figure 4.4 Block diagram showing extraction of I, L and A. . . . . . . . 74Figure 4.5 Features used in localizing femoral head. Note that only thefirst HOG feature is shown in here. . . . . . . . . . . . . . . . 76Figure 4.6 (a) B-mode US image. (b) Output of femoral head randomforests classifier, p. (c) Sobel filtering on p generates edges ofthe femoral head. 
(d) Circular Hough transform applied on (c).Peak of circular Hough transform (yellow arrow) provides anestimate of the center coordinates and radius of the femoral head. 77Figure 4.7 Out-of-bag classification error vs. number of grown trees inrandom forest classifiers for adequacy classification. Out-of-bag classification error does not seem to decay considerablyafter number of trees = 70. . . . . . . . . . . . . . . . . . . . 78Figure 4.8 Example scan adequacy classification results. (a) Typical 2DB-mode images that were judged adequate by a radiologist(i.e., identified manually). Arrows in the images point to thekey structures that were identified manually. In an adequateimage, all of the key five structures need to be present. Thefirst four images were classified as adequate by our method,while the last image was classified as inadequate (probabilitymetric = 0.46). (b) Images that were judged to be inadequateby the radiologist. The first four images were classified as in-adequate by our method, while the last image was classified asadequate (probability metric = 0.51). Based on the probabilitymetrics produced by our classifier, both of these disagreementsin classification could be considered to be borderline cases. . . 81xxFigure 4.9 ROC curves of our 229 features-based random forests classi-fier. (a) The five different plots correspond to the five cross-validation experiments we conducted on the validation dataset.(b) The five different plots correspond to the five classifierseach from a different cross-validation dataset and evaluated onthe test dataset. . . . . . . . . . . . . . . . . . . . . . . . . . 82Figure 4.10 Example qualitative results of manual (labelled as ’M’ at righttop corners) and automatic measurements (labelled as ’A’ atright top corners). 
(a), (c) and (e) show examples where man-ual and automatic measurements produced similar values forα , β and FHC, whereas (b), (d) and (f) show examples wherethe manual and automatic measurements differed noticeably.The white dotted lines represent the manual measurements,while the blue dotted lines represent the automatic measure-ments and the green in (e) and (f) represent femoral head ran-dom forests’ probability output. . . . . . . . . . . . . . . . . 85Figure 4.11 Scatter plot of automatic versus manual measurements α an-gle. The red line is the equality line. Blue data points corre-spond to lower αA and yellow data points correspond to higherαA, so blue correspond to hips that would be judged as hip dys-plasia whereas yellow correspond to hips that would be judgedas normal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86Figure 4.12 Scatter plot of automatic versus manual measurements β an-gle. The red line is the equality line. Blue data points corre-spond to lower βA and yellow data points correspond to higherβA, so blue correspond to hips that would be judged as normalwhereas yellow correspond to hips that would be judged as hipdysplasia. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87xxiFigure 4.13 Scatter plot of automatic versus manual measurements FHCangle. The red line is the equality line. Blue data points cor-respond to lower FHCA and yellow data points correspond tohigher FHCA, so blue correspond to hips that would be judgedas hip dysplasia whereas yellow correspond to hips that wouldbe judged as normal. . . . . . . . . . . . . . . . . . . . . . . 88Figure 4.14 Box-plot of the within-hip standard deviations among the man-ual and automatic measurements of α angles within all of the165 hip examinations (i.e., 165 values of std(αM) and 165 val-ues of std(αA)). On each box, the central mark indicates themedian, and the bottom and top edges of the box indicate the1st and 3rd quartiles, respectively. 
The ’+’ points indicateoutliers or data points that are outside the range of whiskers,where the whiskers correspond to 99.3% coverage if the dataare normally distributed. The within-hip standard deviationsare significantly less in the automatic α angle measurements(p < 0.01). . . . . . . . . . . . . . . . . . . . . . . . . . . . 89Figure 4.15 Box-plot of the within-hip standard deviations among the man-ual and automatic measurements of β angles within all of the165 hip examinations (i.e., 165 values of std(βM) and 165 val-ues of std(βA)). On each box, the central mark indicates themedian, and the bottom and top edges of the box indicate the1st and 3rd quartiles, respectively. The ’+’ points indicateoutliers or data points that are outside the range of whiskers,where the whiskers correspond to 99.3% coverage if the dataare normally distributed. The within-hip standard deviationsare significantly less in the automatic β angle measurements(p < 0.01). . . . . . . . . . . . . . . . . . . . . . . . . . . . 90xxiiFigure 4.16 Box-plot of the within-hip standard deviations among the man-ual and automatic measurements of FHC angles within all ofthe 165 hip examinations (i.e., 165 values of std(FHCM) and165 values of std(FHCA)). On each box, the central mark in-dicates the median, and the bottom and top edges of the boxindicate the 1st and 3rd quartiles, respectively. The ’+’ pointsindicate outliers or data points that are outside the range ofwhiskers, where the whiskers correspond to 99.3% coverageif the data are normally distributed. The within-hip standarddeviations are significantly less in the automatic FHC anglemeasurements (p < 0.01). . . . . . . . . . . . . . . . . . . . 91Figure 4.17 Scatter plot of variability of αM vs. mean αM. 
Blue data points correspond to lower mean values of αM and yellow data points correspond to higher mean values of αM, so blue points correspond to hips that would be judged as having hip dysplasia whereas yellow points correspond to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is no significant association between the variability of αM and the severity of hip dysplasia.
Figure 4.18 Scatter plot of variability of αA vs. mean αA. Blue data points correspond to lower mean values of αA and yellow data points correspond to higher mean values of αA, so blue points correspond to hips that would be judged as having hip dysplasia whereas yellow points correspond to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is no significant association between the variability of αA and the severity of hip dysplasia.
Figure 4.19 Scatter plot of variability of βM vs. mean βM. Blue data points correspond to lower mean values of βM and yellow data points correspond to higher mean values of βM, so blue points correspond to hips that would be judged as normal whereas yellow points correspond to hips that would be judged as having hip dysplasia. The red line is the best fit line. Correlation and p values suggest that there is no significant association between the variability of βM and the severity of hip dysplasia.
Figure 4.20 Scatter plot of variability of βA vs. mean βA. Blue data points correspond to lower mean values of βA and yellow data points correspond to higher mean values of βA, so blue points correspond to hips that would be judged as normal whereas yellow points correspond to hips that would be judged as having hip dysplasia. The red line is the best fit line. Correlation and p values suggest that there is no significant association between the variability of βA and the severity of hip dysplasia.
Figure 4.21 Scatter plot of variability of FHCM vs. mean FHCM. Blue data points correspond to lower mean values of FHCM and yellow data points correspond to higher mean values of FHCM, so blue points correspond to hips that would be judged as having hip dysplasia whereas yellow points correspond to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is a significant association between the variability of FHCM and the severity of hip dysplasia.
Figure 4.22 Scatter plot of variability of FHCA vs. mean FHCA. Blue data points correspond to lower mean values of FHCA and yellow data points correspond to higher mean values of FHCA, so blue points correspond to hips that would be judged as having hip dysplasia whereas yellow points correspond to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is a significant association between the variability of FHCA and the severity of hip dysplasia.
Figure 5.1 (a) A rendering of the anatomy of a hip joint showing A (red), I (blue) and the femoral head. (b) A schematic illustration of α3D, the angle between the normals to the fitted planar surfaces approximating A and I. (c) A schematic illustration of FHC3D, which is defined as the ratio of the volume of the femoral head portion that is medial to I to the total volume of the femoral head.
Figure 5.2 Block diagram showing the extraction of I, A, femoral head, α3D and FHC3D.
Figure 5.3 Block diagram showing the steps for identifying the coronal plane slices that contain I and A (orange colored volume) from an US volume (grey colored volume).
Figure 5.4 (a), (b) and (c) show example US slices within an US volume that contain the I and A structures, whereas (d), (e) and (f) show example US slices in the same US volume that do not contain at least one of the I and A structures. (g) Plot showing the slice numbers in the US volume that contain I and A.
Figure 5.5 Block diagram showing the extraction of bone and cartilage boundaries from 3D US. Arrows in the last image point to A, I and the labrum.
Figure 5.6 Overview of our slice-based probability map extraction. (a) Overlay of an example US volume and a manually segmented femoral head. (b) N cross-sections C were evaluated using classifier R2 to determine their likelihood of intersecting the femoral head. (c) Back-projected likelihood scores, L, for each of the cross-sections C. (d) Back-projected responses were summed and normalized to construct a voxel-wise probability map, shown with an overlay of the manually segmented femoral head (yellow).
Figure 5.7 Features used in the classifier R3.
Figure 5.8 (a) Overlay of an example US volume and a manually segmented femoral head. (b) Overlay of the example US volume and its automatically extracted femoral head voxel-wise probability map P.
Figure 5.9 Out-of-bag classification error for various numbers of grown trees used by the R1 classifier. We selected 70 as the number of trees for R1.
Figure 5.10 Out-of-bag classification error for various numbers of grown trees used by the R2 classifier. We selected 70 as the number of trees for R2.
Figure 5.11 Out-of-bag classification error for various numbers of grown trees used by the R3 classifier. We selected 25 as the number of trees for R3.
Figure 5.12 Cumulative distribution function showing the relative frequency of different sd values in our data. The maximum value of sd was 3.5 mm.
Figure 5.13 Box-plot showing the distribution of acquisition time for both 2D and 3D US for all six participating surgeons.
Figure 5.14 Box-plot showing the distribution of acquisition time for both 2D and 3D US. The mean acquisition time of 3D US is slightly less than that of 2D US; however, the difference was not statistically significant.
Figure 5.15 Scatter plot of FHC2D vs. FHC3D. The red line is the equality line. Blue data points correspond to lower FHC3D and yellow data points correspond to higher FHC3D, so blue points correspond to hips that would be judged as dysplastic whereas yellow points correspond to hips that would be judged as normal.
Figure 5.16 Scatter plot of α2D vs. α3D. Blue data points correspond to lower α3D and yellow data points correspond to higher α3D, so blue points correspond to hips that would be judged as dysplastic whereas yellow points correspond to hips that would be judged as normal.
Figure 5.17 Qualitative results. (a), (b), (d) and (e) show example variability of α2D, α3D, FHC2D and FHC3D from two 2D and two 3D US images from a hip examination (α2D = 47° and 56°, α3D = 45.1° and 45.9°, FHC2D = 51% and 71%, FHC3D = 46.6% and 47.8%). The higher variability in the input 2D US images (and in the α2D and FHC2D values) can be seen in the manually aligned 2D US images in (c), compared to the lower variability in the manually aligned 3D US images (and in the α3D and FHC3D values) in (f).
Figure 5.18 Box-plot of the within-hip standard deviations among FHC2D vs. FHC3D values within all of the hip examinations.
On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 1st and 3rd quartiles, respectively. The ’+’ points indicate outliers or data points that are outside the range of whiskers, where the whiskers correspond to 99.3% coverage if the data are normally distributed. The within-hip standard deviations are significantly less in FHC3D compared to FHC2D (p < 0.01).
Figure 5.19 Box-plot of the within-hip standard deviations among α2D vs. α3D values within all of the hip examinations. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 1st and 3rd quartiles, respectively. The ’+’ points indicate outliers or data points that are outside the range of whiskers, where the whiskers correspond to 99.3% coverage if the data are normally distributed. The within-hip standard deviations are significantly less in α3D compared to α2D (p < 0.01).
Figure 5.20 Scatter plot of variability of FHC3D vs. mean FHC3D. Blue data points correspond to lower mean values of FHC3D and yellow data points correspond to higher mean values of FHC3D, so blue correspond to hips that would be judged as dysplastic hips whereas yellow correspond to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is no significant association in the variability of FHC3D with the severity of hip dysplasia.
Figure 5.21 Scatter plot of variability of α3D vs. mean α3D. Blue data points correspond to lower mean values of α3D and yellow data points correspond to higher mean values of α3D, so blue correspond to hips that would be judged as dysplastic hips whereas yellow correspond to hips that would be judged as normal. The red line is the best fit line.
Correlation and p values suggest that there is no significant association between the variability of α3D and the severity of hip dysplasia.
Figure 6.1 A flow-chart outlining the research questions, core blocks and key contributions of this thesis. P1 to P10 are publications associated with this thesis (note that P1 has been submitted and P10 is currently being prepared for submission). The vertical arrow bars to the right provide a visual guide to the components of this thesis described in the various chapters.
Figure A.1 Visualization of the dysplasia metrics extracted from a 2D US image of a healthy hip. In this example, the values of the automatically extracted dysplasia metrics are: α = 70°, β = 29° and FHC = 73.2%.
Figure A.2 Visualization of the dysplasia metrics extracted from a 2D US image of a borderline hip. In this example, the values of the automatically extracted dysplasia metrics are: α = 47°, β = 40° and FHC = 59.8%.
Figure A.3 Visualization of the dysplasia metrics extracted from a 2D US image of a dysplastic hip. In this example, the values of the automatically extracted dysplasia metrics are: α = 38°, β = 56° and FHC = 41%.
Figure B.1 Visualization of the dysplasia metrics extracted from a 3D US image of a healthy hip. In this example, the values of the automatically extracted dysplasia metrics are: α = 71.8° and FHC = 72.4%. The visualization here was generated automatically using MATLAB 2017a (The MathWorks Inc., Natick, MA, USA). The 3D plot generated using MATLAB was later manually rotated in 3D to make the planes of the ilium and acetabulum more apparent. In Chapter 5, the visualization was done manually using AMIRA software (TGS, San Diego, USA).
Figure B.2 Visualization for the extracted dysplasia metrics from a 3D US image in a borderline hip. In this example, the values of automatically extracted dysplasia metrics are: α = 53.1° and FHC = 47.9%. The visualization in here was done automatically using MATLAB 2017a (The MathWorks Inc., Natick, MA, USA). The 3D plot generated using MATLAB was later manually rotated in 3D to make the planes of the ilium and acetabulum more apparent.
Figure B.3 Visualization for the extracted dysplasia metrics from a 3D US image in a dysplastic hip. In this example, the values of automatically extracted dysplasia metrics are: α = 35.6° and FHC = 22.3%.

Glossary

2D Two-dimensional
3D Three-dimensional
ACA Acetabular contact angle
AL Arc length
AROC Acetabular radius of curvature
AVN Avascular necrosis
BRC Bony rim coverage
CPS Confidence-weighted phase symmetry
CSPS Confidence-weighted structured phase symmetry
CT Computed tomography
CNN Convolutional neural networks
DDH Developmental dysplasia of the hip
FHC Femoral head coverage
GRRAS Guidelines for reporting reliability and agreement studies
HOG Histogram of oriented gradients
ICC Intraclass correlation coefficient
LBP Local binary patterns
MESH Medical subject headings
MRI Magnetic resonance imaging
MSAC M-estimator sample consensus
OA Hip osteoarthritis
OPS Optimized phase symmetry
PFD Pubo-femoral distance
PS Phase symmetry
PICO Population-intervention-comparison-outcome
ROC Receiver operating characteristics
SFE Surface fitting error
SPS Structured phase symmetry
US Ultrasound

Acknowledgments

I would like to thank my advisors, Drs. Rafeef Abugharbieh and Antony J. Hodgson, for supervising me throughout my doctoral studies, and for spending countless hours towards helping me both within and beyond studies. I would also like to thank my clinical advisor, Dr.
Kishore Mulpuri, for supervising my clinical study and for supporting me in my research.

Furthermore, I would like to thank my friends both inside and outside of BiSICL, Surgical Technology Lab, Orthopaedic Research at Children’s Hospital for all of their support. I would also like to thank Drs. Shayesteh Jahanfar and Boris Kuzeljevic for helping me with statistical analysis, Ursula Ellis and Charlotte Beck for helping me with searching articles for a systematic review, and Drs. Anthony Cooper, Thomas Savage and Emily Schaeffer for providing me with valuable clinical feedback. Also, I would like to thank Teo Tammie and Janice Andrade for patiently supporting and assisting me with my clinical study.

Last but not least, I would like to thank my parents, my wife, my sisters and all my family for always being there for me.

Dedication

To my parents, my wife, my sisters and all my family.

Chapter 1
Introduction

Developmental dysplasia of the hip (DDH) refers to a spectrum of hip abnormalities ranging from mild dysplasia with a stable hip, through subluxation, to total hip dislocation [36, 55, 79, 116, 145, 148]. DDH is the most common hip disorder in infants, affecting 0.16% to 2.85% of all newborns [11, 34, 55, 145, 148]. DDH can also cause considerable long-term debilitation if left untreated; for example, early arthritis is often associated with DDH [145], while failure to diagnose and treat DDH in infancy can lead to costly corrective surgical procedures later [119]. It is therefore important to maximize the accuracy of diagnosing DDH. This thesis focuses on assessing and improving the accuracy of DDH diagnosis.

1.1 Thesis Motivation

Early diagnosis and treatment of DDH is widely recommended [28, 63, 116, 159, 163]. Early treatments include the Pavlik harness, a common non-surgical treatment that encourages increased hip joint abduction and flexion [132] and has high success rates (60% to 95%) [28, 63, 116, 159, 163].
A delay in early treatment can result in the need for more complex treatments at a later age [119]. When the head of the femur is left in an abnormal position, the surrounding anatomy develops abnormally and this, if left untreated, can lead to the need for surgical correction to provide the joint with adequate stability and symmetry [55]. Joint reorientation procedures (i.e. pelvic osteotomies) may be performed initially, and many patients then progress to requiring joint replacements early in life [55]. Furthermore, artificial hip joints have limited lifespans, so patients may require multiple revision procedures as they grow and develop [55, 163].

While the cost of diagnosing and treating DDH in infants is moderate (e.g. approximately £690 per year per infant receiving diagnosis and a Pavlik harness-based treatment in the United Kingdom [163]), failure to detect and treat DDH during early infancy can lead to considerable socioeconomic costs later in life. In particular, DDH in infancy is a major risk factor in the development of adult hip osteoarthritis (OA) [67, 68, 72, 100]. In 2013, Hoaglund performed a meta-analysis which resulted in the conservative estimate that 10% of all OA patients also had DDH [72]; however, Nakamura [100] estimated that 88% of 2000 consecutive OA patients in a study in Japan had DDH, so the rates may vary considerably by country. Even when considering Hoaglund’s conservative estimate, Price [119] calculated that DDH might be responsible for approximately 25,000 hip replacement procedures annually in the United States.
At approximately $50,000 per procedure [142], the direct financial impact of these hip replacements is in the order of $1.25 billion per year in the United States alone, not including the cost of multiple, more expensive revision cases later in life, or other associated economic costs for the patients and their families.

While early treatment of DDH can reduce complications in later life, misdiagnosis and over-treatment of DDH can lead to adverse consequences [112, 147]. The worst of these complications, avascular necrosis (AVN) (the death of bone tissues due to a lack of blood supply), is also the most frequent complication arising in both surgical and non-surgical treatments for DDH [112, 147]. A meta-analysis revealed that 1.35% to 10.9% of all infants undergoing treatment might have AVN [92]. The authors did not provide a comparison between the rates of AVN following surgical as opposed to non-surgical treatments. Based on the studies included in Shipman’s review paper [147], surgical treatments are associated with markedly higher rates of AVN compared to non-surgical treatments (0% to 4% in non-surgical treatments [21, 39, 117, 165] versus 6% to 46% in surgical treatments [18, 30, 118, 160]). To reduce the rates of DDH treatments leading to AVN, it is important to reduce the rate of treatments performed unnecessarily, and more importantly, to reduce the number of unnecessary surgical interventions. Thus, it is essential that DDH be diagnosed early and reliably.

At an early age (from birth to 6 months of age), the femoral head in the hip joint is primarily cartilaginous, limiting the utility of plain radiographs in visualizing key structures until ossification begins at around the 4 month mark [4]. Also, misdiagnosis with radiographs can occur due to variability in pelvic rotation during image acquisition [155]. Furthermore, radiographs are associated with ionizing radiation [4, 22, 50, 99].
Consequently, US has become the primary imaging modality for detecting acetabular dysplasia and/or hip dislocation in infants at this early stage [4, 104, 137, 150]. However, despite considerable research on diagnosing DDH using US imaging [51, 52, 61, 62, 98], misdiagnosis still occurs frequently; for example, Jaremko demonstrated that, using a standard US imaging technique, a DDH patient might be interpreted as having healthy hip joints, while a healthy infant could conversely be interpreted as having hip dysplasia [79]. This variability in DDH diagnosis can lead to non-trivial rates of mistreatment (e.g. 29% under-treatment rates [78] and 38% over-treatment rates [147]). Another concern is that US is less frequently used in many developing countries because of a lack of hip sonography specialists, particularly in rural and remote areas [7]. Even when diagnosis is done by specialists, manually extracting dysplasia metrics is prone to high variability and error [53, 79, 106, 137]. In this thesis, we hope to improve the repeatability of US-based DDH assessments, which may in turn lead to reduced misdiagnosis rates. Furthermore, we hope to increase the automation of the diagnosis process, which may enable less specialized clinicians to perform effective diagnostic tests.

1.2 Thesis Objectives

The overarching goal of this thesis is to assess and improve the repeatability of US imaging-based diagnosis. More formally, our first objective is to assess the repeatability of currently available dysplasia metrics (objective 1 in Figure 1.1).
The rest of the thesis objectives are based on improving the repeatability of DDH diagnosis by minimizing user-interaction in extracting dysplasia metrics (objectives 2 and 3 in Figure 1.1).

Towards minimizing user-interaction in extracting dysplasia metrics, our second objective is to automatically process 2D US images of the neonatal hip for diagnosing DDH. This includes extracting bone and cartilage boundaries (objective 2A in Figure 1.1), assessing the adequacy of 2D US images (i.e., whether an US image is adequate for making dysplasia metric measurements), extracting dysplasia metrics, and comparing the reproducibility of dysplasia metric measurements made using our method against that of measurements made using other available methods (objective 2B in Figure 1.1). Anticipating that 3D hip morphology-based DDH diagnosis will be more reliable, our third and final objective is to automatically process 3D US images of the neonatal hip for diagnosing DDH (Figure 1.1). In this 3D US-related objective, we focus on extracting bone and cartilage boundaries (objective 3A in Figure 1.1), extracting 3D morphology-derived dysplasia metrics, and comparing the reproducibility of the 3D morphology-derived dysplasia metrics extracted using our method against that of dysplasia metric measurements made using 2D US imaging-based methods (objective 3B in Figure 1.1). This thesis does not address the problem of identifying the scan adequacy of 3D US images; a graduate student is currently working on this issue.

1.3 A Clinical Evaluation of Developmental Dysplasia of the Hip

The current clinical practice for the management of DDH is to screen infants using a clinical and/or US-based diagnosis method [147]. Clinical examination tests the stability of the hip within its socket using the Barlow and Ortolani tests [55, 147]. Originally popularized by Graf [51], the US-based methods improve the sensitivity of clinical examination-based diagnoses [40, 54, 136, 163].
Other imaging modalities employed in diagnosing DDH are X-ray, magnetic resonance imaging (MRI) and computed tomography (CT). X-rays are useful after the femoral epiphysis ossifies and are used as a primary tool after six months of age [4]. MRI and CT are used only in the pre- and postoperative evaluation of surgical hip treatments [4].

1.3.1 US-based Diagnosis

Currently, US is the primary imaging modality for detecting acetabular dysplasia and/or hip dislocations in infants at this early stage [104, 137, 150]. US is preferred over X-rays for diagnosing DDH at an early age since X-rays ionize matter directly along their path of travel, which can damage living tissue and the DNA within cells [70]. While US is considered safer than X-rays, there are some risks associated with US.

Figure 1.1: A flow-chart outlining the thesis objectives. The red box identifies the 3D US adequacy step, which is necessary for automatic processing of a 3D US image but is not addressed in this thesis.

In this subsection, we first provide the fundamentals of US imaging, with a focus on US bone imaging, since diagnosing DDH requires imaging the bone and cartilage boundaries in an infant hip. We then present a short report on the potential thermal and non-thermal risks associated with performing an infant hip examination using US, and provide evidence suggesting that the risks associated with US imaging in DDH diagnosis are insignificant. Next, we report the currently recommended US-based DDH diagnosis procedures and provide details on each of those procedures.

Fundamentals of US Imaging of Bone Boundaries: US images are obtained by using a pulse-echo approach. Here, an US transducer, which is formed by an array of piezoelectric crystals, transmits a spatially localized pulse of US and directs it into a patient along a US scanline (Figure 1.2).
For convenience, in the rest of this thesis, we assume that the US transducer is at the top of the US image regardless of the orientation in which the US transducer was placed during image acquisition (similar to Figure 1.2). Thus, "upward" refers to the direction towards the US transducer and "downward" refers to the direction away from the US transducer. As the US pulse travels through a point in the patient, part of the US signal is reflected (the US echo), part of the US signal is absorbed and the remainder of the pulse continues deeper into the patient. The reflection at a particular point depends on the energy of the US pulse and also on the local ratio of acoustic impedance along the line of US pulse travel. When an US pulse reaches a bone, the local ratio of acoustic impedance at the bone boundary is large since bone has a considerably larger acoustic impedance than neighboring tissues. The echo signals are collected at the transducer, and are then converted to brightness or luminance measures (Figure 1.2). The brightness measures corresponding to the bone boundary locations are larger than the brightness measures at points immediately before the bone boundary. Also, since bone strongly attenuates the US signal, echoes from beyond the bone boundary are weak and thus brightness measures from points beyond the bone boundary are small. This strong response at the bone boundary, along with the weak response around it, generates a local symmetry (or a ridge-like appearance) of brightness measures around the bone boundary in an US image (Figure 1.2).

Thermal Risks Associated with US Diagnosis: When US is directed into a patient, there can be risks from heating of tissues. Draper [37] found an increase in temperature of 5°C in the gastrocnemius muscle at a depth of 3 cm when applying a continuous-echo US for 10 minutes with an US signal power of 1.5 W/cm².
In practice, US signals are emitted at a power of around 0.7 W/cm² [146]. More importantly, the acquisition time of US is considerably shorter than 10 minutes (e.g. around 2 seconds for acquiring a 3D US volume). Also, we use a pulse-echo US approach, which has a considerably smaller heating effect than that of a continuous-echo approach [6]. For all these reasons, we estimate that the increase in heating from a 3D US-based diagnosis would be around 1/1000th of 5°C, and the increase in heating from a 2D US-based diagnosis would be even smaller. Thus, it is unlikely that thermal risks are present in an US-based DDH assessment.

Figure 1.2: The propagation of an ultrasound pulse (white) along one particular scanline (dotted red line). A strong echo is reflected from the ilium bone boundary, resulting in a high brightness measure on the image at the bone boundary point.

Non-thermal Risks Associated with US-based DDH Diagnosis: Baker’s [6] review article identified two non-thermal risks in association with US: cavitation (the formation of tiny gas bubbles in the tissues as the result of US vibration) and acoustic streaming (a localized liquid flow in the fluid around a vibrating bubble as the result of US vibration). Regarding cavitation risks, Baker recommended caution in continuous long exposure (around 10 minutes) of US near air-filled cavities such as the lungs and intestines. Since our target anatomy is the hip joint and also since the acquisition time of US in DDH diagnosis is short (e.g. around 2 seconds for acquiring a 3D US volume), it is unlikely that cavitation risks are present in a 2D US-based DDH assessment.
Regarding acoustic streaming, it is associated with and is secondary to cavitation in terms of risks [6], so it is unlikely that acoustic streaming risks are present in an US-based DDH assessment.

Recommended US-based DDH Diagnosis Procedure: The American College of Radiology [104] has standardized US imaging-based methods for assessing DDH, which include estimating the acetabular morphology using Graf’s method [51], estimating the femoral head coverage (FHC) using Morin’s method (optional) [98] and assessing the reducibility of the dislocated hip using Harcke’s method [62]. Outlines of these methods are as follows:

Acetabular morphology assessments using Graf’s method: Graf’s method [51] involves two angle measurements, α and β. The α angle is the angle between the acetabular roof and the vertical cortex of the ilium, whereas the β angle is that between the vertical cortex of the ilium and the labrum (Figure 1.3 (d) and (e)). A higher α corresponds to healthier hips. The threshold for a normal hip varies between studies. Graf reported an α of > 60° to represent a healthy hip [52], whereas Jones reported a healthy hip as being represented by an α of > 55° [82]. In contrast to α, a lower β of < 55° corresponds to healthier hips [52]. To improve reproducibility, these measurements are only performed in B-mode US images collected in the coronal plane that fulfill the following adequacy criteria: the presence of the labrum, ischium, femoral head, acetabulum and flat horizontal ilium (Figure 1.3 (c)) [52, 79].

FHC Assessments using Morin’s method: FHC is defined as the ratio of the acetabular width to the maximal femoral head diameter (Figure 1.3 (f)). A higher FHC corresponds to healthier hips, with a hip with FHC > 55% being considered healthy.
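As a hedged illustration of the definitions above, the following Python sketch computes α and β as angles between lines and FHC as a width ratio, and then compares them with the thresholds quoted in the text (α > 60°, β < 55°, FHC > 55%). The function names, the representation of each anatomical line as a 2D direction vector, the restriction to acute angles, and the example values are assumptions made purely for illustration; this is not the measurement procedure used in this thesis or in clinical practice.

```python
import math


def angle_between_lines_deg(d1, d2):
    """Acute angle in degrees between two undirected 2D lines.

    d1 and d2 are (x, y) direction vectors; their sign and length do not matter.
    """
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    norms = math.hypot(d1[0], d1[1]) * math.hypot(d2[0], d2[1])
    # abs() makes the lines undirected; min() guards against rounding above 1.
    return math.degrees(math.acos(min(1.0, abs(dot) / norms)))


def femoral_head_coverage_pct(acetabular_width, head_diameter):
    """FHC as a percentage: acetabular width over maximal femoral head diameter."""
    return 100.0 * acetabular_width / head_diameter


def within_quoted_normal_ranges(alpha_deg, beta_deg, fhc_pct):
    """True if all three metrics fall in the 'healthy' ranges quoted in the text.

    Thresholds (alpha > 60, beta < 55, FHC > 55) are the ones cited above and
    vary between studies; this is an illustration, not a diagnostic rule.
    """
    return alpha_deg > 60.0 and beta_deg < 55.0 and fhc_pct > 55.0


# Hypothetical example: a horizontal ilium line, and roof/labrum lines drawn
# at 65 and 40 degrees to it (values chosen only for illustration).
ilium = (1.0, 0.0)
roof = (math.cos(math.radians(65)), math.sin(math.radians(65)))
labrum = (math.cos(math.radians(40)), -math.sin(math.radians(40)))

alpha = angle_between_lines_deg(ilium, roof)
beta = angle_between_lines_deg(ilium, labrum)
fhc = femoral_head_coverage_pct(acetabular_width=6.0, head_diameter=10.0)
print(alpha, beta, fhc, within_quoted_normal_ranges(alpha, beta, fhc))
```

Because each metric depends on a handful of manually placed lines or points, small changes in their placement translate directly into changes in the angle or ratio.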
Also, to improve reproducibility, FHC measurements are made only in B-mode US images collected in the coronal plane that fulfill the following adequacy criteria: the presence of the labrum, ischium, femoral head, acetabulum and flat horizontal ilium (Figure 1.3 (c)) [79].

The Reducibility Assessment using Harcke’s method: In this method, US imaging is used to observe the location of the femoral head relative to the ilium from both the coronal and transverse planes, while a motion and stress maneuver similar to the Barlow and Ortolani examination is performed [61]. The diagnosis outcome is categorical, namely: normal, lax, dislocatable, reducible and not reducible.

Several other dysplasia metrics are available but not commonly used; these include the acetabular contact angle (ACA: a measure of the relative shape of the 3D acetabulum to the vertical cortex of the ilium) [64, 94], combined H angle (H: the angle between the vertical cortex of the ilium and the line that connects the lower medial iliac edge to the most distal part of the labrum) [74], rounding index (M: a measure of acetabular convexity) [23], acetabular radius of curvature (AROC: a measure of curvature at the perichondrial point) [23], arc length (AL: another measure of curvature at the perichondrial point) [23], pubo-femoral distance (PFD: a measure of distance between the pubic bone and the femoral head) [157], bony-rim coverage (BRC: a measure of the ratio of the acetabular width to the maximal femoral head diameter) [157] and L value (L: the slope from the vertical cortex of the ilium to the acetabulum) [131]. A detailed analysis of the reproducibility of each, in comparison to Graf’s [51], Morin’s [98] and Harcke’s [62] methods, is covered in Chapter 2.

While several methods exist to diagnose DDH using US imaging, all of these methods share two major difficulties.
First, it is challenging to identify a standard or adequate 2D US plane that intersects all the necessary bone/cartilage structures (the ilium, acetabulum, ischium, labrum and femoral head) while remaining reproducible under repeated acquisitions. In one study, a group of 250 medical doctors (orthopaedic surgeons, paediatricians and radiologists) performing hip sonography examinations were required to classify four sonogram images [53]. Of these doctors, 72% made mistakes. Of those who made mistakes, 64% were unable to locate the bone/cartilage boundaries and adequate planes [53]. The remaining 36% were able to identify adequate US images, but miscalculated the dysplasia metric measurements. Although the authors did not report the degree of error in making dysplasia metric measurements, other studies have reported high variability in repeated dysplasia metric measurements from the same hip [79, 107] (e.g. a standard deviation of approximately 7° in repeated α measurements [79]).

1.4 Towards Minimal User-Interaction in Extracting Dysplasia Metrics

Analogous to the US-based methods outlined in the previous section, a computer-aided automatic DDH assessment should include a method for segmenting the bone and cartilage boundaries from a US image, one for classifying the adequacy of the acquired image for dysplasia metric extraction, and one for extracting the dysplasia metrics from the adequate images. In this section, we outline the relevant background literature for each of these three components necessary for an automatic DDH assessment pipeline.

1.4.1 US Bone Imaging

Automatic assessment of bone boundaries in US images can be difficult since US images are typically characterized by high levels of speckle noise, reverberation, anisotropy, and signal dropout, making it difficult to interpret the image and reliably detect its relevant features [103].
Over the last decade, several studies have proposed novel methods for extracting the bone boundaries from US images.

Some of these studies have suggested using phase symmetry responses in US images to localize bone boundaries that are mostly planar [2, 17, 57, 59]. The phase symmetry response is an intensity invariant local symmetry feature. The intensity invariance property of phase symmetry makes it a robust measure against US signal dropout. To address US signal dropout, an intensity invariant feature like phase symmetry is more advantageous than time-gain compensation (a method that increases the received signal intensity with depth), since the latter introduces more noise in the US image.

To improve segmentation on non-flat, multi-oriented or curved boundaries, Hacihaliloglu [58] proposed the use of isotropic phase symmetry measurements. While we did not find any direct comparison between Hacihaliloglu’s automatically-selected phase symmetry method [59] and Hacihaliloglu’s isotropic phase symmetry-based method [58], qualitative results in these works suggest that the isotropic phase symmetry-based method has noticeably more false positive bone surfaces as compared to the non-isotropic phase symmetry-based method.

Figure 1.3: (a) and (b) display the anatomy surrounding a hip joint, with (a) illustrating the femur and (b) the hip bone. Note that the femoral head is not ossified. (c) A B-mode US image of an infant hip acquired in the coronal plane. This is also an example of an adequate US image since this image includes the labrum, ischium, femoral head, acetabulum and flat horizontal ilium [79]. (d), (e) and (f) illustrate example measurements of α, β and FHC.

Other bone-boundary segmentation methods exist that exploit the bone shadowing effect and local image intensity.
For example, Fanti [42] used eigen-analysis information from a multi-scale 3D Hessian matrix to enhance sheet-like surfaces for the purpose of generating 3D segmentations of large bones. However, the results from these techniques remain heavily dependent on the quality of the US images used, as well as on the depth and complexity of the imaged bones, due to the effects of shadowing and attenuation of the local intensities [59]. Recently, Hussain [77] proposed combining elastography strain imaging and the envelope power of radio-frequency values to identify bone boundaries. The authors demonstrated markedly reduced false positive rates with these techniques as compared to those of phase-symmetry-based methods. However, the acquisition of good quality strain imaging requires artefact-free cine loops of decompression-compression cycles, in turn requiring specialization even in the acquisition of 2D US images [161].

In 2014, we introduced a US bone imaging method that combines symmetry and attenuation features to markedly reduce false positive outliers that are present in a phase-symmetry-based bone segmentation [120]. Following our contribution, similar methods of combining symmetry and attenuation features for segmenting bone boundaries were reported [2, 80, 109]. The dynamic programming based bone segmentation methods proposed by Ozdemir [109] and Jia [80] both suffer from long computation times (around 2 minutes per US slice [109] and around 5 seconds per US slice [80]). Anas’s method is relatively faster (around 0.3 second per US slice [2], which is similar to our implementation’s runtime of around 0.25 second per slice [120]), but the bone segmentation filters used are limited to a finite number of orientations, suggesting that the technique may not be well-suited to segmenting round (i.e., not sparse-oriented) bone/cartilage boundaries of the neonatal hip.
We need an approach that can extract multi-oriented boundaries and yet remain robust to speckle, signal dropout and soft-tissue outliers. Note that the computation times reported in the different studies cited here are comparable since the implementations are all in MATLAB and the computers used are comparable (e.g. an Intel i7 930 @ 2.80 GHz with 8 GB RAM for Ozdemir’s slower 2 minutes/slice method [109] and a Xeon(R) 3.40 GHz CPU computer with 8 GB RAM for our considerably faster 0.25 second/slice method [120]).

1.4.2 US Adequate Plane Detection

Before extracting dysplasia metrics from the bone and cartilage boundaries in a US scan, it is essential to first check whether the US image is adequate for extracting dysplasia metrics. Studies reveal that the process of classifying adequate images in DDH diagnoses is difficult, even for experts [53, 158].

Similar difficulties of identifying a standard plane or an adequate plane are shared by many other US-based diagnoses [44, 91, 95, 96, 102, 130, 166]. A number of machine learning-based solutions have been proposed to identify adequate or standard US images in diverse settings. Applications include: the fetal face [44], abdomen [130], heart [96], and head [95, 102]; the gestational sac [166]; and custom made phantoms [91]. These methods involve supervised machine learning approaches (e.g. random forest [102], AdaBoost [130, 166], support vector machines [95], probabilistic boosting tree [44], deep learning [164]) and are driven by structural features, specifically Haar-like features [44, 102, 130, 166] and dynamic texture models [91, 95], or by automatically learned features in the case of deep learning. The main limitation of these methods is that they are generally learned and validated on mostly normal patient data.
For our application, we need to identify adequate images in both healthy and hip dysplasia patients; to the best of our knowledge, no methods have yet been proposed to automatically identify adequate US images of the neonatal hip.

1.4.3 Dysplasia Metric Extraction

Once an adequate US image is identified, the next step in a DDH diagnosis procedure is to measure dysplasia metrics in the US image. Although US-based DDH diagnosis has been in use for more than three decades [51, 79], very little work has been performed on automatically extracting dysplasia metrics. Hareendranathan [65] proposed a semi-automatic method for extracting a contour α angle, based on the relative geometry of the ilium and acetabulum boundaries in a 2D US image. They found that this contour α has slightly lower intra-examination variability compared with the standard α angle (∆σ = 0.2° for one rater and 0.4° for another rater). Golan [48] proposed using convolutional neural networks (CNN) to automatically segment the ilium and acetabulum boundary in US images of the neonatal hip, and used the segmented bone boundary to estimate α angle values. The authors did not report the variability of their automatically extracted α values.

The lack of a large reduction in variability is perhaps explained by Jaremko’s study [79], in which Jaremko noted that 2D US-based dysplasia metrics were sensitive to variability in positioning the 2D probe over the 3D hip structure. To address this probe-orientation-dependent variability, recent studies proposed using 3D US to extract 3D hip morphology-derived dysplasia metrics. For example, Hareendranathan [64] proposed the following 3D dysplasia metric: the ACA (measured from 3D US data using manual landmark selections on the ilium and acetabulum, with subsequent angle measurements).
This method involves a slice-by-slice analysis process that requires manually selecting at least three seed points in three of the 2D US slices in a 3D US volume and manually separating the acetabulum from the ilium. Using such an interactive method would require valuable clinician time; further, these manual operations introduce a within-image measurement variability of approximately 1° [64] and an inter-scan variability of approximately 4° [94]. In another study, Mabee [93] proposed manually selecting an optimal 2D US plane from the 3D US so as to reduce the variability which currently persists in purely 2D US-based techniques. However, the intra-rater variability in the proposed method did not improve considerably compared to the variability in the 2D US-based diagnosis. In terms of semi-automatic analyses, de Luis-Garcia [31] minimized a local structure tensor feature and an interactive user-defined central-location-based energy term to segment the 3D femoral head of the neonatal hip. However, this semi-automatic work was validated only qualitatively on one example, without any quantitative evaluation of the segmentation performance. Furthermore, no method was proposed for using the segmented femoral head for DDH diagnoses.

1.5 Research Questions Addressed

At least twelve dysplasia metrics have been reported for diagnosing DDH using US imaging (see Chapter 2), but no meta-analysis is available to summarize statistics regarding the variability of any of these dysplasia metrics. This makes it difficult to assess the relative utility of each of these metrics.
To provide clear estimates of the variability of these measuring systems, we have formulated our first research question as follows:

• Research Question 1: Using a systematic review and meta-analysis, can we summarize estimates of the reproducibility of different dysplasia metrics?

Among these metrics, Graf’s method is the most commonly used [51, 52, 79, 104] and is recommended for use by the American College of Radiology [104]. However, manually extracting dysplasia metrics, including Graf’s, is prone to high variability and error [53, 79, 106, 137]. To address this variability and standardize the US-based DDH diagnosis, Germany has established special commissions which have enforced a checklist for hip sonographers to follow [53]. In 2011, one German commission revoked the licenses of up to 43.7% of the hip sonographers in eight German states due to deficiencies in their protocols [158].

To aid clinicians and sonographers in this challenging task of properly acquiring the US images and correctly extracting the dysplasia metrics, we have formulated our next three research questions as follows:

• Research Question 2: Can we develop in 2D US images a method that automatically localizes hip bone boundaries? Can we also extend the 2D US bone imaging method to 3D, ensuring that the mean discrepancy between manually-labelled bone boundaries and automatically-extracted bone boundaries is < 2 mm?
This 2 mm is based on empirical observations of the thickness of bone boundaries (ranging between 1.5 mm and 4 mm) in US images.

• Research Question 3: Can we develop a method that automatically classifies 2D US images that are adequate for dysplasia metric measurements, with both true positive and true negative rates being higher than 90% when the automatic classifications of the images are compared against the expert-labeled classifications?

• Research Question 4: Can we develop a method that automatically extracts dysplasia metrics from adequate US images, with at least a moderate agreement (criterion for a moderate correlation coefficient: r > 0.36 [152]) between the automatically extracted dysplasia metrics and those manually extracted by experts?

In 2013, Graf [53] suggested that inconsistent hip sonography techniques within and across health centres are one of the major causes of variability. Thus, we anticipated that an automatic dysplasia metric extraction method would remove that inconsistency and provide a more reproducible diagnosis. Our fifth research question was:

• Research Question 5: Do the automatically extracted 2D US-based dysplasia metrics significantly improve the reproducibility of the DDH diagnosis?

Later in 2014, Jaremko [79] discovered that a greater source of variability in dysplasia metric extraction is due to changes in the relative locations of the US transducers and the infant subjects’ hips [86]. We anticipate being able to solve this crucial probe-orientation-dependent variability problem using an intrinsic 3D morphology metric derived directly from 3D US scans. However, interpreting 3D US images adds another layer of complexity to manually extracting dysplasia metrics in comparison to interpreting 2D US images.
To provide clinicians with an easy-to-use tool that can potentially provide a more reliable diagnosis compared to the current standard 2D US-based diagnosis, we formulate our next two research questions as:

• Research Question 6: Can we develop a method that automatically extracts dysplasia metrics able to characterize 3D hip morphology, with at least a moderate agreement (criterion for a moderate correlation coefficient: r > 0.36 [152]) between 3D hip morphology-derived dysplasia metrics and 2D dysplasia metrics manually extracted by experts?

• Research Question 7: Do automatically extracted 3D US-based dysplasia metrics significantly improve repeatability in DDH diagnosis?

In this thesis, we have not addressed the problem of identifying adequate 3D US scans. For 3D US, this thesis focuses on investigating the potential reduction in variability that is possible with 3D morphology-derived dysplasia metrics.

1.6 Thesis Overview

In addition to this introductory chapter, the thesis includes five chapters. The final chapter discusses the conclusions and directions for future work.

An overview of Chapters 2 to 5 is shown in the flowchart in Figure 1.4. Figure 1.4 also shows relevant publications associated with each of the research questions.

Chapter 2 addresses the first research question: “Using a systematic review and meta-analysis, can we summarize the estimates of reproducibility of different dysplasia metrics?” Here, we conduct a systematic review and meta-analysis to summarize the variability of each available dysplasia metric, compare the degree of usefulness of each dysplasia metric in terms of its variability, and, finally, recommend a dysplasia metric for use in DDH assessments.

In Chapter 3, we address the second research question: “Can we develop in 2D US images a method that automatically localizes hip bone boundaries?
Can we also extend the 2D US bone imaging method to 3D, ensuring that the mean discrepancy between manually-labelled bone boundaries and automatically-extracted bone boundaries is < 2 mm?” Here, we present our bone imaging methods based on the symmetry and attenuation features in US images. Evaluating on US images of 15 infant hips (age 0 to 4 months), we report a mean discrepancy of < 1.5 mm between manually-labelled bone boundaries and automatically-extracted bone boundaries.

Later, in Chapter 4, we address our third, fourth and fifth research questions concerning reliably identifying adequate 2D US images and calculating dysplasia metrics. Here, we develop a machine learning-based method for identifying whether a 2D US image is adequate for dysplasia metric measurements. We also present methods for automatically extracting the three commonly used dysplasia metrics [104]: α, β and FHC. We demonstrate that automatic extractions of these dysplasia metrics reduce variability by around 18% to 21% as compared to that of their manual extraction counterparts.

Figure 1.4: A flow-chart outlining the research questions and core blocks of this thesis. P1 to P10 are publications associated with this thesis (note that P1 has been submitted and P10 is currently being prepared for submission). The vertical arrow bars to the right provide a visual direction to the components of this thesis described in the various chapters.

Chapter 5 addresses our last two research questions concerning reliably diagnosing DDH using US imaging. Here, we propose new dysplasia metrics that characterize 3D hip morphology in infants. We also present our automatic method for extracting these dysplasia metrics.
We demonstrate that these 3D hip morphology-derived dysplasia metrics provide a significant reduction in variability in repeated diagnoses as compared to that of their manual 2D counterparts (an approximately 75% reduction for $\alpha_{3D}$ and an approximately 65% reduction for $\mathrm{FHC}_{3D}$).

Chapter 2

A Systematic Review and Meta-analysis of Variability in Dysplasia Metrics

2.1 Introduction

Despite the availability of at least twelve dysplasia metrics (e.g. Graf’s α angle [51], femoral head coverage [98], etc.), classification and diagnostic terminology discrepancies exist in DDH diagnosis largely due to the lack of reliability of dysplasia metrics [79, 137]. For example, Jaremko [79] showed that poor repeatability in measuring α angle can result in falsely classifying a dysplastic hip as normal, as well as falsely classifying a normal hip as dysplastic. Our goal in this chapter is to establish the reproducibility of the currently available dysplasia metrics.

Investigating reproducibility is the first step towards identifying the usefulness of any dysplasia metric [137]. Subsequent steps entail quantifying accuracy (reliability and validity), impact on clinical decisions, risk-benefit analysis and impact on short- and long-term clinical outcomes [137]. Diagnosis and treatment decisions for acetabular dysplasia and hip dislocations depend upon both clinical factors and imaging assessment, but diagnostic accuracy via ultrasound has been hampered by the lack of a definitive standard for what constitutes true DDH. Further variability arises between surgeons and across centres in treatment thresholds, regardless of radiologic measurements.
Consequently, many studies attempting to assess the diagnostic accuracy of ultrasound have used outcomes of later examinations, such as those by plain radiograph at two years of age, as a measure of actual outcomes. However, since dysplasia may spontaneously resolve during early infancy development in approximately 90% of hips [147], the quality of sensitivity and specificity analysis is poor. Note that while 90% of hip dysplasia resolves spontaneously, failing to diagnose and treat early can lead to significant adverse consequences; e.g. Price [119] calculated that DDH might be responsible for approximately 25,000 hip replacement procedures annually in the United States.

A 2006 systematic review conducted on the quality of diagnostic accuracy in US-based DDH diagnosis concluded that the accuracy in such diagnostic tests is poorly established [137]. Further, there have been studies that have examined the impact of ultrasound diagnosis on clinical decisions and ultimately on clinical outcomes without a complete understanding of the reproducibility or accuracy of the test [12, 98, 154, 155, 167]. Consequently, this systematic review focuses on determining the reproducibility of US-based DDH metrics in order to establish the first step in the diagnostic process-flow. There can be substantial variability in the quantitative assessment of dysplasia metrics by ultrasound, arising from two main sources: discrepancies between scans and discrepancies between observers [79]. An additional limitation on ultrasound diagnosis is the lack of an available gold standard on the accuracy of diagnosis (sensitivity and specificity) [148]. In the absence of a radiological gold standard for diagnosis, this review will limit its study of US reproducibility to those diagnoses that have been clinically validated. In the context of DDH, validity refers to whether the diagnosis can clinically differentiate between a morphologically healthy hip and a severely dysplastic hip.
This distinction alone has been fraught with challenges to accurately ascertain, because clear clinical indications for morphological dysplasia in the absence of hip instability or a dislocation may not be evident until the infant has reached walking age or beyond [148]. Thus, in this chapter, we focus on determining the reproducibility of US-based diagnosis in clinically-validated cases of DDH.

The population of interest in this review is infants between the ages of 0 and 6 months at risk of DDH. The intervention/treatment/test for the population is the diagnosis of DDH using ultrasound imaging. Our goal is to summarize the reliability in the available dysplasia metrics. For each of the dysplasia metrics, we aimed to summarize variability and/or agreement statistics for six different levels representing the possible range of clinically relevant diagnostic ratings within and across observers. The first level is the intra-exam, intra-image, intra-observer variability. This situation represents how consistent a single observer is in extracting dysplasia metrics when measuring on a single US image taken from a single US examination. The second level is the intra-exam, intra-image, inter-observer variability, representing the variability in agreement between two readers extracting dysplasia metrics from the same US image selected from a single US exam. Third, the intra-exam, inter-image, intra-observer variability represents the variability in the same reader extracting dysplasia metrics from two different US images captured in the same exam. Fourth, the intra-exam, inter-image, inter-observer variability represents the variability in two different readers extracting dysplasia metrics from two different US images captured in the same US exam. Fifth, the inter-exam, intra-observer variability represents the consistency with which a reader extracts dysplasia metrics on the same hip across two separate US examination sessions.
Finally, the inter-exam, inter-observer variability represents the variability between two different readers extracting dysplasia metrics on the same hip based on two different US examination sessions. Each of these represents a source of potential variation, and thus of diagnostic discrepancy, in DDH. When a US assessment is done at point-of-care, meaning the US exam is performed directly in the clinic either by a US technician or orthopaedic surgeon during the course of a clinical examination, the inter-exam, inter-rater variability is perhaps most reflective of the true variability.

2.2 Study Identification

Two investigators (Niamul Quader and Emily Schaeffer) independently searched two databases, Medline (1946 to September 1st 2016) and Embase (1974 to September 1st 2016), using Aromataris’ guidelines [3]. Any discrepancy between the two investigators was resolved in consensus meetings. We further consulted two librarians at the University of British Columbia to reduce any potential failure in including the eligible studies.

Table 2.1: Initial logic grid aligned with the population-intervention-comparison-outcome (PICO) elements of the review question: For infants at risk of DDH, are US imaging-based diagnoses reproducible? Note that although the comparison intervention is left empty here, we will be comparing all the US imaging-based diagnoses with each other.

Population | Intervention | Comparison | Outcome
Infants at risk of DDH | Diagnosis using ultrasound imaging | | Reproducibility

2.2.1 Search Strategy

Our search strategy was developed to answer the following research question: for infants at risk of DDH, are US imaging-based diagnoses reproducible? We first derived a logic grid from our research question and then updated the identified concepts in Table 2.1 using synonyms and alternative words (Table 2.2). Subsequently, we updated Table 2.2 using index terms or medical subject headings (MeSH) and keywords with wildcard characters (Table 2.3).
The final search strategy using keywords and MeSH derived from our research question is presented in Table 2.4.

Studies were eligible for inclusion if they fulfilled one of the following three conditions: (1) proposed a new US-based diagnosis of DDH, (2) featured a US imaging-based modification to an older method of DDH diagnosis, or (3) investigated the reproducibility of any US-based DDH diagnosis. Studies were excluded if they were based on fewer than 10 human patients, not reported in the English language, or deemed to be of poor methodologic quality.

Table 2.2: Logic grid with identified keywords added in each of the PICO columns.

Population | Intervention | Comparison | Outcome
Developmental dysplasia of the hip | Ultrasound | | Reliability
Congenital hip dislocation | Ultrasonography | | Repeatability
 | Measure | | Duplicability
 | Screening | | Replicability
 | Quantitative | | Variability
 | | | Interobserver
 | | | Intrarater
 | | | Interrater
 | | | Reproducibility

Table 2.3: Logic grid with keywords and index terms or MeSH headings in each of the PICO columns [mh = MeSH headings, kw = keyword, exp = explode].

Population | Intervention | Comparison | Outcome
Developmental dysplasia of the hip (kw) | ultraso* (kw) | | reliab* (kw)
exp Hip Dislocation, Congenital/ (mh) | exp Ultrasonography/ (mh) | | repeat* (kw)
 | measur* (kw) | | duplica* (kw)
 | metric* (kw) | | replic* (kw)
 | screen* (kw) | | varia* (kw)
 | quantit* (kw) | | inter* (kw)
 | exp Neonatal screening/ (mh) | | reproduc* (kw)
 | | | exp “Reproducibility of Results”/ (mh)
 | | | exp observer variation/ (mh)

Table 2.4: Final search strategy using keywords and MeSH [mh = MeSH headings, kw = keyword, exp = explode].

Search strategy
1. Developmental dysplasia of the hip.mp. or exp Hip Dislocation, Congenital/ [mp=title, abstract, original title, name of substance word, subject heading word, keyword heading word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier]
2. ultraso*.mp. or exp Ultrasonography/ [mp=title, abstract, original title, name of substance word, subject heading word, keyword heading word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier]
3. 1 and 2
4. limit 4 to humans
5. (measur* or metric* or screen* or quantit*).mp. or exp Neonatal screening/ [mp=title, abstract, original title, name of substance word, subject heading word, keyword heading word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier]
6. (reliab* or varia* or inter* or reproduc*).mp. or (exp “Reproducibility of Results”/ or exp observer variation/ or exp “Sensitivity and Specificity”/) [mp=title, abstract, original title, name of substance word, subject heading word, keyword heading word, protocol supplementary concept word, rare disease supplementary concept word, unique identifier]
7. 4 and 5 and 6

2.2.2 Results

Our search strategy (Figure 2.1) retrieved a total of 427 articles from Embase and 275 articles from Medline. The combined total of 702 articles was manually de-duplicated. After sorting articles by title and removing articles only when an exact title match was found, 497 unique articles remained. Upon abstract review of these 497 remaining articles, 42 were selected for full-text review. In total, 29 articles [9, 23–25, 27, 35, 41, 64, 65, 73, 74, 79, 81, 82, 93, 94, 107, 108, 113, 114, 131, 133, 135, 138, 141, 149, 153, 157, 167] were selected for quality appraisal and data extraction following the full-text review. The reasons for excluding a study were based on the inclusion and exclusion criteria.

Figure 2.1: Flow diagram demonstrating study identification process. The reasons for excluding a study were based on the inclusion and exclusion criteria.

2.2.3 Discussion

Our systematic review was limited to articles that were published only in the English language, potentially biasing results by the omission of non-English articles with relevant variability data [83]. Inherent to all systematic reviews, there is also a risk of publication bias toward favorable results; consequently, we may be underestimating the magnitude of variability that exists in the extraction of dysplasia metrics [38].

2.3 Methodologic Quality Appraisal

For each eligible study, we assessed the quality of methods by using the Guidelines for Reporting Reliability and Agreement Studies (GRRAS) [87], which is a 15-item checklist providing a standard or guideline for reporting reliability or agreement in healthcare studies. For each included study, the two reviewers independently assessed for the presence or absence of each checklist item. Disagreements in quality appraisal were resolved through consensus discussion. We judged a study to be poor if it had a GRRAS score of less than 8.

2.3.1 Results

The average GRRAS score of the selected articles was 10.7 (SD 1.09), with a high score of 12 achieved in five studies [41, 65, 79, 114, 139] and a low score of 8 [27, 131].

Quality of statistics: The quality of the statistics in many of the included studies was poor primarily because the statistical methods used in extracting statistics were not explicitly stated. Specifically, seven out of the ten studies that reported intraclass correlation coefficient (ICC) statistics did not provide any specific details on which model of ICC was used. Of the thirteen studies that reported Kappa statistics, only six reported both the confidence intervals of the Kappa coefficient and the method of computing Kappa, two reported only the confidence interval but not the method of computing Kappa, and three reported only the method of computing Kappa. The remaining two studies provided neither the confidence intervals nor the method of computing Kappa.

Risk of bias in individual studies: We found 20 out of 28 assessed studies that explicitly stated that measurements/ratings were conducted independently.
In the remaining 8 studies, it was not clear whether there was any bias between the repeated measurements.

2.3.2 Discussion

Overall, the quality of the included 28 studies was moderate when assessed using the GRRAS guideline, a 15-item checklist for the accurate reporting of studies of reliability and agreement (average 10.7 out of 15, range 6-12). However, we found a considerable lack of clearly reported statistics in many of the included studies. This limits the outcomes of our systematic review and meta-analysis, since they depend on the quality of study and quality of reported statistics within each of the included studies.

The first limitation relating to quality of studies was that we could not perform a meta-analysis on Cohen’s Kappa since most of the included studies in our systematic review did not mention the confidence intervals of their Kappa statistics, an essential component of Kappa statistic meta-analyses [151]. Second, we did not perform a meta-analysis on ICC measures reported within the included studies since most of those studies did not specify their ICC-calculating method. Given that there are multiple methods for ICC calculation, comparability across all studies could not be suitably established. Therefore, our meta-analysis was limited mostly to the variability measures reported within the included studies, although we have provided scatter plots summarizing the Kappa and ICC statistics found within each of the individual studies.

2.4 Meta-analysis of variability in dysplasia metrics

2.4.1 Extracting Variability and Agreement Statistics

The primary investigator (N.Q.) extracted basic demographic data and variability or agreement statistics for each selected study.
Demographic data included: (1) year of publication, (2) last name of principal author, (3) country of origin for the study, (4) reported dysplasia metrics, (5) frequency setting used in ultrasound transducer, (6) statistics used in describing variability or agreement, (7) sample size, and (8) profession/expertise of clinicians. Variability or agreement statistics included: (1) intra-exam, intra-image, intra-observer, (2) intra-exam, intra-image, inter-observer, (3) intra-exam, inter-image, intra-observer, (4) intra-exam, inter-image, inter-observer, (5) inter-exam, intra-observer, and (6) inter-exam, inter-observer in estimating dysplasia metrics.

2.4.2 Summarizing Variability Statistics

To capture the variability in dysplasia metrics between repeated measurements, we categorized variability measures (i.e., standard deviation values) of each dysplasia metric separately. Specifically, we combined these variability measures by extracting the pooled standard deviation, $\sigma_p$ [14]. Next we estimated the heterogeneity statistic (i.e., between-study standard deviation, $\sigma_b$) using a Markov chain Monte Carlo-based random effects model [115]. We then incorporated the heterogeneity statistic to estimate the total standard deviation, $\sigma_T = \sqrt{\sigma_p^2 + \sigma_b^2}$. Finally, in order to directly compare the variability of different dysplasia metrics, we standardized each by a range, $r = t_n - t_a$, where $t_n$ is the threshold for normal hips and $t_a$ is the threshold for abnormal hips for the given metric.

2.4.3 Summarizing Agreement Statistics

To capture the agreement in dysplasia metrics between repeated measurements within individual studies, we categorized agreement measures (i.e., ICC and Kappa statistic) of each dysplasia metric separately. Note that we could not impute variability measures (e.g.
standard deviation values) from ICC [71], since the included studies did not report standard deviation values within each of the measurement groups separately.

2.4.4 Comparing Dysplasia Metrics

To identify the most reliable metric, we identified the dysplasia metric that had the smallest average $\sigma_T$. Next, we tested whether that dysplasia metric had significantly lower variability than the other dysplasia metrics, with a Bonferroni correction made to adjust for multiple comparisons [13].

2.4.5 Estimating Trend of Variability over Time

To investigate whether dysplasia metric reliability was improving or deteriorating with time, we estimated the correlation coefficient between the variability of each individual dysplasia metric and year of publication for each intra-/inter-rater condition. We then separately combined the correlation coefficients using MedCalc software, which first calculates the weighted summary correlation coefficient [69] and then incorporates the heterogeneity statistic using a random effects model [32].

Table 2.5: Criteria for evaluating normal and dysplastic hips, along with the range between normal and dysplastic values for each of the dysplasia metrics.

Metric   Criteria, normal hip   Criteria, hip dysplasia   Range
α        α > 60°                α < 43°                   17°
β        β < 55°                β > 75°                   20°
FHC      FHC > 55%              FHC < 40%                 15%
ACA      ACA > 48°              ACA < 38°                 10°
H        H < 75°                H > 85°                   10°
PFD      PFD > 5.6 mm           PFD < 6.6 mm              1 mm
M        M > 0.18               M < 0.05                  0.13
AROC     AROC < 2 mm            AROC > 2.4 mm             0.4 mm
AL       AL < 2.05 mm           AL > 2.1 mm               0.05 mm
BRC      BRC < 51%              BRC < 69%                 18%
L        L < 0.68               L > 0.92                  0.24

2.4.6 Results

Available dysplasia metrics: The reproducibility (either agreement or variability) of the following dysplasia metrics was investigated by at least one study: (a) α angle (24 studies), (b) β angle (16 studies), (c) FHC measure (7 studies), (d) ACA angle (2 studies), (e) H angle (2 studies), (f) M measure (1 study), (g) AROC measure (1 study), (h) AL measure (1 study), (i) PFD measure (2 studies), (j) BRC measure (1 study), and (k) L measure (1 study).
Variability of the α angle, β angle, FHC, ACA and H angle was reported in more than one study, and these metrics were included in our meta-analysis of variability measures. The resulting standard deviation measure ($\sigma_T$) for each dysplasia metric obtained from our meta-analysis was standardized by dividing $\sigma_T$ by the range of that dysplasia metric between normal and dysplastic hips. These ranges are based on the criteria for normal hips and hip dysplasia as summarized in Table 2.5 [23, 64, 82, 94, 157]. We did not find any study that investigated the reproducibility of Harcke’s dynamic assessment of DDH [62].

Variability in dysplasia metrics: Variability, as denoted by standard deviation values, reported in each of the studies for the reported dysplasia metrics is shown in Figure 2.2, Figure 2.8, Figure 2.7, Figure 2.6, Figure 2.5, Figure 2.4 and Figure 2.3. The α angle was investigated in the largest number of studies (12/28), followed by the β angle (7/28) and femoral head coverage (5/28).

The pooled variability ($\sigma_p$) and the total variability ($\sigma_T$) for each dysplasia metric are shown in Table 2.6 and Table 2.7. Since $\sigma_p$ was adjusted to $\sigma_T$ based on between-study heterogeneity, the difference between $\sigma_p$ and $\sigma_T$ provides a measure of the heterogeneity between different studies. The standardized variability for each metric (i.e., $\sigma_S = \sigma_T / r \times 100\%$) is shown in Figure 2.9. For intra-exam, intra-image, intra-rater variability, ACA had the lowest mean variability ($\sigma_S$ = 23%), followed by β ($\sigma_S$ = 24%). There was no statistical difference between the variability of ACA and β (p > 0.05); however, both ACA and β seemed to have significantly lower variability compared to α, FHC and combined H angle (p < 0.05/4). For intra-exam, intra-image, inter-rater variability, β had the lowest variability ($\sigma_S$ = 28%, p < 0.05/4), followed by ACA ($\sigma_S$ = 32%). FHC is the only dysplasia metric whose intra-exam, inter-image, inter-rater variability was reported in at least two studies.
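The arithmetic behind the standardized variability can be sketched as follows. This is a minimal illustration and not the analysis code used in the meta-analysis: all function names are hypothetical, the pooled standard deviation uses the common degrees-of-freedom-weighted definition (an assumption, since the exact estimator from [14] is not restated here), and the between-study term $\sigma_b$ is taken as a given input rather than estimated with the MCMC random effects model. The example value of 6.96° is read from the α row of Table 2.6 as the inter-exam, inter-rater $\sigma_T$, which is an assumption about the flattened column layout; 17° is the α range from Table 2.5.

```python
import math


def pooled_sd(sds, ns):
    """Pooled SD weighted by degrees of freedom (assumed estimator)."""
    num = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    den = sum(n - 1 for n in ns)
    return math.sqrt(num / den)


def total_sd(sigma_p, sigma_b):
    """Total SD: sigma_T = sqrt(sigma_p^2 + sigma_b^2)."""
    return math.sqrt(sigma_p ** 2 + sigma_b ** 2)


def standardized_variability(sigma_t, metric_range):
    """Standardized variability in percent: sigma_S = sigma_T / r * 100."""
    return sigma_t / metric_range * 100.0


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons


# alpha angle: assumed inter-exam, inter-rater sigma_T of 6.96 deg, range 17 deg
alpha_inter_exam = standardized_variability(6.96, 17.0)
print(f"{alpha_inter_exam:.1f}%")  # 40.9%
```

Under these assumptions the result rounds to the 41% quoted for α, and `bonferroni_alpha(0.05, 4)` gives the 0.05/4 threshold used in the comparisons.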
For both inter-exam, intra-rater and inter-exam, inter-rater variability, α seemed to have the lowest variability ($\sigma_S$ = 26% and $\sigma_S$ = 41%, respectively; p < 0.05/4).

Agreement in dysplasia metrics: Reported ICC coefficients for each dysplasia metric are shown in Figure 2.10, Figure 2.11, Figure 2.12, Figure 2.16, Figure 2.15, Figure 2.14 and Figure 2.13. α, β and FHC were the only metrics investigated for inter-exam, inter-observer agreement. Here, the inter-exam, inter-rater ICC for α varied from 0.03 to 0.45 and that for β varied from 0.13 to 0.45. Inter-exam, inter-observer agreement for FHC was reported in only one study (ICC = 0.02).

Reported Kappa coefficients for each dysplasia metric are shown in Figure 2.17. The α and β angle-based Graf classification has been investigated the most (10 studies), whereas the Kappa statistics of the other dysplasia metrics were reported in only one study each. For inter-exam, inter-observer agreement, the α and β angle-based Graf classification ranged from poor to moderate (Kappa 0.1 to 0.6), the position of cartilaginous roof-based classification was fair (Kappa 0.2 to 0.4) and the shape of bony roof-based classification ranged from poor to moderate (Kappa 0.1 to 0.4).

Figure 2.2: Variability in α angle. Variability is expressed as standard deviation of discrepancy between repeated measurements.

Figure 2.3: Variability in β angle. Variability is expressed as standard deviation of discrepancy between repeated measurements.

Figure 2.4: Variability in FHC. Variability is expressed as standard deviation of discrepancy between repeated measurements.

Figure 2.5: Variability in ACA angle.
Variability is expressed as standard deviation of discrepancy between repeated measurements.

Table 2.6: Meta-analysis result for the total variability ($\sigma_T$) in α angle (in °), β angle (in °) and FHC (in %). Column groups: intra-examination (intra-image: intra-rater, inter-rater; inter-image: intra-rater, inter-rater) and inter-examination (intra-rater, inter-rater). Values are listed left to right for the columns with available data.

α (in °)    $\sigma_p$: 2.88, 3.38, 3.83, 5.92
            $\sigma_T$: 4.33, 5.73, 4.45, 6.96
            Upper limit of $\sigma_T$: 5.74, 8.33, 7.54, 11.73
            Lower limit of $\sigma_T$: 3.48, 4.19, 3.83, 5.93
β (in °)    $\sigma_p$: 4.99, 5.21, 4.66, 8.88
            $\sigma_T$: 5.29, 6.12, 6.17, 9.96
            Upper limit of $\sigma_T$: 6.81, 10.19, 12.55, 14.99
            Lower limit of $\sigma_T$: 5, 5.21, 4.67, 8.88
FHC (in %)  $\sigma_p$: 4.45, 5.56, 3.93, 5.35, 8.48
            $\sigma_T$: 5.26, 6.86, 5.4, 6.82, 9.94
            Upper limit of $\sigma_T$: 9.32, 12.82, 11.6, 13.1, 16.32
            Lower limit of $\sigma_T$: 4.46, 5.56, 3.94, 5.36, 8.49

Figure 2.6: Variability in PFD. Variability is expressed as standard deviation of discrepancy between repeated measurements.

Figure 2.7: Variability in M measure. Variability is expressed as standard deviation of discrepancy between repeated measurements.

Trend of Variability over Time: The meta-analysis results for the trend of variability in each of the individual dysplasia metrics against year of publication (i.e., correlation of variability with year of study) are shown by forest plots in Figure 2.18, Figure 2.19 and Figure 2.20. With limited available data, we were only able to find the trend in variability against year of publication for α, β, FHC and combined H angle. β is the only dysplasia metric that shows a slight trend towards improving repeatability. The other three dysplasia metrics show moderate to strong trends towards deteriorating repeatability in repeated measurements.

Figure 2.8: Variability in H angle. Variability is expressed as standard deviation of discrepancy between repeated measurements.

Figure 2.9: Variability in dysplasia metrics based on our meta-analysis. Here, the variability of each dysplasia metric was standardized by dividing $\sigma_T$ by the range of that dysplasia metric between normal and dysplastic hips. The inter-exam, inter-rater variability is the most clinically relevant variability.

Figure 2.10: Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for α.

Figure 2.11: Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for β.

Figure 2.12: Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for FHC.

Figure 2.13: Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for AROC.

Figure 2.14: Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for AL.

Figure 2.15: Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for PFD.

Table 2.7: Meta-analysis result for the total variability ($\sigma_T$) in ACA angle (in °) and H angle (in °) and FHC (in %). Column groups are the same as in Table 2.6; values are listed left to right for the columns with available data.

ACA (in °)  $\sigma_p$: 2.66, 2.54, 4.24
            $\sigma_T$: 3.89, 5.42, 6.96
            Upper limit of $\sigma_T$: 9.39, 14.44, 15.79
            Lower limit of $\sigma_T$: 2.66, 2.56, 4.26
H (in °)    $\sigma_p$: 3.18, 4.06
            $\sigma_T$: 5.94, 6.76
            Upper limit of $\sigma_T$: 14.93, 15.59
            Lower limit of $\sigma_T$: 3.2, 4.08

2.4.7 Discussion

A US-based examination is fraught with many opportunities for operator or measurement errors to reduce its reliability. A US examination is a dynamic process that involves collecting a video file, which can be broken down into a series of static 2D images from which a single, optimal representative image is chosen for extraction of dysplasia metrics. Consequently, variability in metric extraction has the potential to arise within the same US exam, based on the image chosen, the observer(s) performing the measurements, or both.
Additional variability is introduced by both the same observer and different observers performing measurements on images taken from different US exam sessions of the same hip.

In our meta-analysis, we were able to assess the variability of multiple dysplasia metrics commonly used for 2D US: the α and β angles and percent femoral head coverage (FHC). Additionally, we were able to assess the variability of a three-dimensional (3D) US dysplasia metric, the ACA, employed in a limited number of included studies. The meta-analysis of variability measures reported within the studies suggested that ACA has the lowest intra-exam, intra-image, intra-rater variability (average 22%); however, it should be noted that only two of the 28 included studies provided data for this metric. Additionally, as a measure specific to 3D US, ACA currently has limited clinical applicability due to the lack of widespread availability of 3D US in point-of-care clinical practice. Of the more widely used 2D US measures, the β angle has the lowest intra-exam, intra-image, inter-rater variability (average 29%), while the α angle has both the lowest inter-exam, intra-rater variability (average 25%) and the lowest inter-exam, inter-rater variability (average 41%). We could not compare the intra-exam, inter-image variability of dysplasia metrics, since it was reported only for FHC, in a few studies (average of 38%).

Figure 2.16: Agreement in dysplasia metrics, expressed as ICC coefficient between repeated measurements for M.

Depending upon the process flow of the clinic, it is debatable which situation is most representative of the true variability of dysplasia metrics in a real clinical setting. In some hospitals, US assessment is done at point-of-care, meaning the US exam is performed directly in the clinic, either by a US technician or an orthopaedic surgeon, during the course of a clinical examination.
In these point-of-care settings, the inter-exam, inter-rater variability perhaps best reflects the true variability. We found α (average variability of 41%), β (average variability of 45%) and FHC (average variability of 63%) to be the only dysplasia metrics whose inter-exam, inter-rater variability has been studied. Graf’s α and β metrics are significantly less variable than FHC; however, despite the lower variability in Graf’s metrics compared to other available dysplasia metrics, their near 40% variability is problematic. It is also evident in the agreement measures reported in the included studies: both α and β showed poor-to-moderate agreement between repeated inter-exam, inter-rater measurements (α: ICC range 0.03 to 0.45; β: ICC range 0.13 to 0.45). With this high degree of variability in the most commonly used dysplasia metrics, it is perhaps unsurprising that a consensus clinical gold standard for DDH diagnosis has yet to be conclusively defined.

Figure 2.17: Agreement in dysplasia metrics, expressed as Kappa coefficient between repeated measurements.

Figure 2.18: Forest plot showing correlation coefficients between variability of α and year of study. Top rows show the mean and 95% confidence intervals of the correlation between intra-exam, intra-image, intra-user variability and year of study for α. The last row shows the total random effects correlation coefficient estimating the effective change in variability with year of study for α. Here, the summary measure is the center of the diamond, and the associated confidence intervals of the summary measure are the lateral tips of the diamond.

Figure 2.19: Forest plot showing correlation coefficients between variability of β and year of study. Top rows show the mean and 95% confidence intervals of the correlation between intra-exam, intra-image, intra-user variability and year of study for β.
The last row shows the total random effects correlation coefficient estimating the effective change in variability with year of study for β.

Although the α, β and FHC measures were proposed in 1983 [51] and 1985 [98], and the included studies span 30 years, we did not find any evidence of a trend towards reduced variability in those metrics over the years. On the contrary, the repeatability of both α and FHC seems to have deteriorated, while β showed only slight improvement over this time period. This lack of improvement suggests that, despite technological advances in US equipment over time, dysplasia metric measurement remains primarily subject to individual operator/observer techniques. Additionally, we found considerable heterogeneity between measurements in different studies: the between-study standard deviation [14, 115] was on average 1.4 times higher than the pooled within-study standard deviation. This perhaps suggests the need for more standardized tools and process-of-care across all centers. Consequently, development of automated metric extraction processes may be the optimal way to minimize intra-observer, inter-observer and cross-centre variability.

Figure 2.20: Forest plot showing correlation coefficients between variability of FHC and year of study. Top rows show the mean and 95% confidence intervals of the correlation between intra-exam, intra-image, intra-user variability and year of study for FHC. The last row shows the total random effects correlation coefficient estimating the effective change in variability with year of study for FHC.

There were several limitations to our systematic review and meta-analysis, mainly arising from a dependence on the statistics provided within the included studies.
First, we did not perform a meta-analysis on Cohen’s Kappa, since most of the included studies in our systematic review did not report the confidence intervals of their Kappa statistics, an essential component of Kappa statistic meta-analyses [151]. Second, we also did not perform a meta-analysis on the ICC measures reported within the included studies, since most of those studies did not specify their ICC-calculating method. Given that there are multiple methods for ICC calculation, comparability across all studies could not be suitably established. Therefore, our meta-analysis was limited mostly to the variability measures reported within the included studies, although we have provided scatter plots summarizing the Kappa and ICC statistics found within each of the individual studies. Another limitation was that we only used the Bonferroni correction for multiple comparisons, which tends to be more conservative than other approaches [101]. Thus, in the only instance where we did not find a statistically significant result (comparing intra-exam, intra-image, intra-rater variability of ACA with β), a less conservative multiple-comparison correction might have yielded a statistically significant result. Finally, our systematic review was limited to articles published in the English language, potentially biasing results by the omission of non-English articles with relevant variability data [83]. As is inherent to all systematic reviews, there is also a risk of publication bias toward favorable results; consequently, we may be underestimating the magnitude of variability that exists in the extraction of dysplasia metrics [38].

2.5 Summary

In this chapter, we performed a systematic review and meta-analysis of the variability of dysplasia metrics between repeated measurements for the assessment of the infant hip.
We found the most commonly used measure, the α angle, to be the least variable among all dysplasia metrics; however, we found generally high variability and low agreement in all dysplasia metrics, including the α angle. Despite the lower variability in Graf’s α angle compared to other available dysplasia metrics, its near 40% inter-exam, inter-rater variability is a severe limitation to reliable DDH diagnosis. Furthermore, in the last three decades, the repeatability of dysplasia metrics has not markedly improved, indicating a genuine need for improving the repeatability and reliability of US-based DDH diagnosis.

Chapter 3

Symmetry and Attenuation-based Ultrasound Bone Imaging

3.1 Introduction

Before we present our contributions towards improving the reliability of diagnosing DDH, we detail our second major contribution: improvements in segmenting bone boundaries from 2D and 3D US images. This is particularly important since all of the current dysplasia metrics involve identifying or segmenting bone boundaries in US images of an infant’s hip, and then making geometric measurements on the identified bone boundaries [51, 52, 94, 155]. Furthermore, US bone imaging has important emerging uses in computer-assisted orthopaedic surgery that can potentially reduce surgery time and improve surgical outcomes [2, 59].

US bone imaging is challenging primarily because US images are typically characterized by high levels of speckle noise, reverberation, anisotropy, and signal dropout, thereby making it difficult to interpret the image and to reliably detect relevant features [103].
Over the last decade, several methods have been proposed to automatically extract bone boundaries from US images.

Bone boundaries were shown to be well captured by phase symmetry (PS) responses in US images, since the beam reflection is considerably weaker both before and beyond these structures where the primary reflection occurs [2, 57, 59]. However, local symmetry features remain prone to false detection of soft-tissue interfaces that often exhibit features similar to those of bone. Furthermore, though quite effective on relatively flat (i.e., sparsely-oriented) structures, PS responses require tedious, non-intuitive parameter tuning procedures to correctly identify complex bone shapes. Attempts to automate the parameter selection process have been made [1, 2, 17, 59]. Further, to improve segmentation on non-flat, multi-oriented or curved boundaries, Hacihaliloglu [58] proposed the use of isotropic phase symmetric measurements. While we did not find any direct comparison between Hacihaliloglu’s automatically-selected PS method [59] and Hacihaliloglu’s isotropic PS-based method [58], qualitative results in these works suggest that the isotropic PS-based method produces considerably more false positive bone surfaces than the non-isotropic PS-based method.

Apart from approaches based on local symmetric features, other bone-boundary segmentation methods exist that exploit the bone shadowing effect and local image intensity. Foroughi [46] employed dynamic programming on intensity and local gradient information to segment bone contours in 2D images.
This approach has shown adequate clinical accuracy in 2D US images, but requires region-of-interest selection to remove soft-tissue interfaces near the skin surface, such that the method could only be applied to 2D US images in a semi-automated manner. A separate bone contour detection scheme depends on depth-weighted adaptive thresholding and subsequent morphological opening/closing operators to enhance segmented bone surfaces in 2D images [89]. Another study used eigen-analysis information from a multi-scale 3D Hessian matrix to enhance sheet-like surfaces for the purpose of generating 3D segmentations of large bones [42]. However, the results from these techniques remain heavily dependent on the quality of the US images used, as well as on the depth and complexity of the imaged bones, due to the effects of shadowing and attenuation on the local intensities [57, 59].

Among the strain imaging-based bone segmentation techniques [76, 77, 134], the state-of-the-art method [77] combines ultrasound (US) strain imaging and the envelope power detection of radio-frequency values in a US image to identify bone boundaries with considerably reduced false positive rates compared to phase-symmetry-based methods. However, acquiring good quality strain images requires more skill and training than acquiring B-mode US images [161].

A recent work trained a CNN for use in segmenting bone boundaries [143]. The authors demonstrated a significantly improved Dice coefficient by switching from a random forests-based segmentation to a CNN-based segmentation (increase from 0.79 to 0.87) on a dataset of 1382 US images collected from the femur, tibia and pelvis area of multiple volunteers. While the results are promising, it is not clear how many volunteers/patients were in this study, or whether the trained CNN model captures the variability in appearance of bone boundaries in US images across different types of human subjects.
Another recent CNN-based implementation [66], involving transfer learning on a pre-trained CNN model [90], was evaluated on 50 US scans (2D) of infant hips. The discrepancy between their automatically segmented bone boundaries and manually segmented bone boundaries was moderate: root mean square discrepancy of 1.8 mm (SD 0.36 mm) and Hausdorff distance of 2.1 mm (SD 0.45 mm).

To segment bone boundaries using local symmetry features while remaining robust to soft-tissue outliers, we proposed combining attenuation-related features with local symmetry features [120]; our rationale is that bone attenuates the ultrasound signal considerably more than soft tissue. Later, several other methods that combine local symmetry with attenuation features were proposed [2, 5, 80, 109]. Jia [80] proposed combining PS with intensity gradient-derived features to identify high attenuation regions, and using that information to reduce soft-tissue outliers. Ozdemir [109] proposed quantifying the probability of a pixel in a US image belonging to one of the following three classes: (a) bone, (b) tissue, and (c) shadow. They used a binary classifier on a number of PS and attenuation-related features. Unfortunately, the authors did not provide the details of the classifiers they utilized. Based on the probability map, Ozdemir used Markov Random Fields to assign each pixel to one of the above three classes. The primary limitation of these methods is poor runtime (approximately 120 seconds per US image using Ozdemir’s method [109], approximately 9 seconds per US image using Hacihaliloglu’s method [56] and approximately 4 seconds per US image using Jia’s method [80]). Anas’s method is relatively faster (around 0.3 second per US slice [2], which is comparable with our 0.26 second per slice [120]). Note that the computation times reported in these studies are comparable, since the implementations are all in MATLAB and the computers used are comparable (e.g.
an Intel i7 930 @ 2.80 GHz and 8 GB RAM for Ozdemir’s slow 2 minutes/slice [109] and a Xeon(R) 3.40 GHz CPU computer with 8 GB RAM for our noticeably faster 0.25 second/slice method [120]). Another potential limitation is that the bone segmentation filters used are limited to a finite number of orientations, suggesting that they may not be well suited to segmenting round (i.e., not sparsely-oriented) bone/cartilage boundaries of the neonatal hip. Furthermore, these methods use anisotropic filters for computing symmetry features, which may not work well in segmenting the rounder bone/cartilage boundaries of the neonatal hip. We need an approach that can extract multi-oriented boundaries and yet remain robust to speckle, signal dropout and soft-tissue outliers.

In this chapter, we present two separate methods for bone boundary extraction. The first method, confidence-weighted phase symmetry (CPS), is an automatic approach that combines multiple US image features, including sparsely-oriented local phase information and attenuation-based features, to robustly segment bone surfaces in 3D US images of adult hips (Section 3.2, [120]). On a dataset of B-mode US volumes acquired from the pelvic region in 18 adult trauma patients, we showed that a combined-feature bone extraction method is more accurate (as measured by the discrepancy between surfaces of US-based bone boundaries and CT-derived ground truth bone boundaries) than using PS alone (around 50% improvement, p < 0.01).

The second method, structured phase symmetry (SPS), is another automatic approach that extracts local symmetry features by employing isotropic filters, and is independent of the orientation of the bone and cartilage boundaries. We also present an extension to the SPS feature for use in 3D US.
On a dataset of 15 US volumes acquired from 15 infant hips, we demonstrated that this technique extracts bone boundaries that are consistent with what an expert would label as bone boundaries, with a mean discrepancy between bone boundaries of < 1.5 mm. This mean discrepancy is also significantly less than the mean discrepancy between the expert-labelled bone boundary and the bone boundary from a previous state-of-the-art method [59] (p < 0.01).

3.2 Confidence-Weighted Local Phase Features

Bone and cartilage structures are usually considerably more hyperechoic than the neighboring structures, and are well captured using local symmetry features such as the intensity-invariant PS feature [57, 88]. Therefore, one key feature we used in our bone segmentation was the PS feature proposed by Hacihaliloglu et al. [57]. To calculate it, we used a 3D log-Gabor filter as our quadrature filter to identify the points of symmetry, since it can be constructed with arbitrary bandwidth [57]. Another feature of bone material we used was its considerably higher US attenuation effect compared to other tissues, which results in the characteristic shadowing beneath the bone surface (i.e., further away from the US transducer than the bone surface) in the US image. To quantify this shadowing and attenuation feature, we used Karamalis et al.’s [84] shadow detection algorithm, which extracts a transmission model for a US image. Finally, we combined the aforementioned features into a hybrid feature that we call confidence-weighted phase symmetry (CPS).

Our hybrid feature, CPS, aims to improve the PS-based bone surface segmentation technique presented in [57], which relies on empirically selected parameters. Given a 3D US volume, I(x,y,z), we first extract hyperechoic structures using PS information, PS(x,y,z) (subsection 3.2.1). Next, to reduce outliers, we incorporate the attenuation feature, A(x,y,z), and shadowing feature, S(x,y,z) (defined in subsection 3.2.2).
We combine the three measures, PS, A and S (subsection 3.2.3), to generate our hybrid feature, CPS(x,y,z).

3.2.1 Local Phase Symmetry Feature

To extract PS, we first filter I using a bank of d orientation-weighted log-Gabor filters [57]. Each of the orientation-weighted log-Gabor filters in the filter bank is defined by the transfer function:

$$G(\omega,\phi,\theta) = L(\omega)\,D(\phi,\theta) \tag{3.1}$$

where $L(\omega)$ controls the frequencies to which the filter responds [hacihaliloglu2015],

$$L(\omega) = \exp\left(-\frac{\log^2(\omega \times p)}{2\log^2(\sigma_\omega)}\right) \tag{3.2}$$

and $D(\phi,\theta)$ controls the orientation selectivity of the filter [Dosil et al. 2006],

$$D(\phi,\theta) = \exp\left(-\frac{(\phi-\phi_i)^2}{2\sigma_\phi^2} - \frac{(\theta-\theta_i)^2}{2\sigma_\theta^2}\right) \tag{3.3}$$

Here, $\omega$ represents the 3D spatial frequencies along $\omega_x$, $\omega_y$, $\omega_z$, $\|\omega\|$ is the $l_2$-norm of the spatial frequencies, $p$ is the center-wavelength of a filter and $\sigma_\omega$ is the standard deviation of all center-wavelengths across the filter bank, $\phi_i$ is the azimuth angle, $\theta_i$ is the elevation angle, $\sigma_\phi$ determines the angular bandwidth about $\phi_i$, and $\sigma_\theta$ determines the angular bandwidth about $\theta_i$.

For every filter, $G_d$, the filtered response of I has a real component (denoted as the even symmetric response, $e_d$) and an imaginary component (denoted as the odd symmetric response, $o_d$) (Figure 3.1). Similar to Kovesi [88], the PS feature is computed from $e_d$ and $o_d$ as:

$$PS(x,y,z) = \frac{\sum_d \left( \left( |e_d(x,y,z)| - |o_d(x,y,z)| \right) - T_r \right)}{\sum_d \sqrt{e_d(x,y,z)^2 + o_d(x,y,z)^2} + \varepsilon} \tag{3.4}$$

where $\varepsilon$ is a small number to prevent division by zero and $T_r = \mu_n + \sigma_n$ is a noise threshold calculated from the mean $\mu_n$ and standard deviation $\sigma_n$ of the smallest scale filter, which is likely to contain more noise than the higher scale filters [Kovesi 1999].

3.2.2 Attenuation-related Features

The calculated PS captures bone/cartilage boundaries and soft-tissue interfaces, all of which exhibit ridge-like characteristics. To distinguish between bone boundaries and other tissue types, we use an attenuation-based post-processing step on every US image slice along the z direction (perpendicular to the scan planes).
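To make the phase symmetry measure of Equation 3.4 concrete, the sketch below evaluates it along a single scan line. It is a deliberately simplified 1D illustration under stated assumptions, not the 3D orientation-weighted filter bank described above: the names `log_gabor_1d` and `phase_symmetry_1d` are hypothetical, the bandwidth ratio of 0.55 is a commonly used log-Gabor setting assumed here, the wavelengths are in samples rather than millimetres, and negative numerator terms are clamped to zero as in Kovesi-style formulations.

```python
import numpy as np


def log_gabor_1d(n, wavelength, sigma_ratio=0.55):
    """One-sided 1D log-Gabor transfer function (positive frequencies only)."""
    freqs = np.fft.fftfreq(n)          # cycles per sample
    f0 = 1.0 / wavelength              # center frequency of this scale
    gain = np.zeros(n)
    pos = freqs > 0                    # zero DC and negative frequencies
    gain[pos] = np.exp(-(np.log(freqs[pos] / f0) ** 2)
                       / (2.0 * np.log(sigma_ratio) ** 2))
    return gain


def phase_symmetry_1d(signal, wavelengths=(8, 16, 32),
                      noise_threshold=0.0, eps=1e-6):
    """Phase symmetry of a 1D intensity profile, summed over filter scales."""
    signal = np.asarray(signal, dtype=float)
    spectrum = np.fft.fft(signal)
    num = np.zeros(signal.size)
    den = np.zeros(signal.size)
    for wavelength in wavelengths:
        # Complex response: real part is even-symmetric, imaginary part is odd
        response = np.fft.ifft(spectrum * log_gabor_1d(signal.size, wavelength))
        even, odd = response.real, response.imag
        # Clamping negative terms to zero is an assumption of this sketch
        num += np.maximum(np.abs(even) - np.abs(odd) - noise_threshold, 0.0)
        den += np.sqrt(even ** 2 + odd ** 2)
    return num / (den + eps)


# Synthetic ridge (bright, symmetric bump) centered at sample 64
x = np.arange(128)
profile = np.exp(-(x - 64) ** 2 / (2 * 4.0 ** 2))
ps = phase_symmetry_1d(profile, noise_threshold=0.05)
```

On the synthetic ridge, the odd response vanishes at the axis of symmetry, so the measure is high at the ridge center and drops off a few samples away, which is the behavior that makes PS attractive for hyperechoic bone interfaces.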
Specifically, in a US slice I(x,y), we make use of the property that the US signal attenuation is very high at bone boundaries. To model this phenomenon in our post-processing framework, we first estimate a relative signal strength map, or confidence map, in a B-mode US image similar to that described in [84], m(x,y), which ranges from 0 to 1, where 0 corresponds to zero signal strength and 1 corresponds to the full source signal strength generated by the transducer. We then estimate the signal strength at each image pixel by calculating the probability of a random walk [49] starting from the pixel and ending at the top of the US image (i.e., location of virtual transducers).

Figure 3.1: Flowchart for local phase analysis for a 3D volume collected around the iliac crest area of an adult trauma patient (blue box on segmented CT representing approximate location of US transducer). Here, 3D log-Gabor filtering on the 3D US image results in real and imaginary components, which are used with Equation 3.4 for computing the PS response. The CPS feature is computed using Equation 3.7. White arrows point to false positive bone surface points.

This is computed from the graph Laplacian matrix, which is defined as:

$$L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -w_{ij} & \text{if } i \text{ adjacent to } j \\ 0 & \text{otherwise} \end{cases} \tag{3.5}$$

where $d_i = \sum_j w_{ij}$. The edge weights, $w_{ij}$, are assigned in the horizontal, vertical and diagonal directions: $w^H_{ij} = \exp(-b(|c_i - c_j| + \gamma))$, $w^V_{ij} = \exp(-b(|c_i - c_j|))$, and $w^D_{ij} = \exp(-b(|c_i - c_j| + \sqrt{2}\gamma))$. The term $c_i = g\exp(-l)$ is a depth-based intensity gradient, where $g$ is the image intensity and $l$ is the normalized closest distance from the pixel’s node to the top of the US image (i.e., the locations of the US transducer). The other parameters are $\gamma$ and $b$; $\gamma$ represents the penalty of a horizontal and diagonal walk compared to a vertical walk, and $b$ is a regularization term [84].
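The edge weights and the graph Laplacian of Equation 3.5 can be assembled for a small image as in the following sketch. It is a dense, unoptimized illustration under stated assumptions rather than the implementation of [84]: the function names are hypothetical, the depth term is taken as the normalized row index, and the random-walk solve that turns the Laplacian into the confidence map m(x,y) is not shown. The defaults b = 100 and γ = 0.05 follow the values reported later in this section.

```python
import numpy as np


def depth_weighted_intensity(image):
    """c = g * exp(-l), with l the normalized depth (row index) of each pixel."""
    rows = image.shape[0]
    depth = np.linspace(0.0, 1.0, rows)[:, None]   # assumed depth definition
    return image * np.exp(-depth)


def edge_weight(ci, cj, b=100.0, gamma=0.05, kind="V"):
    """Weight of a vertical (V), horizontal (H) or diagonal (D) edge."""
    penalty = {"V": 0.0, "H": gamma, "D": np.sqrt(2.0) * gamma}[kind]
    return np.exp(-b * (abs(ci - cj) + penalty))


def graph_laplacian(c, b=100.0, gamma=0.05):
    """Dense Laplacian L = D - W of the 8-connected pixel graph."""
    rows, cols = c.shape
    n = rows * cols
    weights = np.zeros((n, n))
    # Visit each undirected edge once; symmetry fills the mirrored entry
    offsets = [(0, 1, "H"), (1, 0, "V"), (1, 1, "D"), (1, -1, "D")]
    for r in range(rows):
        for q in range(cols):
            i = r * cols + q
            for dr, dq, kind in offsets:
                rr, qq = r + dr, q + dq
                if 0 <= rr < rows and 0 <= qq < cols:
                    j = rr * cols + qq
                    w = edge_weight(c[r, q], c[rr, qq], b, gamma, kind)
                    weights[i, j] = weights[j, i] = w
    return np.diag(weights.sum(axis=1)) - weights


image = np.array([[0.9, 0.8], [0.5, 0.4], [0.1, 0.1]])
laplacian = graph_laplacian(depth_weighted_intensity(image))
```

For realistic image sizes a sparse matrix would be needed; the dense form is used here only to keep the structure of Equation 3.5 visible.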
We discuss our choice of $\gamma$ and $b$ in more detail in Section 3.2.4.

Having estimated the relative signal strength m(x,y), we define two attenuation-based features to localize bone boundaries in PS; our rationale is that bone material tends to have a considerably higher US attenuation effect compared to other tissues [84]. The attenuation-based features are an attenuation feature, $A(x,y) = m(x,y) - m_{\min}$, and a shadowing feature, $S(x,y) = m(x,y)/(m_{\min} + \varepsilon)$ (both A and S are normalized to range between 0 and 1), where $m_{\min}$ is the minimum node value in a p by p window centered around (x,y) and $\varepsilon$ is a small number to prevent division by zero.

3.2.3 Combined Feature for Bone Surface Localization

We combine the hyperechoic PS feature (3.2.1) and the attenuation-related features (3.2.2) as an arithmetic average of the individual features:

$$P(x,y) = \frac{A(x,y) + S(x,y) + PS(x,y)}{3} \tag{3.6}$$

We use the bone membership value, P, to enhance the bone structures in PS and also to remove outliers in PS (e.g., soft-tissue interfaces), resulting in CPS(x,y) (Figure 3.1). Specifically, we locate the coordinates $(x_m, y_m)$ that correspond to the maximum values of P along each of the scan lines (i.e., the vertical lines in a US image). It is more likely that a pixel belonging to a bone boundary will be near $(x_m, y_m)$, which have confidence map values $m(x_m, y_m)$. Leveraging P and $m(x_m, y_m)$, we enhance the bone structures in PS as:

$$CPS(x,y) = \begin{cases} P(x,y) \times PS(x,y) & \text{if } |m(x,y) - \mu| < \sigma \text{ and } P(x,y) > 0.5 \\ 0 & \text{otherwise} \end{cases} \tag{3.7}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the confidence map values, $m(x_m, y_m)$.
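A compact way to read Equations 3.6 and 3.7 together is the sketch below, which takes a PS map and a confidence map for one 2D slice and returns CPS. It is an illustrative reading under stated assumptions, not the thesis implementation: the function names are hypothetical, the normalization of A and S is taken to be min-max scaling, image borders are handled by edge padding, and the window size p is assumed to be odd.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_min(m, p=3):
    """Minimum of m in a p-by-p window around each pixel (p odd, edge padded)."""
    pad = p // 2
    padded = np.pad(m, pad, mode="edge")
    return sliding_window_view(padded, (p, p)).min(axis=(-2, -1))


def normalize01(a):
    """Min-max scaling to [0, 1] (assumed normalization)."""
    span = a.max() - a.min()
    return (a - a.min()) / span if span > 0 else np.zeros_like(a)


def confidence_weighted_ps(ps, m, p=3, eps=1e-6):
    """CPS from a phase symmetry map ps and a confidence map m (rows = depth)."""
    m_min = local_min(m, p)
    attenuation = normalize01(m - m_min)           # feature A
    shadowing = normalize01(m / (m_min + eps))     # feature S
    membership = (attenuation + shadowing + ps) / 3.0   # Equation 3.6
    # Row of maximum membership along each scan line (image column)
    peak_rows = membership.argmax(axis=0)
    peak_conf = m[peak_rows, np.arange(m.shape[1])]
    mu, sigma = peak_conf.mean(), peak_conf.std()
    keep = (np.abs(m - mu) < sigma) & (membership > 0.5)
    return np.where(keep, membership * ps, 0.0)    # Equation 3.7


rng = np.random.default_rng(0)
ps_map = rng.random((20, 10))                              # stand-in PS response
conf_map = np.linspace(1.0, 0.0, 20)[:, None] * np.ones((1, 10))
cps_map = confidence_weighted_ps(ps_map, conf_map)
```

The random PS map and linearly decaying confidence map are placeholders chosen only to exercise the code; with real data, `ps` would come from the log-Gabor filtering and `m` from the random-walk confidence estimation.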
This formulation is based on our hypothesis that bone boundaries generally have confidence map values within µ ± σ, while away from bone boundaries (e.g., in shadows beneath bone surfaces) confidence map values tend to be noticeably different from µ (i.e., |m(x,y) − µ| > σ).

3.2.4 Experiment

Parameter Specification: In our bone pipeline, the log-Gabor filter bank used a value of p ranging from $p_{min} = 1.5$ mm to $p_{max} = 4$ mm to reflect the typical widths of bone interfaces in US images. We used three sets of elevation and azimuth orientations (45°, 90° and 135°) to capture bone boundaries in all orientations. The standard deviation parameter was set as $\sigma_\omega = p_{min}/3$ to reduce speckle noise (in our experience, speckle in this type of image is typically under 0.5 mm). The regularization parameter in the confidence map estimation, b, controls the sensitivity of the random walk's probability (the confidence value) to changes in the intensity gradient along the path of the random walk. For the purpose of estimating the relative signal strength in medical ultrasound images, we set this regularization parameter to 100 throughout our experiments. We adopted this value from the original work on confidence maps [84], which suggests that a 20% change around this value has relatively little effect on the confidence map results. The other parameter used in the US confidence map is γ, which represents the penalty of a horizontal and diagonal walk compared to a vertical walk. In all our experiments, we used γ = 0.05, based on the qualitative examples provided in [84].

Data: We retrieved US images with corresponding computed tomography (CT) data that were previously collected from an ex-vivo bovine femur phantom and in-vivo pelvic data (around the iliac crest area) from 18 trauma patients (obtained as part of routine clinical care under appropriate institutional review board approval, UBC CREB number: H17–01904) [57].
The ex-vivo bovine femur was placed in a polyvinyl chloride-filled cylindrical tube, with added fiducials to enable precise US-CT comparisons [57]. US images were collected using a GE Voluson 730 Expert ultrasound machine with a 3D RSP5-12 transducer (GE Healthcare, Waukesha, WI, USA) and CT images were collected using an Xtreme CT machine (HRpQCT, XtremeCT, Scanco Medical, Switzerland) [57]. The transducer's center ultrasound frequency was kept at 7.5 MHz and the image depth setting ranged between 1.9 cm and 7.2 cm.

Validation Scheme: We compared our CPS method against empirical PS-based bone segmentation [57]. First, we extracted bone surfaces separately from both the CPS feature and the PS feature, determined as the maximum of feature responses along the scan-lines of the US transducer. Next, we extracted the gold-standard bone surface from the corresponding CT. To quantify the accuracy of a bone segmentation, we registered the US-derived bone surface (either PS- or CPS-based) to the gold-standard CT-derived bone surface. Finally, we calculated a surface fitting error (SFE), defined as the Euclidean root mean square distance between the registered US-derived bone surface and the CT-derived bone surface.

For the in-vivo pelvic data, in the absence of fiducial markers, we used automatic Gaussian mixture model-based registration [16] to align the CT- and US-derived bone surfaces. The final bone surfaces that were used during registration were determined as the maximum of feature responses along the scan-lines of the US transducer.

Results: Qualitative results in Figure 3.2 show that combining PS response with attenuation-related features reduces false positive soft-tissue interfaces in both the ex-vivo bovine femur phantom and the in-vivo pelvic data.
This marked decrease in false positive bone boundaries is also evident in our quantitative analysis (Figure 3.3): CPS had a significantly reduced SFE compared to PS (mean SFE of CPS = 0.7 mm, mean SFE of PS = 1.2 mm, p < 0.01 using the Wilcoxon signed rank test).

In terms of computational cost, for a 152 × 158 × 112 US volume, the runtime of CPS was short (run on a Xeon(R) 3.40 GHz CPU computer with 8 GB RAM with MATLAB code) compared to the reported runtimes of other available methods (approximately 0.15 second per US slice for CPS vs. approximately 0.3 second per US slice for Anas's method [2], approximately 3 seconds per US slice for optimized PS [59], approximately 4 seconds per US slice using Jia's method [80], and approximately 120 seconds per US slice using Ozdemir's method [109]). All processes were executed using MATLAB (The MathWorks Inc., Natick, MA, USA).

3.2.5 Discussion

We showed that by combining local symmetry features with attenuation-based features, bone segmentation accuracy improves significantly compared to segmenting the bone boundary using the symmetry feature alone (p < 0.01). We also noted that the runtime of our CPS-based method was comparable to Anas's bone segmentation method [2], was around an order of magnitude faster than Jia's and Hacihaliloglu's [57, 80], and was around two orders of magnitude faster than Ozdemir's bone segmentation method [109].

One potential limitation is that some of the free parameters in our method were chosen based on empirical observations (e.g., γ = 0.05 was used in US confidence maps). There is therefore room to improve the reliability of bone segmentation.
While we have not done a sensitivity analysis for each of the parameters, we have successfully used the same parameter settings in all the experiments in this thesis, so we do not think the sensitivity of these parameters is problematic enough to block clinical utility.

Another limitation of our CPS method is that we used a limited number of log-Gabor filters oriented in a finite number of orientations (three elevation angles and three azimuth angles in this case). This limited number of filter orientations means that the CPS method presented in this section is likely to work well in US images that have planar bone boundaries, but may suffer from reduced segmentation accuracy in US images that have non-planar bone boundary structures (e.g., small round bone boundaries), which is particularly important in segmenting bone and cartilage boundaries in neonatal hips.

Figure 3.2: Qualitative results: segmented bone surfaces around bovine femur (first column) and around in-vivo pelvis in human trauma patients. First row shows segmented CT with box representing approximate location of US transducer. Second row shows corresponding US volume. Third row shows PS based on empirical parameters [57]. Last row shows the CPS response.

Figure 3.3: Quantitative results: (a) Bovine phantom. Note that our proposed CPS-based segmentation resulted in a 0.302 mm reduction in error compared to PS. (b) In-vivo pelvic data across all subjects (C-1 to C-18). Note that our proposed CPS resulted in a reduction in error compared to PS which is significant (p < 0.01) based on the Wilcoxon signed rank test.

3.3 Structured Phase Symmetry

Bone/cartilage structures in infant hips have complex shapes (both planar and non-planar structures) and are usually considerably more hyperechoic than the neighboring structures; therefore, we introduce an orientation-independent local symmetry feature, which we call the SPS feature, to localize bone/cartilage boundaries in US images of the neonatal hip.

3.3.1 2D Filters-based Structured Phase Symmetry

To extract SPS(x,y) from a 2D US image, I(x,y), we first enhance local PS features using responses to band-pass quadrature filters [88]. Specifically, we filter a US image, I(x,y), using a bank of d radial log-Gabor filters. Each of the 2D log-Gabor filters in the filter bank is defined by the transfer function $L(\omega) = \exp\left(-\log^2(\|\omega\| \times p) / (2\log^2(\sigma_p))\right)$, where ω is the 2D spatial frequency with components $\omega_x$ and $\omega_y$, $\|\omega\|$ is the $l_2$-norm of the spatial frequencies, p is the center-wavelength of the filter and $\sigma_p$ is the standard deviation of all center-wavelengths across the filter bank. To calculate the quadrature components, $R_d(x,y)$, of the band-passed images, $B_d(x,y) = I(x,y) * L_d(x,y)$, we use the Riesz transform [43], which is defined by:

\[
R_d(x,y) =
\begin{bmatrix} R_{d,1}(x,y) \\ R_{d,2}(x,y) \end{bmatrix}
=
\begin{bmatrix} h_1(x,y) * B_d(x,y) \\ h_2(x,y) * B_d(x,y) \end{bmatrix}
\tag{3.8}
\]

where $h_1(x,y)$ is the inverse Fourier transform of $H_1(\omega) = j\omega_x/\|\omega\|$ and $h_2(x,y)$ is the inverse Fourier transform of $H_2(\omega) = j\omega_y/\|\omega\|$. $R_d$ is then formulated from its components $R_{d,1}$ and $R_{d,2}$ as:

\[
R_d = \sqrt{R_{d,1}^2 + R_{d,2}^2}
\tag{3.9}
\]

The resulting phase symmetry response is determined as:

\[
PS(x,y) = \frac{\sum_d \left( |B_d(x,y)| - |R_d(x,y)| \right)}{\sum_d \sqrt{B_d(x,y)^2 + R_d(x,y)^2} + \varepsilon}
\tag{3.10}
\]

where ε is a small number to prevent division by zero.
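To make Equations 3.9 and 3.10 concrete, the sketch below evaluates the phase symmetry response at a single pixel from precomputed filter responses. It assumes that the band-passed values $B_d$ and the two Riesz components $R_{d,1}$ and $R_{d,2}$ at that pixel are given as lists with one entry per filter, and that ε is added once to the summed denominator. The function name `phase_symmetry` and the default value of ε are illustrative.

```python
import math


def phase_symmetry(B, R1, R2, eps=1e-6):
    """Evaluate Equation 3.10 at one pixel.

    B, R1 and R2 hold, for each filter d, the band-passed value B_d and the
    Riesz components R_{d,1} and R_{d,2} at that pixel.
    """
    num = 0.0
    den = 0.0
    for b, r1, r2 in zip(B, R1, R2):
        r = math.hypot(r1, r2)   # Equation 3.9: magnitude of the odd part
        num += abs(b) - r        # even response minus odd response
        den += math.hypot(b, r)  # local amplitude sqrt(B_d^2 + R_d^2)
    return num / (den + eps)
```

With this form, a purely even (symmetric) response gives a value close to 1, a purely odd response gives a value close to −1, and equal even and odd magnitudes give 0.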
This PS measure is a dimensionless, intensity-invariant measure that enhances ridge-like structures [Kovesi1996]. We further enhance the ridge-like bone/cartilage boundaries in PS using a multi-scale eigen-analysis of the Hessian of PS [47]. The Hessian of PS, H, is computed by convolving PS with the second-order derivatives of the Gaussian filter bank,

\[
G(x,y,s) = \frac{1}{2\pi s^2} \exp\left( -\frac{x^2 + y^2}{2s^2} \right)
\tag{3.11}
\]

where s is the scale (standard deviation) of a Gaussian filter. For the eigenvalues $|\lambda_1| \le |\lambda_2|$, a pixel on a ridge-like structure will exhibit $|\lambda_1| \approx 0$, $|\lambda_2| \gg 0$. We enhance the ridge-like features in our PS as:

\[
SPS(x,y) =
\begin{cases}
0 & \text{if } \lambda_2 > 0 \\
(1 - \exp(-T)) \exp\left( -\frac{|\lambda_1|^2}{|\lambda_2|^2} \right) & \text{otherwise}
\end{cases}
\tag{3.12}
\]

where $\exp(-|\lambda_1|^2/|\lambda_2|^2)$ is a bright-ridge enhancing term and $(1 - \exp(-T))$ is a noise cancelling term with $T = |\lambda_1|^2 + |\lambda_2|^2$. SPS is normalized such that it ranges between 0 and 1.

Results: We compared the qualitative performance of the SPS feature (on a sample 2D B-mode US image of the neonatal hip) against the performance of two other PS methods [43, 57], as shown in Figure 3.4. A radiologist marked the false positives, false negatives and falsely connected structures on the labrum, ilium, acetabulum, and triradiate cartilage. Our proposed method seems to extract these key structures better than the other methods (qualitative results show fewer false positives and false negatives along these key structures).

3.3.2 3D Filters-based Structured Phase Symmetry

To extract SPS(x,y,z) from a 3D US volume, I(x,y,z), we first compute the local PS feature, PS, from I(x,y,z) using the monogenic signal-based method described in subsection 3.3.1. To segment the sheet-like bone and cartilage surfaces from the PS feature volume, we deploy a multi-scale eigen-analysis of the Hessian matrix. For eigenvalues $|\lambda_1| \le |\lambda_2| \le |\lambda_3|$, voxels on sheet-like structures will exhibit $|\lambda_1| \approx |\lambda_2| \approx 0$, $|\lambda_3| \gg 0$.
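For the 2D case, the ridge measure of Equation 3.12 can be illustrated with a short sketch that takes the three distinct entries of a single 2 × 2 Hessian of PS and returns the unnormalized response. The function name `ridge_measure`, the closed-form eigenvalue computation and the handling of the degenerate case $\lambda_2 = 0$ (returned as zero to avoid a division by zero) are assumptions made for this example; the multi-scale aggregation and the final normalization to [0, 1] are omitted.

```python
import math


def ridge_measure(hxx, hxy, hyy):
    """Unnormalized Equation 3.12 for the Hessian [[hxx, hxy], [hxy, hyy]]."""
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    mean = 0.5 * (hxx + hyy)
    radius = math.hypot(0.5 * (hxx - hyy), hxy)
    # Order by magnitude so that |lam1| <= |lam2|.
    lam1, lam2 = sorted((mean - radius, mean + radius), key=abs)
    if lam2 >= 0:
        # lam2 > 0 is a dark ridge; lam2 == 0 is a flat region (assumed zero).
        return 0.0
    T = lam1 ** 2 + lam2 ** 2
    return (1.0 - math.exp(-T)) * math.exp(-(lam1 ** 2) / (lam2 ** 2))
```

A bright ridge (one strongly negative eigenvalue, one near zero) scores higher than a bright blob with two equal negative eigenvalues, because the second exponential term falls to exp(−1) when $|\lambda_1| = |\lambda_2|$.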
We enhance the sheet-like features in our PS volume as:

\[
SPS =
\begin{cases}
0 & \text{if } \lambda_3 > 0 \\
(1 - \exp(-R_a^2)) \, \exp(-R_b^2) \, (1 - \exp(-S^2)) & \text{otherwise}
\end{cases}
\tag{3.13}
\]

where $R_a = \left| 2|\lambda_3| - |\lambda_2| - |\lambda_1| \right| / |\lambda_3|$ is a blob eliminating term, $R_b = |\lambda_2| / |\lambda_3|$ is a sheet enhancing term and $S = \sqrt{|\lambda_1|^2 + |\lambda_2|^2 + |\lambda_3|^2}$ is a noise cancelling term [33].

Bone/cartilage boundaries also tend to attenuate a US beam more than other neighboring structures (e.g., soft tissue). We further enhance the bone/cartilage structures in SPS to form CSPS using the attenuation-based method described in subsections 3.2.1 and 3.2.2, i.e., we replace PS(x,y) with SPS(x,y) in Equation 3.6 and Equation 3.7.

Figure 3.4: Example qualitative results. (a) B-mode US image of the femoral head, (b) monogenic signal based PS, (c) directional filter based PS, (d) SPS. Red circles point to false negatives, and green circles point to falsely straightened structures, on the labrum, ilium, acetabulum, and triradiate cartilage. Blue circles point to falsely connected structures. Along the bone and cartilage contours of the labrum, ilium, acetabulum, and triradiate cartilage, SPS has fewer false positives and false negatives.

Parameter Specification: The free parameters in our automatic algorithm are physically meaningful and can be set to reflect values characteristic of the anatomy. In the bone and cartilage extraction pipeline, the log-Gabor filter bank uses a value of p ranging from $p_{min} = 1.5$ mm to $p_{max} = 4$ mm to reflect the typical widths of bone interfaces in US images. The standard deviation parameter was set as $\sigma_\omega = p_{min}/3$ to reduce speckle noise (in our experience, speckle in this type of image is typically under 0.5 mm). These bone/cartilage width values are also reflected in setting the scale parameter, s, of the Gaussian filter bank: $s = 1.5\,\text{mm}/(4 \times \sqrt{2\ln 2})$ and $s = 4\,\text{mm}/(4 \times \sqrt{2\ln 2})$.
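As a quick worked check of these two scale values, the short sketch below evaluates the stated expression for both ridge widths. The helper name `gaussian_scale` is hypothetical; only the formula itself comes from the text.

```python
import math


def gaussian_scale(width_mm):
    """Scale s for a given ridge width: s = width / (4 * sqrt(2 * ln 2))."""
    return width_mm / (4.0 * math.sqrt(2.0 * math.log(2.0)))


# Ridge widths of 1.5 mm and 4 mm, as used for the log-Gabor wavelengths.
scales = [gaussian_scale(w) for w in (1.5, 4.0)]
```

This gives scales of roughly 0.32 mm and 0.85 mm. Under the standard Gaussian relation FWHM = $2\sqrt{2\ln 2}\,s$, the stated expression corresponds to a full width at half maximum equal to half the ridge width.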
This relation between ridge width and the scale parameter is derived using the relation between the full width at half maximum and the standard deviation of a Gaussian distribution.

Data: We acquired fifteen 3D B-mode US volumes from 15 infant hips, all obtained as part of routine clinical care under appropriate institutional review board approval (UBC CREB number: H14–01448). All 3D scans were collected in the coronal plane. The scans were performed using a SonixTouch Q+ machine with a 4DL14–5/38 Linear 4D transducer (BK Ultrasound, Analogic Corporation, MA, USA). The transducer's center ultrasound frequency was kept at 7.5 MHz and the image depth setting ranged between 3.8 cm and 5 cm. While performing a 3D US scan, the operator tried to capture the entire femoral head within the scanned US volume. In each scanned volume, an orthopaedic surgeon further selected a slice near the middle of the femoral head and labeled the ilium, labrum and acetabulum boundaries.

Validation Scheme: We compared our 3D confidence-weighted SPS (CSPS)-based bone segmentation against optimized PS (OPS)-based bone segmentation [59]. First, we extracted ilium bone surfaces separately from CSPS (determined as the maximum of feature responses along the scan-lines of the US transducer) and OPS (determined as the bottom-most, i.e., furthest from the US transducer, non-zero feature responses along the scan-lines of the US transducer) along each scan-line where the surgeon labeled an ilium boundary. Next, we separated the segment of the US image that is inferior-lateral (right-top in the image) from the edge of the ilium (right-most ilium point in the image); as with the ilium bone boundaries, we extracted labrum boundaries from this segment using both CSPS and OPS.
We also separated the segment of the US image that is superior-lateral (bottom-top in the image) from the edge of the ilium (right-most ilium point in the image); as with the ilium bone boundaries, we extracted acetabulum boundaries from this segment using both CSPS and OPS.

Finally, we calculated the absolute discrepancy of these ilium, labrum and acetabulum boundaries with the manually labeled ilium, labrum and acetabulum boundaries, only at the scan-lines that were manually labeled to contain those structures.

Results: Exemplary results in Figure 3.5 show that CSPS-based bone/cartilage boundaries are closer to manually-labeled bone/cartilage boundaries than OPS-based bone/cartilage boundaries.

When evaluated on all 15 US volumes, CSPS-based bone/cartilage boundaries had a markedly reduced absolute discrepancy with manually-labeled bone/cartilage boundaries compared to OPS (ilium: average absolute discrepancy with CSPS = 0.96 mm, average absolute discrepancy with OPS = 2.91 mm, p < 0.01 using the Kolmogorov–Smirnov test, Figure 3.6; labrum: average absolute discrepancy with CSPS = 1.06 mm, average absolute discrepancy with OPS = 2.08 mm, p < 0.01 using the Kolmogorov–Smirnov test, Figure 3.7; acetabulum: average absolute discrepancy with CSPS = 0.82 mm, average absolute discrepancy with OPS = 2.16 mm, p < 0.01 using the Kolmogorov–Smirnov test, Figure 3.8). The root mean square discrepancy for CSPS was moderate: 1.23 mm (SD = 1.1 mm) for ilium, 1.4 mm (SD = 1.2 mm) for labrum, and 1.02 mm (SD = 0.7 mm) for acetabulum. These root mean square discrepancy values for our CSPS method were lower than that of Hareendranathan's CNN-based segmentation (1.8 mm [66]).

In terms of computational cost, our CSPS required around 30 seconds for a 240 × 240 × 120 US volume (3D) and around 0.2 second for a 240 × 240 US image (2D) (run on a Xeon(R) 3.40 GHz CPU computer with 8 GB RAM with MATLAB code).
The computation time for Hareendranathan's method was not reported [66].

Figure 3.5: Example qualitative results. (a), (h) B-mode US volumes. (b), (i) US volume segments along the posterior-anterior axis that contain the ilium, acetabulum and labrum. (c), (j) OPS responses of (b) and (i), respectively. (d), (k) CSPS responses of (b) and (i), respectively. (e), (l) US slices near the middle of the femoral head chosen by the orthopaedic surgeon for the US volumes (b) and (i), respectively. Yellow, green and red contours are manually labeled ilium, acetabulum and labrum, respectively. (f) and (m) show OPS-based contours of ilium, acetabulum and labrum. (g) and (n) show CSPS-based contours of ilium, acetabulum and labrum. White lines point to false positive bone boundaries.

Figure 3.6: Cumulative distribution function of absolute discrepancy between manually-labeled ilium and OPS- and CSPS-based ilium boundaries.

Figure 3.7: Cumulative distribution function of absolute discrepancy between manually-labeled labrum and OPS- and CSPS-based labrum boundaries.

Figure 3.8: Cumulative distribution function of absolute discrepancy between manually-labeled acetabulum and OPS- and CSPS-based acetabulum boundaries.

3.3.3 Discussion

We proposed a rotation-invariant local symmetry feature, SPS, for segmenting non-planar bone boundary structures (e.g., small round bone boundaries) in both 2D and 3D US images of infant hips.

Regarding extracting bone and cartilage boundaries in infant hips, we evaluated the performance of CSPS on fifteen 3D US volumes.
We found a markedly reduced discrepancy between CSPS-based and manually-labeled bone and cartilage boundaries, when compared with the OPS-based method.

We use our proposed SPS and CSPS in extracting dysplasia metrics from 2D (Chapter 4) and 3D US images (Chapter 5).

3.4 Summary

In this chapter, we investigated segmenting bone boundaries from US images, which is particularly important in DDH diagnosis since all qualitative and quantitative characterizations of DDH require identifying bone structures in infant hips.

We proposed combining local symmetry features with attenuation-related features; our experiments showed that combining these features produces bone segmentations that are robust to soft-tissue outliers. We also presented a rotation-invariant local symmetry measure, SPS, that makes use of the structural information in US images (i.e., vessel-like structures of bone boundaries in 2D US images and sheet-like structures of bone boundaries in 3D US volumes). On a dataset of 15 US volumes collected from 15 infant hips, we showed that bone and cartilage boundaries labeled by an expert agree well (mean absolute discrepancy around 1 mm) with bone and cartilage boundaries extracted using our CSPS method.

We show in later chapters (Chapter 4 and Chapter 5) that the SPS and CSPS features can be used for reliable diagnosis of DDH.

Chapter 4
Characterizing Hip Joint using 2D Ultrasound

4.1 Introduction

Although current clinical practice for DDH at early infancy is to use 2D US imaging [104, 147], there is continuing controversy regarding the effectiveness of US in providing accurate early diagnosis and in guiding treatment decisions [79, 106, 147].
This controversy regarding the effectiveness of US in DDH diagnosis is largely due to the lack of reliability of dysplasia metrics: our systematic review and meta-analysis in Chapter 2 showed that the most reliable dysplasia metric, the α angle (the angle between the acetabular roof and the vertical cortex of the ilium), suffers from poor repeatability between repeated measurements (i.e., ≈ 40% variability, where variability is measured as the ratio of the standard deviation between repeated measurements to the range of β angle between a normal and dysplastic hip). Our goal in this chapter is to investigate methods of improving repeatability, or reducing variability, in repeated 2D US-based DDH diagnosis. Here, we hypothesize that a US-based automatic dysplasia metric estimator could significantly reduce measurement variability as compared to a manual dysplasia metric estimator.

In a typical 2D US-based DDH diagnosis, images used to compute dysplasia metrics are first judged by a US technologist, radiologist or orthopaedic surgeon to be adequate, i.e., to contain key hip joint structures (see example in Figure 4.1). These images are then manually analyzed to extract dysplasia metrics. Thus, given a 2D US image, a US-based automatic dysplasia metric estimator would need to identify the adequacy of that US image and, if adequate, would need to localize bone and cartilage boundaries and measure dysplasia metrics in that image.

While we are not aware of any study that reports a method of identifying adequacy of US scans for diagnosing DDH, there has been similar work in other medical ultrasound applications, including the fetal face [44], fetal abdomen [130], fetal head [95], gestational sac [166] and phantom [91]. Previous methods utilized supervised classification approaches (e.g.
random forests [102], AdaBoost [130, 166], support vector machines [95], probabilistic boosting trees [44], deep learning [164]) and are driven by structural features, specifically Haar-like features [44, 102, 130, 166] and dynamic texture models [91, 95], or by automatically learned features in the case of deep learning. The main limitation of these methods is that they are generally learned and validated on mostly normal patient data. For our application, we need to identify adequate images in both healthy and hip dysplasia patients; to the best of our knowledge, no methods have yet been proposed to automatically identify adequate US images of the neonatal hip.

After identifying the adequacy of a US scan, the next step is to localize the bone and cartilage boundaries that are necessary for extracting dysplasia metrics (e.g., the vertical cortex of the ilium). A few researchers have proposed semi-automatic methods to segment bone/cartilage boundaries in US images of the neonatal hip, which they have used to extract dysplasia metrics. For example, Hareendranathan et al. [64] have recently proposed measuring a contour α angle that is derived from segmented ilium and acetabulum boundaries in a 2D US image. Their segmentation uses a graph-search-based technique that requires input of two seed points by an expert clinician. They found that this contour α has slightly lower intra-exam variabilities compared to the standard α angle (∆σ = 0.2° for the first rater and 0.4° for the second rater). In 2016, Golan [48] proposed using a CNN to automatically segment the ilium and acetabulum boundaries in US images of the neonatal hip, and used the segmented bone boundary to estimate α angle values. The authors demonstrated that the Graf angles extracted with their method correlated well with an expert's annotation (correlation coefficient, r = 0.76, CI not reported), but the authors did not report variability of these dysplasia metrics.
A potential limitation of Golan's method is that they used a human expert's annotation for training their CNN. This can be problematic since the variability of labeling/measuring is markedly high even when the labeling/measuring is done by experts [79, 94, 106]. So, Graf angles extracted by Golan's method may correlate well with an expert, but that does not necessarily mean that the variability in the automatically extracted angles will be significantly smaller than the variability in the manually estimated angles. In an earlier work, de Luis-Garcia [31] attempted to segment the femoral head using an energy function that uses both gray level and texture information from a 3D US image and requires a user to initialize the center of the femoral head. The authors presented only one qualitative example and did not report any quantitative analysis.

Figure 4.2 shows a flow-chart outlining the steps involved in our computer-assisted 2D US-based DDH diagnosis. In our work, to localize bone and cartilage boundaries in a 2D US scan, we used the 2D local symmetry enhancing method and bone boundary segmentation method described in Chapter 3, and prior geometric information regarding the relative geometry of bone and cartilage boundaries (e.g., the ilium is superior to the femoral head) (section 4.2). Next, we deployed a random forests-based approach to identify adequacy of a US scan (section 4.3).
Finally, after localizing the bone and cartilage boundaries and identifying adequacy of the 2D US scan, we presented methods to automatically extract the standard clinically-used dysplasia metrics: the α angle, β angle and FHC (Figure 4.1) (section 4.4). The β angle is the angle between the vertical cortex of the ilium and triangular labral fibrocartilage, and FHC is the ratio of the acetabular width to the maximal femoral head diameter.

Figure 4.1: Example coronal US images of the neonatal hip. (a) Adequate US image showing key hip joint structures. (b) Inadequate US image (missing ilium and labrum). (c), (d) The α, β angles and FHC measurements extracted manually from the adequate US image in (a). [Figure labels: Labrum, Ilium, Acetabular Roof, Ischium, Femoral Head; α, β; FHC = q/D.]

Figure 4.2: A flow-chart outlining the steps involved in our computer-assisted 2D US-based DDH diagnosis.

4.2 Localizing Bone and Cartilage Boundaries

To extract bone/cartilage boundaries in a 2D US image (U) of an infant hip, we used a rotation-invariant local symmetry feature, structured phase symmetry (SPS) (subsection 3.3.1) (Figure 4.3(b)), which extracts vessel-like hyperechoic responses in U. After segmenting vessel-like hyperechoic responses in U, which include bone boundaries, cartilage boundaries and soft-tissue interfaces, we use the relative signal strength map-based (Figure 4.3(c)) attenuation features to enhance bone boundaries and remove soft-tissue outliers (subsection 3.2.2, Figure 4.3(d)). While our extracted bone and cartilage boundaries do contain false positives and false negatives, our dysplasia metric extraction is based on straight line approximations (section 4.4) that are robust to such outliers and missing boundary points.

4.2.1 Ilium, Labrum and Acetabulum

Figure 4.4 shows a block diagram outlining the extraction of the ilium, labrum and acetabulum. In adequate scans, the bony ilium (I) is oriented approximately parallel to the US transducer.
To localize I, we first use the Radon transform, $R(\theta, x_p)$, of CPS, where θ is the counter-clockwise angle from the downward vertical line and $x_p$ is the distance from the center of the image, bounded between θ = 85° and θ = 95°, and find the straight line corresponding to the maximum response of R; this straight line provides a straight line approximation of I. We remove all bone responses that are 1 mm away from I. Next, we use a binary image formed by CPS measures with eigenvectors of $\lambda_1$ (minor eigenvalues of the Hessian matrix (subsection 3.3.1)) within ±15° of any lines parallel to the US transducer; the resulting binary image includes bone/cartilage boundaries that are reasonably parallel to the US transducer. Next, we thin the responses in the binary image to widths of 1 pixel, and calculate the areas of all the connected structures. Since the widths of the structures are 1 pixel, the area responses are approximately equal to the lengths of the structures. We identify the ilium as the contour that has the highest length, and label the inferior edge of the ilium as i (Figure 4.3(d), Figure 4.4).

Since the labrum (L), acetabular roof (A) and I branch off from around i, we isolate the responses of SPS belonging to A, L and I by masking SPS with a circle having center i and radius $r_f$, with $r_f$ being the radius of a typical femoral head (initialized as 9 mm based on empirical observations [51]). We estimate best-fit straight line characteristics of I, L and A using the peak responses of the Radon transforms on the regions of the masked SPS as shown in Figure 4.4. We refine the estimated ilium edge point, i, based on the intersection of the approximated lines of I and A.
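The refinement of i as an intersection of two fitted lines can be sketched as below. The example assumes each line is expressed in the normal form x cos θ + y sin θ = ρ, which is one common way of reading off a line from a Radon or Hough peak; the angle convention used in this thesis (θ measured counter-clockwise from the downward vertical, with $x_p$ measured from the image center) would first need to be converted to this form. The function name `line_intersection` and the parallel-line tolerance are illustrative.

```python
import math


def line_intersection(theta1_deg, rho1, theta2_deg, rho2):
    """Intersect two lines given as x*cos(theta) + y*sin(theta) = rho.

    Returns (x, y), or None when the lines are (nearly) parallel.
    """
    t1, t2 = math.radians(theta1_deg), math.radians(theta2_deg)
    a1, b1 = math.cos(t1), math.sin(t1)
    a2, b2 = math.cos(t2), math.sin(t2)
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        return None
    # Cramer's rule for the 2x2 linear system.
    x = (rho1 * b2 - rho2 * b1) / det
    y = (a1 * rho2 - a2 * rho1) / det
    return x, y
```

Returning None for near-parallel lines lets a caller fall back to the initial estimate of i when the fitted ilium and acetabular roof lines do not meet at a well-defined point.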
Afterwards, we extract the center and radius of the femoral head using the method outlined in subsection 4.2.2, and use the updated estimate of the femoral head radius, r, to recompute I, L and A.

Figure 4.3: Overview of bone and cartilage extraction: (a) Example B-mode US image showing the acetabular roof, ilium, labrum, ischium and femoral head. (b) SPS capturing ridges of the B-mode US image. (c) Relative signal strength map or confidence map of the B-mode US image. (d) Segmented bone boundary from SPS, with white arrow pointing to inferior edge of ilium, i.

4.2.2 Femoral Head

To locate the femoral head, F (radius $r_f$, center c), in U, we extract a total of 28 features that capture orientation, texture and geometrical features of the femoral head, and input them to a classifier that identifies whether a pixel location in the US image belongs to the femoral head or not. We use random forests [15] as our femoral head classifier model to learn the probability $p(y_i \mid \phi(x_i))$ of a pixel location in the US image being part of the femoral head. The random forests classifier uses an ensemble of decision trees (weak learners), each grown using a bootstrap sample of the training data and randomly selected subsets of predictor variables as candidates for splitting tree nodes [15]. We chose random forests due to their superior performance and robustness to parameter tuning compared to other machine learning approaches such as artificial neural networks, k-nearest neighbors, Naive Bayes and support vector machines [15, 20, 26, 45]. The input feature vector $\phi(x_i)$ of 28 features (Figure 4.5) for our femoral head random forests classifier consists of seven geometric features, twelve texture features and nine orientation features. The lowest out-of-bag feature importance among all these features was 0.6, suggesting that all of these features are important in segmenting the femoral head.
These 28 features (shown in Figure 4.5) are listed as follows:

• Geometric features: we extract the geometric features primarily from the identified locations of I, L, A and the edge of the ilium, i: (1) distance from I, (2) distance from A, (3) distance from L, (4) distance from the line formed by an angular bisection of A and L, (5) vertical distance from i, (6) horizontal distance from i, and (7) depth (Figure 4.5).

• Texture features: (1) median-filtered (filter size 1 mm by 1 mm) intensity of U, (2) range-filtered (filter size 1 mm by 1 mm) intensity of U, (3) range-filtered (filter size 2 mm by 2 mm) intensity of U, (4) range-filtered (filter size 3 mm by 3 mm) intensity of U, (5) entropy-filtered (filter size 2 mm by 2 mm) intensity of U, (6) median-filtered (filter size 1 mm by 1 mm) confidence map value of U, (7) range-filtered (filter size 1 mm by 1 mm) confidence map value of U, (8) range-filtered (filter size 2 mm by 2 mm) confidence map value of U, (9) range-filtered (filter size 3 mm by 3 mm) confidence map value of U, (10) entropy-filtered (filter size 2 mm by 2 mm) confidence map value of U, (11) circular mask (radius 2 mm)-filtered intensity of U, and (12) a blobness measure, which we define as $blob = 1 - \exp(-|\lambda_1|^2/|\lambda_2|^2)$ (Figure 4.5). Here, $|\lambda_1|$ is the minor eigenvalue and $|\lambda_2|$ is the major eigenvalue in the Hessian-based analysis in subsection 3.3.1. The rationale for the blobness measure is that a blob is characterized by high minor and major eigenvalues with $|\lambda_1|$ close to $|\lambda_2|$, leading to a low value for $\exp(-|\lambda_1|^2/|\lambda_2|^2)$ and a high value for blob.

• Orientation features: we extract nine orientation features from the histogram of oriented gradients (HOG) (Figure 4.5) [29].

Figure 4.4: Block diagram showing extraction of I, L and A.

After extracting the probability map p of the femoral head (Figure 4.6 b), we apply a threshold of p > 0.5 and then a Sobel filter to find edges of the femoral head (Figure 4.6 c) [19].
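The thresholding and Sobel filtering step can be illustrated with a small, dependency-free sketch. It assumes the probability map p is a 2D list, uses the standard 3 × 3 Sobel kernels, returns the gradient magnitude as the edge response, and leaves border pixels at zero. The function name `femoral_head_edges` and the border handling are assumptions for this example, since the text does not state how the Sobel output was post-processed.

```python
def femoral_head_edges(p, threshold=0.5):
    """Threshold a probability map and return (mask, Sobel gradient magnitude)."""
    rows, cols = len(p), len(p[0])
    mask = [[1.0 if p[r][c] > threshold else 0.0 for c in range(cols)]
            for r in range(rows)]
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal derivative kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical derivative kernel
    edges = [[0.0] * cols for _ in range(rows)]
    # Border pixels stay at zero (assumed; padding is not specified).
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    v = mask[r + dr][c + dc]
                    gx += kx[dr + 1][dc + 1] * v
                    gy += ky[dr + 1][dc + 1] * v
            edges[r][c] = (gx * gx + gy * gy) ** 0.5
    return mask, edges
```

The edge response is zero inside and outside the thresholded region and non-zero only along its outline, which is the input a circular Hough transform needs.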
Next, we apply the circular Hough transform [8], and identify the peak to estimate the center coordinates and radius of the circular femoral head (Figure 4.6 d). We use these center coordinates and radius in evaluating adequacy and in localizing the ilium and acetabulum.

Figure 4.5: Features used in localizing femoral head. Note that only the first HOG feature is shown here.

Figure 4.6: (a) B-mode US image. (b) Output of femoral head random forests classifier, p. (c) Sobel filtering on p generates edges of the femoral head. (d) Circular Hough transform applied on (c). Peak of circular Hough transform (yellow arrow) provides an estimate of the center coordinates and radius of the femoral head.

4.3 Scan Adequacy Classification

An adequate US image in DDH diagnosis includes the following five key bone/cartilage structures: the acetabular roof, ilium, labrum, ischium and femoral head. Given the extracted bone/cartilage boundaries obtained using the methods outlined in the previous section, we extract 26 geometrical features pertaining to these key structures and input them to our adequacy classifier. We also use HOG [29] and local binary patterns (LBP) [105] features, since using these two features together can significantly improve object detection performance compared to using only one of them [162]. A complete list of the features used in our adequacy classifier is shown in Table 4.1.

4.3.1 Experiment

Parameter Specification: For our random forest classifier, we set the number of trees to 70, selected based on out-of-bag classification error analysis (the out-of-bag classification error does not seem to decrease considerably after number of trees = 70, Figure 4.7).
Similar to the original implementation of random forests, the trees were not pruned and the number of features selected at random for each decision split was set to the square root of the number of features [10, 15].

Table 4.1: Features used in adequacy classifier.

Type of features      Feature details
Geometric features    (1) length of ilium
                      (2) horizontal length of ilium
                      (3) vertical length of ilium
                      (4) aspect ratio of ilium (ratio of vertical length and horizontal length)
                      (5) x and y coordinates of edge of ilium (both initial and final estimation, 4 features)
                      (6) gradients of the approximated lines of I, L and A (both initial and final estimation, 6 features)
                      (7) peak Radon transform responses corresponding to lines of I, L and A (both initial and final estimation, 6 features)
                      (8) center coordinates c and radius r of the femoral head (3 features)
                      (9) maximum US image intensity values inside each of the three circles having center c and radius r, r/2 and r/3
Texture features      (10) LBP (59 features)
Orientation features  (11) HOG features (144 features)

Figure 4.7: Out-of-bag classification error vs. number of grown trees in random forest classifiers for adequacy classification. The out-of-bag classification error does not seem to decay considerably beyond 70 trees.

Data: We obtained access to 885 US images from both left and right hips (obtained as part of routine clinical care with appropriate institutional review board approval and with informed consent from each participant's parents) of 69 infants in multiple US sessions (a total of 82 US sessions with both hips and 1 US session with 1 hip, 82 × 2 + 1 × 1 = 165 hip examinations) at British Columbia Children's Hospital (UBC CREB number: H14–01448). The scans were performed by multiple clinicians from the health center as part of routine clinical care using a LOGIQ E9 with XDclear premium ultrasound system and a GE ML6-15-D Matrix Linear Probe (LOGIQ E9 with XDclear, General Electric Healthcare, Waukesha, WI, USA).
The transducer's center ultrasound frequency was kept within 7 MHz – 11 MHz. Each patient underwent a minimum of 1 and a maximum of 4 sessions of US scanning based upon the recommendation of an orthopaedic surgeon or radiologist. The available US images in the health record consisted of images of the femoral head that were taken from a range of orientations. A subset of these US images (444/885) was judged by the radiologist to be adequate for use in extracting dysplasia metrics (i.e., α and β angles). An orthopedic surgeon reviewed the remaining set of US images to determine if any additional images might be considered adequate for extracting dysplasia metrics. A subset of 20 possibly adequate images was identified and submitted to a radiologist for definitive classification. Of these, an additional 3 images were accepted as adequate and added to the original 444, producing a total of 447 images that were deemed adequate by a radiologist; the radiologist also extracted dysplasia metrics from these three newly added images.

Validation Scheme: The adequate/inadequate labels for the 885 US images (collected from 69 infants) were used as the gold standard for analyzing the performance of the automatic scan adequacy method. We randomly selected 60 patients' data for training and cross-validation, and kept the remaining 9 patients' data for testing. The test data contained 47 adequate images and 53 inadequate images. The cross-validation dataset consisting of 60 patients' data was randomly split into five sets of 12 patients' data. Four of these sets were used in training and the remaining one was kept for validation. This was repeated five times to evaluate classifier performance on the entire cross-validation dataset.
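
The patient-level split described above can be sketched as follows. This is a minimal illustration rather than the code used in the thesis: the function names, the fixed random seed and the integer patient identifiers are assumptions made for the example. The point it shows is that images are assigned to folds by patient, so that no infant contributes images to both the training and the validation side of a round.

```python
import random


def patient_level_folds(patient_ids, n_test=9, n_folds=5, seed=0):
    """Hold out n_test patients, then split the rest into n_folds equal groups.

    Assumes the number of remaining patients is divisible by n_folds
    (60 / 5 = 12 here); any remainder would be dropped by this sketch.
    """
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    rng.shuffle(ids)
    test_ids, cv_ids = ids[:n_test], ids[n_test:]
    fold_size = len(cv_ids) // n_folds
    folds = [cv_ids[k * fold_size:(k + 1) * fold_size] for k in range(n_folds)]
    return test_ids, folds


def cv_rounds(folds):
    """Yield (train_ids, val_ids): four folds for training, one for validation."""
    for k, val_ids in enumerate(folds):
        train_ids = [pid for j, fold in enumerate(folds) if j != k for pid in fold]
        yield train_ids, val_ids


# 69 hypothetical patient identifiers: 9 held out for testing, 60 for cross-validation.
test_ids, folds = patient_level_folds(range(69))
rounds = list(cv_rounds(folds))
```

Each image would then inherit the fold of the patient it came from before any classifier is trained.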
We also evaluated the performance of our 229 features against only HOG features by using each of them separately as input feature vectors to adequacy random forests classifiers. We chose HOG features on the basis of their recent success in binary classification of ultrasound images [85], and their extensive use in image classification applications [29]. We calculated the difference between receiver operating characteristic (ROC) curves of different classifiers [60] to evaluate the null hypothesis that there was no change in classification performance between the different classifiers.

Results: Figure 4.8 shows sample classification results of our proposed random forests classifier (probability_RF > 0.5: adequate) from amongst the set of 885 US images, along with the corresponding labels provided by the clinicians. The majority of the automatic assessments agreed with the manual assessments (average area under the ROC curve of around 99.5% on both the cross-validation dataset and the test dataset; see Figure 4.9). This is significantly better than our previously reported value of 95% for a classifier based on a single feature (horizontal length of the ilium), p < 0.01 [124]. Furthermore, our 229 features-based random forests classifier is significantly better than the HOG and LBP features-based random forests classifier (area under the ROC curve: 93%, p < 0.01).

On the 20 borderline adequate images that were later submitted to the radiologist for adequacy assessment, the agreement between the adequacy classifier's output and the expert's assessments was low (35% agreement).

4.3.2 Discussion

The excellent agreement between the output of the 229 features-based adequacy classifier and the radiologists' assessments suggests that the adequacy classifier's assessments can be similar to those of an expert clinician.
However, the dataset we used mostly involved clearly inadequate (424 US images) and clearly adequate US images (441 US images), and only a small portion of borderline US images (20 US images), so the excellent agreement between the adequacy classifier and an expert assessment is primarily indicative of differences in the discriminating abilities of the different classifiers we have evaluated on clearly adequate and clearly inadequate images.

Figure 4.8: Example scan adequacy classification results. (a) Typical 2D B-mode images that were judged adequate by a radiologist (i.e., identified manually). Arrows in the images point to the key structures that were identified manually. In an adequate image, all five key structures need to be present. The first four images were classified as adequate by our method, while the last image was classified as inadequate (probability metric = 0.46). (b) Images that were judged to be inadequate by the radiologist. The first four images were classified as inadequate by our method, while the last image was classified as adequate (probability metric = 0.51). Based on the probability metrics produced by our classifier, both of these disagreements in classification could be considered to be borderline cases.

In the borderline US images, the agreement between the adequacy classifier's output and the expert's assessments was low (35% agreement). Thus, to better understand the adequacy classifier's performance in a live clinical setting, it would be necessary to collect a broader range of US images along with reference labels provided by experts. Correct labels are particularly important since they directly affect the learning of the classifier and subsequently the classifier's performance in identifying adequate scans.
This is currently a challenging task since manual adequacy classification is prone to mistakes, even when it is done by experts. For example, one study reported that around 44% of medical doctors in one German state made mistakes in localizing the bone/cartilage boundaries and in identifying adequate US scans [53]. One possible approach for generating more reliable adequacy labels is to combine labels obtained from multiple experts and then use the label that most of the experts agreed with.

Figure 4.9: ROC curves of our 229 features-based random forests classifier. (a) The five different plots correspond to the five cross-validation experiments we conducted on the validation dataset. (b) The five different plots correspond to the five classifiers, each from a different cross-validation dataset, evaluated on the test dataset.

4.4 Extraction of 2D Dysplasia Metrics

In US-based dysplasia assessment, three measures are most commonly used by clinicians: the α angle, β angle and FHC (Figure 4.1). The α angle is the angle between the acetabular roof and the vertical cortex of the ilium, whereas the β angle is the angle between the labrum and the vertical cortex of the ilium. We calculate $\alpha = R_I - R_A$ and $\beta = R_L - R_I$, where $R_I$, $R_A$ and $R_L$ are the characteristic angles of the ilium, acetabular roof and labrum. The FHC is defined as the ratio of the acetabular width to the maximal femoral head diameter (Figure 4.1). We calculate FHC as $\mathrm{FHC} = \sum_{X \in M} P(X) / \sum_{X} P(X)$, where P represents the output of the femoral head random forests classifier, X = [x, y] and M represents all X locations that are medial to I.

4.4.1 Experiment

Data: We used the same dataset that we described in validating scan adequacy.

Results for extracting dysplasia metrics in all hips: Results for extracting 2D dysplasia metrics in all hips are shown in Appendix A.

Validation Scheme: We compared the manual measurements, αM, βM and FHCM, with the automatically extracted measurements, αA, βA and FHCA, in the same images that were judged adequate by the radiologists.
Specifically, we analyzed the following:

• The discrepancies between the two sets of measurements (i.e., αM vs. αA, βM vs. βA, and FHCM vs. FHCA).

• The within-hip variability in the measurements (i.e., std(αM), std(αA), std(βM), std(βA), std(FHCM) and std(FHCA)) for each hip of the 165 hip examinations.

• The Graf-DDH classifications, i.e., type I (α > 60°, mature hip), type II (43° < α < 60°, physiologically immature hip) and type III (α < 43°, eccentric hip) [79]. We used maximum values of α for each of the classifications based on the assumption that a mature (normal) hip will have at least some orientations that show a high value of α, but eccentric hips usually do not have orientations that will show a high value of α [4].

Discrepancy between Manual and Automatic Metrics: Figure 4.10 shows example images with low (Figure 4.10 a, c and e) and high discrepancies (Figure 4.10 b, d and f) between manual and automatic measurements (i.e., αA vs. αM, βA vs. βM, and FHCA vs. FHCM). Overall, the discrepancies (i.e., manual minus automatic measurements) for α angles and FHC were high and statistically significant (∆α: mean 4.64°, SD 6.1°, p < 0.01 (Figure 4.11); ∆FHC: mean −5.94%, SD 7.0%, p < 0.01 (Figure 4.13)), and the discrepancy for β was larger and also statistically significant (∆β: mean 8.14°, SD 7.2°, p < 0.01 (Figure 4.12)).
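
The metric definitions of Section 4.4 and the analyses listed above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the thesis implementation: all function names are hypothetical, the probability map and medial mask are plain nested lists, the handling of the boundary values α = 60° and α = 43° (which the strict inequalities leave undefined) is an assumption, and the sign convention of the discrepancy (manual minus automatic) is inferred from the biases discussed in subsection 4.4.2.

```python
from statistics import mean, stdev


def alpha_beta(r_i, r_a, r_l):
    """alpha = R_I - R_A and beta = R_L - R_I, in degrees (Section 4.4)."""
    return r_i - r_a, r_l - r_i


def fhc(prob_map, medial_mask):
    """FHC = sum of P over locations medial to I / sum of P over all locations."""
    total = sum(p for row in prob_map for p in row)
    if total == 0:
        raise ValueError("probability map sums to zero")
    medial = sum(
        p
        for p_row, m_row in zip(prob_map, medial_mask)
        for p, m in zip(p_row, m_row)
        if m
    )
    return medial / total


def graf_type(alphas_deg):
    """Graf type of one hip examination, based on its maximum alpha.

    Assumption: alpha == 60 is treated as type I and alpha == 43 as type II.
    """
    a = max(alphas_deg)
    if a >= 60:
        return "I"
    if a >= 43:
        return "II"
    return "III"


def discrepancy_stats(manual, automatic):
    """Mean and SD of the paired differences (manual minus automatic)."""
    diffs = [m - a for m, a in zip(manual, automatic)]
    return mean(diffs), stdev(diffs)


# Toy numbers (made up for illustration, not data from the study).
exam_alphas = [55.0, 61.0, 58.0]
print(graf_type(exam_alphas))       # Graf type from the maximum alpha
print(round(stdev(exam_alphas), 2))  # within-hip variability, std(alpha)
```

The within-hip variability is simply the standard deviation of a metric over the adequate images of one hip examination, which is why `stdev` is applied per examination rather than over the pooled dataset.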
The correlation (R) between manual and automatic measurements was high and statistically significant, suggesting the presence of a relationship between the manual and automatic measures (α: R = 0.8 [CI 0.76 and 0.83], p < 0.01 (Figure 4.11); β: R = 0.72 [CI 0.67 and 0.76], p < 0.01 (Figure 4.12); FHC: R = 0.77 [CI 0.72 and 0.80], p < 0.01 (Figure 4.13)).

Appendix A shows other examples of automatic extraction of 2D US-based dysplasia metrics in a healthy hip, a borderline hip and a dysplastic hip. We found that our approximations of the ilium, labrum, acetabulum and femoral head and our estimation of the 2D US-based dysplasia metrics are realistic.

Variability of Metrics: Since an ideal dysplasia metric will have minimal variability across multiple images in a hip examination, we calculate the standard deviations within each of the 165 hip examinations to investigate the overall reproducibility of each method. Box plots of all the standard deviation values are shown in Figure 4.14, Figure 4.15 and Figure 4.16. We found that the automatic method results in modestly, but statistically significantly, reduced variability for all the dysplasia metrics (α, β and FHC) compared to the variability of manually extracted dysplasia metrics (Wilcoxon signed rank test: p < 0.01 for α, β and FHC; mean(σ_αM) = 3.09° and mean(σ_αA) = 2.45°, mean(σ_βM) = 4.4° and mean(σ_βA) = 3.6°, mean(σ_FHCM) = 5.27% and mean(σ_FHCA) = 4.31%).

We further evaluated whether there is any association in the variability of dysplasia metrics with the severity of hip dysplasia. We did not find any statistically significant association in the variability of dysplasia metrics with the severity of hip dysplasia for αM (R = −0.08 [CI -0.23 and 0.07], p = 0.3, Figure 4.17) and αA (R = −0.03 [CI -0.18 and 0.12], p = 0.69, Figure 4.18).
We also did not find any statistically significant association in the variability of dysplasia metrics with the severity of hip dysplasia for βM (R = 0.12 [CI -0.04 and 0.14], p = 0.14, Figure 4.19) and βA (R = 0.14 [CI -0.01 and 0.29], p = 0.07, Figure 4.20). However, we did find a statistically significant association in the variability of dysplasia metrics with the severity of hip dysplasia for FHCM (R = −0.28 [CI -0.42 and -0.13], p = 0.0003, Figure 4.21) and FHCA (R = −0.25 [CI -0.38 and -0.1], p = 0.001, Figure 4.22), where the variability of FHC tends to increase in patients with hip dysplasia.

Figure 4.10: Example qualitative results of manual (labelled 'M' at the top right corners) and automatic measurements (labelled 'A' at the top right corners). (a), (c) and (e) show examples where manual and automatic measurements produced similar values for α, β and FHC, whereas (b), (d) and (f) show examples where the manual and automatic measurements differed noticeably. The white dotted lines represent the manual measurements, the blue dotted lines represent the automatic measurements, and the green regions in (e) and (f) represent the femoral head random forests' probability output.

Figure 4.11: Scatter plot of automatic versus manual measurements of the α angle. The red line is the equality line. Blue data points correspond to lower αA and yellow data points correspond to higher αA, so blue corresponds to hips that would be judged as hip dysplasia whereas yellow corresponds to hips that would be judged as normal.

Graf Classification Agreement: The Graf classifications for the manual and automatic methods are shown in Table 4.2. Overall, 139 of the 165 hip examinations were classified identically by the two methods (84% agreement; substantial agreement, Cohen's kappa coefficient = 0.61 (CI: 0.48 and 0.75) [97]). We looked into the first nine cases of disagreement. In these cases, the automatic method graded the hip one class more severely than the manual method. In three hip examinations from two patients, the automatic method found a type-II hip while the manual method found a type-I hip at their first US sessions (both hips for patient 1 and only the right hip for patient 2). Although these patients were deemed Graf type I based on the manual assessment, their α angles were borderline (61°, 62° and 60°) and they subsequently had a follow-up US session. In another two patients, the automatic method classified the patients' hips as type III while the manual approach classified them as type II in early US sessions (both hips for patient 1 in the first two US sessions, and the left hip for patient 2 in the first and third US sessions). Both of these patients were treated using a Pavlik harness based on the manual US findings, but were later both booked for surgery, indicating that the clinicians regarded these patients as having relatively severe dysplasia.

Figure 4.12: Scatter plot of automatic versus manual measurements of the β angle. The red line is the equality line. Blue data points correspond to lower βA and yellow data points correspond to higher βA, so blue corresponds to hips that would be judged as normal whereas yellow corresponds to hips that would be judged as hip dysplasia.

Figure 4.13: Scatter plot of automatic versus manual measurements of FHC. The red line is the equality line. Blue data points correspond to lower FHCA and yellow data points correspond to higher FHCA, so blue corresponds to hips that would be judged as hip dysplasia whereas yellow corresponds to hips that would be judged as normal.

Figure 4.14: Box plot of the within-hip standard deviations among the manual and automatic measurements of α angles within all of the 165 hip examinations (i.e., 165 values of std(αM) and 165 values of std(αA)). On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 1st and 3rd quartiles, respectively. The '+' points indicate outliers, or data points that are outside the range of the whiskers, where the whiskers correspond to 99.3% coverage if the data are normally distributed. The within-hip standard deviations are significantly lower in the automatic α angle measurements (p < 0.01).

Figure 4.15: Box plot of the within-hip standard deviations among the manual and automatic measurements of β angles within all of the 165 hip examinations (i.e., 165 values of std(βM) and 165 values of std(βA)). On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 1st and 3rd quartiles, respectively. The '+' points indicate outliers, or data points that are outside the range of the whiskers, where the whiskers correspond to 99.3% coverage if the data are normally distributed. The within-hip standard deviations are significantly lower in the automatic β angle measurements (p < 0.01).

Figure 4.16: Box plot of the within-hip standard deviations among the manual and automatic measurements of FHC within all of the 165 hip examinations (i.e., 165 values of std(FHCM) and 165 values of std(FHCA)). On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 1st and 3rd quartiles, respectively. The '+' points indicate outliers, or data points that are outside the range of the whiskers, where the whiskers correspond to 99.3% coverage if the data are normally distributed. The within-hip standard deviations are significantly lower in the automatic FHC measurements (p < 0.01).

Figure 4.17: Scatter plot of variability of αM vs. mean αM. Blue data points correspond to lower mean values of αM and yellow data points correspond to higher mean values of αM, so blue corresponds to hips that would be judged as hip dysplasia whereas yellow corresponds to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is no significant association in the variability of αM with the severity of hip dysplasia.

Figure 4.18: Scatter plot of variability of αA vs. mean αA. Blue data points correspond to lower mean values of αA and yellow data points correspond to higher mean values of αA, so blue corresponds to hips that would be judged as hip dysplasia whereas yellow corresponds to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is no significant association in the variability of αA with the severity of hip dysplasia.

Computational Considerations: The process of image classification and dysplasia metric extraction is near real-time (approximately 2 seconds) for a B-mode US image, when run on a Xeon(R) 3.40 GHz CPU computer with 8 GB RAM. All processes were executed using MATLAB 2015b (MATLAB 2017a, the Mathworks Inc., Natick, MA, USA).

Figure 4.19: Scatter plot of variability of βM vs. mean βM. Blue data points correspond to lower mean values of βM and yellow data points correspond to higher mean values of βM, so blue corresponds to hips that would be judged as normal whereas yellow corresponds to hips that would be judged as hip dysplasia. The red line is the best fit line. Correlation and p values suggest that there is no significant association in the variability of βM with the severity of hip dysplasia.

Table 4.2: Graf-DDH classification of the neonatal hips. Here, type I (α > 60°, mature hip), type II (43° < α < 60°, physiologically immature hip) and type III (α < 43°, eccentric hip). The agreement between automatic and manual classification is substantial (Cohen's kappa coefficient = 0.61 (CI: 0.48 and 0.75)).

                     Automatic
                     Type I    Type II   Type III
Manual   Type I      114       16        2
         Type II     0         18        8
         Type III    0         0         7

Figure 4.20: Scatter plot of variability of βA vs. mean βA. Blue data points correspond to lower mean values of βA and yellow data points correspond to higher mean values of βA, so blue corresponds to hips that would be judged as normal whereas yellow corresponds to hips that would be judged as hip dysplasia. The red line is the best fit line. Correlation and p values suggest that there is no significant association in the variability of βA with the severity of hip dysplasia.

4.4.2 Discussion

Discrepancy between Manual and Automatic Metrics: Overall, the discrepancies for α and β angles seem to be highest in images in which the relevant structures (acetabular roof and labrum, respectively) do not appear to be simple line segments, but instead have multiple sections and/or more complex shapes (Figure 4.10(b, d)). In such instances, the operator appears to focus more on the portions of the line segments comprising the labrum and acetabulum that are more to the right in the images, whereas the automatic process seems to be more influenced by the first strong deviation from the ilium, so both α and β angles tend to be higher in manual measurements than in automatic measurements (Figure 4.10(b, d)). The discrepancy between automatic and manual FHC measurements seems to be higher where our automatic method produces a non-circular probability map (Figure 4.10(f)). Since femoral heads can be aspherical in shape, with a tendency for heightened asphericity in hip dysplasia [140], presuming that the femoral head is circular is perhaps a limitation of the manual measurement process.

Figure 4.21: Scatter plot of variability of FHCM vs. mean FHCM. Blue data points correspond to lower mean values of FHCM and yellow data points correspond to higher mean values of FHCM, so blue corresponds to hips that would be judged as hip dysplasia whereas yellow corresponds to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is a significant association in the variability of FHCM with the severity of hip dysplasia.

Figure 4.22: Scatter plot of variability of FHCA vs. mean FHCA. Blue data points correspond to lower mean values of FHCA and yellow data points correspond to higher mean values of FHCA, so blue corresponds to hips that would be judged as hip dysplasia whereas yellow corresponds to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is a significant association in the variability of FHCA with the severity of hip dysplasia.

The 4.6° bias of the automatic αA angle towards smaller values than the manual angles means that the automatic method will tend to classify more patients as having DDH than would be identified as such by the current manual method. This tendency of heightened suspicion may perhaps be acceptable from a clinical perspective, since we know that a large percentage of patients who initially had a diagnosis of DDH that subsequently resolved nonetheless had radiological signs of DDH a few months later (e.g., around 33% of such patients in Sarkissian et al.'s study [144]), though it is difficult to tell in such cases whether there was residual DDH not detected by 2D US or whether the DDH redeveloped after the initial exam.

In contrast, the more noticeable bias of 8.1° for the β angle (βA < βM) means that the automatic method's measure would be interpreted as indicating greater stability in the labrum than if the same measure was obtained manually [52]. Likewise, the automatically extracted FHC values with a bias of −5.9% (FHCA > FHCM) would similarly be interpreted as more indicative of relative 'acetabulum-femoral head' stability than would the corresponding manually extracted values.
Overall, the automatic method tends to classify more patients as having a problematic shape of the acetabulum, fewer patients as having a problematic shape of the labrum and fewer patients as having a problematic (e.g., subluxed, dislocated) femoral head.

Variability of Metrics: The variabilities in the automatic measurements using our proposed method are statistically significantly lower than those of the manual measurements (≈ 21% reduction for α, ≈ 18% reduction for β and ≈ 18% reduction for FHC). However, the clinical significance of this reduction remains uncertain. This variability may still lead to considerable misdiagnosis rates since the difference in dysplasia metrics between mature and severely dysplastic hips can be around 17°. To further reduce the variability in dysplasia metrics, one would need to also address the more dominant transducer-orientation-dependent variability [79], which we have not taken into account in our current 2D US-based approach. Our proposed approach is therefore still susceptible to variability arising from images taken from different orientations. Thus, a potential direction to further reduce the variability in the measured dysplasia metrics would be to implement a 3D US-based system that would be more robust to variations in the orientations/placements of the US transducer during image acquisition.

Regarding the variability in healthy and dysplastic hips, we found that the variability of α measurements is independent of the degree of dysplasia, since there was no association in the variability of α measurements with the severity of hip dysplasia. For β measurements, variability seemed to increase when measured on dysplastic hips, though the increase was not statistically significant. However, variability in both automatic and manual FHC measurements increased (p < 0.01) in dysplastic hips.
This worsening reproducibility in dysplasia metrics was also observed in Bar-On's study [9], where the authors showed poor agreement between raters classifying hips based on Graf classification.

Graf Classification Agreement: We found a high agreement between manual and automatic Graf classifications (84% agreement). In nine cases of disagreement between the manual approach and our automatic method in the Graf-DDH type assignment, the process of care (e.g., patient booked for follow-up US session, patient booked for surgery, etc.) supports the classification decisions made by our proposed automatic method, suggesting it could potentially be of value in reducing the missed early diagnosis rate (i.e., false negative rate) without increasing over-treatment (i.e., false positive rate). However, the number of such disagreements here is too small to state definitively that our classifier is more often in agreement with the ultimate clinical decisions than the current manual approach. More importantly, recent research suggests that using maximum values of α for Graf classifications is not reliable. The primary assumption behind using maximum values of α is that a mature (normal) hip will have at least some orientations that show a high value of α, but eccentric hips do not have orientations that will show a high value of α [4]. Recent studies identified that eccentric hips can also result in high α values [79, 86].

4.5 Summary

In this chapter, we presented a novel, automatic, near real-time (around 2 seconds/US image) method to assess the adequacy of a 2D US image of the neonatal hip, and, when found adequate, to subsequently extract the patient's dysplasia metrics. The proposed method produced excellent agreement with radiologists in scan adequacy classification. Furthermore, the automatic method reduced the variability of manually measuring dysplasia metrics by around 20%.
Finally, one of our automatically extracted metrics, αA, more consistently provided reproducible measures in both healthy and dysplastic hips when compared with the other 2D dysplasia metrics.

A 2D US-based method such as ours, however, is not robust to variations in the orientation and placement of the US transducer during image acquisition. This variability may perhaps be reduced using a 3D US-based system, which can better capture the 3D morphology of an infant's hip while remaining robust to transducer orientation during image acquisition [64, 79]. We address this issue in the next chapter.

Chapter 5

3D Ultrasound-based Hip Dysplasia Assessment

5.1 Introduction

We have demonstrated in chapter 4 that automatically extracting 2D dysplasia metrics can potentially reduce between-user variability; however, this approach continues to be sensitive to the orientation and placement of the US transducer during image acquisition. This variability due to the 2D transducer location/orientation is particularly important since it can cause a normal hip to appear dysplastic and a dysplastic hip to appear normal [79]. This probe-orientation-dependent variability problem can potentially be resolved using a 3D US transducer. In contrast to a 2D US transducer, a 3D US transducer can capture the entire femoral head and its neighboring structures (i.e., the vertical cortex of the ilium, acetabulum, labrum). For 3D US volumes that capture the entire femoral head and its neighboring structures in a normal infant hip, we hypothesize that this transducer can consistently extract the acetabular and femoral head morphology regardless of its location/orientation during acquisition. Dysplasia metrics derived from 3D hip morphology will thus be markedly less variable than the conventional dysplasia metrics derived from 2D US images.

To the best of our knowledge, only one previous work [64] has proposed the use of an intrinsically 3D dysplasia metric: the ACA.
Similar to the α2D, the ACA represents the angular separation between the acetabular roof (A) and the lateral iliac (I), except that the ACA is based on the segmented 3D surfaces of A and I. Hareendranathan et al.'s method [64] involves a slice-by-slice analysis process that requires manually selecting at least three seed points in three 2D US slices in a 3D US volume and manually separating A from I. Using such an interactive method would require valuable clinician time, and the manual operations introduce an approximately 1° within-image measurement variability [64] and approximately 4° inter-scan variability [94].

In this chapter, we define 3D US-derived metrics analogous to the 2D α angle and FHC ratio. α3D is the angle between the normals to the fitted planar surfaces of A and I (Figure 5.1), and FHC3D is the ratio of the volume of the femoral head portion medial to the plane of I to the total volume of the femoral head (Figure 5.1). Manually segmenting the femoral head, I and A is difficult, so we provide a fully automatic method that approximates the femoral head, I and A, and then uses the approximated structures to estimate α3D and FHC3D. We validate our method on US data collected from 40 hips from 25 infants.

5.2 Defining 3D Hip Morphology-derived Dysplasia Metrics

In a 3D B-mode US image of an infant's hip, U(x,y,z), we define α3D as the angle between the planar approximations of A and I (section 5.3, Figure 5.1).

We define FHC3D as the ratio of the volume of the femoral head portion medial to I to the total volume of the femoral head (section 5.4, Figure 5.1).

5.3 Extracting α3D

Figure 5.2 presents a block diagram outlining the extraction of A, I and α3D. To compute α3D, we first identified U(x,y,zA,I) (subsection 5.3.1), where zA,I represents all the coronal plane slices containing A and I. Next, we identified the bone boundaries in U(x,y,zA,I) (subsection 5.3.2), and then we finally used geometrical priors to find the planar approximations of I and A (subsection 5.3.3).
Once we had approximated the planes for I and A, we extracted α3D (subsection 5.3.4).

Figure 5.1: (a) A rendering of the anatomy of a hip joint showing A (red), I (blue) and the femoral head. (b) A schematic illustration of α3D, the angle between the normals to the fitted planar surfaces approximating A and I. (c) A schematic illustration of FHC3D, which is defined as the ratio of the volume of the femoral head portion that is medial to I to the total volume of the femoral head.

Figure 5.2: Block diagram showing the extraction of I, A, femoral head, α3D and FHC3D.

5.3.1 Localizing I and A Across the Coronal Plane

Figure 5.3 shows a block diagram outlining the steps involved in identifying coronal plane slices that contain I and A. To identify these slices, we began by training a random forest classifier, R1, to distinguish the US slices containing I and A from the other US slices that do not (all the slices being labeled manually by a graduate student based on the presence of I and A in those slices, Figure 5.4). This classifier R1 (70 decision trees, minimum number of observations per tree set equal to 1, no pruning applied, similar to Breiman's implementation [15]) was trained using the HOG [29] and LBP [105] features. HOG and LBP were employed since their use in conjunction can significantly improve object detection performance compared to using only one of those features [162].

In a test US volume, the likelihood of each of the coronal plane slices (i) containing I and A is evaluated using the trained classifier, R1. The likelihood scores of i are median filtered (resulting in p1(i)) and normalized with the unity-based normalization, p1,n = norm(p1). We defined the center slice as the median of all the slices that satisfy the criterion p1,n > 0.5. Around this center slice, we cropped out all slices that are more than sd mm away from it. We call this sd parameter the span and discuss it in more detail in the results section.
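
The slice localization steps of this subsection can be sketched as below. This is a simplified illustration, not the thesis code: the function names, the median filter window width, the truncated windows at the volume borders, and the assumption that the slice index increases from anterior to posterior are all choices made for the example. The per-slice likelihood scores are taken as given (in the thesis they come from the classifier R1).

```python
from statistics import median


def median_filter(scores, width=5):
    """1D median filter; windows are truncated at the borders (an assumption)."""
    half = width // 2
    return [
        median(scores[max(0, i - half): i + half + 1])
        for i in range(len(scores))
    ]


def unity_normalize(values):
    """Unity-based (min-max) normalization to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def localize_slices(scores, slice_spacing_mm, span_mm, width=5, thresh=0.5):
    """Return (first, last) slice indices of the localized region, or None.

    Steps: median filter -> unity-based normalization -> center slice as the
    median of the slices above the threshold -> discard slices farther than
    span_mm from the center -> extent of the remaining above-threshold slices.
    """
    p1n = unity_normalize(median_filter(scores, width))
    above = [i for i, v in enumerate(p1n) if v > thresh]
    if not above:
        return None
    center = int(median(above))
    keep = [i for i in above if abs(i - center) * slice_spacing_mm <= span_mm]
    if not keep:
        return None
    return min(keep), max(keep)


# Made-up likelihood scores for a 10-slice volume with 1 mm slice spacing.
toy_scores = [0.1, 0.1, 0.2, 0.8, 0.9, 0.9, 0.8, 0.2, 0.1, 0.1]
region = localize_slices(toy_scores, slice_spacing_mm=1.0, span_mm=10.0, width=3)
```

The span acts as a guard against isolated high-likelihood slices far from the main cluster, which would otherwise stretch the localized region.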
Finally, in this cropped US volume, we defined our localized region across the coronal plane (U(x,y,zA,I)) as originating from the most anterior position having p1,n(i) > 0.5 and ending at the most posterior position having p1,n(i) > 0.5.

5.3.2 Extracting 3D Bone Boundary

We employed the bone segmentation technique described in subsection 3.3.2 to extract bone boundaries in U(x,y,z) and identify bone boundaries in the region (U(x,y,zA,I)) (Figure 5.5). Here, we use a rotation-invariant local symmetry feature, structured phase symmetry (SPS) (subsection 3.3.2), which extracts sheet-like hyperechoic responses, SPS, in a 3D US image. After segmenting these sheet-like hyperechoic responses in the 3D US image, which include bone boundaries, cartilage boundaries and soft-tissue interfaces, we extracted bone boundaries, B, using the attenuation-based post-processing step described in subsection 3.2.2 (Figure 5.5).

Figure 5.3: Block diagram showing the steps for identifying the coronal plane slices that contain I and A (orange colored volume) from an US volume (grey colored volume).

5.3.3 Approximating Planes of I and A

Once we had extracted the bone boundary B in U(x,y,zA,I), we used geometric priors to find planes that approximate I and A. Since I and A are bone boundaries, we did not expect to find any other bone boundary beyond these structures in US infant hip images.
So, we selected the bottom-most non-zero B point along each scan line for further processing to locate I and A (note that the lowest non-zero B value is 0.5).

Since I is approximately perpendicular to the US beam in coronal scans of infant hips, we approximated I by fitting a plane that is within 15° of the x,y plane. To fit planes onto bone boundary points, we used an M-estimator SAmple Consensus (MSAC) algorithm [156]: this iteratively identifies the parameters of a plane from the bone boundary points by identifying the inliers (points close to the fitted model during iterations), outliers (points far away from the fitted model during iterations) and a likelihood measurement of the amount of consensus extant between the inliers and plane model.

Figure 5.4: (a), (b) and (c) show example US slices within an US volume that contain the I and A structures, whereas (d), (e) and (f) show example US slices in the same US volume that do not contain at least one of the I and A structures. (g) Plot showing the slice numbers in the US volume that contain I and A.

Figure 5.5: Block diagram showing the extraction of bone and cartilage boundaries from 3D US. Arrows in the last image are pointing to A, I and labrum.

To identify A, we used a prior based on the fact that A is medial to I (or downwards in US images collected in the coronal plane). More specifically, we approximated the plane of A as fitting onto bone boundary points that were medial to I and oriented medially away from the plane of I.

5.3.4 Calculating α3D

We calculated $\alpha_{3D} = \cos^{-1}\left(\frac{n_A \cdot n_I}{|n_A|\,|n_I|}\right)$, where $n_A$ and $n_I$ are the unit normal vectors of A and I.

5.4 Extracting FHC3D

To estimate FHC3D, we first extracted a voxel-wise probability map, P(x,y,z), characterizing the likelihood of a voxel belonging to the femoral head (subsection 5.4.1).
Next, we used P(x,y,z) and I to calculate FHC3D (subsection 5.4.3).

5.4.1 Femoral Head Segmentation

The femoral head in infant hips is unossified and appears hypoechoic in US images (Figure 5.6(a)). It is surrounded by anatomical structures with distinctive sonographic properties, e.g. the ilium, which has a high sonographic response at its boundary and a shadow region beneath it, the labrum, the triradiate cartilage, the greater trochanter, etc. A cross-section of U, C(d,θ,φ) (d is the shortest distance of C from the origin of (x,y,z), and θ, φ are rotations about the x and y axes, respectively, of a reference plane defined by z = 0 to a plane parallel to C, Figure 5.6b), that intersects the femoral head is expected to include a hypoechoic region surrounded by cross-sections of the neighbouring anatomical structures. In order to segment a femoral head, we therefore started by training a random forest classifier, R2, to distinguish slices that intersect the femoral head from those that do not. The classifier, R2 (70 decision trees, minimum number of observations per tree set equal to 1, no pruning applied, similar to Breiman’s implementation [15]), is trained using HOG and LBP features extracted from samples of cross-sections from US volumes. In a test US volume, the likelihood of C intersecting the femoral head is evaluated using the trained classifier, R2. We encoded this measurement throughout the coordinate space of C, and back-projected or interpolated it onto the (x,y,z) coordinates, L(x,y,z) (Figure 5.6c). For N cross-sections sampled within the test US volume, we constructed a slice-based or tomographic voxel-wise probability map as $p_2 = \mathrm{norm}\left(\sum_{n=1}^{N} L_n\right)$ (Figure 5.6d), where norm(.)
is the unity-based normalization.

Although p2 alone does not provide an accurate segmentation of the femoral head (Figure 5.6d), we hypothesized that we could reliably segment the femoral head when we used p2 with other voxel-wise features.

5.4.2 Enhancing Femoral Head Segmentation

To further enhance the slice-based voxel-wise probability map p2 of the femoral head, we combined this map p2 with twenty-four voxel-wise features (Figure 5.7). The complete list of features used is: (a) depth of voxel, (b) location of voxel along the inferior-superior axis, (c) p2, (d) U, (e) range filtered (3x3 kernel size) U, (f) range filtered (5x5 kernel size) U, (g) entropy filtered (3x3 kernel size) U, (h) entropy filtered (5x5 kernel size) U, (i) confidence map, M [84], (j) range filtered (3x3 kernel size) M, (k) range filtered (5x5 kernel size) M, (l) entropy filtered (3x3 kernel size) M, (m) entropy filtered (5x5 kernel size) M, (n) nine HOG features, (o) convolution of U with a sphere of radius 5 mm, (p) distance from the approximated ilium plane, and (q) distance from the approximated acetabulum plane (Figure 5.7). We fed these features into another RF classifier, R3 (25 decision trees, minimum number of observations per tree equal to 1, no pruning applied), to estimate a probabilistic score, P(x,y,z), of each voxel belonging to the femoral head (Figure 5.8). In subsequent steps, we used both P and the center of the femoral head, which we calculated as $c = [c_x, c_y, c_z] = \frac{\sum_X P(X)\,X}{\sum_X P(X)}$, where X = [x,y,z].

5.4.3 Estimating FHC3D

Finally, we estimated FHC3D as $\mathrm{FHC}_{3D} = \frac{\sum_X P_M(X)}{\sum_X P(X)}$, where X = [x,y,z] and $P_M$ represents all P values that are medial to I.

5.5 Experiment, Results and Discussion

Parameter Specification: For our RF classifiers, we selected the number of trees based on out-of-bag classification error analysis: we identified the number of trees beyond which the out-of-bag classification error does not seem to decrease considerably.
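As an aside on the metric definitions in subsections 5.3.4, 5.4.2 and 5.4.3, the three formulas (α3D, the femoral head center c, and FHC3D) reduce to short computations once the plane normals and the probability map are available. The following Python sketch is illustrative only and is not the thesis code: the function names, the representation of P as a list of (position, probability) pairs, and the sign convention that "medial to I" means the positive side of the plane normal are assumptions made here for the example.

```python
import math


def alpha_3d(n_a, n_i):
    """Angle in degrees between the normals of the planes fitted to A and I."""
    dot = sum(a * b for a, b in zip(n_a, n_i))
    norm_a = math.sqrt(sum(a * a for a in n_a))
    norm_i = math.sqrt(sum(b * b for b in n_i))
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_i)))
    return math.degrees(math.acos(cos_angle))


def femoral_head_center(voxels):
    """Probability-weighted centroid c = sum(P * X) / sum(P).

    voxels: list of ((x, y, z), P) pairs (hypothetical representation of P).
    """
    total = sum(p for _, p in voxels)
    return tuple(sum(pos[k] * p for pos, p in voxels) / total for k in range(3))


def fhc_3d(voxels, plane_normal, plane_point):
    """Fraction of femoral head probability mass lying medial to the plane of I.

    Assumed convention: a voxel is medial when (X - plane_point) . n > 0.
    """
    total = 0.0
    medial = 0.0
    for pos, p in voxels:
        total += p
        side = sum((c - q) * n for c, q, n in zip(pos, plane_point, plane_normal))
        if side > 0:
            medial += p
    return medial / total if total > 0 else float("nan")
```

For instance, `alpha_3d((0, 1, 1), (0, 0, 1))` evaluates to 45 degrees (up to floating-point error), and a map with half of its probability mass on the positive side of the plane gives `fhc_3d(...) == 0.5`. In practice the direction of each fitted normal would have to be chosen consistently, since flipping one normal changes the angle to its supplement.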
The number of trees for R1, R2 and R3 was 70, 70 and 25, respectively (Figure 5.9, Figure 5.10, Figure 5.11). The trees were not pruned and the number of features to select at random for each decision split was set to the square root of the number of features [10, 15].

The span parameter sd in subsection 5.3.1 is an empirical parameter describing the distance between the furthest slice containing I and A and the middle slice that contains I and A. We empirically set sd = 3.5 mm based on the highest value of sd we could find in our data (Figure 5.12).

Figure 5.6: Overview of our slice-based probability map extraction. (a) Overlay of example US volume and a manually segmented femoral head. (b) N number of C were evaluated using classifier R2 to determine their likelihood of intersecting the femoral head. (c) Back-projected likelihood scores, L, for each of the cross-sections C. (d) Back-projected responses were summed and normalized to construct voxel-wise probability map, with an overlay of the manually segmented femoral head (yellow).

Data: We acquired 3D and 2D B-mode US images from 40 infant hips (25 infants), all obtained as part of routine clinical care under appropriate institutional review board approval (UBC CREB number: H14–01448). All 2D and 3D scans were collected in the coronal plane. The scans were performed using a SonixTouch Q+ machine with a 4DL14–5/38 Linear 4D transducer (BK Ultrasound, Analogic Corporation, MA, USA). The transducer’s center ultrasound frequency was kept at 7.5 MHz and the image depth setting ranged between 3.8 cm and 5 cm. While performing a 3D US scan, the operator tried to capture the entire femoral head within the scanned US volume. A maximum of two surgeons each performed two 3D US scans and two 2D US scans. The repetition was designed to allow us to evaluate inter-operator repeatability.
The surgeon who acquired each of the 2D US scans further chose a 2D US image from all the 2D US images collected during that examination and measured α2D and FHC2D. If the surgeon did not find any adequate images in a 2D US imageset belonging to a hip examination, that hip examination data was labeled as an inadequate imageset. Furthermore, we noted the acquisition time for acquiring both 2D US and 3D US scans in all the infants.

Figure 5.7: Features used in the classifier R3 to estimate the probabilistic score, P(x,y,z), of each voxel belonging to the femoral head.

Figure 5.8: (a) Overlay of an example US volume and a manually segmented femoral head. (b) Overlay of the example US volume and its automatically extracted femoral head voxel-wise probability map P.

Figure 5.9: Out-of-bag classification error for various numbers of grown trees used by the R1 classifier. We selected 70 as being our number of trees for R1.

Figure 5.10: Out-of-bag classification error for various numbers of grown trees used by the R2 classifier. We selected 70 as being our number of trees for R2.

Figure 5.11: Out-of-bag classification error for various numbers of grown trees used by the R3 classifier. We selected 25 as being our number of trees for R3.

Figure 5.12: Cumulative distribution function showing the relative frequency of different sd values in our data. Maximum value of sd was 3.5 mm.

Validation Scheme: To avoid overfitting while using classifiers R1, R2 and R3, we avoided training and testing on the same patient. We randomly split the data into two halves: 13 patients in the first half and 12 patients in the second half.
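A patient-wise two-fold split of this kind can be sketched as follows. This is a minimal illustration under our own assumptions, not the thesis code: the function names, the fixed random seed, and the representation of each sample as a tuple whose first element is a patient identifier are hypothetical. The point of the sketch is that the split is made over patients rather than over scans, so that no patient contributes data to both the training and the testing side of a fold.

```python
import random


def two_fold_patient_split(patient_ids, seed=0):
    """Randomly split unique patient IDs into two halves (first half is larger)."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)  # fixed seed only for reproducibility of the sketch
    rng.shuffle(ids)
    half = (len(ids) + 1) // 2
    return ids[:half], ids[half:]


def patient_wise_folds(samples, patient_of, seed=0):
    """Yield (train, test) sample lists for both directions of the two-fold split.

    samples: list of per-scan records (hypothetical structure).
    patient_of: function mapping a sample to its patient identifier.
    """
    first, second = two_fold_patient_split([patient_of(s) for s in samples], seed)
    first, second = set(first), set(second)
    in_first = [s for s in samples if patient_of(s) in first]
    in_second = [s for s in samples if patient_of(s) in second]
    yield in_first, in_second   # train on the first half, test on the second
    yield in_second, in_first   # and vice versa
```

With 25 patient IDs, `two_fold_patient_split` returns halves of 13 and 12 patients, matching the split sizes stated above.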
When we trained on the first half of the data, we tested on the second, and vice versa. Since R3 depends on R2 (i.e., the output of R2 is fed into R3), we further split the data to avoid any over-fitting while training R3.

To investigate the reliability of using 3D US in diagnosing DDH, we first investigated the reliability in acquiring adequate 3D US volumes. A graduate student labeled adequacy of 3D US volumes based on the criterion that an US volume is adequate only when it has the femoral head, ilium, acetabulum and labrum. We compared this with the reliability in manually acquiring adequate 2D US imagesets. We also compared acquisition times for both 2D and 3D US images by the same surgeons.

Table 5.1: Success rates in acquiring adequate 3D US images and 2D US imagesets.

Surgeon              #1   #2   #3   #4   #5   #6
Total hips scanned   38   25    2    2    1    1
Adequate 2D          37   21    2    2    1    1
Adequate 3D          37   21    1    2    1    1

We reported both the inter-rater and intra-rater variability of the 3D US-based dysplasia metrics, α3D and FHC3D. We compared these variability measures with the inter-rater and intra-rater variability of their 2D counterparts, α2D and FHC2D. We estimated intra-rater variability by calculating the standard deviation within repeated measurements made on the same hip by the same rater. We estimated inter-rater variability between two raters by calculating the standard deviation between the mean dysplasia metric values measured by the two raters on the same hip.

Finally, we investigated the agreement between α3D and FHC3D with their 2D counterparts α2D and FHC2D. Here, we extracted correlation between the 2D and 3D US-based measures collected by the same raters.

Acquiring 3D US Images: In total, 6 orthopaedic surgeons participated in acquiring both 3D and 2D US images.
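For reference, the intra-rater and inter-rater variability definitions given in the validation scheme above can be written out as a short Python sketch. The function names and the example measurement values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was used.

```python
from statistics import mean, stdev


def intra_rater_sd(repeats):
    """Standard deviation of repeated measurements on one hip by one rater."""
    return stdev(repeats)  # sample standard deviation (assumed estimator)


def inter_rater_sd(rater_a, rater_b):
    """Standard deviation between the two raters' mean values on the same hip."""
    return stdev([mean(rater_a), mean(rater_b)])


# Hypothetical α measurements (degrees) on a single hip, two repeats per rater.
rater_1 = [47.0, 56.0]
rater_2 = [50.0, 52.0]
print(intra_rater_sd(rater_1))           # spread within rater 1's repeats
print(inter_rater_sd(rater_1, rater_2))  # spread between the raters' means
```

Per-hip values computed this way would then be summarized over all hip examinations, for example with box plots of the within-hip standard deviations.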
The success rate of each of the surgeons in acquiring adequate 3D US volumes (a volume is adequate if it has the ilium, acetabulum, labrum and femoral head in it) and 2D US imagesets (a 2D US is adequate if it has the ilium, acetabulum, labrum and femoral head in it) is summarized in Table 5.1. There was no marked difference between the success rates for acquiring adequate 3D US vs. the success rates for acquiring adequate 2D US imagesets.

We did not find any significant difference between the acquisition time for 3D US images vs. the acquisition time for 2D US images (23 seconds for 3D US, 25.4 seconds for 2D US, t̄3D − t̄2D = −2.4 seconds ± 21.5 seconds, p = 0.4, Figure 5.13, Figure 5.14).

Figure 5.13: Box-plot showing the distribution in acquisition time for both 2D and 3D US for all the six participating surgeons.

Figure 5.14: Box-plot showing distribution of acquisition time of both 2D and 3D US. Mean acquisition time of 3D US is slightly less than that of 2D US, however the change was not statistically significant.

Results for extracting dysplasia metrics in all hips: Results for extracting 3D dysplasia metrics in all hips are shown in Appendix B.

Discrepancy between 2D US and 3D US-based Dysplasia Metrics: The discrepancy between FHC2D and FHC3D (i.e., FHC2D − FHC3D) was not statistically significant (mean 0.15%, SD = 11.1%, p > 0.05, Figure 5.15); however, the discrepancy between α2D and α3D (i.e., α2D − α3D) was statistically significant (mean 2.88°, SD = 11.7°, p < 0.05, Figure 5.16). Agreement between the FHC2D and FHC3D of each hip examination was moderate (correlation coefficient r = 0.4, 95% confidence interval 0.23 to 0.55, Figure 5.15).
We found similar agreement between α2D and α3D (correlation coefficient r = 0.41, CI 0.24 to 0.56, Figure 5.16).

Appendix B shows other examples for automatic extractions of 3D US-based dysplasia metrics in a healthy hip, a borderline hip and a dysplastic hip; we found that our approximations of the ilium, labrum, acetabulum and femoral head and our estimation of the 3D US-based dysplasia metrics are realistic.

Variability of Dysplasia Metrics, σ: Figure 5.17 illustrates how multiple 3D US images acquired from the same hip appear to be more similar to each other compared to the similarity between multiple 2D US images collected from the same hip. Figure 5.17 is also an example where the variability of FHC3D and α3D was much lower than that of their manual counterparts.

Box plots in Figure 5.18 and Figure 5.19 summarize variability of the dysplasia metrics over all hip examinations. We found that the 3D-based dysplasia metrics have significantly lower variability compared to their 2D counterparts: intra-rater variability of FHC3D was 63% lower than FHC2D (3.19% for FHC3D vs. 8.56% for FHC2D, p < 0.01), inter-rater variability of FHC3D was 65% lower than FHC2D (2.72% for FHC3D vs. 7.79% for FHC2D, p < 0.01), intra-rater variability of α3D was 73% lower than α2D (2.2° for α3D vs. 8.33° for α2D, p < 0.01), and inter-rater variability of α3D was 78% lower than α2D (2.35° for α3D vs. 10.63° for α2D, p < 0.01). Also, comparing with ACA [94], which is the only other 3D metric available, the intra-rater variability of α3D is around 45% lower than the variability of ACA (2.2° for α3D vs. 4.1° for ACA) [94].

Figure 5.15: Scatter plot of FHC2D vs. FHC3D. The red line is the equality line. Blue data points correspond to lower FHC3D and yellow data points correspond to higher FHC3D, so blue corresponds to hips that would be judged as dysplastic whereas yellow corresponds to hips that would be judged as normal.

Figure 5.16: Scatter plot of α2D vs. α3D.
Blue data points correspond to lower α3D and yellow data points correspond to higher α3D, so blue corresponds to hips that would be judged as dysplastic whereas yellow corresponds to hips that would be judged as normal.

We further evaluated whether there is any association in the variability of measuring the 3D US-based dysplasia metrics with the severity of hip dysplasia. We did not find any statistically significant association in the variability of dysplasia metrics with the severity of hip dysplasia for either α3D or FHC3D (p > 0.05, Figure 5.20, Figure 5.21).

Figure 5.17: Qualitative results. (a), (b), (d) and (e) show example variability of α2D, α3D, FHC2D and FHC3D from two 2D and two 3D US images from a hip examination (α2D = 47° and 56°, α3D = 45.1° and 45.9°, FHC2D = 51% and 71%, FHC3D = 46.6% and 47.8%). The higher variability in the input 2D US images (and α2D and FHC2D values) can be seen in the manually aligned 2D US images in (c) compared to the lower variability in the manually aligned 3D US images (and α3D and FHC3D values) in (f).

Computational Considerations: It takes around 58 seconds to compute CSPS, the planes of ilium, acetabulum and α3D, and another 62 seconds to compute the probability map of the femoral head and to compute FHC3D, when run on a Xeon(R) 3.40 GHz CPU computer with 12 GB RAM. All processes were executed using MATLAB 2017a (The MathWorks Inc., Natick, MA, USA).

Figure 5.18: Box-plot of the within-hip standard deviations among FHC2D vs. FHC3D values within all of the hip examinations. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 1st and 3rd quartiles, respectively. The ’+’ points indicate outliers or data points that are outside the range of whiskers, where the whiskers correspond to 99.3% coverage if the data are normally distributed.
The within-hip standard deviations are significantly less in FHC3D compared to FHC2D (p < 0.01).

5.5.1 Discussion

Acquiring 3D US Images: Overall, the acquisition time for 3D US images was similar to that of 2D US images. Furthermore, the rate of 3D hip examinations where adequate 3D US images were missed was similar to the rate of 2D hip examinations where adequate 2D US images were missed. This suggests that there is no noticeable difference between the difficulty in acquiring 3D and 2D US images for neonatal hip examination.

Variability of Metrics: The variabilities in the automatic measurements using our proposed 3D US-based dysplasia metrics were significantly lower than those of their 2D US-based counterparts (≈ 75% reduction for α and ≈ 65% reduction for FHC). These reductions in variability in 3D US-based dysplasia metrics suggest that probe position variation has a much larger effect on variability in the dysplasia metrics than manual processing of the 2D US (approximately 20% improvement with automatic image processing within a 2D US; see chapter 4).

Figure 5.19: Box-plot of the within-hip standard deviations among α2D vs. α3D values within all of the hip examinations. On each box, the central mark indicates the median, and the bottom and top edges of the box indicate the 1st and 3rd quartiles, respectively. The ’+’ points indicate outliers or data points that are outside the range of whiskers, where the whiskers correspond to 99.3% coverage if the data are normally distributed. The within-hip standard deviations are significantly less in α3D compared to α2D (p < 0.01).

Discrepancy between 2D US and 3D US-based Dysplasia Metrics: While the discrepancy between FHC2D and FHC3D was not statistically significant, there is a significant bias of α3D to be larger than α2D by 2.88°.
This bias and the thresholds for α2D to classify hips as normal or dysplastic (i.e., α2D < 43°: dysplastic, α2D > 60°: healthy) can potentially be used for setting thresholds for α3D to classify healthy and dysplastic hips (i.e., α3D < 45.88°: dysplastic, α3D > 62.88°: healthy). However, the validity of such thresholds is limited since the thresholds used for α2D are debatable (e.g. α2D > 60°: healthy in one study [79], whereas α2D > 55°: healthy in another study [82]).

Figure 5.20: Scatter plot of variability of FHC3D vs. mean FHC3D. Blue data points correspond to lower mean values of FHC3D and yellow data points correspond to higher mean values of FHC3D, so blue corresponds to hips that would be judged as dysplastic whereas yellow corresponds to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is no significant association in the variability of FHC3D with the severity of hip dysplasia.

Figure 5.21: Scatter plot of variability of α3D vs. mean α3D. Blue data points correspond to lower mean values of α3D and yellow data points correspond to higher mean values of α3D, so blue corresponds to hips that would be judged as dysplastic whereas yellow corresponds to hips that would be judged as normal. The red line is the best fit line. Correlation and p values suggest that there is no significant association in the variability of α3D with the severity of hip dysplasia.

Computational Considerations: The complete process of extracting FHC3D and α3D from an US volume took approximately 170 seconds, when run on a Xeon(R) 3.40 GHz CPU computer with 8 GB RAM. All processes were executed using MATLAB 2017b.
Current practice has a sonographer process the images post-acquisition, so this computation time is not a critical barrier to implementation.

Although not critical for clinical use, real-time feedback during acquisition on image adequacy could help prevent incidents in which a sonographer fails to acquire an adequate 3D US volume. While our focus thus far was on automatically extracting 3D US-based dysplasia metrics, we have earlier developed a fast 2D image adequacy classifier (chapter 4) that we hope in future to extend to 3D. We believe that adequacy classification could be performed in real-time and provided as feedback during acquisition.

5.6 Summary

In this chapter, we presented novel 3D hip morphology-based dysplasia metrics, and we provided automatic methods for extracting those dysplasia metrics from 3D US B-mode images. To the best of our knowledge, this is currently the only method that automatically characterizes 3D hip morphology from 3D US images. We showed that these dysplasia metrics are significantly less variable than their 2D counterparts (around 65% reduced variability for FHC3D and around 75% reduced variability for α3D). This suggests that these 3D morphology-derived dysplasia metrics could be valuable in improving the reliability in diagnosing DDH, which may lead to a more standardized DDH assessment with better diagnostic accuracy than the current 2D assessment.

Notably, the improvement in reliability associated with the 3D scans was achieved by orthopaedic surgeons, who have limited training in performing US examinations in comparison to a radiologist. This strongly suggests that we may, in future, be able to train personnel other than radiologists to obtain reliable and reproducible dysplasia metrics using 3D ultrasound machines, potentially reducing the costs associated with screening for DDH.

Chapter 6
Conclusions

DDH is an important clinical problem, impacting infants and families worldwide [7, 55, 147].
As DDH is the most common pediatric hip condition and can cause considerable long-term debilitation if left untreated, screening and detection efforts have resulted in universal clinical examinations of all newborns [147]. Consequently, every child born has the potential to be affected by both limitations to and advances in DDH diagnosis.

6.1 Thesis Contributions

This thesis contributes towards providing a comprehensive analysis of all the currently available dysplasia metrics. We demonstrate the relative usefulness of each of these dysplasia metrics in terms of their reproducibility in DDH diagnoses.

This thesis further contributes towards inventing automatic and reliable methods for diagnosing DDH using both 2D and 3D US imaging. We demonstrate the advantages and limitations of our methods in comparison to the currently used standard US imaging-based approach for diagnosing DDH.

The main contributions of this thesis (Figure 6.1) are summarized as follows.

6.1.1 A Systematic Review and Meta-analysis of Variability in Dysplasia Metrics

In chapter 2, we conducted a systematic review and meta-analysis to identify the most reliable dysplasia metric for diagnosing DDH. Our findings suggested that the α angle extracted from 2D US images of neonatal hips is currently the most reliable metric. However, we also determined that even α, with the best repeatability amongst the assessed metrics, can be problematic and can lead to high rates of misdiagnosis. We also found that all the dysplasia metrics rely on first approximating the bone and cartilage boundaries from US images.
Our contributions in chapter 2 are:
• We performed a systematic review and meta-analysis of variability in various dysplasia metrics used in US-based DDH diagnoses [P1].
• We provided systematic review-based estimates (considered better than individual studies [75]) of the variability in measuring dysplasia metrics. We demonstrated that the variability in these dysplasia metrics is large [P1].
• We also showed that, among all the currently available dysplasia metrics, Graf’s α angle measurements are most reproducible [P1].
• We demonstrated that the reproducibility of dysplasia metrics has not improved since their advent in 1983, despite decades of advancements in US imaging technologies [P1].

6.1.2 Symmetry and Attenuation-based US Bone Imaging

To reliably extract dysplasia metrics, our first work in chapter 3 focused on an automatic and robust technique to extract bone and cartilage boundaries. We proposed new attenuation-based features and presented a method for combining them with local symmetry features to result in a bone segmentation that was significantly more robust compared with using only local symmetry features. We also presented a rotation-invariant local symmetry measure, SPS, which we later used along with our bone segmentation method for extracting dysplasia metrics. In an experiment including US data from 15 infant hip exams, we found that our proposed bone segmentation agreed well with manually-labelled bone boundaries (mean discrepancy between segmentations around 1 mm). Our contributions in chapter 3 are:
• We proposed novel attenuation-based features that are useful for extracting bone boundaries in US images.
We showed that these features can be combined with local symmetry features towards extracting bone boundaries in US images [P2 [121], P3 [122]].
• We demonstrated that the combination of attenuation and local symmetry features results in a more robust segmentation (i.e., more robust to soft-tissue outliers and US artifacts) than using a local symmetry feature alone [57]. We demonstrated this in terms of surface fitting error when comparing with CT-derived ground truth (≈ 0.24 mm with combined features-based method vs. ≈ 0.5 mm with local symmetry-based method [57], p < 0.01) [P2 [121], P3 [122]].
• We presented a second bone segmentation method that employs isotropic filters and is independent of the orientation of the bone boundaries [P4 [124], P7 [129]].
• We extended the orientation-independent bone segmentation method for use in 3D US [P5 [125]].
• We demonstrated that this orientation-independent bone segmentation method has lower discrepancy with manually-labelled bone boundaries compared to a previous state-of-the-art [59] (mean discrepancy: around 1 mm for our method vs. around 3 mm for method in [59]) [P6 [128]].

6.1.3 Characterizing Hip Joint using 2D US

In chapter 4, we developed a machine learning-based method for identifying whether a 2D US image is adequate for dysplasia metric measurements. We also presented methods for automatically extracting the three commonly used dysplasia metrics [104], α, β and FHC. We demonstrated on data from 69 patients that automatic extraction of these dysplasia metrics reduces variability by around 18% to 21% as compared to that of their manual extraction counterparts. We also demonstrated a high agreement between manual and automatic Graf classifications (84% agreement). In nine cases of disagreement between the manual approach and our automatic method in the Graf-DDH type assignment, process-of-care supported the classification decisions made by our proposed automatic method.
Our classification tends to classify more patients as having DDH than would be identified as such by the current manual method. This tendency of heightened suspicion may perhaps be acceptable from a clinical perspective since we know that a considerable percentage of patients who initially had a diagnosis of DDH that subsequently resolved nonetheless had radiological signs of DDH a few months later (e.g., around 33% of such patients in Sarkissian’s study [144]). Our contributions in chapter 4 are:
• We have developed a novel automatic method that identifies the US images suitable for making measurements [P4 [124], P7 [129]].
• We demonstrated that this identifier is in excellent agreement with identifications conducted by experienced clinicians (the area under the receiver operating characteristics curve > 98%) [P7 [129]].
• We have developed a novel automatic method that computes dysplasia metrics from US images of the neonatal hip [P4 [124], P7 [129], P8 [123], P9 [126]].
• We demonstrated that this automatic computation can improve the reproducibility of dysplasia metrics when compared with the dysplasia metrics that are manually extracted by expert clinicians (p < 0.01, an approximately 18% to 21% improvement) [P7 [129]].
• We demonstrated a high agreement between manual and automatic Graf classifications (84% agreement) [P7 [129]].

6.1.4 A 3D US-based Hip Dysplasia Assessment

In chapter 5, we addressed perhaps the most crucial problem in diagnosing DDH with US imaging: the variability of dysplasia metrics, and in turn of diagnosis and treatment decisions, is very sensitive to the orientation of the US transducer during acquisition [79]. To reduce sensitivity to transducer orientation, we proposed (like others [64, 94]) to use 3D US. Here, we investigated new dysplasia metrics that could be extracted from 3D US and that characterize 3D hip morphology in infants.
We presented an automatic method for extracting these dysplasia metrics, which included a novel tomographic-reconstruction based segmentation of the femoral head. We demonstrated on data from 25 patients that these 3D hip morphology-derived dysplasia metrics provide a significant reduction in variability as compared to 2D dysplasia metrics: an approximately 75% reduction for α3D and an approximately 65% reduction for FHC3D. Our contributions in chapter 5 are:
• We have proposed two novel 3D hip morphology-derived dysplasia metrics: α3D and FHC3D [P5 [125], P6 [128], P10 [127]].
• We have developed automatic image processing pipelines to compute each of these dysplasia metrics [P5 [125], P6 [128], P10 [127]].
• We demonstrated that both α3D and FHC3D provide significantly more consistent diagnoses than their 2D counterparts: they are significantly less variable (p < 0.01 and approximately 75% reduction in variability for α3D, p < 0.01 and approximately 65% reduction in variability for FHC3D) [P5 [125], P6 [128], P10 [127]].

Figure 6.1: A flow-chart outlining the research questions, core blocks and key contributions of this thesis. P1 to P10 are publications associated with this thesis (note that P1 has been submitted and P10 is currently being prepared for submission). The vertical arrow bars to the right provide a visual direction to the components of this thesis described in the various chapters.

6.1.5 Summary of Technical Contributions

We conclude with a list of our key technical contributions in this thesis:
• We developed an intensity-invariant and rotation-invariant local symmetry feature, the SPS. We demonstrated in our experiments that the SPS feature is effective in identifying bone boundaries in US images of an infant hip [125, 127, 129].
• We combined the local symmetry feature with attenuation-related features [120], and demonstrated that the combination of attenuation and local symmetry features results in a more robust segmentation (i.e., more robust to soft-tissue outliers and US artifacts) than using a local symmetry feature alone [57].
• We developed the concept of image adequacy in the context of diagnosing DDH using US imaging [129]. An inadequate US image can be due to the acquisition of the US image from an improper transducer position or due to a sub-standard quality of the US image, both of which can result in an absence of key anatomical landmarks in the image. By using an adequacy classifier before computing our dysplasia metrics, we can avoid erroneous measurements of dysplasia metrics in inadequate images that do not contain all the anatomical landmarks necessary for making a dysplasia metric measurement. We implemented an automatic 2D US image adequacy classifier that has high specificity and sensitivity (area under the ROC curve around 99.5%).
• We proposed a fully automatic method for extracting 2D dysplasia metrics [124, 129]. We demonstrated that the automatically extracted 2D dysplasia metrics using our method had a significantly lower intra-exam variability in repeated diagnosis compared to a manual 2D diagnosis (approximately 20% reduction in variability, p < 0.01) [129].
• We developed a multi-oriented slice-based learning approach (i.e., a random forests based learning for identifying presence of an object in a slice, followed by a 3D reconstruction of the probability map of the object based on the random forests outputs and the corresponding locations of all the slices at different orientations, subsection 5.4.1) for segmenting the femoral head in 3D US.
We found that this approach can be reliably used for segmenting the femoral head and for extracting FHC3D (the inter-exam variability of FHC3D was around 65% lower than that of manual FHC2D, p < 0.01) [127].
• We developed a fully automatic method for extracting 3D dysplasia metrics. To the best of our knowledge, our method is currently the only automatic method for extracting 3D dysplasia metrics. We demonstrated that the 3D dysplasia metrics extracted using our method had a significantly lower inter-exam variability in repeated diagnosis compared to the current standard 2D diagnosis (approximately 70% reduction in variability, p < 0.01) [125, 127, 128].

6.2 Future Work

In this thesis, we developed automatic methods that markedly improve the repeatability of DDH diagnosis. Our contributions open up new research questions and several directions for future work in US-based DDH diagnosis:

3D US Image Adequacy: In our study, we found that in approximately 9% of hip examinations, our surgeons were unable to obtain adequate 3D US scans. This points to a clinical feasibility issue in our method: extracting dysplasia metrics takes around three minutes, so we currently cannot provide near real-time feedback to assure surgeons that they have obtained an adequate scan. To address this issue, we plan to use a machine learning approach (similar to our 2D image adequacy work in chapter 4) to provide near real-time feedback to users on whether any adequate 3D images have been recorded. In addition, Paserin et al. [111] recently proposed a slice-based CNN classifier for identifying adequate 3D volumes. They reported 100% classification accuracy on 3D US volumes collected from 15 infant hips, and a processing speed of around 2 seconds per volume. Considering that our 3D transducer takes around 2.5 seconds to acquire one volume, a processing time of approximately 2 seconds for identifying adequacy can be considered near real-time.
In the future, we may therefore add a CNN-based adequacy classifier to our Sonix Q+ US machine that will provide active feedback to a user on whether any of the collected US volumes were adequate for extracting 3D dysplasia metrics.

2D US-based DDH Diagnosis: The impact of improving DDH diagnosis with a 2D US-based system is perhaps considerably greater than that with a 3D US-based system, particularly because most centers around the world currently have access only to 2D US systems. A limitation of 2D US-based diagnosis is that dysplasia metric values measured from 2D US images can vary considerably with the orientation and placement of the US transducer during image acquisition. To address this issue, we plan to investigate the reliability of DDH diagnosis using a 2D US-based pseudo-3D technique. More specifically, we plan to acquire 3D US by first measuring the trajectory of our 2D US transducer using a tracking unit as a clinician moves it across the hip joint, then reconstructing the 3D US volume from the acquired 2D US images and their tracked positions, and finally extracting dysplasia metrics from the reconstructed 3D US volume. We anticipate that we will be able to further improve the reliability of our 2D US system by using this pseudo-3D technique.

Dynamic Assessment of DDH: Recently, Paserin et al. [110] developed the first quantifiable 3D dynamic assessment of DDH, using our FHC3D to measure whether an infant hip is dislocated. In the future, we may further investigate the application of our proposed 3D dysplasia metrics in such a quantifiable 3D dynamic assessment.

Tracing the Evolution of DDH: To understand how DDH evolves in an infant hip, we can use our reliable 3D US-based diagnosis tool to perform a prospective follow-up study on infant hips as those hips develop. We can also conduct a prospective study to identify reliable thresholds for 3D US-based dysplasia metrics for normal, abnormal and borderline hips.
With this prospective study, we hope to identify criteria for distinguishing between healthy and abnormal hips, which we can later provide as guidelines to clinicians using our proposed system.

Bibliography

[1] E. M. A. Anas, A. Rasoulian, A. Seitel, K. Darras, D. Wilson, P. S. John, D. Pichora, P. Mousavi, R. Rohling, and P. Abolmaesumi. Automatic segmentation of wrist bones in ct using a statistical wrist shape + pose model. IEEE Transactions on Medical Imaging, 35(8):1789–1801, 2016.
[2] E. M. A. Anas, A. Seitel, A. Rasoulian, P. S. John, T. Ungi, A. Lasso, K. Darras, D. Wilson, V. A. Lessoway, G. Fichtinger, and Z. M. Bone enhancement in ultrasound based on 3d local spectrum variation for percutaneous scaphoid fracture fixation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 465–473. Springer, 2016.
[3] E. Aromataris and D. Riitano. Systematic reviews: constructing a search strategy and searching for evidence. AJN The American Journal of Nursing, 114(5):49–56, 2014.
[4] L. A. Atweh and J. H. Kan. Multimodality imaging of developmental dysplasia of the hip. Pediatric Radiology, 43(1):166–171, 2013.
[5] N. Baka, S. Leenstra, and T. van Walsum. Ultrasound aided vertebral level localization for lumbar surgery. IEEE Transactions on Medical Imaging, 36(10):2138–2147, 2017.
[6] K. G. Baker, V. J. Robertson, and F. A. Duck. A review of therapeutic ultrasound: biophysical effects. Physical Therapy, 81(7):1351–1358, 2001.
[7] J. R. Bale, B. J. Stoll, and A. Lucas. Reducing mortality and morbidity from birth defects, 2003.
[8] D. H. Ballard. Generalizing the hough transform to detect arbitrary shapes. Pattern Recognition, 13(2):111–122, 1981.
[9] E. Bar-On, S. Meyer, G. Harati, and S. Porat. Ultrasonography of the hip in developmental hip dysplasia. J Bone Joint Surg Br, 80(2):321–324, 1998.
[10] S. Bernard, L. Heutte, and S. Adam. Influence of hyperparameters on random forest accuracy. In MCS, pages 171–180. Springer, 2009.
[11] V.
Bialik, G. M. Bialik, S. Blazer, P. Sujov, F. Wiener, and M. Berant. Developmental dysplasia of the hip: a new approach to incidence. Pediatrics, 103(1):93–99, 1999.
[12] D. Boal and E. Schwenkter. The infant hip: assessment with real-time us. Radiology, 157(3):667–672, 1985.
[13] C. E. Bonferroni. Teoria statistica delle classi e calcolo delle probabilita. Libreria internazionale Seeber, 1936.
[14] M. Borenstein, L. V. Hedges, J. Higgins, and H. R. Rothstein. A basic introduction to fixed-effect and random-effects models for meta-analysis. Research Synthesis Methods, 1(2):97–111, 2010.
[15] L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[16] A. Brounstein, I. Hacihaliloglu, P. Guy, A. Hodgson, and R. Abugharbieh. Towards real-time 3d us to ct bone image registration using phase and curvature feature based gmm matching. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2011, pages 235–242, 2011.
[17] A. Brounstein, I. Hacihaliloglu, P. Guy, A. Hodgson, and R. Abugharbieh. Fast and accurate data extraction for near real-time registration of 3-d ultrasound and computed tomography in orthopedic surgery. Ultrasound in Medicine & Biology, 41(12):3194–3204, 2015.
[18] J. R. Buchanan, R. Greer, and J. Cotler. Management strategy for prevention of avascular necrosis during treatment of congenital dislocation of the hip. J Bone Joint Surg Am, 63(1):140–146, 1981.
[19] J. Canny. A computational approach to edge detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:679–698, 1986.
[20] R. Caruana and A. Niculescu-Mizil. An empirical comparison of supervised learning algorithms. In Proceedings of the 23rd International Conference on Machine Learning, pages 161–168. ACM, 2006.
[21] J. Cashman, J. Round, G. Taylor, and N. Clarke. The natural history of developmental dysplasia of the hip after early supervised treatment in the pavlik harness. Bone & Joint Journal, 84(3):418–425, 2002.
[22] R. Cavalier, M. J. Herman, P. D. Pizzutillo, and E.
Geller. Ultrasound-guided aspiration of the hip in children: a new technique. Clinical Orthopaedics and Related Research, 415:244–247, 2003.
[23] E. Cheng, M. Mabee, V. G. Swami, Y. Pi, R. Thompson, S. Dulai, and J. L. Jaremko. Ultrasound quantification of acetabular rounding in hip dysplasia: reliability and correlation to treatment decisions in a retrospective study. Ultrasound in Medicine & Biology, 41(1):56–63, 2015.
[24] J. C. Cheng, Y. Chan, P. Hui, W. Shen, and C. Metreweli. Ultrasonographic hip morphometry in infants. Journal of Pediatric Orthopaedics, 14(1):24–28, 1994.
[25] C. Copuroglu, M. Ozcan, B. Aykac, B. Tuncer, and K. Saridogan. Reliability of ultrasonographic measurements in suspected patients of developmental dysplasia of the hip and correlation with the acetabular index. Indian Journal of Orthopaedics, 45(6):553, 2011.
[26] A. Criminisi, J. Shotton, and E. Konukoglu. Decision forests for classification, regression, density estimation, manifold learning and semi-supervised learning. Microsoft Research Cambridge, Tech. Rep. MSRTR-2011-114, 5(6):12, 2011.
[27] J. Czubak, T. Kotwicki, T. Piontek, and H. Skrzypek. Ultrasound measurements of the newborn hip comparison of two methods in 657 newborns. Acta Orthopaedica Scandinavica, 69(1):21–24, 1998.
[28] J. Czubak, T. Piontek, K. Niciejewski, P. Magnowski, M. Majek, and M. Płończak. Retrospective analysis of the non-surgical treatment of developmental dysplasia of the hip using pavlik harness and frejka pillow: comparison of both methods. Ortopedia, Traumatologia, Rehabilitacja, 6(1):9–13, 2004.
[29] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, volume 1, pages 886–893. IEEE, 2005.
[30] L. Danielsson. Late-diagnosed ddh: a prospective 11-year follow-up of 71 consecutive patients (75 hips). Acta Orthopaedica Scandinavica, 71(3):232–242, 2000.
[31] R. de Luis-Garcia and C. Alberola-Lopez.
Parametric 3d hip joint segmentation for the diagnosis of developmental dysplasia. In Engineering in Medicine and Biology Society, 2006. EMBS’06. 28th Annual International Conference of the IEEE, pages 4807–4810. IEEE, 2006.
[32] R. DerSimonian and N. Laird. Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3):177–188, 1986.
[33] M. Descoteaux, M. Audette, K. Chinzei, and K. Siddiqi. Bone enhancement filtering: application to sinus bone segmentation and simulation of pituitary surgery. Computer Aided Surgery, 11(5):247–255, 2006.
[34] C. Dezateux and K. Rosendahl. Developmental dysplasia of the hip. The Lancet, 369(9572):1541–1552, 2007.
[35] J. Dias, I. Thomas, A. Lamont, and B. Mody. The reliability of ultrasonographic assessment of neonatal hips. Bone & Joint Journal, 75(3):479–482, 1993.
[36] D. Dornacher, B. Cakir, H. Reichel, and M. Nelitz. Early radiological outcome of ultrasound monitoring in infants with developmental dysplasia of the hips. Journal of Pediatric Orthopaedics B, 19(1):27–31, 2010.
[37] D. O. Draper, S. Sunderland, D. T. Kirkendall, and M. Ricard. A comparison of temperature rise in human calf muscles following applications of underwater and topical gel ultrasound. Journal of Orthopaedic & Sports Physical Therapy, 17(5):247–251, 1993.
[38] K. Dwan, D. G. Altman, J. A. Arnaiz, J. Bloom, A.-W. Chan, E. Cronin, E. Decullier, P. J. Easterbrook, E. Von Elm, C. Gamble, and G. D. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PloS One, 3(8):e3081, 2008.
[39] M. Eidelman, A. Katzman, S. Freiman, E. Peled, and V. Bialik. Treatment of true developmental dysplasia of the hip using pavlik’s method. Journal of Pediatric Orthopaedics B, 12(4):253–258, 2003.
[40] D. Elbourne, C. Dezateux, R. Arthur, N. Clarke, A. Gray, A. King, A. Quinn, F. Gardner, G. Russell, and UK Collaborative Hip Trial Group.
Ultrasonography in the diagnosis and management of developmental hip dysplasia (uk hip trial): clinical and economic results of a multicentre randomised controlled trial. The Lancet, 360(9350):2009–2017, 2002.
[41] A. Falliner, D. Schwinzer, H.-J. Hahne, J. Hedderich, and J. Hassenpflug. Comparing ultrasound measurements of neonatal hips using the methods of graf and terjesen. Bone & Joint Journal, 88(1):104–106, 2006.
[42] Z. Fanti, F. Torres, and F. A. Cosío. Preliminary results in large bone segmentation from 3d freehand ultrasound. In IX International Seminar on Medical Information Processing and Analysis, pages 89220F–89220F. International Society for Optics and Photonics, 2013.
[43] M. Felsberg and G. Sommer. The monogenic signal. IEEE Transactions on Signal Processing, 49(12):3136–3144, 2001.
[44] S. Feng, S. K. Zhou, S. Good, and D. Comaniciu. Automatic fetal face detection from ultrasound volumes via learning 3d and 2d information. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2488–2495. IEEE, 2009.
[45] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems. J. Mach. Learn. Res, 15(1):3133–3181, 2014.
[46] P. Foroughi, E. Boctor, M. J. Swartz, R. H. Taylor, and G. Fichtinger. P6d-2 ultrasound bone segmentation using dynamic programming. In Ultrasonics Symposium, 2007. IEEE, pages 2523–2526. IEEE, 2007.
[47] A. F. Frangi, W. J. Niessen, K. L. Vincken, and M. A. Viergever. Multiscale vessel enhancement filtering. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 130–137. Springer, 1998.
[48] D. Golan, Y. Donner, C. Mansi, J. Jaremko, and M. Ramachandran. Fully automating graf’s method for ddh diagnosis using deep convolutional neural networks. In International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis, pages 130–141. Springer, 2016.
[49] L. Grady.
Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11):1768–1783, 2006.
[50] R. Graf. The diagnosis of congenital hip-joint dislocation by the ultrasonic combound treatment. Archives of Orthopaedic and Trauma Surgery, 97(2):117–133, 1980.
[51] R. Graf. New possibilities for the diagnosis of congenital hip joint dislocation by ultrasonography. Journal of Pediatric Orthopaedics, 3(3):354–359, 1983.
[52] R. Graf. Hip sonography: 20 years experience and results. Hip International: the Journal of Clinical and Experimental Research on Hip Pathology and Therapy, 17:S8–14, 2006.
[53] R. Graf, M. Mohajer, and F. Plattner. Hip sonography update. quality-management, catastrophes-tips and tricks. Medical Ultrasonography, 15(4):299, 2013.
[54] P. J. Groarke, L. McLoughlin, L. Whitla, P. Lennon, W. Curtin, and P. M. Kelly. Retrospective multicenter analysis of the accuracy of clinical examination by community physicians in diagnosing developmental dysplasia of the hip. The Journal of Pediatrics, 181:163–166, 2017.
[55] V. Gulati, K. Eseonu, J. Sayani, N. Ismail, C. Uzoigwe, M. Z. Choudhury, P. Gulati, A. Aqil, and S. Tibrewal. Developmental dysplasia of the hip in the newborn: A systematic review. World J Orthop, 4(2):32–41, 2013.
[56] I. Hacihaliloglu. Enhancement of bone shadow region using local phase-based ultrasound transmission maps. International Journal of Computer Assisted Radiology and Surgery, 12(6):951–960, 2017.
[57] I. Hacihaliloglu, R. Abugharbieh, A. J. Hodgson, and R. N. Rohling. Automatic adaptive parameterization in local phase feature-based bone segmentation in ultrasound. Ultrasound in Medicine & Biology, 37(10):1689–1703, 2011.
[58] I. Hacihaliloglu, A. Rasoulian, R. N. Rohling, and P. Abolmaesumi. Statistical shape model to 3d ultrasound registration for spine interventions using enhanced local phase features. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 361–368.
Springer, 2013.
[59] I. Hacihaliloglu, P. Guy, A. J. Hodgson, and R. Abugharbieh. Automatic extraction of bone surfaces from 3d ultrasound images in orthopaedic trauma cases. International Journal of Computer Assisted Radiology and Surgery, 10(8):1279–1287, 2015.
[60] J. A. Hanley and B. J. McNeil. The meaning and use of the area under a receiver operating characteristic (roc) curve. Radiology, 143(1):29–36, 1982.
[61] H. T. Harcke and L. E. Grissom. Performing dynamic sonography of the infant hip. AJR. American Journal of Roentgenology, 155(4):837–844, 1990.
[62] H. T. Harcke, N. Clarke, M. Lee, P. F. Borns, and G. D. MacEwen. Examination of the infant hip with real-time ultrasonography. Journal of Ultrasound in Medicine, 3(3):131–137, 1984.
[63] M. G. Harding, H. T. Harcke, J. R. Bowen, J. T. Guille, and J. Glutting. Management of dislocated hips with pavlik harness treatment and ultrasound monitoring. Journal of Pediatric Orthopaedics, 17(2):189–198, 1997.
[64] A. R. Hareendranathan, M. Mabee, K. Punithakumar, M. Noga, and J. L. Jaremko. A technique for semiautomatic segmentation of echogenic structures in 3d ultrasound, applied to infant hip dysplasia. International Journal of Computer Assisted Radiology and Surgery, 11(1):31–42, 2016.
[65] A. R. Hareendranathan, M. Mabee, K. Punithakumar, M. Noga, and J. L. Jaremko. Toward automated classification of acetabular shape in ultrasound for diagnosis of ddh: Contour alpha angle and the rounding index. Computer Methods and Programs in Biomedicine, 129:89–98, 2016.
[66] A. R. Hareendranathan, D. Zonoobi, M. Mabee, D. Cobzas, K. Punithakumar, M. Noga, and J. L. Jaremko. Toward automatic diagnosis of hip dysplasia from 2d ultrasound. In Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on, pages 982–985. IEEE, 2017.
[67] W. H. Harris. Etiology of osteoarthritis of the hip. Clinical Orthopaedics and Related Research, 213:20–33, 1986.
[68] M. Harris-Hayes and N. K. Royer.
Relationship of acetabular dysplasia and femoroacetabular impingement to hip osteoarthritis: a focused review. PM&R, 3(11):1055–1067, 2011.
[69] L. Hedges and I. Olkin. Statistical methods for meta-analysis. Orlando, FL: Academic Press, 1985.
[70] W. R. Hendee, E. R. Ritenour, and K. R. Hoffmann. Medical imaging physics. Medical Physics, 30(4):730–730, 2003.
[71] J. P. Higgins and S. Green. Cochrane handbook for systematic reviews of interventions, volume 4. John Wiley & Sons, 2011.
[72] F. T. Hoaglund. Primary osteoarthritis of the hip: a genetic disease caused by european genetic variants. J Bone Joint Surg Am, 95(5):463–468, 2013.
[73] K. J. Holen, T. Terjesen, A. Tegnander, T. Bredland, O. D. Saether, and S. H. Eik-Nes. Ultrasound screening for hip dysplasia in newborns. Journal of Pediatric Orthopaedics, 14(5):667–673, 1994.
[74] G. A. Hosny, W. Koizumi, and M. K. Benson. Ultrasound screening of the infant’s hip: introduction of a new combined angle. Journal of Pediatric Orthopaedics B, 11(3):204–211, 2002.
[75] J. Howick, I. Chalmers, P. Glasziou, T. Greenhalgh, and C. Heneghan. Oxford centre for evidence-based medicine 2011 levels of evidence. ocebm levels evid work gr, 2011.
[76] M. A. Hussain, A. Hodgson, and R. Abugharbieh. Robust bone detection in ultrasound using combined strain imaging and envelope signal power detection. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 356–363. Springer, 2014.
[77] M. A. Hussain, A. J. Hodgson, and R. Abugharbieh. Strain-initialized robust bone surface detection in 3-d ultrasound. Ultrasound in Medicine & Biology, 43(3):648–661, 2017.
[78] M. Imrie, V. Scott, P. Stearns, T. Bastrom, and S. J. Mubarak. Is ultrasound screening for ddh in babies born breech sufficient? Journal of Children’s Orthopaedics, 4(1):3–8, 2009.
[79] J. L. Jaremko, M. Mabee, V. G. Swami, L. Jamieson, K. Chow, and R. B. Thompson.
Potential for change in us diagnosis of hip dysplasia solely caused by changes in probe orientation: patterns of alpha-angle variation revealed by using three-dimensional us. Radiology, 273(3):870–878, 2014.
[80] R. Jia, S. Mellon, P. Monk, D. Murray, and J. A. Noble. A computer-aided tracking and motion analysis with ultrasound (cat & maus) system for the description of hip joint kinematics. International Journal of Computer Assisted Radiology and Surgery, 11(11):1965–1977, 2016.
[81] N. Jomha, J. McIvor, and G. Sterling. Ultrasonography in developmental hip dysplasia. Journal of Pediatric Orthopedics, 15(1):101–104, 1994.
[82] D. P. G. Jones, A. G. Vane, G. Coulter, P. Herbison, and J. D. Dunbar. Ultrasound measurements in the management of unstable hips treated with the pavlik harness: reliability and correlation with outcome. Journal of Pediatric Orthopaedics, 26(6):818–822, 2006.
[83] P. Jüni, F. Holenstein, J. Sterne, C. Bartlett, and M. Egger. Direction and impact of language bias in meta-analyses of controlled trials: empirical study. International Journal of Epidemiology, 31(1):115–123, 2002.
[84] A. Karamalis, W. Wein, T. Klein, and N. Navab. Ultrasound confidence maps using random walks. Medical Image Analysis, 16(6):1101–1112, 2012.
[85] J. Kawahara, J.-M. Peyrat, J. Abinahed, O. Al-Alao, A. Al-Ansari, R. Abugharbieh, and G. Hamarneh. Automatic labelling of tumourous frames in free-hand laparoscopic ultrasound video. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 676–683. Springer, 2014.
[86] A. Kolb, E. Benca, M. Willegger, S. E. Puchner, R. Windhager, and C. Chiari. Measurement considerations on examiner-dependent factors in the ultrasound assessment of developmental dysplasia of the hip. International Orthopaedics, pages 1–6, 2017.
[87] J. Kottner, L. Audigé, S. Brorson, A. Donner, B. J. Gajewski, A. Hróbjartsson, C. Roberts, M. Shoukri, and D. L. Streiner.
Guidelines for reporting reliability and agreement studies (grras) were proposed. International Journal of Nursing Studies, 48(6):661–671, 2011.
[88] P. Kovesi. Symmetry and asymmetry from local phase. In Tenth Australian Joint Conference on Artificial Intelligence, volume 190, pages 2–4. Citeseer, 1997.
[89] J. Kowal, C. Amstutz, F. Langlotz, H. Talib, and M. G. Ballester. Automated bone contour detection in ultrasound b-mode images for minimally invasive registration in computer-assisted surgery: an in vitro evaluation. The International Journal of Medical Robotics and Computer Assisted Surgery, 3(4):341–348, 2007.
[90] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[91] R. Kwitt, N. Vasconcelos, S. Razzaque, and S. Aylward. Localizing target structures in ultrasound video–a phantom study. Medical Image Analysis, 17(7):712–722, 2013.
[92] H. P. Lehmann, R. Hinton, P. Morello, and J. Santoli. Developmental dysplasia of the hip practice guideline: technical report. Pediatrics, 105(4):e57–e57, 2000.
[93] M. Mabee, S. Dulai, R. B. Thompson, and J. L. Jaremko. Reproducibility of acetabular landmarks and a standardized coordinate system obtained from 3d hip ultrasound. Ultrasonic Imaging, 37(4):267–276, 2015.
[94] M. G. Mabee, A. R. Hareendranathan, R. B. Thompson, S. Dulai, and J. L. Jaremko. An index for diagnosing infant hip dysplasia using 3-d ultrasound: the acetabular contact angle. Pediatric Radiology, 46(7):1023–1031, 2016.
[95] M. A. Maraci, R. Napolitano, A. Papageorghiou, and J. A. Noble. Searching for structures of interest in an ultrasound video sequence. In International Workshop on Machine Learning in Medical Imaging, pages 133–140. Springer, 2014.
[96] M. A. Maraci, C. P. Bridge, R. Napolitano, A. Papageorghiou, and J. A. Noble. A framework for analysis of linear ultrasound videos to detect fetal presentation and heartbeat.
Medical Image Analysis, 37:22–36, 2017.
[97] M. L. McHugh. Interrater reliability: the kappa statistic. Biochemia Medica, 22(3):276–282, 2012.
[98] C. Morin, H. Harcke, and G. MacEwen. The infant hip: real-time us assessment of acetabular development. Radiology, 157(3):673–677, 1985.
[99] K. Mulpuri, K. M. Song, M. J. Goldberg, and K. Sevarino. Detection and nonoperative management of pediatric developmental dysplasia of the hip in infants up to six months of age. Journal of the American Academy of Orthopaedic Surgeons, 23(3):202–205, 2015.
[100] S. Nakamura, S. Ninomiya, and T. Nakamura. Primary osteoarthritis of the hip joint in japan. Clinical Orthopaedics and Related Research, 241:190–196, 1989.
[101] S. R. Narum. Beyond bonferroni: less conservative analyses for conservation genetics. Conservation Genetics, 7(5):783–787, 2006.
[102] D. Ni, X. Yang, X. Chen, C.-T. Chin, S. Chen, P. A. Heng, S. Li, J. Qin, and T. Wang. Standard plane localization in ultrasound by radial component model and selective search. Ultrasound in Medicine & Biology, 40(11):2728–2742, 2014.
[103] J. A. Noble and D. Boukerroui. Ultrasound image segmentation: a survey. IEEE Transactions on Medical Imaging, 25(8):987–1010, 2006.
[104] American College of Radiology. Acr-aium practice guideline for the performance of the ultrasound examination for detection and assessment of developmental dysplasia of the hip (acr guidelines), 2012.
[105] T. Ojala, M. Pietikainen, and T. Maenpaa. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971–987, 2002.
[106] H. Ömeroğlu. Use of ultrasonography in developmental dysplasia of the hip. Journal of Children’s Orthopaedics, 8(2):105–113, 2014.
[107] H. Ömeroglu, A. Biçmoglu, S. Koparal, and S. Seber. Assessment of variations in the measurement of hip ultrasonography by the graf method in developmental dysplasia of the hip [s].
Journal of Pediatric Orthopaedics B, 10(2):89–95, 2001.
[108] M. M. Orak, T. Onay, T. Çağırmaz, C. Elibol, F. D. Elibol, and T. Centel. The reliability of ultrasonography in developmental dysplasia of the hip: How reliable is it in different hands? Indian Journal of Orthopaedics, 49(6):610, 2015.
[109] F. Ozdemir, E. Ozkan, and O. Goksel. Graphical modeling of ultrasound propagation in tissue for automatic bone segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 256–264. Springer, 2016.
[110] O. Paserin, N. Quader, K. Mulpuri, A. Cooper, A. Hodgson, and R. Abugharbieh. Quantifying dynamic assessment of developmental dysplasia of the hip. In 17th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, 2015.
[111] O. Paserin, K. Mulpuri, A. Cooper, A. J. Hodgson, and R. Abugharbieh. Automatic near real-time evaluation of 3d ultrasound scan adequacy for developmental dysplasia of the hip. In Computer Assisted and Robotic Endoscopy and Clinical Image-Based Procedures, pages 124–132. Springer, 2017.
[112] H. Patel and Canadian Task Force on Preventive Health Care. Preventive health care, 2001 update: screening and management of developmental dysplasia of the hip in newborns. Canadian Medical Association Journal, 164(12):1669–1677, 2001.
[113] C. D. Peterlein, K. F. Schüttler, S. Lakemeier, N. Timmesfeld, C. Görg, S. Fuchs-Winkelmann, and M. D. Schofer. Reproducibility of different screening classifications in ultrasonography of the newborn hip. BMC Pediatrics, 10(1):98, 2010.
[114] C.-D. Peterlein, S. Fuchs-Winkelmann, K.-F. Schüttler, S. Lakemeier, N. Timmesfeld, C. Görg, and M. D. Schofer. Does probe frequency influence diagnostic accuracy in newborn hip ultrasound? Ultrasound in Medicine & Biology, 38(7):1116–1120, 2012.
[115] J. L. Peters and K. L. Mengersen. Meta-analysis of repeated measures study designs. Journal of Evaluation in Clinical Practice, 14(5):941–950, 2008.
[116] V. Pollet, V. Percy, and H. J.
Prior. Relative risk and incidence for developmental dysplasia of the hip. The Journal of Pediatrics, 181:202–207, 2017.
[117] R. Pool, B. Foster, and D. Paterson. Avascular necrosis in congenital hip dislocation. the significance of splintage. Bone & Joint Journal, 68(3):427–430, 1986.
[118] E. N. Powell, F. J. Gerratana, and J. R. Gage. Open reduction for congenital hip dislocation: the risk of avascular necrosis with three different approaches. Journal of Pediatric Orthopaedics, 6(2):127–132, 1986.
[119] C. T. Price and B. A. Ramo. Prevention of hip dysplasia in children and adults. Orthopedic Clinics of North America, 43(3):269–279, 2012.
[120] N. Quader, A. Hodgson, and R. Abugharbieh. Confidence weighted local phase features for robust bone surface segmentation in ultrasound. In Workshop on Clinical Image-Based Procedures, pages 76–83. Springer, 2014.
[121] N. Quader, A. Hodgson, and R. Abugharbieh. Confidence weighted local phase features for robust bone surface segmentation in ultrasound. In Workshop on Clinical Image-Based Procedures, pages 76–83. Springer, 2014.
[122] N. Quader, A. Hodgson, and R. Abugharbieh. Assessing the feasibility of downsampling and wavelet resizing for real-time extraction of bone surfaces from 3d ultrasound. In 15th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, 2015.
[123] N. Quader, A. Hodgson, K. Mulpuri, and R. Abugharbieh. Improving diagnostic accuracy of hip dysplasia measures in 2d ultrasound scans of infants to guide decisions regarding need for surgery. In 15th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, 2015.
[124] N. Quader, A. Hodgson, K. Mulpuri, T. Savage, and R. Abugharbieh. Automatic assessment of developmental dysplasia of the hip. In Biomedical Imaging (ISBI), 2015 IEEE 12th International Symposium on, pages 13–16. IEEE, 2015.
[125] N. Quader, A. Hodgson, K. Mulpuri, A. Cooper, and R.
Abugharbieh. Towards reliable automatic characterization of neonatal hip dysplasia from 3d ultrasound images. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 602–609. Springer, 2016.
[126] N. Quader, A. Hodgson, K. Mulpuri, A. Cooper, E. Schaeffer, and R. Abugharbieh. A reliable automatic 2d measurement for developmental dysplasia of the hip. In 16th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, 2016.
[127] N. Quader, A. J. Hodgson, K. Mulpuri, A. Cooper, and R. Abugharbieh. A 3d femoral head coverage metric for enhanced reliability in diagnosing hip dysplasia. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 100–107. Springer, 2017.
[128] N. Quader, A. J. Hodgson, K. Mulpuri, A. Cooper, and R. Abugharbieh. 3d morphology-based diagnosis of developmental dysplasia of the hip using ultrasound imaging. Ultrasound in Medicine & Biology, 0(0):0–0, 2017.
[129] N. Quader, A. J. Hodgson, K. Mulpuri, E. Schaeffer, and R. Abugharbieh. Automatic evaluation of scan adequacy and dysplasia metrics in 2-d ultrasound images of the neonatal hip. Ultrasound in Medicine & Biology, 43(6):1252–1262, 2017.
[130] B. Rahmatullah, A. Papageorghiou, and J. A. Noble. Automated selection of standardized planes from ultrasound volume. In International Workshop on Machine Learning in Medical Imaging, pages 35–42. Springer, 2011.
[131] I. Rakovac, A. Tudor, B. Sestan, T. Prpic, G. Gulan, T. Madarevic, V. Santic, and L. Ruzic. New l value parameter simplifies and enhances hip ultrasound interpretation in the detection of developmental dysplasia of the hip. International Orthopaedics, 35(10):1523–1528, 2011.
[132] P. Ramsey, S. Lasser, and G. MacEwen. Congenital dislocation of the hip. use of the pavlik harness in the child during the first six months of life. J Bone Joint Surg Am, 58(7):1000–1004, 1976.
[133] S. Ramwadhdoebe, R. Sakkers, C. S. Uiterwaal, M. M. Boere-Boonekamp, and F. J.
Beek. Evaluation of a training program for general ultrasound screening for developmental dysplasia of the hip in preventive child health care. Pediatric Radiology, 40(10):1634–1639, 2010.
[134] H. Rivaz, E. M. Boctor, M. A. Choti, and G. D. Hager. Real-time regularized ultrasound elastography. IEEE Transactions on Medical Imaging, 30(4):928–945, 2011.
[135] E. Roovers, M. M. Boere-Boonekamp, T. Geertsma, G. Zielhuis, and A. Kerkhoff. Ultrasonographic screening for developmental dysplasia of the hip in infants. Bone & Joint Journal, 85(5):726–730, 2003.
[136] E. Roovers, M. Boere-Boonekamp, R. Castelein, G. Zielhuis, and T. Kerkhoff. Effectiveness of ultrasound screening for developmental dysplasia of the hip. Archives of Disease in Childhood-Fetal and Neonatal Edition, 90(1):F25–F30, 2005.
[137] A. Roposch and J. G. Wright. Increased diagnostic information and understanding disease: Uncertainty in the diagnosis of developmental hip dysplasia 1. Radiology, 242(2):355–359, 2007.
[138] A. Roposch, R. Graf, and J. G. Wright. Determining the reliability of the graf classification for hip dysplasia. Clinical Orthopaedics and Related Research, 447:119–124, 2006.
[139] A. Roposch, N. M. Moreau, E. Uleryk, and A. S. Doria. Developmental dysplasia of the hip: Quality of reporting of diagnostic accuracy for us 1. Radiology, 241(3):854–860, 2006.
[140] M. R. Rosenberg, R. Walton, E. A. Rae, S. Bailey, and R. O. Nicol. Intra-articular dysplasia of the femoral head in developmental dysplasia of the hip. Journal of Pediatric Orthopaedics B, 26(4):298–302, 2017.
[141] K. Rosendahl, A. Aslaksen, R. Lie, and T. Markestad. Reliability of ultrasound in the early diagnosis of developmental dysplasia of the hip. Pediatric Radiology, 25(3):219–224, 1995.
[142] J. A. Rosenthal, X. Lu, and P. Cram. Availability of consumer prices from us hospitals for a common surgical procedure. JAMA Internal Medicine, 173(6):427–432, 2013.
[143] M. Salehi, R. Prevost, J.-L. Moctezuma, N. Navab, and W. Wein.
Preciseultrasound bone registration with learning-based segmentation and speed ofsound calibration. In International Conference on Medical ImageComputing and Computer-Assisted Intervention, pages 682–690. Springer,2017.[144] E. J. Sarkissian, W. N. Sankar, X. Zhu, C. H. Wu, and J. M. Flynn.Radiographic follow-up of ddh in infants: are x-rays necessary after anormalized ultrasound? Journal of Pediatric Orthopaedics, 35(6):551–555,2015.[145] M. Sewell, K. Rosendahl, and D. Eastwood. Developmental dysplasia ofthe hip. BMJ (CR)-print, 339(4):b4454, 2009.[146] E. Sheiner, R. Hackmon, I. Shoham-Vardi, X. Pombar, M. Hussey,H. Strassner, and J. Abramowicz. A comparison between acoustic outputindices in 2d and 3d/4d ultrasound in obstetrics. Ultrasound in obstetrics &gynecology, 29(3):326–328, 2007.[147] S. A. Shipman, M. Helfand, V. A. Moyer, and B. P. Yawn. Screening fordevelopmental dysplasia of the hip: a systematic literature review for the uspreventive services task force. Pediatrics, 117(3):e557–e576, 2006.[148] D. Shorter, T. Hong, and D. A. Osborn. Screening programmes fordevelopmental dysplasia of the hip in newborn infants. Sao Paulo MedicalJournal, 131(2):139–140, 2013.150[149] E. Simon, F. Saur, M. Buerge, R. Glaab, M. Roos, and G. Kohler.Inter-observer agreement of ultrasonographic measurement of alpha andbeta angles and the final type classification based on the graf method. SwissMedical Weekly, 134(45-46):671–677, 2004.[150] E. Smergel, S. B. Losik, and H. K. Rosenberg. Sonography of hipdysplasia. Ultrasound Quarterly, 20(4):201–216, 2004.[151] S. Sun. Meta-analysis of cohens kappa. Health Services and OutcomesResearch Methodology, 11(3-4):145–163, 2011.[152] R. Taylor. Interpretation of the correlation coefficient: a basic review.Journal of Diagnostic Medical Sonography, 6(1):35–39, 1990.[153] S. R. Teixeira, V. F. Dalto, D. A. Maranho, O. S. Zoghbi-Neto, J. B.Volpon, and M. H. Nogueira-Barbosa. 
Comparison between graf methodand pubo-femoral distance in neutral and flexion positions to diagnosedevelopmental dysplasia of the hip. European Journal of Radiology, 84(2):301–306, 2015.[154] T. Terjesen. Ultrasound as the primary imaging method in the diagnosis ofhip dysplasia in children aged< 2 years., 1996.[155] T. Terjesen, T. Ø. Runden, and A˚. Tangerud. Ultrasonography andradiography of the hip in infants. Acta Orthopaedica Scandinavica, 60(6):651–660, 1989.[156] P. H. Torr and A. Zisserman. Mlesac: A new robust estimator withapplication to estimating image geometry. Computer Vision and ImageUnderstanding, 78(1):138–156, 2000.[157] C. Tre´guier, M. Chapuis, B. Branger, B. Bruneau, A. Grellier, K. Chouklati,M. Proisy, P. Darnault, P. Violas, P. Pladys, and Y. Gandon. Pubo-femoraldistance: an easy sonographic screening test to avoid late diagnosis ofdevelopmental dysplasia of the hip. European Radiology, 23(3):836–844,2013.[158] C. Tschauner and H. Matthiessen. Hu¨ftsonografie nach GRAF beiSa¨uglingen: Checklisten helfen, Fehler zu vermeiden, 2012.[159] J. J. Tucci, S. J. Kumar, J. T. Guille, and E. R. Rubbo. Late acetabulardysplasia following early successful pavlik harness treatment of congenitaldislocation of the hip. JPO: Journal of Prosthetics and Orthotics, 3(4):502–505, 1991.151[160] Y. Tumer, W. T. Ward, and J. Grudziak. Medial open reduction in thetreatment of developmental dislocation of the hip. Journal of PediatricOrthopaedics, 17(2):176–180, 1997.[161] T. Varghese. Quasi-static ultrasound elastography. Ultrasound Clinics, 4(3):323–338, 2009.[162] X. Wang, T. X. Han, and S. Yan. An hog-lbp human detector with partialocclusion handling. In Computer Vision, 2009 IEEE 12th InternationalConference on, pages 32–39. IEEE, 2009.[163] T. Woodacre, A. Dhadwal, T. Ball, C. Edwards, and P. Cox. The costs oflate detection of developmental dysplasia of the hip. Journal of Children’sOrthopaedics, 8(4):325–332, 2014.[164] M. Yaqub, B. Kelly, A. T. 
Appendix A

Examples of Automatic Dysplasia Metric Extraction from 2D US Images

The following figures show examples of the automatic extraction of 2D US-based dysplasia metrics in a healthy hip, a borderline hip and a dysplastic hip. We found that our approximations of the ilium, labrum, acetabulum and femoral head, and our estimates of the 2D US-based dysplasia metrics, are realistic.

Figure A.1: Visualization of the dysplasia metrics extracted from a 2D US image of a healthy hip. In this example, the automatically extracted dysplasia metrics are α = 70°, β = 29° and FHC = 73.2%.

Figure A.2: Visualization of the dysplasia metrics extracted from a 2D US image of a borderline hip. In this example, the automatically extracted dysplasia metrics are α = 47°, β = 40° and FHC = 59.8%.

Figure A.3: Visualization of the dysplasia metrics extracted from a 2D US image of a dysplastic hip. In this example, the automatically extracted dysplasia metrics are α = 38°, β = 56° and FHC = 41%.

Appendix B

Examples of Automatic Dysplasia Metric Extraction from 3D US Images

The following figures show examples of the automatic extraction of 3D US-based dysplasia metrics in a healthy hip, a borderline hip and a dysplastic hip. We found that our approximations of the ilium, acetabulum and femoral head, and our estimates of the 3D US-based dysplasia metrics, are realistic.

Figure B.1: Visualization of the dysplasia metrics extracted from a 3D US image of a healthy hip. In this example, the automatically extracted dysplasia metrics are α = 71.8° and FHC = 72.4%. The visualization here was generated automatically using MATLAB 2017a (The MathWorks Inc., Natick, MA, USA). The 3D plot generated in MATLAB was later manually rotated to make the planes of the ilium and acetabulum more apparent. In Chapter 5, the visualization was done manually using AMIRA software (TGS, San Diego, USA).

Figure B.2: Visualization of the dysplasia metrics extracted from a 3D US image of a borderline hip. In this example, the automatically extracted dysplasia metrics are α = 53.1° and FHC = 47.9%. The visualization here was generated automatically using MATLAB 2017a (The MathWorks Inc., Natick, MA, USA). The 3D plot generated in MATLAB was later manually rotated to make the planes of the ilium and acetabulum more apparent.

Figure B.3: Visualization of the dysplasia metrics extracted from a 3D US image of a dysplastic hip. In this example, the automatically extracted dysplasia metrics are α = 35.6° and FHC = 22.3%.
