{"Affiliation":[{"label":"Affiliation","value":"Applied Science, Faculty of","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","classmap":"vivo:EducationalProcess","property":"vivo:departmentOrSchool"},"iri":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","explain":"VIVO-ISF Ontology V1.6 Property; The department or school name within institution; Not intended to be an institution name."},{"label":"Affiliation","value":"Biomedical Engineering, School of","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","classmap":"vivo:EducationalProcess","property":"vivo:departmentOrSchool"},"iri":"http:\/\/vivoweb.org\/ontology\/core#departmentOrSchool","explain":"VIVO-ISF Ontology V1.6 Property; The department or school name within institution; Not intended to be an institution name."}],"AggregatedSourceRepository":[{"label":"Aggregated Source Repository","value":"DSpace","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","classmap":"ore:Aggregation","property":"edm:dataProvider"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/dataProvider","explain":"A Europeana Data Model Property; The name or identifier of the organization who contributes data indirectly to an aggregation service (e.g. 
Europeana)"}],"Campus":[{"label":"Campus","value":"UBCV","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","classmap":"oc:ThesisDescription","property":"oc:degreeCampus"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeCampus","explain":"UBC Open Collections Metadata Components; Local Field; Identifies the name of the campus from which the graduate completed their degree."}],"Creator":[{"label":"Creator","value":"El-Hariri, Houssam","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/creator","classmap":"dpla:SourceResource","property":"dcterms:creator"},"iri":"http:\/\/purl.org\/dc\/terms\/creator","explain":"A Dublin Core Terms Property; An entity primarily responsible for making the resource.; Examples of a Contributor include a person, an organization, or a service."}],"DateAvailable":[{"label":"Date Available","value":"2020-03-10T17:36:53Z","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/issued","classmap":"edm:WebResource","property":"dcterms:issued"},"iri":"http:\/\/purl.org\/dc\/terms\/issued","explain":"A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource."}],"DateIssued":[{"label":"Date Issued","value":"2020","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/issued","classmap":"oc:SourceResource","property":"dcterms:issued"},"iri":"http:\/\/purl.org\/dc\/terms\/issued","explain":"A Dublin Core Terms Property; Date of formal issuance (e.g., publication) of the resource."}],"Degree":[{"label":"Degree (Theses)","value":"Master of Applied Science - MASc","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","classmap":"vivo:ThesisDegree","property":"vivo:relatedDegree"},"iri":"http:\/\/vivoweb.org\/ontology\/core#relatedDegree","explain":"VIVO-ISF Ontology V1.6 Property; The thesis degree; Extended Property specified by UBC, as per https:\/\/wiki.duraspace.org\/display\/VIVO\/Ontology+Editor%27s+Guide"}],"DegreeGrantor":[{"label":"Degree 
Grantor","value":"University of British Columbia","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","classmap":"oc:ThesisDescription","property":"oc:degreeGrantor"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeGrantor","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the institution where the thesis was granted."}],"Description":[{"label":"Description","value":"Developmental Dysplasia of the Hip is one of the most common congenital disorders. Misdiagnosis leads to financial consequences and reduced quality of life. The current standard diagnostic technique involves imaging the hip with ultrasound and extracting metrics such as the \u03b1 angle. This technique has been shown to be unreliable because human error in probe positioning leads to misdiagnosis. 3D ultrasound, being more robust to errors in probe positioning, has been introduced as a more reliable alternative. In this thesis, we aim to further improve the image processing techniques of the 3D ultrasound-based system, addressing three components: segmentation, metrics extraction, and adequacy classification. \r\n\r\nSegmentation in 3D is prohibitively slow when performed manually and introduces human error. Previous work introduced automatic segmentation techniques, but our observations indicate a lack of accuracy and robustness with these techniques. We propose to use deep Convolutional Neural Networks (CNNs) to improve the segmentation accuracy and, consequently, the reproducibility and robustness of dysplasia measurement. We show that 3D-U-Net achieves higher agreement with human labels than the state of the art. For pelvis bone surface segmentation, we report a mean Dice-Sorensen Coefficient (DSC) of 85% with 3D-U-Net vs. 26% with Confidence-Weighted Structured Phase Symmetry (CSPS). For femoral head segmentation, we report a mean Centre Euclidean Distance (CED) error of 1.42 mm with 3D-U-Net vs. 3.90 mm with the Random Forest Classifier. 
\r\n\r\nWe implement methods for extracting the \u03b1\u2083D, Femoral Head Coverage (FHC\u2083D), and Osculating Circle Radius (OCR) dysplasia metrics using the improved segmentation. On a clinical set of 42 hips, we report inter-exam, intra-sonographer intraclass correlation coefficients of 87%, 84%, and 74% for these three metrics, respectively, outperforming the state of the art. Qualitative observations show improved robustness and reduced failure rates. \r\n\r\nPrevious work explored automatic adequacy classification of 3D hip ultrasound to provide clinicians with rapid point-of-care feedback on the quality of the scan. We revisit the originally proposed adequacy criteria and show that these criteria can be improved. Further, we show that 3D CNNs can be used to automate this task. Our best model shows good agreement with human labels, achieving an area under the receiver operating characteristic curve (AROC) of 84%. \r\n\r\nUltimately, we aim to incorporate these models into a fully automatic, accurate, reliable, and robust system for hip dysplasia diagnosis.","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/description","classmap":"dpla:SourceResource","property":"dcterms:description"},"iri":"http:\/\/purl.org\/dc\/terms\/description","explain":"A Dublin Core Terms Property; An account of the resource.; Description may include but is not limited to: an abstract, a table of contents, a graphical representation, or a free-text account of the resource."}],"DigitalResourceOriginalRecord":[{"label":"Digital Resource Original Record","value":"https:\/\/circle.library.ubc.ca\/rest\/handle\/2429\/73709?expand=metadata","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","classmap":"ore:Aggregation","property":"edm:aggregatedCHO"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/aggregatedCHO","explain":"A Europeana Data Model Property; The identifier of the source object, e.g. the Mona Lisa itself. 
This could be a full linked open data URI or an internal identifier"}],"FullText":[{"label":"Full Text","value":"Reliable and Robust Hip Dysplasia Measurement with Three-Dimensional Ultrasound and Convolutional Neural Networks\r\n\r\nby\r\nHoussam El-Hariri\r\nB.A.Sc., Simon Fraser University, 2015\r\n\r\nA THESIS SUBMITTED IN PARTIAL FULFILLMENT\r\nOF THE REQUIREMENTS FOR THE DEGREE OF\r\nMaster of Applied Science\r\nin\r\nTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES\r\n(Biomedical Engineering)\r\n\r\nThe University of British Columbia\r\n(Vancouver)\r\nMarch 2020\r\n\r\n\u00a9 Houssam El-Hariri, 2020\r\n\r\nThe following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the thesis entitled:\r\nReliable and Robust Hip Dysplasia Measurement with Three-Dimensional Ultrasound and Convolutional Neural Networks\r\nsubmitted by Houssam El-Hariri in partial fulfillment of the requirements for the degree of Master of Applied Science in Biomedical Engineering.\r\n\r\nExamining Committee:\r\nRafeef Garbi, Electrical and Computer Engineering, Co-supervisor\r\nAntony J. Hodgson, Mechanical Engineering, Co-supervisor\r\nKishore Mulpuri, Orthopaedics, Supervisory Committee Member\r\nPeter Cripton, Mechanical Engineering, Supervisory Committee Member\r\n\r\nAbstract\r\n\r\nDevelopmental Dysplasia of the Hip is one of the most common congenital disorders. Misdiagnosis leads to financial consequences and reduced quality of life. The current standard diagnostic technique involves imaging the hip with ultrasound and extracting metrics such as the \u03b1 angle. This technique has been shown to be unreliable because human error in probe positioning leads to misdiagnosis. 3D ultrasound, being more robust to errors in probe positioning, has been introduced as a more reliable alternative. 
In this thesis, we aim to further improve the image processing techniques of the 3D ultrasound-based system, addressing three components: segmentation, metrics extraction, and adequacy classification.\r\n\r\nSegmentation in 3D is prohibitively slow when performed manually and introduces human error. Previous work introduced automatic segmentation techniques, but our observations indicate a lack of accuracy and robustness with these techniques. We propose to use deep Convolutional Neural Networks (CNNs) to improve the segmentation accuracy and, consequently, the reproducibility and robustness of dysplasia measurement. We show that 3D-U-Net achieves higher agreement with human labels than the state of the art. For pelvis bone surface segmentation, we report a mean Dice-Sorensen Coefficient (DSC) of 85% with 3D-U-Net vs. 26% with Confidence-Weighted Structured Phase Symmetry (CSPS). For femoral head segmentation, we report a mean Centre Euclidean Distance (CED) error of 1.42 mm with 3D-U-Net vs. 3.90 mm with the Random Forest Classifier.\r\n\r\nWe implement methods for extracting the \u03b1\u2083D, Femoral Head Coverage (FHC\u2083D), and Osculating Circle Radius (OCR) dysplasia metrics using the improved segmentation. On a clinical set of 42 hips, we report inter-exam, intra-sonographer intraclass correlation coefficients of 87%, 84%, and 74% for these three metrics, respectively, outperforming the state of the art. Qualitative observations show improved robustness and reduced failure rates.\r\n\r\nPrevious work explored automatic adequacy classification of 3D hip ultrasound to provide clinicians with rapid point-of-care feedback on the quality of the scan. We revisit the originally proposed adequacy criteria and show that these criteria can be improved. Further, we show that 3D CNNs can be used to automate this task. Our best model shows good agreement with human labels, achieving an area under the receiver operating characteristic curve (AROC) of 84%.\r\n\r\nUltimately, we aim to incorporate these models into a fully automatic, accurate, reliable, and robust system for hip dysplasia diagnosis.\r\n\r\nLay Summary\r\n\r\nThe human hip is roughly a ball-and-socket joint. 
Some babies are born with hip dysplasia, which means the socket is less rounded than normal and the ball is less stable; this can lead to trouble walking and other problems. Ultrasound is used to scan newborn babies for hip dysplasia, but doctors can still make mistakes when using ultrasound. We want to avoid these mistakes because they lead to wasteful and potentially risky treatment options. In this thesis, we use 3-dimensional ultrasound and modern artificial intelligence techniques to help clinicians diagnose hip dysplasia more accurately and make fewer mistakes.\r\n\r\nPreface\r\n\r\nThe work in this thesis is part of the Developmental Dysplasia of the Hip (DDH) project, which was started by my three supervisors, Drs. Garbi, Hodgson, and Mulpuri, and builds on work previously conducted by alumni of the lab, including N. Quader and O. Paserin. The research in this thesis was conducted in the Biomedical Signal and Image Processing Lab at UBC, the Surgical Technologies Lab at UBC, and the Orthopedic Research Lab at British Columbia Children\u2019s Hospital (BCCH). This work was approved by the UBC Clinical Research Ethics Board (CREB), certificate numbers H14-01448, H18-00131, and H18-02024. Chapter 3 is based on the article listed below. The remaining chapters are based on work that is not yet published.\r\n\r\nChapter 3 is based on the following paper, of which I was the first author, and which was reviewed and approved by the other authors:\r\nEl-Hariri H., Mulpuri K., Hodgson A., Garbi R. (2019) Comparative Evaluation of Hand-Engineered and Deep-Learned Features for Neonatal Hip Bone Segmentation in Ultrasound. In: Shen D. et al. (eds) Medical Image Computing and Computer Assisted Intervention \u2013 MICCAI 2019. MICCAI 2019. Lecture Notes in Computer Science, vol 11765. Springer, Cham.\r\n\r\nClinical data was collected at BCCH by our lab with the help of clinical technicians, nurses, and orthopedic surgeons. I led data collection in 2019; prior to 2019, data collection was led by N. 
Quader and O. Paserin. I organized and labelled all the data. The studies conducted in this thesis were designed mainly by me, with technical guidance from Drs. Hodgson and Garbi and clinical guidance from Dr. Mulpuri. All technical work, including coding, training and testing of neural networks, clinical evaluations, and statistical analyses, was done by me. Drs. Hodgson and Garbi also helped review this thesis.\r\n\r\nTable of Contents\r\n\r\nAbstract\r\nLay Summary\r\nPreface\r\nTable of Contents\r\nList of Tables\r\nList of Figures\r\nGlossary\r\nAcknowledgments\r\n1 Introduction\r\n1.1 Developmental Dysplasia of the Hip\r\n1.1.1 Epidemiology\r\n1.1.2 Diagnosis: Standard Clinical Practice\r\n1.1.3 Treatment and Consequences of Misdiagnosis\r\n1.1.4 Problem: Low Reliability of 2D Ultrasound for DDH Diagnosis\r\n1.1.5 Solution: 3D Ultrasound\r\n1.2 Overall Objective\r\n1.3 Related Work\r\n1.3.1 Segmentation\r\n1.3.2 Adequacy Classification\r\n1.4 Research Questions Addressed\r\n
1.5 Contributions\r\n1.6 Thesis Outline\r\n2 Clinical Protocol and Description of Data\r\n2.1 Inclusion and Exclusion Criteria\r\n2.2 3D-US Data Collection Protocol\r\n2.3 Data Used in Each Chapter\r\n2.4 Coordinate System Description\r\n3 Comparative Evaluation of Hand-Engineered and Deep-Learned Features for Neonatal Hip Bone Segmentation in Ultrasound\r\n3.1 Methods\r\n3.1.1 Hand-Crafted Features\r\n3.1.2 Deep-Learned Features\r\n3.1.3 Testing\r\n3.2 Results and Discussion\r\n3.3 Conclusions\r\n4 Measuring Hip Dysplasia with 3-Dimensional Convolutional Neural Networks\r\n4.1 Pelvis Bone Surface Segmentation: Going 3D\r\n4.1.1 Labeling\r\n4.1.2 Training\r\n4.1.3 Testing\r\n4.1.4 Results and Discussion\r\n4.1.5 Conclusions\r\n4.2 Locating the Femoral Head\r\n4.2.1 Labeling\r\n4.2.2 Direct Regression\r\n4.2.3 Segmentation\r\n4.2.4 Testing\r\n
4.2.5 Results and Discussion\r\n4.2.6 Conclusions\r\n4.3 Extracting Dysplasia Metrics\r\n4.3.1 Choosing DDH Metrics\r\n4.3.2 Algorithm for Extracting the Metrics\r\n4.3.3 Clinical Study\r\n4.3.4 Results and Discussion\r\n4.3.5 Conclusions\r\n5 Automatic Adequacy Assessment with 3-Dimensional Convolutional Neural Networks\r\n5.1 Labeling with New Criteria\r\n5.1.1 Evaluation Scheme\r\n5.1.2 Results and Discussion\r\n5.2 Automatic Adequacy Classification with 3D-CNNs\r\n5.2.1 Classification Model\r\n5.2.2 Labeling Data for CNN Training\r\n5.2.3 Training\r\n5.2.4 Testing\r\n5.2.5 Results and Discussion\r\n5.2.6 Conclusions\r\n6 Discussion and Conclusions\r\n6.1 Revisiting Research Questions and comparing the State-of-the-Art\r\n6.1.1 Research Question 1\r\n6.1.2 Research Question 2\r\n6.1.3 Research Question 3\r\n6.1.4 Research Question 4\r\n6.2 Limitations\r\n6.3 Future Work\r\n
6.3.1 Domain Shift and Adaptation\r\n6.3.2 Improved Clinical Study\r\n6.3.3 Detectability of Failure: Deep Learning with Uncertainty\r\n6.4 Clinical Impact and Significance\r\nBibliography\r\nA \u00a72 Supporting Materials\r\nB \u00a74.1 Supporting Materials\r\nC \u00a74.2 Supporting Materials\r\nD \u00a74.3 Supporting Materials\r\nE \u00a75 Supporting Materials\r\n\r\nList of Tables\r\n\r\nTable 1.1 Ranges of key DDH metrics.\r\nTable 3.1 Mean (and SD) segmentation accuracy of five methods we tested on the primary and secondary datasets. From left to right: 1) Shadow Peak with RoI spatial prior, 2) Confidence-Weighted Structured Phase Symmetry with naive thresholding, 3) Confidence-Weighted Structured Phase Symmetry with RoI spatial prior, 4) U-Net with B-mode input, 5) U-Net with multi-channel input. Best performers along each row are bolded.\r\nTable 4.1 Results showing inter-exam, intra-sonographer ICC of our proposed pipeline for computing \u03b1\u2083D, FHC\u2083D and OCR vs. the state-of-the-art methods (n=42 hips).\r\nTable 4.2 Qualitative plausibility analysis of adequate sweeps in which a large discrepancy between our methods and Quader\u2019s (SOTA) was detected. Numbers reported are percentages of successful and plausible measurements out of 51 sweeps visually inspected. Inspection was performed by an unbiased rater other than the author of this thesis.\r\nTable 5.1 Comparing inter-exam, intra-rater test-retest ICC with different adequacy criteria. 
The number of sweeps remaining after discarding inadequate sweeps, n, is shown in parentheses beside each column header. The 95% CI is reported in parentheses next to each ICC value.\r\nTable 5.2 Adequacy train and test set class distribution.\r\nTable 5.3 AROC scores of three contrasted models when applied to the test set. In the first row, we ignore sweeps in the test set labelled as \u201cmaybe\u201d. In the second row, we assign all sweeps labelled as \u201cmaybe\u201d to the \u201cinadequate\u201d class.\r\nTable 5.4 Inter-exam, intra-sonographer ICC with the proposed CNNs. n is the number of remaining sweeps after \u201cinadequate\u201d sweeps are discarded. 95% CIs are shown in parentheses.\r\nTable B.1 Mean performance metrics for the four contrasted methods on a test set of 52 volumes from 13 participants.\r\nTable B.2 Precision post hoc t-test p-values.\r\nTable B.3 Recall post hoc t-test p-values.\r\nTable B.4 Jaccard Coefficient post hoc t-test p-values.\r\nTable B.5 Dice-Sorensen Coefficient post hoc t-test p-values.\r\nTable B.6 MEDR2P post hoc t-test p-values.\r\nTable B.7 MEDP2R post hoc t-test p-values.\r\nTable B.8 MEDmax post hoc t-test p-values.\r\nTable B.9 HDR2P post hoc t-test p-values.\r\nTable B.10 HDP2R post hoc t-test p-values.\r\nTable B.11 HDmax post hoc t-test p-values.\r\nTable B.12 CAI post hoc t-test p-values.\r\nTable B.13 CDI post hoc t-test p-values.\r\n
Table C.1 Results comparing the two proposed methods with the state-of-the-art RFC for predicting the location of the femoral head. Note that the RFC and 3D-ResNet-50 were compared against the full-sphere label as ground truth (as described in \u00a74.2.1), whereas 3D-U-Net was compared against the semi-sphere cropped by bounding box B as ground truth.\r\nTable C.2 Precision post hoc t-test p-values.\r\nTable C.3 Recall post hoc t-test p-values.\r\nTable C.4 Jaccard Coefficient post hoc t-test p-values.\r\nTable C.5 DSC post hoc t-test p-values.\r\nTable C.6 CAEx post hoc t-test p-values.\r\nTable C.7 CAEy post hoc t-test p-values.\r\nTable C.8 CAEz post hoc t-test p-values.\r\nTable C.9 CED post hoc t-test p-values.\r\nTable C.10 RAE post hoc t-test p-values.\r\nTable D.1 Comparing the SD for paired inter-exam measures for the different DDH metrics (n=42 hips).\r\n\r\nList of Figures\r\n\r\nFigure 1.1 Ultrasound anatomical landmarks in the standard plane described by Graf [13]: 1) chondro-osseous junction; 2) femoral head; 3) synovial fold; 4) joint capsule; 5) acetabular labrum; 6) hyaline cartilage; 7) bony part of the acetabular roof; 8) bony rim: turning point from concavity to convexity.\r\nFigure 1.2 Illustration of the \u03b1 angle proposed by Graf.\r\nFigure 1.3 Illustration of Femoral Head Coverage proposed by Morin.\r\nFigure 1.4 Simplified map showing socioeconomic consequences of DDH misdiagnosis in newborns. (TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, OA: Osteoarthritis, THR: Total Hip Replacement)\r\n
Figure 1.5 Visualizing the performance of Quader\u2019s methods. Problems shown include over-segmentation, under-segmentation, and incorrect plane fitting.\r\nFigure 1.6 High-level conceptual design of our system.\r\nFigure 2.1 Ultrasonix 4DL14-5 3D ultrasound probe.\r\nFigure 2.2 Data collected per patient in Phase III, showing the number of exams per patient and the number of sweeps per exam.\r\nFigure 2.3 Summary of 3D ultrasound data we collected from BCCH, showing details of the training and testing data used for each chapter in this thesis.\r\nFigure 2.4 Coordinate system conventions used in this thesis.\r\nFigure 3.1 Example labeling procedure. Left: B-mode image with Structured Phase Symmetry overlaid in red, and user-defined points shown as asterisks. Right: contour fitted to user-defined points, shown as a solid red line.\r\nFigure 3.2 Example segmentation results. a,b) Different segmentation techniques, including SP, CSPS, and U-Net, applied to Ultrasonix test data. c) The same techniques applied to Clarius test data.\r\nFigure 4.1 To train 3D-U-Net, we use U-Net predictions (left) from the previous chapter as a starting point and manually fix areas that are incorrectly segmented to get the \u201chuman\u201d label (right).\r\nFigure 4.2 Visualizing pelvis bone surface segmentation with the contrasted methods. a) human label; b) CSPS; c) U-Net; d) 3D-U-Net. Red arrows point to areas that are over-segmented (false positives).\r\nFigure 4.3 Pixel-wise classification evaluation for pelvis bone surface segmentation.\r\nFigure 4.4 Contour distance evaluation metrics for pelvis bone surface segmentation.\r\n
Figure 4.5 Combined evaluation metrics for pelvis bone surface segmentation.\r\nFigure 4.6 Femoral head keypoint placement in 3D Slicer.\r\nFigure 4.7 Fitting a sphere to the key edge points. We show the full sphere in green and the cropped sphere in yellow.\r\nFigure 4.8 Conceptual illustration of 3D convolutional neural networks used for direct regression of sphere parameters.\r\nFigure 4.9 Conceptual illustration of 3D-U-Net used for segmenting the femoral head.\r\nFigure 4.10 Visualizing Quader\u2019s [61, 64] RFC prediction of the femoral head. Left: hip ultrasound with human-labelled keypoints of the femoral head in red. Right: binary segmentation mask output of the RFC in green.\r\nFigure 4.11 Visualizing the output of the 3D-ResNet-50 direct regression model for femoral head localization. Left: hip ultrasound with human-labelled keypoints of the femoral head in red, and the best-fitting sphere in green. Right: 3D-ResNet-50 predicted sphere in blue.\r\nFigure 4.12 Visualizing the output of the 3D-U-Net segmentation model for femoral head localization. Left: hip ultrasound with human-labelled keypoints of the femoral head in red, and the best-fitting, cropped sphere in yellow. Right: 3D-U-Net binary segmentation mask prediction in pink.\r\nFigure 4.13 Pixel-wise classification-based evaluation for femoral head localization.\r\nFigure 4.14 Distance-based evaluation metrics for femoral head localization.\r\nFigure 4.15 Conceptual illustration of the proposed pipeline for extracting \u03b1\u2083D, FHC\u2083D, and OCR from the segmented pelvis bone surface and femoral head. Note that measurement is done in 3D, but the concept is simplified to 2D for illustration purposes only.\r\n
Figure 4.16 Bland-Altman plots showing large discrepancies between our metrics and Quader\u2019s metrics (n=42 hips).\r\nFigure 4.17 Example showing failure with Quader\u2019s SOTA CSPS-based method for pelvis bone surface segmentation and \u03b1\u2083D measurement [63]. a) Incorrectly segmented pelvis bone surface with Quader\u2019s method, shown in green. b) The corresponding fitted planes, resulting in an implausible \u03b1\u2083D measurement. c) The same sweep correctly segmented with 3D-U-Net, shown in red. d) The corresponding fitted planes and plausible \u03b1\u2083D measurement.\r\nFigure 4.18 Example showing failure with Quader\u2019s SOTA RFC-based method [64]. a) Incorrectly segmented femoral head with Quader\u2019s method, shown in green. b) The corresponding fitted planes, resulting in an implausible FHC\u2083D measurement. c) The same sweep correctly segmented with 3D-U-Net, shown in red. d) The corresponding fitted planes and plausible FHC\u2083D measurement.\r\nFigure 4.19 Example showing a questionable case with Quader\u2019s SOTA CSPS-based method [63]. a) Quader\u2019s CSPS segmentation method captures only a very thin sliver of the overall pelvis bone surface. b) The corresponding fitted planes, resulting in a questionable \u03b1\u2083D measurement. c) The same sweep correctly segmented with 3D-U-Net, shown in red. d) The corresponding fitted planes and plausible \u03b1\u2083D measurement.\r\nFigure 5.1 Example showing a sweep that was deemed \u201cinadequate\u201d because the ilium appears to be beyond the FOV of the scan, due to the probe being positioned too inferiorly from the optimal position (right), and an \u201cadequate\u201d volume with the ilium fully within the FOV for comparison (left).\r\n
Figure 5.2 Example of a sweep deemed \u201cinadequate\u201d because of a movement artifact that can be seen as a \u201csmudge\u201d in the sagittal view (lower row); for comparison, we show the sagittal view of an \u201cadequate\u201d volume (top row).\r\nFigure 5.3 Example of a sweep deemed borderline adequate (\u201cmaybe\u201d) on the right, due to the labeler\u2019s perception that the probe was not positioned optimally. Note the shape and reduced area of the ilium (green) and acetabulum (yellow) surfaces used for \u03b1\u2083D in the sweep on the right, compared with the high-quality sweep on the left. This is potentially due to the probe being slightly tilted (roll around the x-axis) or translated (along the z-axis) away from the optimal position.\r\nFigure A.1 Data collection form used in the clinical study.\r\nFigure E.1 An example case whose ground truth label is \u201cinadequate\u201d, which our models predicted as \u201cinadequate\u201d but the RNN predicted as \u201cadequate\u201d. Left: the coronal view near the standard plane, with the 3D-U-Net pelvis bone surface prediction overlaid in pink. Right: the ilium and acetabulum point clouds after processing with the metrics extraction algorithm described in \u00a74.3. These views clearly show that the probe is positioned too inferiorly and that much of the ilium surface is not imaged. As a result, the bony rim appears to be misidentified, and the ilium plane appears to be incorrectly fitted, ultimately resulting in invalid \u03b1\u2083D and FHC\u2083D measurements.\r\nFigure E.2 Another example case whose ground truth is \u201cinadequate\u201d, which our models predicted as \u201cinadequate\u201d but the RNN predicted as \u201cadequate\u201d. In this case, we show the segmented point clouds in the top row and the 3 anatomical planes in the bottom row. 
We can see in the sagittal view that there is a “smudge” due to movement artifact, circled in red. The effect of this on the acetabulum point cloud can be seen as a gap in the acetabulum that is usually not present in high-quality, adequate volumes.
Figure E.3 Another example case for which the ground truth is “inadequate” and our models predicted “inadequate”, but the RNN predicted “adequate”. Left: our best attempt at locating the standard plane by browsing all the coronal slices. Right: the per-frame prediction of the RNN, from which the final RNN prediction is made by thresholding and summing. We clearly see that the RNN incorrectly predicts very high scores for the first 40 slices, although none of these meet the criteria defined by Paserin [56, 58].

Glossary
2D: 2-dimensional
3D: 3-dimensional
ACA: Acetabular Contact Angle
AI: Artificial Intelligence
ANOVA: Analysis of Variance
AROC: Area under the Receiver Operating Characteristic curve
BCCH: British Columbia Children’s Hospital
BCE: Binary Cross Entropy
BNN: Bayesian Neural Network
CAE: Centre Absolute Error
CAI: Coverage Agreement Index
CAOS: Computer-Assisted Orthopaedic Surgery
CDI: Coverage Distance Index
CED: Centre Euclidean Distance
CI: Confidence Interval
CIHR: Canadian Institutes of Health Research
CNN: Convolutional Neural Network
CREB: Clinical Research Ethics Board
CSPS: Confidence-Weighted Structured Phase Symmetry
DDH: Developmental Dysplasia of the Hip
DSC: Dice-Sørensen Coefficient
FCN: Fully Convolutional Network
FHC: Femoral Head Coverage
FN: False Negative
FOV: Field-of-View
FP: False Positive
GPU: Graphics Processing Unit
HD: Hausdorff Distance
HOG: Histogram of Oriented Gradients
ICC: Intraclass Correlation Coefficient
ICICS: Institute for Computing, Information and Cognitive Systems
IOU: Intersection-Over-Union
LBP: Local Binary Patterns
MC: Multi-Channel
MED: Mean Euclidean Distance
MRI: Magnetic Resonance Imaging
MSE: Mean Squared Error
NSERC: Natural Sciences and Engineering Research Council of Canada
OCR: Osculating Circle Radius
RAE: Radius Absolute Error
REB: Research Ethics Board
RFC: Random Forest Classifier
RMS: Root Mean Square
RNN: Recurrent Neural Network
ROI: Region-of-Interest
RQ: Research Question
SD: Standard Deviation
SOTA: State-Of-The-Art
SP: Shadow Peak
SPS: Structured Phase Symmetry
TP: True Positive
UBC: University of British Columbia
US: Ultrasound
VRMSE: Vertical Root Mean Square Error

Acknowledgments
This thesis was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canadian Institutes of Health Research (CIHR), and the Institute for Computing, Information and Cognitive Systems (ICICS). The Titan V GPU was provided by NVIDIA. Compute Canada provided additional computing power and storage services.

I would like to thank my supervisors, Drs. R. Garbi, A.J. Hodgson, and K. Mulpuri, for their support and guidance during the entire course of my studies at UBC. I would also like to thank my labmates for their support. I would like to thank the Orthopedic research team for their clinical support and for allowing us to share their space. I would like to thank the radiology technicians P. Thiessen and C. Beaton for their patience and help with data collection. I would also like to thank Dr. D. Rosenbaum (Radiology) for his support in helping me understand and interpret the ultrasound images. I would also like to thank Dr. A. Cooper for allowing us to scan his patients. I would also like to thank the nurses and the rest of the clinical team for their support on data collection days. Last, but not least, I would like to thank my family for financial and emotional support, and my friends for making this a fun experience.

Chapter 1
Introduction
Developmental Dysplasia of the Hip (DDH) is one of the most common congenital disorders affecting newborns [7, 16, 31, 42, 70, 71]. Accurate and early clinical diagnosis is key to effective treatment of DDH [78].
Current clinical practice for diagnosis usually involves Ultrasound (US) imaging of the newborn hip and manual delineation of anatomical landmarks. Despite its widely accepted clinical use, there has been significant evidence of the low reliability of this 2D-US-based technique [32, 48, 66], which has recently motivated research towards using 3-dimensional (3D) US and automatic image processing as a more reproducible alternative to current clinical practice. This thesis focuses on improving image processing techniques to increase the reproducibility and robustness of 3D-US for measuring DDH.

1.1 Developmental Dysplasia of the Hip
1.1.1 Epidemiology
DDH is one of the most common congenital defects seen in newborns, with prevalence up to 2.85% [31] and incidence up to 7.6% in some populations [42]. Left untreated, DDH can lead to serious consequences including limping, leg length discrepancy, pain, and disability. Of particular note, DDH in infancy is a major risk factor for the development of early-onset osteoarthritis [10, 24, 25, 50, 69].

1.1.2 Diagnosis: Standard Clinical Practice
Because the newborn hip (younger than 6 months) is largely cartilaginous and therefore not easily resolved with X-ray, US is currently the clinical standard imaging modality for diagnosing DDH in newborns. In British Columbia, US examination is usually performed only if the infant is suspected of having DDH due to risk factors including: having been born breech; having been born by C-section; being female; and having a family history of DDH. US is usually performed in addition to a clinical examination that involves checking for leg length discrepancy, looking for hip folds, and applying the Barlow and Ortolani tests to feel for dislocation clicks. X-ray is used for infants over 6 months of age, after the bone has begun to ossify.

Graf Ultrasound Technique
US-based testing was first popularized and standardized by Graf [12], who described the technique in detail.
According to Graf, the newborn is placed in the lateral position, the hip is flexed, and the probe is positioned coronally at the hip joint. In this position, the sonographer then looks for the standard plane by navigating towards well-defined anatomical landmarks, as shown in figure 1.1. When the sonographer identifies the standard plane, a 2D image is acquired. The sonographer then manually delineates salient anatomical landmarks and extracts two DDH measures: the α and β angles. The α angle is the angle between the ilium and acetabulum lines-of-best-fit and measures the shallowness of the hip socket, as shown in figure 1.2. A lower α angle indicates increased DDH severity. Similarly, the β angle is the angle between the ilium and labrum lines-of-best-fit; an increased β angle indicates increased DDH severity. Other measures have since been proposed, including Femoral Head Coverage (FHC), originally proposed by Morin [47], which measures the percentage of the femoral head covered by the bony acetabulum, as shown in figure 1.3. A decreased FHC indicates increased DDH severity. The normal and dysplastic ranges for these DDH metrics are summarized in table 1.1.

Figure 1.1: Ultrasound anatomical landmarks in the standard plane described by Graf [13]: 1) chondro-osseous junction; 2) femoral head; 3) synovial fold; 4) joint capsule; 5) acetabular labrum; 6) hyaline cartilage; 7) bony part of the acetabular roof; 8) bony rim: turning point from concavity to convexity.
Figure 1.2: Illustration of the α angle proposed by Graf.

Table 1.1: Ranges of key DDH metrics
Metric | Criterion, normal hip | Criterion, dysplastic hip | Range
α | >60° | <43° | 17°
FHC | >55% | <40% | 15%

Figure 1.3: Illustration of Femoral Head Coverage proposed by Morin.

1.1.3 Treatment and Consequences of Misdiagnosis
If diagnosed early with DDH, most patients can be treated with the Pavlik Harness. For more severe cases, surgical intervention may be required.
Dysplastic cases that are missed may require costly surgical interventions later in life. Misdiagnosis can result in over-treatment or under-treatment, both of which have serious clinical and economic impacts on patients, their families, and society. We summarize the potential consequences of misdiagnosis in figure 1.4, and present the following over-simplified discussion of costs only as a rough guide for the reader. Importantly, we do not report proportions of False Negatives (FNs) and False Positives (FPs), as most sources in the literature attempt to quantify these against unreliable ground truth measurements such as 2D-US, which we consider an unreliable diagnostic measure, as explained later. The analysis is further complicated by the fact that 60%-80% of cases born with DDH spontaneously resolve within 2-8 weeks after birth [2]. Therefore, we leave a more in-depth analysis for future work.

Figure 1.4: Simplified map showing socioeconomic consequences of DDH misdiagnosis in newborns. (TP: True Positive, TN: True Negative, FP: False Positive, FN: False Negative, OA: Osteoarthritis, THR: Total Hip Replacement)

False Negatives and Consequences of Under-Treatment
Newborns with DDH who are missed by standard diagnosis soon after birth may be detected in follow-up visits. As the Pavlik Harness loses efficacy 4 months after birth [51], more costly surgical treatments may be required. Consequently, late detection has been estimated to increase the cost of treatment sevenfold [78].
If completely undetected and untreated, DDH can lead to more serious consequences later in life, including early development of osteoarthritis, reduced quality of life, and opportunity costs. While opportunity costs are difficult to quantify, there is some evidence regarding the cost of osteoarthritis resulting from DDH. In a 2013 meta-analysis, Hoaglund estimated that 10% of all osteoarthritis patients also had DDH [27]. However, this is a conservative estimate; in contrast, Nakamura [50] estimated that 88% of 2000 consecutive osteoarthritis patients in Japan had DDH. Taking Hoaglund’s conservative estimate of 10%, Price [60] estimated that DDH might be responsible for about 25,000 hip replacements per year in the United States. At approximately $50,000 per procedure [68], the direct financial impact of these hip replacements is on the order of $1.25 billion per year (25,000 × $50,000) in the United States alone.

False Positives and Consequences of Over-Treatment
Infants who are DDH-negative but falsely identified as having DDH at birth also incur costs and may suffer serious secondary complications. Direct costs include the cost of treatment with the Pavlik Harness, reported to be on the order of £600 in the United Kingdom [78]. Given the relatively low cost and risk associated with the Pavlik Harness, erring towards over-treatment may seem tempting, but secondary factors against over-treatment must also be considered. Avascular Necrosis, the death of bone tissue due to a lack of blood supply, is the worst and most frequent complication associated with the Pavlik Harness, reported in 1.35% to 10.9% of all infants undergoing treatment [40]. Even at such low incidence rates, this is a serious consequence that potentially requires very costly surgical treatment and causes much unnecessary suffering to the patient.
Other secondary complications of the Pavlik Harness include nerve palsy [49], skin rashes, and unnecessary psychological hardship for the parents [45].

1.1.4 Problem: Low Reliability of 2D Ultrasound for DDH Diagnosis
Although 2D-US is currently considered the clinical gold standard, there is much evidence of its lack of reliability. In a 2018 meta-analysis [61, 66], Quader reported extremely low inter-exam, inter-observer Intraclass Correlation Coefficients (ICCs) of 23% for the α angle, 19% for the β angle, and near 0% for FHC.

Sources of Variability
A major source of error in 2D-US-based diagnosis is ambiguous probe positioning. In 2014, Jaremko showed, by simulating US probe movement in three degrees-of-freedom, that the α angle can vary by as much as 19° over the range of acceptable images (acceptable meaning that the image meets the definition of the standard plane) [32]. This is a very large range considering that the α angle spans only 17° from dysplastic (43°) to normal (60°) (see table 1.1). Consequently, a normal hip may be incorrectly diagnosed as dysplastic and vice versa. The operator’s ability to consistently identify the correct plane therefore appears to be a major source of error, and perhaps partially explains the extremely low ICCs.

Further exacerbating the problem is a lack of standardization in training. Current clinical practice for diagnosing DDH with US involves identification of the standard plane as originally described by Graf [12, 13]. This is a difficult task, and even experts with years of training can still make mistakes. For example, a 2013 paper by Graf on quality management of hip sonography in Germany [14] reported that in 1.6%-43.7% of cases across 8 states, the sonographers’ licences were withdrawn by a quality control commission because of poor diagnostic quality.
Further, Graf reported that in a refresher course, 250 orthopedic surgeons, pediatricians, and radiologists were required to classify 4 neonatal hip sonograms. Only 28% of the clinicians passed this test, and most mistakes were due to tilting effects resulting in incorrect anatomical identification.

Another source of variability is human variability in drawing the lines and circles used to make the measurements. However, this appears to be a relatively small source of error, as evidenced by the high intra-image, inter-observer ICCs reported by Quader [61, 66] of 65%-90% for α and 70%-93% for FHC.

Therefore, the main source of measurement variability appears to be inconsistent probe positioning.

1.1.5 Solution: 3D Ultrasound
To mitigate human variability in probe positioning and improve the accuracy and reliability of DDH diagnosis, 3D-US is the most promising solution proposed in the last few years.

Quader proposed the α3D angle in 2016, a novel 3D-US-based metric [63] analogous to the α angle originally proposed by Graf [12]. They hypothesized that 3D-US is more robust to ambiguous probe positioning and would improve test-retest reproducibility over 2D-US [61]. Quader’s implementation used Confidence-Weighted Structured Phase Symmetry (CSPS) to segment the pelvis bone surface and was fully automatic, requiring no operator input to segment the bone surface. They reported a 75% reduction in test-retest Standard Deviation (SD) with α3D compared to α [61].

Concurrently in 2016, Mabee and Hareendranathan proposed another 3D-US-based metric, the Acetabular Contact Angle (ACA) [21, 44]. Their implementation of ACA was semi-automatic and required some human input.
The reported inter-exam, intra-rater variability for ACA was 41%, expressed in Quader’s meta-analysis [61, 66] as the test-retest SD normalized by the range of angles from normal to dysplastic.

Quader later proposed FHC3D in 2017 [64], a novel 3D-US-based metric analogous to the FHC originally proposed by Morin [47]. FHC3D was defined as the ratio of the femoral head volume medial to the ilium plane-of-best-fit to the total volume of the femoral head. Quader’s implementation was fully automatic and resulted in a 65% reduction in test-retest SD compared to the analogous 2D-US-based FHC [61].

Most recently, Zonoobi and Hareendranathan proposed the α3D-posterior, α3D-anterior, and Osculating Circle Radius (OCR) 3D-US metrics [82]. Zonoobi reported inter-exam ICCs of 68%, 62%, and 50% for α3D-posterior, α3D-anterior, and OCR, respectively. Using the same techniques, Mostofi conducted a study in 2019 [48] comparing the benefit of 3D-US in the hands of novice (1.5 hours of training) and expert (5 years of training) users. For novice users, Mostofi reported an inter-exam α angle ICC of 10% vs. an α3D angle ICC of 73%-83%, showing that with 3D-US, novices can measure DDH almost as consistently as experts, whereas this is not the case with 2D-US.

These works provide substantial evidence in support of the hypothesis that 3D-US-based DDH diagnostic metrics are more reliable than 2D-US-based metrics. To the best of our knowledge, there are no other works in the literature on automatic 3D-US for DDH. We acknowledge that there have been many works on 2D-US for DDH, which we do not address in this thesis due to the strong evidence, discussed above, against this imaging modality for hip dysplasia measurement.

1.2 Overall Objective
The ultimate goal of this project is to develop a system for safe, reliable, accurate, and robust measurement of DDH that is amenable to clinical translation.
Being a safe, non-ionizing modality that can image the cartilaginous hip joint anatomy, and a relatively portable and cost-effective technology, US is the imaging modality of choice for our system. Specifically, we choose 3D-US because it is more reliable than 2D-US. We therefore aim to develop a system that can reliably and robustly extract key DDH metrics, such as the α3D angle, from 3D-US images of the neonatal hip. A simplified workflow of our envisioned system is depicted in figure 1.6. In the next section, we present preliminary technical work done by our lab and others towards such a system, and we propose areas of improvement that drive this thesis. Specifically, we address key steps in the pipeline including:
• Automatic, accurate, and fast segmentation of salient anatomy, including the pelvis bone surface and femoral head
• Automatic, reliable, and robust extraction of key DDH metrics from the segmented volume
• Accurate and fast adequacy classification of images, to provide point-of-care feedback to the sonographer about the quality of the acquired image

1.3 Related Work
1.3.1 Segmentation
Much progress has been made with 3D-US for DDH diagnosis and measurement, but some aspects can still be improved to help facilitate clinical adoption of the proposed system. For example, Hareendranathan’s graph-based segmentation method is semi-automatic [21, 23]: it requires additional clinical time for seed-point entry (30-75 seconds [82]), introduces another source of variability, and could be a barrier to clinical adoption, especially in settings where trained US technicians are scarce. Quader’s methods, on the other hand, are fully automatic.
Hand-crafted features, including bone shadowing and phase symmetry, are used to segment the pelvis bone surface [61, 62]. To segment the femoral head, they use multiple Random Forest Classifiers (RFCs), which take as inputs many features including Histogram of Oriented Gradients (HOG) and Local Binary Patterns (LBP) [61, 64]. However, our recent observations using Quader’s algorithm show dubious behaviour, most commonly (see figure 1.5):
• Over-segmentation of soft tissue
• Under-segmentation of the pelvis bone surface, in some cases missing most of the pelvis bone surface and capturing only a thin sliver
• Under-segmentation of the femoral head
• Incorrect plane-fitting due to noisy segmentation
These problems ultimately raise concerns about the validity of the α3D and FHC3D measures, despite their consistency between scans. For these reasons, part of this thesis focuses on fully automatic methods for segmenting and localizing the salient anatomy in neonatal hip 3D-US, in an effort to improve DDH measurement. The next sections discuss recent related work in US bone segmentation.

Bone Segmentation in Ultrasound
Automatic segmentation of bone surfaces in US is a well-studied area, with the majority of work focused on Computer-Assisted Orthopaedic Surgery (CAOS) applications in adult patients [17, 53]. Traditionally, a variety of techniques based on hand-crafted features have been explored, including intensity-based analysis, morphological operations, connected-component analysis, phase analysis, and others [17, 53]. In recent years, deep-learning-based techniques have achieved major successes for segmentation in medical imaging and other areas, and a similar trend has emerged for bone segmentation in US in the last year.

Several works have proposed solutions for bone surface segmentation in US, tested on adult bone US.
For example, Villa [74] proposed using the popular FCN-8s architecture [43], concatenating the B-mode image with phase symmetry (PS) and bone shadowing features in the input channels, and compared the performance of this multichannel approach to the CSPS approach proposed by Quader [62]. Villa reported a DSC of 57%±28% for the multichannel Fully Convolutional Network (FCN), compared to 41%±25% for CSPS, showing a significant improvement in accuracy. Wang [76] proposed using another popular architecture, U-Net [67], similarly fusing the input B-mode image with features including the Bone-Shadow Enhanced Image, Local Phase Tensor Image, and Local Phase Bone Image. Wang reported a DSC of 97% with this approach, and 93% with a vanilla B-mode U-Net. In more recent work by the same group, Alsinan [1] proposed a new architecture based on AdapNet [73], with late-stage Local Phase feature fusion, and reported a DSC of 98%, compared to 91% with a vanilla B-mode U-Net.

Bone Segmentation in Neonatal Hip Ultrasound
Several works have explored automatic segmentation specifically in neonatal hip US for DDH diagnosis, which presents unique challenges due to the partially cartilaginous composition of neonatal bone.

Hand-engineered phase features have been proposed, a prominent example being the aforementioned CSPS technique proposed by Quader [61, 62]. CSPS combined Structured Phase Symmetry, an orientation-independent variant of Phase Symmetry [39] designed to segment non-planar bone structures, with bone shadowing features that reduce soft-tissue false positives. More recently, Pandey [54] proposed Shadow Peak (SP), a simplified method that uses only bone shadowing features to segment bone, and showed certain improvements in accuracy and speed over CSPS in a limited study.

Figure 1.5: Visualizing performance of Quader’s methods. Problems shown include over-segmentation, under-segmentation, and incorrect plane-fitting.
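The DSC figures quoted above, and the HD reported by Hareendranathan [22], are the two agreement measures most often used to compare a predicted segmentation with human labels. As a minimal sketch, assuming binary masks stored as same-shaped NumPy arrays, the helpers below compute DSC as twice the overlap divided by the total foreground, and a brute-force symmetric HD in voxel units (multiplying by the voxel spacing would give millimetres). They are illustrative, not the evaluation code of the cited works; the two-empty-masks convention in `dice` is an assumption.

```python
import numpy as np


def dice(pred, label):
    # Dice-Sorensen coefficient 2|A and B| / (|A| + |B|) for two binary masks.
    pred = np.asarray(pred, dtype=bool)
    label = np.asarray(label, dtype=bool)
    total = pred.sum() + label.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, label).sum() / total)


def hausdorff(pred, label):
    # Symmetric Hausdorff distance (in voxel units) between the foreground
    # voxels of two NON-EMPTY masks, computed by brute force over all pairs.
    a = np.argwhere(np.asarray(pred, dtype=bool)).astype(float)
    b = np.argwhere(np.asarray(label, dtype=bool)).astype(float)
    dist = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Largest distance from any point in one set to its nearest point in the other.
    return float(max(dist.min(axis=1).max(), dist.min(axis=0).max()))


# Two 2x2 squares offset by one pixel: half of the foreground overlaps.
a = np.zeros((5, 5), dtype=bool)
a[1:3, 1:3] = True
b = np.zeros((5, 5), dtype=bool)
b[1:3, 2:4] = True
print(dice(a, b), hausdorff(a, b))  # 0.5 1.0
```

The pairwise distance matrix grows with the product of the two foreground sizes, so for full 3D-US volumes one would usually restrict HD to boundary voxels or use a spatial index.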
Though promising, CSPS and SP still rely on highly engineered, hand-crafted features; hence, challenges remain with regard to robustness and generalizability to new data, as we show later.

Data-driven methods have also been proposed for this task. Hareendranathan proposed superpixel classification with a CNN and reported a Hausdorff Distance (HD) error of 2.1±0.9 mm between contours [22]. Zhang [81] proposed a neural network based on Mask R-CNN and compared it to other popular architectures, including FCN-32s and U-Net, but reported very poor DSCs of 39% for their network, 5% with U-Net, and 22% with FCN-32s (we note that their results contradict our own tests and findings, as presented later). Golan [11] applied U-Net with an additional adversarial component to automatically segment the ilium and acetabulum bone surfaces and extract the α angle from the 2D coronal standard plane. Golan did not directly report segmentation agreement with human labels, but reported a correlation coefficient of 0.76 with the clinical α angle.

1.3.2 Adequacy Classification
Addressing the adequacy assessment step (figure 1.6), Quader [61, 65] presented the first work on adequacy assessment of 2D-US for DDH. Quader’s method relied on extracting features including HOG and LBP, and used an RFC to classify whether coronal slices are adequate for measurement. They reported an excellent AROC of 98.5% for this technique. Paserin conducted the first work on adequacy classification of neonatal hip 3D-US volumes [56–58]. The goal of this work was to implement a classifier that could provide rapid point-of-care feedback to the operator on whether the acquired volume is adequate for measurement or must be re-acquired.
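The AROC reported for Quader’s adequacy classifier can be read as the probability that a randomly chosen adequate example receives a higher classifier score than a randomly chosen inadequate one. The sketch below computes AROC directly from that pairwise definition, counting ties as one half; the scores and labels are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def aroc(scores, labels):
    # Area under the ROC curve via its pairwise (Mann-Whitney) interpretation.
    # labels: 1 = adequate (positive class), 0 = inadequate (negative class).
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError('AROC needs at least one positive and one negative example')
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # correctly ordered pair
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores: 5 of the 6 adequate/inadequate pairs are ordered correctly.
print(aroc([0.9, 0.8, 0.35, 0.4, 0.2], [1, 1, 1, 0, 0]))
```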
The advantages of such a classifier are improved workflow efficiency and speed: the existing methods for automatically extracting DDH metrics from 3D-US were relatively slow, requiring on the order of 1 minute of computation time, so processing an inadequate image would waste valuable clinical time. Further, such a classifier could reduce costs by helping inexperienced users in remote locations to scan patients locally. For example, patients in the Canadian Territories are currently flown to British Columbia for DDH examination due to a lack of expertise in these remote locations.

Paserin first proposed a Convolutional Neural Network (CNN)-based adequacy classifier [57] that processes volumes frame-by-frame and uses an aggregate score to classify the full volume. Specifically, the CNN was trained to look for coronal slices containing all anatomical landmarks, including the ilium, acetabulum, femoral head, ischium, and labrum. Paserin reported a per-slice cross-validation classification accuracy of 90% and a runtime of 2 s per volume [56], much faster than Quader’s method [65], which requires around 3 minutes per volume. Paserin later also proposed adding a Recurrent Neural Network (RNN) [58], which similarly processes coronal slices frame-by-frame but can make use of information in adjacent slices. Paserin hypothesized that incorporating this information would improve classification accuracy, and with this model reported an improved per-slice accuracy of 93% [56]. To the best of our knowledge, there were no other works on automatic adequacy classification of DDH 3D-US volumes outside of our group.

Figure 1.6: High-level conceptual design of our system.

Despite showing excellent accuracy in terms of agreement with expert human labels on a limited dataset, as well as improvements in computation time, Paserin’s work [56, 58] did not evaluate the choice of adequacy criteria.
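To make the frame-by-frame design concrete, the sketch below turns hypothetical per-slice scores into a volume-level decision by thresholding each coronal slice and counting the slices that pass, in the spirit of the “thresholding and summing” described for the RNN in figure E.3. Both threshold values (`slice_threshold`, `min_adequate_slices`) are hypothetical placeholders chosen for illustration, not the values used by Paserin [56–58].

```python
def classify_volume(slice_scores, slice_threshold=0.5, min_adequate_slices=5):
    # slice_scores: per-coronal-slice scores in [0, 1] from a frame-by-frame model.
    # slice_threshold and min_adequate_slices are hypothetical placeholder values.
    adequate_slices = sum(1 for s in slice_scores if s >= slice_threshold)
    # Volume-level decision: enough slices pass the per-slice threshold.
    return adequate_slices >= min_adequate_slices, adequate_slices


# Hypothetical sweep of 36 coronal slices containing a run of 6 high-scoring slices.
scores = [0.1] * 20 + [0.9] * 6 + [0.2] * 10
print(classify_volume(scores))                          # (True, 6)
print(classify_volume(scores, min_adequate_slices=10))  # (False, 6)
```

Because the decision depends only on how many slices pass, a run of slices that are incorrectly scored high (as in figure E.3) can still push an inadequate volume over the threshold.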
In this thesis, we revisit Paserin’s adequacy criteria and present modified criteria and new methods for automatic adequacy classification.

1.4 Research Questions Addressed
Some obstacles remain for clinical translation of the 3D-US-based system for DDH measurement. Hareendranathan’s methods [23, 82] are semi-automatic, requiring user input of key points to segment the pelvis bone surface and femoral head. Quader’s methods [61], on the other hand, are fully automatic and use CSPS to segment the pelvis bone surface and an RFC to segment the femoral head. However, our recent observations show that these methods, which rely on highly engineered, hand-crafted features, often fail to correctly segment the salient anatomy for DDH measurement. The last few years have witnessed the rise of deep Convolutional Neural Networks (CNNs), which have consistently been shown to outperform classical machine learning methods that rely on hand-crafted features for image processing. Our first research questions are as follows:
• Research Question 1: Can CNNs be trained to segment the pelvis bone surfaces, including the ilium and acetabulum, in neonatal hip 3D-US? Would the predictions produced by such CNNs more closely resemble human labels than those of existing SOTA methods such as CSPS?
• Research Question 2: Can CNNs be trained to locate the femoral head in neonatal hip 3D-US? Would the predictions produced by such CNNs more closely resemble human labels than those of existing SOTA methods such as Quader’s RFC?
Several automatic techniques have been proposed for extracting DDH metrics from segmented neonatal hip 3D-US volumes. However, these methods were highly tailored to the segmentation techniques previously proposed. For example, Quader developed highly engineered techniques to obtain α3D [61, 63] from the relatively noisy CSPS segmentation of the bone surface.
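At their geometric core, α3D and FHC3D reduce to fitting planes to segmented points, which is why segmentation noise propagates directly into the measurements. The sketch below is a simplified illustration, neither Quader’s implementation nor the algorithm proposed later in this thesis. It assumes that α3D can be approximated as the acute angle between least-squares planes fitted to ilium and acetabulum points, and that FHC3D is the percentage of femoral head voxels on the medial side of the ilium plane (per the definition in §1.1.5), where `medial_normal` is a hypothetical input assumed to point medially. The helper names and toy point clouds are illustrative.

```python
import numpy as np


def fit_plane(points):
    # Least-squares plane through an (N, 3) point cloud. Returns (centroid, unit
    # normal); the normal is the right-singular vector of the centred points with
    # the smallest singular value (numpy returns singular values in descending order).
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - centroid)
    return centroid, vt[-1] / np.linalg.norm(vt[-1])


def plane_angle_deg(normal_a, normal_b):
    # Acute angle between two planes from their normals (normal signs ignored).
    cos_angle = abs(float(np.dot(normal_a, normal_b)))
    return float(np.degrees(np.arccos(min(cos_angle, 1.0))))


def coverage_percent(head_voxels, plane_point, medial_normal):
    # Percentage of femoral-head voxels strictly on the side of the plane that
    # medial_normal points to. With equal-sized voxels, the volume ratio equals
    # the voxel-count ratio.
    pts = np.asarray(head_voxels, dtype=float)
    n = np.asarray(medial_normal, dtype=float)
    signed = (pts - np.asarray(plane_point, dtype=float)) @ (n / np.linalg.norm(n))
    return 100.0 * np.count_nonzero(signed > 0) / len(pts)


# Toy example: ilium points on the plane z = 0; acetabulum points on a plane
# through the x-axis tilted by 60 degrees.
t = np.radians(60.0)
ilium = [(x, y, 0.0) for x in range(3) for y in range(3)]
acetabulum = [(x, s * np.cos(t), s * np.sin(t)) for x in range(3) for s in range(3)]
_, n_il = fit_plane(ilium)
_, n_ac = fit_plane(acetabulum)
print(round(plane_angle_deg(n_il, n_ac), 6))  # 60.0
```

Taking the acute angle mirrors the 2D α angle, which lies below 90°; a real implementation would also need a consistent rule for orienting the normals and for selecting which points belong to each surface.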
Perhaps these techniquescould be further improved, given improved segmentations, and can produce morereproducible, robust, and plausible DDH diagnostics.\u2022 Research Question 3: Can we develop automatic methods for extracting \u03b13Dand FHC3D metrics with our improved segmentations that are at least as re-producible as the previously proposed methods [61, 63, 64]? Can we showthat our proposed methods are at least as robust and plausible as these previ-ously proposed methods?Paserin first attempted to standardize and automate neonatal hip 3D-US ade-quacy classification based on anatomical landmarks [56\u201358]. Paserin showed thatCNNs are capable of rapidly and accurately predicting adequacy based on pre-defined criteria. However, Paserin did not explore the completeness of the ade-quacy criteria and how they relate to DDH measurement.\u2022 Research Question 4: Are the current adequacy criteria proposed by Paserin[56] sufficient? Can we improve the criteria? Can we train new models forautomating classification based on the newly defined criteria?1.5 ContributionsOur contributions are as follows:\u2022 As part of our clinical study and for training neural networks, we collected3D-US data from 59 newborn participants at BCCH, further expanding thisproject\u2019s database.16\u2022 Training, comparing, and evaluating U-Net [67] and 3D-U-Net [6] for seg-menting pelvis bone surface in 3D-US. Towards this end, I labelled \u223c6002D slices and \u223c100 3D volumes. We show much improved segmentationaccuracy with these methods compared to the SOTA.\u2022 Training, comparing, and evaluating CNNs including 3D-ResNet-50 [19, 20]and 3D-U-Net [6] for segmenting and locating the femoral head in neona-tal hip 3D-US. In the process, I labelled \u223c100 volumes for training andevaluating the neural networks. 
We show much improved segmentation andlocalization accuracy with these compared to the SOTA.\u2022 Proposing new algorithms for extracting \u03b13D [63], FHC3D [64], and OCR[82]. We show improved reliability, robustness, and plausibility with theproposed algorithms compared to the SOTA.\u2022 Revisiting the adequacy criteria previously proposed by Paserin [56\u201358], andevaluating the the choice of these criteria. We propose new criteria, and showthat 3D-CNNs can be trained to automate adequacy classification based on thenewly proposed criteria. We show that the new criteria are more selective andthis selectivity improves test-retest reproducibility of DDH measurement.1.6 Thesis OutlineIn addition to this introductory chapter, this thesis includes six chapters, outlinedas follows:\u2022 Chapter 2: Presents an overview of the clinical study, details of the datacollected, terminology, and conventions used in this thesis.\u2022 Chapter 3: As per RQ-1, we train U-Net [67] for pelvis bone segmentationin neonatal hip 3D-US, and evaluate its performance against SOTA methodsfor pelvis bone segmentation including CSPS [54, 61].\u2022 Chapter 4: We address RQs 1, 2, and 3.\u2013 Section 4.1: We leverage U-Net from Ch.3 to train 3D-U-Net [6] tofurther improve the segmentation of the pelvis bone surface.17\u2013 Section 4.2: We apply 3D-CNNs to segment the femoral head, and com-pare our models\u2019 performance to Quader\u2019s SOTA RFC [64].\u2013 Section 4.3: We use our improved segmentation of the pelvis bonesurface and femoral head to implement a new algorithm for extracting\u03b13D, FHC3D, and OCR. 
We evaluate our metrics compared to Quader\u2019s[61] and Hareendranathan\u2019s [82], and in the process present new ade-quacy classification criteria.\u2022 Chapter 5: This chapter focuses on revisiting adequacy classification criteriapreviously proposed by Paserin [56], and presenting new adequacy criteria.We assess the effect of using the old and new criteria on DDH measurement.Finally, we evaluate 3D-CNNs for automating the task of adequacy classifi-cation.\u2022 Chapter 6: Final discussion of conclusions, limitations, and future work.18Chapter 2Clinical Protocol and Descriptionof DataUS data from real participants is a key component for the work presented in thisthesis, so we dedicate this chapter to describing how the data was collected andused. As part of an ongoing effort since 2014, we collected data from a total of 118participants at British Columbia Children\u2019s Hospital (BCCH) (Vancouver, BritishColumbia, Canada), with the collaboration of engineering students, professors, or-thopedic surgeons, radiologists, US technicians, nurses, and research staff. 
Data was collected with the approval of the UBC Children's and Women's Research Ethics Board (REB), under the following application IDs:
• Phase I (2014–2018): H14-01448
• Phase II (2018–2019): H18-00131
• Phase III (2019–present): H18-02024

Roughly the same protocol was used over the three phases for collecting 3D-US from newborn participants, with some small differences due to changing research questions and clinical workflow over the years; these differences are briefly described in this chapter.

Figure 2.1: Ultrasonix 4DL14-5 3D ultrasound probe.

2.1 Inclusion and Exclusion Criteria

Participants were selected based on the following criteria.

Inclusion criteria:
• Suspected of having or diagnosed with DDH, usually because of risk factors such as birth by C-section, breech birth, female sex, or a family history of DDH
• Referred for a regular ultrasound exam
• Aged 0 to 6 months (0 to 4 months in Phases I and II)

Exclusion criteria:
• Not suspected of having DDH
• Not referred for a clinical US exam
• Has other congenital hip abnormalities
• Over the age limit

2.2 3D-US Data Collection Protocol

When a newborn is suspected of having DDH, they are normally referred for a clinical US exam. With the parents' consent, 3D-US scans for our study are collected after the technician scans the newborn with a 2D probe as part of their regular clinical practice. The 3D images in this study are acquired with the Ultrasonix 4DL14-5/38 probe (BK Ultrasound, Richmond, BC, Canada; see figure 2.1), which uses a mechanically swept 1D linear piezoelectric array. For each participant, the sonographer attempts to scan each hip twice, with the probe being removed and replaced between exams. Before starting the recording, the sonographer is asked to align the probe in the coronal plane, finding the optimal plane according to their training.
With the probe held as steady as possible in this position, multiple sweeps of the array are acquired within an exam. Using this protocol, multiple volumes are acquired per participant (the number varies depending on the protocol followed in each phase and on the participant's level of cooperation).

We note the following differences between the phases:
• Phase I:
  – Orthopedic surgeons did the scans
  – Multiple surgeons imaged each participant
  – Static assessment protocol was followed, so no force was applied to the participant's hip
  – Number of sweeps per exam varied, up to 8
• Phase II:
  – Specialized US radiology technicians did the scans
  – Only one technician imaged each participant
  – Dynamic assessment protocol was followed [59], so the participant's hip was pushed posteriorly, usually in the second and third sweeps of an exam
  – 4 sweeps were usually collected per exam
• Phase III (see Figure 2.2):
  – Specialized US radiology technicians did the scans
  – Only one technician imaged each participant
  – Static assessment protocol was followed, so no force was applied to the participant's hip
  – 4 sweeps were usually collected per exam

Figure 2.2: Data collected per patient in Phase III, showing number of exams per patient and number of sweeps per exam. (Diagram: for each patient, the left and right hips are each scanned in Exam 1 and Exam 2, with the probe removed and replaced between exams; each exam contains Sweeps 1 to 4, acquired without moving the probe.)

2.3 Data Used in Each Chapter

The data used in the following chapters is summarized in figure 2.3. The total number of participants is 118, and the total number of sweeps from all participants is 2,202.
In Chapters 4 and 5, to ensure that as many participants as possible are represented in the dataset given the labeling time constraints, we down-sample by randomly selecting only 4 sweeps per participant, leaving 472 sweeps (118 × 4), as reflected in the x-axis of figure 2.3. The numbers on the right-hand side of the figure represent the number of participants per category.

Figure 2.3: Summary of 3D ultrasound data we collected from BCCH, showing details of training and testing data used for each chapter in this thesis. (Panels: Ch. 3 train set, 439 slices; Ch. 3 test set, 103 slices; Section 4.1; Section 4.2; Section 4.3 clinical study set, 483 sweeps; Chapter 5. x-axis: # sweeps.)

We briefly describe the data used in the subsequent chapters here:
• Ch. 3: The data in this chapter was used to train and test U-Net, which processes volumes slice by slice. Briefly, we randomly extract 439 coronal slices from the Phase II sweeps for training, and 103 coronal slices from the Phase I sweeps for testing. In addition (not shown in figure 2.3), we collected 72 2D coronal US images with the Clarius L7 wireless probe in Phase III, which we use to evaluate U-Net's performance on unseen data from a different probe.
• §4.1: 3D-U-Net is used, so we require 3D binary mask labels. We label 64 sweeps from Phases I and II for training, and 52 sweeps from Phase III for testing.
• §4.2: Again, we use CNNs with 3D convolutions, which require 3D labels. We label 52 sweeps from Phases I and II for training, and 48 sweeps from Phase III for testing.
• §4.3: For the clinical study described in this section, we use all 483 Phase III sweeps without down-sampling, since the labeling involved is much faster than in the previous chapters. As illustrated in figure 2.2, both hips are imaged for each participant. Each hip is scanned twice, with the probe removed and replaced between exams to assess test-retest reproducibility. Within each exam we usually record 4 anterior/posterior sweeps.
This sums to 16 sweeps per participant (2 hips × 2 exams × 4 sweeps), although this number could vary depending on the participant's level of cooperation.
• Ch. 5: Similar to §4.1 and §4.2, we again require labeled 3D data, in this case to train 3D-CNN adequacy classifiers. Since obtaining the yes/maybe/no labels for this task takes much less time than obtaining the pixel-wise segmentation labels required in the previous chapters, we were able to label the full set of 472 sweeps. We assign all sweeps from Phases I and II to the training set and all sweeps from Phase III to the test set.

2.4 Coordinate System Description

A 3D-US volume can be described in anatomical terms, Cartesian coordinates, or matrix coordinates; in this section we describe the relations between these coordinate systems to clarify the terms used in the rest of the thesis. The conventions used are illustrated in figure 2.4: anatomical terms are shown in white, Cartesian coordinates in purple, and matrix terms in black. To conform with clinical practice, in this thesis we always show hip US in the horizontal-cranial left position [14]. Matrix rows are aligned with the y-axis, matrix columns are aligned with the x-axis, and coronal slices are aligned with the z-axis.

Figure 2.4: Coordinate system conventions used in this thesis.

Chapter 3
Comparative Evaluation of Hand-Engineered and Deep-Learned Features for Neonatal Hip Bone Segmentation in Ultrasound

In this chapter we address RQ-1 by implementing a popular data-driven model, namely U-Net [67], for pelvis bone surface segmentation and comparing it to the SOTA.

3.1 Methods

3.1.1 Hand-Crafted Features

We include CSPS [61–64] and SP [54] in our comparisons, given their proven performance on neonatal hip US. Both methods tend to localize hip bone surfaces well, but suffer from significant false-positive responses in soft tissue (e.g., the labrum) and at other irrelevant bone structures such as the femur.
To improve performance further, we incorporate the spatial prior that the ilium and acetabulum form a continuous bone structure that always appears as the most medial (i.e., deepest with respect to the probe) and most superior bone in the image.

To apply this spatial prior, we start from the observation that SP detects only one structure along each vertical scan line, which, owing to its high acoustic impedance, is likely to be bone. The hip bone is then the most superior connected component of the SP segmentation. To find this region, we first define the set of k regions $CC = \{cc_1, \dots, cc_i, \dots, cc_k\}$ obtained by applying connected-component analysis to the SP binary segmentation mask, with an 8-connectivity test in 2D and 26-connectivity in 3D. We further define the corresponding set of their centroids $C = \{c_1, \dots, c_i, \dots, c_k\}$, with $c_i = (x_i, y_i)$, and the corresponding set of x-components of the centroids $X = \{x_1, \dots, x_i, \dots, x_k\}$. The pelvis bone surface region is $cc_{x_{\min}}$, where the index $x_{\min} = \arg\min(X)$. The final SP segmentation is obtained by converting this connected region $cc_{x_{\min}}$ to a binary mask.

For CSPS, we first threshold the CSPS output volume to obtain a binary mask and, similarly, apply connected-component analysis to obtain a set of connected regions. To find the pelvis bone surface among these regions, we use the SP segmentation of the pelvis bone surface, $cc_{x_{\min}}$, as a Region-of-Interest (ROI), and select the CSPS region with the most overlap. We convert this region to a binary mask and, to eliminate any soft tissue connected to the bone, keep only the most medial (deepest) pixel along each scan line.

3.1.2 Deep-Learned Features

Architecture

We use the U-Net architecture [67] for our task of segmenting the ilium and acetabulum bone surfaces. We choose U-Net for its proven performance on medical image data and its ability to train on very few training samples.
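A minimal NumPy sketch of the 2D spatial-prior selection in §3.1.1 follows. It assumes that the SP mask and the thresholded CSPS mask are already available as binary arrays indexed [row, column], with rows along the y-axis (depth, increasing away from the probe) and columns along the x-axis (cranial on the left), so that the most superior region is the one with the smallest centroid x. All function names are hypothetical, and the hand-written breadth-first labelling stands in for a library routine such as `scipy.ndimage.label` with a 3×3 structuring element.

```python
from collections import deque

import numpy as np


def label_components(mask):
    """8-connected component labelling of a 2D binary mask (breadth-first search)."""
    mask = np.asarray(mask, dtype=bool)
    labels = np.zeros(mask.shape, dtype=int)
    n_rows, n_cols = mask.shape
    n_labels = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue  # already assigned to a component
        n_labels += 1
        labels[r0, c0] = n_labels
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n_rows and 0 <= cc < n_cols
                            and mask[rr, cc] and not labels[rr, cc]):
                        labels[rr, cc] = n_labels
                        queue.append((rr, cc))
    return labels, n_labels


def select_pelvis_region(sp_mask):
    """SP prior: keep the component whose centroid has the smallest x (most superior)."""
    labels, n = label_components(sp_mask)
    if n == 0:
        return np.zeros(labels.shape, dtype=bool)
    centroid_x = [np.nonzero(labels == k)[1].mean() for k in range(1, n + 1)]
    return labels == int(np.argmin(centroid_x)) + 1


def select_by_overlap(csps_mask, roi):
    """CSPS prior: keep the component that overlaps the SP-derived ROI the most."""
    labels, n = label_components(csps_mask)
    if n == 0:
        return np.zeros(labels.shape, dtype=bool)
    overlaps = [np.sum((labels == k) & roi) for k in range(1, n + 1)]
    return labels == int(np.argmax(overlaps)) + 1


def deepest_pixel_per_scanline(mask):
    """Keep only the deepest (largest row index) foreground pixel in each column."""
    mask = np.asarray(mask, dtype=bool)
    out = np.zeros_like(mask)
    for col in range(mask.shape[1]):
        rows = np.nonzero(mask[:, col])[0]
        if rows.size:
            out[rows.max(), col] = True
    return out
```

In this sketch the ROI passed to `select_by_overlap` would be the output of `select_pelvis_region`, and `deepest_pixel_per_scanline` would then be applied to the selected CSPS region; a 3D version would use 26-connected labelling instead.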
As in the original architecture [67], our implementation includes nine convolution blocks: five in the contracting path and four corresponding blocks in the expanding path. Each block is made up of six layers in the following order: conv3x3-batchnorm-ReLU-conv3x3-batchnorm-ReLU; unlike the original architecture, ours includes batch normalization layers [29]. We use stride 1 for the 3x3 convolutions, max pooling with 2x2 kernels and stride 2 in the contracting path, and corresponding transposed convolution layers in the expanding path. With this configuration, the receptive field of the convolution at the end of the contracting path is 140×140 pixels of the 250×250 input image. The numbers of feature maps in the nine blocks are 64-128-256-512-1024-512-256-128-64.

We explore training U-Net with two types of input: 1) B-mode image data only, and 2) a Multi-Channel (MC) input, motivated by several recent papers on bone segmentation [1, 74, 76] that have shown improved bone localization accuracy with such inputs. In our implementation, the multi-channel input includes the B-mode image, the corresponding SPS, and the shadow confidence map [35] features in the R, G, and B channels, respectively (see figure 3.2).

Figure 3.1: Example labeling procedure. Left: B-mode image with Structured Phase Symmetry overlaid in red, and user-defined points shown as asterisks. Right: contour fitted to the user-defined points, shown as a solid red line.

U-Net Training

To prepare the training data, we start with 231,384 coronal slices obtained with the Ultrasonix probe from 59 neonate scans as potential training data. Only approximately 25% of these slices contained the anatomy of interest (ilium, acetabulum, femoral head), so we filter out all other slices with a recently proposed RNN scan adequacy architecture [58].
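The 140×140 receptive field quoted for the end of the contracting path can be checked with the standard receptive-field recursion: each stride-1 3×3 convolution adds 2 × (cumulative stride) input pixels, each 2×2 stride-2 max pooling adds one cumulative stride and then doubles it, and batch normalization and ReLU leave the receptive field unchanged. A short sketch (the function name is hypothetical):

```python
def contracting_path_receptive_field(n_blocks=5, convs_per_block=2, kernel=3, pool=2):
    """Receptive field (input pixels) of a unit at the end of a U-Net contracting path."""
    rf, jump = 1, 1  # current receptive field and cumulative stride, in input pixels
    for block in range(n_blocks):
        for _ in range(convs_per_block):
            rf += (kernel - 1) * jump  # stride-1 convolution
        if block < n_blocks - 1:       # no pooling after the bottleneck block
            rf += (pool - 1) * jump    # pool x pool max pooling with stride = pool
            jump *= pool
    return rf


print(contracting_path_receptive_field())  # 140
```

The intermediate values after each block are 5, 14, 32, 68, and 140 pixels, so the bottleneck units see a little over half of the 250×250 input in each direction.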
We randomly select 500 of the remaining adequate slices, in which a trained user manually labelled the ilium and acetabulum bone contour. To aid with this manual labelling step, we overlaid the SPS feature on the B-mode image to guide the user, while allowing the user to deviate from the SPS overlay where they deemed it appropriate (see figure 3.1). Further, we allow the user to reject from the training set any inadequate slices not detected by the RNN. In summary, we end up with 439 labelled samples in our training set from the Ultrasonix set, all of which contained the anatomy of interest. We intentionally did not include Clarius samples in the training set, in order to test the generalizability of U-Net to different domains. To alleviate the class imbalance problem, i.e., the imbalance between the numbers of contour and background pixels, we subsequently dilate the manually traced contours as originally proposed by Villa [74], converting our bone contour into a ribbon-like structure. We train U-Net on the B-mode input only, as well as on the multi-channel input. We use the Dice loss, the Adam optimizer, a batch size of two slices, and a learning rate of 0.0001 over 30 epochs, and resize the input images to 250×250 pixels.

3.1.3 Testing

We contrasted the segmentation accuracy of five methods:
• Original CSPS (with naive thresholding)
• CSPS after applying the ROI prior as described in §3.1.1
• SP with ROI prior as described in §3.1.1
• U-Net with only B-mode input
• U-Net with multi-channel input data (R = B-mode, G = SPS, B = shadow confidence map)

We test these five methods on two datasets: a primary dataset of images acquired with the Ultrasonix 4DL14-5 3D-US probe, and a secondary dataset acquired with the Clarius L7 2D-US probe. The primary test set was prepared from 880 3D-US volumes (from 25 neonate patients) acquired with the Ultrasonix 4DL14-5/38 probe.
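The exact Dice-loss formulation used to train U-Net is not specified in §3.1.2. A common choice, assumed here, is the soft Dice loss, which substitutes the predicted probability map for the binary prediction in the DSC of Eq. 3.1 and subtracts the result from 1; the smoothing term `eps` is also an assumption. A NumPy sketch (function name hypothetical):

```python
import numpy as np


def soft_dice_loss(prob, target, eps=1e-6):
    """1 - soft DSC, with predicted probabilities in place of the binary prediction P."""
    prob = np.asarray(prob, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    intersection = np.sum(prob * target)         # soft |R ∩ P|
    denominator = np.sum(prob) + np.sum(target)  # soft |R| + |P|
    return float(1.0 - (2.0 * intersection + eps) / (denominator + eps))


target = np.array([[1, 1], [0, 0]])
print(soft_dice_loss(target, target))  # 0.0 for a perfect prediction
```

In a deep-learning framework the same expression would be written with tensor operations so that it remains differentiable with respect to the network output. Because it is normalized by the foreground size, this loss is less sensitive to the contour/background imbalance than a plain per-pixel loss.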
Similar to the training data, we discarded inadequate slices with the RNN approach [58] and randomly selected a subset of the adequate slices, on which the same user manually delineated the contours (see Figure 3.1). A final total of 103 labelled samples constituted this primary test set. We also prepared a secondary test set using data from the Clarius L7 probe, consisting of 72 2D-US images from a different pool of 19 neonates.

Following segmentation using the contrasted methods, we performed simple post-processing to convert the U-Net output segmentation map to a crisp contour. Specifically, we threshold the probability map at 0.5, skeletonize, and prune the resulting binary mask to generate a contour.

Many metrics for assessing segmentation accuracy have been proposed in the literature, but no single standardized metric captures all the relevant information, so we include all metrics that may be applicable to our task. These fall into two main groups: pixel-wise classification metrics and open-contour distance metrics. The classification metrics we report are:
• Dice-Sorensen Coefficient (DSC), also known as the F1-score:
$$\mathrm{DSC} = \frac{2|R \cap P|}{|R| + |P|} = \frac{2TP}{2TP + FP + FN} \tag{3.1}$$
• Jaccard coefficient, also known as Intersection-Over-Union (IOU):
$$J = \frac{|R \cap P|}{|R \cup P|} = \frac{TP}{TP + FP + FN} \tag{3.2}$$
• Precision:
$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{3.3}$$
• Recall:
$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3.4}$$

where TP, FP, and FN are the numbers of true-positive, false-positive, and false-negative pixels, respectively; R is the set of pixels in the reference (ground truth) segmentation and P is the set of pixels in the predicted segmentation; and |·| is the cardinality operator, which returns the number of elements in a set. Note that these pixel-wise metrics are conventionally used for measuring blob-shaped segmentations, and cannot be used directly on thin, contour-like segmentations such as ours.
To get around this, we dilated both reference and prediction contours equally, as previously proposed by others [74].

Distance metrics include:
• Vertical Root Mean Square Error (VRMSE), defined as the root mean squared distance between the reference and predicted contours at every scan line that contains both contours. More precisely, let $R = [r_{i,j}] \in \mathbb{R}^{M \times 2}$ be the set of all points in the reference contour and $P = [p_{i,j}] \in \mathbb{R}^{N \times 2}$ be the set of all points in the predicted contour. Define $R^{o} = [r^{o}_{i,j}] \in \mathbb{R}^{K \times 2}$, $R^{o} \subseteq R$, and $P^{o} = [p^{o}_{i,j}] \in \mathbb{R}^{K \times 2}$, $P^{o} \subseteq P$, as the subsets of the contours on the scan lines where both the reference and predicted contours exist, ordered so that the i-th rows of $R^{o}$ and $P^{o}$ lie on the same scan line. VRMSE is defined as
$$\mathrm{VRMSE} = \sqrt{\frac{1}{K}\sum_{i=1}^{K}\left(r^{o}_{i,2} - p^{o}_{i,2}\right)^{2}} \tag{3.5}$$
where $r^{o}_{i,2}$ is the y-component of the i-th element of $R^{o}$, $p^{o}_{i,2}$ is the y-component of the i-th element of $P^{o}$, and K is the number of rows in $R^{o}$ and $P^{o}$.
• Hausdorff Distance (HD). To compute HD, we first define the function d(p, q) that computes the Euclidean distance between two points p and q:
$$d(p, q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2} \tag{3.6}$$
We define $A = [a_{ij}] \in \mathbb{R}^{M \times N}$ as the matrix of Euclidean distances from all points in R to all points in P, calculated with d(·,·). We compute the vector of minimum distances from each point in R to the nearest point in P as the row-minima of A,
$$d^{R2P}_{i} = \min_{j} a_{ij} \tag{3.7}$$
and similarly the vector of minimum distances from each point in P to the nearest point in R as the column-minima of A,
$$d^{P2R}_{j} = \min_{i} a_{ij} \tag{3.8}$$
Finally, HD is calculated as
$$\mathrm{HD} = \max\left(\max(d^{R2P}), \max(d^{P2R})\right) \tag{3.9}$$

Table 3.1: Mean (and SD) segmentation accuracy of the five methods we tested on the primary (Ultrasonix) and secondary (Clarius) datasets. From left to right: 1) Shadow Peak with RoI spatial prior, 2) Confidence-Weighted Structured Phase Symmetry with naive thresholding, 3) Confidence-Weighted Structured Phase Symmetry with RoI spatial prior, 4) U-Net with B-mode input, 5) U-Net with multi-channel input.

| Ultrasonix | SP+RoI | CSPS | CSPS+RoI | U-Net | MC U-Net |
|---|---|---|---|---|---|
| Jaccard | 0.61 (0.13) | 0.28 (0.14) | 0.70 (0.16) | 0.76 (0.10) | 0.77 (0.11) |
| Dice-Sorensen | 0.75 (0.12) | 0.42 (0.16) | 0.81 (0.14) | 0.86 (0.07) | 0.86 (0.08) |
| Precision | 0.79 (0.12) | 0.30 (0.15) | 0.86 (0.11) | 0.89 (0.07) | 0.89 (0.07) |
| Recall | 0.71 (0.14) | 0.82 (0.11) | 0.78 (0.17) | 0.85 (0.10) | 0.85 (0.11) |
| Hausdorff (mm) | 4.41 (3.04) | 21.89 (9.7) | 3.06 (3.1) | 1.60 (1.67) | 1.91 (2.17) |
| VRMSE (mm) | 0.35 (0.32) | 5.45 (4.96) | 0.37 (0.61) | 0.21 (0.07) | 0.20 (0.07) |
| Clarius | | | | | |
| Jaccard | 0.58 (0.08) | 0.34 (0.09) | 0.69 (0.14) | 0.85 (0.07) | 0.86 (0.06) |
| Dice-Sorensen | 0.73 (0.06) | 0.51 (0.10) | 0.81 (0.10) | 0.92 (0.04) | 0.92 (0.04) |
| Precision | 0.88 (0.04) | 0.35 (0.09) | 0.72 (0.15) | 0.92 (0.07) | 0.94 (0.05) |
| Recall | 0.64 (0.09) | 0.93 (0.06) | 0.95 (0.05) | 0.93 (0.03) | 0.91 (0.05) |
| Hausdorff (mm) | 5.79 (1.92) | 25.68 (3.54) | 5.65 (4.55) | 2.34 (4.79) | 1.09 (0.90) |
| VRMSE (mm) | 0.33 (0.12) | 2.42 (3.27) | 0.33 (0.12) | 0.22 (0.10) | 0.20 (0.07) |

3.2 Results and Discussion

Quantitative results for both the Ultrasonix and Clarius probe datasets are summarized in Tab. 3.1. Across all evaluation metrics, the B-mode U-Net and MC U-Net appeared to be virtually tied for best performance, and performed consistently well, as evidenced by their smaller standard deviations. CSPS suffered from significant soft-tissue false positives despite its use of shadow features, which explains its high recall but low precision; however, its precision improved considerably after applying the ROI, with a small drop in recall.

Figure 3.2: Example segmentation results. a, b) Different segmentation techniques, including SP, CSPS, and U-Net, applied to Ultrasonix test data. c) The same techniques applied to Clarius test data.

To evaluate generalizability, we tested our model, trained only on the Ultrasonix data, on a secondary test set obtained with the Clarius probe.
We saw a similar pattern on this secondary Clarius set, with U-Net and MC U-Net outperforming the other methods. The two datasets are not directly comparable: the Clarius dataset consisted only of optimal 2D-US coronal images of the infant hip, whereas the Ultrasonix dataset is 3D and contained coronal slices away from the optimal central slice. Nevertheless, these results provide some evidence that U-Net is likely capable of generalizing to image data from probes not included in the training set.

We show exemplar qualitative results on both Ultrasonix and Clarius data in Figure 3.2. To extract the α angle, it is crucial for the segmentation algorithm to accurately delineate the bony rim between the ilium and acetabulum, and enough of the ilium and acetabulum surfaces surrounding it, while not capturing any false-positive soft tissue or unrelated bone such as the femur. It is therefore crucial to reduce outliers and to carefully assess failure cases, beyond comparing aggregate quantitative measures such as those in Tab. 3.1. For CSPS, common failure cases included soft-tissue false positives, completely missing the ilium when it was rotated at certain angles, and fragmented contours, as shown in Figure 3.2. Similarly, SP often missed the acetabulum due to the weak shadow in that region (Figure 3.2). In contrast, U-Net rarely detected false-positive soft tissue, and consistently and accurately segmented the full ilium and acetabulum contour. Its errors were mainly due to slight under- or over-segmentation of the ilium or acetabulum at the superior and inferior extremities of the contour.
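The VRMSE of Eq. 3.5 reduces to an RMS over the scan lines shared by both contours. The sketch below assumes that each skeletonized contour has been reduced to at most one (x, y) point per vertical scan line x (how multiple points on one scan line were handled is not stated), and returns a value in pixels; converting to millimetres would require the pixel spacing, which is not given here. The function name is hypothetical.

```python
import numpy as np


def vrmse(ref_points, pred_points):
    """Vertical RMSE (Eq. 3.5) over scan lines (x) present in both contours.

    Each argument is an (n, 2) array of (x, y) points with at most one point per x.
    Returns NaN when the contours share no scan line.
    """
    ref = {int(x): float(y) for x, y in np.asarray(ref_points)}
    pred = {int(x): float(y) for x, y in np.asarray(pred_points)}
    common = sorted(ref.keys() & pred.keys())  # scan lines containing both contours (K of them)
    if not common:
        return float("nan")
    diffs = np.array([ref[x] - pred[x] for x in common])  # y-component differences
    return float(np.sqrt(np.mean(diffs ** 2)))
```

Because only the y-components on shared scan lines enter the sum, VRMSE ignores contour extent; a prediction that is too short but well placed scores well, which is why it is reported alongside HD.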
Along each scan line, U-Net segmented the bone contours very accurately, as reflected in the low VRMSE observed on our clinical dataset.

In contrast to recent papers reporting improved results by fusing phase symmetry and shadow features within MC deep learning networks [1, 74, 76], we did not observe significant improvements from adding these hand-crafted features to our input. However, the lower HD of MC U-Net on the secondary test set (1.09 mm versus 2.34 mm for the B-mode U-Net) suggests a slight improvement there, which is consistent with our qualitative observation that MC U-Net is sometimes more robust to soft-tissue false positives. Compared with the closest literature, on the primary test set we observed an improved mean HD of 1.60±1.67 mm, compared to Hareendranathan's 2.1±0.9 mm with the superpixel classification method [22]. Further, we report a much improved mean DSC of 88% on our primary test set, compared to Zhang's [81] DSC of 39% for their proposed architecture and 5% with their implementation of U-Net.

With regard to computational complexity, our U-Net implementation had about 15 million parameters. When tested on 250×250 2D coronal US slices, on a machine with an Intel Core i7 (4.0 GHz, 6-core) processor and an NVIDIA Titan Xp GPU, we logged run times of 0.007 s for Shadow Peak, 0.155 s for CSPS, and 0.003 s for U-Net.

3.3 Conclusions

We proposed a deep-learning-based approach for bone segmentation in neonatal hip ultrasound, and showed that it improves accuracy over state-of-the-art feature-based techniques recently proposed in the literature for our task. Results on a secondary dataset suggest that U-Net is robust to domain shifts, such as images from a probe that produces significantly different images, and that a multi-channel input may improve robustness further. The main limitation of this experiment is the use of a CNN architecture with 2D convolution kernels, which do not incorporate potentially useful information from adjacent slices.
We address this limitation in §4.1 with 3D-U-Net.

Chapter 4
Measuring Hip Dysplasia with 3-Dimensional Convolutional Neural Networks

In this chapter we address RQs 1, 2, and 3. We address RQ-1 in §4.1 by training 3D-U-Net for pelvis bone surface segmentation, testing it, and comparing it to other methods. We address RQ-2 in §4.2 by implementing and training regression-based and segmentation-based CNNs for femoral head localization, testing them, and comparing these models to other methods. We address RQ-3 in §4.3 by proposing an algorithm for extracting key 3D-US DDH metrics from the segmented anatomy, and evaluating our algorithm's performance against the SOTA in a clinical study.

4.1 Pelvis Bone Surface Segmentation: Going 3D

In the previous chapter, we showed that CNNs can outperform the previously proposed CSPS for pelvis bone surface segmentation. However, U-Net [67] uses 2D convolutions, so it can only process the input volume slice by slice. Recent architectures employ 3D convolutions to process the volume as a single input, making use of potentially important 3D information in adjacent slices. The best known of these are 3D-U-Net [6] and V-Net [46]. These architectures were published concurrently and are very similar to each other, except that V-Net additionally uses short-distance residual connections. In this section, we extend our work from the last chapter on pelvis bone surface segmentation by training 3D-U-Net and comparing its performance to U-Net [67] and CSPS [62].

4.1.1 Labeling

One of the main challenges in training CNNs is getting enough labelled data. This is especially a problem for segmentation mask annotations of volumetric data, as the time required to label a single training sample (i.e., a full volume) is on the order of 10 to 100 times the time required to label a single slice. In our case, labeling a single slice takes roughly 10 s.
There are usually around 50 slices with visible pelvis bone surface, so this translates to roughly 10 minutes per volume. This is the main reason we opted to use 2D U-Net in Ch. 3 instead of directly using a 3D architecture such as 3D-U-Net. To reduce the time required to obtain mask annotations for our task, we use the U-Net trained in Ch. 3 to generate approximate mask annotations, and then manually fix any over- or under-segmentation in 3D Slicer [36] (see figure 4.1). We annotated the pelvis bone surface in a total of 116 volumes from 29 participants, with 64 volumes for training and 52 for testing.

4.1.2 Training

We train 3D-U-Net to minimize the Binary Cross Entropy (BCE) loss,
$$\mathcal{L}_{BCE}(y, \hat{y}) = -\left(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\right) \tag{4.1}$$
where y is the target class of each voxel and ŷ is the predicted probability of the foreground class. We use a batch size of one volume, and use 4-fold cross-validation to select a learning rate of 0.001. We resize the input volume down to 100×100×100 voxels. We train for 90 epochs, starting with an initial learning rate of 0.001 and reducing it by a factor of 0.2 at the 30- and 60-epoch milestones. We use the Adam optimizer [37].
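Eq. 4.1 gives the loss for a single voxel. The sketch below assumes it is averaged over all voxels of a volume and clips predictions away from 0 and 1 to avoid log(0); both are assumptions rather than details from the text. It also spells out the step schedule described above, in which the learning rate is multiplied by 0.2 at epochs 30 and 60. If PyTorch were used (the framework is not stated), these would correspond to `torch.nn.BCELoss` and `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60], gamma=0.2)`. Function names are hypothetical.

```python
import numpy as np


def bce_loss(y, y_hat, eps=1e-7):
    """Voxel-averaged binary cross entropy (Eq. 4.1); y in {0, 1}, y_hat in (0, 1)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    per_voxel = -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
    return float(per_voxel.mean())


def learning_rate(epoch, base_lr=1e-3, milestones=(30, 60), gamma=0.2):
    """Step schedule: multiply base_lr by gamma once for each milestone already reached."""
    return base_lr * gamma ** sum(epoch >= m for m in milestones)
```

Under this schedule the learning rate is 0.001 for epochs 0 to 29, 0.0002 for epochs 30 to 59, and 0.00004 for the final 30 epochs.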
Since we have a relatively small training set of only 64 volumes, we apply the following random augmentations:
• Non-uniform zooming by a factor in the range [0.9, 1.1]
• Shifting along the x, y, z axes in the range [-10, 10] pixels
• Rotating around the x, y, z axes in the range [-5, 5] degrees
• Flips in the medial/lateral direction (z-axis) with probability 0.5
• Elastic deformation [72] with probability 0.5, and σ in the range [2, 4]
• Gamma contrast correction with γ in the range [0.2, 2]:
$$I_{out} = I_{in}^{\gamma} \tag{4.2}$$

Figure 4.1: To train 3D-U-Net, we use U-Net predictions (left) from the previous chapter as a starting point and manually fix areas that are incorrectly segmented to get the "human" label (right).

4.1.3 Testing

To evaluate the performance of the different methods, we assess agreement with a labelled test set of 52 volumes from 13 participants. In this section, we compare the following methods:
• 3D-U-Net [6]
• U-Net [67], as proposed in Ch. 3
• CSPS as originally proposed by Quader [62], with naive thresholding
• CSPS-DDH, the CSPS-based segmentation that Quader used to compute DDH metrics such as α3D [61], which included some post-processing such as cropping with an ROI and ray-casting

Note that in this comparison we do not include methods previously evaluated in Ch. 3, namely SP [54], MC U-Net, and CSPS with connected-component analysis and anatomical prior. Based on the evidence in Ch. 3, U-Net performed best, and we see no added benefit in further investigating the other methods.

As in §3.1.3, we use both classification and distance metrics to assess segmentation accuracy.
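For concreteness, the measures used in this chapter (the dilated pixel-wise metrics of Eqs. 3.1–3.4 and the distance and coverage measures of Eqs. 4.3–4.15 defined below) can be sketched in NumPy as follows. Assumptions not stated in the text: volumes are indexed [i, j, k] with i running along each scan line (depth), surfaces are given as voxel-coordinate point lists (for example from `np.argwhere`), distances are in voxels, and all helper names are hypothetical. The dense M×N distance matrix mirrors the definition of A but grows quickly with surface size; a k-d tree such as `scipy.spatial.cKDTree` is a common alternative for the nearest-neighbour queries.

```python
import itertools

import numpy as np


def dilate(mask, size=5):
    """Binary dilation with a cubic (size x size x ...) structuring element; size is odd."""
    mask = np.asarray(mask, dtype=bool)
    r = size // 2
    padded = np.pad(mask, r)  # pads with False
    out = np.zeros_like(mask)
    for offset in itertools.product(range(size), repeat=mask.ndim):
        window = tuple(slice(o, o + n) for o, n in zip(offset, mask.shape))
        out |= padded[window]  # OR over all shifts in [-r, r] along each axis
    return out


def overlap_metrics(ref, pred, size=5):
    """DSC, Jaccard, precision and recall (Eqs. 3.1-3.4) after equal dilation."""
    r, p = dilate(ref, size), dilate(pred, size)
    tp = np.sum(r & p)
    fp = np.sum(~r & p)
    fn = np.sum(r & ~p)
    return {
        "dsc": 2 * tp / (2 * tp + fp + fn),
        "jaccard": tp / (tp + fp + fn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


def surface_distances(ref_pts, pred_pts):
    """Row and column minima of the M x N distance matrix A (Eqs. 4.3-4.5)."""
    R = np.asarray(ref_pts, dtype=float)   # M x 3
    P = np.asarray(pred_pts, dtype=float)  # N x 3
    A = np.linalg.norm(R[:, None, :] - P[None, :, :], axis=-1)
    return A.min(axis=1), A.min(axis=0)    # d_R2P (length M), d_P2R (length N)


def distance_metrics(ref_pts, pred_pts):
    """MED (Eqs. 4.6-4.8), HD (Eqs. 4.9-4.11) and RMS_P2R (Eq. 4.14)."""
    d_r2p, d_p2r = surface_distances(ref_pts, pred_pts)
    med_r2p, med_p2r = d_r2p.mean(), d_p2r.mean()  # divide by M and by N respectively
    hd_r2p, hd_p2r = d_r2p.max(), d_p2r.max()
    return {
        "med_r2p": med_r2p, "med_p2r": med_p2r, "med_max": max(med_r2p, med_p2r),
        "hd_r2p": hd_r2p, "hd_p2r": hd_p2r, "hd_max": max(hd_r2p, hd_p2r),
        "rms_p2r": float(np.sqrt(np.mean(d_p2r ** 2))),
    }


def cai_cdi(ref_vol, pred_vol, rms_p2r):
    """CAI (Eqs. 4.12-4.13) and CDI (Eq. 4.15) from binary volumes."""
    r_bp = np.asarray(ref_vol, dtype=bool).any(axis=0)  # scan-line projection over i
    p_bp = np.asarray(pred_vol, dtype=bool).any(axis=0)
    cai = 2 * np.sum(r_bp & p_bp) / (np.sum(r_bp) + np.sum(p_bp))
    return cai, cai / (1 + rms_p2r ** 2)
```

Note that the CAI only asks whether each scan line contains bone in both segmentations, so it is insensitive to depth errors; the CDI reintroduces them by penalizing the CAI with the squared RMS distance.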
Classification metrics used in this chapter include the following:
• Dice-Sorensen Coefficient (DSC), also known as the F1-score (see Eq. 3.1)
• Jaccard coefficient, also known as IOU (see Eq. 3.2)
• Precision (see Eq. 3.3)
• Recall (see Eq. 3.4)

Note that all of these metrics are conventionally used for measuring blob-shaped segmentations, and cannot be used directly on thin, contour-like segmentations such as ours. To get around this, we dilated both reference and prediction contours equally with a 5×5×5 voxel cubic structuring element, as previously proposed by others [74].

For measuring the distance between the reference and predicted contours we use:
• Mean Euclidean Distance (MED), which can be computed from the reference surface to the predicted surface, from prediction to reference, or bidirectionally as the maximum of these. To calculate the MED from reference to prediction, let $R = [r_{i,j}] \in \mathbb{R}^{M \times 3}$ be the set of all points in the reference surface and $P = [p_{i,j}] \in \mathbb{R}^{N \times 3}$ be the set of all points in the predicted surface. Further, we define the function d(p, q) that computes the Euclidean distance between two points p and q:
$$d(p, q) = \sqrt{(p_x - q_x)^2 + (p_y - q_y)^2 + (p_z - q_z)^2} \tag{4.3}$$
We define $A = [a_{ij}] \in \mathbb{R}^{M \times N}$ as the matrix of Euclidean distances from all points in R to all points in P, calculated with d(·,·).
We compute the vector of minimum distances from each point in R to the nearest point in P as the row-minima of A,
$$d^{R2P}_{i} = \min_{j} a_{ij} \tag{4.4}$$
and similarly the vector of minimum distances from each point in P to the nearest point in R as the column-minima of A,
$$d^{P2R}_{j} = \min_{i} a_{ij} \tag{4.5}$$
Finally, we compute the MEDs from R to P, from P to R, and bidirectionally as
$$\mathrm{MED}_{R2P} = \frac{1}{M}\sum_{i=1}^{M} d^{R2P}_{i} \tag{4.6}$$
$$\mathrm{MED}_{P2R} = \frac{1}{N}\sum_{j=1}^{N} d^{P2R}_{j} \tag{4.7}$$
$$\mathrm{MED}_{max} = \max(\mathrm{MED}_{R2P}, \mathrm{MED}_{P2R}) \tag{4.8}$$
• Hausdorff Distance (HD), which is similarly computed from reference to prediction, from prediction to reference, or bidirectionally as the maximum of these:
$$\mathrm{HD}_{R2P} = \max(d^{R2P}) \tag{4.9}$$
$$\mathrm{HD}_{P2R} = \max(d^{P2R}) \tag{4.10}$$
$$\mathrm{HD}_{max} = \max(\mathrm{HD}_{R2P}, \mathrm{HD}_{P2R}) \tag{4.11}$$

In addition, our lab has recently proposed two new metrics that aim to address some of the shortcomings of the aforementioned metrics for the specific task of bone segmentation in US, by combining classification and distance measures in a way that does not rely on dilation. These are the Coverage Agreement Index (CAI) and the Coverage Distance Index (CDI), the latter computed from the CAI and the Root Mean Square (RMS) distance, as follows.

Given a 3D binary segmentation $B = [b_{i,j,k}] \in \mathbb{Z}_2^{M \times N \times K}$ (where M, N, and K here denote the volume dimensions), we compute the scan-line 2D binary projection $B^{BP} = [b^{BP}_{j,k}] \in \mathbb{Z}_2^{N \times K}$ as
$$b^{BP}_{j,k} = \begin{cases} 1 & \text{if } \sum_{i} b_{i,j,k} > 0 \\ 0 & \text{otherwise} \end{cases} \tag{4.12}$$
Given the reference binary segmentation volume R and its scan-line projection $R^{BP}$, and the predicted binary segmentation volume P and its scan-line projection $P^{BP}$, the CAI is the DSC between the two 2D binary projections:
$$\mathrm{CAI} = \frac{2|R^{BP} \cap P^{BP}|}{|R^{BP}| + |P^{BP}|} \tag{4.13}$$
Further, we compute the RMS Euclidean distance error.
For each image, it is the RMS of the Euclidean distances from each point on the predicted bone surface to the nearest point on the reference bone surface:
$$\mathrm{RMS}_{P2R} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(d^{P2R}_{j}\right)^{2}} \tag{4.14}$$
where $d^{P2R}$ was defined in Eq. 4.5 and N is the number of points in the predicted surface. Finally, CDI is computed as
$$\mathrm{CDI} = \frac{\mathrm{CAI}}{1 + \mathrm{RMS}_{P2R}^{2}} \tag{4.15}$$

4.1.4 Results and Discussion

Testing results for the four contrasted methods are shown as boxplots in figures 4.3, 4.4, and 4.5, and a summary is provided in table B.1. We also show qualitative visual results in figure 4.2.

Figure 4.2: Visualizing pelvis bone surface segmentation with the contrasted methods. a) human label; b) CSPS; c) U-Net; d) 3D-U-Net. Red arrows point to areas that are over-segmented (false positives).

Figure 4.3: Pixel-wise classification evaluation for pelvis bone surface segmentation. (a) Precision; (b) Recall; (c) Jaccard coefficient; (d) DSC.

We perform statistical analyses as follows. We first test the null hypothesis that all four methods produce equivalent segmentations with a one-way Analysis of Variance (ANOVA) (see results in table B.1). Applying the conservative Bonferroni correction for multiple tests [4] across the 12 reported metrics, our p-value threshold for statistical significance is reduced from 0.05 to 0.05/12 ≈ 0.004. Even with this conservative criterion, we reject the null hypothesis for all 12 reported metrics. We further apply post hoc t-tests between methods for all the reported metrics (see t-test results in Appendix B). From these, we make the following observations about the key metrics:
• DSC: U-Net and 3D-U-Net outperform CSPS.
U-Net slightly outperforms 3D-U-Net.
• HD_max and MED_max: 3D-U-Net outperforms all the other methods, including U-Net.
• CAI: 3D-U-Net far outperforms the other methods.
Figure 4.4: Contour distance evaluation metrics for pelvis bone surface segmentation. (a) MED_R2P, (b) MED_P2R, (c) MED_max, (d) HD_R2P, (e) HD_P2R, (f) HD_max.
Figure 4.5: Combined evaluation metrics for pelvis bone surface segmentation. (a) CAI, (b) CDI.
Based on these observations, as well as our qualitative visualizations of the results, we conclude that 3D-U-Net is our best option for the final algorithm for extracting the DDH metrics. U-Net slightly outperforms 3D-U-Net in the classification metrics and CDI. This is probably because our reference (ground truth) labels were initially based on the U-Net predictions. A DSC of 85% for 3D-U-Net is still a good score. Taking all the other metrics into consideration, we conclude that 3D-U-Net outperforms U-Net and far outperforms CSPS.
4.1.5 Conclusions
We proposed 3D-U-Net to segment the pelvis bone surface. 3D-U-Net segmented the pelvis bone surface more accurately than CSPS, the SOTA method for this task previously proposed by Quader [61–64]. Compared to the U-Net model proposed in Chapter 3, 3D-U-Net captured a similar bone surface, but with fewer false-positive islands detached from the main bone surface.
4.2 Locating the Femoral Head
The second important landmark to delineate is the femoral head, which is especially important for extracting metrics such as FHC3D. However, the femoral head presents its own unique challenges. It is almost completely cartilaginous in neonates, and therefore hypoechoic. As a result, it has weakly defined boundaries and appears as a dark, approximately semi-spherical shape with some speckle.
It is bounded medially by hyperechoic pulvinar fat in the acetabulum, and laterally by soft tissue including muscles and ligaments. Superiorly, it is bounded by the labrum and by the hypoechoic hyaline cartilage lining the bony rim (the junction between the ilium and the acetabulum).
In this section, we explore two potential approaches for locating the femoral head:
1. Direct regression of sphere parameters (§4.2.2): We assume that the femoral head is a perfect sphere, although only part of the sphere is visible in an US volume. We use 3D-CNNs to directly regress the four sphere parameters: the center coordinates (c_x, c_y, c_z) and the radius r.
2. Segmentation (§4.2.3): We make no assumptions about the shape, and only attempt to segment the visible part of the femoral head. We again use 3D-U-Net [6] for this task.
4.2.1 Labeling
As in the previous chapters, we need volumes with the femoral head annotated to train the proposed CNNs. The femoral head does not have well-defined boundaries, so we do not attempt a direct pixel-wise annotation as is customarily done for segmentation datasets. We found it difficult to determine which pixels belong to the femoral head and which belong to the background. Additionally, frame-by-frame pixel-wise labeling is extremely time-consuming and inefficient. Instead, we make use of the knowledge that the femoral head is approximately spherical. We label it by selecting only a few keypoints at its edges in some coronal and transverse frames. We demonstrate this process in Figure 4.6.
Figure 4.6: Femoral head keypoint placement in 3D Slicer.
Figure 4.7: Fitting a sphere to the key edge points. We show the full sphere in green and the cropped sphere in yellow.
After selecting these key edge points, we use them to prepare two kinds of labels (see Figure 4.7):
1. Full sphere label: We fit a sphere to these points with a least-squares method [33]. This generates a 4-element vector containing the three center coordinates and the radius, [c_x, c_y, c_z, r].
2.
Semi-sphere label: From this fitted sphere, we generate a semi-spherical binary segmentation mask of the visible part of the femoral head (see Figure 4.7). First, any pixels within the sphere are labeled as 1 and any pixels outside are labeled as 0. Next, we define a bounding box B whose boundaries are set by the most extreme keypoints, and we set any points outside of B to 0. This ensures that only the clearly bounded, visible parts of the femoral head are segmented.
Our training set for this task contains 52 volumes from 13 participants, and the test set contains 48 volumes from 12 participants.
4.2.2 Direct Regression
Architecture Choice
The femoral head has weakly defined boundaries in 3D-US of the neonatal hip. Our initial expectation was therefore that formulating this problem as segmentation would not be effective. Instead, we propose to directly regress the centre and radius parameters of the sphere of best fit, as shown in Figure 4.8.
Figure 4.8: Conceptual illustration of 3D convolutional neural networks used for direct regression of sphere parameters.
Although CNNs have been mainly used for classification, they can also be used effectively for regression. Sphere regression is by definition a 3D task. We therefore choose 3D models that use 3D convolutions, which have been shown to be effective for 3D tasks such as video classification [18–20]. 3D versions of modern architectures have recently been proposed. These include ResNet [26], whose residual connections were shown to improve accuracy and efficiency over its predecessors [5]. They also include newer derivatives of ResNet, such as DenseNet [28], WideResNet [80], and ResNeXt [79]. Volumetric data and architectures require significantly more GPU memory and training time. Due to time and hardware limitations, we limit our experiments to the 3D-ResNet-50 and 3D-DenseNet-121 models proposed in [18–20].
Training
We set up our models as shown in Figure 4.8.
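The regression targets for these models are the four sphere parameters from §4.2.1. The sketch below shows one common linear least-squares sphere fit and the cropped semi-sphere label built from it. Reference [33] may use a different least-squares formulation, and the function names are hypothetical. Keypoints are assumed to be in voxel coordinates, in the same axis order as the volume.

```python
import numpy as np


def fit_sphere(points):
    """Linear least-squares sphere fit to (K, 3) points; returns (centre, radius)."""
    p = np.asarray(points, dtype=float)
    # |p - c|^2 = r^2  <=>  p.p = 2 c.p + (r^2 - c.c), which is linear in (c, r^2 - c.c).
    design = np.column_stack([2.0 * p, np.ones(len(p))])
    rhs = (p ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(design, rhs, rcond=None)
    centre, k = sol[:3], sol[3]
    radius = float(np.sqrt(k + centre @ centre))
    return centre, radius


def semi_sphere_mask(shape, centre, radius, keypoints):
    """Binary mask of voxels inside the sphere AND inside the keypoint bounding box B."""
    grid = np.indices(shape, dtype=float)  # (3, *shape) voxel coordinates
    dist2 = sum((grid[d] - centre[d]) ** 2 for d in range(3))
    mask = dist2 <= radius ** 2
    kp = np.asarray(keypoints, dtype=float)
    lo, hi = kp.min(axis=0), kp.max(axis=0)
    for d in range(3):  # zero everything outside the bounding box B
        mask &= (grid[d] >= lo[d]) & (grid[d] <= hi[d])
    return mask.astype(np.uint8)
```

The algebraic fit is a single linear solve. It needs at least four non-coplanar keypoints, and exact keypoints on a sphere are recovered exactly. With noisy keypoints, it minimizes an algebraic residual rather than the geometric distance to the sphere.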
We resize the input volume to 100×100×100 voxels to meet the memory constraints of our system. To identify a good model and hyper-parameters for our task, we use 4-fold cross-validation. We experiment with architectures (DenseNet-121 and ResNet-50), augmentation options, and learning rates. We choose the Mean Squared Error (MSE) loss as the objective function to train our networks for regression:
$$\mathcal{L}_{MSE} = \frac{1}{n}\lVert \mathbf{y} - \hat{\mathbf{y}} \rVert_2^2 \tag{4.16}$$
where $\mathbf{y}$ is the target label, $\hat{\mathbf{y}}$ is the prediction, and n = 4 is the number of sphere parameters. Based on the cross-validation results, we finally choose 3D-ResNet-50. It is trained with a batch size of 3 volumes for 210 epochs, with an initial learning rate of 0.001, reduced by a factor of 0.2 at the 70- and 140-epoch milestones. We apply the following augmentations in training:
• Non-uniform zooming by a factor in the range [0.9, 1.1]
• Shifting along the x, y, z axes in the range [-10, 10] pixels
• Rotating around the x, y, z axes in the range [-5, 5] degrees
• Flips in the medial/lateral direction (z-axis) with 0.5 probability
• Gamma contrast correction with γ in the range [0.2, 2]
4.2.3 Segmentation
We previously had success with U-Net and 3D-U-Net, and the literature offers overwhelming evidence for the success of these architectures [30]. We therefore evaluate the performance of 3D-U-Net for this task (Figure 4.9), despite our initial doubts about segmenting such ill-defined anatomy.
Figure 4.9: Conceptual illustration of 3D-U-Net used for segmenting the femoral head.
We use the BCE loss (Eq. 4.1) as the objective function for this task. We use a batch size of 1 volume and train for 30 epochs, with an initial learning rate of 0.0001, reduced by a factor of 0.2 at the 10- and 20-epoch milestones.
We use the same augmentations as in §4.2.2.
4.2.4 Testing
On a test set of 48 volumes from 12 participants, we evaluate the performance of our proposed 3D-ResNet-50 for direct sphere regression and 3D-U-Net for segmentation. We compare both against the previously proposed SOTA RFC by Quader [64]. The metrics we use for assessing performance fall into two categories: classification metrics and distance metrics.
Classification metrics include:
• Precision (Eq. 3.3)
• Recall (Eq. 3.4)
• Jaccard coefficient, also known as IOU (Eq. 3.2)
• DSC (Eq. 3.1)
Distance metrics include:
• Centre Absolute Errors (CAEs) between the true and predicted spheres' centres along each of the x, y, and z axes. Note that the 3D-U-Net segmentation ground truth label and its predicted output are not spherical. Instead, they are semi-spherical, cropped by the bounding box B. So, for the binary segmentation mask produced by 3D-U-Net, the centre is computed as the centre of mass of the segmented region. The errors are computed as follows:
$$\mathrm{CAE}_x = |c_x - \hat{c}_x| \tag{4.17}$$
$$\mathrm{CAE}_y = |c_y - \hat{c}_y| \tag{4.18}$$
$$\mathrm{CAE}_z = |c_z - \hat{c}_z| \tag{4.19}$$
where c is the reference (ground truth) centre and $\hat{c}$ is the predicted centre.
• RAE between the true and predicted spheres' radii. Again, the 3D-U-Net prediction is not spherical, so the radius is computed as the difference between the most medial and most lateral points of the segmented region:
$$\mathrm{RAE} = |r - \hat{r}| \tag{4.20}$$
where r is the reference (ground truth) radius and $\hat{r}$ is the predicted radius.
• Centre Euclidean Distance (CED) between the predicted centre and the human label, computed as follows:
$$\mathrm{CED} = \sqrt{(c_x - \hat{c}_x)^2 + (c_y - \hat{c}_y)^2 + (c_z - \hat{c}_z)^2} \tag{4.21}$$
4.2.5 Results and Discussion
We report quantitative testing results as boxplots in Figures 4.13 and 4.14, as well as a summary in Table C.1.
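The distance metrics of §4.2.4 can be sketched as follows. The helper names are hypothetical, and coordinates are in voxel units. The medial/lateral axis is assumed to be z (array axis 2), matching the flip augmentation in §4.2.2. For the 3D-U-Net outputs, the centre and radius come from the mask helpers; for the sphere regressor, they come directly from the predicted vector.

```python
import numpy as np


def mask_centre(mask):
    """Centre of mass (in voxel units) of a binary mask."""
    return np.argwhere(np.asarray(mask) > 0).mean(axis=0)


def mask_radius(mask, ml_axis=2):
    """Radius proxy for a non-spherical mask: the extent between the most medial
    and most lateral segmented voxels along the medial/lateral axis (assumed z)."""
    coords = np.argwhere(np.asarray(mask) > 0)[:, ml_axis]
    return float(coords.max() - coords.min())


def localization_errors(c_ref, r_ref, c_pred, r_pred):
    """CAE_x/y/z (Eqs. 4.17-4.19), RAE (Eq. 4.20) and CED (Eq. 4.21)."""
    diff = np.asarray(c_ref, dtype=float) - np.asarray(c_pred, dtype=float)
    return {
        "CAE_x": float(abs(diff[0])),
        "CAE_y": float(abs(diff[1])),
        "CAE_z": float(abs(diff[2])),
        "RAE": float(abs(r_ref - r_pred)),
        "CED": float(np.linalg.norm(diff)),
    }
```

Multiplying voxel coordinates by the voxel spacing before calling `localization_errors` would express the errors in millimetres.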
Qualitative visual examples are shown in Figures 4.10, 4.11, and 4.12.
Figure 4.10: Visualizing Quader's [61, 64] RFC prediction of the femoral head. Left: hip ultrasound with human-labelled keypoints of the femoral head in red. Right: binary segmentation mask output of the RFC in green.
Figure 4.11: Visualizing output of the 3D-ResNet-50 direct regression model for femoral head localization. Left: hip ultrasound with human-labelled keypoints of the femoral head in red, and the best-fitting sphere in green. Right: 3D-ResNet-50 predicted sphere in blue.
Figure 4.12: Visualizing output of the 3D-U-Net segmentation model for femoral head localization. Left: hip ultrasound with human-labelled keypoints of the femoral head in red, and the best-fitting, cropped sphere in yellow. Right: 3D-U-Net binary segmentation mask prediction in pink.
As in the previous section, we apply the following statistical analyses. We apply a one-way ANOVA to test the null hypothesis that all 3 methods are equal (see the ANOVA results in Table C.1). With the Bonferroni correction for multiple comparisons [4] (in our case, 9 comparisons), our p-value threshold for statistical significance is reduced from 0.05 to 0.006 (0.05/9). With this conservative threshold, we can reject the null hypothesis for all the metrics except CAE_x. For CAE_x, the p-value of 0.009 means we cannot reject the null hypothesis. Further, we apply post hoc t-tests to compare the different methods (see tables in Appendix C). We make the following observations about the key metrics:
• DSC: both of our proposed models, 3D-U-Net (segmentation) and 3D-ResNet-50 (regression), outperform the RFC, and 3D-U-Net performs best.
• CED: 3D-U-Net predicts the centre most accurately relative to its ground truth label.
3D-ResNet-50 and the RFC are virtually tied.
• RAE: 3D-ResNet-50 and 3D-U-Net outperform the RFC, and 3D-U-Net predicts the radius most accurately.
Based on these observations and on our qualitative inspection of the visual segmentations, we conclude that 3D-U-Net is the best of the three methods presented. We therefore choose to use it in our final pipeline for DDH metrics extraction.
Figure 4.13: Pixel-wise classification-based evaluation for femoral head localization. (a) Precision, (b) Recall, (c) Jaccard coefficient, (d) DSC.
Figure 4.14: Distance-based evaluation metrics for femoral head localization. (a) CAE_x, (b) CAE_y, (c) CAE_z, (d) CED, (e) RAE.
4.2.6 Conclusions
We proposed two new methods based on 3D-CNNs for locating the femoral head in neonatal hip 3D-US. We directly compared the performance of our methods to each other, as well as to the SOTA RFC proposed by Quader [61, 64]. To the best of our knowledge, the RFC is the only other fully automatic method proposed for our task. We found that the regression-based 3D-ResNet-50 model locates the femoral head more accurately than the RFC in all the proposed metrics except CAE_x. The segmentation-based 3D-U-Net model locates the femoral head more accurately than both 3D-ResNet-50 and the RFC. We therefore choose to use 3D-U-Net in our final pipeline for extracting dysplasia metrics, as described in the next section.
4.3 Extracting Dysplasia Metrics
In this section, we describe our algorithms for extracting the α3D, FHC3D, and OCR DDH diagnostic metrics from neonatal hip 3D-US, using the improved segmentation techniques described in the previous chapters. In addition, we present a clinical study to evaluate the performance of our algorithms against the SOTA.
4.3.1 Choosing DDH Metrics
Many US-based diagnostic metrics have been previously presented in the literature for DDH [66].
For consistency with standard clinical practice, and for direct comparison to the SOTA algorithms, we choose to focus only on the following three metrics:
• α3D: a 3D metric first proposed by Quader [63], analogous to the widely clinically used α angle first proposed by Graf [12]. This metric is defined as the angle between the normals to the fitted planar surfaces of the ilium and acetabulum. DDH severity increases as α3D decreases.
• FHC3D: another 3D metric first proposed by Quader [64], analogous to the widely used FHC metric originally proposed by Morin [47]. This metric captures additional information not captured by α3D, and can potentially be used for quantitative dynamic assessment as proposed by Paserin [59]. It is defined as the ratio of the femoral head volume medial to the plane of the ilium to the total femoral head volume. DDH severity increases as FHC3D decreases.
• OCR: another 3D metric, initially proposed by Zonoobi and Hareendranathan [82]. This metric is relatively new and is not traditionally used in standard clinical practice. We report it because it may provide additional information about bony rim rounding that is not necessarily captured by the α3D angle. It is defined as the radius of the largest sphere that can be fitted under the bony rim (the junction between the ilium and acetabulum, also called the "apex line"). DDH severity increases as OCR increases.
4.3.2 Algorithm for Extracting the Metrics
We propose new methods for extracting these 3 metrics from the 3D-U-Net pelvis bone and femoral head segmentations. These methods build on ideas from the previously proposed algorithms by Quader [61] and Hareendranathan [82]. The algorithm is illustrated in Figure 4.15.
Getting OCR
Starting with the pelvis bone surface segmentation, we do the following to extract the OCR:
1.
Apply connected-component analysis to remove any small islands detached from the main surface of interest, keeping only the largest component.
2. Skeletonize to convert the thick segmentation to a thin (one-pixel-wide) binary surface segmentation.
3. Convert this binary segmentation to a point cloud $PC_P$.
4. Fit to $PC_P$ a polynomial surface $S_P(x, z)$ that is 2nd order along the z-axis and 3rd order along the x-axis. This allows us to compute the surface Gaussian curvature K in the next step, because the point cloud surface is otherwise too noisy for this calculation.
5. Compute the Gaussian curvature $K(x, z)$ of the polynomial surface $S_P(x, z)$ as described by Zonoobi and Hareendranathan [82].
6. Find the coordinates $(x_K, z_K)$ of the point of maximum Gaussian curvature, $R = (x_K, z_K, y_K)$, on the polynomial surface:
$$(x_K, z_K) = \arg\max_{(x,z)} |K(x, z)| \tag{4.22}$$
7. Compute OCR as the reciprocal of the first principal curvature $K_1$ at the point of maximum K:
$$\mathrm{OCR} = \frac{1}{|K_1(x_K, z_K)|} \tag{4.23}$$
Getting α3D
Building on the steps for calculating OCR, we do the following to extract α3D:
1. As with the pelvis bone segmentation, we convert the femoral head binary segmentation to a point cloud $PC_F = \{f_{i,j}\} \in \mathbb{R}^{N \times 3}$.
2. Find the center of mass C of the femoral head as
$$C = \frac{\sum_i f_i}{N} \tag{4.24}$$
3. Define the sphere O with center C and radius $\lVert C - R \rVert$, which will be used to separate the ilium and acetabulum point clouds.
4. Assign all the points in $PC_P$ that lie outside O to the ilium point cloud $PC_I$.
5. Assign all the points in $PC_P$ that lie inside O to the acetabulum point cloud $PC_A$.
6. Fit planes A and I to the $PC_A$ and $PC_I$ point clouds, respectively, with least-squares plane fitting.
7.
Compute α3D as the angle between the unit normal vector to the iliac plane, $n_I$, and the unit normal vector to the acetabular plane, $n_A$:
$$\alpha_{3D} = \cos^{-1}\left(\frac{n_A \cdot n_I}{|n_A||n_I|}\right) \tag{4.25}$$
Getting FHC3D
Further building on the previous steps, we simply compute FHC3D as
$$\mathrm{FHC}_{3D} = \frac{\sum B_{MF}}{\sum B_F} \tag{4.26}$$
where $B_F$ is the femoral head binary segmentation mask, and $B_{MF}$ is the binary segmentation mask containing only the portion of the femoral head medial to the plane of the ilium, I.
Figure 4.15: Conceptual illustration of the proposed pipeline for extracting α3D, FHC3D, and OCR from the segmented pelvis bone surface and femoral head. Note that measurement is done in 3D, but the concept is simplified to 2D for illustration purposes only.
4.3.3 Clinical Study
We conduct a clinical study to evaluate the performance of our proposed algorithms against the SOTA methods. Ideally, we would assess the accuracy of our methods against a clinical gold standard, but no such measure exists. 2D-US, the current clinical standard, is highly variable, as previously explained, so it cannot be considered a gold standard. Accuracy could be assessed with a longitudinal study that tracks patient outcomes. However, this type of study requires long-term tracking on the order of years and is beyond the scope of this thesis. Therefore, we only compare the reliability of our methods to that of the state-of-the-art techniques. Additionally, we visually compare the different methods when large discrepancies with previous methods are observed.
Reliability
To assess reliability, we use 483 sweeps from 34 participants, as described in Ch. 2. Our goal is to simulate the real clinical scenario as closely as possible, so we design our study to assess inter-exam test-retest reproducibility. In our study, an expert US technician images both hips of every participant. The same technician scans each hip twice (i.e.
2 exams), removing and replacing the probe between exams. Within an exam, the probe is held still while the transducer is swept anteriorly-posteriorly 4 times. In summary, each participant is imaged in 16 sweeps: each hip is imaged twice (2 exams), and each exam contains 4 sweeps. See Figure 2.2 for an illustration. Because the same technician performs both exams, we can only compute the intra-sonographer reproducibility, not the inter-sonographer reproducibility, and we note this as a limitation of our method.
We apply our full pipeline, including segmentation and metrics extraction, to each sweep in the clinical study set. Similarly, we apply Quader's [61] full pipeline to the same set of sweeps to extract α3D and FHC3D with their proposed algorithms, which use CSPS and RFCs for segmentation. Additionally, we manually inspect and discard any sweeps according to the scan adequacy labeling procedure described in §5.1. If all the sweeps within an exam are discarded, we discard that exam completely and consequently exclude that hip from our final analysis. For each exam (containing 1–4 adequate sweeps), the assigned dysplasia metric is the average of that metric across all the remaining adequate sweeps in that exam. Finally, each remaining exam is assigned the following five metrics:
• α3D using our proposed methods
• α3D using Quader's methods [61, 63]
• FHC3D using our methods
Table 4.1: Results showing inter-exam, intra-sonographer ICC of our proposed pipeline for computing α3D, FHC3D and OCR vs.
the state-of-the-art methods (n = 42 hips).

Metric   Method          ICC (95% CI)        p-value (H1: ours > other method)
α3D      Quader's [61]   0.78 (0.62, 0.88)   0.03
α3D      Ours            0.87 (0.77, 0.93)   -
FHC3D    Quader's [61]   0.68 (0.47, 0.81)   0.006
FHC3D    Ours            0.84 (0.72, 0.91)   -
OCR      Quader's [61]   -                   -
OCR      Ours            0.74 (0.58, 0.86)   -

• FHC3D using Quader's methods [61, 64]
• OCR using our methods
Given the inter-exam pairs for each of these 5 metrics for each hip, we compute the inter-exam reproducibility using the ICC. Following Koo's guideline for selecting an ICC [38], we select a two-way mixed-effects model, based on a single measurement, with the absolute-agreement definition. In addition to the ICC, we also report the test-retest Standard Deviations (SDs) of the different methods as another measure of reproducibility.
4.3.4 Results and Discussion
Starting from 483 sweeps (from 60 hips) in the clinical study set, 317 sweeps (from 42 hips) remain after we discard all inadequate sweeps. We report the ICC of the DDH metrics with the different methods in Table 4.1. Further, we report the SDs in Table D.1.
Table 4.2: Qualitative plausibility analysis of adequate sweeps in which a large discrepancy between our methods and Quader's (SOTA) was detected. Numbers reported are percentages of successful and plausible measurements out of 51 sweeps visually inspected. Inspection was performed by an unbiased rater other than the author of this thesis.

Metric   Method              % Plausible
α3D      Quader's [61, 63]   76%
α3D      Ours                100%
FHC3D    Quader's [61, 64]   57%
FHC3D    Ours                100%

Comparing to the literature
To the best of our knowledge, there are only two other techniques in the literature that were proposed for extracting DDH metrics from 3D-US of the neonatal hip: Quader's [61, 63, 64] and Zonoobi's [82]. We had access to Quader's [61] code, so we were able to compare our methods to theirs directly on the same dataset.
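The ICC definition selected above (two-way, single measurement, absolute agreement) can be computed from two-way ANOVA mean squares using McGraw and Wong's ICC(A,1) formula. The two-way mixed and two-way random models share this formula and differ only in interpretation. The sketch below is a minimal NumPy version with a hypothetical function name. It omits the confidence interval and the one-sided test, and it is not the implementation used to produce Table 4.1.

```python
import numpy as np


def icc_a1(ratings):
    """Two-way, absolute-agreement, single-measurement ICC (McGraw & Wong ICC(A,1)).

    ratings: (n_subjects, k_measurements) array, e.g. one row per hip and
    one column per exam. Rows with missing values must be removed beforehand.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()  # between subjects
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()  # between exams
    ss_total = ((x - grand) ** 2).sum()
    ss_err = ss_total - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    # Absolute agreement: systematic exam-to-exam shifts (msc) lower the ICC.
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

With one row per hip and one column per exam, `icc_a1(pairs)` returns a single coefficient, and identical values in both exams give exactly 1. The consistency variant would drop the `msc` term from the denominator, which is why the choice of definition matters when comparing against other published ICCs.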
We use the R implementation of the ICC in the irr package [9], which performs a one-sided test against a specified null hypothesis. These are the p-values we report in Table 4.1. Our results show that, for both metrics that Quader's pipeline computes (α3D and FHC3D), our proposed methods are more reproducible than Quader's [61]. We can only compare our methods to Zonoobi's [82] indirectly, because we do not have access to their code. We are therefore limited to comparing our results to the numbers reported in their publication. We first note the following differences in Zonoobi's methods: 1) they used a different ICC definition (two-way mixed effects, consistency, single rater; ICC(3,1)); 2) they averaged the measures from multiple sonographers at each exam; 3) instead of a single α3D measure, they report α3D-posterior and α3D-anterior. With these differences, Zonoobi reports inter-exam ICCs on a set of 60 hips of 68%, 62%, and 50% for α3D-posterior, α3D-anterior, and OCR, respectively. Our ICC results for α3D and OCR are 87% and 74%, respectively. This suggests that our methods may be more reproducible, but this remains to be confirmed in a direct comparison in future work.
Comparing the metrics to each other
Following Koo's guidelines [38] for the ICC, the reproducibility of our α3D is good to excellent, FHC3D is moderate to excellent, and OCR is moderate to good. α3D and FHC3D are virtually tied, and both outperform OCR, although their confidence intervals overlap with that of OCR.
Figure 4.16: Bland-Altman plots showing large discrepancies between our metrics and Quader's metrics (n = 42 hips). (a) α3D (°), (b) FHC3D (unitless ratio).
Figure 4.17: Example showing failure with Quader's SOTA CSPS-based method for pelvis bone surface segmentation and α3D measurement [63].
a) Incorrectly segmented pelvis bone surface with Quader's method, shown in green. b) The corresponding fitted planes, resulting in an implausible α3D measurement. c) The same sweep correctly segmented with 3D-U-Net, shown in red. d) The corresponding fitted planes and plausible α3D measurement.
Figure 4.18: Example showing failure with Quader's SOTA RFC-based method [64]. a) Incorrectly segmented femoral head with Quader's method, shown in green. b) The corresponding fitted planes, resulting in an implausible FHC3D measurement. c) The same sweep correctly segmented with 3D-U-Net, shown in red. d) The corresponding fitted planes and plausible FHC3D measurement.
Figure 4.19: Example showing a questionable case with Quader's SOTA CSPS-based method [63]. a) Quader's CSPS segmentation method captures only a very thin sliver of the overall pelvis bone surface. b) The corresponding fitted planes, resulting in a questionable α3D measurement. c) The same sweep correctly segmented with 3D-U-Net, shown in red. d) The corresponding fitted planes and plausible α3D measurement.
Large Discrepancy Analysis and Failure Cases
On the set of 42 hips, we inspect cases where there are large discrepancies between our methods and Quader's. We do not have a trusted gold-standard measurement of DDH in these participants, because the clinical standard, 2D-US, is unreliable. We therefore cannot directly judge which of the two methods is closer to the true value, or in other words, which method is more accurate. In these cases, we visualize the segmentation and plane-fitting outputs of Quader's methods and ours. The goal is to determine whether we can draw any conclusions about which method is more plausible.
First, we discard inadequate sweeps as described in §5.1.
Of the remaining adequate sweeps, we identify how many such large-discrepancy cases exist using Bland-Altman analyses. For both α3D and FHC3D, we visualize the discrepancies on Bland-Altman plots, as shown in Figure 4.16. Note the relatively wide limits of agreement (mean ± 1.96 SD): -22° to 16° for α3D, and -52% to 19% for FHC3D. These suggest a high level of discrepancy between the two methods that warrants further investigation.
Based on these Bland-Altman plots, we select rough cut-offs of -10° to 10° for α3D, and -30% to 10% for FHC3D. We deem any case whose difference (i.e., the difference between our metric and Quader's metric) falls outside these thresholds to be a large discrepancy. On this basis, we identified 51 adequate sweeps (from 7 hips) with large discrepancies.
We ask an unbiased engineering student in our lab to inspect each such case (adequate and with a large discrepancy). Here, "unbiased" means the student was not involved in any of the work in this thesis or in Quader's work, and has no conflicts of interest. This individual is asked to judge the overall plausibility of the α3D and FHC3D measurements of the two contrasted methods, based on the perceived quality of the segmentation and plane fitting. We report the results of this analysis in Table 4.2.
From these results, we can see that in the inspected dataset, our proposed methods for α3D and FHC3D always produced plausible measurements. In contrast, measurements with Quader's methods were in many cases implausible. The most common reason for α3D implausibility with Quader's method was failure to segment the pelvis bone surface. Instead, the method falsely segmented other regions (e.g., soft tissue or air gaps) and identified them as the pelvis bone surface (7/51 such cases). We show an example of this in Figure 4.17.
For FHC3D, the rater observed many cases in which Quader's RFC severely under-segmented the femoral head, resulting in an implausible measurement (17/51 such cases); we show an example in Figure 4.18. In addition to these clearly implausible cases, the rater reported some borderline (questionable) cases. In these, Quader's CSPS-based segmentation method segmented only a very thin sliver of the total pelvis bone surface (see, for example, Figure 4.19). Finally, the rater reported 3/51 cases in which Quader's program exited with an error before completing the segmentation and measurement, resulting in outputs of 0 for α3D and FHC3D.
4.3.5 Conclusions
We proposed a new algorithm for extracting DDH metrics, including α3D, FHC3D, and OCR, from segmented neonatal hip 3D-US. We showed that our methods produce higher inter-exam, intra-sonographer ICCs than the SOTA methods proposed by Quader [61, 63, 64]. Our methods also appear to be more reproducible than the semi-automatic method proposed by Zonoobi [82], although this remains to be determined in a direct comparison in future work. Further, we showed that in cases of large disagreement between our methods and Quader's, our methods appear to produce more plausible results and are more robust to failure.
Chapter 5
Automatic Adequacy Assessment with 3-Dimensional Convolutional Neural Networks
As described in Ch. 1, Paserin's work [56, 58] was the only work to explore automatic adequacy assessment for 3D neonatal hip ultrasound that made use of 3D data from adjacent slices.
Paserin introduced this in the form of an RNN model that classifies whether or not a given volume is adequate. The classification is based on the following criteria, defined by Paserin in collaboration with a radiologist:
• The femoral head, a hypoechoic spherical structure, should be fully present and seen growing and shrinking in size across the encompassing slices
• The ilium must appear as a straight, horizontal, hyperechoic line
• The acetabulum must be present and appear continuous with the iliac bone
• Presence of the ischium
• Presence of the labrum
• All of these features should be collectively present within an adequate volume, but they do not necessarily all need to be present within any single slice
Paserin's RNN emulated the radiologist's adequacy assessment quite well, achieving a reported AROC of 83% on a test set of 20 volumes. However, the completeness of the criteria themselves was not validated. Specifically, the effect of the choice of adequacy criteria on the reproducibility of α3D and FHC3D was not tested. In this chapter, we evaluate whether using Paserin's proposed criteria can improve DDH measurement. Further, we propose our own adequacy criteria and compare them with Paserin's, evaluating both against DDH metric reproducibility. Lastly, we show how 3D-CNNs can be used to automate adequacy classification with our proposed criteria.
5.1 Labeling with New Criteria
In §4.3.3, we described a clinical evaluation study in which a rater (the author of this thesis) inspected the full set of 483 sweeps from 34 participants and judged which sweeps were adequate for DDH measurement. With high-quality segmentations of the hip anatomy from our trained models in the previous chapter, we could now see patterns in the anatomy that were not previously apparent. We consequently gained an improved understanding of the overall shape.
Given this newfound understanding, we suggest that Paserin's adequacy criteria could be improved. We hypothesize that more selective criteria could ultimately improve the reliability of the DDH measurement. As such, we did not strictly adhere to Paserin's criteria in the labeling process. Instead, to label the 483 sweeps, the rater (the author of this thesis) answered only the following simplified question for every sweep: "Is the sweep adequate for α3D and FHC3D measurement?" The rater could answer "yes", "maybe" (if not sure), or "no" after simultaneously viewing 4 views: 1) the B-mode coronal view cine, 2) the B-mode sagittal view cine, 3) the B-mode transverse view cine, and 4) the segmented anatomy as point clouds. Using this procedure, 317 sweeps were labelled "yes", 101 "maybe", and 65 "no".
In retrospect, the following are the reasons we most often observed for rejecting a sweep:
• The ilium is fully or partially beyond the Field-of-View (FOV) of the probe, usually because the probe is positioned too inferiorly. If much of the ilium surface is missing, we cannot be certain that the fitted plane represents the full ilium surface. The sweep is consequently not adequate for α3D or FHC3D measurement (e.g., see Figure 5.1).
• Similarly, the femoral head is partially or fully beyond the FOV of the probe, which can be caused by the probe being positioned too anteriorly or posteriorly. If the femoral head is occluded, we cannot make an FHC3D measurement.
• There is clear movement artifact. This can usually be observed in the sagittal view, where it appears as a "smudge". It is also visible when playing a cine of the coronal view, where the femoral head abruptly moves superiorly and inferiorly (e.g.
see figure 5.2)\u2022 The ilium and acetabulum are present and planes can be fitted to them, buttheir segmented surface area appears smaller than other high-quality, \u201cade-quate\u201d examples. This can be caused by the probe pose or rotation deviatingsignificantly from the optimal position. (e.g. see figure 5.3)We highlight the following differences compared to Paserin\u2019s criteria. With ourcriteria:\u2022 We ignore the labrum and ischium, as these are not relevant for measuring\u03b13D and FHC3D.\u2022 We emphasize movement artifact, whereas this was not included in Paserin\u2019scriteria.\u2022 The ilium is treated as a plane (which can be tilted), whereas Paserin\u2019s cri-teria specifies that it must be a horizontal line (and we assume that this isbased on the standard plane as judged by the rater).\u2022 Beyond the simple presence of certain anatomy viewed in the B-mode views,we emphasize probe positioning and image quality in terms of the shape andsurface area of the segmentation, which can be more clearly observed in the3D segmentation view, but is not obvious when looking at only the sagittal,coronal, and transverse B-mode cines.72Figure 5.1: Example showing a sweep that was deemed \u201cinadequate\u201d be-cause the ilium appears to be beyond the FOV of the scan due to theprobe being positioned too inferiorly from the optimal position (right),and an \u201cadequate\u201d volume with ilium fully within the FOV for compari-son (left).5.1.1 Evaluation SchemeTo evaluate and compare our new criteria with Paserin\u2019s, we apply Paserin\u2019s RNNmodel [56, 58] to the same clinical evaluation dataset described in \u00a74.3.3, and dis-card sweeps labelled as inadequate by this model. 
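The evaluation scheme discards sweeps flagged as inadequate and then measures test-retest agreement with the intraclass correlation coefficient (ICC). The Python sketch below illustrates one way to do this and is not the thesis implementation. It assumes the two-way random-effects, absolute-agreement, single-measure form ICC(2,1) of Shrout and Fleiss, that each hip has exactly two exams labelled 1 and 2, and that the sweeps remaining in each exam are averaged; none of these choices is stated in this chapter (the thesis cites the R package irr [9], which provides ICC routines). The helper `paired_exam_means` and the record fields `hip`, `exam`, `value`, and `adequate` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def icc_2_1(rows):
    """ICC(2,1) of Shrout & Fleiss: two-way random effects, absolute agreement,
    single measurement. rows[i][j] is the metric value for hip i in exam j."""
    n, k = len(rows), len(rows[0])
    grand = mean(v for r in rows for v in r)
    row_means = [mean(r) for r in rows]
    col_means = [mean(r[j] for r in rows) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)             # between-hip mean square
    msc = ss_cols / (k - 1)             # between-exam mean square
    mse = ss_err / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def paired_exam_means(sweeps, keep):
    """Hypothetical helper. Each sweep is an assumed dict such as
    {"hip": "A", "exam": 1, "value": 61.2, "adequate": True}.
    Sweeps failing keep() are discarded, the remaining sweeps are averaged per
    (hip, exam), and only hips that still have both exam 1 and exam 2 are kept."""
    groups = defaultdict(list)
    for s in sweeps:
        if keep(s):
            groups[(s["hip"], s["exam"])].append(s["value"])
    hips = sorted({hip for hip, _ in groups})
    return [[mean(groups[(h, 1)]), mean(groups[(h, 2)])]
            for h in hips if (h, 1) in groups and (h, 2) in groups]
```

Each of the three compared sets then corresponds to a different `keep` predicate: `lambda s: True` for No-Discard, and a predicate on the RNN’s prediction or on the rater’s “yes” label for Paserin-Criteria and Our-Criteria. Note that absolute-agreement ICC penalizes a systematic offset between exams, and that the 95% confidence intervals reported in Table 5.1 are not computed in this sketch.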
We again compute the following metrics for all 483 sweeps in the set:
• α3D using our methods described in Ch.4
• FHC3D using our methods described in Ch.4
• OCR using our methods described in Ch.4
We then compare the inter-exam, intra-sonographer ICC of the following three sets:
• No-Discard: the full clinical evaluation set, without discarding any sweeps
• Paserin-Criteria: a distilled set, discarding “inadequate” sweeps based on the predictions from Paserin’s RNN model [56, 58]; we consider the predictions of this model to be an approximation of Paserin’s adequacy criteria
• Our-Criteria: a distilled set, discarding “inadequate” sweeps based on the new criteria described in §5.1

Figure 5.2 (panels: coronal, sagittal, and transverse views of two sweeps, with the movement artifact annotated): Example of a sweep deemed “inadequate” because of movement artifact that can be seen as a “smudge” in the sagittal view (lower row); for comparison, we show the sagittal view of an “adequate” volume (top row).

5.1.2 Results and Discussion
We report the results in table 5.1. First, we note that only 17 of the 483 sweeps were labelled as inadequate and rejected by the RNN, compared to 166 labelled as inadequate (in the “maybe” or “no” category) and rejected with the proposed criteria. Further, the ICCs across all three metrics appear to be higher in the Our-Criteria set than in the other two, whereas the Paserin-Criteria set appears to be tied with the No-Discard set. This suggests that our adequacy criteria are more selective than Paserin’s RNN, and that this selectivity appears to increase test-retest reproducibility.

Figure 5.3: Example of a sweep deemed borderline adequate (“maybe”) on the right, due to the labeler’s perception that the probe was not positioned optimally. Note the shape and reduced area of the ilium (green) and acetabulum (yellow) surfaces used for α3D in the sweep on the right, compared with the high-quality sweep on the left. This is potentially due to the probe being slightly tilted (roll around the x-axis) or translated (along the z-axis) away from the optimal position.

Table 5.1: Comparing inter-exam, intra-rater test-retest ICC with different adequacy criteria. The number of sweeps n remaining after discarding inadequate sweeps is shown in parentheses beside each column header. The 95% CI is reported in parentheses next to each ICC value.

Metric   No-Discard (n=483)   Paserin-Criteria (n=466)   Our-Criteria (n=317)
α3D      0.65 (0.48, 0.78)    0.67 (0.50, 0.79)          0.87 (0.77, 0.93)
FHC3D    0.74 (0.61, 0.84)    0.75 (0.62, 0.85)          0.84 (0.72, 0.91)
OCR      0.67 (0.50, 0.79)    0.68 (0.52, 0.80)          0.74 (0.58, 0.86)

Table 5.2: Adequacy train and test set class distribution.

            Yes=1   Maybe=0.5   No=0   Total
Train set   100     91          145    336
Test set    81      30          25     136
Total       181     121         170    472

5.2 Automatic Adequacy Classification with 3D-CNNs
As described in Ch.1, the overall objective of this thesis is to develop a fully automatic, user-independent system, so in this section we attempt to automate the adequacy classification step that was performed manually in §5.1.

5.2.1 Classification Model
Inspired by the high accuracy and speed of the CNNs proposed by Paserin [56–58] for this task, we also use CNNs. However, similar to our methods for femoral head sphere regression presented in §4.2, we propose to use 3D-CNNs, which can capture the 3D information in the full volume, in contrast to 2D-CNNs, which can only use limited information from single frames, and RNNs, which can only use information from a few adjacent slices. Again, we experiment with 3D-ResNet-50 and 3D-DenseNet-121, based on their promising performance on video classification tasks reported in the literature [18–20] and given our time and hardware limitations.

5.2.2 Labeling Data for CNN Training
As depicted in figure 2.3, for training and testing our models we use an expanded set of volumes from all 118 participants in our full dataset. We assign 336 sweeps from 84 participants to the training set and 136 sweeps from 34 participants to the test set, totaling 472 sweeps. The same rater (author of this thesis) labelled these training and test data using the procedure described in §5.1. The class distribution for the train and test sets is illustrated in figure 2.3 and summarized in table 5.2.

5.2.3 Training
Based on observations from preliminary cross-validation experiments, we choose 3D-DenseNet-121 over 3D-ResNet-50 for our classification task. We train three models based on 3D-DenseNet-121 with the following differences:
• Model 1 [B Y/N]
  – Input: single-channel B-mode input only
  – Output: binary class label, 1 for adequate and 0 for inadequate
  – Training set class distribution: all sweeps labelled as “yes” are assigned to one class (adequate), all sweeps labelled as “no” are assigned to the other class (inadequate), and all sweeps labelled as “maybe” are ignored and not included in the training.
• Model 2 [B+Seg Y/N]
  – Input: 3-channel input with B-mode in the first channel, the 3D-U-Net binary mask prediction of the pelvis bone surface in the second channel, and the 3D-U-Net binary mask prediction of the femoral head in the third channel.
  – Output: binary class label, 1 for adequate and 0 for inadequate
  – Training set class distribution: all sweeps labelled as “yes” are assigned to one class (adequate), all sweeps labelled as “no” are assigned to the other class (inadequate), and all sweeps labelled as “maybe” are ignored and not included in the training.
• Model 3 [B+Seg Y/M/N]
  – Input: 3-channel input with B-mode in the first channel, the 3D-U-Net binary mask prediction of the pelvis bone surface in the second channel, and the 3D-U-Net binary mask prediction of the femoral head in the third channel.
  – Output: binary class label, 1 for adequate and 0 for inadequate
  – Training set class distribution: all sweeps labelled as “yes” are assigned to one class (adequate), and all sweeps labelled as “maybe” or “no” are assigned to the other class (inadequate).
For all three models, we select the same training hyperparameters:
• Batch size: 1
• Learning rate: 0.0001
• Optimizer: Adam [37]
• Loss: BCE
• Epochs: 150
• Augmentation as described in §4.2.2
• Input size: 100×100×100 voxels
• Regularization: dropout with rate 50%

5.2.4 Testing
We compare the performance of the three models using the AROC on the test set. Although it is not directly comparable, as it was trained on a different training set, for completeness we also report the AROC of Paserin’s RNN on the same test set. Since our models’ output is binary (adequate/inadequate), whereas the test set includes three classes (yes/maybe/no), we compute the AROCs on two subsets of the test set:
• Test subset 1: ignoring the “maybe” test cases (this reduces the test set from 136 to 111 sweeps), and
• Test subset 2: assigning all “maybe” cases to the “inadequate” class
Further, we repeat the inter-exam, intra-sonographer ICC analysis on the clinical evaluation set (483 sweeps), but this time using the predictions from the three contrasted models to discard inadequate sweeps. We report ICCs for the DDH metrics given that inadequate sweeps are discarded with the three trained 3D-DenseNet-121 models, and compare these with the previously proposed strategies summarized in table 5.1: No-Discard, Paserin-Criteria (RNN), and Our-Criteria (manually labelled).

Table 5.3: AROC scores of the three contrasted models when applied to the test set. In the first row, we ignore sweeps in the test set labelled as “maybe”. In the second row, we assign all sweeps labelled as “maybe” to the “inadequate” class.

Tested on                        RNN [58]   B Y/N   B+Seg Y/N   B+Seg Y/M/N
Subset 1 (Y=1, M=ignore, N=0)    0.49       0.89    0.9         0.84
Subset 2 (Y=1, M=0, N=0)         0.49       0.75    0.84        0.83

Table 5.4: Inter-exam, intra-sonographer ICC with the proposed CNNs. n is the number of remaining sweeps after “inadequate” sweeps are discarded. 95% CIs are shown in parentheses.

Metric   B Y/N (n=335)       B+Seg Y/N (n=391)   B+Seg Y/M/N (n=345)
α3D      0.79 (0.65, 0.88)   0.76 (0.62, 0.86)   0.73 (0.50, 0.86)
FHC3D    0.82 (0.70, 0.90)   0.77 (0.64, 0.87)   0.81 (0.64, 0.91)
OCR      0.60 (0.40, 0.76)   0.72 (0.56, 0.83)   0.67 (0.41, 0.83)

5.2.5 Results and Discussion
The AROC scores in table 5.3 show that all three 3D-DenseNet-121 models have mostly learned to emulate the labeler’s ability to predict adequacy based on our proposed criteria. Paserin’s RNN, in comparison, has a very poor score of 49%, essentially no better than chance (50%), indicating that it is not good at predicting adequacy based on our new criteria, although Paserin reported a high AROC of 83% on a test set of 20 volumes labelled with their own criteria. Model 1 [B Y/N] performs well (AROC of 89%) when tested on test subset 1, which does not include “maybes”, but performs relatively poorly (AROC of 75%) when tested on subset 2, which includes “maybes”. This suggests that this model is less selective and might classify borderline “maybe” cases as “adequate”. This presents a risk in a real-world scenario, where it is safer to be more selective and repeat acquisition for borderline cases, as repeated acquisitions are low cost and fast.
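Both the training regimes of §5.2.3 and the two test subsets of Table 5.3 differ only in how “maybe” labels are handled: they are either dropped (Models 1 and 2, test subset 1) or folded into the inadequate class (Model 3, test subset 2). A minimal Python sketch of that label handling and of the AROC, computed here through the Mann–Whitney statistic, follows. The function names and the toy labels and scores are hypothetical and are not taken from the thesis code; the pairwise comparison is O(n_pos × n_neg), which is fine for test sets of this size.

```python
YES, MAYBE, NO = "yes", "maybe", "no"


def binarize(labels, maybe_policy):
    """Map yes/maybe/no labels to binary targets (1 = adequate, 0 = inadequate).

    maybe_policy="ignore":   drop "maybe" sweeps (Models 1-2, test subset 1).
    maybe_policy="negative": count "maybe" as inadequate (Model 3, test subset 2).
    Returns the indices of the kept sweeps and their targets.
    """
    if maybe_policy not in ("ignore", "negative"):
        raise ValueError(f"unknown maybe_policy: {maybe_policy!r}")
    kept, targets = [], []
    for i, label in enumerate(labels):
        if label not in (YES, MAYBE, NO):
            raise ValueError(f"unknown label: {label!r}")
        if label == MAYBE and maybe_policy == "ignore":
            continue
        kept.append(i)
        targets.append(1 if label == YES else 0)
    return kept, targets


def auroc(targets, scores):
    """AROC as the probability that a random adequate sweep outscores a random
    inadequate one (Mann-Whitney U divided by n_pos * n_neg); ties count 1/2."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AROC needs at least one adequate and one inadequate sweep")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def subset_auroc(labels, scores, maybe_policy):
    """AROC of a model's adequacy scores on the test subset defined by maybe_policy."""
    kept, targets = binarize(labels, maybe_policy)
    return auroc(targets, [scores[i] for i in kept])


# Hypothetical toy example: five test sweeps and a model's P(adequate) for each.
labels = [YES, MAYBE, NO, YES, NO]
scores = [0.9, 0.8, 0.2, 0.7, 0.4]
print(subset_auroc(labels, scores, "ignore"))    # test subset 1
print(subset_auroc(labels, scores, "negative"))  # test subset 2
```

In this toy example, the “maybe” sweep receives a high adequacy score. It does not affect subset 1 (AROC 1.0) but lowers the subset 2 AROC to 5/6, which mirrors the pattern seen for Model 1 [B Y/N] in Table 5.3.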
In contrast, Model 2 [B+Seg Y/N] and Model 3 [B+Seg Y/M/N], which additionally take the segmentation binary masks as input, scored higher AROCs of 84% and 83% on subset 2. This suggests that segmentation data may provide additional useful information that can help determine adequacy in uncertain cases. Models 2 and 3 appear to be more selective than Model 1, suggesting that they are potentially safer for use in real-world scenarios.
Considering the ICC scores in table 5.4 and comparing them to the scores in table 5.1, in general we see worse test-retest reproducibility with the 3D-DenseNet-121 models than with the human-labelled “Our-Criteria” set. This is probably explained by over-fitting to the training data and a relative decrease in accuracy on unseen test data, despite the use of augmentation and dropout regularization; more training data, which we did not have, could improve this. Further, based on the reported ICC scores, we do not see a clear pattern in which of the three 3D-DenseNet-121 models performs best across all three DDH metrics. However, in general, models 2 and 3 appear to score as well as or better than the No-Discard and Paserin-Criteria sets, suggesting their potential to assist in improving DDH measurement in a real-world scenario.

5.2.6 Conclusions
We proposed new adequacy criteria and compared them to Paserin’s [56, 58], the closest work to ours and, to the best of our knowledge, the only other work on scan adequacy for 3D-US for DDH. We showed that our newly proposed criteria capture more relevant information and are more selective. Due to this selectivity, we showed that using these criteria to discard inadequate sweeps improves the reproducibility of α3D and FHC3D measurement, suggesting that the proposed adequacy criteria may reduce misdiagnosis (to be confirmed in future work). Further, we evaluated 3D-DenseNet-121 to automate adequacy classification with our criteria.
We show that our trained models capture more information and are more selective than the RNN, but we did not show conclusively which of the three proposed training regimes for 3D-DenseNet-121 is best.

Limitations
We can only draw limited conclusions due to the limitations of this study. These include:
• The adequacy criteria we proposed and used, although they showed improvements in ICC, are not precisely defined and remain subjective.
• The class distributions of our train and test sets differ markedly. For example, 43% of the training data was labelled as “no”, whereas only 18% of the test data was labelled as “no”. It appears that our sonographers became better at acquiring adequate images as time progressed. This can also be seen in the last row of figure 2.3, whose x-axis is ordered chronologically: as time progresses, we get more “yes” labels and fewer “maybe” and “no” labels.

Future Work
Future work will focus on improving the adequacy criteria, the models, and the experimental methodology. Suggestions for future work include:
• More precisely and quantitatively defining the adequacy criteria to improve labeling reproducibility.
• Having more than one labeler, which allows testing the reproducibility of the criteria and reduces labeler bias.
• Using novice sonographers, ideally a new sonographer for every scan, which would reduce the learning effect described in the limitations.
• Mitigating the class imbalance in the training process by reorganizing the train and test sets, re-sampling, or re-weighting training examples, for example with Focal Loss [41].
• In our analysis, we simply discarded “inadequate” cases. In a real clinical scenario, this is problematic, as an inadequate case means that the DDH diagnosis must be made based on information from different tests, potentially forcing the clinician to revert to standard techniques, including 2D-US and clinical examination. As these techniques are unreliable, as we discussed in Ch.1, this puts the patient at risk of misdiagnosis. Future work should incorporate point-of-care adequacy feedback, and assess how many of the “inadequate” cases can be adequately re-acquired.
• An interesting avenue that could be explored in future work is incorporating a built-in uncertainty measure into the network that can tell the user when the network is uncertain of its prediction. This additional uncertainty information may help identify the “maybe” cases.

Chapter 6
Discussion and Conclusions

6.1 Revisiting Research Questions and Comparing to the State-of-the-Art
Here we reiterate the research questions presented in §1.4 and outline the conclusions made for each RQ based on the work presented in the previous chapters, in the process comparing our proposed methods to the SOTA previously presented in the literature.

6.1.1 Research Question 1
Can CNNs be trained to segment the pelvis bone surfaces, including the ilium and acetabulum, in neonatal hip 3D-US? Would the predictions produced by such CNNs more closely resemble human labels, as compared to existing SOTA methods such as CSPS?
In Ch.3, we trained U-Net [67] on 439 2D slices for segmenting the pelvis bone surface in neonatal hip 3D-US. We chose U-Net because of its proven success for medical image segmentation [30] and its ability to learn from very few images. We mainly compare with CSPS, as it is, to the best of our knowledge, the only fully automatic method that has been applied to 3D-US volumes for DDH, and we consider it to be the SOTA for our application.
When tested on 103 previously unseen 2D slices, U-Net achieved a DSC of 86%, outperforming CSPS+ROI [61, 62], which achieved a DSC of 81%.
In §4.1, we further experimented with 3D-U-Net [6], which uses 3D convolution kernels as opposed to the 2D convolutions of U-Net, hypothesizing that incorporating 3D information may improve segmentation accuracy. We trained 3D-U-Net on a set of 64 volumes. When tested on 52 volumes, we report DSCs of 85% and 91% for 3D-U-Net and U-Net, respectively, but very different CDI scores of 76% and 24%, respectively. These results, supported by our visual observations, suggest that 3D-U-Net is less likely to produce false positive segmentations (islands) distant from the bone surface, but that both methods can capture most of the bone surface. Importantly, both sets of scores are much higher than those achieved by the SOTA CSPS method, which achieved a DSC of 26% and a CDI of near 0%.
We conclude that CNNs can be trained to segment the pelvis bone surfaces in neonatal hip 3D-US, and that the resulting segmentations more closely resemble human labels than those of the SOTA.

6.1.2 Research Question 2
Can CNNs be trained to locate the femoral head in neonatal hip 3D-US? Would the predictions produced by such CNNs more closely resemble human labels, as compared to existing SOTA methods such as Quader’s RFC [61, 64]?
In §4.2, we trained 3D-U-Net [6] to segment the femoral head in neonatal hip US. We compare 3D-U-Net to an RFC-based method introduced by Quader [61, 64], which is, to the best of our knowledge, the only method previously presented in the literature for fully automatic segmentation of the femoral head in 3D-US of the neonatal hip, and which we consider the SOTA for this application. When tested on a set of 53 unseen volumes, 3D-U-Net localized the femoral head with a CED of 1.42 mm and an RAE of 0.46 mm.
This more closely matched the human labels than the RFC [61, 64], which achieved a CED of 3.90 mm and an RAE of 2.01 mm on the same test set. We conclude that CNNs can be trained to locate the femoral head in neonatal hip 3D-US, and that the predictions produced by these CNNs more closely resemble the human labels than those of the SOTA.

6.1.3 Research Question 3
Can we develop automatic methods for extracting α3D and FHC3D metrics with our improved segmentations that are at least as reproducible as the previously proposed methods [61, 63, 64]? Can we show that our proposed methods are at least as robust and plausible as these previously proposed methods?
In §4.3, we presented algorithms for automatically extracting dysplasia metrics, including α3D, FHC3D, and OCR, from the segmented neonatal hip 3D-US volumes. To the best of our knowledge, there are only two other systems in the literature that use 3D-US for DDH diagnosis and are comparable to ours, and both were developed concurrently. One system, developed by Zonoobi et al. [82], is semi-automatic, requiring seed-point inputs from the user, and measures the α3D-anterior, α3D-posterior, and OCR metrics. The other system, developed in our lab by Quader et al. [61, 63, 64], is fully automatic, uses CSPS for bone surface segmentation and an RFC classifier for femoral head segmentation, and measures α3D and FHC3D. Of these two, Quader’s method, being fully automatic, is closest to ours, so we consider it to be the SOTA for our application and compare directly to it in our experiments.
On a clinical set of 42 hips, our method achieves inter-exam, intra-sonographer ICCs of 87%, 84%, and 74% for α3D, FHC3D, and OCR, respectively. On the same set of 42 hips, Quader’s methods [61, 63, 64] achieved lower ICCs of 78% and 68% for α3D and FHC3D, respectively.
Further, qualitative observations by an independent observer suggest higher plausibility, fewer failures, and improved robustness with our methods.
Based on our experiments in §4.3, we conclude that our methods are more reproducible (in the inter-exam, intra-sonographer setting), more robust, and more plausible than Quader’s methods [61, 63, 64], the current SOTA for fully automatic measurement of DDH from neonatal hip 3D-US volumes.

6.1.4 Research Question 4
Are the current adequacy criteria proposed by Paserin [56] sufficient? Can we improve the criteria? Can we train new models for automating classification based on the newly defined criteria?
To the best of our knowledge, Paserin’s work [56–58] is the only other work in the literature that has addressed the problem of adequacy classification of 3D-US, so we consider it the SOTA for this application. In Ch.5, we proposed a new set of adequacy criteria based on recent observations of segmented volumes, and compared these to Paserin’s. When applied to a set of 483 volumes, only 317/483 cases were deemed adequate with the new criteria, whereas 466/483 were deemed adequate with Paserin’s criteria (as approximated by the RNN [58]). Based on this, we conclude that our new criteria are more selective. Further, we report higher inter-exam, intra-sonographer ICCs when inadequate volumes are discarded with our criteria, for example 87% for α3D vs. 67% when using the RNN for adequacy classification. These results, in addition to qualitative observations, suggest that the newly proposed criteria are more selective, and that this selectivity results in improved test-retest reproducibility of DDH measurement.
Further, we experimented with 3D-DenseNet-121 [18–20] for automating adequacy classification based on the new criteria.
Tested on an unseen test set of 136 sweeps labelled based on the new criteria, 3D-DenseNet-121 (Model 2 [B+Seg Y/N], test subset 2) achieved a classification AROC of 84%, much higher than the RNN, which achieved an AROC of 49%. With this improved selectivity, using 3D-DenseNet-121 to identify and discard inadequate sweeps, we observed higher ICCs than when using the RNN for the same purpose, but still lower ICCs than with manual labeling (e.g. for α3D, ICCs of 65% without discarding inadequate sweeps, 67% with the RNN, 76% with 3D-DenseNet-121, and 87% with manual identification of inadequate sweeps). We conclude that 3D CNNs show some promise for this task, but can likely be much improved in future work.

6.2 Limitations
Overall, the biggest limitations of the work presented in this thesis include:
• Homogeneity and limited diversity of the data. For example, the data collected included only scans with the Ultrasonix 4DL14-5 probe, and from a sample of participants only from British Columbia. A known problem with deep neural networks is their tendency to overfit to the training data, which is best mitigated by training with a diverse dataset, so our models have likely overfitted to our limited dataset. Their performance will likely deteriorate when used on data from other domains, such as different US probes, but this is yet to be determined with an expanded, more diverse dataset.
• Due to the lack of a reliable and trusted gold-standard diagnostic technique for measuring hip socket depth, we do not report the accuracy or validity of our methods, only their test-retest reproducibility. Therefore, we cannot draw any strong conclusions about our methods’ validity beyond our qualitative observations, which show the plausibility of our proposed methods.
• The ICCs reported are inter-exam and intra-sonographer, and only two expert sonographers participated in the study.
Therefore, conclusions about reproducibility cannot be generalized to inter-sonographer and novice-user scenarios.

6.3 Future Work
Ultimately, the goal of this project is to develop an accurate, safe, and robust solution for DDH diagnosis. This device should additionally be optimized for cost, computational efficiency, and usability to facilitate widespread clinical adoption, to reach as many patients as possible, and to reduce misdiagnosis rates globally. Building on the work presented in this thesis, and considering these overarching goals, I recommend that future research prioritize addressing the aforementioned limitations, as well as exploring new research avenues that target these goals.

6.3.1 Domain Shift and Adaptation
Data from different domains: To address the problem of homogeneous data and domain shift, the first challenge would be to obtain more diverse data, for example from different probes, settings, and geographical regions. This is potentially possible with the help of clinical researchers at the International Hip Dysplasia Institute. These data would be crucial not only for evaluating, but also for improving the accuracy of our models under domain shift. Here, different data scenarios may arise, which would require different solutions. For example, in the best-case scenario, we may get many images and many labels from different domains; in a worse case, we may get many images but few or no labels; and in the worst-case scenario, we may get few images and few labels. All of these are realistic scenarios and present interesting research avenues.
Solutions for domain shift: Ultimately, the solution will depend on the available images and labels. In the scenario where many unlabelled images are provided from a new domain, unsupervised domain adaptation techniques can be used [75, 77].
A related approach is neural style transfer, whose main application has been artistic style transfer [34], but one can imagine each probe and setting combination as a different artistic style and apply the same techniques. In the scenario with many images but few or weak labels, weakly-supervised techniques, which use cheaper labels such as image-level tags to train segmentation networks, or semi-supervised techniques, which leverage a small number of strongly labelled images and many weakly labelled images to cheaply improve segmentation, have been proposed [55]. In the more challenging scenario where images are scarce, new approaches such as few-shot learning [15], which aims to learn from very few labelled images, may be useful. Notably, few-shot techniques for segmentation seem relatively under-studied compared to classification. The final solution will likely be a combination of such techniques, as dictated by the available data.
Evaluation: The robustness of the proposed models and solutions to domain shift could then be evaluated on a more diverse and heterogeneous dataset designed to test performance under domain shift. For example, solutions could be trained on data from one domain (e.g. with the Ultrasonix probe) and then tested on data from an unseen domain (e.g. a different probe and settings). Depending on the task (e.g. segmentation or adequacy classification), performance could be measured and contrasted quantitatively with relevant metrics for the task (e.g. classification accuracy, Dice score) on examples from the unseen domain.

6.3.2 Improved Clinical Study
To address the aforementioned limitations of validity (as opposed to reliability) and intra-sonographer ICC, an improved clinical study will likely be necessary in the future.
Ideally, one could conduct a randomized clinical trial, which would randomize patients into control and experimental groups, treat them based on diagnosis with our 3D-US-based methods, and track clinical outcomes in the long run to assess the sensitivity, specificity, and AROC of our proposed diagnostic techniques. However, this is likely not possible due to ethical considerations and resource limitations. Alternatively, one could make smaller modifications to future clinical studies to address the limitations, at least partially. For example, to assess inter-sonographer (as opposed to intra-sonographer) reproducibility, one could require each hip to be scanned by more than one sonographer. To address the question of validity, one could perhaps require patients to also be scanned with a different imaging modality (e.g. MRI).

6.3.3 Detectability of Failure: Deep Learning with Uncertainty
Another interesting topic to be explored in future work is uncertainty. Patient safety is paramount in medical applications. Safety is a function of the severity, probability, and detectability of failure. The biggest safety risk of using our AI-enabled US device is perhaps the risk of misdiagnosis. The work in this thesis has focused on reducing the probability of misdiagnosis. Further implementing a measure of uncertainty in our models would improve the detectability of failure and potential misdiagnosis. This is perhaps especially important under domain shift (e.g. a different probe), where an indication of low confidence can alert the operator that the model output cannot be trusted and that manual intervention may be necessary.
Uncertainty is an active area of research that has recently gained much attention. Many methods have been proposed to estimate uncertainty, perhaps the most popular of which are Monte Carlo Dropout [8] and Bayes by Backprop [3].
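Monte Carlo Dropout [8] keeps dropout active at inference and treats several stochastic forward passes as samples from an approximate predictive distribution; the adequacy models of Ch.5 already use dropout (rate 50%). The Python sketch below summarizes such samples for a binary adequacy classifier. It assumes a hypothetical callable `stochastic_forward` standing in for a 3D-DenseNet-121 run with its dropout layers left active; it is not part of the thesis. The entropy of the mean prediction measures total uncertainty, while the mutual information (entropy of the mean minus the mean per-pass entropy) isolates the part caused by disagreement between passes. This decomposition is a common choice, but it was not evaluated in this work.

```python
import math
from statistics import mean, pstdev


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) prediction, clipped for numerical safety."""
    p = min(max(p, 1e-12), 1.0 - 1e-12)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def mc_dropout_predict(stochastic_forward, volume, n_samples=30):
    """Summarize n_samples stochastic forward passes of a binary classifier.

    stochastic_forward is a hypothetical callable returning P(adequate) for one
    pass with dropout still active; it is an assumption, not a thesis API.
    """
    probs = [stochastic_forward(volume) for _ in range(n_samples)]
    p_mean = mean(probs)
    total = binary_entropy(p_mean)                     # predictive entropy
    expected = mean(binary_entropy(p) for p in probs)  # mean per-pass entropy
    return {
        "p_adequate": p_mean,
        "entropy": total,
        "mutual_information": total - expected,        # disagreement between passes
        "std": pstdev(probs),
    }
```

If the passes disagree (for example alternating between 0.1 and 0.9), the mean is 0.5 and the mutual information is large. If every pass returns 0.5, the entropy is equally high but the mutual information is zero. A threshold on such a score, tuned on held-out data such as the “maybe” sweeps, could flag sweeps for re-acquisition.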
Arecent study [52] that compared the aforementioned techniques and others underdomain shift concluded that quality of uncertainty degrades with increasing datasetshift regardless of method. It would be interesting to evaluate different uncertainty89techniques (e.g. Monte Carlo Dropout, BNNs, etc.) using the metrics proposed byOvadia [52] on the DDH dataset, including Reliability Diagrams, Expected Cali-bration Error, and Entropy.6.4 Clinical Impact and SignificanceWe have proposed a system for DDH diagnosis with 3D-US. CNNs played a keyrole in improving the segmentation and classification components of this system,ultimately improving the reliability, robustness, and usability of the system as awhole, as we showed in a limited clinical study. We hypothesize that these im-provements will serve to improve the accuracy of DDH diagnosis in the clinic,reducing misdiagnosis rates, and consequently improving patient outcomes and re-ducing costs. Improved automation and usability of our system further serves tomake this solution more attractive to clinicians, especially in low-resource settingsthat lack expertise, ultimately encouraging clinical translation of our system andconsequently reaching more patients globally.90Bibliography[1] A. Z. Alsinan, V. M. Patel, and I. Hacihaliloglu. Automatic segmentation ofbone surfaces from ultrasound using a filter-layer-guided CNN.International Journal of Computer Assisted Radiology and Surgery, 14(5):775\u2013783, may 2019. ISSN 1861-6410. doi:10.1007\/s11548-019-01934-0.URL https:\/\/doi.org\/10.1007\/s11548-019-01934-0http:\/\/link.springer.com\/10.1007\/s11548-019-01934-0. \u2192 pages 12, 28, 34[2] T. G. Barlow. EARLY DIAGNOSIS AND TREATMENT OFCONGENITAL DISLOCATION OF THE HIP. The Journal of Bone andJoint Surgery. British volume, 44-B(2):292\u2013301, may 1962. ISSN0301-620X. doi:10.1302\/0301-620X.44B2.292. URLhttp:\/\/online.boneandjoint.org.uk\/doi\/10.1302\/0301-620X.44B2.292. \u2192page 6[3] C. Blundell, J. Cornebise, K. 
Kavukcuoglu, and D. Wierstra. Weightuncertainty in neural networks. 32nd International Conference on MachineLearning, ICML 2015, 2:1613\u20131622, 2015. \u2192 page 89[4] R. J. Cabin and R. J. Mitchell. To Bonferroni or Not to Bonferroni : Whenand How Are the Questions of America Society Bulletin Ecological.America, 81(3):246\u2013248, 2010. \u2192 pages 43, 54[5] A. Canziani, A. Paszke, and E. Culurciello. An analysis of deep neuralnetwork models for practical applications. CoRR, abs\/1605.07678, 2016.URL http:\/\/arxiv.org\/abs\/1605.07678. \u2192 page 49[6] O\u00a8. C\u00b8ic\u00b8ek, A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger. 3DU-Net: Learning Dense Volumetric Segmentation from Sparse Annotation.In S. Ourselin, L. Joskowicz, M. R. Sabuncu, G. Unal, and W. Wells, editors,Medical Image Computing and Computer-Assisted Intervention \u2013 MICCAI2016, pages 424\u2013432, Cham, 2016. Springer International Publishing. ISBN978-3-319-46723-8. \u2192 pages 17, 36, 38, 46, 8491[7] C. Dezateux and K. Rosendahl. Developmental dysplasia of the hip. TheLancet, 369(9572):1541\u20131552, 2007. ISSN 0140-6736.doi:https:\/\/doi.org\/10.1016\/S0140-6736(07)60710-7. URLhttp:\/\/www.sciencedirect.com\/science\/article\/pii\/S0140673607607107. \u2192page 1[8] Y. Gal and Z. Ghahramani. Dropout as a bayesian approximation: Insightsand applications. In Deep Learning Workshop, ICML, volume 1, page 2,2015. \u2192 page 89[9] M. Gamer, J. Lemon, I. Fellows, and P. Singh. Package \u2019irr\u2019, 2019. URLhttps:\/\/cran.r-project.org\/web\/packages\/irr\/irr.pdf. \u2192 page 63[10] R. Ganz, M. Leunig, K. Leunig-Ganz, and W. H. Harris. The etiology ofosteoarthritis of the hip: An integrated mechanical concept. ClinicalOrthopaedics and Related Research, 466(2):264\u2013272, 2008. ISSN15281132. doi:10.1007\/s11999-007-0060-z. \u2192 page 1[11] D. Golan, Y. Donner, C. Mansi, J. Jaremko, and M. Ramachandran. 
Fully Automating Graf\u2019s Method for DDH Diagnosis Using Deep Convolutional Neural Networks. In LABELS 2016, volume 10008, pages 130\u2013141, 2016. ISBN 978-3-319-46975-1. doi:10.1007\/978-3-319-46976-8_14. URL http:\/\/link.springer.com\/10.1007\/978-3-319-46976-8_14. \u2192 page 13 [12] R. Graf. Fundamentals of sonographic diagnosis of infant hip dysplasia. Journal of Pediatric Orthopedics, 4(6):735\u2013740, 1984. ISSN 0271-6798. doi:10.1097\/01241398-198411000-00015. \u2192 pages 2, 8, 9, 57 [13] R. Graf, S. Scott, K. Lercher, F. Baumgartner, and A. Benaroya. Hip Sonography. Springer Berlin Heidelberg, 2006. ISBN 978-3-540-30957-4. doi:10.1007\/3-540-30958-6. URL http:\/\/link.springer.com\/10.1007\/3-540-30958-6. \u2192 pages xv, 3, 8 [14] R. Graf, M. Mohajer, and F. Plattner. Hip sonography update. Quality-management, catastrophes - tips and tricks. Medical Ultrasonography, 15(4):299\u2013303, 2013. ISSN 2066-8643. doi:10.11152\/mu.2013.2066.154.rg2. URL https:\/\/www.medultrason.ro\/medultrason\/index.php\/medultrason\/article\/view\/772. \u2192 pages 8, 24 [15] A. Guha Roy, S. Siddiqui, S. P\u00f6lsterl, N. Navab, and C. Wachinger. \u2018Squeeze & excite\u2019 guided few-shot segmentation of volumetric images. Medical Image Analysis, 59, 2020. \u2192 page 88 [16] V. Gulati. Developmental dysplasia of the hip in the newborn: A systematic review. World Journal of Orthopedics, 4(2):32, 2013. ISSN 2218-5836. doi:10.5312\/wjo.v4.i2.32. URL http:\/\/www.wjgnet.com\/2218-5836\/full\/v4\/i2\/32.htm. \u2192 page 1 [17] I. Hacihaliloglu. Ultrasound imaging and segmentation of bone surfaces: A review. TECHNOLOGY, 05(02):74\u201380, June 2017. ISSN 2339-5478. doi:10.1142\/S2339547817300049.
URL http:\/\/www.worldscientific.com\/doi\/abs\/10.1142\/S2339547817300049. \u2192 page 12 [18] K. Hara. 3D ResNets for Action Recognition, 2018. URL https:\/\/github.com\/kenshohara\/3D-ResNets-PyTorch. \u2192 pages 49, 50, 76, 86 [19] K. Hara, H. Kataoka, and Y. Satoh. Learning spatio-temporal features with 3D residual networks for action recognition. Proceedings - 2017 IEEE International Conference on Computer Vision Workshops, ICCVW 2017, 2018-January:3154\u20133160, 2018. doi:10.1109\/ICCVW.2017.373. \u2192 page 17 [20] K. Hara, H. Kataoka, and Y. Satoh. Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet? Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 6546\u20136555, 2018. ISSN 1063-6919. doi:10.1109\/CVPR.2018.00685. \u2192 pages 17, 49, 50, 76, 86 [21] A. R. Hareendranathan, M. Mabee, K. Punithakumar, M. Noga, and J. L. Jaremko. A technique for semiautomatic segmentation of echogenic structures in 3D ultrasound, applied to infant hip dysplasia. International Journal of Computer Assisted Radiology and Surgery, 11(1):31\u201342, 2016. ISSN 1861-6429. doi:10.1007\/s11548-015-1239-5. \u2192 pages 9, 11 [22] A. R. Hareendranathan, D. Zonoobi, M. Mabee, D. Cobzas, K. Punithakumar, M. Noga, and J. L. Jaremko. Toward automatic diagnosis of hip dysplasia from 2D ultrasound. Proceedings - International Symposium on Biomedical Imaging, pages 982\u2013985, 2017. ISSN 1945-8452. doi:10.1109\/ISBI.2017.7950680. \u2192 pages 13, 34 [23] A. R. Hareendranathan, D. Zonoobi, M. Mabee, C. Diederichs, K. Punithakumar, M. Noga, and J. L. Jaremko. Semiautomatic classification of acetabular shape from three-dimensional ultrasound for diagnosis of infant hip dysplasia using geometric features. International Journal of Computer Assisted Radiology and Surgery, 12(3):439\u2013447, 2017. ISSN 1861-6429. doi:10.1007\/s11548-016-1510-4. \u2192 pages 11, 15 [24] W. H. Harris. Etiology of osteoarthritis of the hip.
Clinical Orthopaedics and Related Research, 213:20\u201333, December 1986. ISSN 0009-921X. URL http:\/\/europepmc.org\/abstract\/MED\/3780093. \u2192 page 1 [25] M. Harris-Hayes and N. K. Royer. Relationship of Acetabular Dysplasia and Femoroacetabular Impingement to Hip Osteoarthritis: A Focused Review. PM&R, 3(11):1055\u20131067.e1, 2011. ISSN 1934-1482. doi:10.1016\/j.pmrj.2011.08.533. URL http:\/\/www.sciencedirect.com\/science\/article\/pii\/S1934148211010768. \u2192 page 1 [26] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016. \u2192 page 49 [27] F. T. Hoaglund. Primary Osteoarthritis of the Hip: A Genetic Disease Caused by European Genetic Variants. JBJS, 95(5), 2013. ISSN 0021-9355. URL https:\/\/journals.lww.com\/jbjsjournal\/Fulltext\/2013\/03060\/Primary_Osteoarthritis_of_the_Hip__A_Genetic.11.aspx. \u2192 page 7 [28] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-January:2261\u20132269, 2017. doi:10.1109\/CVPR.2017.243. \u2192 page 49 [29] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. 32nd International Conference on Machine Learning, ICML 2015, 1:448\u2013456, 2015. \u2192 page 27 [30] F. Isensee, J. Petersen, A. Klein, D. Zimmerer, P. F. Jaeger, S. Kohl, J. Wasserthal, G. Koehler, T. Norajitra, S. Wirkert, and K. H. Maier-Hein. nnU-Net: Self-adapting framework for U-Net-based medical image segmentation, 2018. \u2192 pages 50, 83 [31] J. Jackson, M. Runge, and N. Nye. Common Questions About Developmental Dysplasia of the Hip. American Family Physician, 90(12):843\u2013850, 2014. URL https:\/\/www.aafp.org\/afp\/2014\/1215\/p843.html. \u2192 page 1 [32] J. L. Jaremko, M. Mabee, V. G. Swami, L. Jamieson, K.
Chow, and R. B. Thompson. Potential for Change in US Diagnosis of Hip Dysplasia Solely Caused by Changes in Probe Orientation: Patterns of Alpha-angle Variation Revealed by Using Three-dimensional US. Radiology, 273(3):870\u2013878, 2014. ISSN 1527-1315. doi:10.1148\/radiol.14140451. URL http:\/\/www.ncbi.nlm.nih.gov\/pubmed\/24964047. \u2192 pages 1, 8 [33] A. Jennings. MATLAB Central: Sphere Fit (least squares), 2013. URL https:\/\/www.mathworks.com\/matlabcentral\/fileexchange\/34129-sphere-fit-least-squared. \u2192 page 48 [34] Y. Jing, Y. Yang, Z. Feng, J. Ye, Y. Yu, and M. Song. Neural Style Transfer: A Review. IEEE Transactions on Visualization and Computer Graphics, pages 1\u20131, 2019. \u2192 page 88 [35] A. Karamalis, W. Wein, T. Klein, and N. Navab. Ultrasound confidence maps using random walks. Medical Image Analysis, 16(6):1101\u20131112, August 2012. ISSN 1361-8415. doi:10.1016\/j.media.2012.07.005. URL http:\/\/dx.doi.org\/10.1016\/j.media.2012.07.005. \u2192 page 28 [36] R. Kikinis, S. D. Pieper, and K. G. Vosburgh. 3D Slicer: A Platform for Subject-Specific Image Analysis, Visualization, and Clinical Support, pages 277\u2013289. Springer New York, New York, NY, 2014. ISBN 978-1-4614-7657-3. doi:10.1007\/978-1-4614-7657-3_19. URL https:\/\/doi.org\/10.1007\/978-1-4614-7657-3_19. \u2192 page 37 [37] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization, 2014. \u2192 pages 37, 78 [38] T. K. Koo and M. Y. Li. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. Journal of Chiropractic Medicine, 15(2):155\u2013163, 2016. ISSN 1556-3707. doi:10.1016\/j.jcm.2016.02.012. \u2192 pages 62, 63 [39] P. Kovesi. Symmetry and Asymmetry from Local Phase. In Tenth Australian Joint Conference on Artificial Intelligence, volume 190, pages 2\u20134, 1997. \u2192 page 13 [40] H. P. Lehmann, R. Hinton, P. Morello, and J. A. Santoli.
Developmental Dysplasia of the Hip Practice Guideline: Technical Report. Pediatrics, 105(4):e57, 2000. ISSN 0031-4005. doi:10.1542\/peds.105.4.e57. URL https:\/\/pediatrics.aappublications.org\/content\/105\/4\/e57. \u2192 page 7 [41] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Doll\u00e1r. Focal loss for dense object detection. In The IEEE International Conference on Computer Vision (ICCV), October 2017. \u2192 page 81 [42] R. T. Loder and E. N. Skopelja. The Epidemiology and Demographics of Hip Dysplasia. ISRN Orthopedics, 2011:1\u201346, 2011. ISSN 2090-6161. doi:10.5402\/2011\/238607. \u2192 page 1 [43] J. Long, E. Shelhamer, and T. Darrell. Fully Convolutional Networks for Semantic Segmentation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3431\u20133440, 2015. ISBN 9781467369640. doi:10.1109\/CVPR.2015.7298965. URL http:\/\/arxiv.org\/abs\/1503.06350. \u2192 page 12 [44] M. G. Mabee, A. R. Hareendranathan, R. B. Thompson, S. Dulai, and J. L. Jaremko. An index for diagnosing infant hip dysplasia using 3-D ultrasound: the acetabular contact angle. Pediatric Radiology, 46(7):1023\u20131031, 2016. ISSN 1432-1998. doi:10.1007\/s00247-016-3552-8. URL http:\/\/dx.doi.org\/10.1007\/s00247-016-3552-8. \u2192 page 9 [45] K. McHale and D. Corbett. Parental noncompliance with Pavlik harness treatment of infantile hip problems. Journal of Pediatric Orthopedics, 9(6):649\u2013652, 1989. ISSN 0271-6798. doi:10.1097\/01241398-198911000-00003. URL https:\/\/doi.org\/10.1097\/01241398-198911000-00003. \u2192 page 7 [46] F. Milletari, N. Navab, and S. A. Ahmadi. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. Proceedings - 2016 4th International Conference on 3D Vision, 3DV 2016, pages 565\u2013571, 2016. doi:10.1109\/3DV.2016.79. \u2192 page 36 [47] C. Morin, H. T. Harcke, and G. D. MacEwen. The infant hip: real-time US assessment of acetabular development.
Radiology, 157(3):673\u2013677, 1985. doi:10.1148\/radiology.157.3.3903854. URL https:\/\/doi.org\/10.1148\/radiology.157.3.3903854. PMID: 3903854. \u2192 pages 2, 9, 57 [48] E. Mostofi, B. Chahal, D. Zonoobi, A. Hareendranathan, K. P. Roshandeh, S. K. Dulai, and J. L. Jaremko. Reliability of 2D and 3D ultrasound for infant hip dysplasia in the hands of novice users. European Radiology, 29(3):1489\u20131495, March 2019. ISSN 0938-7994. doi:10.1007\/s00330-018-5699-1. URL http:\/\/link.springer.com\/10.1007\/s00330-018-5699-1. \u2192 pages 1, 9 [49] M. L. Murnaghan, R. H. Browne, D. J. Sucato, and J. Birch. Femoral nerve palsy in Pavlik harness treatment for developmental dysplasia of the hip. Journal of Bone and Joint Surgery - Series A, 93(5):493\u2013499, 2011. ISSN 1535-1386. doi:10.2106\/JBJS.J.01210. \u2192 page 7 [50] S. Nakamura, S. Ninomiya, and T. Nakamura. Primary osteoarthritis of the hip joint in Japan. Clinical Orthopaedics and Related Research, 241:190\u2013196, April 1989. ISSN 0009-921X. URL http:\/\/europepmc.org\/abstract\/MED\/2924462. \u2192 pages 1, 7 [51] H. \u00d6mero\u011flu, N. K\u00f6se, and A. Akceylan. Success of Pavlik Harness Treatment Decreases in Patients \u2265 4 Months and in Ultrasonographically Dislocated Hips in Developmental Dysplasia of the Hip. Clinical Orthopaedics and Related Research, 474(5):1146\u20131152, 2016. ISSN 1528-1132. doi:10.1007\/s11999-015-4388-5. \u2192 page 6 [52] Y. Ovadia, E. Fertig, J. Ren, Z. Nado, D. Sculley, S. Nowozin, J. V. Dillon, B. Lakshminarayanan, and J. Snoek. Can you trust your model\u2019s uncertainty? Evaluating predictive uncertainty under dataset shift, 2019. \u2192 pages 89, 90 [53] P. Pandey. Real-time ultrasound bone segmentation and robust US-CT registration for surgical navigation of pelvic fractures. PhD thesis, University of British Columbia, 2018. URL https:\/\/open.library.ubc.ca\/cIRcle\/collections\/ubctheses\/24\/items\/1.0375839. \u2192 page 12 [54] P. Pandey, P. Guy, A. Hodgson, and R. Garbi.
Shadow Peak: Accurate Real-time Bone Segmentation for Ultrasound and Developmental Dysplasia of the Hip. In 19th Annual Meeting of the International Society for Computer Assisted Orthopaedic Surgery, New York, 2019. \u2192 pages 13, 17, 26, 39 [55] G. Papandreou, L. C. Chen, K. P. Murphy, and A. L. Yuille. Weakly- and semi-supervised learning of a deep convolutional network for semantic image segmentation. Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 1742\u20131750, 2015. \u2192 page 88 [56] O. Paserin. Fully Automatic 3D Ultrasound Techniques for Improving Diagnosis of Developmental Dysplasia of the Hip in Pediatric Patients: Classifying Scan Adequacy and Quantifying Dynamic Assessment. PhD thesis, The University of British Columbia, 2018. \u2192 pages xix, 14, 15, 16, 17, 18, 70, 73, 74, 76, 80, 85, 86, 115 [57] O. Paserin, K. Mulpuri, A. Cooper, and A. J. Hodgson. Automatic Near Real-Time Evaluation of 3D Ultrasound Scan Adequacy for Developmental Dysplasia of the Hip. Computer Assisted and Robotic Endoscopy and Clinical Image-Based Procedures, 10550:124\u2013132, 2017. doi:10.1007\/978-3-319-67543-5. URL http:\/\/link.springer.com\/10.1007\/978-3-319-67543-5. \u2192 page 14 [58] O. Paserin, K. Mulpuri, A. Cooper, A. J. Hodgson, and R. Garbi. Real Time RNN Based 3D Ultrasound Scan Adequacy for Developmental Dysplasia of the Hip. In MICCAI 2018, volume 8151, pages 365\u2013373. Springer International Publishing, 2018. ISBN 978-3-642-40810-6. doi:10.1007\/978-3-030-00928-1_42. URL http:\/\/link.springer.com\/10.1007\/978-3-030-00928-1_42. \u2192 pages xix, 14, 15, 16, 17, 29, 30, 70, 73, 74, 76, 79, 80, 86, 115 [59] O. Paserin, K. Mulpuri, A. Cooper, A. J. Hodgson, and R. Garbi. Automated dynamic 3D ultrasound assessment of developmental dysplasia of the infant hip. In International Workshop on Computational Methods and Clinical Applications in Musculoskeletal Imaging, pages 136\u2013145.
Springer, 2018. \u2192 pages 21, 57 [60] C. T. Price and B. A. Ramo. Prevention of Hip Dysplasia in Children and Adults. Orthopedic Clinics, 43(3):269\u2013279, July 2012. ISSN 0030-5898. doi:10.1016\/j.ocl.2012.05.001. URL https:\/\/doi.org\/10.1016\/j.ocl.2012.05.001. \u2192 page 7 [61] N. Quader. Automatic Characterization of Developmental Dysplasia of the Hip in Infants using Ultrasound Imaging. PhD thesis, University of British Columbia, 2018. \u2192 pages xvi, 7, 8, 9, 11, 12, 14, 15, 16, 17, 18, 26, 39, 45, 53, 55, 58, 61, 62, 63, 69, 83, 84, 85, 111 [62] N. Quader, A. Hodgson, and R. Abugharbieh. Confidence Weighted Local Phase Features for Robust Bone Surface Segmentation in Ultrasound. In CLIP 2014, volume 9958, pages 76\u201383, 2014. ISBN 978-3-319-46471-8. doi:10.1007\/978-3-319-13909-8_10. URL http:\/\/link.springer.com\/10.1007\/978-3-319-13909-8_10. \u2192 pages 11, 12, 37, 39, 83 [63] N. Quader, A. Hodgson, K. Mulpuri, A. Cooper, and R. Abugharbieh. Towards Reliable Automatic Characterization of Neonatal Hip Dysplasia from 3D Ultrasound Images. In MICCAI, volume 9900, pages 602\u2013609, 2016. ISBN 978-3-319-46725-2. doi:10.1007\/978-3-319-46720-7_70. URL http:\/\/link.springer.com\/10.1007\/978-3-319-46720-7_70. \u2192 pages xvii, xviii, 9, 16, 17, 57, 61, 62, 63, 65, 67, 69, 85 [64] N. Quader, A. J. Hodgson, K. Mulpuri, A. Cooper, and R. Abugharbieh. A 3D Femoral Head Coverage Metric for Enhanced Reliability in Diagnosing Hip Dysplasia. In MICCAI, volume 10433, pages 100\u2013107, 2017. ISBN 978-3-319-66181-0. doi:10.1007\/978-3-319-66182-7_12. URL http:\/\/link.springer.com\/10.1007\/978-3-319-66182-7_12. \u2192 pages xvi, xvii, 9, 11, 16, 17, 18, 26, 45, 51, 53, 55, 57, 62, 63, 66, 69, 84, 85 [65] N. Quader, A. J. Hodgson, K. Mulpuri, E. Schaeffer, and R. Abugharbieh.
Abugharbieh.Automatic Evaluation of Scan Adequacy and Dysplasia Metrics in 2-DUltrasound Images of the Neonatal Hip. Ultrasound in Medicine andBiology, 43(6):1252\u20131262, 2017. ISSN 1879291X.doi:10.1016\/j.ultrasmedbio.2017.01.012. \u2192 page 14[66] N. Quader, E. K. Schaeffer, A. J. Hodgson, R. Abugharbieh, and K. Mulpuri.A Systematic Review and Meta-analysis on the Reproducibility ofUltrasound-based Metrics for Assessing Developmental Dysplasia of theHip. Journal of Pediatric Orthopaedics, 38(6):e305\u2013e311, 2018. ISSN15392570. doi:10.1097\/BPO.0000000000001179. \u2192 pages 1, 7, 8, 9, 57[67] O. Ronneberger, P. Fischer, and T. Brox. U-Net: Convolutional Networks forBiomedical Image Segmentation. In MICCAI, pages 234\u2013241, 2015. ISBN9783319245737. doi:10.1007\/978-3-319-24574-4 28. URLhttp:\/\/link.springer.com\/10.1007\/978-3-319-24574-4{ }28. \u2192 pages12, 17, 26, 27, 36, 37, 39, 83[68] J. A. Rosenthal, X. Lu, and P. Cram. Availability of Consumer Prices FromUS Hospitals for a Common Surgical Procedure. JAMA Internal Medicine,99173(6):427\u2013432, 2013. ISSN 2168-6106.doi:10.1001\/jamainternmed.2013.460. URLhttps:\/\/doi.org\/10.1001\/jamainternmed.2013.460. \u2192 page 7[69] F. Saberi Hosnijeh, M. E. Zuiderwijk, M. Versteeg, H. T. W. Smeele,A. Hofman, A. G. Uitterlinden, R. Agricola, E. H. G. Oei, J. H. Waarsing,S. M. Bierma-Zeinstra, and J. B. J. van Meurs. Cam Deformity andAcetabular Dysplasia as Risk Factors for Hip Osteoarthritis. Arthritis &Rheumatology, 69(1):86\u201393, 2017. doi:10.1002\/art.39929. URLhttps:\/\/onlinelibrary.wiley.com\/doi\/abs\/10.1002\/art.39929. \u2192 page 1[70] M. D. Sewell, K. Rosendahl, and D. M. Eastwood. Developmental dysplasiaof the hip. BMJ, 339(nov24 2):b4454\u2013b4454, nov 2009. ISSN 0959-8138.doi:10.1136\/bmj.b4454. URLhttp:\/\/www.bmj.com\/cgi\/doi\/10.1136\/bmj.b4454. \u2192 page 1[71] D. Shorter, T. Hong, and D. A. Osborn. 
Cochrane Review: Screening programmes for developmental dysplasia of the hip in newborn infants. Evidence-Based Child Health: A Cochrane Review Journal, 8(1):11\u201354, 2013. ISSN 1557-6272. doi:10.1002\/ebch.1891. URL http:\/\/doi.wiley.com\/10.1002\/ebch.1891. \u2192 page 1 [72] G. van Tulder. Elastic deformations for N-dimensional images (Python, SciPy, NumPy, TensorFlow), 2018. URL https:\/\/github.com\/gvtulder\/elasticdeform. \u2192 page 38 [73] A. Valada, J. Vertens, A. Dhall, and W. Burgard. AdapNet: Adaptive semantic segmentation in adverse environmental conditions. In Proceedings - IEEE International Conference on Robotics and Automation, pages 4644\u20134651. IEEE, 2017. ISBN 9781509046331. doi:10.1109\/ICRA.2017.7989540. \u2192 page 12 [74] M. Villa, G. Dardenne, M. Nasan, H. Letissier, C. Hamitouche, and E. Stindel. FCN-based approach for the automatic segmentation of bone surfaces in ultrasound images. International Journal of Computer Assisted Radiology and Surgery, 13(11):1707\u20131716, 2018. ISSN 1861-6429. doi:10.1007\/s11548-018-1856-x. URL https:\/\/doi.org\/10.1007\/s11548-018-1856-x. \u2192 pages 12, 28, 29, 31, 34, 39 [75] M. Wang and W. Deng. Deep visual domain adaptation: A survey. Neurocomputing, 312:135\u2013153, 2018. \u2192 page 88 [76] P. Wang, V. M. Patel, and I. Hacihaliloglu. Simultaneous Segmentation and Classification of Bone Surfaces from Ultrasound Using a Multi-feature Guided CNN. In MICCAI, volume 9901, pages 134\u2013142. Springer International Publishing, 2018. ISBN 978-3-319-46722-1. doi:10.1007\/978-3-030-00937-3_16. URL http:\/\/link.springer.com\/10.1007\/978-3-030-00937-3_16. \u2192 pages 12, 28, 34 [77] G. Wilson and D. J. Cook. A survey of unsupervised deep domain adaptation, 2018. \u2192 page 88 [78] T. Woodacre, A. Dhadwal, T. Ball, C. Edwards, and P. J. Cox. The costs of late detection of developmental dysplasia of the hip. Journal of Children\u2019s Orthopaedics, 8(4):325\u2013332, 2014.
ISSN 1863-2548. doi:10.1007\/s11832-014-0599-7. \u2192 pages 1, 7 [79] S. Xie, R. Girshick, P. Doll\u00e1r, Z. Tu, and K. He. Aggregated residual transformations for deep neural networks. Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, 2017-January:5987\u20135995, 2017. doi:10.1109\/CVPR.2017.634. \u2192 page 50 [80] S. Zagoruyko and N. Komodakis. Wide residual networks. Proceedings of the British Machine Vision Conference 2016, 2016. doi:10.5244\/C.30.87. URL http:\/\/dx.doi.org\/10.5244\/C.30.87. \u2192 page 49 [81] Z. Zhang, M. Tang, D. Cobzas, D. Zonoobi, M. Jagersand, and J. L. Jaremko. End-to-end detection-segmentation network with ROI convolution. Proceedings - International Symposium on Biomedical Imaging, 2018-April:1509\u20131512, 2018. ISSN 1945-8452. doi:10.1109\/ISBI.2018.8363859. \u2192 pages 13, 35 [82] D. Zonoobi, E. Mostofi, M. Mabee, and S. Pasha. Developmental Hip Dysplasia Diagnosis at Three-dimensional US: A Multicenter Study. Radiology, 287(3):1003\u20131015, 2018. ISSN 1527-1315. doi:10.1148\/radiol.2018172592.
\u2192 pages 9, 11, 15, 17, 18, 57, 58, 62, 63, 69, 85 Appendix A \u00a72 Supporting Materials Automatic Characterization of the Neonatal Hip with 3-Dimensional and Tracked Ultrasound: Data Collection Form (March 1, 2019). Fields: date form completed (Day\/Month\/Year); Study ID (3DUS19__ __ __); Participant Demographics: date of appointment (Day\/Month\/Year); chronologic age (weeks, rounded to nearest whole number); gender (M\/F); affected hip (R\/L\/Bilateral); familial history of DDH (Yes\/No; if yes, whom); first-born child (Yes\/No); breech presentation (Yes\/No); Caesarean section (Yes\/No). Figure A.1: Data collection form used in the clinical study. Appendix B \u00a74.1 Supporting Materials Table B.1: Mean performance metrics for the four contrasted methods on a test set of 52 volumes from 13 participants. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net, ANOVA p-value. Precision: 0.11, 0.28, 0.84, 0.83, 5.0E-115; Recall: 0.83, 0.28, 1.00, 0.87, 7.9E-103; J: 0.11, 0.15, 0.84, 0.74, 2.9E-127; DSC: 0.19, 0.26, 0.91, 0.85, 1.5E-126; MED R2P (mm): 0.15, 3.54, 0.00, 0.12, 3.3E-24; MED P2R (mm): 14.52, 9.16, 0.93, 0.29, 1.4E-73; MED max (mm): 14.52, 9.23, 0.93, 0.31, 2.6E-74; HD R2P (mm): 1.65, 11.63, 0.39, 3.38, 3.2E-40; HD P2R (mm): 33.57, 32.78, 16.50, 3.56, 3.8E-96; HD max (mm): 33.57, 32.78, 16.50, 5.46, 1.5E-86; RMS P2R (mm): 17.15, 12.65, 2.91, 0.70, 8.7E-84; CAI: 0.35, 0.31, 0.94, 0.92, 3.1E-101; CDI: 0.00, 0.00, 0.24, 0.76, 1.4E-56. Table B.2: Precision post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 1.5E-16, 7.6E-66, 2.5E-76; CSPS DDH: 1.5E-16, 1.0E+00, 1.3E-46, 2.0E-51; U-Net: 7.6E-66, 1.3E-46, 1.0E+00, 4.9E-01; 3D-U-Net: 2.5E-76, 2.0E-51, 4.9E-01, 1.0E+00. Table B.3: Recall post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 9.5E-43, 7.0E-19, 3.8E-02; CSPS DDH: 9.5E-43, 1.0E+00, 3.6E-62, 2.1E-51; U-Net: 7.0E-19, 3.6E-62, 1.0E+00, 4.4E-30; 3D-U-Net: 3.8E-02, 2.1E-51, 4.4E-30, 1.0E+00. Table
B.4: Jaccard Coefficient post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.00E+00, 2.55E-04, 5.99E-66, 2.80E-70; CSPS DDH: 2.55E-04, 1.00E+00, 4.62E-62, 5.83E-65; U-Net: 5.99E-66, 4.62E-62, 1.00E+00, 2.45E-07; 3D-U-Net: 2.80E-70, 5.83E-65, 2.45E-07, 1.00E+00. Table B.5: Dice-S\u00f8rensen Coefficient post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 4.6E-04, 1.9E-69, 6.5E-69; CSPS DDH: 4.6E-04, 1.0E+00, 7.7E-62, 2.3E-60; U-Net: 1.9E-69, 7.7E-62, 1.0E+00, 1.3E-06; 3D-U-Net: 6.5E-69, 2.3E-60, 1.3E-06, 1.0E+00. Table B.6: MED R2P post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 3.3E-10, 6.2E-15, 2.5E-01; CSPS DDH: 3.3E-10, 1.0E+00, 8.0E-11, 2.6E-10; U-Net: 6.2E-15, 8.0E-11, 1.0E+00, 7.2E-18; 3D-U-Net: 2.5E-01, 2.6E-10, 7.2E-18, 1.0E+00. Table B.7: MED P2R post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 2.3E-10, 3.2E-48, 1.9E-51; CSPS DDH: 2.3E-10, 1.0E+00, 2.2E-23, 3.7E-26; U-Net: 3.2E-48, 2.2E-23, 1.0E+00, 2.3E-04; 3D-U-Net: 1.9E-51, 3.7E-26, 2.3E-04, 1.0E+00. Table B.8: MED max post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 2.5E-10, 3.2E-48, 2.1E-51; CSPS DDH: 2.5E-10, 1.0E+00, 4.8E-24, 8.6E-27; U-Net: 3.2E-48, 4.8E-24, 1.0E+00, 3.5E-04; 3D-U-Net: 2.1E-51, 8.6E-27, 3.5E-04, 1.0E+00. Table B.9: HD R2P post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 3.5E-23, 7.7E-25, 7.7E-03; CSPS DDH: 3.5E-23, 1.0E+00, 1.3E-26, 4.4E-13; U-Net: 7.7E-25, 1.3E-26, 1.0E+00, 8.2E-06; 3D-U-Net: 7.7E-03, 4.4E-13, 8.2E-06, 1.0E+00. Table B.10: HD P2R post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 1.7E-01, 2.4E-33, 3.1E-61; CSPS DDH: 1.7E-01, 1.0E+00, 4.7E-32, 1.3E-60; U-Net: 2.4E-33, 4.7E-32, 1.0E+00, 1.1E-21; 3D-U-Net: 3.1E-61, 1.3E-60, 1.1E-21, 1.0E+00. Table B.11: HD max post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 1.7E-01, 2.4E-33, 2.5E-52; CSPS DDH: 1.7E-01, 1.0E+00, 4.7E-32, 1.5E-51; U-Net: 2.4E-33, 4.7E-32, 1.0E+00, 8.5E-16; 3D-U-Net: 2.5E-52, 1.5E-51, 8.5E-16, 1.0E+00. Table B.12: CAI post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 1.3E-01, 3.8E-50, 1.2E-49; CSPS DDH: 1.3E-01, 1.0E+00, 2.3E-55, 4.5E-55; U-Net: 3.8E-50, 2.3E-55, 1.0E+00, 5.8E-02; 3D-U-Net: 1.2E-49, 4.5E-55, 5.8E-02, 1.0E+00. Table B.13: CDI post hoc t-test p-values. Columns: CSPS, CSPS DDH, U-Net, 3D-U-Net. CSPS: 1.0E+00, 1.0E-04, 2.8E-09, 1.6E-36; CSPS DDH: 1.0E-04, 1.0E+00, 3.3E-09, 1.8E-36; U-Net: 2.8E-09, 3.3E-09, 1.0E+00, 2.6E-16; 3D-U-Net: 1.6E-36, 1.8E-36, 2.6E-16, 1.0E+00. Appendix C \u00a74.2 Supporting Materials Table C.1: Results comparing the two proposed methods with the state-of-the-art RFC for predicting the location of the femoral head. Note that the RFC and 3D-ResNet-50 were compared against the full sphere label as ground truth (as described in \u00a74.2.1), whereas 3D-U-Net was compared against the semi-sphere cropped by bounding box B as ground truth. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg, ANOVA p-value. Precision: 0.46, 0.62, 0.73, 1.1E-12; Recall: 0.49, 0.81, 0.82, 7.3E-20; J: 0.29, 0.53, 0.62, 4.4E-27; DSC: 0.43, 0.69, 0.76, 3.3E-26; CAE x (mm): 1.63, 1.61, 1.04, 9.3E-03; CAE y (mm): 1.87, 2.31, 0.45, 1.3E-11; CAE z (mm): 2.27, 1.19, 0.66, 2.2E-09; CED (mm): 3.90, 3.35, 1.42, 5.3E-15; RAE (mm): 2.01, 1.01, 0.46, 1.9E-17. Table C.2: Precision post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 2.0E-05, 6.1E-13; 3D-ResNet-50 Reg: 2.0E-05, 1.0E+00, 6.1E-04; 3D-U-Net Seg: 6.1E-13, 6.1E-04, 1.0E+00. Table C.3: Recall post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 4.0E-13, 5.3E-13; 3D-ResNet-50 Reg: 4.0E-13, 1.0E+00, 5.1E-01; 3D-U-Net Seg: 5.3E-13, 5.1E-01, 1.0E+00. Table C.4: Jaccard Coefficient post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 3.6E-15, 3.2E-24; 3D-ResNet-50 Reg: 3.6E-15, 1.0E+00, 5.3E-04; 3D-U-Net Seg: 3.2E-24, 5.3E-04, 1.0E+00. Table C.5: DSC post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 3.8E-14, 4.2E-21; 3D-ResNet-50 Reg: 3.8E-14, 1.0E+00, 4.4E-04; 3D-U-Net Seg: 4.2E-21, 4.4E-04, 1.0E+00. Table C.6: CAE x post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 9.4E-01, 7.9E-03; 3D-ResNet-50 Reg: 9.4E-01, 1.0E+00, 3.6E-03; 3D-U-Net Seg: 7.9E-03, 3.6E-03, 1.0E+00. Table C.7: CAE y post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 1.5E-01, 8.1E-08; 3D-ResNet-50 Reg: 1.5E-01, 1.0E+00, 7.3E-16; 3D-U-Net Seg: 8.1E-08, 7.3E-16, 1.0E+00. Table C.8: CAE z post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 3.2E-04, 3.7E-08; 3D-ResNet-50 Reg: 3.2E-04, 1.0E+00, 2.8E-04; 3D-U-Net Seg: 3.7E-08, 2.8E-04, 1.0E+00. Table C.9: CED post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 9.7E-02, 7.8E-13; 3D-ResNet-50 Reg: 9.7E-02, 1.0E+00, 2.3E-14; 3D-U-Net Seg: 7.8E-13, 2.3E-14, 1.0E+00. Table C.10: RAE post hoc t-test p-values. Columns: RFC, 3D-ResNet-50 Reg, 3D-U-Net Seg. RFC: 1.0E+00, 4.5E-07, 9.1E-16; 3D-ResNet-50 Reg: 4.5E-07, 1.0E+00, 3.7E-06; 3D-U-Net Seg: 9.1E-16, 3.7E-06, 1.0E+00. Appendix D \u00a74.3 Supporting Materials Table D.1: Comparing the SD of paired inter-exam measures for the different DDH metrics (n=42 hips). SD of \u03b13D (\u00b0): Quader [61] 2.6, Ours 2.1; SD of FHC3D (%): Quader [61] 2.9, Ours 3.5; SD of OCR (mm): Quader [61] -, Ours 0.41. Appendix E \u00a75 Supporting Materials Figure E.1: An example case whose ground truth label is \u201cinadequate\u201d, which our models predicted as \u201cinadequate\u201d but the RNN predicted as \u201cadequate\u201d. Left: the coronal view near the standard plane, with the 3D-U-Net pelvis bone surface prediction overlaid in pink. Right: the ilium and acetabulum point clouds after processing with the metrics extraction algorithm described in \u00a74.3. These views clearly show that the probe is positioned too inferiorly and that much of the ilium surface is not imaged. As a result, the bony rim appears to be misidentified and the ilium plane appears to be incorrectly fitted, ultimately resulting in invalid \u03b13D and FHC3D measurements. Figure E.2: Another example case whose ground truth is \u201cinadequate\u201d, which our models predicted as \u201cinadequate\u201d but the RNN predicted as \u201cadequate\u201d. In this case, we show the segmented point clouds in the top row and the three anatomical planes (Coronal, Sagittal, Transverse) in the bottom row.
We can see in the sagittal view a \u201csmudge\u201d caused by a movement artifact, circled in red. Its effect on the acetabulum point cloud appears as a gap in the acetabulum that is usually not present in high-quality, adequate volumes. Figure E.3: Another example case whose ground truth is \u201cinadequate\u201d, which our models predicted as \u201cinadequate\u201d but the RNN predicted as \u201cadequate\u201d. Left (Coronal Frame): our best attempt at locating the standard plane by browsing all the coronal slices. Right (RNN Prediction): the per-frame prediction of the RNN, from which the final RNN prediction is made by thresholding and summing. The RNN incorrectly predicts very high scores for the first 40 slices, even though none of these slices meets the criteria defined by Paserin [56, 58].","attrs":{"lang":"en","ns":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","classmap":"oc:AnnotationContainer"},"iri":"http:\/\/www.w3.org\/2009\/08\/skos-reference\/skos.html#note","explain":"Simple Knowledge Organisation System; Notes are used to provide information relating to SKOS concepts. There is no restriction on the nature of this information, e.g., it could be plain text, hypertext, or an image; it could be a definition, information about the scope of a concept, editorial information, or any other type of information."}],"Genre":[{"label":"Genre","value":"Thesis\/Dissertation","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","classmap":"dpla:SourceResource","property":"edm:hasType"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/hasType","explain":"A Europeana Data Model Property; This property relates a resource with the concepts it belongs to in a suitable type system such as MIME or any thesaurus that captures categories of objects in a given field.
It does NOT capture aboutness"}],"GraduationDate":[{"label":"Graduation Date","value":"2020-05","attrs":{"lang":"en","ns":"http:\/\/vivoweb.org\/ontology\/core#dateIssued","classmap":"vivo:DateTimeValue","property":"vivo:dateIssued"},"iri":"http:\/\/vivoweb.org\/ontology\/core#dateIssued","explain":"VIVO-ISF Ontology V1.6 Property; Date Optional Time Value, DateTime+Timezone Preferred "}],"IsShownAt":[{"label":"DOI","value":"10.14288\/1.0389533","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","classmap":"edm:WebResource","property":"edm:isShownAt"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/isShownAt","explain":"A Europeana Data Model Property; An unambiguous URL reference to the digital object on the provider\u2019s website in its full information context."}],"Language":[{"label":"Language","value":"eng","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/language","classmap":"dpla:SourceResource","property":"dcterms:language"},"iri":"http:\/\/purl.org\/dc\/terms\/language","explain":"A Dublin Core Terms Property; A language of the resource.; Recommended best practice is to use a controlled vocabulary such as RFC 4646 [RFC4646]."}],"Program":[{"label":"Program (Theses)","value":"Biomedical Engineering","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline","classmap":"oc:ThesisDescription","property":"oc:degreeDiscipline"},"iri":"https:\/\/open.library.ubc.ca\/terms#degreeDiscipline","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the program for which the degree was granted."}],"Provider":[{"label":"Provider","value":"Vancouver : University of British Columbia Library","attrs":{"lang":"en","ns":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","classmap":"ore:Aggregation","property":"edm:provider"},"iri":"http:\/\/www.europeana.eu\/schemas\/edm\/provider","explain":"A Europeana Data Model Property; The name or identifier of the organization who delivers data directly 
to an aggregation service (e.g. Europeana)"}],"Publisher":[{"label":"Publisher","value":"University of British Columbia","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/publisher","classmap":"dpla:SourceResource","property":"dcterms:publisher"},"iri":"http:\/\/purl.org\/dc\/terms\/publisher","explain":"A Dublin Core Terms Property; An entity responsible for making the resource available.; Examples of a Publisher include a person, an organization, or a service."}],"Rights":[{"label":"Rights","value":"Attribution-NonCommercial-NoDerivatives 4.0 International","attrs":{"lang":"*","ns":"http:\/\/purl.org\/dc\/terms\/rights","classmap":"edm:WebResource","property":"dcterms:rights"},"iri":"http:\/\/purl.org\/dc\/terms\/rights","explain":"A Dublin Core Terms Property; Information about rights held in and over the resource.; Typically, rights information includes a statement about various property rights associated with the resource, including intellectual property rights."}],"RightsURI":[{"label":"Rights URI","value":"http:\/\/creativecommons.org\/licenses\/by-nc-nd\/4.0\/","attrs":{"lang":"*","ns":"https:\/\/open.library.ubc.ca\/terms#rightsURI","classmap":"oc:PublicationDescription","property":"oc:rightsURI"},"iri":"https:\/\/open.library.ubc.ca\/terms#rightsURI","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the Creative Commons license url."}],"ScholarlyLevel":[{"label":"Scholarly Level","value":"Graduate","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","classmap":"oc:PublicationDescription","property":"oc:scholarLevel"},"iri":"https:\/\/open.library.ubc.ca\/terms#scholarLevel","explain":"UBC Open Collections Metadata Components; Local Field; Identifies the scholarly level of the author(s)\/creator(s)."}],"Title":[{"label":"Title ","value":"Reliable and robust hip dysplasia measurement with three-dimensional ultrasound and convolutional neural 
networks","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/title","classmap":"dpla:SourceResource","property":"dcterms:title"},"iri":"http:\/\/purl.org\/dc\/terms\/title","explain":"A Dublin Core Terms Property; The name given to the resource."}],"Type":[{"label":"Type","value":"Text","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/type","classmap":"dpla:SourceResource","property":"dcterms:type"},"iri":"http:\/\/purl.org\/dc\/terms\/type","explain":"A Dublin Core Terms Property; The nature or genre of the resource.; Recommended best practice is to use a controlled vocabulary such as the DCMI Type Vocabulary [DCMITYPE]. To describe the file format, physical medium, or dimensions of the resource, use the Format element."}],"URI":[{"label":"URI","value":"http:\/\/hdl.handle.net\/2429\/73709","attrs":{"lang":"en","ns":"https:\/\/open.library.ubc.ca\/terms#identifierURI","classmap":"oc:PublicationDescription","property":"oc:identifierURI"},"iri":"https:\/\/open.library.ubc.ca\/terms#identifierURI","explain":"UBC Open Collections Metadata Components; Local Field; Indicates the handle for item record."}],"SortDate":[{"label":"Sort Date","value":"2020-12-31 AD","attrs":{"lang":"en","ns":"http:\/\/purl.org\/dc\/terms\/date","classmap":"oc:InternalResource","property":"dcterms:date"},"iri":"http:\/\/purl.org\/dc\/terms\/date","explain":"A Dublin Core Elements Property; A point or period of time associated with an event in the lifecycle of the resource.; Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF].; A point or period of time associated with an event in the lifecycle of the resource.; Date may be used to express temporal information at any level of granularity. Recommended best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601 [W3CDTF]."}]}