UBC Theses and Dissertations
Uncertainty-based assessment of hip joint segmentation and 3D ultrasound scan adequacy in paediatric dysplasia measurement using deep learning
Kannan, Arunkumar
Developmental Dysplasia of the Hip (DDH), a condition characterized by hip joint instability, is one of the most common hip disorders in newborns. Clinical diagnosis still relies on manual measurement of hip joint features from 2D ultrasound (US) scans, a process plagued by high inter- and intra-operator variability and scan variability. Recently, 3D US was shown to be markedly more reliable, with deeply learned image features used effectively to localize and measure anatomical bone landmarks. However, standard neural networks (NNs) provide no means of assessing the reliability of their computed results, a limitation that hampers deployment in clinical settings. In this thesis, we aim to improve the trustworthiness and reliability of deep-learning-based DDH diagnostic systems by addressing two components: the uncertainty and the calibration of NNs. We propose interpretable uncertainty measures that allow for measuring hip joint segmentation reliability and quantifying scan adequacy in clinical DDH assessments from 3D US. Our approach measures the variability of estimates generated by a Monte Carlo (MC) dropout-based deep network optimized for hip joint localization. Results demonstrate that US scans with lower dysplasia metric variability are strongly associated with those labelled as clinically adequate by a human expert. In segmentation tasks, quantifying levels of confidence can provide meaningful additional information to aid clinical decision making. We propose to quantify segmentation confidence by incorporating voxel-wise uncertainty into the loss function used during training. For ilium and acetabulum segmentation, we report a mean Dice score of 81% when training with the voxel-wise uncertainty loss vs. 76% with cross-entropy loss.
Recent works have proposed Bayesian frameworks to quantify confidence in the segmentation process, but the resulting confidence measures tend to be miscalibrated. We propose a non-Bayesian system to calibrate the confidence values in order to reduce over-confident and under-confident predictions. We show that deep ensembles optimized with a compounded loss achieve a low negative log-likelihood (NLL) of 11% and a Brier score of 3%, producing calibrated confidence estimates. Our findings suggest that uncertainty quantification may improve clinical workflow by acting as a quality-control check on deep learning (DL) based analysis. This in turn may improve the overall reliability of the DDH diagnostic process and the prospects of adoption in clinical settings.
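As a rough illustration of the quantities named in the abstract, the sketch below shows one common way to compute the spread of a scalar metric across stochastic forward passes, and the Brier score and NLL of averaged per-voxel probabilities. It is not taken from the thesis: the function names, the binary foreground/background setup, and the use of a population standard deviation as the variability measure are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def ensemble_mean(member_probs):
    """Average per-voxel foreground probabilities across ensemble members
    (or across stochastic MC-dropout passes). Hypothetical helper."""
    return [mean(voxel) for voxel in zip(*member_probs)]


def metric_variability(metric_samples):
    """Population standard deviation of a scalar metric computed once per
    stochastic pass (assumed here as the variability measure)."""
    return pstdev(metric_samples)


def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 label."""
    return mean([(p - y) ** 2 for p, y in zip(probs, labels)])


def nll(probs, labels, eps=1e-12):
    """Mean binary negative log-likelihood; probabilities are clipped to
    [eps, 1 - eps] so that log(0) never occurs."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

With made-up inputs, `ensemble_mean` would be called on a list of per-member probability lists, and the result passed to `brier_score` and `nll` together with the 0/1 ground-truth labels. Lower values of both scores indicate that the predicted confidences are closer to the observed outcomes.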
Attribution-NonCommercial-NoDerivatives 4.0 International