UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Machine learning in ultrasound-guided spinal anesthesia Pesteie, Seyed Mehran 2019

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Notice for Google Chrome users:
If you are having trouble viewing or searching the PDF with Google Chrome, please download it here instead.

Item Metadata


24-ubc_2019_november_pesteie_seyedmehran.pdf [ 18.02MB ]
JSON: 24-1.0380889.json
JSON-LD: 24-1.0380889-ld.json
RDF/XML (Pretty): 24-1.0380889-rdf.xml
RDF/JSON: 24-1.0380889-rdf.json
Turtle: 24-1.0380889-turtle.txt
N-Triples: 24-1.0380889-rdf-ntriples.txt
Original Record: 24-1.0380889-source.json
Full Text

Full Text

Machine learning in ultrasound-guidedspinal anesthesiabySeyed Mehran PesteieA THESIS SUBMITTED IN PARTIAL FULFILLMENT OFTHE REQUIREMENTS FOR THE DEGREE OFDOCTOR OF PHILOSOPHYinThe Faculty of Graduate and Postdoctoral Studies(Electrical and Computer Engineering)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)September 2019© Seyed Mehran Pesteie 2019The following individuals certify that they have read, and recommend tothe Faculty of Graduate and Postdoctoral Studies for acceptance, the dis-sertation entitled:Machine learning in ultrasound-guided spinal anesthesiasubmitted by Seyed Mehran Pesteie in partial fulfillment of the require-ments for the degree of Doctor of Philosophy in Electrical and Com-puter Engineering.Examining Committee:Robert Rohling, Electrical and Computer EngineeringSupervisorPurang Abolmaesumi, Electrical and Computer EngineeringSupervisory Committee MemberPanos Nasiopoulos, Electrical and Computer EngineeringUniversity ExaminerLenoid Sigal, Computer ScienceUniversity ExaminerShuo Li, Department of Medical imaging, University of Western OntarioExternal ExamineriiAbstractAcute and chronic pain treatment with topically or orally administeredagents alone is often insufficient and requires injection of analgesic or anes-thetic agents directly at the nociceptive site. Examples of target site injec-tions include epidural or facet joint blocks in treatment of acute labour painor chronic back pain, respectively.In this thesis, machine learning models and algorithms are proposedthat aim to facilitate spinal injections through automatic identification ofthe anatomy in 2D and 3D ultrasound (US). In particular, the proposedtechniques detect the anatomical landmarks of the needle target in para-median and transverse planes for lumbar epidural and facet joint injections.Such methods can be used to identify the correct needle puncture site beforeinjection.A local-directional feature extraction method is first proposed in orderto recognize the patterns of the US echoes from the vertebrae in paramedianplanes. Later, a supervised convolutional model is proposed to localize theneedle target for epidural anesthesia in the paramedian US images. Themodel uses a combination of hand-engineered local-directional features andconvolutional feature maps that are automatically learned from the images.In the transverse plane, a deep classifier is designed to identify the sym-metry in the image, and accordingly, classify the midline from off-centerimages. Later, a conditional generative model along with an adaptive dataaugmentation algorithm is proposed, which synthesize transverse US imagesbased on the performance of the classifier to improve its accuracy. Finally,an unsupervised model is proposed that learns the variations of the datawithout the need for labels. Unsupervised learning from US images is valu-able because there is a major cost associated with data annotation, whichcan be avoided by unsupervised learning.The above mentioned methods were tested on US images collected invivo and demonstrated promising performance to be a useful guide for in-jections. Moreover, the proposed systems can be utilized as training toolsto familiarize the novices with the spine anatomy in US.iiiLay SummaryBack pain management often requires direct injection of anesthetic agentsinto the spine. Without image guidance, spinal injections have a steep learn-ing curve and may require multiple insertion attempts, which increases com-plications rate and patient discomfort. When image-guidance is used, itlargely consists of computed tomography and fluoroscopy, which carry therisk of radiation to the patient, the anesthesiologist and the surroundingstaff. In an attempt to reduce the radiation risk, ultrasound imaging hasbeen proposed for guiding delivery of agents to the target sites. However,ultrasound image interpretation remains a challenge, particularly for thenovices. The purpose of this thesis is to propose machine learning modelsthat aim to facilitate spinal injections through automatic identification ofthe anatomy in 2D and 3D ultrasound. In addition, the proposed modelscan be utilized as training tools to familiarize the novices with the spineanatomy in ultrasound.ivPrefaceThis thesis primarily is based on five published and one under review manuscripts,resulting from the collaboration between multiple researchers. The manuscriptshave been modified accordingly to present a consistent thesis. Ethical ap-proval for conducting this research has been provided by the Clinical Re-search Ethics Board, certificate numbers: H07-0691 and H15-01310.Two versions of Chapter 2 have been published in:• Mehran Pesteie, Purang Abolmaesumi, Hussam Al-Deen Ashab, Vic-toria Lessoway, Simon Massey, Vit Gunka, and Robert N. Rohling.Real-time ultrasound image classification for spine anesthesia usinglocal directional Hadamard features. International Journal of Com-puter Assisted Radiology and Surgery, 10, no. 6 (2015): 901-912.• Mehran Pesteie, Purang Abolmaesumi, Hussam Al-Deen Ashab, Vic-toria Lessoway, Simon Massey, Vit Gunka, and Robert N. Rohling.Automatic recognition of the target plane in 3D ultrasound with EpiGu-ide. 7th NCIGT and NIH Image Guided Therapy Workshop - Inter-national Conference on Medical Image Computing and Computer As-sisted Intervention (MICCAI), (2014): 15.The author contributed by developing, implementing and evaluating themethod. HAD Ashab helped with data collection and developing its proto-col. Dr. Gunka and Dr. Massey were the medical collaborators and helpedwith development of the study protocol. V. Lessoway is a sonographer, whohelped develop the ultrasound scanning protocol, data acquisition and anno-tation. Profs. Abolmaesumi and Rohling provided feedback and suggestionsin improving the methodology and the evaluation protocol.All co-authors contributed to the editing of the manuscripts.A version of Chapter 3 has been published in:• Mehran Pesteie, Victoria Lessoway, Purang Abolmaesumi, and RobertN. Rohling. Automatic localization of the needle target for ultrasound-vguided epidural injections. IEEE Trans. Medical Imaging 37, no. 1(2018): 81-92.The author developed, implemented and evaluated the method. V. Lessowayhelped in developing the ultrasound scanning protocol, data acquisition andannotation. Profs. Abolmaesumi and Rohling provided feedback and sug-gestions in improving the methodology and the evaluation protocol.All co-authors contributed to the editing of the manuscript.A version of Chapter 5 has been published in:• Mehran Pesteie, Purang Abolmaesumi, and Robert N. Rohling. Adap-tive augmentation of medical data using independently conditionalvariational auto-encoders. IEEE Trans. Medical Imaging (2019). (e-pub ahead of print)The author contributed by developing, implementing and evaluating themethod. Profs. Abolmaesumi and Rohling provided feedback and sugges-tions in improving the methodology.All co-authors contributed to the editing of the manuscript.Two versions of Chapter 6 have been published in:• Mehran Pesteie, Purang Abolmaesumi, and Robert N. Rohling. Deepneural maps. 6th International Conference on Learning Representa-tions (ICLR), (2018). Available at https://openreview.net/forum?id=HyG76D1wf• Alireza Sedghi, Mehran Pesteie, Golara Javadi, Shekoofeh Azizi, PingkunYan, Jin Tae Kwak, Sheng Xu, Baris Turkbey, Peter Choyke, PeterPinto, Bradford Wood, Robert Rohling, Purang Abolmaesumi andParvin Mousavi. Deep neural maps for unsupervised visualization ofhigh-grade cancer in prostate biopsies. International Journal of Com-puter Assisted Radiology and Surgery 14, 6, (2019): 1009-1016.For the first publication, the author contributed by developing, implement-ing and evaluating the method. Profs. Abolmaesumi and Rohling providedfeedback and suggestions in improving the methodology.All co-authors contributed to the editing of the manuscript.For the second publication, the author contributed by developing and imple-menting the DNM algorithm and co-developed the strategy for evaluationof DNM in prostate cancer imaging along with A. Sedghi. G. Javadi and S.viAzizi, who assisted with data preparation and additional analysis.Profs. Abolmaesumi, Mousavi and Rohling supervised the project and as-sisted with preparation and finalization of manuscript. All other authorscontributed to data preparation and quality control of ground truth data.viiTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiiLay Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xivAcknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Clinical background . . . . . . . . . . . . . . . . . . . . . . . 11.1.1 Spinal injection . . . . . . . . . . . . . . . . . . . . . 11.1.2 Risks and complications . . . . . . . . . . . . . . . . 41.2 Need for visual feedback for spinal injections . . . . . . . . . 51.3 Ultrasound for spine imaging . . . . . . . . . . . . . . . . . . 51.4 Related prior studies . . . . . . . . . . . . . . . . . . . . . . 71.4.1 Clinical feasibility studies . . . . . . . . . . . . . . . . 71.4.2 Computer-assisted systems . . . . . . . . . . . . . . . 81.5 Objectives and hypotheses . . . . . . . . . . . . . . . . . . . 101.6 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 111.7 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . 122 Automatic classification of the target planes for epidural andfacet joint injections via local directional features . . . . . 152.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.1.1 Contribution . . . . . . . . . . . . . . . . . . . . . . . 162.2 Methods and materials . . . . . . . . . . . . . . . . . . . . . 172.2.1 Hadamard transform . . . . . . . . . . . . . . . . . . 17viii2.2.2 Feature extraction . . . . . . . . . . . . . . . . . . . . 202.2.3 Pattern classification . . . . . . . . . . . . . . . . . . 242.2.4 Data acquisition . . . . . . . . . . . . . . . . . . . . . 252.2.5 Experiments . . . . . . . . . . . . . . . . . . . . . . . 262.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 302.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313 Automatic localization of the epidural needle target via aug-menting convolutional and hand-engineered features . . . 333.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 333.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . 343.2 Methods and materials . . . . . . . . . . . . . . . . . . . . . 353.2.1 Local directional features of target planes . . . . . . . 363.2.2 Target localization . . . . . . . . . . . . . . . . . . . . 373.2.3 Image enhancement . . . . . . . . . . . . . . . . . . . 413.2.4 Data collection . . . . . . . . . . . . . . . . . . . . . . 423.2.5 Data preparation . . . . . . . . . . . . . . . . . . . . 433.2.6 Implementation . . . . . . . . . . . . . . . . . . . . . 433.3 Experiments and results . . . . . . . . . . . . . . . . . . . . . 433.3.1 Pixel-level classification performance . . . . . . . . . 443.3.2 Performance evaluation of the augmented LDH fea-tures . . . . . . . . . . . . . . . . . . . . . . . . . . . 453.3.3 Target localization accuracy . . . . . . . . . . . . . . 463.3.4 Comparative study . . . . . . . . . . . . . . . . . . . 473.3.5 Execution time . . . . . . . . . . . . . . . . . . . . . . 493.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514 Identification of the center line of the spine in transverseimages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 534.1.1 Contribution . . . . . . . . . . . . . . . . . . . . . . . 544.2 Methods and materials . . . . . . . . . . . . . . . . . . . . . 544.2.1 Network architecture . . . . . . . . . . . . . . . . . . 554.2.2 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 564.3 Experiments and results . . . . . . . . . . . . . . . . . . . . . 574.3.1 Experiment 1-feature comparison . . . . . . . . . . . 574.3.2 Experiment 2-classification performance of SymNet ver-sus single-path CNN . . . . . . . . . . . . . . . . . . 58ix4.3.3 Experiment 3-comparison with normalized cross-correlation594.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 604.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 605 Adaptive augmentation of transverse US images during train-ing of a classifier . . . . . . . . . . . . . . . . . . . . . . . . . . 625.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 625.1.1 Model engineering and learning strategies . . . . . . . 635.1.2 Data augmentation . . . . . . . . . . . . . . . . . . . 655.1.3 Contributions . . . . . . . . . . . . . . . . . . . . . . 675.2 Methods and materials . . . . . . . . . . . . . . . . . . . . . 675.2.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . 685.2.2 Variational auto-encoder . . . . . . . . . . . . . . . . 685.2.3 Independent conditional VAE . . . . . . . . . . . . . 695.2.4 Adaptive training using the ICVAE model . . . . . . 725.2.5 Dataset: transverse US images of the spine . . . . . . 735.2.6 Transformation function . . . . . . . . . . . . . . . . 735.2.7 Model architectures . . . . . . . . . . . . . . . . . . . 735.2.8 Implementation . . . . . . . . . . . . . . . . . . . . . 745.3 Experiments and results . . . . . . . . . . . . . . . . . . . . . 755.3.1 Original data . . . . . . . . . . . . . . . . . . . . . . . 765.3.2 Static augmentation . . . . . . . . . . . . . . . . . . . 765.3.3 Adaptive augmentation . . . . . . . . . . . . . . . . . 775.3.4 Training SymNet via adaptive data augmentation on 785.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 795.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 806 Deep neural maps for unsupervised feature learning . . . 826.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 826.1.1 Probabilistic models . . . . . . . . . . . . . . . . . . . 836.1.2 Deterministic parametric models . . . . . . . . . . . . 846.1.3 Manifold and cluster representation learning . . . . . 856.1.4 Contribution . . . . . . . . . . . . . . . . . . . . . . . 856.2 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 876.2.1 Self organizing maps . . . . . . . . . . . . . . . . . . 876.2.2 DNM model architecture . . . . . . . . . . . . . . . . 896.2.3 Parzen window . . . . . . . . . . . . . . . . . . . . . . 906.2.4 Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . 926.3 Experiments and results . . . . . . . . . . . . . . . . . . . . . 93x6.3.1 Experiment 1-training DNM on MNIST and COIL-20 956.3.2 Experiment 2-DNM on ultrasound data . . . . . . . . 976.3.3 Experiment 3-comparative study . . . . . . . . . . . . 986.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 996.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1007 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1017.1 Insights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1037.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . 1037.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . 105Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107AppendicesA KL divergence of ICVAE . . . . . . . . . . . . . . . . . . . . . 130B ICVAE’s variational lower bound . . . . . . . . . . . . . . . . 131C Standard Error . . . . . . . . . . . . . . . . . . . . . . . . . . . 133xiList of Tables2.1 Comparison of the classification performance, i.e., accuracy(ACC), sensitivity (SNS), specificity (SPC) and precision (PRC)of automatic detection of the lamina and facet joint (target)planes using LDH and LPT features on test data. Precisionindicates the correct classification rate for target planes. . . . 282.2 Average classification accuracy (ACC), sensitivity (SNS), speci-ficity (SPC) and precision (PRC) of the proposed method onleave-one-out cross-validation for epidural and facet joint tar-get plane detection. . . . . . . . . . . . . . . . . . . . . . . . . 303.1 Summary of the evaluation metrics and their description. . . 443.2 Comparison of the target localization errors (mm) of tem-plate matching approach (TM), convolutional neural networkwith no feature augmentation (CNN), state-of-the-art deepnetwork for biomedical image segmentation (U-net) and theproposed method (CNN+LDH) on test data sets. The errorsof the proposed method are highlighted via asterisk wherethey are significantly lower compared to other approaches(p − value < 0.05). Minimum and maximum errors havebeen indicated in brackets. The clinically acceptable lateraland vertical errors are 7 mm and 5 mm. . . . . . . . . . . . . 484.1 Summary of the classification performance of the static methodusing normalized cross correlation with the optimal threshold0.61 (NCC0.61), the conventional single-path CNN and Sym-Net on the test subjects. The conventional CNN had lowerarea under ROC for both midline (AUCm) and off-center(AUCo) images. Adding a path for mirrored convolutionalfeature maps contributed to the classification accuracy. . . . . 60xii5.1 Comparison of the classification metrics of the test data for abaseline discriminative model trained with the original, stati-cally augmented and adaptively augmented data. Since boththe midline (c) and off-center (o) classes are balanced in theaugmented datasets, the respective model indicates highersensitivity compared to that of the original data. Moreover,adaptive adjustment of the latent space leads to improvementin classification accuracy. . . . . . . . . . . . . . . . . . . . . 775.2 Comparison of the classification metrics of the test data forSymNet trained with on the original and adaptively aug-mented dataset. Adaptive augmentation leads to 6% im-provement of the classification accuracy. . . . . . . . . . . . . 796.1 Summary of the classification performances on the test trans-verse ultrasound images. Because the DNM model has learnedthe structures of the midline and off-center images from alarger number of samples it indicates better classification per-formance, when compared against the original SymNet thatis only trained on the supervised set. . . . . . . . . . . . . . . 98xiiiList of Figures1.1 Three sections of the spinal column, i.e., cervical, thoracicand lumbar (left, from Wikimedia Commons) and the anatomyof a vertebra (right, courtesy of Vicky Earle). . . . . . . . . . 21.2 Median sagittal (a), paramedian sagittal (b) and transverse(c) planes of the spine. . . . . . . . . . . . . . . . . . . . . . . 31.3 The anatomy of a vertebra in transverse (a) and paramedian(b) planes. The transverse view can visualize the symmetricalUS echoes from the left and right side of the vertebrae, hence,it is preferred for midline needle insertion. However, the LFis relatively easier to see in the paramedian plane, due to thelarger acoustic window. . . . . . . . . . . . . . . . . . . . . . 62.1 Ultrasound planes of (a) transverse process, (b) facet joint,(c) lamina and (d) spinous process of a vertebra extractedfrom a 3D volume, courtesy of Vicky Earle. Depending onthe acquisition location and the orientation of the transducer,the volume may include the full range of the spinal struc-tures. The paramedian target planes for facet joint injectionsand epidural steroid injections are (b) and (c), respectively.The goal of the classification system is to identify planes (b)and (c), however, similar wave-like appearance of the spinalanatomy among the different planes makes the identificationof the desired target plane difficult. . . . . . . . . . . . . . . . 16xiv2.2 Overview of the LDH classification system. Each 2D imageof a 3D volume is first decomposed via Complete RecursivePartitioning algorithm. Pj is the list of decomposed imagepatches at scale j. Local Hadamard coefficients (CPj) of eachimage patch Si are calculated via the Hadamard matrix H.Then, directional features are extracted in sequency domainusing the proposed wedge filters for training. The output ofthe binary classifier indicates whether or not the input imageis a target plane. . . . . . . . . . . . . . . . . . . . . . . . . . 182.3 Naturally Ordered (NOH) (left) and Sequency Ordered (SOH)(right) Hadamard matrices of size 64 × 64. Walsh functionsare sorted in ascending order in the SOH matrix. Note thatthe sequency center is located at the top-left corner of theSOH matrix. Moving to the bottom-right corner of the trans-form matrix, the order of the Walsh functions increases whichleads to higher sequency domain coefficients comparing to thetop-left corner. . . . . . . . . . . . . . . . . . . . . . . . . . . 192.4 Local Hadamard coefficients of patch A (a) and patch B (b) ofa lamina plane (left). Because patch B only includes acousticshadow beneath the bone, the transformed matrix containsno significant information whereas the sequency domain rep-resentation of patch A has relatively large coefficients in lowersequency bands. In the global transform, Hadamard coeffi-cients are distributed in the entire transform domain regard-less of acoustic echoes or shadows. . . . . . . . . . . . . . . . 212.5 Ideal sequency domain (a) vertical Fvt , (b) horizontal Fhrand (c) diagonal Fdg wedge filters of size 64 × 64 with α =10. The parameter α determines the area covered by eithervertical or horizontal filters. The black area corresponds to 0to mask the rest of the coefficients. . . . . . . . . . . . . . . . 232.6 `2 norm of the intraclass similarities in 3D (vertical, horizon-tal and diagonal) feature space, ||vs||, versus α for (a) laminaand facet joint, (b) lamina and non target and (c) facet jointand non target classes. The `2 norm is minimized (less than0.1) in 15 ≤ α ≤ 35. Therefore, filters with 15 ≤ α ≤ 35 yieldthe most distinctive features. Each line indicates the `2 normof intraclass similarities of a particular volunteer. . . . . . . . 24xv2.7 Local Directional Hadamard features of laminae, facet jointsand non target planes for δ = 3 in a complete recursive par-titioning. Each line indicates multi-scale filter variances ofa sample laminae, facet joints or non target images from aparticular volunteer. Vertical and horizontal axes correspondto feature values and indices of images (k) at different scales,respectively. 1 ≤ k ≤ 4 (white window) corresponds to thepatches at δ = 1 and similarly, 5 ≤ k ≤ 20 (light gray win-dow) and k ≥ 21 (dark gray window) correspond to δ = 2and δ = 3, respectively. Because of their particular curvedshape, facet joints have relatively higher diagonal filter vari-ances at δ = 2 and vertical filter variances at δ = 3 whereasthe laminae have more significant horizontal features at δ = 2compared to the other planes. . . . . . . . . . . . . . . . . . . 272.8 Receiver Operator Curves of target classification based on(a,b) LDH and (c,d) LPT features. The area under curve(AUC) of laminae and facet joints classifiers with LDH fea-tures are 0.97 and 0.96, respectively. Compared to the AUCof the LPT classifier which is 0.87 for laminae and 0.88 forfacet joints classification, LDH features outperform LPT inclassifier AUC. . . . . . . . . . . . . . . . . . . . . . . . . . . 292.9 Classification accuracy of the leave-one-out cross validationfor (a) lamina and (b) facet joint classification using LDHfeatures. The average cross validation accuracy for laminaeand facet joints are 94% and 90% with mean error of 2.39mmand 3.79mm, respectively. . . . . . . . . . . . . . . . . . . . . 303.1 Block diagram of the system. Target planes are identifiedfrom the 3D US volume using local directional Hadamardfeatures and a fully connected neural network. Within thetarget planes, the needle target is localized by classifying pix-els in the image via a deep convolutional network. The out-put of the final layer is augmented with the local directionalHadamard features and is then given to a fully connectedlayer for classification. . . . . . . . . . . . . . . . . . . . . . . 35xvi3.2 Sample paramedian US image from a volunteer’s laminae ofthe L3-L4 vertebrae at 7 cm depth. Prominent anatomicallandmarks of the target for epidural insertion in the parame-dian view are a) the wave-like pattern of the US echoes fromthe surfaces of the laminae of adjacent vertebrae and b) theecho from the ligamentum flavum, representing the target. . . 373.3 Original paramedian US image of a randomly selected vol-unteer’s laminae of the L3-L4 vertebrae at 7 cm depth (a),the bone enhanced image using phase symmetry (b) and theweighted sum of the enhanced and the original image (c). . . 413.4 Histogram of the correlation between LDH features and a)first (mean 0.03 ± 0.11) b) second (mean 0.04 ± 0.13 ) andc) third (mean 0.01 ± 0.11) convolutional feature maps of asimilar CNN architecture with no feature augmentation aftertraining (t-test p− value < 0.05). . . . . . . . . . . . . . . . . 453.5 Pixel-level classifications of a) LDH features, b) CNN withno augmentation c) CNN augmented with the LDH featuresat the fully connected layer and d) ground truth relative tothe original paramedian US image of the laminae of the L3-L4vertebrae of a randomly selected volunteer at 7 cm depth. TheLDH features are not distinctive enough alone to distinguishthe area between target regions. However, they contribute tothe CNN by removing some outliers in (c). . . . . . . . . . . . 463.6 Pixel-level classification performance of the proposed method(CNN+LDH) and the network without feature augmentation(CNN) on the leave-one-out cross-validation balanced data.LDH features contribute to the classification accuracy for allof the cases. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 473.7 Sample segmentation result of a) the proposed network andb) U-net on a randomly selected volunteer’s image from dataset B at 8 cm depth. The yellow and red regions correspondto the correct and incorrect segmentations, respectively. Thegreen area indicates parts of the ground truth which have notbeen covered by the network. . . . . . . . . . . . . . . . . . . 49xvii4.1 Sample transverse US frames of the spine, visualizing the ver-tebral anatomy in the off-center (a and d) and midline (b andc) images. Vertebral structures and their corresponding USechoes are shown in the vertebra schematic and the US im-ages. Because of the geometrical shape of the vertebrae, themidline frames show symmetrical features, when comparedagainst the off-center images. Illustrations are provided byVicky Earle. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554.2 Block diagram of the proposed network architecture. Theconvolutional weights and biases are shared between the origi-nal and the mirrored paths. The obtained features maps fromthe original image (forig) and its mirrored version (fmirror)are concatenated before classification. . . . . . . . . . . . . . 564.3 Histogram of the `2 norm of the distance between forig andfmirror of the test data. Since the midline frames have sym-metrical anatomical features, the `2 norm is significantly lessthan that of the off-center images (mean 3.55±1.54 vs. mean5.59± 1.80 t-test p− value < 0.0001). . . . . . . . . . . . . . 584.4 Histogram of the maximum normalized cross correlation (NCC)of the midline and off-center images of the training data (a)and the effect of the threshold on the classification perfor-mance for test data (b). Both a and b suggest that the op-timal threshold for distinguishing midline versus off-centerimages via NCC is approximately 0.61. . . . . . . . . . . . . . 595.1 Block diagram of the independent conditional VAE (ICVAE)model. The variational approximator and the generator areparameterized by φ (red) and {ω, pi} (green), respectively. Af-ter training, synthetic data can be generated from a randomlatent sample z and a particular label y. . . . . . . . . . . . . 695.2 Architectures of a baseline classification model used in exper-iments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 745.3 Synthetic images generated by ICVAE (columns 3 to 5) usingtest labels (column 1) and three different latent variables (z1,z2 and z3). Since y and z are independent in ICVAE, thecharacteristics of the image are controlled by z, while thelabel specifies the category of the images. The real imagesfrom the dataset are shown in the second column. . . . . . . . 76xviii5.4 ROC curves of the baseline classification model for midline (a)and off-center (b) test images for S1, S2 and S3. Training viaadaptive data augmentation leads to higher classification per-formance when compared against conventional training withand without static data augmentation (AUC=93% vs. 88%and 90% for center-line and AUC=93% vs. 88% and 88% foroff-center images). . . . . . . . . . . . . . . . . . . . . . . . . 785.5 Training discriminative loss of the baseline classification modelfor S1, S2 and S3. In each round of augmentation, ICVAEgenerates images that the models indicated greater loss thanthe average. Hence, there is an increase at the beginningof the augmentation rounds. However after several epochs,the model learns new features associated with the augmenteddata and therefore, the loss values start to decrease. . . . . . 796.1 Block diagram of the DNM model. DNM consists of an auto-encoder that learns a parametric mapping from the data toan embedding space and a self-organizing map, which learns atopographic map from the embedding. After pre-training, theparameters of the auto-encoder are optimized along with theneurons of the SOM, in order to yield a separable embeddingspace. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 866.2 Back projection of the 20×30 SOM neurons w from the latticeto the data space using D for MNIST (a) and COIL-20 (b).Despite its relatively simple architecture and small number ofparameters, the DNM has successfully learned the style andpose variations of MNIST and COIL-20, respectively. . . . . . 946.3 Block diagram of experiment 2. After unsupervised trainingof the DNM model (a), the likelihoods of the midline and off-center images on the topographic map are learned via Parzenwindow (orange-b). Moreover, the encoder of the DNM isdetached from the model and a fully connected classifier istrained on the supervised set (orange-c). During supervisedtraining, the weights of the encoder and the neurons of theSOM are fixed (green). For both b and c, a small set oflabeled images are required. . . . . . . . . . . . . . . . . . . . 95xix6.4 Back projection of the 20× 30 SOM neurons w from the lat-tice to the data space using D trained on 2D transverse ul-trasound images of the spine. Similar to the MNIST andCOIL-20 datasets, DNM has learned representations that re-flect the variations of the midline and off-center images, suchas shadows and the intervertebral gaps. Likelihoods of bothclasses can be learned from the projection to the lattice. . . . 966.5 Probability distribution of the midline (a) and off-center (b)images on the 20 × 30 lattice, learned from the supervisedset via Parzen window with window size 2. Notice that thedistributions are complements, i.e., locations on the latticewith high probability of midline have low probability of of-centered and vice-versa. This indicates that DNM has learnedan efficient topographic map of the data, which has distinctiverepresentations for each class. . . . . . . . . . . . . . . . . . . 97xxList of AbbreviationsAE Auto-encoderBMI Body mass indexCNN Convolutional neural networkCT Computed tomographyDNM Deep neural mapsICVAE Independent conditional variational auto-encoderKDE Kernel density estimationKL Kullback-LeiblerLDH Local directional HadamardLF Ligamentum flavumLOR Loss-of-resistanceML Machine learningMRF Markov random fieldPCA Principle component analysisRBM Restricted Boltzmann machineSGD Stochastic gradient descentSOM Self-organizing mapUS UltrasoundVAE Variational auto-encoderxxiAcknowledgementsI would like to give my deepest gratitude to my advisor, Professor Rohlingfor his continuous support during my PhD studies. Prof. Rohling has taughtme the mentality of a researcher. His invaluable insights have been essentialin developing this thesis. I particularly appreciate him for his considerationin giving me the opportunity to investigate and try different ideas. Also, heis a wonderful human being, who has been greatly supportive in my real-lifechallenges.My sincere thanks also goes to Professor Abolmaesumi for his consistentand constructive feedback during my studies. His comments have been atremendous help in improving the methodologies of this thesis. Moreover, hegenerously gave me the opportunity to work with his students on a numberof projects, which have greatly helped me with developing my team-workskills.I would also like to thank Vickie Lessoway for her invaluable help withdata acquisition and annotation. She has been a truly reliable gold-standardfor evaluating my work. Vickie is an amazing person, whom I am verygrateful for having the opportunity to work with and learn from.I am also thankful to all of the former and current members of RCLat UBC: Abtin, Lucas, Mikael, Manyou, Emran, Hussam, Ilker, Alexander,Jeff, Alireza, Amir, Mohammad, Zhibin and Fatemeh.I thankfully acknowledge the grants that supported my research duringthe course of my studies: the Natural Science and Engineering ResearchCouncil of Canada (NSERC), and the Canadian Institutes of Health Re-search (CIHR). Also, I would like to acknowledge Vicky Earle for her preciseillustrations of the vertebrae and the lumbar spine.Last but not least, I would like to express my gratitude to my family fortheir unconditional love and support, which have helped me accomplish mymilestones, including this one.xxiiChapter 1Introduction1.1 Clinical background1.1.1 Spinal injectionSpinal injections typically refer to procedures that involve placement of aneedle into a target surrounded and characterized by particular spinal struc-tures in order to deliver a local anesthetic. They are being performed tomanage acute and chronic pain syndromes [112]. Generally, these techniquescan be divided into two categories, namely, diagnostic and therapeutic [75].For diagnostic purposes, spinal injections can be used to correlate struc-tural abnormality with patient’s pain during injection. This is based on theprinciple that stressing the vertebral structure, which is the source of thepain, reproduces the pain and anesthetizing that structure should result inpain relief. Therefore, spinal injections may provide additional physiologicinformation, hence, help with a more definitive diagnosis of the cause of thepain. For example, during a facet joint injection, if the majority of the painis relieved when the anesthetic is injected into the joint, then a therapeuticinjection of a steroid results in a long lasting pain relief. However, if anoutlying nerve root is the source of the pain, a nerve block may be required[153].Spinal injections can be performed for the purpose of pain managementand relief. For example, cervical epidural anesthesia can be performed fortreatment of radicular pain [132] and also to provide analgesia for surgery[151]. In the lumbar region, epidurals are commonly performed for pain reliefduring labor or cesarean section. In addition, lumbar facet joint injectionsare widely practiced for treatment of lower back pain, which is the leadingcause of disability in most countries [120].An understanding of the spine and vertebrae anatomy is essential priorto discussing the significance of spine imaging. The vertebrae are dividedinto three main regions, namely, cervical, thoracic and lumbar. Althoughthe vertebrae in each region have unique features, they are mainly composedof two parts: the vertebral body and the neural arch (see Figure 1.1). The1Neural archVertebral bo dySpineous processLaminaPedicleTransverse processThoracicCervicalLumbarSacrumFigure 1.1: Three sections of the spinal column, i.e., cervical, thoracic andlumbar (left, from Wikimedia Commons) and the anatomy of a vertebra(right, courtesy of Vicky Earle).vertebral body, which is composed of a spongy bone tissue, called cancellousbone, lies anteriorly and its dimension increases from cervical to lumbar.The neural arch is located posterior to the body and consists of a pairof pedicles, which extend from the sides posteriorly. A lamina from eachpedicle, projects medially to join and complete the neural arch.There are seven processes arising from the neural arch; one spinous pro-cess, two transverse processes, and four articular processes, which includetwo inferior and two superior facets. The spinous process points superiorlyfrom the laminae and serves to attach muscles and ligaments. The angle ofthe spinous process varies at different regions. At the lumbar level, they areoriented transversely, while in the thoracic region they become angulatedinferiorly. The transverse processes are located on each side of the vertebralbody and join the laminae and the pedicles. They also attach the musclesand ligaments, in particular, intertransverse ligaments. Each vertebra isconnected to the vertebra below and above through two inferior and twosuperior facet joints, hence they are considered as articular processes. Themorphology of the lumbar facets are quite different from those of cervicaland thoracic. In the lumbar region, the superior facets have concave sur-faces and are angulated posteromedially, while the surface of the inferior2Figure 1.2: Median sagittal (a), paramedian sagittal (b) and transverse (c)planes of the spine.facets is convex and faces anterolaterally. The facet joints guide and limitthe movement of the spine to protect the vertebrae from excessive rotationand flexion.An intervertebral disc lies between two adjacent vertebrae in the spine.Their primary function is to act as a shock absorber. In addition to thediscs, the vertebrae are joined and held together via different ligaments. Forinstance, the interspinous ligaments connect the adjacent spinous processesof the vertebrae. They extend from the base to the apex of each spinousprocess pair. Additionally, the superspinous ligament connects the tips ofthe spinous processes and it runs from the seventh cervical vertebra (C7)to the sacrum. The connective tissue between laminae of the vertebrae iscalled the ligamentum flavum (LF) and is located posterior to the spinalcanal. The spinal cord is surrounded by a thick membrane, called the duramater. The gap between the LF and the dura is referred to as the epiduralspace, where analgesic or anesthetic agents are delivered for epidurals. Theepidural space is largest in the lumbar region when compared to thoracicand cervical regions.3An epidural injection can be administered in seated, lateral decubitus orprone positions. Traditionally, the needle puncture site level is determinedby palpation. Prior to needle insertion, local anesthesia is performed at thecorresponding level. This is intended to reduce the pain of the epidural nee-dle insertion. Then with a proper angle, the needle is inserted slowly. Priorto the epidural space, the needle trajectory encounters skin, muscle, inter-spinous ligament and ligamentum flavum. During insertion, the anesthesi-ologist uses a technique, called loss-of-resistance (LOR), to confirm that thetip of the needle is reached the target, i.e., epidural space. LOR involvesusing a saline-filled syringe with the needle and applying moderate pres-sure to the plunger while advancing the needle towards the epidural space.When the tip of the needle enters the epidural space, the injection pres-sure significantly reduces and the plunger rapidly descends. Therefore, theanesthesiologist can confirm that the correct depth is achieved. The needleinsertion plane usually depends on the level. For lumbar or low thoracicinsertions, the needle is inserted in the median plane through the center lineof the spine in the sitting position, since the needle trajectory encountersfewer layers of soft tissue, compared to the paramedian insertion. On theother hand, the paramedian sagittal plane affords a larger needle target,when compared to the midline approach (see Figure 1.2). By placing theneedle laterally, the limitations associated with the shape and the angle ofthe spinous processes can be avoided. One major error when attemptingparamedian needle insertion is being too far from the midline and hence,encountering the lamina.1.1.2 Risks and complicationsClinical research indicates that spinal injections are generally safe [39]. How-ever, there are associated complications and risks. One of the common risksin epidurals is dural puncture, which is caused by needle overshoot and canlead to loss of cerebrospinal fluid (CSF). It can cause severe headaches, if theamount of the loss is sufficient to allow the CSF pressure drop. Althoughthis particular complication of the postdural puncture is usually resolvedwithout permanent injury, it leads to an increased level of anxiety in thepatient and can be catastrophic in rare cases.Other risks of spinal injections include bleeding, nerve or spinal corddamage. To avoid the associated complications with spinal injections, aset of tools are required to facilitate accurate identification of the needlepuncture site, needle target, and therefore, needle trajectory from puncturesite to the target.41.2 Need for visual feedback for spinal injectionsStudies indicate that the rate of postdural puncture is considerably higherduring training programs, when compared with an experienced anesthesiolo-gist (3-5% vs. 0.8%) [28, 41]. Therefore, they typically have a steep learningcurve [41].In order to overcome the challenges associated with relying solely on theLOR method, a number of solutions have been suggested to get tactile/audio-visual feedback for needle insertion. For instance, Lechner et al. proposed anacoustic puncture assist device, which connects to the epidural needle [89].Their device translates the pressure needed for the injection into correspond-ing acoustic and visible feedback. In another study, Tran et al. developed asystem that measures the pressure of the syringe plunger from the force anddisplacement [169]. Using the proposed system, they showed that differenttissues have different injection flow rates. Hence, the endpoint of needleinsertion (epidural space) can be identified through these measurements.Although these methods facilitate the needle placement procedure, theydo not help the anesthesiologist with selection of the needle puncture siteand its trajectory. Imaging techniques, however, can provide feedback priorto the procedure for accurate selection of the puncture site. Moreover, theycan be used for visualization of the needle, its target and the surroundingtissue during the procedure. Therefore, systems that can identify the needletarget and its anatomical landmarks would be beneficial. Such systems couldfacilitate the selection of the correct puncture location through identifyingthe corresponding vertebral structures.1.3 Ultrasound for spine imagingBecause of the complex geometry of the vertebrae, fluoroscopy (X-ray) andcomputed tomography (CT) have been investigated for guiding spinal injec-tions [157]. Amrhein et al. showed that when guided under CT-fluoroscopy,the incidence of dural puncture in cervical epidural injection is minimal [4].Filippiadis et at. investigated the accuracy of the blind needle insertion forlumbar epidural injections [37]. Their study showed that the needle wasmisplaced in 40% of the cases (56 in 138). Moreover, in 30% of the cases,the needle was inserted at the wrong level. On the other hand, fluoroscopy-guided needle insertions indicated a success rate of 100% in needle place-ment.Although CT and fluoroscopy provide high quality visualization of the5Figure 1.3: The anatomy of a vertebra in transverse (a) and paramedian (b)planes. The transverse view can visualize the symmetrical US echoes fromthe left and right side of the vertebrae, hence, it is preferred for midline nee-dle insertion. However, the LF is relatively easier to see in the paramedianplane, due to the larger acoustic window.spine anatomy, they expose the patient to ionizing radiation, which lim-its their application, especially on pregnant patients. Studies have beenconducted to develop techniques and protocols for minimizing the exposuretime [3, 16]. However, avoiding all exposure to ionizing radiation is oftenpreferred by both the physician and the patient. Ultrasound (US) as a safe,real-time, affordable and accessible alternative has been investigated for vi-sualizing the spine anatomy and needle guidance. The literature indicatesa growing interest in the past two decades for pre-procedure US and US-guided spinal injections in clinics [19, 43, 45]. In addition, the advantages ofUS-guided spinal injections have been studied extensively. Liu et al. showedthat US-guided injections are performed more quickly with fewer number ofneedle insertion attempts, when compared against nerve stimulation anes-thesia [101]. In another research, Perlas et al. showed that US imagingsignificantly improve the precision and efficacy of epidural anesthesia [133],when compared against palpation method.Depending on the orientation and the angle of the US transducer, dif-ferent aspects of the spine anatomy can be viewed. The sagittal medianplane can be used to visualize the adjacent vertebrae in a single image. Inthis plane, the proper lumbar intervertebral level for needle insertion canbe found by counting down from either the last rib (T12) or up from thesacrum, respectively. Once the correct level is identified, the transducer canbe moved medially and slightly angled to visualize the LF. This particularplane is referred to as the paramedian plane (see Figure 1.2). Because oflarger intervertebral gap and accordingly, acoustic window, the paramedianview is considered the optimal plane for visualizing the LF [44]. However for6midline needle insertion, the transverse view is preferred, since it containssymmetric ultrasonic echoes from both sides of the vertebra. Figures 1.3-aand 1.3-b show the spine anatomy in transverse and paramedian US images,respectively.Despite its advantages, US images are often challenging to interpret,when compared against CT or fluoroscopy. This is because the images areaffected by noise and artifacts such as reverberation and shadowing, whichobscure the anatomy. Moreover, identifying spinal structures and the nee-dle target in US images involves a long and steep learning curve. In astudy, Margarido et al. investigated the number of required supervised nee-dle insertion trials to achieve competence in spinal injections [113]. Theirexperimental results showed that even 20 supervised trials in addition toteaching sessions are not sufficient for novices to reach competence in ultra-sound assessment of the lumbar spine. In addition, different spinal struc-tures have very similar signatures in US images. Therefore, this imagingmodality, although much researched, has not become the standard-of-carefor pre-puncture scans or needle guidance in the spine.1.4 Related prior studiesPrevious works on using US guidance for spinal injections can be dividedinto two major categories:1. clinical feasibility studies, which investigate the accuracy and successrate of US-guided injections using a standard US system,2. advanced computer assisted systems, which propose image process-ing, computer vision and machine learning techniques to provide thephysician with additional information on the standard US images.1.4.1 Clinical feasibility studiesUS has been clinically investigated for needle guidance [24, 71, 95, 98, 150]and the accuracy and efficacy of the US-guided injections have been com-pared with those of fluoroscopy/CT guided injections [150, 189, 194]. Forinstance in a study, Evansa et al. compared the accuracy and proceduretime of US-assisted and fluoroscopy-controlled epidural steroid injections[36]. Their study was conducted over 112 patients with axial chronic backpain and experiments showed there was no significant difference betweenthe procedure times and number of needle insertion attempts. Moreover,7both US-assisted and fluoroscopy-controlled procedures resulted in similarpain intensity relief, which shows US-guided injections are clinically feasible.Obernauer et al. compared US-guided and CT-controlled cervical facet jointinjections [129]. Their experiments show that not only US-guided injectionswere faster by 6 minutes, but their therapeutic effects were similar to thoseof the CT-guided injections. 3D and 4D ultrasound have been studied forspinal injections because they makes it possible to view both the needle tipand the target during the procedure. Such investigations can be found in[11, 138]. In a study, Voloshin investigated the use of four dimensional (real-time 3D) US guidance for epidural anesthesia at lower thoracic and lumbarlevels [179]. His experimental results showed that due to enhanced spatialorientation of the operator and the ability to visualize the bone tissue andneuroaxial structures, 4D ultrasound can facilitate epidural anesthesia.In another study, Grau et al. performed US scans of the lumbar spineand employed three different US planes, namely, the transverse, medianand paramedian sagittal planes [44]. Their study showed that, because ofproviding better visibility of the anatomical features and the needle target,the paramedian plane is the optimal plane for US images of the lumbaranatomy.Srinivasan et al. [162] compared the conventional palpation based mid-line injections versus US-guided paramedian needle insertion. In their study,one hundred patients were consented and were randomized into group C(conventional) and group P (pre-procedural ultrasound-guided) with 50 pa-tients in each group. Their results showed that the average number of needleinsertion attempts in group P was significantly lower than those of groupC. However, landmark identification in paramedian US took a significantamount of time which slowed down the procedure.Alongside the studies on the clinical feasibility of US-guided injections,pre-puncture US scans have been also investigated to identify the verte-bral levels and accordingly, correctly identify the desired level for the needlepuncture site [26, 133]. Generally, clinical studies show that US-guided nee-dle insertions are feasible [26, 103, 130, 133], however, the success rate ofsuch procedures depends on the experience and the skill level of the anes-thesiologist [69, 125, 198].1.4.2 Computer-assisted systemsSeveral new US guidance systems have been proposed to visualize the spinalanatomy during the needle insertion [118, 170]. Ungi et al. [175] proposeda navigation system using an electromagnetically tracked transducer and8needle. Their proposed system records a sequence of US snapshots andregisters the spatial coordinates of the tracked needle to the coordinatesof the snapshot. Their experiments on a cadaveric lamb model showeda significant improvement in success rate of the facet joint injections. Inanother study, Chen et al. augmented the reconstructed three dimensional(3D) ultrasound volume with a preoperative 3D CT volume, which providesthe physician with a two dimensional x-ray vision of the anatomy whileinserting the needle [20]. Also, Rafii-Tari et al. proposed an ultrasoundguidance system which uses a transducer-mounted camera to detect a uniquemarker strip attached to the skin and accordingly, estimate the position ofthe US images with respect to the patient [135]. Their proposed systemcreates 3D panoramic volumes of the spine which can be used to visualizethe spinal anatomy during needle insertions.Statistical shape models have also been proposed to provide a bettervisualization of the spinal anatomy which helps the anesthesiologist withimage interpretation [15, 72, 138, 141]. For example, Rasoulian et al. [139]introduced a navigation system which registers a statistical model of thespine to a set of US images. Their proposed method uses an electromagnet-ically tracked transducer and needle for real-time guidance. Later, Brudforset al. developed a real-time system that can continuously register the sta-tistical model to the reconstructed 3D US volume without the need for anelectromagnetic tracker [17].Template matching along with machine learning algorithms have beeninvestigated to help with US image interpretation. In a recent study, Yu etal. proposed an image classification method to identify the needle puncturesite in the US images [199]. They used the transverse view of the vertebraeand template matching for identifying the target features, namely the artic-ular process, epidural space and vertebral body. Their research group laterproposed a spinal level identification system which matches a predefinedtemplate on a panoramic sagittal image of the spine [200]. Moreover, Tranand Rohling utilized a bone enhancement technique and template match-ing to identify the US echo from the surface of the bone and then, detectthe epidural space [172]. However, since matching-based approaches use astatic template for target identification, which is obtained from a subsetof subjects, they cannot cover the full intra-patient shape variations of thespine.3D US may facilitate improved identification of the spinal anatomy andthe needle target in epidural anesthesia [11]. In order to visualize the spinalanatomy in paramedian 3D US while performing needle insertion, a novelneedle guide called EpiGuide, was proposed by Malenfant et al. [110]. Using9EpiGuide, the US probe does not obscure the needle puncture site. There-fore, the physician has real-time visualization of the needle target duringinjection.In summary, most of the previous computer-assisted systems providesome guidance for either 2D or 3D US in various planes. However, dueto the complex structure of the spinal anatomy and the surrounding tissue,which are affected by the US noise, image interpretation still remains the keylimitation restricting US from becoming the preferred modality for guidingspinal needle insertions.1.5 Objectives and hypothesesRecently, machine learning (ML) algorithms have achieved outstanding per-formances in many cognitive tasks, such as speech recognition, image seg-mentation, classification, object detection [91]. Hence, these techniques canpotentially be the enabling solution for the challenge of spinal ultrasoundinterpretation. Moreover, these techniques can be used as teaching toolsduring training of novices.The main objective of this thesis is to develop machine learning modelsand algorithms that can learn key features from a dataset of US images tofacilitate interpretations, identification and guidance. In particular, we in-vestigate the problem of identifying the proper paramedian plane for needleinsertion and localizing the needle target in the paramedian planes. In ad-dition, we investigate identification of transverse midline planes for midlineneedle insertion.From the ML perspective, we study different classes of ML techniques,namely, hand-crafted features and deep supervised learning as well as semi-supervised and unsupervised learning. The latter two are particularly sig-nificant, since one of the typical characteristic of medical image datasets istheir limited number of samples.The following hypotheses are tested in this thesis:1. The ultrasound images of the spine have distinctive appearance pat-terns that can be recognized via a set of local feature extractors.2. Automatic feature learning from the ultrasound images can bene-fit from hand-engineered feature extractors for localizing particularanatomy in the image.3. An efficient classifier can be trained through learning the underlying10distribution of the ultrasound images and augmenting the trainingdataset with synthetic samples from that distribution.4. Unsupervised learning of the variations of the transverse ultrasoundimages of the spine helps with reducing the number of labeled samplesneeded to train a classifier.From the clinical procedure perspective, the anesthesiologist can utilizepre-puncture paramedian US to help find the insertion level, the correspond-ing intervertebral gap and accordingly, the target depth by counting up fromthe sacrum. Once the gap is aligned with the center of the US transducer,the skin is marked at the center of the probe. While this plane is beneficialfor paramedian visualization of the anatomy in conjuction with EpiGuide[110], the transverse view can be utilized in order to identify the center lineof the spine for the midline approach. The intersection of the center line andthe intervertebral gap, identified from the paramedian plane, indicates thedesired puncture site. Moreover, the anesthesiologist can still use the LORtechnique to confirm that the correct depth is achieved during injection.Pre-puncture US is just one of the steps of effective anesthesia. Thefinal steps involve accurate guidance of the needle towards the target anddelivering the anesthetic, while trying to avoid extra needle insertion at-tempts. Therefore, although the work in this thesis helps the operator withimage interpretation, accurate guidance of the needle is also required for asuccessful epidural anesthesia. Several guidance systems or techniques arefeasible, including the commercial electromagnetic needle tracking systemfor freehand needle guidance (SonixGPS, Analogic, Richmod, Canada) orEpiGuide [110], which is designed and optimized specifically for epiduralsi.e., can be used along with the loss-of-resistance technique with zero forceneedle release while providing the needle trajectory as an overlay.1.6 ContributionsThe following contributions have been made in order to achieve the mainobjective of the thesis:• A novel directional and multi-scale feature extraction technique in theWalsh-Hadamard domain has been developed. This method yieldsdistinctive features to classify paramedian US planes using a fully-connected neural network model. Because of the binary nature of theHadamard transformation, the proposed method can extract feature11vectors of the US images in real time without the requirement forspecific hardware such as graphical processing units (GPUs).• A novel deep neural network architecture has been developed, whichutilizes the directional Walsh-Hadamard features along with the fea-ture maps that are automatically learned from the dataset. The modelis used for localizing the needle target in paramedian images.• A novel generative model based on variational auto-encoders has beenproposed. It learns the distribution of the appearance features of im-ages and their labels independently. This model is essential for semi-supervised training of a classifier.• An adaptive training algorithm is developed, which uses the generativemodel to train a classifier. The algorithm augments the dataset basedon the performance of the discriminator during training.• A novel neural network architecture for identifying the symmetry inthe images has been developed. After adaptive training, the modelis used to identify the transverse midline images for midline needleinsertion.• A novel unsupervised model based on auto-encoders and self-organizingmaps, called deep neural maps (DNM), has been developed. DNMlearns representations of the structural features of the dataset.1.7 Thesis outlineThe rest of the thesis is divided into the following chapters:CHAPTER TWO: AUTOMATIC CLASSIFICATION OF THE TARGETPLANES FOR EPIDURAL AND FACET JOINT INJECTIONS VIA LO-CAL DIRECTIONAL FEATURESThe initial step for paramedian needle insertion involves identification ofthe anatomical landmarks in the image. A system based on the local direc-tional Hadamard features for detecting the laminae and facet joint planes inparamedian ultrasound images has been proposed. A neural network classi-fier is used to identify the target versus non-target planes. Due to the binarynature of the proposed feature extractor, the system can detect the target12planes in real time. Moreover, it has the potential to assist the anesthesiol-ogists in quickly finding the target plane for epidural steroid injections andfacet joint injections.CHAPTER THREE: AUTOMATIC LOCALIZATION OF THE EPIDU-RAL NEEDLE TARGET VIA AUGMENTING CONVOLUTIONAL ANDHAND-ENGINEERED FEATURESOnce the target plane is identified, the needle target should be detectedfor epidural injection. Hence, a hybrid machine learning model is proposedto automatically localize the needle target for epidural needle placement inthe paramedian ultrasound images of the spine, which are classified via themethod introduced in the previous chapter. In particular, a deep networkarchitecture along with a feature augmentation technique is proposed forautomatic identification of the anatomical landmarks of the epidural spacein ultrasound images. The augmented features are obtained via the localdirectional Hadamard transform. The effect of the augmented features onthe performance of the model has been investigated.CHAPTER FOUR: IDENTIFICATION OF THE CENTER LINE OF THESPINE IN TRANSVERSE IMAGESAside from the paramedian access, a commonly performed method for epidu-ral anesthesia involves midline needle insertion, where the needle puncturesite lies on the center line of the spine, which can be visualized in the trans-verse view. Therefore, a model that classifies the midline and off-centertransverse images is favourable to facilitate midline epidural injections. Inthis chapter, a deep neural network is proposed to automatically classifythe midline and off-center transverse ultrasound images of the vertebrae. Inorder to distinguish the midline from the off-center images, the proposednetwork detects the left-right symmetrical anatomical landmarks. Oncetrained, it can be used for identification of the midline in transverse USplanes, which is the first step of ultrasound-guided midline needle insertion.CHAPTER FIVE: ADAPTIVE AUGMENTATION OF TRANSVERSE USIMAGES DURING TRAINING OF A CLASSIFIERDeep supervised models require large amounts of labeled data to be reg-ularized. However, clinical data acquisition and annotation is associatedwith considerable costs. Thus, augmentation techniques are investigated to13increase the size of an existing data set of images. In this chapter, an adap-tive augmentation approach has been proposed, which utilizes a generativemodel to synthesize realistic transverse images of the spine. In particular, avariational model has been developed that learns the probability distributionof the image data conditioned on a latent variable and the correspondinglabels. Hence, the trained model can be used to generate new images witha particular label for data augmentation. An adaptive training algorithmfor discriminative models has been developed that utilizes the generator toaugment the training set with synthetic images that contribute to the per-formance of the discriminator. The efficacy of the proposed augmentationalgorithm has been investigated on a base line classifier to identify the centerline of the spine. Moreover the model, which was proposed in Chapter 4, istrained via the adaptive training algorithm and its performance is comparedagainst training on the unaugmented data.CHAPTER SIX: DEEP NEURAL MAPS FOR UNSUPERVISED FEA-TURE LEARNINGThe models proposed in Chapters 4 and 5 require labels for training, hence,supervised. However, a representative set of variations of the transverse im-ages of the spine can be learned without supervision i.e., labels. Therefore,the image annotation cost can be avoided. In this chapter, a novel unsu-pervised representation learning model using deep convolutional networksand self organizing maps, called Deep Neural Maps (DNM), is introduced.DNM jointly learns an embedding of the input data and a mapping fromthe embedding space to a N-dimensional hyper-lattice. Therefore, it canlearn efficient representations of the input data, which reflect characteristicsof each class. We investigate the performance of the DNM model in themidline identification problem.CHAPTER SEVEN: CONCLUSIONIn this chapter, a short summary of the thesis followed by suggestions forfuture work is provided.14Chapter 2Automatic classification ofthe target planes for epiduraland facet joint injections vialocal directional features2.1 IntroductionLow back pain (LBP) is the leading cause of activity limitation, work absenceand disability in most countries [62, 119]. It is estimated that between70% and 80% of adults experience at least one episode of back pain duringtheir lifetime [145]. In the United States, LBP is the fifth most commonreason individuals seek medical care and annually up to 50 billion healthcare dollars are spent on the treatment of LBP [184]. Injection therapy,in the forms of epidural steroid injections and facet joint injections, is oneof the typical treatments of LBP. Currently, injection guidance is based oneither fluoroscopy and CT or palpation and the LOR technique for epiduralinjections. Because of exposure to ionizing radiation, the application of CTand fluoroscopy is prohibited on pregnant patients for epidural anaesthesiaduring delivery. Alternatively, ultrasound (US) has been investigated forvisualization of the spinal anatomy.As explained in Chapter 1, the majority of current studies are focusedon utilizing 3D ultrasound instead of the standard 2D, because it makes itpossible to view both the needle tip and the target during the procedure. ButThe materials in this chapter are adopted from:Pesteie et al., Automatic recognition of the target plane in 3D ultrasound with EpiGuide.7th NCIGT and NIH Image Guided Therapy Workshop - International Conference onMedical Image Computing and Computer Assisted Intervention (MICCAI), (2014)Pesteie et al., Real-time ultrasound image classification for spine anesthesia usinglocal directional Hadamard features. International Journal of Computer AssistedRadiology and Surgery, 10, no. 6 (2015): 901-912.15Figure 2.1: Ultrasound planes of (a) transverse process, (b) facet joint, (c)lamina and (d) spinous process of a vertebra extracted from a 3D volume,courtesy of Vicky Earle. Depending on the acquisition location and theorientation of the transducer, the volume may include the full range of thespinal structures. The paramedian target planes for facet joint injectionsand epidural steroid injections are (b) and (c), respectively. The goal of theclassification system is to identify planes (b) and (c), however, similar wave-like appearance of the spinal anatomy among the different planes makes theidentification of the desired target plane difficult.due to the complex structure of the spinal anatomy, a key challenge in using3D ultrasound for needle guidance is identifying the target plane among thenumerous planes of a 3D volume. Moreover, as shown in Figure 2.1, differentspinal structures show similar wave-like patterns in US images which makesthe identification of the proper plane even more difficult. For epidural steroidinjections and facet joint injections, the target plane should contain thelaminae of the vertebrae and the facet joints, respectively. Therefore, anautomated system that can identify precisely the optimal plane for needleinsertion, would be beneficial.2.1.1 ContributionIn this chapter, we introduce a system that is able to automatically identifythe target plane in 3D US for facet joint and epidural steroid injections inreal time. We propose a machine learning approach based on artificial neuralnetworks to learn the key features of the anatomical landmarks of the targets.16In order to achieve accurate classification, we introduce a real-time, multi-scale and multi-directional feature descriptor based on the Sequency Or-dered Hadamard transform [10] called Local Directional Hadamard (LDH)features. The proposed method offers localized scale/location/orientationfeatures. We acquired 104 in vivo ultrasound volumes from 13 volunteerswhich are annotated by an expert sonographer for training and testing theclassifier. In order to determine the accuracy of classification, we performeda leave-one-out validation.2.2 Methods and materialsFigure 2.2 shows the overview of our proposed system. Like many harmonyanalyzers, the Hadamard transform gives the sequency representation of theimage along vertical and horizontal directions which can be used for detect-ing different wave-like patterns of the spinal structures in the ultrasoundimages. In order to extract local features, we utilize a recursive partition-ing algorithm. Accordingly, we introduce a multi scale feature extractionmethod based on the Sequency Ordered Hadamard transform to extract thekey features of the target planes for classification. We discuss the mathe-matical background of the Hadamard transform in the next section whichis followed by our feature extraction algorithm. Later, the materials andprocedures used for pattern recognition, data acquisition and experimentsare discussed.2.2.1 Hadamard transformAmong the family of orthogonal linear transforms applied on spatial domainsignals (e.g., images) the Hadamard transform, which decomposes the inputsignal into rectangular wave primitives in the transform domain, is definedas [10]:X(u, v) =1N2N−1∑i=0N−1∑j=0I(i, j)[(−1)ψ(u,v,i,j)], (2.1)where I is the image function of size N ×N and ψ determines the paramet-ric kernel function of the transform. In the Naturally Ordered Hadamardtransform, ψ is defined as:ψ(u, v, x, y) =l−1∑k=0[bk(x)bk(u) + bk(y)bk(v)] . (2.2)17Figure 2.2: Overview of the LDH classification system. Each 2D imageof a 3D volume is first decomposed via Complete Recursive Partitioningalgorithm. Pj is the list of decomposed image patches at scale j. LocalHadamard coefficients (CPj) of each image patch Si are calculated via theHadamard matrix H. Then, directional features are extracted in sequencydomain using the proposed wedge filters for training. The output of thebinary classifier indicates whether or not the input image is a target plane.18Figure 2.3: Naturally Ordered (NOH) (left) and Sequency Ordered (SOH)(right) Hadamard matrices of size 64 × 64. Walsh functions are sorted inascending order in the SOH matrix. Note that the sequency center is lo-cated at the top-left corner of the SOH matrix. Moving to the bottom-rightcorner of the transform matrix, the order of the Walsh functions increaseswhich leads to higher sequency domain coefficients comparing to the top-leftcorner.bk(w) is the k’th bit in the binary representation of w and l is the totalnumber of bits. However, the recursive matrix form of the Naturally OrderedHadamard transform is more convenient and popular which is defined as:Hm =1√2(Hm/2 Hm/2Hm/2 −Hm/2), (2.3)where m is an integer power of 2 and is referred to as the size of the trans-form and H1 = 1 for the base Hadamard matrix. The elements of the basisvectors are orthogonal and are binary values (±1) called Walsh functions.Computation of the Hadamard transform just requires addition and sub-traction rather than multiplication which makes the Hadamard transformfast and easy to implement.In the Sequency Ordered Hadamard transform (SOH), the Walsh func-tions are ordered in the Hadamard matrix based on their sign change. There-fore, there is no sign change in the first row and column of the SOH matrixwhile the n’th row and column contains the Walsh function with n− 1 zero-crossings. For instance, H2 has zero and one sign change in the first andsecond row/column, respectively. Equation 2.1 and its matrix form withoutnormalization factor can be written as:W (u, v) =N−1∑i=0N−1∑j=0HN (u, i)I(i, j)HTN (j, v)⇔W = H× I×HT, (2.4)where HN and HTN are the Hadamard matrix of size N and its transpose,19respectively. The map of the Naturally Ordered and Sequency OrderedHadamard matrices of size 64× 64 are shown in Figure Feature extractionMulti-scale featuresThe optimal ultrasound plane in epidural steroid injections or facet jointinjections, must contain the laminae of the vertebrae or the facet joints re-spectively. Using equation (2.4), global Hadamard coefficients do not yielddistinctive features from complex structures. Therefore, due to the similarpatterns of different spinal structures, local features are generally preferred.With this motivation, we begin by exploring a simple question: Is it possibleto have local Hadamard coefficients of the image at different scales? If so,we can locally investigate the image in the Wash-Hadamard domain. Look-ing at equation (2.3), it is possible to have different Hadamard matrices ofarbitrary sizes of 2n. Therefore, it is also possible to utilize a Recursive Par-titioning (RP) algorithm to compute the Hadamard coefficients at any scaleof interest. Using Donoho and Huo’s notation [33], recursive partitioningfor image S is defined by the following rules:• Q = {Si} i = 1, ..., n is an RP.• If S` can be decomposed into four non-overlapping squares of thesame size S`,00, S`,01, S`,10 and S`,11, then the new partition Q′ ={S1, ...,S`,00,S`,01,S`,10,S`,11, ...,Sn} is also an RP.We adopt the same RP algorithm with a slight modification. In orderto maintain the order of decomposition consistent, we use a list instead ofa set. Therefore, in the RP list, image patches are sorted based on theirscale. The key insight to the multi-scale features based on the Hadamardtransform is to decompose the image into four patches at each scale andrepeat this recursive procedure until a certain depth δ. Note that the depthof the RP indicates the maximum decomposition scale (δ).Let Pj be a list of all decomposed images at scale j such that Pj is asublist of Q. Also let CPj and Hj be the list of the Hadamard coefficients ofimage patches and the Hadamard matrix at scale j with the correspondingsize m, respectively. Thus, for j ≤ δ and all S ∈ Pj :CPj = 〈Hj × Sk ×HjT〉 k = 1, ..., |Pj | (2.5)20Figure 2.4: Local Hadamard coefficients of patch A (a) and patch B (b) ofa lamina plane (left). Because patch B only includes acoustic shadow be-neath the bone, the transformed matrix contains no significant informationwhereas the sequency domain representation of patch A has relatively largecoefficients in lower sequency bands. In the global transform, Hadamard co-efficients are distributed in the entire transform domain regardless of acous-tic echoes or shadows.We call CPj the Recursive Partitioning Hadamard Coefficients (RPHC).RPHC has two major contributions. First, it gives a multi-scale representa-tion of the image, based on the Hadamard transform. Using the multi-scalerepresentation, one is able to examine sequency domain coefficients at anylevel of detail. Second, it is able to locally correlate spatial and sequencydomains. This correlation can be further used to recognize different patternswhich may appear in different regions of the image. As an example, Fig-ure 2.4 shows local Hadamard coefficients of two particular image patchesof an ultrasound image. The sequency domain representation of the imagepatch A (Figure 2.4-(a)) shows large coefficients in lower sequency bandswhereas in patch B (Figure 2.4-(b)), there are no significant information inHadamard domain, comparing to image patch A.Multi-directional featuresThe two-dimensional Hadamard transform provides sequency representa-tion of the image along vertical and horizontal directions, thus it should beapplicable in detecting different structures within the image. An elemen-tary example would be the identification of a simple horizontal line which21yields significantly large coefficients in the first transformed matrix column.Similarly, a vertical line gives the most significant coefficients in the firsttransformed matrix row and accordingly, in a diagonal line, the coefficientsare distributed in the entire transform domain rather than the first columnor row. Thus, going from horizontal orientation to vertical, the distributionof the Hadamard coefficients varies from the first column to the first row.In the spinal ultrasound images, bone surface is represented by particularwave-like signatures which can be approximated by line segments. There-fore, similar to the simple synthetic lines, the patterns of different spinalstructures are identifiable by finding the region-orientation correlation ofthe signatures in sequency domain.In order to extract orientation features, one must look at the Hadamardcoefficients of vertical, horizontal and diagonal orientations separately andthen, based on the significant coefficients, can differentiate between orienta-tions. We introduce three crisp wedge filters, Fvt, Fhr and Fdg in Hadamarddomain with the same dimension as that of the decomposed image patchesat a give scale, where Fvt masks the Hadamard coefficients except those cor-respond to vertical patterns and similarly, Fhr and Fdg mask for horizontaland diagonal coefficients respectively. We define Fvt as:Fvt(u, v, α) =12+12|v − uα/50|v − uα/50 , (2.6)where u and v are sequency domain indices and α is the parameter whichdefines the percentage of the covered area (α ≤ 50). Fhr and Fvt aresymmetric, therefore:Fhr = FTvt, (2.7)and the diagonal filter, Fdg, should cover the remaining area of the trans-formed matrix. Thus:Fdg = 1− (Fvt + Fhr). (2.8)The parameter α also tunes the precision of the filters. Small values ofα correspond to narrower range of angles for either vertical (≈ pi/2 ± ) orhorizontal (≈ 0 ± ) structures, while a large α translates into wider rangeof angles for such structures. Therefore, choosing the right α depends onthe application. Later, we will discuss how the optimal α is chosen for ourapplication. Figure 2.5 shows Fvt, Fhr and Fdg with α = 10. Also note thatalthough the filters’ size changes with respect to scale, they are identical inshape.22Figure 2.5: Ideal sequency domain (a) vertical Fvt , (b) horizontal Fhr and(c) diagonal Fdg wedge filters of size 64 × 64 with α = 10. The parameterα determines the area covered by either vertical or horizontal filters. Theblack area corresponds to 0 to mask the rest of the coefficients.Because directional structures have significant Hadamard coefficients intheir corresponding filter outputs, we propose to use the variances of the non-zero elements of the outputs as the multi-scale, multi-directional features.Therefore the local feature vector v¯j at scale j is given as:v¯j =[V ar(CPjk ◦ Fjvt) , V ar(CPjk ◦ Fjhr) , V ar(CPjk ◦ Fjdg)]k = 1, ..., |CPj |,(2.9)where CPjk is the k’th Hadamard transformed matrix in RPHC CPj , Fjis the wedge filter at scale j and A ◦ B denotes the Hadamard product.Accordingly, the final feature vector would be the aggregation of all thelocal features at decomposition scales and it is given as:v¯final =[v¯j]j = 1, ..., δ. (2.10)The lamina and facet joint of a vertebra have very similar structuresin US. The location/orientation representations of the laminae and facetjoints in sequency domain also indicate similar signatures. However, using anon-linear classifier, it is possible to learn these signatures for classification.Parameter estimationIn order to find the optimum value for the wedge filter coverage area pa-rameter (α), vertical, horizontal and diagonal feature vectors are extractedseparately from samples of different classes (lamina vs. facet vs. non-target).Our goal is to find the parameter α such that the filters yield distinctive ver-tical, horizontal and diagonal features from laminae, facet joints and non-target planes. Therefore, the optimal wedge filters would give orthogonal or23Figure 2.6: `2 norm of the intraclass similarities in 3D (vertical, horizontaland diagonal) feature space, ||vs||, versus α for (a) lamina and facet joint,(b) lamina and non target and (c) facet joint and non target classes. The`2 norm is minimized (less than 0.1) in 15 ≤ α ≤ 35. Therefore, filters with15 ≤ α ≤ 35 yield the most distinctive features. Each line indicates the `2norm of intraclass similarities of a particular volunteer.nearly orthogonal intraclass feature vectors. Hence, the intraclass distancebetween feature vectors should be maximized. For this reason, the intra-class dot products of vertical, horizontal and diagonal features, hereaftercalled as similarity vectors vs, are calculated for 5 ≤ α ≤ 45 from the entiredataset. To ensure maximum orthogonality, the three feature sets shouldbe minimized simultaneously. Thus the `2 norm of the similarity vectors in3 dimensional feature space should be minimized. Figure 2.6 shows the `2norm of the intraclass similarity vectors of sample images from all volunteers’data. Investigating Figure 2.6, we can see that until α = 15, the `2 norm di-minishes with respect to α and exponentially grows after α = 35. Therefore,the filters yield the most distinctive feature vectors for some 15 ≤ α ≤ 35.In practice, the classifier reached its maximum performance with α = 15 ontraining data.The optimum decomposition scale, δ, is chosen based on measuring theclassification accuracy for various values of δ. The best correct classificationrate was achieved with three levels of decomposition (δ = 3) in the recursivepartitioning algorithm.The parameter values (α = 15 and δ = 3) are not changed throughoutthe experimental validations.2.2.3 Pattern classificationWe used a two-layer neural network for pattern classification. The number ofneurons in the hidden layer was determined by testing the classification per-24formance for different network configurations. The classifier was at its bestperformance with 35 neurons in the hidden layer. The hyperbolic tangentsigmoid and softmax activation functions were used in the hidden and out-put layers, respectively. The network weights were updated using the ScaledConjugate Gradient algorithm. Two different networks with the same train-ing parameters were trained to classify the lamina and facet planes based onthe final feature vectors, v¯final, extracted from the dataset and the labelsprovided by the expert sonographer.2.2.4 Data acquisitionOne hundred and four 3D ultrasound volumes from 13 healthy volunteers,from whom written consent was obtained, were captured by an expert sono-grapher using a SonixTOUCH US machine (Ultrasonix Medical Corp. Rich-mond, Canada) equipped with a 5 MHz 3D transducer with depth of 7 cm,50% gain and FOV of 43.9◦. BMIs of the volunteers were all less than 30.All of the volunteers were aged between 20 and 35 years and none of thethem had any records of previous spinal pathology or surgery. During dataacquisition, the US probe was placed on the L3 and L4 vertebrae within10 mm on both left and right side of the midline and slightly angled by 5◦to 10◦. Therefore, the vertebral structures namely, spinous process, lamina,facet joint and transverse process were in the field of view of the 3D probe inall of the volumes. The vertebral levels were identified by counting up fromthe sacrum to the L3 and L4 vertebrae. From each side of the midline, fourvolumes were acquired with slight displacement of the transducer on the L3and L4 vertebrae, resulting in 104 3D US volumes in total. All of the config-urations of the machine remained unchanged during data acquisition. Eachnative ultrasound plane was annotated by an expert sonographer as eitherthe target plane for epidural/facet joint injections or non-target plane. Thelabels were later used as the gold standard to measure the correct classifica-tion rate, specificity and sensitivity. To maintain consistency, the followingprotocol was defined and followed by the sonographer during the labelingprocedure.• A plane is considered as a target, if contains one or more facets orlaminae.• In the case of laminae and facet joints appearing in the same plane,the plane would be marked as target for both epidural and facet jointinjections.25• A plane is labeled as a target if the laminae or facet joints are recog-nizable despite their low contrast.• If a plane falls at the border of lamina and facet (i.e., not exactlyrecognizable) it is marked as a target for both epidural and facet jointinjections.The annotation procedure was repeated on the entire dataset after 36weeks for the second time by the same sonographer using the predefinedprotocol. The dice similarity coefficient for the lamina and facet joint labelswas 0.92 and 0.93 respectively.2.2.5 ExperimentsBecause the number of the target planes is significantly less than the non-target planes in a 3D volume, the classifier was trained based on equalnumber of target and randomly selected non-target slices (total 1090 planesfor lamina and 1052 for facet joint) using 70% of data for training (764planes for lamina and 736 planes for facet joint) and the remaining 30% fortesting (326 planes for lamina and 316 planes for facet joint). The classifierwas trained with the same parameters and then was tested to classify thelamina, and facet joint planes.In order to validate the preliminary classification results, one volunteer’sdata was left out from the training set and the classifier was trained based onthe LDH features from the remaining volunteers. Later, the trained classifierwas tested on the left-out volunteer’s balanced data. This procedure wasrepeated for all of the volunteers.To compare the classification performance of the LDH features againstthe previously proposed and well-known phase symmetry features, whichare the state-of-the-art approach in ultrasound feature extraction [136, 137],particularly in bone ridge enhancement [48, 49, 172], the US images wereprocessed with Log-Gabor bandpass filters. Odd and even tensors wereextracted to give local phase symmetry features of the image [49]. Becauseof the high dimensionality of the feature vector, PCA was used to extractprinciple components of the feature vector. The parameters of the classifierwere tuned to yield the best classification results.2.3 ResultsThe proposed Local Directional Hadamard feature extractor was imple-mented in MATLAB and run on a 3.40 GHz Intel(R) CoreTM i7 with 1626Figure 2.7: Local Directional Hadamard features of laminae, facet jointsand non target planes for δ = 3 in a complete recursive partitioning. Eachline indicates multi-scale filter variances of a sample laminae, facet joints ornon target images from a particular volunteer. Vertical and horizontal axescorrespond to feature values and indices of images (k) at different scales,respectively. 1 ≤ k ≤ 4 (white window) corresponds to the patches atδ = 1 and similarly, 5 ≤ k ≤ 20 (light gray window) and k ≥ 21 (darkgray window) correspond to δ = 2 and δ = 3, respectively. Because of theirparticular curved shape, facet joints have relatively higher diagonal filtervariances at δ = 2 and vertical filter variances at δ = 3 whereas the laminaehave more significant horizontal features at δ = 2 compared to the otherplanes.27Table 2.1: Comparison of the classification performance, i.e., accuracy(ACC), sensitivity (SNS), specificity (SPC) and precision (PRC) of auto-matic detection of the lamina and facet joint (target) planes using LDH andLPT features on test data. Precision indicates the correct classification ratefor target planes.Target # of planes Method ACC % SNS % SPC % PRC %Laminae 326 LDH 95 93 97 97LPT 84 82 86 86Facet joints 316 LDH 94 96 92 93LPT 84 84 84 84GB of memory.To compare directional LDH features, a sample lamina, facet and non-target plane was chosen from each volunteer’s data. Figure 2.7 shows verti-cal, horizontal and diagonal multi-scale (1 ≤ δ ≤ 3) features of the sampleplanes for all of the volunteers. As Figure 2.7 shows, although the laminaand the facet joint have very similar signatures, facet joint planes have rel-atively larger diagonal features at δ = 2 comparing to lamina planes (mean164.3 vs. mean 130.13) and also vertical features at δ = 3 (mean 44.65 vs.mean 30.02). This is because the facet joints appear as a highly curvedwave-like pattern compared to the laminae. Likewise, the laminae of thevertebrae indicate higher horizontal filter variances at δ = 2, because oftheir particular shape comparing to the facet joints (mean 116.18 vs. mean96.03).After training, the proposed system successfully classified 309 ultrasoundimages out of 326 (95%), and 298 images out of 316 (94%) of the test datafor epidural and facet joint injections, respectively whereas, classificationvia Local Phase Tensor (LPT) features, was 84% accurate for both laminaand facet joint plane detection.Figure 2.8 shows the Receiver Operative Curves of the proposed systemand the LPT based classifier. As shown in the figure, when trained on LDHfeatures both lamina and facet joint classifiers inidcated AUCs greater than0.95. On the other hand, LPT based classifiers reached AUCs of 0.87 and0.88 for lamina and facet joint classification, respectively. Moreover, theaverage feature extraction time per image for the proposed method is lessthan that of the LPT filters (20 ms vs. 324 ms).The average classification accuracy was 94% for laminae and 90% forfacet joints detection for leave-one-out cross validation (see Figure 2.9). Ta-28Figure 2.8: Receiver Operator Curves of target classification based on (a,b)LDH and (c,d) LPT features. The area under curve (AUC) of laminae andfacet joints classifiers with LDH features are 0.97 and 0.96, respectively.Compared to the AUC of the LPT classifier which is 0.87 for laminae and0.88 for facet joints classification, LDH features outperform LPT in classifierAUC.29Table 2.2: Average classification accuracy (ACC), sensitivity (SNS), speci-ficity (SPC) and precision (PRC) of the proposed method on leave-one-outcross-validation for epidural and facet joint target plane detection.Target # of planes ACC % SNS % SPC % PRC %Laminae 1090 94 94 93 93Facet joints 1052 90 94 85 87Figure 2.9: Classification accuracy of the leave-one-out cross validation for(a) lamina and (b) facet joint classification using LDH features. The averagecross validation accuracy for laminae and facet joints are 94% and 90% withmean error of 2.39mm and 3.79mm, respectively.ble 2.2 summarizes the performance of the proposed system for leave-one-outvalidation of lamina and facet joint classification, respectively for all volun-teers. In addition to calssification performance, the validation error wasmeasured as the actual distance between the misclassified image as targetand the actual closest target in the volume. The average validation errorwas 2.39 mm and 3.79 mm for lamina and facet joint detection, respectively.2.4 DiscussionInvestigating the leave-one-out cross validation results of the laminae classi-fication of the first volunteer, which indicates relatively lower classificationaccuracy (76%) comparing to the rest of the dataset (see Figure 2.9), thealgorithm is 95% precise on the volunteer’s data and only one non lam-ina plane is misclassified as target. Clinically, with the aid of ultrasound30visualization, an anesthesiologist performs the needle placement when thetarget is recognizable in the ultrasound image. Hence, complications of nee-dle misplacement can be avoided by achieving a high precision classification.Moreover, investigating the classification results, we noted that the misclas-sification often happens at the borders of the transition to the subsequentspinal structure in a sweep of the 3D probe. Thus, the anesthesiologist mightmiss only one or two frames at the border, not the entire target range offrames for needle placement. Hence, a safe spinal injection is possible evenwithout 100% classification accuracy.Because of the binary nature of the Hadamard transform, efficient im-plementation of the proposed method is possible. To obtain real-time per-formance, no pre/post-processing steps were performed in our MATLABimplementation and the original LDH features were used for classificationwhereas due to high dimensionality, principle components of the LPT fea-tures had to be extracted. Accordingly, PCA adds additional computationalcost on top of that of the LPT feature extraction. Noise reduction or boneenhancement algorithms might increase the classification accuracy of ourproposed method but it must be ensured that using such methods does notimpose high computational cost to the system.One typical weakness of the majority of the automated algorithms is theneed to tune relatively large number of parameters for the optimal perfor-mance. In the proposed algorithm, filter coverage parameter α was deter-mined and verified to yield maximum intraclass orthogonality. Maximumdecomposition scale δ was determined by classification performance via trialand error. The parameters remained unchanged during the tests. Since allof the subjects were healthy volunteers with no obesity or spinal pathol-ogy, the machine configurations (i.e., time gain compensation, frequencyand etc.) were set and remained the same for all of the volunteers by thesonographer based on providing the best visibility of the spinal structures.Yet for clinical application, data from obese patients should also be includedin the training samples as well as patients with spinal pathology.Future work of the proposed system involves including samples fromobese patients and spine pathology cases in the training data, in order tocover a wider range of the population.2.5 ConclusionAn automated system to detect the target plane in 3D ultrasound for epidu-ral steroid injections and facet joint injections has been proposed. The pro-31posed system successfully classified the target planes for epidural and facetjoint injections from non target images in 94% and 90% of cases on average,respectively during leave-one-out cross validation. It should be emphasizedthat the proposed method provides automated feedback to quickly find thetarget plane and is not a replacement for image interpretation by either asonographer or anesthesiologist. The proposed system can also be used tohelp the novices to learn spinal anatomy and to identify appropriate planesfor epidural steroid injection and facet joint injection.Upon identification of the correct plane, a system is needed to localizethe needle target within the detected plane. Since epidural injections involveneuroaxial anesthesia within the spinal canal, there is a greater risk of con-sequences associated with misguidance, when compared against facet jointinjections. Therefore in Chapter 3, we focus on localizing the epidural spacei.e., the needle target for epidural anesthesia, in the paramedian images ofthe spine.32Chapter 3Automatic localization of theepidural needle target viaaugmenting convolutionaland hand-engineered features3.1 IntroductionEpidural needle injection is a commonly used procedure in obstetrics [1, 34]and chronic pain treatment [6, 111]. An epidural injection is administeredby placing a needle into the epidural space between ligamentum flavum anddura mater, while the patient is either in a sitting or lying position with theback arched, in order to increase the intervertebral gap. Studies indicatethat neither position is superior to the other in terms of efficacy, proceduretime and patient’s comfort level [147, 178]. Traditional epidurals are eitherguided by palpation and the LOR technique [168], or under fluoroscopy [38].The conventional LOR technique has a failure rate within 6-20% [88]. Failureis referred to as inadequate or no pain relief after the procedure and is oftendue to needle misplacement. In some cases, misplacement via overshoot ofthe needle tip past the epidural space and puncture of the dura mater, canlead to headache and other complications.Although US has been used to facilitate lumbar epidural needle inser-tions, US images are often difficult to interpret. In particular, identifyingthe needle target in an US image is challenging, due to the complexity ofthe anatomical landmarks of the target. Moreover, US images are usuallyaffected by speckle noise [180], acoustic clutter [94], reverberation artifactsand shadowing [74]. Hence, although improvements continue to be madeto the beam forming and image filtering algorithms in order to improveThe materials in this chapter are adopted from:Pesteie et al., Automatic localization of the needle target for ultrasound-guided epiduralinjections. IEEE Transactions on Medical Imaging, 37, no. 1 (2018): 81-92.33the overall image quality, image interpretation remains a key challenge fornovices.Therefore, a system is needed to automatically identify and localize theneedle target within the US images of the spine. Such a system has potentialbenefits to the patient such as increased use and effectiveness of analgesia,and a reduction in the frequency and severity of associated complications.In particular, the goal is to allow any operator, including the novice ultra-sound user to correctly identify the needle target. Studies have shown thatultrasound guidance reduces the learning time for novices [124, 158] and thenumber of needle insertions needed before the target is reached [43]. There-fore, the expected benefit of this work is to give successful anesthesia to alarger number of patients through the complication of more accurate needletargeting and operator confidence and success rate. The objective of thischapter is to develop a machine learning technique which helps the operatorwith accurately localizing the needle target in US images prior to, or duringneedle insertion.3.1.1 ContributionsExtending the previous work on automatic US image classification presentedin Chapter 2, in this chapter a convolutional network architecture that bothidentifies and localizes the epidural space in paramedian images of the lum-bar spine in 3D and 2D US is proposed.We introduce a feature augmentation technique that incorporates convo-lutional feature maps, which are obtained by convolutional layers, with multiscale local directional Hadamard features. Since the multi-scale Hadamardfeatures are sensitive to the directionality of the US echoes from the surfaceof the vertebrae in the US image, the augmentation provides the deep net-work with a set of distinctive directional features from the sequency domainin addition to the feature maps automatically obtained from the spatialdomain.We demonstrate that the augmented Hadamard features are not auto-matically learned by a deep network with the same number of convolutionallayers and kernels and hence, the augmented features are not redundant.The impact of the feature augmentation on the performance of the pixel-level classification is shown through evaluation of accuracy of the proposednetwork against a conventional CNN architecture with the same number ofconvolutional layers and kernels. Also, we demonstrate that augmenting theHadamard features with the convolutional feature maps improves the targetlocalization accuracy when compared to the localization results of template34Figure 3.1: Block diagram of the system. Target planes are identified fromthe 3D US volume using local directional Hadamard features and a fullyconnected neural network. Within the target planes, the needle target islocalized by classifying pixels in the image via a deep convolutional net-work. The output of the final layer is augmented with the local directionalHadamard features and is then given to a fully connected layer for classifi-cation.matching and a state-of-the-art deep network for biomedical image segmen-tation.Using features obtained from the training data along with a set of hand-crafted features for image analysis has been previously investigated by Wanget al. for breast cancer detection [182]. However, their proposed methoddoes not take into account the hand-crafted features while learning the onesfrom training data. Hence, it is arguable that a wide range of the learnedfeatures are already extracted via the hand-engineered feature extractors.3.2 Methods and materialsFigure 3.1 shows the block diagram of our proposed hybrid model. In orderto localize the target in an ultrasound volume, first, the 2D planes of thevolume are passed to the classifier, which was presented in Chapter 2. Themodel detects the target planes, and passes them to a secondary networkfor localization. The proposed network for target localization uses two setsof features in order to identify the ligamentum flavum in the target planes:1. A set of hand-engineered local directional features from the sequencydomain, and2. feature maps, which are automatically learned from the images.35We briefly discuss the feature extraction technique, followed by detailedexplanation of the proposed model.3.2.1 Local directional features of target planesAs shown in Figure 3.2, the laminae of the vertebrae are the prominentanatomical features of the epidural space in the paramedian US images ofthe spine. Because the pixel intensities of the US image is proportionalto the direction of the reflected US wave from the tissue with respect tothe transducer, the particular wave-like signature of the laminae of the ver-tebrae, can be identified via distinctive local directional features from theUS echoes of the bone surfaces. Let I be a target plane of size m×m fromthe input volume. In Chapter 2, we showed that local Hadamard coefficients(CPj ) can be extracted from I using the recursive partitioning algorithm andlower scale Hadamard matrices (equation 2.5). The algorithm represents adepth-first traversal of the partitioning tree with maximum depth δ whichthe root indicates the target image. Previously, we introduced three wedgefilters in the sequency domain, Fvt, Fhr and Fdg to identify the signifi-cant Hadamard coefficients in vertical, horizontal and diagonal orientations,respectively, as:Fvt(u, v, α) =12+12|v − uα/50|v − uα/50 ,Fhr = FTvt,Fdg = 1− (Fvt + Fhr),(3.1)where u and v are sequency domain indices and α is the parameter whichdefines the percentage of the covered area by the filter (α ≤ 50). In order toextract directional features from CPj , we utilize the same wedge filters in thesequency domain and compute the vertical, horizontal and diagonal featuresusing equation 2.9. Therefore, at a given scale, j, the local directionalHadamard (LDH) feature vector is comprised of the variances of the nonzeroelements of the filter outputs:vj =[V ar(CPjk ◦ Fjvt) , V ar(CPjk ◦ Fjhr), V ar(CPjk ◦ Fjdg)]k = 1, ..., |CPj |,(3.2)All of the parameters of the proposed feature extractor remain the same asthose introduced in Chapter 2.36Figure 3.2: Sample paramedian US image from a volunteer’s laminae ofthe L3-L4 vertebrae at 7 cm depth. Prominent anatomical landmarks ofthe target for epidural insertion in the paramedian view are a) the wave-like pattern of the US echoes from the surfaces of the laminae of adjacentvertebrae and b) the echo from the ligamentum flavum, representing thetarget.3.2.2 Target localizationTarget definitionThe epidural space is superior to the upper margin of the base of the spinousprocess at its junction with the lamina [171]. In 2D paramedian US images ofadult spine, the echo from the ligamentum flavum can be seen immediatelyadjacent to the anterior base of the laminae of vertebrae in paramedian plane[172] (see Figure 3.2). Since the location of the epidural space correspondsto that of the base of the laminae of vertebrae, the anterior base of thelamina is referred to as the “needle target” in the US images, hereafter.Convolutional neural networksConvolutional neural networks (CNNs) have shown noticeable success innumerous cognitive tasks such as object and speech recognition [84, 148]and also medical image analysis [77, 187]. In contrast to traditional machinelearning methods that require hand-engineered features, CNNs learn a setof convolutional kernels in multiple layers, which are specifically tuned for37the problem. This is achieved by minimizing a loss function L(.) of theoutput f(X; K,b) and target t, over the convolutional kernels K and biasesb, given input X from data set D:arg minK,b 1|D||D|∑i=1L (f(Xi; K,b), ti) . (3.3)The hierarchical feature learning in a CNN allows it to extract represen-tations of the data at multiple levels of abstraction. A CNN architecturemay perform a down-sampling operation, called “pooling”, before proceed-ing to the next convolutional layer, which reduces the number of weights andalso provides some level of robustness to a set of transformations, includingrotation and translation of the features. Therefore, a CNN can potentiallycover a large variability in US appearance of the target.CNN architectureThe needle target, namely the epidural space, can be identified by 1) the par-ticular wave-like signature of the laminae of the vertebrae in the parasagittalplanes and 2) the ultrasonic echo from the ligamentum flavum (see Figure3.2). A CNN architecture with feature augmentation is proposed for pixel-level classification of the US image to localize the epidural space throughidentification of the respective anatomical landmarks. For a given pixel p,the goal of the network is to classify the image patch centered at p, Jp, andaccordingly, predict whether or not p belongs to the needle target regionannotated by the expert.The proposed CNN consists of three convolutional layers. The first layerhas 10 5×5 kernels and is followed by a max pooling layer with a 2×2 poolsize. The next two convolutional layers have 10 3 × 3 kernels each and arefollowed by max pooling layers with the same pool size as the first poolinglayer. The activation functions of the convolutional and fully connectedlayers are the rectifier function, which is defined as:f(x) = max(0, x). (3.4)The last convolutional layer yields feature maps which represent local de-pendencies of the spinal anatomy in spatial domain. However, since con-volution is a spatial domain operation, multi-scale analysis of the imagein the sequency domain provides a set of features, which the conventionalCNN architectures with relatively small number of parameters cannot ex-tract from the spatial domain. Therefore, augmenting convolutional feature38maps with multi-scale Hadamard features provides a richer representationof the anatomical landmarks of the needle target in US images.Let Yf be the final output feature map for image patch Jp prior to thefully connected layer, i.e.:Yf = ↓(F(∑iKf ?Xi + bf)), (3.5)where ? and ↓ denote the 2D discrete convolution and max pooling operators,respectively, F(.) is the non-linear activation function, Xi is the i’ th featuremap obtained from the previous layer and Kf and bf are the trainable kerneland the bias vector, respectively. Also, let vm be the 1 × d local sequencydomain filter outputs of the image patch Jp. We define shift matrix Td×|P |das:Tm = [tm(p, q)] ={1 if p− d(m− 1) = q0 otherwise,for m ≤ |P |, (3.6)where P is the list of the partitioned images. The shift matrix adjusts thedimensionality of the convolutional features to match that of the sequencydomain features. The multi-scale filter outputs are aggregated as:fLDH =|P |∑i=1viTi. (3.7)In order to augment the convolutional feature maps Yf ∈ RM×N×L withthe sequency domain multi-scale filter outputs fLDH , we introduce matricesΩ ∈ {0, 1}|P |d×MNL+|P |d and Λ ∈ {0, 1}MNL×MNL+|P |d as:Ω =[I(|P |d×|P |d) | [0](|P |d×MNL)], (3.8)Λ =[[0](MNL×|P |d) | I(MNL×MNL)]. (3.9)The final input to the fully connected layer is:y = fLDHΩ + vec (Yf ) Λ, (3.10)IN×N and [0]M×N are the identity matrix of size N and zero matrix of sizeM ×N , respectively. Also, vec(.) denotes the vectorization operation on amatrix.39In order to classify the augmented feature maps, a fully connected layerwith 2000 hidden neurons follows the last pooling layer. As indicated inFigure 3.1, in the proposed architecture, the input to the fully connectedlayer is comprised of a) the convolutional features which are obtained viathe convolution kernels from the previous layers (Yf ) and b) the multi-scaleand multi-directional filter outputs in the sequency domain (fLDH) fromeach image patch Jp in the training set.Since the rectifier function was used as the activation of the convolutionaland fully connected layers, all of the weights of the model were initializedusing the approach proposed in [53].TrainingThe conventional gradient descent algorithm along with the chain rule with aslight modification can be used to update the parameters of the network andaccordingly, train the model. Let o be the output of the softmax activationand z be the input to the fully connected layer. Therefore in order to updatethe weights of the model using the gradient descent algorithm, the error canbe back propagated via applying the chain rule to the derivative of the costfunction with respect to the parameters:∂L∂Θ=∂L∂o× ∂o∂z× ∂z∂y× ∂y∂Θ, (3.11)where Θ is the parameter set of the model before the fully connected layer.Hence, Θ can be decomposed into two sets: Θ = [ΘLDH ; ΘCNN ] whereΘLDH and ΘCNN correspond to the parameter set which is associatedwith the local directional Hadamard and convolutional features, respectively.(3.11) can be rewritten based on (3.10) as:∂L∂Θ=∂L∂oi× ∂oi∂z× ∂z∂y× ∂ (fLDHΩ + vec(Yf )Λ)∂Θ. (3.12)Finally, since only ΘLDH and ΘCNN contribute to the error originated fromfLDH and Yf , respectively, we have:∂ (fLDHΩ + vec(Y)fΛ)∂Θ=∂fLDHΩ∂ΘLDH+∂vec(Yf )Λ∂ΘCNN. (3.13)The backpropagation algorithm continues to update the convolutional ker-nels and biases by applying the chain rule to∂vec(Yf )Λ∂ΘCNN, as in a conventionalCNN.40Figure 3.3: Original paramedian US image of a randomly selected volun-teer’s laminae of the L3-L4 vertebrae at 7 cm depth (a), the bone enhancedimage using phase symmetry (b) and the weighted sum of the enhanced andthe original image (c).Due to high capacity, CNNs are often vulnerable to overfitting wherethere is not enough variability in the training data. Accordingly, regu-larization is critical in obtaining a generalizable model. In the proposedCNN architecture, the absolute value of the weights were bounded by the `2norm to prevent overfitting. Moreover, the dropout technique was used tostochastically add noise in the computation of the network [164]. Dropoutencourages the network to learn features that are independently useful, sinceeach neuron activation is masked to zero by a certain probability. Duringtraining, the `2 regularization parameter was set to 0.002 and after eachconvolutional layer a dropout layer with P = 0.25 was added.3.2.3 Image enhancementIn order to enhance the bone surfaces and accordingly, the wave-like patternof the laminae, a set of quadrature band-pass filters were used, as proposedby Abu Anas et al. [5], to estimate the phase symmetry, which is maximizedat bone locations. Also to exploit the echo from the ligamentum flavum, theenhanced image was augmented by a weighted sum of the original US image,which contains the echo, and the enhanced image as:augmented image = α× enhanced+ (1− α)× US,where α is the weight of the bone enhanced image and was set to 0.7 basedon trial and error. Figure 3.3 shows a sample US target plane and the boneenhanced image.413.2.4 Data collectionSince the input to the classification network are 2D planes of the 3D volume,the performance of the hybrid proposed localization model should be eval-uated on both 2D and planes of 3D US. Following written consent, threeseparate paramedian US data sets of the lumbar spine were collected byan expert sonographer using a SonixTOUCH US machine (Analogic Corp.Richmond, Canada): (A) one hundred and four 3D US volumes from thir-teen volunteers (8 volumes from each side of the spine with subtle movementof the US probe) at 7 cm depth. The gain and field of view were set to 50%and 43.9◦, respectively (ethics certificate number: H07-0691), (B) twenty3D US volumes from 10 different volunteers (one volume from each side ofthe spine) with various depth, gain and field of view optimized by the sono-grapher for each volunteer (ethics certificate number: H15-01310) and (C)ten sequences of 2D US images from 10 different volunteers 7 cm depth and50% gain (ethics certificate number: H15-01310).All of the participants were aged between 20 and 35 years whose bodymass indices (BMI) were less than 30 with no records of spinal pathology orsurgery.3D US data sets (A and B) were collected with a 5 MHz 3D motorizedtransducer from L3-L4 vertebrae of the spine and the transducer was placedlaterally within 10 mm of the midline and angled by 5◦ to 10◦. Thereforethe paramedian volumes contained the vertebral structures and particularly,the laminae of vertebrae which is the prominent anatomical landmark of theepidural space, was present in the volumes. The L3/L4 vertebral level wasscanned by identifying the sacrum and counting up to the L3 and L4 ver-tebrae. Each US plane was annotated by an expert sonographer as eithera target plane for epidural needle insertion or non-target plane. The la-beling procedure was repeated 36 weeks after the initial annotation by thesame sonographer. The average Dice similarity coefficient [161] of the cor-responding labels was 0.92. Based on the sonographer’s annotations, 545and 150 paramedian target planes were extracted from data set A and B,respectively.2D US sequences in (C) were captured with a C5-2/60 curvilinear trans-ducer at 3.3 MHz from the laminae of the vertebrae. The sequences startedfrom the L4 and ended at the L3 vertebra. Due to high correlation of thesubsequent frames, the captured 2D sequences were sub-sampled every 5frames. In total, 300 images were extracted from the sequences (30 fromeach participant).423.2.5 Data preparationThe proposed network was trained and validated on data set A and testedon data sets B and C. For pixel-level classification, image patches of sizeM ×M centered at the expert’s annotation, were extracted from the USimages. The patch size should be chosen with care, in order to take intoaccount the contextual information and local dependencies of the anatomicallandmarks i.e., relative location of the target with respect to the laminae ofthe vertebrae. Following the definition of the needle target in Section 3.2.2,the location of the bases of the laminae were annotated on the image as theneedle target. Target annotation was repeated by the same sonographer.The average distance between the corresponding annotations were 0.4 mmin lateral and 0.3 mm in vertical directions. In order to generate sufficienttraining samples, the annotated targets was transformed from a point to aregion via a box of the size 25×40 pixels. Pixels of the target and non-targetregions were used to train the network.Originally, the images contain more than 170,000 pixels each, therefore,training the network on the original planes would be computationally inten-sive and practically impossible. Thus, after augmenting the native planeswith the bone enhanced images, they were resized down to 100× 100. Thesliding window size M was set to 33 to guarantee an overlap of at least 13 ofthe image size between the window and the image. Also, the image was zeropadded to let the sliding window be located at the borders of the image.The CNN training data set consisted of 501,298 image patches includingequal number of target and non-target images, extracted from the optimalplanes. The network was also validated on the balanced leave-one-out cross-validation data.3.2.6 ImplementationThe proposed algorithm for target localization was implemented in Theano[2] and trained on a GeForce(R) GTX 980 Ti GPU with 6 GB of memory,hosted by a machine running Ubuntu 14.04 operating system on a 3.4 GHzIntel(R) CoreTM i7 CPU with 16 GB of memory.3.3 Experiments and resultsMultiple experiments were conducted to evaluate the performance of theproposed method. As an intermediate step to target localization, the pixel-level classification performance was investigated on equal number of target43Table 3.1: Summary of the evaluation metrics and their description.Category Metric DescriptionPixel-level classification Accuracy Correct classification rate on target andnon-target pixelsSensitivity Correct classification rate on target pixelsSpecificity Correct classification rate on non-target pixelsNormalized Similarity between two image patchescross-correlationTarget localization Lateral error Error in x direction (mm)Vertical error Error in y direction (mm)and randomly selected non-target patches. Then, target localization accu-racy was measured by comparing the expert annotations against the detectedtarget obtained by classifying all of the pixels of the image in validation/testdata sets. Table 3.1 indicates a summary of the evaluation metrics whichhave been used throughout the experiments.3.3.1 Pixel-level classification performanceFor each image patch, LDH features were extracted and aggregated usingequation 3.7. The final feature map of the convolutional layer was aug-mented with the corresponding LDH features using equation 3.10. Due tothe large size of the data set, the training phase was repeated 10 timeson 50,000 image patches which were randomly selected from the data set.Classification accuracy is defined as TtNt where Tt and Nt correspond to totalnumber of correct classifications and total number of pixels, respectively.The average classification accuracy of the proposed CNN with feature aug-mentation was 96% in the final training step.Different experiments were conducted to evaluate the effect of the aug-mented features and the performance of the proposed network. First, in or-der to verify whether or not the proposed network can generalize to cover theinter-patient variabilities in shape and size of the vertebrae, a leave-one-outcross-validation was performed on the training data (data set A). The av-erage pixel classification accuracy during the leave-one-out cross-validationwas 94% with 93% sensitivity and 98% specificity. Sensitivity and specificityare defined asTpNpand TnNn , respectively, where Tp is the number of correctlyclassified target pixels, Tn is the number of correctly classified non-targetpixels, NP is the number of target pixels and Nn is the number of non-targetpixels. Therefore, sensitivity corresponds to the classification performanceof the method on target pixels whereas specificity corresponds to the classi-fication performance on non-target pixels.44Figure 3.4: Histogram of the correlation between LDH features and a)first (mean 0.03 ± 0.11) b) second (mean 0.04 ± 0.13 ) and c) third (mean0.01± 0.11) convolutional feature maps of a similar CNN architecture withno feature augmentation after training (t-test p− value < 0.05).3.3.2 Performance evaluation of the augmented LDHfeaturesIn order to investigate the impact of the LDH features on the accuracy ofthe pixel-level classification, another CNN with the same number of convo-lutional layers, kernels and hidden neurons in the fully connected layer withno feature augmentation was trained on the same data set (experiment 1).Experimental results indicated that the CNN without feature augmentationsuccessfully classified 90% of the pixels during leave-one-out cross-validation.In order to investigate the correlation of the CNN feature maps and the LDHfeatures, pairwise correlations between the LDH features and CNN featuremaps were calculated at all of the convolutional layers (experiment 2). Fig-ure 3.4 shows the histograms of the pairwise correlations, which indicate nosignificant linear correlation between the CNN feature maps and the LDHfeatures, since the means are all smaller than 0.05 (t-test p− value < 0.05).Therefore, the CNN has not learned any variation of feature maps whichcan replicate the behavior of the LDH. However, since the classification per-formance has been improved with the LDH features, there is a nonlinearcorrelation between the LDH and CNN feature maps which has been ex-tracted by the nonlinearities in the fully connected layer of the proposednetwork.In the next experiment, the weights of the fully connected layer whichcorrespond to the LDH features were compared against those which corre-spond to the CNN feature maps (experiment 3). The average dot productof the weights of the fully connected layer and their associated feature maps(LDH or CNN) were monitored. After 200 epochs, the average dot productswere 0.018 for CNN and 0.0175 for LDH features, which implies that the45Figure 3.5: Pixel-level classifications of a) LDH features, b) CNN withno augmentation c) CNN augmented with the LDH features at the fullyconnected layer and d) ground truth relative to the original paramedianUS image of the laminae of the L3-L4 vertebrae of a randomly selectedvolunteer at 7 cm depth. The LDH features are not distinctive enough aloneto distinguish the area between target regions. However, they contribute tothe CNN by removing some outliers in (c).network has not neglected the LDH features during training. However, theLDH features were not distinctive enough for pixel-level classification. Thishas been investigated through training a fully connected neural network withthe same number of neurons in the hidden layer as in the proposed CNN, onthe LDH features (experiment 4). The average classification accuracy of theLDH features during leave-one-out cross-validation was 89% and the corre-sponding sensitivity and specificity was 83% and 90%, respectively. Figure3.5 shows sample pixel-level classification results of the CNN with featureaugmentation, without augmentation, only with the LDH features and thegold standard. As shown in Figure 3.5, multi-scale and multi-directional fea-tures contribute to the classification accuracy with removing some outliers.This was verified on the leave-one-out cross-validation data for all volunteers(see Figure 3.6).3.3.3 Target localization accuracyThe expert annotations were used to evaluate the target localization accu-racy of the proposed method. Since the annotated points were transformedto regions of interest i.e., a box centered at the needle target annotated bythe expert (see Section 3.2.5), the CNN tries to obtain the target regionsfrom the images. The localization error was measured by first, classifying allof the pixels of the image and obtaining target regions, accordingly. Then,the centroids of the regions were extracted and the distance between theexpert annotations and the extracted targets was measured for all of theimages in validation/test data sets. The centroid of the region which isclose to the midline of the image was considered as the clinically significant46Figure 3.6: Pixel-level classification performance of the proposed method(CNN+LDH) and the network without feature augmentation (CNN) on theleave-one-out cross-validation balanced data. LDH features contribute tothe classification accuracy for all of the cases.target for epidural needle insertion. The average error was 0.75 mm and0.38 mm in lateral and vertical directions, respectively for the leave-one-outcross-validation.To evaluate the target localization of the trained network on unseen datawith different acquisition parameters, the trained network was tested on thetarget planes of data set B. Also, in order to investigate the application ofthe proposed network on standard 2D US systems, the trained network wastested on data set C. The average localization error for planes of 3D vol-umes was 1 mm and 0.4 mm in lateral and vertical orientations, respectively(data set B). Also, the average localization error on the test data set of 2Dimages was 1.7 mm in lateral and 0.8 mm in vertical directions (data setC ). Moreover, target localization accuracy of a similar CNN without featureaugmentation was investigated. The average localization error for data setB was 8 mm and 6.2 mm and for data set C was 2.6 mm and 4.5 mm inlateral and vertical directions, respectively.3.3.4 Comparative studyThe proposed method was compared against two other previously proposedapproaches:1. a static template matching and image processing approach, proposed47Table 3.2: Comparison of the target localization errors (mm) of templatematching approach (TM), convolutional neural network with no feature aug-mentation (CNN), state-of-the-art deep network for biomedical image seg-mentation (U-net) and the proposed method (CNN+LDH) on test data sets.The errors of the proposed method are highlighted via asterisk where theyare significantly lower compared to other approaches (p − value < 0.05).Minimum and maximum errors have been indicated in brackets. The clini-cally acceptable lateral and vertical errors are 7 mm and 5 mm.Data Direction TM CNN U-net CNN+LDHB Lateral 2.6± 2.2 ≤ 9.2 8± 6.6 ≤ 36.9 1.4± 0.7 ≤ 3.3 1± 0.7∗ ≤ 3.4150 planes Vertical 2.1± 2 ≤ 8.7 6.2± 6 ≤ 24 0.5± 0.4 ≤ 2.2 0.4± 0.3 ≤ 2C Lateral 7.8± 4 ≤ 23.8 2.6± 2.6 ≤ 22.8 1± 0.7 ≤ 3.7 1.7± 1 ≤ 5.2300 images Vertical 3± 2.3 ≤ 12 4.5± 4.7 ≤ 23.5 1± 0.8 ≤ 3.6 0.8± 0.6∗ ≤ 3.2by Tran and Rohling [172] and,2. a state-of-the-art deep convolutional network for biomedical image seg-mentation, called “U-net”, proposed by Ronneberger et al. [143].For template matching, the bone surfaces of the US images were en-hanced using a set of bandpass filters in frequency domain, as suggested byTran and Rohling. Later, the LF template was generated using the sameparameters as in their study, and the region with maximum cross correla-tion score was annotated as the LF on the image. The results of templatematching (TM) were compared against the expert annotations. Next, theU-net architecture was implemented using Keras deep learning frameworkand was trained on the same training data i.e., bone-enhanced augmentedimages from data set A. The weights of the U-net were initialized based onHe’s method, as suggested by Ronneberger et al. [143], The same approachwas also used to initialize the weights of the proposed method. Since theU-net architecture is designed for end to end segmentation, its performancewas evaluated using the Sorensen Dice coefficient [161]. U-net achieved 81%dice coefficient on leave-one-out cross-validation. Figure 3.7 shows a samplesegmentation result of U-net along with the proposed method. Target local-ization error of template matching and U-net were investigated on data setsB and C. Template matching on the bone-enhanced images indicated anaverage error of 2.6 mm (lateral) and 2.1 mm (vertical) on the 3D test dataset (B) and 7.8 mm (lateral) and 3 mm (vertical) on the 2D test data (C ),whereas U-net localized the target with the average error of 1.4 mm andhttps://github.com/fchollet/keras48Figure 3.7: Sample segmentation result of a) the proposed network and b)U-net on a randomly selected volunteer’s image from data set B at 8 cmdepth. The yellow and red regions correspond to the correct and incorrectsegmentations, respectively. The green area indicates parts of the groundtruth which have not been covered by the network.0.5 mm for lateral and vertical directions, respectively on the test data setB. Also, on the 2D test data set, U-net target localization error was 1 mmin both lateral and vertical directions. Target localization errors on datasets B and C in lateral and vertical directions are summarized in Table 3.2.Since the group variances are non-homogeneous, Kruskal-Wallis analysis ofvariance along with Nemenyi post hoc tests were performed on target local-ization errors, in order to test the statistical significance of the localizationresults. The Kruskal-Wallis H-test on both 3D and 2D data sets indicatedthat the population median of all of the groups are significantly different(p < 0.05). Moreover, the Nemenyi post hoc tests on 2D data set indicatedthat the localization errors are significantly different between all pairwisecombinations. On 3D test data, the Nemenyi test failed to reject the nullhypothesis on the errors of U-net and the proposed method with p < 0.05.However, the test indicated there is a significant difference between the errorsof the proposed method and the original CNN without feature augmenta-tion on 3D test data. Therefore, the augmented LDH features significantlycontribute to the accuracy of target localization.3.3.5 Execution timeThe execution time of the algorithm on the same environment described inSection 3.2.6 was measured for 100 images, which were randomly selectedfrom data sets B and C (50 from each set). On the described setup, the49average localization time per image was 1.34 seconds.3.4 DiscussionAlthough the target localization error is correlated with pixel-level classifi-cation accuracy, accurate target localization can still be achieved without100% pixel classification accuracy, if the misclassified pixels do not changethe location of the centroid of the target region. Therefore, another metricwhich shows the performance of the proposed CNN is the normalized cross-correlation of the misclassified image patches and the closest patch whichbelongs to the region of the needle target, which was 81% on average forboth the 2D and the planes of the 3D data set. This indicates a high sim-ilarity between the misclassified patches and the closest patch belonging tothe target region. Therefore, the majority of the misclassified patches arelocated close to the border of the regions.The clinically accepted error for epidural needle placement is definedby the width and the thickness of the ligamentum flavum to avoid over-shoot. According to the literature, the width of the ligamentum flavum isapproximately 7 mm [126]. The thickness of the ligamentum flavum is ap-proximately 5 mm [58] but thickness decreases as the ligament is stretched,e.g. thickness was 2.6 mm on average in B-mode ultrasound images of par-turients with spinal flexion [172]. Because of the complications of the needleovershoot, minimizing the vertical localization error is clinically more sig-nificant compared to the lateral error. The vertical localization error of theproposed method and U-net is below 5 mm in both data sets B and C for allof the test images. Also, both U-net and the proposed method successfullylocalized the epidural space with less than 2.6 mm error in all of the planesof the 3D data set (B). However, the vertical localization errors of U-net andthe proposed method were above 2.6 mm (max. 3.6 mm vs. max. 3.2 mm)in 34 and 14 images out of 300 in C (11% vs. 5%), respectively. Therefore,not only the proposed algorithm has fewer outliers in localizing the needletarget compared to U-net, but also the maximum vertical error of the pro-posed method is slightly less than U-net. Moreover, the proposed methodoutperforms U-net and the template matching approach in estimating thedepth of the needle target in all of the test data, since it indicates smallermean vertical error in data sets B and C (see Table 3.2).Since efficient implementation of the proposed method has not been thefocus of this work, the current implementation does not run in real time(see Section 3.3.5). However, because classifying a given pixel of an image is50independent of classifying other pixels, a concurrent parallel implementationof the method would highly reduce the execution time. Alongside concur-rency, utilizing additional techniques, such as cropping the image based onanatomy can reduce the target localization time. Therefore, the proposedmethod can potentially be used in real-time applications.Since the classification network detects the target planes within native2D planes of 3D US, the proposed method can be combined with a guidancesystem for accurate target localization and needle guidance in both 2D and3D US-guided epidurals. However, the target localization performance needsto be evaluated on misclassified planes, once the classification network hasbeen integrated into the localization pipeline. Finally, the output of theproposed method should be regarded as “additional information” to increasethe confidence level of the operator for target identification and localization.As operator experience with the system and interpretation of the targetappearance in US grows, the operator is expected to be able to identify theoutliers from the algorithm and simply recognize the target.3.5 ConclusionIn this chapter, we presented a hybrid model to localize the epidural space inthe paramedian US images of the lumbar spine. In particular, we proposed aconvolutional network architecture along with a feature augmentation tech-nique to localize the needle target in the target planes. This is achievedthrough pixel-level classification of the target planes. We showed there isno linear correlation between the augmented LDH features obtained fromthe sequency domain and the convolutional feature maps. Our experimentalresults showed the LDH features from the sequency domain provide multi-scale representations of the data which are non-linearly correlated with theconvolutional feature maps, hence they contribute to the performance of thepixel-level classification. We validated the accuracy of our proposed methodusing leave-one-out cross-validation and also tested on different unseen datasets.The commonly performed technique for epidural anesthesia is midlineneedle insertion, where the needle puncture site lies on the center line ofthe spine. As explained in Chapter 1, the hybrid target localization system,which was proposed in this chapter, can be used to assess the depth of theepidural space in the paramedian view. This assessment gives the anesthesi-ologist an estimate of the epidural space depth for midline insertion, whichcan be confirmed via the conventional LOR technique. US can be utilized51to visualize the anatomy in the transverse view in order to facilitate themidline insertion. However, a system is needed to help the anesthesiologistwith the identification of the correct needle puncture site in the transverseplane. In Chapter 4, the problem of midline identification in transverse USimages is investigated.52Chapter 4Identification of the centerline of the spine intransverse images4.1 IntroductionIn Chapters 2 and 3, machine learning models were proposed to facilitateepidural injection using the paramedian view. However, midline insertionis considered the standard-of-care in North America, since unlike the para-median approach, the needle trajectory does not encounter muscle. Hence,the procedure is less painful for the patient. In a study, Kopacz et al. as-sessed the epidural success rate of new residents [80]. Their results indicatedthat the midline approach was associated with fewer attempts and a highersuccess rate in novices.Because of the symmetry of the lateral vertebral structures, visualizingthe intended needle trajectory and the anatomical landmarks of the target ismore apparent in the transverse plane for midline needle insertion and hence,transverse US has been studied to identify the proper puncture site. Arzolaet al. investigated the success rate of using transverse US for identificationof the needle puncture site for midline insertion [8]. Their study showedthat the puncture site can be accurately identified with transverse US inapproximately 92% of the cases.Although research indicates promising results in using transverse US forepidural anesthesia, anesthesiologists receive limited or no US training inorder to utilize pre-procedural US. This is significant, since displacement ofthe US transducer by a few millimetres radically changes the depiction ofthe spinal anatomy. Computer vision and machine learning algorithms havebeen proposed to augment the US image with additional information to helpthe anesthesiologist with identifying the correct puncture site.Hetherington et al. proposed an automatic method, called “SLIDE”, toidentify the vertebral levels [55] in the transverse plane. They transferred53the weights of a pre-trained neural network from natural image classificationfine-tuned them on the transverse US images of the spine.In another study, Toews et al. proposed a technique, called “Project-Align”, based on the normalized cross correlation between the original andthe horizontally flipped image to detect the center line of the spine in thetransverse images [167]. Upon identification of the center line, their systemprojects the detected line on the back of the patient. Hence, the anesthesi-ologist is provided with real-time augmented reality, during the scan.Although SLIDE can detect the vertebral levels, it does not provide anyfeedback regarding whether or not the current US frame is centered at themidline of the spine. On the other hand, ProjectAlign detects the center lineof the spine within the midline images. This implies that the US transducershould be placed within sub-centimeter distance of the center line of thespine. Therefore, a system is required to identify the midline images priorto detecting and projecting the center line.4.1.1 ContributionIn this chapter, we propose a model, called SymNet, based on deep convolu-tional neural networks to identify the transverse 2D US midline frames anddistinguish them from the off-center images. Since 2D US is more accessibleand more commonly used in clinics compared to 3D, we focus on 2D trans-verse images in this chapter. Moreover using 2D US, the proposed systemalong with SLIDE and ProjectAlign can be used in a complete guidance sys-tem to accurately identify the proper needle puncture site prior to insertion.Even if used alone, midline guidance may benefit the clinical workflow andtraining novices.4.2 Methods and materialsAs shown in Figure 4.1, the transverse midline images of the spine indi-cate symmetrical ultrasonic echoes on the left and right side of the image.The bright echoes are typically reflected from the surfaces of the bone, i.e.,spinous processes, transverse processes, laminae and facet joints. In addi-tion, midline images indicate symmetrical shadow, when the center of theUS transducer is placed on the spinous processes. However, the aforemen-tioned symmetry disappears once the transducer is moved laterally from thecenter line of the spine (see Figure 4.1-a and d). Therefore, accurate detec-tion of the symmetrical features will lead to identifying whether or not agiven image has been acquired from the center line of the spine.54spinous processlaminafacet jointtransverse processabcdbcadlaminaetransverse processfacet joint spinous processspinous processlaminaspinous processlaminaFigure 4.1: Sample transverse US frames of the spine, visualizing the ver-tebral anatomy in the off-center (a and d) and midline (b and c) images.Vertebral structures and their corresponding US echoes are shown in thevertebra schematic and the US images. Because of the geometrical shapeof the vertebrae, the midline frames show symmetrical features, when com-pared against the off-center images. Illustrations are provided by VickyEarle.In the following, a deep network architecture is proposed that can learndistinctive features from the midline and off-center images.4.2.1 Network architectureFigure 4.2 shows the architecture of the proposed deep neural network, calledSymNet. We hypothesize that since the midline images show symmetricalanatomical features, convolutional feature maps obtained from the originaland the mirrored images are very similar, when compared against the off-center images. In order to verify this hypothesis, we propose the networkarchitecture shown in Figure 4.2. SymNet has two independent paths forthe original and the mirrored image, with shared convolutional kernels andbiases. Therefore if the input image is symmetrical, the final feature vectorsfrom the original and the mirrored paths (forig and fmirror) are going tobe similar. Our proposed network uses four convolutional layers with 10kernels of size 3 × 3 with a 2 × 2 stride for both paths. The output ofeach convolutional layer is masked by 0 with probability 25% via dropout5510 conv 3x3stride(2,2)dropout 0.2510 conv 3x3stride(2,2)dropout 0.2510 conv 3x3stride(2,2)dropout 0.2510 conv 3x3stride(2,2)dropout 0.2510 conv 3x3stride(2,2)dropout 0.2510 conv 3x3stride(2,2)dropout 0.2510 conv 3x3stride(2,2)dropout 0.2510 conv 3x3stride(2,2)dropout 0.25inputmirror dense 2softmax shared weightsf orig  f mirrorFigure 4.2: Block diagram of the proposed network architecture. Theconvolutional weights and biases are shared between the original and themirrored paths. The obtained features maps from the original image (forig)and its mirrored version (fmirror) are concatenated before classification.operation [164]. A dense layer with 2 output neurons connects the featurevectors to a softmax function, which produces the likelihood of a given imagebelonging to either the midline or the off-center class. Rectified linear units(ReLU) [42] were used as nonlinear activations for all of the layers.4.2.2 DatasetAfter obtaining written consent, twenty volunteers aged between 20 and 40,whose body mass indices (BMI) were less than 30, participated in the study(ethics certificate number: H15-01310). From each volunteer, an expertsonographer acquired three separate 2D US sequences from the right, left,and the center line of the spine, each starting from the sacrum to the T12 (60US sequences in total). The sequences were reviewed by the sonographer forquality control and then anonymized prior to saving. In order to maintainthe scan consistency, the following protocol was defined:• A frame is considered “midline”, if the center of the transducer iswithin the width of the spinous process.• Prior to each scan, the sonographer identifies the sacrum as the start-ing point.• The off-center scans are captured within an arbitrary distance, rangingfrom 1 cm to 6 cm, from the center line.Due to high inter-frame correlation, the sequences were subsampled every5 frames. All of the frames were divided into two categories for classifica-tion, i.e., midline and off-center frames. Sixteen volunteers were randomlyselected and a random subset of their corresponding data were considered56for training and validation, which consists of 11090 off-center and midlineframes (22180 in total).The data from remaining 4 volunteers were put into the test set, con-taining 11720 images. In order to evaluate the performance of the classifier,the train and test data were balanced with an equal number of midline andoff-center images.All of the sequences were collected by a C5-2/60 curvilinear transduceroperating at 3.3 MHz attached to a SonixTOUCH US machine (AnalogicCorp. Richmond, Canada). Data acquisition parameters including depth,gain, and focal point were optimized by the sonographer for each volunteer.4.3 Experiments and resultsThe following metrics were used to evaluate the classification performance:1. classification accuracy (ACC): the correct classification rate for midlineand off-center images,2. specificity (SPC): the correct classification rate for off-center images,3. sensitivity (SEN): the correct classification rate for midline images,4. the area under the receiver operative curve (AUC), which defines theprobability of correct classification for either midline or off-center cat-egories.Several experiments were conducted to evaluate the performance of thenetwork. First, we investigate the efficacy of adding a separate path for themirrored input through analyzing forig and fmirror for midline and off-centerimages. Then, we evaluate the classification performance of the networkon the test data. Moreover, we compare the classification results of ourmodel with those of a conventional CNN and a static approach based on thenormalized cross correlation metric.4.3.1 Experiment 1-feature comparisonAfter training, forig and fmirror of all of the images in the test dataset wereextracted. Then, we compared the `2 norm of the difference between the forigand fmirror. Figure 4.3 shows the histogram of the `2 norm of the differencefor the midline and the off-center images. As shown in Figure 4.3, the meanof the `2 norm is significantly lower for midline frames, compared to theoff-center images (3.55±1.54 vs. 5.59±1.80 t-test pvalue < 0.0001). This is570 2 4 6 8 10 12 14 16||forig fmirror||050100150200midlineoff-centerFigure 4.3: Histogram of the `2 norm of the distance between forig andfmirror of the test data. Since the midline frames have symmetrical anatom-ical features, the `2 norm is significantly less than that of the off-centerimages (mean 3.55± 1.54 vs. mean 5.59± 1.80 t-test p− value < 0.0001).due to the symmetrical anatomical features of the vertebrae in the transverseUS images. Therefore, the proposed network has efficiently learned featureextractors, which are sensitive to symmetry in the images.4.3.2 Experiment 2-classification performance of SymNetversus single-path CNNWe trained our proposed network on the training data for 1000 epochs on thebatches of 64 samples with learning rate 10−4. When trained on the originaltrain set, the proposed network correctly classified 88% of the unseen datawith 86% sensitivity and 90% specificity. The network had an AUC of 0.94for both the midline and the off-center classes.In order to investigate the impact of the mirrored features on the classifi-cation accuracy, a single-path CNN with the same number of parameters asthe proposed model, was trained on the training data. For consistency, allof the hyper-parameters remained unchanged. The network achieved 86%of correct classification rate on the test data with 83% sensitivity and 90%specificity.The AUC of the single-path CNN was 0.93 for both of the midline andthe off-center classes.58Figure 4.4: Histogram of the maximum normalized cross correlation (NCC)of the midline and off-center images of the training data (a) and the effectof the threshold on the classification performance for test data (b). Both aand b suggest that the optimal threshold for distinguishing midline versusoff-center images via NCC is approximately Experiment 3-comparison with normalizedcross-correlationIn addition to the single-path CNN, we investigated whether or not thenormalized cross correlation (NCC) metric can be used to distinguish themidline images from off-center frames, i.e., a classic non-learning approach.Similar to the approach proposed by [167], a rectangular template from themiddle of the training images was cross-correlated with its mirrored version.Figure 4.4-a shows the histogram of the maximum NCC for midline and off-center images of the training data. As shown, a threshold of approximately0.61 the is optimal value to distinguish midline versus off-center images.NCC with the optimal threshold was able to classify 71% of the test data,which was the highest accuracy on the test data that the NCC can achieve.Figure 4.4-b shows the effect of the threshold on the classification accuracy,sensitivity and specificity. Note that at 0.61 NCC, the accuracy reachesto its maximum value. Moreover, the specificity and the sensitivity areapproximately equal at 0.61 (71% vs. 69%), which indicates the thresholdis optimal for balanced classification.Table 4.1 summarizes the classification performances of SymNet alongwith those of a conventional single path CNN and the non-learning methodusing NCC.59Table 4.1: Summary of the classification performance of the staticmethod using normalized cross correlation with the optimal threshold 0.61(NCC0.61), the conventional single-path CNN and SymNet on the test sub-jects. The conventional CNN had lower area under ROC for both midline(AUCm) and off-center (AUCo) images. Adding a path for mirrored convo-lutional feature maps contributed to the classification accuracy.Method (11720 images) ACC % SEN % SPC % AUCm AUCoNCC0.61 71 69 71 N/A N/Asingle-path CNN 86 83 90 0.93 0.93SymNet 88 86 90 0.94 0.944.4 DiscussionAccording to Figure 4.3, the histograms of the `2 norm for the midlineand off-center images have some overlap. This suggests that similar to themidline frames, a number of the off-center images have matching forig andfmirror. However, since the fully connected layer along with the softmaxfunction comprise a nonlinear classifier, it is able to discriminate the midlineand off-center feature vectors efficiently. This has been reflected in theperformance evaluation metrics of the proposed method on the test data.The NCC is a non-learning metric and does not leverage the statisticsof the data. Hence, the obtained classification threshold cannot be gener-alized to unseen images. Thus, a statistical approach that can “learn” theanatomical features from the images is preferred over approaches such asNCC for identifying midline images.4.5 ConclusionIn this chapter we introduced a deep convolutional neural network, calledSymNet, which identifies the transverse US midline images of the spine. Inparticular, the network learns the symmetrical anatomical features of themidline frames and utilizes them for image classification.We validated our results by testing the network on data from unseen sub-jects. Moreover, we compared the classification results of our proposed net-work with those of a conventional CNN and a non-learning approach basedon the NCC as a symmetry metric. The experimental results showed that anadditional path with shared convolutional parameters for the horizontallyflipped image leads to an improvement of the classification performance,60when compared against the conventional deep neural network.A large set of labeled images is needed in order for SymNet to learnvariations of the transverse images. Therefore, an effective algorithm thatcan augment the existing training set with realistic US data is favourable.In Chapter 5, we investigate the problem of medical data augmentation toimprove the performance of the classifier.61Chapter 5Adaptive augmentation oftransverse US images duringtraining of a classifier5.1 IntroductionData-driven models in medical image analysis utilize machine learning algo-rithms to learn information about the anatomy from a set of images. Dueto the power of these algorithms in learning structures directly from thedata, machine learning for medical image analysis is becoming a widespreadtechnology to solve real-world problems in clinics [99, 154]. In particular,deep models have shown significant success in automatically extracting keyfeatures from clinical images and utilizing them for image classification orsegmentation [21, 35, 60, 144].For instance, Korez et al. proposed a 3D convolutional neural network(CNN) architecture for segmentation of vertebral bodies from 3D magneticresonance (MR) images of the spine [81]. Their proposed network obtainsa 3D spatial probability map of the vertebral bodies and guides a set ofdeformable models to converge to the map.In another study, Mwikirize et al. proposed a CNN architecture based onfully convolutional networks (FCN) and region based convolutional networks(R-CNN) for needle detection. Moreover, they transferred the FCN and R-CNN weights from non-medical domain and fine-tuned them on their dataset. Their method successfully localized the shaft and the tip with sub-degree and sub-millimetre accuracy, respectively [121].Despite their success in image recognition tasks, deep models requirelarge amounts of data to be regularized. This is required in order to guaran-tee a minimum difference between train and test errors, which is crucial inThe materials in this chapter are adopted from:Pesteie et al., Adaptive augmentation of medical data using independently conditionalvariational auto-encoders. IEEE Transactions on Medical Imaging (2019).62clinical applications. However, small sample size is a typical characteristic ofmedical image data sets since clinical data collection is associated with highcomplexity and cost. On top of data acquisition, annotation and cleaningis often required prior to training, which adds a significant cost overhead.Moreover, inter-observer interpretation agreement of medical data is highlydependent on clinical expertise [27, 50, 140, 146, 197], which directly affectsthe training of models. In order to avoid the costs associated with collectinglarge datasets, two main approaches have been investigated:1. model engineering and learning strategies, which investigates designingmodels and developing learning algorithms that can learn from smalldatasets,2. data augmentation, which tries to expand the existing dataset in orderto introduce more intra-class variance in the data.These two approaches have been reviewed in the following.5.1.1 Model engineering and learning strategiesThe architecture of deep models is of crucial importance, which directly af-fects their performance [54]. Deeper architectures do not necessarily corre-spond to higher accuracies. Therefore, careful design of the model is requiredto efficiently learn the hidden features of the data. Hence, specific modulessuch as residual [54], dense [64] and inception [166] were introduced in orderto achieve more accurate object recognition. Optimizing deep models formedical image analysis, in particular, is challenging due to the small sizeof datasets. In a study, Shaikhina et al. showed that a large set of variousinitial parameters (≥ 2000) is required to mitigate the effect of insufficientsample size [154].A group of methods leverage the context via down scaled input images.For instance, Kamnitsas et al. proposed a dual path CNN model for brain le-sion segmentation that incorporates local and global contextual informationvia two separate input streams [70]. Ronneberger et al. proposed the U-netarchitecture that utilizes data augmentation in order to overcome the is-sue of limited data samples [144]. Their architecture outperformed previousstate-of-the-art on segmentation of the neural structures in microscopic im-ages. However in order to achieve high performance, the input data neededto be augmented via random transformations and deformations. Since theintroduction of the U-net architecture, different variations of the same modelhave been proposed for various tasks [21, 117, 142]. In a study, Clark et al.63incorporated the inception reduction blocks from [166] into the U-net modelfor prostate gland segmentation in diffusion weighted magnetic resonanceimages [174]. Their proposed method achieved a Dice score of 0.89.Aside from engineering the model architecture, learning strategies havebeen proposed to leverage small amounts of data for training. For instance,active learning algorithms have been investigated for training high perfor-mance models with minimum supervision. These algorithms are well-suitedto problems, where data annotation is costly. A typical active learner startsthe training with a subset of the dataset, which is labeled. Then, it se-lects informative samples, i.e., queries, from unlabeled data, and asks anexpert to label them, which are added to the training dataset. The modelis re-trained on the new set and this procedure is repeated until the modelreaches the desired performance.Because of high labeling costs of medical data, active learning has beenconsidered a viable solution to train efficient models [40, 59, 155, 192]. Ina study, Stanitsas et al. utilized active learning to recognize three differ-ent types of cancerous tissue, namely, breast cancer, prostate cancer andmyometrium [165]. They proposed a multi-stage training method to avoidoverfitting, where at each stage a CNN is trained on a small number oflabeled samples and then used to predict the class labels of the “activepool”. The predictions are ranked based on an uncertainty measure andthen presented to an expert for labeling. Their experiments showed thatactive learning leads to faster training of the model, while achieving similarperformance compared to randomized sampling methods.In another study, Yang et al. proposed a similar approach for biomed-ical image segmentation [195]. They obtained uncertainty and similaritymeasures from a fully-convolutional network to determine the most repre-sentative data samples to annotate. Those samples are then labeled by anexpert and added to the training set.Transfer learning has also been investigated to overcome the problemsassociated with small size of dataset. Two major strategies have been in-troduced: 1) using a pre-trained network as a feature extractor, and 2) fine-tuning an existing pre-trained model on medical data for a specific task [100].In a recent study, Nguyen et al. used a combination of pre-trained inceptionand resnet to extract features from microscopic images [123]. They laterconcatenated the extracted features and trained a classifier to distinguishnormal versus abnormal cells. In another recent work, Yap et al. inves-tigated the performance of three conventional deep models, namely patch-based LeNet [90], U-net and transfer learning fully convolutional-AlexNet[102] for breast lesion detection in US images [196]. Their experimental64results showed that the transfer learning FCN-AlexNet can achieve high F-measures, i.e., the weighted harmonic mean of recall and precision, (0.90 onaverage).Typically, the conventional pre-trained networks have a large number ofparameters, which is not efficiently utilized for recognition tasks [65]. More-over, smaller models are less memory demanding and accordingly, can bemore easily deployed on standard computing hardware locally in the clinic.However, due to the limited size of medical datasets, designing efficient mod-els remains a key challenge. Hence, data augmentation methods that expandthe dataset with realistic data are beneficial.5.1.2 Data augmentationPrevious research indicates that data augmentation acts as a regularizerand prevents overfitting in deep models [22]. Moreover, data augmentationtechniques can be used to improve the performance of the model in imbal-anced problems, which is common in medical data [114, 201]. Therefore,augmentation methods have been an active area of research in the past fewyears in the computer vision community [30, 173]. In a study, Wong et al.showed that when plausible transformations of data are known, augmen-tation in data space improves the classification performance [188]. Cubuket al. proposed a procedure for training deep models based on reinforce-ment learning (RL) [25]. In order to find the best set of transformationsfor data augmentation, they quantized the parameter space of all of theknown image transformations, e.g., rotation and translation. They used RLto search the parameter space and optimize the parameters based on theperformance of the classifier. In another work, Lemley et al. introduced atechnique, called Smart Augmentation, that generates new images via com-bining samples from the same class [96]. DeVries and Taylor investigateddata augmentation in feature space [29]. They showed that extrapolatingbetween representations in the feature space can improve the performanceof supervised models.Variational auto-encoders (VAEs) [76], as a class of generative modelsthat learn an embedding space to approximate the distribution of data, havebeen utilized for data augmentation [63, 128]. Wan et al. showed that VAEscan be helpful in imbalanced problems [181]. They modified the originalMNIST dataset [92] by reducing the number of samples in some classes andtrained a VAE on the reduced dataset. Then, they generated samples bythe trained VAE and augmented the modified dataset with synthetic images.Their experimental results showed that the generated samples contribute to65improving classification accuracy.In addition to VAEs, generative adversarial networks (GANs) have alsobeen investigated for data augmentation. In a study, Antoniou et al. pro-posed a GAN model that views the generative process as a transformation toa sample from the dataset [7]. Hence, the generator can be used to augmenteach class of the data. They showed that their proposed model consis-tently improves the classification performance of vanilla classifiers on differ-ent datasets. Zhu et al. proposed a framework based on cycle-consistentGAN [202] and regular CNN to learn unpaired image-to-image translationfrom reference to target classes. Their experiments show that GAN-baseddata augmentation techniques can lead to improvement of classification ac-curacy by 5-10%. In another study, Bao et al. proposed a framework thatutilizes a conditional VAE in an adversarial setup [9]. In addition, they pro-posed a mean feature matching objective to avoid the unstable training ofthe GAN. They showed that their proposed model can be used to effectivelyaugment the training dataset of a discriminative network.Han et al. showed that adversarial networks can generate realistic mag-netic resonance (MR) images of brain for data augmentation and physiciantraining [51]. Their experiments showed that even an expert physician failsin distinguishing between real and synthetic images in some cases. In an-other study, Nie et al. used GANs to generate computed tomography (CT)images from their corresponding MR images [127].GANs typically try to match the distribution of source (synthetic) totarget (real) domains. However, this can be problematic when the targetdomain has over or under representation of some classes. A recent study byCohen et al. showed that the generated images by GANs are often mislead-ing for diagnosis [23]. This is due to the fact that in an adversarial setup, thediscriminator can be biased by certain features, e.g., tumors in image, andhence, assign more capacity of its weights to learn those features. Accord-ingly, the generator tries to generate samples that contain those features toincrease the loss of the discriminator, which leads to an over-represented fea-ture in generated samples. Moreover, GANs usually suffer from instabilityduring training and require improvements [46, 149, 163].Conventional augmentation techniques, which include static transfor-mations such as translating the image by a few pixels, mirroring, randomlycropping the image and RGB channel shift, do not consistently improve theperformance of deep models across different datasets. For example, whilemirroring the image improves the training accuracy on CIFAR-10 dataset[82], it has limited effect on MNIST, due to the different symmetrical featuresof the images between these datasets [25]. Moreover, although conventional66augmentation methods enforce invariance in models, they cannot incorpo-rate prior knowledge during training. Therefore, a method for adaptive dataaugmentation based on the performance of the discriminative model couldbe beneficial during training. Hence, the objective of this study is to 1)introduce a flexible generative model for data augmentation that can eas-ily incorporate prior knowledge into the generated data and, 2) propose anadaptive algorithm that utilizes the generative model to train the discrimi-native model.5.1.3 ContributionsIn this chapter, we propose a conditional generative model based on VAEsalong with an adaptive training algorithm, which utilizes the generativemodel to augment the training data. Conditional variational auto-encodershave been previously proposed [160]. However, our model learns the em-bedding space independent of the labels, which is the key difference whencompared against the method proposed in [160]. By enforcing independencebetween the embedding space and the labels, we show that the proposedgenerative model is capable of incorporating prior knowledge and hence,can generate realistic samples from sparse parts of the distribution of data.In addition, since the embedding space and the labels are independent, weshow that the model can generate samples with different features while pre-serving the label correspondence. Using this property, we demonstrate thata discriminative model can be trained on an augmented dataset, where theproposed model can adaptively generate samples, which contribute to thetraining accucary of the discriminator. Since our proposed generative modelis not adversarially influenced by the discriminator, it does not over- orunder-represent certain features of images.In order to investigate the impact of the proposed adaptive trainingframework on the performance of a discriminative model, we consider theproblem of identification of the midline of the spine in transverse US images.We show that the proposed adaptive training improves the performance ofa baseline deep discriminative model. Moreover, we investigate the effect ofadaptive data augmentation during training SymNet, which was proposedin Chapter 4.5.2 Methods and materialsWe start with a brief introduction of the preliminaries followed by a dis-cussion about the background of the variational auto-encoders. We, then,67proceed to our proposed model and derive the variational lower bound forit. Next, we discuss sampling from the learned distribution and introducetransformations, which are used to enforce the prior knowledge into sam-pling. We then use the generative model along with the transformations asbuilding blocks of a training algorithm and introduce an adaptive methodto train a discriminative model.5.2.1 PreliminariesIn order to formulate the problem of adaptive data augmentation, let (x, y)be a sample pair from dataset D, where x is the input and y is its corre-sponding label. Also, let z be a latent variable in an embedding space witha known distribution parameterized by ξ. Our goal is to design a genera-tive model that encodes the appearance features of x into z, independentof its label. Later, it draws samples from the embedding space and usesy to generate new synthetic data, where the correspondence between theinput and labels is preserved. Let G be the generator of the model, with twoinputs: a random sample from the latent space (z) and y. Also, letM be adiscriminative model with parameter set ψ.We hypothesize that G along with a transformation function that mod-ifies the labels (T (y)), can be used to adaptively augment the D duringtraining of M, based on its performance. T (y) is a heuristic, which incor-porates the prior knowledge about D into labels. Hence, it can modify thedistribution of the training labels in order to match that of the test data.For instance, if the training data are imbalanced between classes, a transfor-mation function can be utilized to generate samples from the sparse classes.Moreover, since the appearance features of the input are encoded into z,different samples from the embedding space correspond to various patternsin the synthetic data. Therefore, the D can be adaptively augmented withrespect to the performance ofM during training. This is feasible by chang-ing the parameters of the distribution of z, based on the training loss ofM.Accordingly, D is augmented with a set of synthetic data that contribute tothe performance of M.5.2.2 Variational auto-encoderVariational auto-encoders (VAEs) are a type of directed graphical modelswith parametric latent variables, such as Gaussian latent variables. VAEsare comprised of a variational approximator qφ(z|x), which tries to estimatethe intractable posterior p, and a generative model pθ(z)pθ(x|z), where z is68Figure 5.1: Block diagram of the independent conditional VAE (ICVAE)model. The variational approximator and the generator are parameterizedby φ (red) and {ω, pi} (green), respectively. After training, synthetic datacan be generated from a random latent sample z and a particular label y.the latent variable and x is the input. The approximator and the generatorare parameterized by φ and θ, respectively. The goal of the VAE modelis to jointly learn θ and φ. This can be efficiently achieved via stochasticgradient variational Bayes (SGVB) framework [76], where the variationallower bound is maximized with respect to the variational and generatorparameters (φ and θ). In a conventional VAE model with embedding z, thevariational lower bound is defined as:LVAE(θ, φ, x) = −KL(qφ(z|x)||p(z)) + Eqφ(z|x) [log pθ(x|z)] , (5.1)where KL(q||p) is the Kullback-Leibler divergence of q and p. Since thegradient of L with respect to φ is not straightforward to calculate, qφ(z|x)can be reparameterized by a known differentiable function gφ(x, ), where is a random noise with prior p(). The reparameterization trick allows thevariational lower bound to be optimized using gradient descent. Both thevariational and the generator parts of the VAE can be modeled via a neuralnetwork.5.2.3 Independent conditional VAELet p(x, y, z) be the joint distribution of the input, labels and the latentvariable. The goal of the independent conditional VAE (ICVAE) is to learn69p(x, y, z) and later, generate images for data augmentation through sam-pling from the sparse parts of the distribution. We hypothesize that thebiased sampling contributes to the training of a discriminative model, dueto more balanced training data, as well as increased intra-class variance ineach batch.Learning to encode structures in the embedding space in the VAE modelhas been investigated in [160]. However, the existing conditional VAE en-codes both the labels and the input into the embedding space.For data augmentation, it is preferred to have the embedding space andthe labels independent of each other. This allows one to control the char-acteristics of the image independently and also incorporate prior knowledgeinto the labels in order to balance the data. If y and z are independent, byconditioning over z, p(x, y, z) can be rewritten as:p(x, y, z) = p(x|y, z)p(y|z)p(z) = p(x|y, z)p(y)p(z). (5.2)Similar to a conventional VAE, we can consider a known prior distributionfor p(z), e.g., z ∼ N (0, I). Moreover, we can use p(y) to impose priorknowledge about the dataset into sampling.In the ICVAE model, p(x|y, z) and accordingly, the joint distributionare parameterized via θ. Therefore, since p(z) and p(y) are known, thejoint distribution can be learned by optimizing θ. In order to enforce theembedding and label independence, the variational and the generator modelshave to be modified such that y has no effect on z. Figure 5.1 shows ablock diagram of the ICVAE model, which consists of a variational (red),a generator and an encoder model (green) parameterized by φ, ω and pi,respectively. The variational approximator encodes the input into the latentvariable z and the generator uses the labels y and the latent variable toreconstruct the input xˆ. Accordingly, the generator would approximatep(x|y, z) after training. In the ICVAE model, the latent variable and thelabels are concatenated prior to the first layer of the generator. Since theymay have different dimensionalities, the reconstructed input xˆ would bebiased by the one with higher dimensionality. This is due to the dominanceof the vector with higher dimensionality in weighted summation in a fullyconnected layer, or convolutional feature maps in a convolutional layer. Inorder to balance the impact of y and z on xˆ, an encoder parameterized byω compresses y into a vector yˆ with matching dimension with z. Hence,p(x|y, z) is parameterized by θ = {ω, pi}:ppi,ω(x, y, z) = ppi,ω(x|y, z)p(y)p(z). (5.3)70Algorithm 1 AdaTrainInput: D = {(x, y)}, G, T (y), E = {ei}, Mψ, µ0, σ0Output: Mψ with trained ψξ ← (µ0, σ0)for e in E doz ← Sample(N (ξ))D = D ∪ {G(z, T (yD))}for each minibatch B in D dog ← grad(LD(B))ψ ← SGD(g, ψ)end for¯`= 1N∑i LD(G(zi, T (yiD))β ← {zi : LD(G(zi, T (yiD))) > ¯`}ξ ← (mean(β), std(β))end forVariational lower boundIn order to learn ppi,ω(x|y, z), we use the SGVB framework to optimize piand ω. Let p(z|x) be the true posterior, which is going to be approximatedby q(z|x) through minimizing the KL divergence. It can be shown that theKL divergence for the ICVAE model can be written as (see Appendix A):KL(q(z|x)||p(z|x)) = log p(x)−∑zq(z|x) log p(x, y, z)p(y|z, x)q(z|x) . (5.4)ICVAE’s objective is very similar to the conventional VAE. By maximiz-ing the likelihood of the data, the ICVAE model is enforced to first, learna mapping from the distribution of the data to a known parametric prior(p(z)) and then, use samples from the latent space along with the labelinformation to generate realistic synthetic data.With the use of the Bayes law, the variational lower bound can be rewrit-ten as (see Appendix B):LICVAE(φ, pi, ω, x, y) =−KL(qφ(z|x)||p(z))+ Eqφ(z|x) [log ppi,ω(x|y, z)]+ log p(y)− log p(y|x).(5.5)log p(y)− log p(y|x) corresponds to the standard error and is negligible if thebatch size is large.71The ICVAE model is trained using the AEVB algorithm, introduced in[76], with L = 4 random samples from the noise distribution p() for eachdata point in the batch.5.2.4 Adaptive training using the ICVAE modelAs shown in equation (5.2), p(x|y, z) is directly proportional to p(y) andp(z): p(x|y, z) = p(x,y,z)p(y)p(z) . Therefore, any changes to the distributions of zand y are going to affect p(x|y, z) and accordingly, the generated image.Let Y = {y′ : p(y′) < τ} represent the set of labels from the originaltraining dataset, where their probability is smaller than a threshold τ , i.e.,class imbalanced problem. Moreover, let ξ0 = {µ0, σ0} be the initial pa-rameters of the probability distribution of z. In a trained ICVAE, ξ can betuned based on the loss of the discriminator (LD).In addition, in the case of imbalanced data, drawing samples from Y andgenerating their corresponding images would balance each training batch.Since z and y are independent, labels and the distribution parameters ofthe embedding space have different effects on the generated images. Whileξ controls the characteristics of the image, y′ ∈ Y would lead to a morebalanced discriminative model M.The goal of training via adaptive data augmentation (AdaTrain) is toutilize a trained ICVAE G, adjust ξ based on the discriminative loss, and, ifrequired, utilize y′s and generate corresponding images. The synthetic im-ages are then augmented with the training batch. Algorithm 1 summarizesthe AdaTrain. Sampling from Y can be done via a transformation functionT (y) that modifies the distribution of the training labels based on the priorknowledge about the data. In order to adjust ξ, AdaTrain uses the initialξ0 to augment the training data for a certain number of epochs, which arereferred to as “rounds”. After the first round, it evaluates LD for all of thegenerated images. Then, it updates ξ with the mean and the standard de-viation of the latent variable values, whose corresponding images indicatedhigher discriminative loss compared to the mean loss of the batch. There-fore, in each round, AdaTrain augments the original dataset with syntheticdifficult samples that the discriminator was not accurate in identifying theirkey features. Note that the number of epochs in each round needs to besufficiently large so that the discriminator has enough number of iterationsfor learning. E = {ei} represents the set of augmentation rounds, where eiindicates the number of epochs for the i’th round.725.2.5 Dataset: transverse US images of the spineWe use the dataset that was introduced in Chapter 4 to evaluate the perfor-mance of the classifiers, trained via AdaTrain. In summary, the dataset wascollected from twenty volunteers aged between 20 and 40, whose body massindices (BMI) were less than 30 (ethics certificate number: H15-01310). Anexpert sonographer acquired three separate 2D US frame sequences fromthe right, left and the center line of the spine from each volunteer (60 USsequences in total). All of the sequences were acquired from the lumbarspine i.e., from the sacrum to the T12. The quality of each sequence wasreviewed by the sonographer before saving.We use the data from the same randomly-selected 16 volunteers for train-ing and validation. Since originally, there are fewer midline frames comparedto off-center, a sub-set of 11240 off-center and 5610 midline frames (16850in total) were randomly selected. Out of 16850 frames, 500 were randomlyselected for training the ICVAE and the baseline classification model. Thedata from the remaining 4 volunteers, i.e., 11720 2D US frames, with equalmidline and off-center images were used to test the models. To reduce train-ing time, all of the training and test frames were resized to 64× 64 pixels.5.2.6 Transformation functionThe transformation function (T ) can be defined with respect to the priorknowledge about each particular dataset. Since the off-center images consistof both the left and right scans, the training data contain considerably fewermidline frames compared to off-center (119 vs. 381 from total of 500). Hence,the dataset is imbalanced. Therefore assuming that y = 1 denotes the labelfor the midline images, the following transformation function is proposed tobalance the training data:TA(y) = Y ∼ B(|{y}|, 1− |{y : y = 1}||{y}| ), (5.6)where B(n, p) is the binomial distribution with n total experiments, withthe success probability, i.e., midline label, of p. Y is a random sample fromB(n, p).5.2.7 Model architecturesICVAEThe ICVAE model uses two fully connected layers with 1024 and 512 neu-rons symmetrically in the variational approximator and the generator. The73Figure 5.2: Architectures of a baseline classification model used in experi-ments.nonlinear activation function for each layer of the generator and approxi-mator is tanh and the output is activated via the sigmoid function. Thedimension of the latent variable is set to 10.Moreover, the encoder of the labels uses one fully connected layer withsigmoid activation, which encodes the labels into a vector with the dimen-sion of the latent variable. For consistency, this architecture is maintainedthroughout all of the experiments.ClassifierThe effect of adaptive data augmentation is investigated on two classifiers:1) a base line convolutional model, and 2) SymNet, which was proposed inChapter 4. Figure 5.2 shows the architecture of the base line model. Theclassifier was initialized with random weights with mean zero and standarddeviation of√2N , where N is the number of incoming nodes of one neuron,as suggested in [53]. The baseline discriminative model uses a convolutionalneural network with 4 layers, where each layer has 64 kernels with size 4×4.Rectified linear units (ReLU) [42] was used as nonlinear activations. Aftereach layer, a 2×2 maxpooling operation is performed. In addition, all of thefeature maps are randomly masked by 0 via dropout operation [164] withprobability 0.25. The last convolutional layer is followed by a fully connectedlayer, which converts the final feature maps into class predictions using asoftmax function. The weights of the model were optimized by minimizingthe negative log-likelihood of the outputs given targets. The architecture ofSymNet remained the same from the previous chapter (see Figure 4.2).5.2.8 ImplementationThe proposed augmentation algorithm and the independent conditional VAEwere implemented in Theano (https://github.com/mpslxz/icvae and74https://github.com/mpslxz/theano_ops). The models were trained on aGeForce(R) GTX 980 Ti GPU with 6 GB of memory. The host machinewas running on a Core i7 CPU with 16 GB of RAM.5.3 Experiments and resultsFor the experiments, the gray levels of each image were normalized between0 and 1. The ICVAE model was trained with the negative independentconditional variational lower bound (equation (5.5)) as the loss functionon training data for 500 epochs. In order to neglect the standard error(log p(y)− log p(y|x)) in the variational lower bound, the batch size was setto 100.Figure 5.3 shows synthetic data generated by ICVAE with different la-bels and latent variables. Note that changes in the latent variable leads todifferent visual patterns within a given class.In order to investigate the effect of adaptive data augmentation, theperformance of the baseline model was evaluated in three scenarios:1. trained on the original data (S1),2. trained on the statically augmented data, without tuning ξ (S2),3. trained on the augmented data with adaptive ξ using AdaTrain (S3).The classification performance is evaluated via the following metrics:• classification accuracy (ACC): the correct classification rate for bothmidline and off-center classes,• sensitivity (SNS): the correct classification rate for midline images,• specificity (SPC): the correct classification rate for off-center images,• false positive rate (FPR): the rate of misclassified off-center images,• false negative rate (FNR): the rate of misclassified midline images, and• the area under ROC curve (AUC) which defines the probability ofcorrect classification for a given class.Table 5.1 summarizes the experimental results, which are discussed indetail in the following.75Figure 5.3: Synthetic images generated by ICVAE (columns 3 to 5) usingtest labels (column 1) and three different latent variables (z1, z2 and z3).Since y and z are independent in ICVAE, the characteristics of the imageare controlled by z, while the label specifies the category of the images. Thereal images from the dataset are shown in the second column.5.3.1 Original dataThe network described in Section 5.2.7 was trained on the original datasetfor 450 epochs with batch size 64, using the stochastic gradient descent(SGD) algorithm with learning rate 10−4. The `2 norm of the weights wasadded to the negative log-likelihood loss with coefficient λ = 10−4. Aftertraining, the model accurately classified 78% of the test data with 66%and 91% sensitivity and specificity, respectively. Since the training datasetis imbalanced, the model is biased towards off-center images, hence, lowsensitivity. Moreover, the model indicated 10% and 52% false positive andfalse negative rates, respectively. This further shows that the network cannotidentify the midline images as accurately as the off-center frames.5.3.2 Static augmentationAfter training the ICVAE on the original dataset, the generator along withthe transformation function (5.6) was used to augment the dataset. z wassampled from N (0, 2). Using the generator, 500 images were added to theoriginal dataset to balance the number of midline and off-center images.Later, the same classification model was trained for 450 epochs on the aug-mented dataset with λ = 10−4 and learning rate 10−4. The model suc-cessfully classified 81% of the test data. Moreover, the network was 79%76Table 5.1: Comparison of the classification metrics of the test data for abaseline discriminative model trained with the original, statically augmentedand adaptively augmented data. Since both the midline (c) and off-center(o) classes are balanced in the augmented datasets, the respective model in-dicates higher sensitivity compared to that of the original data. Moreover,adaptive adjustment of the latent space leads to improvement in classifica-tion accuracy.Exp. (11720 images) ACC % SNS % SPC % FPR % FNR % AUCc AUCoNo Aug (S1) 78 66 91 10 52 0.90 0.88Static (S2) 81 79 82 22 27 0.88 0.88AdaTrain (S3) 85 88 81 22 14 0.93 0.93sensitive and 82% specific. Since the training data are balanced after aug-mentation between classes, the FPR is closer to the FNR (22% and 27%,respectively) when compared against those in the S1.5.3.3 Adaptive augmentationThe generator of the trained ICVAE was used to adaptively augment eachbatch of the dataset via Algorithm 1. The classifier network was trained onthe original data during the first 200 epochs. Later, the training dataset wasaugmented in 5 rounds with 50 epochs in each round (450 epochs in total).Like static augmentation, initially ξ was set to (µ = 0, σ = 2) and the sametransformation function was used to balance each batch in a given round.After 450 epochs, the trained model successfully classified 85% of the testdata with 88% sensitivity and 81% specificity. FPR and FNR were 22%and 14%, respectively. Because the ICVAE generates the midline imageswith higher probability and the latent space of the generator is adaptivelytuned, the classifier has slightly lower FNR compared to FPR. This meansthat the model has learned efficient features from the augmented data thatcontribute to identification of the midline images.Figure 5.4 shows the ROC curve of the model for all three scenarios fortest data. In addition, the training loss of the model during training for allthe three scenarios is shown in Figure 5.5. Note that since AdaTrain aug-ments the dataset with images that indicate higher negative log-likelihood,the accuracy significantly dropped at the beginning of each round. Howeverafter several epochs, the model learned new features from the synthetic data,hence was able to recover.77Figure 5.4: ROC curves of the baseline classification model for midline(a) and off-center (b) test images for S1, S2 and S3. Training via adaptivedata augmentation leads to higher classification performance when com-pared against conventional training with and without static data augmenta-tion (AUC=93% vs. 88% and 90% for center-line and AUC=93% vs. 88%and 88% for off-center images).5.3.4 Training SymNet via adaptive data augmentation onIn order to investigate the effect of adaptive data augmentation on the clas-sification performance of SymNet, the ICVAE model was trained for 1000epochs on the entire train set (22180 frames). SymNet was trained for 500epochs on the same data and for 5 rounds of augmentation with 100 epochsin each round (1000 epochs in total). Initially, the mean and the standarddeviation of the generator’s prior was set to ξ = (0, 2). During AdaTrain abatch of size 32 was selected from the train set, which was augmented with32 synthetic samples, hence, the size of the batch increased to 64.After training, the network correctly classified 94% of the test data with94% sensitivity and 93% specificity. The model indicated an AUC of 0.98for both midline and off-center images. Table 5.2 compares the classifi-cation performance of SymNet, when trained via adaptive augmentationversus conventional training. Since midline identification is a binary classi-fication problem, we used the McNamer’s Chi-square test [115] in order tostatistically investigate the effect of adaptive data augmentation. The testindicated an statistic of 602.6 for the classification results of conventionalvs. adaptively trained SymNet, which implies p < 0.0005.78Figure 5.5: Training discriminative loss of the baseline classification modelfor S1, S2 and S3. In each round of augmentation, ICVAE generates imagesthat the models indicated greater loss than the average. Hence, there is anincrease at the beginning of the augmentation rounds. However after severalepochs, the model learns new features associated with the augmented dataand therefore, the loss values start to decrease.Table 5.2: Comparison of the classification metrics of the test data forSymNet trained with on the original and adaptively augmented dataset.Adaptive augmentation leads to 6% improvement of the classification accu-racy.Method (11720 images) ACC % SNS % SPC % FPR % FNR % AUCc AUCoSymNet 88 86 90 11 16 0.94 0.94SymNet+AdaTrain 94 94 93 7 6 0.98 0.985.4 DiscussionIn order for ICVAE to approximate p(x|y, z) accurately, the training datashould have various samples that represent different parts of p(x, y, z). More-over, since one projection of p(x, y, z) is over labels y, efficient transforma-tion functions are required to accurately transform the existing labels to thesparse ones in the dataset. For classification, T (y) is usually straightforward,since the labels are one-hot encoded vectors that indicate the correspondingclasses. However, defining T (y) is a heuristic technique, which requires priorknowledge about the particular training and test data in each study.Since data synthesis via VAE and similarly, ICVAE involves random sam-pling from a probability distribution (p(z)), the generated samples mightlack expressive features and semantic information. This is significant inhigh resolution and highly detailed input data. Several methods have beenproposed to work around this problem, one of which is training the VAE79adversarially [9, 109, 116, 134]. Another approach is to modify the objectiveof the model so that it is more sensitive to high frequency and semantic in-formation in the input. Lamb et al. proposed to add a discriminative loss asa regularizer to the variational lower bound [87]. Also, Hou et al. enforcedconsistency between the hidden representations of the input and its recon-struction by comparing their corresponding feature maps and penalizing themodel for large differences [61]. In our experiments, a relatively simple net-work architecture could approximate the distribution of the data. However,for more complex and high resolution datasets, applying the proposed so-lutions for detailed data synthesis to the ICVAE model is straightforward,since it shares a lot of similarities with the conventional VAE.The AdaTrain algorithm contributes to training of the discriminativemodel through 1) balancing the categories in dataset by incorporating priorknowledge and 2) adding more variance in the appearance of the images. Thelatter is achieved by adaptively adjusting the latent variable based on theperformance of the discriminative model. In addition, more variance in thetraining data leads to generalized discriminative models. This is shown inFigure 5.5. The augmented data in each round of AdaTrain have regularizedthe model and therefore the classifier of S3 indicate higher accuracies on thetest data, when compared against S1 and S2.Finally, although AdaTrain improves the accuracy of the model by lever-aging the statistics of the data in conjunction with the prior knowledge aboutthe dataset, model architecture is another key factor that has a significantimpact on the accuracy. Therefore, an efficient network should be designedproperly with enough learning capacity in order to be trained with the pro-posed adaptive data augmentation algorithm to achieve the state-of-the-artresults.5.5 ConclusionIn this chapter, an adaptive augmentation method for training discrimina-tive models was proposed. In particular, we presented a generative networkbased on variational auto-encoders that can learn the distribution of theinput with respect to the label and the latent variable. The key difference ofour model and the conventional conditional VAE is that we enforce indepen-dence between labels and the latent space. Therefore, we can adjust eachof them independently for image synthesis. This is beneficial because bytuning the latent variable, we can control the characteristics of the image,while modifying labels enables one to incorporate prior knowledge about80the dataset into the synthetic images. We showed that the performance ofa baseline classifier can be improved by training with adaptively generatingimages based on the feedback from the discriminative model. We validatedthis improvement with the use of unseen data from test subjects.Although the ICVAE model can be trained with a relatively small dataset,expert annotations are still required for training, since it learns the condi-tional distribution of the image with respect to the latent variable and thelabels. Hence, AdaTrain is viable when labeled data are provided. Becauseof the cost of clinical data acquisition and annotation, unsupervised algo-rithms have the advantage of learning the variations of the data without theneed for expert supervision. In Chapter 6 we investigate whether or not thetransverse US midline and off-center images indicate distinctive features,which can be learned without labels.81Chapter 6Deep neural maps forunsupervised featurelearning6.1 IntroductionDeep supervised models have outperformed the previous generation of thetraditional machine learning algorithms, which involved hand-engineeringfeatures. However, the downside of deep supervised learning is the need forlarge amounts of labeled data. This becomes particularly significant whenthe capacity of the model is increased. For instance, the large performanceincrease achieved by successors of Krizhevsky’s network (AlexNet) [84], suchas DenseNet [64], was mainly because of massive efforts of manually annotat-ing millions of images. However, medical image datasets are typically smallin size because of the cost associated with data acquisition. Moreover, thecollected data should be annotated by experts, which adds a cost overhead.In particular, capturing and labeling US images of the spine that cover abroad range of the population to train supervised models, is not practical.On the other hand, unsupervised learning, which is a category of ma-chine learning, involves recognizing patterns within the data without labels.In the context of computer aided recognition tasks, such as image classifi-cation, unsupervised learning algorithms leverage the practically unlimitedunlabeled images to extract patterns and meaningful representations fromthe data. Linear methods such as principal component analysis (PCA) [186]are not effective in nonlinear data representation. Hence, representationlearning has been an active area of research for decades [13].Generally, representation learning algorithms can be divided into threemain categories:The materials in this chapter are adopted from:Pesteie et al., Deep neural maps. 6th International Conference on Learning Representa-tions (ICLR), (2018).82• probabilistic models that try to learn a set of latent random variables,which can accurately describe the underlying distribution of the data,• deterministic parametric models that learn a direct mapping from in-put to representations, and• manifolds and clusters that capture variations in the data.6.1.1 Probabilistic modelsRepresentation learning can be associated with the problem of learning thejoint distribution of the data and a set of latent variables, where those vari-ables can be considered as features that the original data can be recoveredfrom. Hence, the feature values h of a given input x are the results ofdetermining the conditional probability distribution p(h|x), which is oftenregarded as the posterior distribution. Therefore, learning the posteriorcan be achieved via optimizing the parameters of a parametric model, withrespect to the likelihood of the data. Undirected graphical models, i.e.,Markov Random Fields (MRFs), can parameterize the joint probability dis-tribution of the data and the latent variable. In order to do so, the interac-tions between visible and hidden units of the MRFs are described via a setof non-negative potentials. Restricted Boltzmann Machines (RBMs) [159]are a popular subclass of probabilistic models that have been investigatedin the context of unsupervised learning. In a study, Jaitly and Hinton pro-posed an approach to model speech sound waves using RBMs [66]. Theyshowed that RBMs are capable of efficiently capturing the statistics of theraw speech signals, hence, they achieved better performance in phonemerecognition, when compared against traditional feature extraction methods.In another study, Krizhevsky et al. showed that RBMs can extract distinc-tive features from natural images, which can be used for classification [83].Schmah et al. demonstrated that RBMs can be useful in discriminativetasks, such as image classification [152]. Although probabilistic models haveshown promising results for representation learning, their posterior probabil-ity distribution becomes intractable, once the model has more than a coupleof interconnected layers. Therefore, they are difficult to optimize and ac-cordingly, different algorithms have been proposed to overcome this issue.However, these algorithms often have a large approximation error with highcomputation cost. Moreover, a secondary step is required to obtain featurevectors, as the posterior distribution does not directly represent the featurevector.836.1.2 Deterministic parametric modelsAn alternative to probabilistic models is a deterministic feature learningparadigm that is computationally efficient. These models yield a parameter-ized mapping from the input to the representation, which is often regardedas encoding. The encoder can be accompanied by a secondary parameter-ized function that maps the representation back to its corresponding input.Therefore, the secondary function decodes the representation. In unsuper-vised learning, the objective of the model can be mapping an input to a latentspace and then reconstructing it, hence it is referred to as “auto-encoder”(AE) [57]. More formally, in an autoencoder framework a latent variable zis obtained from the input x via Eθ parameterized by θ: z = Eθ(x). Later,x is reconstructed from z by Dω: xˆ = Dω(z) = Dω(Eθ(x)). The objectiveof the AE is to minimize the reconstruction error, which is tractable andeasier to optimize compared to that of the probabilistic models. Therefore,AEs provide a straightforward framework for unsupervised feature learningin different tasks.For instance, Vincent et al. demonstrated that unsupervised initializa-tion of layers of a discriminative model via a denoising AE helps with bettercapturing the underlying structures of the input distribution [176]. In addi-tion in a recent similar study, Luo et al. showed that AEs weights can beused as initializations for a convolutional neural network [104]. Their ex-periments on benchmarks indicated that their proposed method can achievecomparable results with the state-of-the-art models. Kube et al. investi-gated the problem of outlier detection using AEs [85]. In their proposedmethod, they projected all of the data points on the latent space after train-ing an AE. Since normal data points share similarities in the embeddingspace, they were able to identify anomalies by classifying their latent repre-sentation.AEs are also commonly used in medical applications, due to their unsu-pervised training algorithm [108, 183, 190, 193]. Kumar et al. showed thata trained AE’s latent variable can be effectively used as features to classifylung nodules in the computed tomography (CT) images [86]. Shin et al.proposed a method for tissue characterization that utilizes the hidden rep-resentation of single layer sparse AEs [156] obtained from time series signals.Their experiments showed that the AE’s latent representations provide a setof distinctive features for characterizing different tissue types.846.1.3 Manifold and cluster representation learningThe hypothesis of manifold representation learning models is that the struc-tured high-dimensional data are expected to be distributed in a neighbor-hood of a manifold, which has significantly lower dimensionality. Accord-ingly, the manifold would reflect major variations of the structured data.This is particularly interesting from unsupervised learning perspective, sincea topology preserving low-dimensional manifold would not only approximatethe data space, but also preserve the relative distance between data points.Hence, the learned representation can be considered as an intrinsic coordi-nate in the embedding manifold space.The majority of the manifold learning algorithms, such as t-SNE [107]and Laplacian and Hessian eigenmaps [12, 32] adopt a non-parametric ap-proach, where each sample in the data space corresponds to one low di-mensional representation [13]. Therefore, these methods do not learn aparameterized function that can map new data points to the embeddingspace, although some of them have been modified to overcome this issue[106]. Several non-linear parametric models have been proposed, including[31, 47, 67, 68]. In a study, Watson et al. proposed a semi-supervised ap-proach that regularizes the supervised loss for labeled data with respect tothe neighborhood of each data point [185]. Hence, a parameterized neuralnetwork learns the embedding and the classifier, simultaneously. Xie et al.investigated clustering via a deep parametric model [191]. Similar to [185],their proposed method involves simultaneous feature learning and clusterassignment. However, their model is unsupervised, hence, no labels are re-quired during training. Caron et al. proposed a method for representationlearning that utilizes cluster assignments as supervision for classification[18]. In particular, their proposed algorithm iteratively clusters the featuresof the input data with a conventional clustering method. Then, it uses thesubsequent assignments as targets of the classification layer and optimizethe weights of the model based on the classification loss.6.1.4 ContributionAlthough these parametric models yield an embedding space that is efficientfor clustering, they do not learn a manifold, which preserves the topology ofthe data space. On the other hand, non-parametric manifold techniques pro-vide a one-to-one mapping from data space to a lower-dimensional space.Therefore, additional clustering is often required in the embedding spaceto obtain category-specific representations. We propose a novel paramet-85Figure 6.1: Block diagram of the DNM model. DNM consists of an auto-encoder that learns a parametric mapping from the data to an embeddingspace and a self-organizing map, which learns a topographic map from theembedding. After pre-training, the parameters of the auto-encoder are op-timized along with the neurons of the SOM, in order to yield a separableembedding space.ric unsupervised model, called deep neural maps (DNM) for representationlearning that maps the data to a latent space via a parametric function andthen, learns a manifold from the latent space. DNM uses a particular neuralnetwork, namely, self-organizing maps (SOM) as the manifold. Therefore,it benefits from both parametric feature extraction and manifold learning.During training, DNM does not require labels, hence, it is a viable solutionfor learning from large datasets with high annotation costs, such as medi-cal images. We investigate the problem of spine center-line detection usingDNM and a non-parametric density estimation algorithm on the manifold.Moreover, we investigate whether or not the latent space of DNM can beused to classify midline versus off-center images. Therefore, we hypothesizethat:1. the DNM model leverages statistics of the unlabeled data to learn rep-resentations that a probability density estimation method can learn theclass likelihood from a relatively smaller labeled dataset, and2. upon training, the latent space of DNM yields a separable embedding,which can be used for classification via a non-linear classifier.866.2 MethodsFigure 6.1 shows the block diagram of our proposed model. The DNM modelconsists of a convolutional auto-encoder and an SOM. First, we start witha brief introduction to the SOM model, as one of the building blocks of theproposed model.We then, discuss the DNM and its training algorithm in detail. Later,we briefly explain a non-parametric density estimation algorithm, namelyParzen window, which learns the class likelihood of the data from the DNMmodel.6.2.1 Self organizing mapsSelf organizing maps [79] are a class of neural networks that are trained basedon principles of competitive learning, which implies that the neurons of thenetwork compete among each other to be activated. Hence, at a given timeonly one neuron is activated. SOMs are motivated by the biological functionof the human brain: in the brain, different sensory inputs are mapped ontodifferent areas of the cerebral cortex in a topologically ordered manner [52].Generally in an SOM, the neurons are placed in an N-dimensional hyper-lattice. However, for dimensionality reduction and visualization purposes,lattices up to three-dimensional are more common. Therefore, an SOM is atopographic map of the input patterns, where the coordinates of each neu-ron in the lattice reflect intrinsic statistical features in the input. Trainingthe SOM model involves three steps, namely, competition, cooperation andadaptation.CompetitionLet the weight vector of each neuron in the SOM be denoted by wj =[wj1, wj2, ..., wjm] for j = 1, 2, ..., l, where m is the dimension of the em-bedding space and l is the total number of the neurons in the SOM. Foreach input Z and the neurons in the network, a discriminant measure iscalculated. The neuron with the optimum discriminant measure is consid-ered as the winning neuron or “best matching unit” (BMU). If the discrim-inant measure is the `2 norm, the BMU can be obtained by minimizingu = arg minj ||Z − wj ||, j = 1, 2, .., l.87CooperationThe BMU determines the topological neighborhood, where its neurons aregoing to receive partial updates with respect to their proximity to the BMU.These neurons are referred to as cooperating neurons. Let H(u, j) be thefunction that determines the topological neighborhood of the BMU u andneuron j. H should:1. be symmetrical around u, and2. monotonically decrease the amplitude with respect to the relative dis-tance of j to u. For convergence, when the distance approaches inf,H should decay to zero.We choose a Gaussian kernel for the neighborhood function, which satisfiesthe conditions above:H(u,j)(n) = exp(−√∑[pj − pu]22σ2(n)), σ(n) = σ0 exp(−nα), (6.1)where pj is the position of neuron j in the lattice and σ(n) denotes time-decaying standard deviation with constant parameters σ0 and α. As indi-cated in equation in 6.1, H is a function of the position of the BMU and agiven neuron as well as time (i.e., epoch n). Hence, the topological neighbor-hood shrinks with time, as the model is being trained. Accordingly, duringthe early stages of training, H correlates the directions of the weights ofthe model globally. As the width of H decreases, the weights are updatedlocally.AdaptationIn order to update the weight matrix, wj needs to change in relation to theinput. Using discrete formalism, the updated weight wj(n+1) at time n+1is defined as:wj(n+ 1) = wj(n) + η(n)H(u,j)(n)(Z − wj(n)), (6.2)where η(n) is a learning rate function of time, with initial value η0 and timeconstant α:η(n) = η0 exp (−n/α). (6.3)The choice of the learning rate constant and the width of the neighbor-hood function with respect to the number of epochs is critical, since dur-ing the early stages of training, the algorithm should globally correlate the88weights of the model. After global updates, the model should be fine-tunedi.e., to accurately reflect statistical quantifications of the input.6.2.2 DNM model architectureLet Eθ : X → Z be an encoding function with parameters θ, which mapsinput X to embedding Z and Dφ : Z → X̂ be the corresponding decodingfunction with parameters φ, which takes Z and reconstructs X̂. Hence, thelatent vector encodes the variations of the input space, which are efficientfor accurate reconstruction. The purpose of the SOM in the DNM modelis to learn a topographic lattice from the embedding space. Therefore, thecoordinates of the lattice reflect statistical properties of the latent space.This is particularly useful, because the SOM’s neurons are representativeof significant variations of the latent space. Since the latent space is adeterministic mapping, one can conclude that the SOM part is learning thevariations of the data space. It has been shown that the SOM adaptationupdate (equation 6.2) leads to a local minimum of the following [105]:D = 1/2∫ +∞−∞p(Z)||Z − wu||2dZ, (6.4)where p(Z) is the probability distribution function of Z and wu is the weightvector of the BMU.Equation 6.4 implies that the expectation over the `2 norm of the latentvectors and wu is minimized for all Z. Therefore, the weights of the SOMprovide a good approximation of the latent space [52]. However in the DNMmodel, Z is given by a parametric function of the input (Eθ : X → Z). Thismeans that changing θ indirectly affects w. The goal of the DNM model isto learn the parameter set θ jointly with the topographic map of the latentspace, such that ||Eθ(X)− wu||2 is minimized for all X.Pre-trainingAn initial estimate of θ and w is required, prior to optimizing them. Weproposed to initialize θ with a stacked denoising AE (SDAE) [177], whichis trained on the input. The SDAE can be trained using the greedy layer-wise algorithm proposed by Bengio et al. [14], followed by fine-tuning allof the layers stacked together. Later, the SOM is initialized by computingthe embedding Z via Eθ for all of the data points in the training set andupdating w using equation 6.2.89Joint optimization of θ and wIn order to jointly optimize θ and w, the error of the SOM, i.e., the dis-tance between a given embedding and its best matching neuron, needs tobe incorporated into the loss function of the AE. This is essential, since theparameters of the AE should be optimized to yield the most distinctive Z.We adopt the idea from [191] to project the similarity measure between thei’th latent vector Zi and wj on the Student’s t-distribution and define atarget distribution (P ):qij =(1 + ||Zi − wj ||2)−1∑j′(1 + ||Zi − wj′ ||2)−1 , pij = q2ij/∑i qij∑j′ q2ij′/∑i qij′,and minimize the KL divergence between Q and P . The KL divergenceminimization makes the probability of assigning wj to Z closer to 0 and1, hence stricter. The KL divergence is regularized by adding the recon-struction error term and the `2 norm of the weights of the AE to the lossfunction. This regularization controls the parameters of the SOM and guar-antees the convergence of the SOM, since the latent variable should remainreconstructable. Therefore, the loss function of the AE is defined as:L = KLD(Q||P ) + γ||X − X̂||22 + β||θ, φ||2. (6.5)γ is a parameter that controls the significance of reconstruction. Joint opti-mization of θ and w is a two step algorithm. First, for all data points Xi ina batch, Zi is obtained using Eθ and w is updated using equation 6.2. Later,w is fixed and L is minimized with the conventional stochastic gradient de-scent algorithm through ∂L∂θ,φ . This process is repeated for all of the databatches in the training set. Algorithm 2 summarizes the training procedureof the DNM model.Because of the reconstruction regularization, Dφ is required. Moreoverafter training, the decoder can be used to back-project the neurons of theSOM to the data space. This is particularly of interest, since the back-projected neurons can be manually labeled and used to classify the data,similar to the K-means algorithm. Conversely, a density estimation methodcan be trained on a small labeled data to obtain the likelihood of eachcategory on the map, which is discussed in the next section.6.2.3 Parzen windowLet P be the probability of a data point x with a density p(x) falling ina region R in a given space, e.g., latent space. Therefore, P = ∫R p(x)dx.90Algorithm 2 Training of DNMInput: D ← training dataset, epochsOutput: Eθ,Dφ,W : [wj ] for j = 1, 2, ..., lInitialize θ, φ,W from pre-trained modelsα← 2000, σ0 ← 10, η0 ← 0.3for n = 0 to epochs dofor each minibatch B in D dofor each Zi in Eθ(B) dou← argmin||Zi −W||σ(n) = σ0exp(−nα)Hu(n) = exp(− ||pwj−pu||22σ2(n))ηn = η0exp (−n/α)qij =(1+||Zi−wj ||2)−1∑j′(1+||Zi−wj′ ||2)−1 ,pij =q2ij/∑i qij∑j′ q2ij′/∑i qij′Wn+1 = Wn + ηnHu(n)(Zi −Wn)end forg ← grad(KLD(P ||Q) + γ||X − X̂||22 + β||θ||2)θ, φ← Adam(g, θ, φ)end forend for91Assuming R is small enough that p(x) does not significantly vary inside it,P can be rewritten as:P =∫Rp(x)dx ≈ p(x)∫Rdx = p(x)V, (6.6)where V is the volume of R. Also, let x1, ..., xn be n independently andidentically distributed samples of the dataset drawn according to p(x). Theprobability that k of these samples fall inR is k/n. Therefore, using equation6.6, p(x) = k/nV .In Parzen-window density estimation [131], R is considered to be a d-dimensional hypercube centered at x. Assuming h is the length of thishypercube, its volume would be V = hd. Lastly, let W(x−xih ) be a windowfunction that indicates whether or not xi is inside the hypercube with lengthh and centered at x:W(x− xih) ={1, if |x−xi|h ≤ 1/2 ∀j = 1, ..., d0, otherwise.(6.7)Therefore, k can be given by:k =n∑i=1W(x− xih), (6.8)and accordingly, p(x) can be rewritten as:p(x) =1nn∑i=11hdW(x− xih). (6.9)Computing the conditional probability distribution of a labeled datasetusing DNM and Parzen window is straightforward. In order to learn theclass likelihood of the data on a trained DNM, the labeled samples aremapped to the hyper-lattice. Then, a hypercube of length h is defined andp(x) is calculated for all of the neurons of the map and labeled data for eachcategory.6.2.4 DatasetThe proposed model and its training algorithm is unsupervised, hence, no la-bels are required during training. This is particularly beneficial for datasetswith high labeling cost associated to them, e.g., medical image datasets.Therefore, both of the hypotheses of the study were tested on the problem92of spine center-line detection in transverse ultrasound images, discussed inChapter 4. In summary, the dataset was collected by an expert sonographerfrom twenty healthy volunteers, who were aged between 20 and 40 with norecord of spine surgery or pathology. The dataset consisted of a training setwith 22180 frames from 16 randomly selected volunteers and a test set of11720 frames from the remaining 4. The number of off-center and midlineimages were equal in both parts. All of the training set were used for unsu-pervised training. However in order to investigate hypothesis 1, 500 imagesand their labels were randomly selected from the training set, referred tosupervised set hereafter, to learn the conditional probability distributionsfrom the DNM’s map via Parzen window and to train a classifier on theembedding space of the DNM. The test data and their labels were used toevaluate the performance of the models in each experiment.In addition to ultrasound data, MNIST [93] and COIL-20 [122] datasetswere used as a proof of concept of DNM. MNIST dataset consists of 70000hand-written digits of size 28×28 pixels (60000 train, 10000 test) and COIL-20 includes 20 objects, each of which has 72 images. The original COIL-20images were resized to 64× 64 and 1000 images were randomly selected fortraining (70%).6.3 Experiments and resultsThe following experiments were conducted to evaluate the performance ofthe proposed model and also test the hypotheses.• Training DNM on several public image classification datasets as a proofof concept, using a simple convolutional network architecture.• Training on the transverse ultrasound images of spine with optimalnetwork architecture, followed by supervised midline detection withParzen window and a fully-connected layer on the topographic mapand latent space, respectively.• Comparative study between:– unsupervised DNM along with Parzen window and fully con-nected classifier,– a CNN with similar network architecture on supervised set, and– the same CNN initialized by an auto-encoder that is trained onan unsupervised set.93Figure 6.2: Back projection of the 20× 30 SOM neurons w from the lattice to the data space using D for MNIST(a) and COIL-20 (b). Despite its relatively simple architecture and small number of parameters, the DNM hassuccessfully learned the style and pose variations of MNIST and COIL-20, respectively.94Figure 6.3: Block diagram of experiment 2. After unsupervised training ofthe DNM model (a), the likelihoods of the midline and off-center images onthe topographic map are learned via Parzen window (orange-b). Moreover,the encoder of the DNM is detached from the model and a fully connectedclassifier is trained on the supervised set (orange-c). During supervisedtraining, the weights of the encoder and the neurons of the SOM are fixed(green). For both b and c, a small set of labeled images are required.6.3.1 Experiment 1-training DNM on MNIST and COIL-20Initially, the DNM was trained on MNIST and COIL-20 datasets to con-ceptually evaluate the model and its training algorithm. The auto-encoderfilter dimensions were set to (10×5×5) - (8×5×5) - (5×5×5) - dense100symmetrically for encoder and decoder for both datasets. Also, the latticesize of the SOM was set to 20 × 30, which has 600 neurons in total. Wepre-trained the AE and the SOM for 500 and 10000 iterations, respectivelywith α = 2000, σ0 = 10 and η(n) = 0.3 exp (−n/α). Later, we trained theDNM using loss function 6.5 with γ = 0.5 and β = 1e− 6 for 500 iterations.Figure 6.2 shows the back-projection of the SOM from lattice to thedata space for all of the neurons of the map. Note that the DNM model hassuccessfully learned and captured the variations within each class of dataand mapped them to their corresponding location in the lattice.95Figure 6.4: Back projection of the 20× 30 SOM neurons w from the lattice to the data space using D trained on2D transverse ultrasound images of the spine. Similar to the MNIST and COIL-20 datasets, DNM has learned rep-resentations that reflect the variations of the midline and off-center images, such as shadows and the intervertebralgaps. Likelihoods of both classes can be learned from the projection to the lattice.96Figure 6.5: Probability distribution of the midline (a) and off-center (b)images on the 20 × 30 lattice, learned from the supervised set via Parzenwindow with window size 2. Notice that the distributions are complements,i.e., locations on the lattice with high probability of midline have low proba-bility of of-centered and vice-versa. This indicates that DNM has learned anefficient topographic map of the data, which has distinctive representationsfor each class.6.3.2 Experiment 2-DNM on ultrasound dataWe adopt the network architecture of SymNet, since it showed promising re-sults on analyzing transverse ultrasound spine images. The model consistedof a CNN with four convolutional layers of 10×3×3 kernels. Both E and Dof DNM’s AE use the same CNN. The SOM’s lattice size was set to 20×30.The rest of the hyper-parameters remained unchanged from Experiment 1.The unsupervised training set was used to pre-train the AE and SOMof the DNM model for 500 and 1000 epochs, respectively. Then, the modelwas jointly optimized for 750 epochs. Figure 6.4 shows the back-projectedneurons of the SOM to the data space. Similar to the MNIST and COIL-20 datasets, DNM has successfully learned the variations of the transverseimages of the spine, which include different representations for the sacrum,intervertebral gaps, spinous processes and off-center images.Learning probability distribution of midline and off-center fromthe latticeIn order to learn the likelihood of the midline and off-center images on thelattice, we projected the images of the supervised set to the embedding97Table 6.1: Summary of the classification performances on the test transverseultrasound images. Because the DNM model has learned the structures ofthe midline and off-center images from a larger number of samples it indi-cates better classification performance, when compared against the originalSymNet that is only trained on the supervised set.Method (11720 images) ACC % SNS % SPC %SymNet 83 79 87Pretrained DNM + fully-connected 83 80 88DNM + fully-connected 84 80 88DNM + Parzen window 86 82 89space via E for midline and off-center categories. Then, we obtained thecorresponding location of their BMU on the map. The probability densityfunction of each category was computed using equation (6.9) with a windowsize 2. Figure 6.5 shows the probability distribution of midline and off-center images on the map. Using the distributions, we classified the testdata through obtaining their BMU and comparing the class likelihoods fromthe learned distributions. The average classification accuracy was 86% with82% sensitivity and 89% specificity.Classification of the embedding space of DNMIn order to classify the embeddings of the DNM model, the encoder of themodel was clipped from the rest. We initialized the SymNet with the trainedencoder. Accordingly, the final feature vector consisted of the original andthe mirrored embedding of the input. Next, a fully-connected layer witha binary softmax activation was added to the output feature vector of themodel. We trained the weights of the fully-connected layer on the supervisedset for 750 epochs. After training, the network successfully classified 84% ofthe test images, on average. The model was 80% sensitive and 88% specific.Figure 6.3 shows the workflow of the proposed supervised learning meth-ods from DNM’s embedding space and its topographic map.6.3.3 Experiment 3-comparative studyIn order to investigate the effect of the DNM’s unsupervised feature learningfrom a large dataset, we trained the original SymNet model on the supervisedset for 750 epochs. We compared its performance against both methodsdiscussed in Experiment 2. As shown in Table 6.1, the model was able to98correctly classify 83% of the test images with sensitivity 79% and specificity87%. Moreover, we initialized the SymNet model with the DNM’s auto-encoder that was pretrained on the unsupervised set for 500 epochs. Thenthe fully-connected layer was trained on the supervised set for 750 epochs toclassify the final feature map. The classification performance of this modelwas close to that of jointly optimized DNM along with the fully-connectednetwork, as shown in Table DiscussionDNM’s joint training algorithm involves a two stage optimization process:1) incrementally optimizing the neurons of the SOM through competitivelearning, and 2) optimizing the parameters of the AE via stochastic gradi-ent descent (SGD). Compared to SGD, the competitive learning process isconsiderably slower, since there is no gradient involved in the optimizationprocess. Therefore, this should be mitigated by choosing a relatively smalllearning rate for the SGD optimizer. Hence, the SOM can learn from theembedding space of the model.The size of DNM’s hyper-lattice inherently determines the number ofsignificant variations of the data captured by the model. Therefore, a largehyper-lattice corresponds to more specific representations with lower recon-struction loss, while a small hyper-lattice would learn generic representa-tions, hence higher reconstruction error. The choice of lattice size dependson the application and the dataset. For multi-class datasets, where each classhas significant variations, a large hyper-lattice might be required. However,large hyper-lattice corresponds to slower training and is difficult to visuallyanalyze by an expert to verify the learned representations. Hence, a trade-offhas to be established between generality and efficiency of the SOM weights.As discussed, the loss function (6.5) consists of a reconstruction errorand a KL divergence, which essentially penalizes the encoder for ambiguouslatent space. These two terms have different effects on the embedding space,i.e., one corresponds to latent vectors that can be reconstructed accuratelyand the other corresponds to a latent space that is separable. For learningthe conditional likelihood of the data from the hyper-lattice or classifyingthe latent space, accurate reconstruction is not as critical as the KL di-vergence. Hence, one might sacrifice the reconstruction performance for anefficiently separable latent space. On the other hand, in order to visualize thehyper-lattice to analyze the variations of the data, accurate reconstruction iscrucial. Therefore, separability should be traded for more efficient mapping99from embedding space to the data space. This trade-off is controlled by theparameter γ during training.As shown in Table 6.1, the classification performance of the jointly opti-mized DNM along with fully-connected layer is comparable with that of thepre-trained DNM. This is expected, since even with γ = 0.5 the reconstruc-tion error is the dominant term in equation (6.5). Moreover, DNM’s AE haslearned the significant variations of the dataset during pre-training. How-ever, the Parzen window estimation outperforms both of the fully-connectedclassifiers in all metrics, since the conditional probability distributions havebeen calculated from the representations, which are generalized over thedataset.Unsupervised models, including DNM, inherently do not require any ex-pert supervision during training. However in order to achieve a high predic-tion confidence in clinical applications, expert supervision is critical. Hence,we proposed secondary supervised algorithms be trained on the latent spaceand the topographic map of DNM, using labels provided by an expert radi-ologist. Therefore, the models can benefit from both unsupervised learningfrom large amounts of data as well as expert supervision.6.5 ConclusionIn this study, we proposed an unsupervised model, called deep neural maps,to learn a parametric function that maps the data space to an embeddingand a manifold from the latent space, which models variations of the datasetin a topographic map. When DNM is trained on large unlabeled datasets,we showed that a small number of labeled samples are sufficient to learnconditional likelihoods from its topology via a non-parametric density esti-mation method. Also, our experiments showed that a non-linear classifiercan be trained on a small labeled dataset in order to efficiently classify theDNM’s latent space.The algorithms and models presented in earlier chapters required expertlabeled data. Hence, they are limited to the small expert labeled datasets.Moreover, annotating a large dataset of images that covers the full rangeof the population is significantly costly. Therefore, unsupervised learningfrom large unlabeled data may be the only feasible solution to learn fewerrepresentative samples of the dataset, which can be annotated by an expertfor higher confidence recognition.100Chapter 7ConclusionCompared to other imaging modalities, ultrasound is considered a safe andan accessible alternative, which provides real-time visualization. In recentyears, ultrasound has gained popularity in spinal injections and anesthesia.In particular, it can be used to visualize the vertebral structures to iden-tify the correct needle puncture site and also to guide the needle in thecorrect trajectory during needle insertion. Transverse or paramedian ultra-sound planes may be used to identify the needle target and the surroundinganatomy. However, since ultrasound images are often difficult to interpretand are affected by noise and other artifacts, the conventional manual tech-niques remain the standard-of-care for spinal anesthesia. Machine learningtechniques can alleviate this problem by learning distinctive features fromthe anatomy of the vertebrae and augmenting the ultrasound images withadditional information to help the anesthesiologist with image interpreta-tion.In this thesis, novel machine learning algorithms and models were pro-posed that aim to provide feedback to the anesthesiologist to correctly iden-tify the vertebral anatomy and accordingly, needle target. We investigatedthe following hypotheses:1. The ultrasound images of the spine have distinctive appearance pat-terns that can be recognized via a set of local feature extractors.2. Automatic feature learning from the ultrasound images can bene-fit from hand-engineered feature extractors for localizing particularanatomy in the image.3. An efficient classifier can be trained through learning the underlyingdistribution of the ultrasound images and augmenting the trainingdataset with synthetic samples from that distribution.4. Unsupervised learning of variations of the ultrasound images of thespine helps with reducing the number of labeled samples to train anefficient classifier.101In Chapter 2, hypothesis 1 was tested; a multi-scale and multi-directionaltransformation based on the Hadamard transform was proposed to classifythe target images for epidural and facet joint injections using paramedianplanes. Because of the binary nature of the proposed method, the systemcan classify the paramedian planes in real time without the need for specifichardware. This was significant at the time that the classification system wasproposed, since it was intended to run on conventional US machines, whichhave limited memory and computation power. However recently, widespreadavailability of GPUs, which provide high computation power and large mem-ory, has led to a shift in the research community towards deep models withlarge parameter spaces for classification. Although such models do not re-quire hand-engineered features, it is still desired to keep the number ofparameters small for efficiency. In Chapter 3, a supervised deep network ar-chitecture with relatively small parameter space was proposed. The learnedfeature maps of the deep network were augmented with the local directionalfeatures to identify the needle target in within the target paramedian planes.This study tested hypothesis 2.Midline needle insertion is a common technique for epidural needle place-ment, hence, a deep network architecture was proposed to identify the sym-metry in the transverse images in Chapter 4. The proposed network wastrained to classify the midline images for needle insertion. As explained inChapter 1, prior to the injection, the anesthesiologist can utilize the parame-dian view to help find the insertion level, the corresponding intervertebralgap and accordingly, the target depth. Once the gap is aligned with thecenter of the US transducer, the skin is marked at the center of the probe.Then, the transverse view can be utilized in order to identify the center lineof the spine for the midline approach. The intersection of the center lineand the intervertebral gap, identified from the paramedian plane, indicatesthe desired puncture site.In Chapter 5, we proposed a generative model to learn the distributionof the appearance features of the transverse ultrasound images independentof their labels. In addition, the proposed model learns the conditional dis-tribution of those appearance features and the corresponding labels of theimages. Therefore given a label, it is able to generate synthetic data withparticular features. In order to test hypothesis 3, we proposed an adaptivetraining algorithm that leverages the generative model and trains a classifiermore efficiently. In Chapter 6, hypothesis 4 was investigated. We proposedan unsupervised model along with its training algorithm to automaticallylearn features from the ultrasound images. The performance of the proposedmodel was evaluated in the context of midline image classification.1027.1 InsightsThe following were learned from the research presented in this thesis:1. Both paramedian and transverse US indicate distinctive patters, whichcan be automatically learned by machine learning algorithms.2. Providing deep models with hand-engineered features leads to a re-duction of size of the parameter space of the model, while maintainingits performance.3. When large amounts of labeled data are available, supervised trainingof a deep model that is specifically designed for the problem, alongwith adaptive data augmentation is superior to an unsupervised model.However when annotating large data set of images is not feasible, theunsupervised model has the advantage, since it learns the variationsof the data without the need for labels.7.2 ContributionsThe contributions of this thesis are summarized as follows:• A novel multi-scale and multi-directional feature extraction methodbased on the Hadamard transform is developed to detect the targetplane in 3D ultrasound for epidural steroid injections and facet jointinjections. We show that a fully connected neural network model canefficiently distinguish between target and non-target plane using theproposed local directional Hadamard (LDH) features. Hence, the pro-posed method provides automated feedback to quickly find the plane,which visualizes the target properly. The proposed system can also beused to help the novices to learn spinal anatomy.• A novel deep neural network model is developed to localize the needletarget for epidural injections in 2D paramedian images of the spine.The proposed model utilizes LDH features along with the feature mapslearned automatically from the images for pixel-level classification ofthe target planes. We show that there is no linear correlation betweenthe augmented LDH features obtained from the sequency domain andthe convolutional feature maps. However, there is a non-linear corre-lation between the LDH features and the convolutional feature maps,which is learned by the proposed model. Hence, the LDH featurescontribute to the performance of pixel-level classification.103• A novel generative model is developed, which learns the distributionof the appearance features of the data, independent of their labels.Moreover, the proposed model learns the conditional distribution ofdata given labels and the appearance features. Therefore, the priorknowledge about the data can be incorporated into the generated syn-thetic data. Since labels and the appearance features are forced to beindependent, the proposed model can be used to augment the trainingset of a discriminative model. To this aim, an adaptive training algo-rithm is proposed that utilizes the generative model and adjusts theappearance patterns of the synthetic data based on the performanceof the discriminator. We show that the adaptive augmentation of thetraining dataset leads to an efficient classifier.• A novel deep convolutional network architecture is developed to iden-tify the center-line of the spine in transverse ultrasound images formidline needle insertion. Since the midline images indicate symmet-rical anatomical features, the proposed model tries to capture thissymmetry. This is achieved by extracting the convolutional featuremaps from the horizontally mirrored image via the same convolutionkernels. For midline images, the feature maps of the mirrored imageare similar to those of the original one. A fully connected network isused to classify the original and mirrored feature maps. We show thatthe additional mirrored path to the deep network improves the clas-sification performance of the model. Moreover, training the networkvia adaptive data augmentation using the proposed generative modelfurther improves the classification accuracy.• A novel unsupervised model, called deep neural maps (DNM), is de-veloped which learns a parametric mapping from the data to a latentspace along with a manifold that models the variations of the data ona topographic map. After training, the class likelihood of the data canbe learned from the map via a probability density estimation method.Moreover, since the embedding space of DNM is enforced to be sepa-rable during training, a classifier with relatively small parameter spacecan be trained to distinguish the embeddings of different classes. Weshow that DNM can efficiently learn representations that model thevariations of the transverse ultrasound images without labels. Accord-ingly, the learned representations can be used to identify the center-lineof the spine.1047.3 Future workNovel methods have been proposed in this thesis, which leverage the powerof statistical learning to facilitate ultrasound image interpretation for spineanesthesia. The following research and development areas are suggested toinvestigate:• The proposed methods have been trained and tested on data fromhealthy volunteers with no record of spine surgery or pathology. There-fore, their performance should be evaluated on data from clinics, whichinvolve a wider variety of BMIs and ages. This is an essential step be-fore deploying the models for clinical use. Further training of modelsmay be required especially on the cases that involve different spinalpatterns such as scoliosis.• The proposed model for automatic localization of the needle targetin the paramedian ultrasound images identifies the target throughpixel-level classification. Therefore, depending on the size of the in-put image, it is not real-time. This can be mitigated by a vectorizedimplementation, since the model’s prediction for a particular pixel isindependent of others. Another potential solution can be reducing thesearch space through identifying the region, where the needle target ishighly likely to appear. Hence, the running time would be significantlyreduced for pixel-level classification of the target region.• The generative model proposed in Chapter 5 requires a heuristic forwarping the distribution of the labels, i.e., transformation function.For some applications such as image classification, the heuristic isstraightforward to find. However, defining a valid heuristic requiresdeep understanding of the prior distribution of the labels, for segmen-tation tasks. Learning the distribution of the labels has been investi-gated in recent studies. Klys et al. [78] proposed a model that utilizesa VAE architecture to learn the distribution of the data and that of thecategorical labels independently. Therefore, the model yields a finerlevel of granuality for image synthesis. Hence, incorporating such amodel in AdaTrain may lead to an improvement in classification per-formance of SymNet. In addition, due to a factorized latent variable,disentagling the latent space of the model can also potentially improveSymNet’s performance. This is because each dimension of the latentvariable would model a particular variation in the appearance feature105of the image. Learning disentagled latent space has been studied ex-tensively in the recent years and such investigations can be found in[56, 73, 97].• The proposed DNM model was intended to learn the variations of themidline and off-center transverse images. However, it can potentiallylearn the patterns of the vertebrae in different views, if large amountsof data are available. Therefore, compound classifiers can be trained,which are conditional on the view and particular anatomies. This isbeneficial from the deployment point of view, since only one modelneeds to be deployed for multiple classification tasks.106Bibliography[1] Dipti Agrawal, Bela Makhija, and Sunita Fotedar. Impact of epiduralanalgesia on labour: A review. Annals of Woman and Child Health,1(1):1–9, 2015.[2] Rami Al-Rfou et al. Theano: A Python framework for fast compu-tation of mathematical expressions. arXiv e-prints, abs/1605.02688,2016.[3] Timothy J. Amrhein, J. Scott Schauberger, Peter G. Kranz, andJenny K. Hoang. Reducing patient radiation exposure from ctfluoroscopy-guided lumbar spine pain injections by targeting the plan-ning ct. American Journal of Roentgenology, 206(2):390–394, 2016.[4] TJ Amrhein, SN Parivash, L Gray, and PG Kranz. Incidence of in-advertent dural puncture during ct fluoroscopy–guided interlaminarepidural corticosteroid injections in the cervical spine: an analysis of974 cases. American Journal of Roentgenology, 209(3):656–661, 2017.[5] Emran Mohammad Abu Anas, Alexander Seitel, Abtin Rasoulian,Paul St John, David Pichora, Kathryn Darras, David Wilson, Vic-toria A Lessoway, Ilker Hacihaliloglu, Parvin Mousavi, et al. Boneenhancement in ultrasound using local spectrum variations for guid-ing percutaneous scaphoid fracture fixation procedures. InternationalJournal of Computer Assisted Radiology and Surgery, 10(6):959–969,2015.[6] Michael H Andreae and DA Andreae. Regional anaesthesia to preventchronic pain after surgery: a cochrane systematic review and meta-analysis. British Journal of Anaesthesia, page 213, 2013.[7] Antreas Antoniou, Amos Storkey, and Harrison Edwards. Dataaugmentation generative adversarial networks. arXiv preprintarXiv:1711.04340, 2017.107[8] Cristian Arzola, Sharon Davies, Ayman Rofaeel, and Jose CA Car-valho. Ultrasound using the transverse approach to the lumbar spineprovides reliable landmarks for labor epidurals. Anesthesia & Analge-sia, 104(5):1188–1192, 2007.[9] Jianmin Bao, Dong Chen, Fang Wen, Houqiang Li, and Gang Hua.Cvae-gan: Fine-grained image generation through asymmetric train-ing. In 2017 IEEE International Conference on Computer Vision(ICCV), pages 2764–2773. IEEE, 2017.[10] Kenneth G Beauchamp. Applications of Walsh and related functions.Academic press, 1984.[11] David Belavy, MJ Ruitenberg, and RB Brijball. Feasibility studyof real-time three-/four-dimensional ultrasound for epidural catheterinsertion. British Journal of Anaesthesia, 107(3):438–445, 2011.[12] Mikhail Belkin and Partha Niyogi. Laplacian eigenmaps for dimen-sionality reduction and data representation. Neural Computation,15(6):1373–1396, 2003.[13] Y. Bengio, A. Courville, and P. Vincent. Representation learning: Areview and new perspectives. IEEE Transactions on Pattern Analysisand Machine Intelligence, 35(8):1798–1828, Aug 2013.[14] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle.Greedy layer-wise training of deep networks. In Advances in neuralinformation processing systems, pages 153–160, 2007.[15] Jonathan Boisvert, Farida Cheriet, Xavier Pennec, Hubert Labelle,and Nicholas Ayache. Articulated spine models for 3-d reconstructionfrom partial radiographic data. IEEE Transactions on BiomedicalEngineering, 55(11):2565–2574, 2008.[16] Edward Braun, Andrew M Sack, Dawood Sayed, Smith Manion, BrianHamm, Michael Brimacombe, Michael Tollette, Talal W Khan, Wal-ter Orr, and Andrea Nicol. Reducing radiation exposure in lumbartransforaminal epidural steroid injections with pulsed fluoroscopy: Arandomized, double-blind, controlled clinical trial. Pain Physician,21(1):53–60, 2018.[17] Mikael Brudfors, Alexander Seitel, Abtin Rasoulian, Andras Lasso,Victoria A Lessoway, Jill Osborn, Atsuto Maki, Robert N Rohling, and108Purang Abolmaesumi. Towards real-time, tracker-less 3d ultrasoundguidance for spine anaesthesia. International Journal of ComputerAssisted Radiology and Surgery, pages 1–11, 2015.[18] Mathilde Caron, Piotr Bojanowski, Armand Joulin, and MatthijsDouze. Deep clustering for unsupervised learning of visual features. InProceedings of the European Conference on Computer Vision (ECCV),pages 132–149, 2018.[19] Jose Carlos Almeida Carvalho. Ultrasound-facilitated epidurals andspinals in obstetrics. Anesthesiology clinics, 26(1):145–158, 2008.[20] Elvis CS Chen, Parvin Mousavi, Sean Gill, Gabor Fichtinger, andPurang Abolmaesumi. Ultrasound guided spine needle insertion. InSPIE Medical Imaging, pages 762538–762538. International Society forOptics and Photonics, 2010.[21] O¨zgu¨n C¸ic¸ek, Ahmed Abdulkadir, Soeren S Lienkamp, Thomas Brox,and Olaf Ronneberger. 3d u-net: learning dense volumetric segmen-tation from sparse annotation. In International Conference on Medi-cal Image Computing and Computer-Assisted Intervention, pages 424–432. Springer, 2016.[22] Dan Claudiu Cires¸an, Ueli Meier, Luca Maria Gambardella, andJu¨rgen Schmidhuber. Deep, big, simple neural nets for handwrittendigit recognition. Neural Computation, 22(12):3207–3220, 2010.[23] Joseph Paul Cohen, Margaux Luck, and Sina Honari. Distributionmatching losses can hallucinate features in medical image translation.arXiv preprint arXiv:1805.08841, 2018.[24] PH Conroy, C Luyet, CJ McCartney, and PG McHardy. Real-time ultrasound-guided spinal anaesthesia: a prospective observa-tional study of a new approach. Anesthesiology Research and Practice,2013:1–7, 2013.[25] Ekin D Cubuk, Barret Zoph, Dandelion Mane, Vijay Vasudevan, andQuoc V Le. Autoaugment: Learning augmentation policies from data.arXiv preprint arXiv:1805.09501, 2018.[26] Christelle Darrieutort-Laffite, Geraldine Bart, Lucie Planche, JoelleGlemarec, Yves Maugars, and Benoit Le Goff. Usefulness of a pre-procedure ultrasound scanning of the lumbar spine before epidural109injection in patients with a presumed difficult puncture: A randomizedcontrolled trial. Joint Bone Spine, 82(5):356–361, 2015.[27] Matthew S. Davenport, Shokoufeh Khalatbari, Peter S. C. Liu,Katherine E. Maturen, Ravi K. Kaza, Ashish P. Wasnik, Mahmoud M.Al-Hawary, Daniel I. Glazer, Erica B. Stein, Jeet Patel, Deepak K. So-mashekar, Benjamin L. Viglianti, and Hero K. Hussain. Repeatabilityof diagnostic features and scoring systems for hepatocellular carcinomaby using mr imaging. Radiology, 272(1):132–142, 2014.[28] Getu´lio Rodrigues de Oliveira Filho. The construction of learningcurves for basic skills in anesthetic procedures: an application forthe cumulative sum method. Anesthesia & Analgesia, 95(2):411–416,2002.[29] Terrance DeVries and Graham W Taylor. Dataset augmentation infeature space. arXiv preprint arXiv:1702.05538, 2017.[30] Junhua Ding, XinChuan Li, and Venkat N. Gudivada. Augmentationand evaluation of training data for deep learning. In 2017 IEEE In-ternational Conference on Big Data (Big Data), pages 2603–2611, 122017.[31] Kamran Ghasedi Dizaji, Amirhossein Herandi, Cheng Deng, WeidongCai, and Heng Huang. Deep clustering via joint convolutional autoen-coder embedding and relative entropy minimization. In 2017 IEEEInternational Conference on Computer Vision (ICCV), pages 5747–5756. IEEE, 2017.[32] David L Donoho and Carrie Grimes. Hessian eigenmaps: Locally linearembedding techniques for high-dimensional data. Proceedings of theNational Academy of Sciences, 100(10):5591–5596, 2003.[33] David L Donoho and Xiaoming Huo. Beamlets and multiscale imageanalysis. Springer, 2002.[34] Holger K Eltzschig, Ellice S Lieberman, and William R Camann. Re-gional anesthesia and analgesia for labor and delivery. New EnglandJournal of Medicine, 348(4):319–332, 2003.[35] Andre Esteva, Brett Kuprel, Roberto A Novoa, Justin Ko, Susan MSwetter, Helen M Blau, and Sebastian Thrun. Dermatologist-levelclassification of skin cancer with deep neural networks. Nature,542(7639):115, 2017.110[36] Irina Evansa, Inara Logina, Indulis Vanags, and Alain Borgeat. Ultra-sound versus fluoroscopic-guided epidural steroid injections in patientswith degenerative spinal diseases: a randomised study. European Jour-nal of Anaesthesiology, 32(4):262–268, 2015.[37] Dimitrios K Filippiadis, Thomas Rodt, Maria-Chrysanthi Kitsou,Chrysanthi Batistaki, Nikolaos Kelekis, Georgia Kostopanagiotou, andAlexis Kelekis. Epidural interlaminar injections in severe degenerativelumbar spine: Fluoroscopy should not be a luxury. Journal of Neu-roInterventional Surgery, 10(6):592–595, 2017.[38] Andrew D Franklin and Elisabeth M Hughes. Fluoroscopicallyguided tunneled trans-caudal epidural catheter technique for opioid-free neonatal epidural analgesia. Journal of Anesthesia, pages 1–5,2016.[39] H. Freise and H. K. Van Aken. Risks and benefits of thoracic epiduralanaesthesia. BJA: British Journal of Anaesthesia, 107(6):859–868,2011.[40] Yarin Gal, Riashat Islam, and Zoubin Ghahramani. Deep bayesianactive learning with image data. arXiv preprint arXiv:1703.02910,2017.[41] Reiner M Giebler, Ralf U Scherer, and Jurgen Peters. Incidence ofneurologic complications related to thoracic epidural catheterization.Anesthesiology: The Journal of the American Society of Anesthesiol-ogists, 86(1):55–63, 1997.[42] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparserectifier neural networks. In Proceedings of the fourteenth interna-tional conference on Artificial Intelligence and Statistics, pages 315–323, 2011.[43] Thomas Grau, Erika Bartusseck, Renate Conradi, Eike Martin, andJohann Motsch. Ultrasound imaging improves learning curves in ob-stetric epidural anesthesia: a preliminary study. Canadian Journal ofAnesthesia, 50(10):1047–1050, 2003.[44] Thomas Grau, Ru¨diger W Leipold, Johannes Horter, Renate Conradi,Eike O Martin, and Johann Motsch. Paramedian access to the epidu-ral space: the optimum window for ultrasound imaging. Journal ofClinical Anesthesia, 13(3):213–217, 2001.111[45] Thomas Grau, Rudiger Wolfgang Leipold, Renate Conradi, Eike Mar-tin, and Johann Motsch. Efficacy of ultrasound imaging in obstetricepidural anesthesia. Journal of Clinical Anesthesia, 14(3):169–175,2002.[46] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin,and Aaron C Courville. Improved training of wasserstein gans. InAdvances in Neural Information Processing Systems, pages 5769–5779,2017.[47] Xifeng Guo, Xinwang Liu, En Zhu, and Jianping Yin. Deep cluster-ing with convolutional autoencoders. In International Conference onNeural Information Processing, pages 373–382. Springer, 2017.[48] Ilker Hacihaliloglu, Rafeef Abugharbieh, Antony J Hodgson, andRobert N Rohling. Bone surface localization in ultrasound using imagephase-based features. Ultrasound in Medicine & Biology, 35(9):1475–1487, 2009.[49] Ilker Hacihaliloglu, Abtin Rasoulian, Robert N Rohling, and PurangAbolmaesumi. Local phase tensor features for 3-d ultrasound to sta-tistical shape+pose spine model registration. IEEE Transactions onMedical Imaging, 33(11):2167–2179, 2014.[50] Stephen H. Halpern, Arnab Banerjee, Renato Stocche, and PhyllisGlanc. The use of ultrasound for lumbar spinous process identifica-tion: A pilot study. Canadian Journal of Anesthesia/Journal canadiend’anesthe´sie, 57(9):817–822, 2010.[51] Changhee Han, Hideaki Hayashi, Leonardo Rundo, Ryosuke Araki,Wataru Shimoda, Shinichi Muramatsu, Yujiro Furukawa, GiancarloMauri, and Hideki Nakayama. Gan-based synthetic brain mr imagegeneration. In 2018 IEEE 15th International Symposium on Biomed-ical Imaging (ISBI 2018), pages 734–738, 4 2018.[52] Simon Haykin. Neural networks: a comprehensive foundation. Pren-tice Hall PTR, 1994.[53] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Delvingdeep into rectifiers: Surpassing human-level performance on imagenetclassification. In 2015 IEEE International Conference on ComputerVision (ICCV), page nil, 12 2015.112[54] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deepresidual learning for image recognition. In 2016 IEEE Conference onComputer Vision and Pattern Recognition (CVPR), pages 770–778, 62016.[55] Jorden Hetherington, Victoria Lessoway, Vit Gunka, Purang Abol-maesumi, and Robert Rohling. Slide: Automatic spine level identifi-cation system using a deep convolutional neural network. InternationalJournal of Computer Assisted Radiology and Surgery, 12(7):1189–1198, 2017.[56] Irina Higgins, Loic Matthey, Arka Pal, Christopher Burgess, XavierGlorot, Matthew Botvinick, Shakir Mohamed, and Alexander Ler-chner. beta-vae: Learning basic visual concepts with a constrainedvariational framework. In International Conference on Learning Rep-resentations, volume 3, 2017.[57] Geoffrey E Hinton and Ruslan R Salakhutdinov. Reducing the dimen-sionality of data with neural networks. science, 313(5786):504–507,2006.[58] VL Hoffmann, MP Vercauteren, JP Vreugde, GH Hans, HC Coppe-jans, and HA Adriaensen. Posterior epidural space depth: safety ofthe loss of resistance and hanging drop techniques. British Journal ofAnaesthesia, 83(5):807–809, 1999.[59] Steven CH Hoi, Rong Jin, Jianke Zhu, and Michael R Lyu. Batchmode active learning and its application to medical image classifica-tion. In Proceedings of the 23rd International Conference on MachineLearning, pages 417–424. ACM, 2006.[60] Shin Hoo-Chang, Holger R Roth, Mingchen Gao, Le Lu, Ziyue Xu,Isabella Nogues, Jianhua Yao, Daniel Mollura, and Ronald M Sum-mers. Deep convolutional neural networks for computer-aided detec-tion: Cnn architectures, dataset characteristics and transfer learning.IEEE Transactions on Medical Imaging, 35(5):1285, 2016.[61] Xianxu Hou, Linlin Shen, Ke Sun, and Guoping Qiu. Deep feature con-sistent variational autoencoder. In 2017 IEEE Winter Conference onApplications of Computer Vision (WACV), pages 1133–1141. IEEE,2017.113[62] Damian Hoy, Lyn March, Peter Brooks, Fiona Blyth, Anthony Woolf,Christopher Bain, Gail Williams, Emma Smith, Theo Vos, Jan Baren-dregt, Chris Murray, Roy Burstein, and Rachelle Buchbinder. Theglobal burden of low back pain: estimates from the global burden ofdisease 2010 study. Annals of the Rheumatic Diseases, 73(6):968–974,2014.[63] Wei-Ning Hsu, Yu Zhang, and James Glass. Unsupervised domainadaptation for robust speech recognition via variational autoencoder-based data augmentation. arXiv preprint arXiv:1707.06265, 2017.[64] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q.Weinberger. Densely connected convolutional networks. In 2017 IEEEConference on Computer Vision and Pattern Recognition (CVPR),page nil, 7 2017.[65] Forrest N Iandola, Song Han, Matthew W Moskewicz, Khalid Ashraf,William J Dally, and Kurt Keutzer. Squeezenet: Alexnet-level ac-curacy with 50x fewer parameters and < 0.5 mb model size. arXivpreprint arXiv:1602.07360, 2016.[66] N. Jaitly and G. Hinton. Learning a better representation of speechsoundwaves using restricted boltzmann machines. In 2011 IEEE In-ternational Conference on Acoustics, Speech and Signal Processing(ICASSP), pages 5884–5887, May 2011.[67] Pan Ji, Tong Zhang, Hongdong Li, Mathieu Salzmann, and Ian Reid.Deep subspace clustering networks. In I. Guyon, U. V. Luxburg,S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett,editors, Advances in Neural Information Processing Systems 30, pages24–33. Curran Associates, Inc., 2017.[68] Pan Ji, Tong Zhang, Hongdong Li, Mathieu Salzmann, and Ian Reid.Deep subspace clustering networks. In Advances in Neural InformationProcessing Systems, pages 24–33, 2017.[69] James Edward Johnson, Anahi Perlas, and Ki Jinn Chin. Ultrasoundguidance for central neuraxial anesthesia and analgesia. Advances inAnesthesia, 32(1):149–169, 2014.[70] Konstantinos Kamnitsas, Christian Ledig, Virginia F.J. Newcombe,Joanna P. Simpson, Andrew D. Kane, David K. Menon, Daniel Rueck-ert, and Ben Glocker. Efficient multi-scale 3d cnn with fully connected114crf for accurate brain lesion segmentation. Medical Image Analysis,36:61–78, 2017.[71] MK Karmakar, X Li, AM-H Ho, WH Kwok, and PT Chui. Real-timeultrasound-guided paramedian epidural access: evaluation of a novelin-plane technique. British Journal of Anaesthesia, 102(6):845–854,2009.[72] Siavash Khallaghi, Parvin Mousavi, Ren Hui Gong, Sean Gill,Jonathan Boisvert, Gabor Fichtinger, David Pichora, Dan Borsch-neck, and Purang Abolmaesumi. Registration of a statistical shapemodel of the lumbar spine to 3d ultrasound images. In Medical ImageComputing and Computer-Assisted Intervention–MICCAI 2010, pages68–75. Springer, 2010.[73] Hyunjik Kim and Andriy Mnih. Disentangling by factorising. arXivpreprint arXiv:1802.05983, 2018.[74] Lawrence T Kim. Principles of ultrasound and applied ultrasoundphysics relevant for advanced sonographers. In Advanced Thyroid andParathyroid Ultrasound, pages 37–47. Springer, 2017.[75] Richard E. Kinard. Diagnostic spinal injection procedures. Neuro-surgery Clinics of North America, 7(1):151–166, 1996.[76] Diederik P Kingma and Max Welling. Auto-encoding variationalbayes. arXiv preprint arXiv:1312.6114, 2013.[77] Jens Kleesiek, Gregor Urban, Alexander Hubert, Daniel Schwarz,Klaus Maier-Hein, Martin Bendszus, and Armin Biller. Deep MRIbrain extraction: A 3D convolutional neural network for skull strip-ping. NeuroImage, pages 460–469, 2016.[78] Jack Klys, Jake Snell, and Richard Zemel. Learning latent subspacesin variational autoencoders. In Advances in Neural Information Pro-cessing Systems, pages 6444–6454, 2018.[79] Teuvo Kohonen. The self-organizing map. Proceedings of the IEEE,78(9):1464–1480, 1990.[80] Dan J Kopacz, Joseph M Neal, and Julia E Pollock. The regional anes-thesia” learning curve”. what is the minimum number of epidural andspinal blocks to reach consistency? Regional Anesthesia, 21(3):182–190, 1996.115[81] Robert Korez, Bosˇtjan Likar, Franjo Pernusˇ, and Tomazˇ Vrtovec.Model-based segmentation of vertebral bodies from mr images with3d cnns. In International Conference on Medical Image Computingand Computer-Assisted Intervention, pages 433–441. Springer, 2016.[82] Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers offeatures from tiny images. 2009.[83] Alex Krizhevsky, Geoffrey Hinton, et al. Factored 3-way restrictedboltzmann machines for modeling natural images. In Proceedings ofthe Thirteenth International Conference on Artificial Intelligence andStatistics, pages 621–628, 2010.[84] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenetclassification with deep convolutional neural networks. In Advances inNeural Information Processing Systems, pages 1097–1105, 2012.[85] R Kube, FM Bianchi, D Brunner, and B LaBombard. Outlier classifi-cation using autoencoders: application for fluctuation driven flows infusion plasmas. arXiv preprint arXiv:1804.08927, 2018.[86] D. Kumar, A. Wong, and D. A. Clausi. Lung nodule classificationusing deep features in ct images. In 2015 12th Conference on Computerand Robot Vision, pages 133–138, June 2015.[87] Alex Lamb, Vincent Dumoulin, and Aaron Courville. Discriminativeregularization for generative models. arXiv preprint arXiv:1602.03220,2016.[88] Ghislaine Le Coq, Be´atrice Ducot, and Dan Benhamou. Risk fac-tors of inadequate pain relief during epidural analgesia for labour anddelivery. Canadian Journal of Anaesthesia, 45(8):719–723, 1998.[89] Timo J Lechner, Maarten G van Wijk, Ad J Maas, Frank R vanDorsten, Ronald A Drost, Chris J Langenberg, Leo J Teunissen,Paul H Cornelissen, and Jan van Niekerk. Clinical results with theacoustic puncture assist device, a new acoustic device to identify theepidural space. Anesthesia & Analgesia, 96(4):1183–1187, 2003.[90] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-basedlearning applied to document recognition. Proceedings of the IEEE,86(11):2278–2324, 1998.116[91] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning.Nature, 521(7553):436–444, 2015.[92] Yann LeCun, Le´on Bottou, Yoshua Bengio, and Patrick Haffner.Gradient-based learning applied to document recognition. Proceed-ings of the IEEE, 86(11):2278–2324, 1998.[93] Yann LeCun, Corinna Cortes, and CJ Burges. Mnist handwrittendigit database. AT&T Labs [Online]. Available: http://yann. lecun.com/exdb/mnist, 2, 2010.[94] Muyinatu A Lediju, Michael J Pihl, Jeremy J Dahl, and Gregg ETrahey. Quantitative assessment of the magnitude, impact and spatialextent of ultrasonic clutter. Ultrasonic Imaging, 30(3):151–168, 2008.[95] Allison Lee. Ultrasound in obstetric anesthesia. In Seminars in Peri-natology, volume 38, pages 349–358. Elsevier, 2014.[96] Joseph Lemley, Shabab Bazrafkan, and Peter Corcoran. Smart aug-mentation learning an optimal data augmentation strategy. IEEE Ac-cess, 5:5858–5869, 2017.[97] Yang Li, Quan Pan, Suhang Wang, Haiyun Peng, Tao Yang, and ErikCambria. Disentangled variational auto-encoder for semi-supervisedlearning. Information Sciences, 482:73–85, 2019.[98] Tung-Liang Lin, Chin-Teng Chung, Howard Haw-Chang Lan, andHuey-Min Sheen. Ultrasound-guided facet joint injection to treat aspinal cyst. Journal of the Chinese Medical Association, 77(4):213–216, 2014.[99] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, ArnaudArindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian,Jeroen Awm Van Der Laak, Bram Van Ginneken, and Clara I Sa´nchez.A survey on deep learning in medical image analysis. Medical imageanalysis, 42:60–88, 2017.[100] Geert Litjens, Thijs Kooi, Babak Ehteshami Bejnordi, ArnaudArindra Adiyoso Setio, Francesco Ciompi, Mohsen Ghafoorian,Jeroen A.W.M. van der Laak, Bram van Ginneken, and Clara I.Sa´nchez. A survey on deep learning in medical image analysis. MedicalImage Analysis, 42:60–88, 2017.117[101] Spencer S Liu, Justin E Ngeow, and Jacques T YaDeau. Ultrasound-guided regional anesthesia and analgesia: a qualitative systematic re-view. Regional Anesthesia and Pain Medicine, 34(1):47–59, 2009.[102] Jonathan Long, Evan Shelhamer, and Trevor Darrell. Fully convolu-tional networks for semantic segmentation. In 2015 IEEE Conferenceon Computer Vision and Pattern Recognition (CVPR), pages 341–3440, 6 2015.[103] LinLi Luo, Juan Ni, Lan Wu, and Dong Luo. Ultrasound-guided epidu-ral anesthesia for a parturient with severe malformations of the skeletalsystem undergoing cesarean delivery: a case report. Local and RegionalAnesthesia, 8:7, 2015.[104] W. Luo, J. Li, J. Yang, W. Xu, and J. Zhang. Convolutional sparseautoencoders for image classification. IEEE Transactions on NeuralNetworks and Learning Systems, 29(7):3289–3294, July 2018.[105] SP Luttrell. Self-organization: A derivation from first principles ofa class of learning algorithms. In International Joint Conference onNeural Networks, volume 2, pages 495–498. IEEE Computer SocietyPress, 1989.[106] Laurens Maaten. Learning a parametric embedding by preserving localstructure. In Artificial Intelligence and Statistics, pages 384–391, 2009.[107] Laurens van der Maaten and Geoffrey Hinton. Visualizing data usingt-sne. Journal of Machine Learning Research, 9(Nov):2579–2605, 2008.[108] Laura Mac´ıas-Garc´ıa, Jose´ Mar´ıa Luna-Romera, Jorge Garc´ıa-Gutie´rrez, Mar´ıa Mart´ınez-Ballesteros, Jose´ C Riquelme-Santos, andRicardo Gonza´lez-Ca´mpora. A study of the suitability of autoencodersfor preprocessing data in breast cancer experimentation. Journal ofBiomedical Informatics, 72:33–44, 2017.[109] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfel-low, and Brendan Frey. Adversarial autoencoders. arXiv preprintarXiv:1511.05644, 2015.[110] PA Malenfant, Vit Gunka, Parmida Beigi, Abtin Rasoulian, RobertRohling, and A Dube. Accuracy of 3d ultrasound for identification ofepidural needle skin insertion point in parturients; a prospective obser-vational study. In Society for Obstetric Anesthesia and Perinatology(SOAP) 46th Annual Meeting. Toronto, ON, Canada, page 308, 2014.118[111] L Manchikanti, AD Kaye, K Manchikanti, M Boswell, V Pampati, andJ Hirsch. Efficacy of epidural injections in the treatment of lumbarcentral spinal stenosis: a systematic review. Anesthesiology and PainMedicine, 5(1):e23139, 2015.[112] Laxmaiah Manchikanti and Vijay Singh. Interventional pain manage-ment: Evolving issues for 2003. Pain Physician, 6(1):125–138, 2003.[113] Clarita B Margarido, Cristian Arzola, Mrinalini Balki, and Jose CACarvalho. Anesthesiologists learning curves for ultrasound assessmentof the lumbar spine. Canadian Journal of Anesthesia/Journal cana-dien d’anesthe´sie, 57(2):120–126, 2010.[114] Maciej A. Mazurowski, Piotr A. Habas, Jacek M. Zurada, Joseph Y.Lo, Jay A. Baker, and Georgia D. Tourassi. Training neural net-work classifiers for medical decision making: the effects of imbalanceddatasets on classification performance. Neural Networks, 21(2-3):427–436, 2008.[115] Quinn McNemar. Note on the sampling error of the difference betweencorrelated proportions or percentages. Psychometrika, 12(2):153–157,Jun 1947.[116] Lars Mescheder, Sebastian Nowozin, and Andreas Geiger. Adversarialvariational bayes: Unifying variational autoencoders and generativeadversarial networks. arXiv preprint arXiv:1701.04722, 2017.[117] Fausto Milletari, Nassir Navab, and Seyed-Ahmad Ahmadi. V-net:Fully convolutional neural networks for volumetric medical image seg-mentation. In 2016 Fourth International Conference on 3D Vision(3DV), pages 565–571, 10 2016.[118] John Moore, Colin Clarke, Daniel Bainbridge, Chris Wedlake, AndrewWiles, Danielle Pace, and Terry Peters. Image guidance for spinalfacet injections using tracked ultrasound. In Medical Image Com-puting and Computer-Assisted Intervention–MICCAI, pages 516–523.Springer, 2009.[119] Christopher JL Murray, Theo Vos, Rafael Lozano, Mohsen Naghavi,Abraham D Flaxman, Catherine Michaud, Majid Ezzati, KenjiShibuya, Joshua A Salomon, and Safa Abdalla. Disability-adjusted119life years (dalys) for 291 diseases and injuries in 21 regions, 1990–2010: a systematic analysis for the global burden of disease study2010. The Lancet, 380(9859):2197–2223, 2013.[120] Christopher JL Murray, Theo Vos, Rafael Lozano, Mohsen Naghavi,Abraham D Flaxman, Catherine Michaud, Majid Ezzati, KenjiShibuya, Joshua A Salomon, Safa Abdalla, et al. Disability-adjustedlife years (dalys) for 291 diseases and injuries in 21 regions, 1990–2010:a systematic analysis for the global burden of disease study 2010. TheLancet, 380(9859):2197–2223, 2012.[121] Cosmas Mwikirize, John L. Nosher, and Ilker Hacihaliloglu. Convolu-tion neural networks for real-time needle detection and localization in2d ultrasound. International Journal of Computer Assisted Radiologyand Surgery, 13(5):647–657, 2018.[122] Sameer A Nene, Shree K Nayar, Hiroshi Murase, et al. Columbiaobject image library (coil-20). 1996.[123] Long D. Nguyen, Dongyun Lin, Zhiping Lin, and Jiuwen Cao. Deepcnns for microscopic image classification by exploiting transfer learningand feature concatenation. In 2018 IEEE International Symposium onCircuits and Systems (ISCAS), pages 1–5, 5 2018.[124] Ahtsham Uddin Niazi, Nidhi Haldipur, Arun G Prasad, and Vin-cent W Chan. Ultrasound-guided regional anesthesia performancein the early learning period: effect of simulation training. RegionalAnesthesia and Pain Medicine, 37(1):51–54, 2012.[125] AU Niazi, KJ Chin, R Jin, and VW Chan. Real-time ultrasound-guided spinal anesthesia using the sonixgps ultrasound guidancesystem: a feasibility study. Acta Anaesthesiologica Scandinavica,58(7):875–881, 2014.[126] RWD Nickalls and MS Kokri. The width of the posterior epiduralspace in obstetric patients. Anaesthesia, 41(4):432–433, 1986.[127] Dong Nie, Roger Trullo, Jun Lian, Caroline Petitjean, Su Ruan, QianWang, and Dinggang Shen. Medical image synthesis with context-aware generative adversarial networks. In International Conference onMedical Image Computing and Computer-Assisted Intervention, pages417–425. Springer, 2017.120[128] Hiromitsu Nishizaki. Data augmentation and feature extraction usingvariational autoencoder for acoustic modeling. In Asia-Pacific Signaland Information Processing Association Annual Summit and Confer-ence (APSIPA ASC), 2017, pages 1222–1227. IEEE, 2017.[129] Jochen Obernauer, Klaus Galiano, Hannes Gruber, Reto Bale,Alois Albert Obwegeser, Reinhold Schatzer, and Alexander Loizides.Ultrasound-guided versus computed tomography-controlled facet jointinjections in the middle and lower cervical spine: a prospective ran-domized clinical trial. Medical Ultrasonography Journal, 15(1):10–5,2013.[130] Ki Deok Park, Tai Kon Kim, Woo Yong Lee, JaeKi Ahn, Sung HoonKoh, and Yongbum Park. Ultrasound-guided versus fluoroscopy-guided caudal epidural steroid injection for the treatment of unilaterallower lumbar radicular pain: Case-controlled, retrospective, compara-tive study. Medicine, 94(50):e2261, 2015.[131] Emanuel Parzen. On estimation of a probability density function andmode. The Annals of Mathematical Statistics, 33(3):1065–1076, 1962.[132] Alberto Pasqualucci, Giustino Varrassi, Antonio Braschi, Vito AldoPeduto, Andrea Brunelli, Franco Marinangeli, Fabio Gori, FrancescaColo`, Antonella Paladini, and Francesco Mojoli. Epidural local anes-thetic plus corticosteroid for the treatment of cervical brachial radic-ular pain: Single injection versus continuous infusion. The ClinicalJournal of Pain, 23(7):551–557, 2007.[133] Anahi Perlas, Luis E Chaparro, and Ki Jinn Chin. Lumbar neuraxialultrasound for spinal and epidural anesthesia: a systematic review andmeta-analysis. Regional Anesthesia and Pain Medicine, 41(2):251–260,2016.[134] Yuchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan,Chunyuan Li, and Lawrence Carin. Adversarial symmetric variationalautoencoder. In Advances in Neural Information Processing Systems,pages 4330–4339, 2017.[135] Hedyeh Rafii-Tari, Victoria A Lessoway, Allaudin A Kamani, PurangAbolmaesumi, and Robert Rohling. Panorama ultrasound for naviga-tion and guidance of epidural anesthesia. Ultrasound in Medicine &Biology, 41(8):2220–2231, 2015.121[136] Bahbibi Rahmatullah, Aris T Papageorghiou, and J Alison Noble.Integration of local and global features for anatomical object detectionin ultrasound. In Medical Image Computing and Computer-AssistedIntervention–MICCAI, pages 402–409. Springer, 2012.[137] Kashif Rajpoot, Alison Noble, Vicente Grau, and Nasir MahmoodRajpoot. Feature detection from echocardiography images using localphase information. In: Medical Image Understanding and Analysis,2008.[138] Abtin Rasoulian, Robert N Rohling, and Purang Abolmaesumi. Aug-mentation of paramedian 3d ultrasound images of the spine. In Infor-mation Processing in Computer-Assisted Interventions, pages 51–60.Springer, 2013.[139] Abtin Rasoulian, Alexander Seitel, Jill Osborn, Samira Sojoudi,Saman Nouranian, Victoria A Lessoway, Robert N Rohling, and Pu-rang Abolmaesumi. Ultrasound-guided spinal injections: a feasibilitystudy of a guidance system. International Journal of Computer As-sisted Radiology and Surgery, 10(9):1417–1425, 2015.[140] Carole A. Ridge, Afra Yildirim, Phillip M. Boiselle, Tomas Franquet,Cornelia M. Schaefer-Prokop, Denis Tack, Pierre Alain Gevenois, andAlexander A. Bankier. Differentiating between subsolid and solid pul-monary nodules at ct: Inter- and intraobserver agreement betweenexperienced thoracic radiologists. Radiology, 278(3):888–896, 2016.[141] Martin G Roberts, Tim F Cootes, Elisa Pacheco, Teik Oh, and Ju-dith E Adams. Segmentation of lumbar vertebrae using part-basedgraphs and active appearance models. In Medical Image Computingand Computer-Assisted Intervention–MICCAI 2009, pages 1017–1024.Springer, 2009.[142] Marc-Michel Rohe´, Manasi Datar, Tobias Heimann, Maxime Serme-sant, and Xavier Pennec. Svf-net: Learning deformable image regis-tration using shape matching. In International Conference on Medi-cal Image Computing and Computer-Assisted Intervention, pages 266–274. Springer, 2017.[143] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convo-lutional networks for biomedical image segmentation. In Lecture Notesin Computer Science, volume 9351, pages 234–241. Springer, 2015.122[144] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convo-lutional networks for biomedical image segmentation. In InternationalConference on Medical Image Computing and Computer-assisted In-tervention, pages 234–241. Springer, 2015.[145] Devon I Rubin. Epidemiology and risk factors for spine pain. Neuro-logic Clinics, 25(2):353–371, 2007.[146] Oliver Ruprecht, Philipp Weisser, Boris Bodelle, Hanns Ackermann,and Thomas J. Vogl. Mri of the prostate: Interobserver agreementcompared with histopathologic outcome after radical prostatectomy.European Journal of Radiology, 81(3):456–460, 2012.[147] R Russell, M Popat, E Richards, and J Burry. Combined spinal epidu-ral anaesthesia for caesarean section: a randomised comparison of ox-ford, lateral and sitting positions. International Journal of ObstetricAnesthesia, 11(3):190–195, 2002.[148] Tara N Sainath, Brian Kingsbury, George Saon, Hagen Soltau, Abdel-rahman Mohamed, George Dahl, and Bhuvana Ramabhadran. Deepconvolutional neural networks for large-scale speech tasks. Neural Net-works, 64:39–48, 2015.[149] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, AlecRadford, Xi Chen, and Xi Chen. Improved techniques for traininggans. In D. D. Lee, M. Sugiyama, U. V. Luxburg, I. Guyon, andR. Garnett, editors, Advances in Neural Information Processing Sys-tems 29, pages 2234–2242. 2016.[150] Ana Ellen Q Santiago, Plinio C Leal, Elmiro Helio M Bezerra,Ana Laura A Giraldes, Leonardo C Ferraro, Andre H Rezende, andRioko Kimiko Sakata. Ultrasound-guided facet block to low back pain:a case report. Brazilian Journal of Anesthesiology (English Edition),2014.[151] Hiroki Sato, Kiyoaki Tsukahara, Ray Motohashi, Minoru Endo, andKazuhiro Nakamura. Application of cervical epidural anesthesia in pa-tients with head and neck carcinoma. International Cancer ConferenceJournal, 4(2):122–126, 2014.[152] Tanya Schmah, Geoffrey E Hinton, Steven L Small, Stephen Strother,and Richard S Zemel. Generative versus discriminative training of123rbms for classification of fmri images. In Advances in Neural Informa-tion Processing Systems, pages 1409–1416, 2009.[153] Nalini Sehgal, Rinoo V Shah, Anne Marie McKenzie-Brown, and Clif-ford R Everett. Diagnostic utility of facet (zygapophysial) joint in-jections in chronic spinal pain: a systematic review of evidence. PainPhysician, 8(2):211–224, 2005.[154] Torgyn Shaikhina, Dave Lowe, Sunil Daga, David Briggs, RobertHiggins, and Natasha Khovanova. Machine learning for predictivemodelling based on small data in biomedical engineering. IFAC-PapersOnLine, 48(20):469–474, 2015.[155] W. Shao, L. Sun, and D. Zhang. Deep active learning for nucleusclassification in pathology images. In 2018 IEEE 15th InternationalSymposium on Biomedical Imaging (ISBI 2018), pages 199–202, April2018.[156] H. Shin, M. Orton, D. J. Collins, S. Doran, and M. O. Leach. Autoen-coder in time-series analysis for unsupervised tissues characterisationin a large unlabelled medical image dataset. In 2011 10th InternationalConference on Machine Learning and Applications and Workshops,volume 1, pages 259–264, Dec 2011.[157] Richard Silbergleit, Bharat A. Mehta, William P. Sanders, and San-jay J. Talati. Imaging-guided injection techniques with fluoroscopyand ct for spinal pain management. RadioGraphics, 21(4):927–939,2001.[158] Brian D Sites, Brian C Spence, John D Gallagher, Christopher W Wi-ley, Marc L Bertrand, and George T Blike. Characterizing novice be-havior associated with learning ultrasound-guided peripheral regionalanesthesia. Regional Anesthesia and Pain Medicine, 32(2):107–115,2007.[159] Paul Smolensky. Information processing in dynamical systems: Foun-dations of harmony theory. Technical report, COLORADO UNIV ATBOULDER DEPT OF COMPUTER SCIENCE, 1986.[160] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structuredoutput representation using deep conditional generative models. InAdvances in Neural Information Processing Systems, pages 3483–3491,2015.124[161] Thorvald Sørensen. A method of establishing groups of equal ampli-tude in plant sociology based on similarity of species and its appli-cation to analyses of the vegetation on danish commons. BiologiskeSkrifter, 5:1–34, 1948.[162] Karthikeyan Kallidaikurichi Srinivasan, Gabriella Iohom, FrankLoughnane, and Peter J Lee. Conventional landmark-guided mid-line versus preprocedure ultrasound-guided paramedian techniques inspinal anesthesia. Anesthesia & Analgesia, 121(4):1089–1096, 2015.[163] Akash Srivastava, Lazar Valkoz, Chris Russell, Michael U Gutmann,and Charles Sutton. Veegan: Reducing mode collapse in gans us-ing implicit variational learning. In Advances in Neural InformationProcessing Systems, pages 3308–3318. 2017.[164] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever,and Ruslan Salakhutdinov. Dropout: A simple way to prevent neuralnetworks from overfitting. The Journal of Machine Learning Research,15(1):1929–1958, 2014.[165] P. Stanitsas, A. Cherian, A. Truskinovsky, V. Morellas, and N. Pa-panikolopoulos. Active convolutional neural networks for canceroustissue recognition. In 2017 IEEE International Conference on ImageProcessing (ICIP), pages 1367–1371, Sept 2017.[166] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, ScottReed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, andAndrew Rabinovich. Going deeper with convolutions. In 2015 IEEEConference on Computer Vision and Pattern Recognition (CVPR),pages 1–9, 6 2015.[167] Alexander R Toews, Simon Massey, Vit Gunka, Victoria A Lessoway,and Robert N Rohling. Projectalign: a real-time ultrasound guidancesystem for spinal midline detection during epidural needle placement.In Medical Imaging 2018: Image-Guided Procedures, Robotic Inter-ventions, and Modeling, volume 10576, page 1057630. InternationalSociety for Optics and Photonics, 2018.[168] De QH Tran, Andrea P Gonza´lez, Francisca Bernucci, and Roderick JFinlayson. Confirmation of loss-of-resistance for epidural analgesia.Regional Anesthesia and Pain Medicine, 40(2):166–173, 2015.125[169] Denis Tran, King-Wei Hor, Allaudin A Kamani, Victoria A Lessoway,and Robert N Rohling. Instrumentation of the loss-of-resistance tech-nique for epidural needle insertion. IEEE Transactions on BiomedicalEngineering, 56(3):820–827, 2009.[170] Denis Tran, Allaudin A Kamani, Elias Al-Attas, Victoria A Lessoway,Simon Massey, and Robert N Rohling. Single-operator real-timeultrasound-guidance to aim and insert a lumbar epidural needle.Canadian Journal of Anesthesia/Journal Canadien D’anesthe´sie,57(4):313–321, 2010.[171] Denis Tran, Allaudin A. Kamani, Victoria A. Lessoway, Carly Peter-son, King Wei Hor, and Robert N. Rohling. Preinsertion paramedianultrasound guidance for epidural anesthesia. Anesthesia & Analgesia,109(2):661–667, 2009.[172] Denis Tran and Robert N Rohling. Automatic detection of lumbaranatomy in ultrasound images of human subjects. IEEE Transactionson Biomedical Engineering, 57(9):2248–2256, 2010.[173] Toan Tran, Trung Pham, Gustavo Carneiro, Lyle Palmer, and IanReid. A bayesian data augmentation approach for learning deep mod-els. In Advances in Neural Information Processing Systems, pages2794–2803, 2017.[174] Alexander Tyler Clark, Farzad Wong, and Khalvati1 MasoomB AHaider. Fully deep convolutional neural networks for segmentationof the prostate gland in diffusion-weighted mr images. In Image Anal-ysis and Recognition: 14th International Conference, ICIAR 2017,Montreal, QC, Canada, July 5–7, 2017, Proceedings, volume 10317,page 97. Springer, 2017.[175] Tamas Ungi, Purang Abolmaesumi, Rayhan Jalal, Mattea Welch,Irene Ayukawa, Simrin Nagpal, Andras Lasso, Melanie Jaeger,Daniel P Borschneck, Gabor Fichtinger, and Parvin Mousavi. Spinalneedle navigation by tracked ultrasound snapshots. IEEE Transac-tions on Biomedical Engineering, 59(10):2766–2772, 2012.[176] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-AntoineManzagol. Extracting and composing robust features with denoisingautoencoders. In Proceedings of the 25th International Conference onMachine Learning, ICML ’08, pages 1096–1103, New York, NY, USA,2008. ACM.126[177] Pascal Vincent, Hugo Larochelle, Isabelle Lajoie, Yoshua Bengio, andPierre-Antoine Manzagol. Stacked denoising autoencoders: Learninguseful representations in a deep network with a local denoising cri-terion. Journal of Machine Learning Research, 11(Dec):3371–3408,2010.[178] RD Vincent and DH Chestnut. Which position is more comfortablefor the parturient during identification of the epidural space? Inter-national Journal of Obstetric Anesthesia, 1(1):9–11, 1991.[179] Alexey G Voloshin. Four-dimensional ultrasound guidance duringepidural anaesthesia. Journal of Ultrasound, 18(2):135–142, 2015.[180] Robert F Wagner, Stephen W Smith, John M Sandrik, and HectorLopez. Statistics of speckle in ultrasound b-scans. IEEE Transactionson Sonics and Ultrasonics, 30(3):156–163, 1983.[181] Zhiqiang Wan, Yazhou Zhang, and Haibo He. Variational autoencoderbased synthetic data generation for imbalanced learning. In 2017 IEEESymposium Series on Computational Intelligence (SSCI), pages 1–7,11 2017.[182] Haibo Wang, Angel Cruz-Roa, Ajay Basavanhally, Hannah Gilmore,Natalie Shih, Mike Feldman, John Tomaszewski, Fabio Gonzalez, andAnant Madabhushi. Mitosis detection in breast cancer pathology im-ages by combining handcrafted and convolutional neural network fea-tures. Journal of Medical Imaging, 1(3):034003–034003, 2014.[183] Jinhua Wang, Xi Yang, Hongmin Cai, Wanchang Tan, Cangzheng Jin,and Li Li. Discrimination of breast cancer with microcalcifications onmammography by deep learning. Scientific Reports, 6:27327, 2016.[184] Brian R Waterman, Philip J Belmont Jr, and Andrew J Schoenfeld.Low back pain in the united states: incidence and risk factors forpresentation in the emergency setting. The Spine Journal, 12(1):63–70, 2012.[185] Jason Weston, Fre´de´ric Ratle, Hossein Mobahi, and Ronan Collobert.Deep learning via semi-supervised embedding. In Neural Networks:Tricks of the Trade, pages 639–655. Springer, 2012.[186] Svante Wold, Kim Esbensen, and Paul Geladi. Principal componentanalysis. Chemometrics and Intelligent Laboratory Systems, 2(1-3):37–52, 1987.127[187] Jelmer M Wolterink, Tim Leiner, Bob D de Vos, Robbert W vanHamersvelt, Max A Viergever, and Ivana Isˇgum. Automatic coronaryartery calcium scoring in cardiac ct angiography using paired convo-lutional neural networks. Medical Image Analysis, 34:123–136, 2016.[188] Sebastien C. Wong, Adam Gatt, Victor Stamatescu, and Mark D. Mc-Donnell. Understanding data augmentation for classification: When towarp? In 2016 International Conference on Digital Image Computing:Techniques and Applications (DICTA), pages 1–6, 11 2016.[189] Tao Wu, Wei-hua Zhao, Yan Dong, Hai-xin Song, and Jian-hua Li.Effectiveness of ultrasound-guided versus fluoroscopy or computedtomography scanning guidance in lumbar facet joint injections inadults with facet joint syndrome: a meta-analysis of controlled trials.Archives of Physical Medicine and Rehabilitation, 97(9):1558–1563,2016.[190] Yawen Xiao, Jun Wu, Zongli Lin, and Xiaodong Zhao. A semi-supervised deep learning method based on stacked sparse auto-encoderfor cancer prediction using rna-seq data. Computer Methods and Pro-grams in Biomedicine, 166:99–105, 2018.[191] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deepembedding for clustering analysis. In International Conference onMachine Learning, pages 478–487, 2016.[192] Xinpeng Xie, Yuexiang Li, and Linlin Shen. Active learning for breastcancer identification. arXiv preprint arXiv:1804.06670, 2018.[193] Jun Xu, Lei Xiang, Qingshan Liu, Hannah Gilmore, Jianzhong Wu,Jinghai Tang, and Anant Madabhushi. Stacked sparse autoencoder(ssae) for nuclei detection on breast cancer histopathology images.IEEE Transactions on Medical Imaging, 35(1):119–130, 2016.[194] Ge Yang, Jinfeng Liu, Liangjuan Ma, Zhenhua Cai, Chao Meng, Si-hua Qi, and Huacheng Zhou. Ultrasound-guided versus fluoroscopy-controlled lumbar transforaminal epidural injections: A prospectiverandomized clinical trial. The Clinical Journal of Pain, pages 103–108, 2015.[195] Lin Yang, Yizhe Zhang, Jianxu Chen, Siyuan Zhang, and Danny Z.Chen. Suggestive annotation: A deep active learning framework forbiomedical image segmentation. In Medical Image Computing and128Computer-Assisted Intervention 2017, pages 399–407, Cham, 2017.Springer International Publishing.[196] Moi Hoon Yap, Gerard Pons, Joan Mart´ı, Sergi Ganau, Melcior Sent´ıs,Reyer Zwiggelaar, Adrian K Davison, and Robert Mart´ı. Automatedbreast ultrasound lesions detection using convolutional neural net-works. IEEE Journal of Biomedical and Health Informatics, 2017.[197] Jung Hyun Yoon, Myung Hyun Kim, Eun-Kyung Kim, Hee JungMoon, Jin Young Kwak, and Min Jung Kim. Interobserver variabil-ity of ultrasound elastography: How it affects the diagnosis of breastlesions. American Journal of Roentgenology, 196(3):730–736, 2011.[198] Steve H Yoon, Sarah Lee OBrien, and Mike Tran. Ultrasound guidedspine injections: Advancement over fluoroscopic guidance? CurrentPhysical Medicine and Rehabilitation Reports, 1(2):104–113, 2013.[199] Shuang Yu, Kok Kiong Tan, Ban Leong Sng, Shengjin Li, and AlexTiong Heng Sia. Lumbar ultrasound image feature extraction andclassification with support vector machine. Ultrasound in Medicine &Biology, 41(10):2677–2689, 2015.[200] Shuang Yu, Kok Kiong Tan, Ban Leong Sng, Shengjin Li, and AlexTiong Heng Sia. Real-time automatic spinal level identification withultrasound image processing. In Biomedical Imaging (ISBI), 2015IEEE 12th International Symposium on, pages 243–246. IEEE, 2015.[201] Mohammad Reza Zare, Woo Chaw Seng, and Ahmed Mueen. Au-tomatic classification of medical x-ray images using a bag of visualwords. IET Computer Vision, 7(2):105–114, 2013.[202] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A. Efros. Un-paired image-to-image translation using cycle-consistent adversarialnetworks. In 2017 IEEE International Conference on Computer Vi-sion (ICCV), pages 2242–2251, 10 2017.129Appendix AKL divergence of ICVAEThe goal of the ICVAE is to learn a distribution q(z|x), which approximatesthe true posterior p(z|x). This is achieved by minimizing the KL divergence:KL(q(z|x)||p(z|x)) = −∑zq(z|x) log p(z|x)q(z|x)By conditioning p(x, y, z) over x and substituting p(z|x):KL(q(z|x)||p(z|x)) = −∑zq(z|x) log p(x, y, z)p(y|z, x)p(x)q(z|x)= −∑zq(z|x) log p(x, y, z)p(y|z, x)q(z|x)+∑zq(z|x) log p(x)= −∑zq(z|x) log p(x, y, z)p(y|z, x)q(z|x)+ log p(x)=⇒ log p(x) =∑zq(z|x) log p(x, y, z)p(y|z, x)q(z|x)+KL(q(z|x)||p(z|x))Minimizing the KL divergence directly is not straightforward. On the otherhand, log p(x) is independent of z, i.e., a constant. Therefore, we can max-imize the first term of the right hand side of the above equation, which isreferred to as the variational lower bound.130Appendix BICVAE’s variational lowerboundBy conditioning p(x, y, z) over z, the variational lower bound can be rewrit-ten as:L =∑zq(z|x) log p(y|z, x)p(x|z)p(z)p(y|z, x)q(z|x)=∑zq(z|x) log p(x|z) +∑zq(z|x) log p(z)q(z|x)The second term corresponds to the negative KL divergence between q(z|x)and p(z). p(x, y, z) can be rewritten as:p(x, y, z) = p(x|y, z)p(y|z)p(z)= p(y|x, z)p(x|z)p(z)and since p(z) ∼ N (µ, σ) 6= 0 and y and z are independent, using the aboveequation, p(x|z) can be written as:p(x|z) = p(x|y, z)p(y|z)p(y|x, z) =p(x|y, z)p(y)p(y|x)Therefore:L =∑zq(z|x) log p(x|z)−KL(q(z|x)||p(z))=∑zq(z|x) log p(x|y, z)p(y)p(y|x) −KL(q(z|x)||p(z))=∑zq(z|x) log p(x|y, z) +∑zq(z|x) log p(y)−∑zq(z|x) log p(y|x)−KL(q(z|x)||p(z))131Finally, since∑z q(z|x) = 1 and log p(y) and log p(y|x) are independent ofz, the variational lower bound can be rewritten as:L =Eq(z|x)[log p(x|y, z)]−KL(q(z|x)||p(z)) + log p(y)− log p(y|x).132Appendix CStandard Errorlog p(Y ) − log p(Y |X) can be viewed as a Monte Carlo sampling error, be-cause p(Y |X) is given over a mini-batch, which is psuedo-randomly selectedfrom D and p(Y ) is the true prior of the labels. Let µy and y¯ be the mean ofthe true prior and the mean of a batch with size N , respectively. Therefore,y¯ = 1/N∑Ni=1 yi. Since each batch is selected randomly from the dataset,the expected value of yi is µy. Therefore the expectation of the error (µy− y¯)is:E[µy − y¯] = µy − E[1/N∑iyi]= µy − 1/N∑iE[yi]= µy −N × 1/Nµy = 0.Since there is no bias in selecting the batches during training, the ex-pected value of the error is zero. However, the error has a non-zero variance,which depends on the size of the batch. This can be shown by taking thevariance of y¯ − µy, which is:V (y¯ − µy) = V(1/N∑iyi)− 0 = 1/N2V (∑iyi).Since samples in a batch are independent, the variance of their sum equalsthe sum of their variance. Therefore, V (y¯ − µy) = 1/N2∑i V (yi), whichimplies that V (y¯−µy) = 1N2NV (y). V (y) is the variance of all labels. SinceE[y¯ − µy] = 0 and V (y¯ − µy) = V (y)/N , the central limit theorem suggeststhat the error would have a normal distribution with mean 0 and standarddeviation√V (y)/N , if N is large enough (N (0,√V (y)/N)). Notice thata large N would reduce the standard deviation of the Monte Carlo sam-pling error, and accordingly, the difference between the distributions of therandomly selected sample and the true prior would be negligible.133


Citation Scheme:


Citations by CSL (citeproc-js)

Usage Statistics



Customize your widget with the following options, then copy and paste the code below into the HTML of your page to embed this item in your website.
                            <div id="ubcOpenCollectionsWidgetDisplay">
                            <script id="ubcOpenCollectionsWidget"
                            async >
IIIF logo Our image viewer uses the IIIF 2.0 standard. To load this item in other compatible viewers, use this url:


Related Items