Robust and Fast T2 Decay Analysis forMeasuring Myelin Water in MRIbyYoungjin YooB.S., Kwangwoon University, 2003M.S., The State University of New York at Buffalo, 2005a thesis submitted in partial fulfillmentof the requirements for the degree ofMaster of Applied Scienceinthe faculty of graduate and postdoctoralstudies(Biomedical Engineering)The University of British Columbia(Vancouver)October 2013c? Youngjin Yoo, 2013AbstractMyelin is an essential component of nerve fibers and monitoring its healthis important for studying neurological diseases that attack myelin, such asmultiple sclerosis. The amount of water trapped within myelin, which is asurrogate for myelin content and integrity, can be measured in vivo usingMRI relaxation techniques that acquire a series of images at multiple echotimes to produce a T2 decay curve at each voxel. These curves are thenanalyzed, most commonly using non-negative least squares (NNLS) fitting,to produce T2 distributions from which water measurements are made. T2decay analysis using NNLS has two main challenges: instability and highcomputational demands. The main contributions of this thesis are: 1) wepropose a new regularization algorithm in which local and non-local infor-mation is gathered and used adaptively for each voxel and 2) we proposea hybrid utilization method of multicore CPUs and GPUs to improve theefficiency of a T2 decay analysis, with consideration of the increased compu-tational burden of regularization and careful analysis of which algorithmiccomponents would benefit from multicore CPU vs. GPU parallelization. Ourresults demonstrated that the proposed regularization method provided moreglobally consistent myelin water measurements, yet preserved fine structures.Our experiment with real patient data suggested that the algorithm improvedthe reproducibility and the ability to distinguish between the myelin maps ofmultiple sclerosis patients and healthy subjects. We also demonstrated ouroptimized implementation?s performance by comparing with a parallelizediiimplementation written in MATLAB which is the most commonly used plat-form for the analysis of T2 relaxation data. We found an improvement inspeed of over 4? when computing single seven-slice myelin map and over14? for a batch processing using the same number of CPU cores.iiiPrefaceAll of the work presented henceforth was conducted in the MS/MRI ResearchGroup at the University of British Columbia, Point Grey campus.Chapter 1. Figures are used with permission from applicable sources, Dr. Cor-ree Laule and Dr. Alex MacKay.Chapter 2. An earlier version of the work presented in this chapter has beenpublished to the MICCAI 2013 conference [Youngjin Yoo and Roger Tam.Non-local spatial regularization of MRI T2 relaxation images for myelin waterquantification. In Proceedings of Medical Image Computing and ComputerAssisted Intervention, vol. 8149, pp. 614-621, 2013]. The data used in thischapter was provided by Dr. Alex MacKay and his research group.Chapter 3. The MATLAB code for performing myelin water fraction (MWF)analysis using stimulated echo correction and the region of interest masksfor the concussion data were provided by Thomas Prasloski and Dr. AlexMacKay at the University of British Columbia.ivTable of ContentsAbstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iiPreface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ivTable of Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . vList of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiList of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . viiiGlossary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ixAcknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . xi1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.1 Myelin Structure and Function . . . . . . . . . . . . . . . . . . 11.2 MR Techniques Used to Measure Myelin . . . . . . . . . . . . 21.3 Tissue Relaxation . . . . . . . . . . . . . . . . . . . . . . . . . 41.4 Myelin Water Imaging . . . . . . . . . . . . . . . . . . . . . . 51.5 Two Key Challenges in T2 Decay Analysis . . . . . . . . . . . 71.6 Hypotheses and Objectives . . . . . . . . . . . . . . . . . . . . 81.7 Overview of Thesis . . . . . . . . . . . . . . . . . . . . . . . . 102 Non-Local Spatial Regularization . . . . . . . . . . . . . . . . 11v2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.2 T2 Decay Analysis . . . . . . . . . . . . . . . . . . . . . . . . 132.3 Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172.4.1 Visual assessment . . . . . . . . . . . . . . . . . . . . . 202.4.2 Coefficient of variation . . . . . . . . . . . . . . . . . . 202.4.3 Student?s t-test and overlapping coefficient . . . . . . . 233 Fast Myelin Water Measurement using Multicore Archi-tectures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283.2 Flip Angle Optimization . . . . . . . . . . . . . . . . . . . . . 303.3 Implementation Using Multicore CPUs and GPU . . . . . . . 313.3.1 Solving NNLS for each basis flip angle(Module 1) . . . . . . . . . . . . . . . . . . . . . . . . 313.3.2 Interpolation to find an optimal flip angle (Module 2) . 343.3.3 NNLS basis matrix computation for an optimal flipangle (Module 3) . . . . . . . . . . . . . . . . . . . . . 353.3.4 Estimating a T2 distribution with the regularized NNLS(Module 4) . . . . . . . . . . . . . . . . . . . . . . . . 353.3.5 Computing a non-local spatial prior (Module 5) . . . . 363.3.6 Creating multiple instances . . . . . . . . . . . . . . . 363.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414.1 Non-local Spatial Regularization . . . . . . . . . . . . . . . . . 414.2 Fast Myelin Water Measurement using Multicore Architectures 42Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44viList of TablesTable 2.1 Coefficient of variation measured on four healthy subjectsscanned at four time points for each method. . . . . . . . . 23Table 2.2 Overlapping coefficients and p-values for distinguishing be-tween 18 healthy subjects and 17 MS patients. . . . . . . . 25Table 3.1 Multicore architecture and C++ libraries used for imple-menting our T2 decay analysis algorithm . . . . . . . . . . 31Table 3.2 Runtime comparisons of our implementation and the paral-lelized MATLAB implementation for each algorithmic com-ponent . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39Table 3.3 Runtime comparisons of our implementation and the par-allelized MATLAB implementation for a full scan of T2images with 32 echoes and a batch processing of 10 scansof T2 images with 32 echoes . . . . . . . . . . . . . . . . . . 39Table 3.4 Mean myelin water fraction and standard deviation com-parisons of our implementation and the parallelized MAT-LAB implementation . . . . . . . . . . . . . . . . . . . . . 40viiList of FiguresFigure 1.1 The CNS myelin sheath surrounding an axon . . . . . . . 2Figure 1.2 T2 decay curve and T2 distribution from human white matter 7Figure 1.3 An example of noisy myelin water fraction map. . . . . . . 9Figure 2.1 Plot showing the effect of changes in non-local spatialregularization parameter values on the longitudinal repro-ducibility with the Tikhonov regularization. . . . . . . . . 18Figure 2.2 Plot showing the effect of changes in non-local spatialregularization parameter values on the longitudinal repro-ducibility without the Tikhonov regularization. . . . . . . 19Figure 2.3 Visual comparison of myelin water maps produced by dif-ferent regularization methods. The Tikhonov regulariza-tion was used in combination. . . . . . . . . . . . . . . . . 21Figure 2.4 Visual comparison of myelin water maps produced by dif-ferent regularization methods. The Tikhonov regulariza-tion was not used in combination. . . . . . . . . . . . . . . 22Figure 2.5 Visual comparison of a myelin water fraction map andstructural MRI images . . . . . . . . . . . . . . . . . . . . 26Figure 2.6 Mean MWF values produced by different regularizationmethods from 18 healthy subjects and 17 MS patients. . . 27viiiGlossaryRF radiofrequencyMWF myelin water fractionCNS central nervous systemsMS multiple sclerosisWM white matterGM gray matterCSF cerebrospinal fluidNAWM normal appearing white matterNNLS non-negative least squaresrNNLS regularized NNLSsrNNLS spatially regularized NNLSsNNLS spatially regularized NNLS without the TikhonovregularizationnlsrNNLS non-local spatially regularized NNLSnlNNLS non-local spatially regularized NNLS without the TikhonovregularizationixSKL symmetric Kullback-LeiblerCOV coefficient of variationOVL overlapping coefficientMWI myelin water imagingPCT parallel computing toolboxMPSV magnetization phase state vectorFLAIR fluid attenuated inversion recoveryGPU graphics processing unitsAPI application programming interfaceMKL math kernel libraryOpenMP open message passingBLAS basic linear algebra sub-programsLAPACK linear algebra packageCUDA compute unified device architectureROI region of interestSD standard deviationFA flip angleHMC++ hybrid multicore C++xAcknowledgmentsThis thesis would not have been possible without the help and support ofmany people. Most of all, I would like to thank my supervisor Roger Tamfor his invaluable guidance and giving me the freedom to explore my ownideas. I would also like to thank many fellows at UBC MS/MRI ResearchGroup for discussing ideas, answering questions, and providing feedback onmy research: Andrew Riddehough, David Li and Tony Traboulsee. Thewonderful support from Ken Bigelow and Vilmos Soti at UBC MS/MRIResearch Group during my MASc study has allowed me to complete my worktoward MASc. Without the MS/MRI Research Group, this thesis wouldnot exist. Thanks! My thanks to Irene Vavasour and Thomas Prasloskifor providing the reference MATLAB implementation code and the myelinwater imaging (MWI) data, and Alex MacKay and Corree Loule for providingliteratures on the MWI technique and productive suggestions. I am also verythankful to my thesis examiners, Dr. Purang Abolmaesumi and Dr. RafeefAbugharbieh for their encouragement and insightful comments. Tom Broschat UBC MS/MRI Research Group was a great help for me to implementmy work using multicore CPUs and GPUs. Saurabh Garg and ChungfangWang were my invaluable colleagues in our lab who were always willing tohelp me and made my lab life very exciting. Finally, I would really like tothank my family including my mother and father, and especially my lovelywife Heeyeon Choi and my adorable daughter Ahrin Yoo for their tirelesslove and support.xiChapter 1Introduction1.1 Myelin Structure and FunctionMyelin is a lipid-protein lamellar membranous structure enveloping axonsin both the central and peripheral nervous system of vertebrates [1, 2]. Inthe central nervous systems (CNS), most myelin is found in white matter(WM) and a small amount of myelin is present in gray matter (GM). TheCNS myelin is tightly made up of oligodendrocyte cell membranes, which arecovered around the axon in a concentric lamellar fashion (Figure 1.1).Myelin is a key component in the CNS because myelin acts as an electricalinsulator for neurons [3]. Myelin increases the speed of action potentialtransmission by 10 to 100 times compared to that along an unmyelinatedaxon. The action potential moves from one node of Ranvier to another(Figure 1.1), in which the internodal myelin acts as an insulator of highelectrical resistance and low conductance. The resulting saltatory conductionof the action potential is much faster than the continuous conduction bysodium channels evenly distributed along the unmyelinated axon. Myelin isof critical importance because speed of conduction is fundamental in allowingcomplex motor, sensory, and behavior of neuronal functions to occur. Alongwith its conduction function, myelin has also been implicated in regulating11.2. MR Techniques Used to Measure MyelinFigure 1.1: The CNS myelin sheath surrounding an axon. Image cour-tesy of Dr. Corree Laule and Dr. Alex MacKay, the Universityof British Columbiaaxonal transport, maintaining axonal integrity, altering pH, and regulatingfluid volume and ion composition. This highlights the intimate associationbetween myelin and the underlying axon [3].1.2 MR Techniques Used to Measure MyelinMagnetic resonance imaging (MRI) uses a strong main magnetic field, B0,to align the protons in human body parallel and anti-parallel to the mainfield. The signal detecting device of MRI cannot be collimated to restrictthe signal to a specific location, which means that the MR signal originates21.2. MR Techniques Used to Measure Myelinfrom the entire object rather than a single point within it [4]. Thus, to formMR images it is necessary to determine the location of the signal throughsome form of encoding scheme. A set of smaller magnetic field gradients,along X, Y, and Z axises, are used to spatially encode the protons and aradiofrequency (RF) coil is used to excite the sample and to receive RFsignal from the protons. The spatially encoded RF signal is reconstructedinto an image. The signal intensity of each voxel in the image is a functionof the number of protons in the voxel, the spin-spin relaxation time (T2) andthe spin-lattice relaxation time (T1).It is difficult to directly measure myelin because its signal decay is ex-tremely short and the MR signal of myelin is indistinguishable from other MRsignal of CNS tissue [5]. Ultrashort echo time imaging [6] and P spectroscopy[7] have been studied for directly imaging myelin in vivo, however much moreresearch is required before these techniques will be used for clinical applica-tion. Practically, MR signal of myelin is obtained by indirect means suchas conventional T1 or T2 weighted imaging, spectroscopy, diffusion tensorimaging and separation of T2 relaxation components [3].Conventional T1-weighted and T2-weighted images provide an excellentsoft tissue contrast. They are extensively used for studying myelin develop-ment of newborns because it is known that the only nonaqueous tissue addedin early brain development is myelin. However, they have limited applicationin measuring myelin for subjects older than 2 years of age. Unfortunately,there is not a fixed relationship between T1 or T2-weighting and myelination.The limited relationship between T1 or T2-weighting and myelination causesa relatively low specificity on myelin because many brain pathologies giverise to similar bright lesions.MR spectroscopy is a non-invasive analytical technique that has been usedto investigate metabolic changes in diseases affecting the brain. It potentiallyprovides some indirect information associated with myelin loss or myelinlipids. However, intact myelin can not be detected using it [3].31.3. Tissue RelaxationDiffusion tensor imaging (DTI) enables the measurement of the restricteddiffusion of water in tissue in order to produce neural tract images which pro-vide the kinetics of water molecules. It is known that myelin is not correlatedwith the existence of anisotropy in water diffusion in nerves [8]. However, in[9, 10], loss of myelin causes a change in fractional anisotropy. Thus, DTIcan be used to collect some information on changes in myelination. However,since the diffusion tensor components rely on a different orientation of thediffusion sensitizing gradients within the imaging voxel, they are not capableof reliably measuring absolute myelination.Magnetization transfer ratio (MTR) is used to detect the exchange ofmagnetization between nonaqueous tissue and water [11]. MTR has beendesigned to extract tissue parameters and is a very sensitive measure of tissuedamage. Also, it has been shown that MTR is affected by inflammation [12].Since MTR is influenced by many different processes such as tissue damageand inflammation, it is unlikely to be confirmed that MTR is a valid specificmeasure of myelination.In the opinion of the authors of [3], the short T2 component from waterin myelin provides the most specific measure of CNS myelination. This wasbest demonstrated by the experiment in [13], which showed linear relationsbetween myelin water fraction (MWF), which will be described in Section 1.4,and a histological measure of myelination. The MWF can be measured fromT2 relaxation and we present an expanded description on T2 relaxation inthe following sections.1.3 Tissue RelaxationT2 relaxation analysis of frog sciatic nerve was performed in 1978 by Vasilescuet al. [14]. They found three distinct T2 water compartments: a slowlyrelaxing compartment, T2 ? 300? 500 ms, attributed to water in the intra-cellular space, an intermediate relaxing component, T2 ? 80 ms, ascribedto axoplasmic water, and a fast relaxing component, T2 ? 20 ms, ascribed41.4. Myelin Water Imagingto water closely associated with proteins and phospholipids. It was the firstpaper to describe multi-exponential behavior of T2 water compartments inmyelinated nerves [15].In 1986, Kroeker et al. [16] investigated the problem of analyzing bio-logical relaxation times to allow the data determine the appropriate model.It was assumed that the data can be described by an integral of weightedexponents in their work. They used a curve fitting algorithm to solve integralequations for multi-exponential data [15].Menon et al. [17] showed that the short T2 peak (T2 ? 12 ms) wasattributed to water trapped between the myelin bilayers using cat brainsin 1991. Four distinct components were found by T2 decay curves fittingusing non-negative least squares (NNLS), acquired from myelinated cat brainnerves. The four components were: component near T2 = 1 ms attributed tophospholipid protons in the tissue, component near T2 = 12.7 ms attributedto water in the myelin layers, component near T2 = 89 ms attributed tointra and extra-cellular water, and a component not attributed to anythingat T2 = 340 ms. The short component, T2 = 12.7 ms, contributed 6.8% tothe total water in the T2 distributions.The first in vivo evidence that confirmed a supposition of a short T2component was shown by MacKay and his colleagues. In vivo T2 relaxationmeasurements [18, 19] distinguished three water compartments in the brain:water trapped between the myelin bilayers (15 ? T2 ? 40 ms), intra/extraaxonal water (T2 ? 80 ms) and water associated with cerebrospinal fluid(CSF) (T2 ? 2000 ms).1.4 Myelin Water ImagingQuantitative MRI techniques that are sensitive to specific tissue constituentsare important in clinical research. Multi-echo T2 relaxation is a quanti-tative MRI application used for the measurement of myelin content andintegrity. Myelin is a key component of the peripheral and CNS as de-51.4. Myelin Water Imagingscribed in Section 1.1, and measuring the amount of intact myelin accuratelyis of fundamental importance because demyelination, or loss of myelin, isa central pathological feature of a number of neuro-degenerative diseasessuch as multiple sclerosis (MS). For example, normal appearing white mat-ter (NAWM) (on standard conventional MRI scans) in MS has been observedto have less myelin than WM in control subjects, suggesting that myelin lossis a dominant feature of NAWM pathology in MS [20]. T2 relaxation ormyelin water imaging (MWI) [18] produces a normalized measure of myelincalled the MWF, defined as the ratio of the amount of water trapped withinmyelin to the total amount of water, which includes the intra- and extra-cellular water. Laule et al. have validated MWF with histopathology [13].They determined the correlation of MWF with quantitative histopathologicmeasures of myelin density. It was shown that MWF had a strong correlationwith myelin stain, validating the MR derived measure as a good indicator ofmyelin density.As described above, CNS tissue has multiple water components (Fig-ure 1.2), one of which is the water trapped within the myelin, the otherbeing the intracellular and extracellular water. The MR signal from thebrain is almost entirely from water and each water component has its ownrange of T2 relaxation values, and one of the main methods of MWF quan-tification is to acquire an MR signal at many echo times (32 echoes areknown to be appropriate to reproduce the two main T2 components) to pro-duce a T2 decay curve at each voxel using a multi-echo MRI sequence [18].The decay curves are then post-processed to produce T2 distributions fromwhich water content within a given range of T2 values can be derived. SinceCNS tissue actually has multiple components at the resolution of MRI (? 1mm in-plane resolution), T2 relaxation analysis in brain is complicated. Forreliable T2 estimations in the presence of multi-component T2 relaxation,multi-exponential relaxation decay analyses are required [21]. Fitting multi-exponential T2 relaxation decay curves can be used to estimate the amount61.5. Two Key Challenges in T2 Decay AnalysisFigure 1.2: (A) T2 decay curve and (B) T2 distribution from humanwhite matter. Inset in (B) shows a cross-section through an axonwith the locations of intra/extracelluar water and myelin water.Image courtesy of Dr. Corree Laule and Dr. Alex MacKay, theUniversity of British Columbiaof intra- and extracellular water and MWF in CNS tissue due to water com-partmentalization as shown in Figure 1.2.1.5 Two Key Challenges in T2 DecayAnalysisMany algorithms [22?29] convert T2 relaxation decay curves into exponen-tial components. The most established method of analyzing relaxation decaycurves is to use common least-squares algorithms. Whittall and MacKay pro-posed a multi-exponential decay analysis for MRI [18, 22] based on the NNLSalgorithm of Lawson and Hanson [30], which inverts a relaxation decay curvewith a relaxation time distribution. The analysis algorithm is performed tomeasure the amount of myelin at each voxel. For given decay curves, calcu-lating many of these decay curves over a range of logarithmically spaced T271.6. Hypotheses and Objectivesvalues (40 is known to be sufficient [19, 22]) ranging from 0.015 to 2 seconds[22] yields an objective function for the NNLS fit. Solving NNLS with 32echo times and 40 T2 values is an under-determined system (32 linear equa-tions and 40 unknown variables) and thus is an ill-posed problem. It is veryunstable especially in the presence of noise, which is a common situation inT2 decay curve acquisition using MRI. This results in limited reproducibilityof the analysis algorithm, which remains as one of the main obstacles in thepractical application of MWI. Instability in a T2 decay analysis algorithm us-ing NNLS can cause confusion in the analysis and interpretation of T2 decaycurves. Combined with noise and artifacts from T2 decay curve acquisition,the instability results in noisy MWF maps. Figure 1.3 shows an exampleof such noisy myelin measurements. In order to use the MWI technique inclinical applications, it would be crucial to improve the reproducibility, theone of the most important challenges in MWI techniques.Another challenge in T2 decay analysis is its long processing time. For ex-ample, our single-threaded reference MATLAB implementation of a currentalgorithm [31] takes over 2.5 hours to process a single set of T2 images with32 echoes and dimensions of 256 ? 256 ? 7 voxels using a 2.4 GHz proces-sor. T2 decay analysis is computationally demanding which has hamperedresearchers? efforts to develop advanced regularization methods [32, 33] oruse new 3D acquisition methods that provide much greater anatomical cov-erage [34].1.6 Hypotheses and ObjectivesThe two main challenges, limited reproducibility and high computationaldemands, are faced in T2 decay curve analysis as described in Section 1.5.Those challenges have been major obstacles of the MWI technique to beapplied in clinical studies, although it has been demonstrated that the tech-nique is very promising for improving our understanding of a number ofneuro-degenerative diseases such as MS. Our primary research goal is to81.6. Hypotheses and ObjectivesHealthy subject MS patient NNLS sNNLS nlNNLS 00.050.10.150.20.25Healthy subject MS patient NNLS sNNLS nlNNLS 00.050.10.150.20.25Figure 1.3: An example of noisy MWF map to show an instability in aT2 decay curve analysis. The MWF map was produced withoutincorporating regularization.overcome the limited reproducibility in T2 decay curve analysis. Our hypoth-esis is that the improved regularization method for T2 decay curve analysiswill provide greater reproducibility, robustness and sensitivity to MS-inducedchanges than current methods. Our second research objective is to reducethe processing time in T2 decay curve analysis algorithm. In general, paral-lelization through multicore graphics processing units (GPU)s yields betterperformance over parallelization through multicore CPUs, but GPU imple-mentation is sometimes not feasible for small matrices manipulation andrecursive computation. Therefore, we hypothesize that a systematic analy-sis on T2 decay curve analysis algorithm will provide efficient parallelizationapproaches such as a hybrid approach of multicore CPUs and GPUs, which91.7. Overview of Thesiswould eventually deliver processing speed improvement over current meth-ods.1.7 Overview of ThesisThis thesis focuses on the two key challenges in T2 decay analysis as describedabove: instability and long processing speed.First, Chapter 2 presents a novel non-local spatial regularization methodfor T2 decay analysis. Our spatial regularization does not rely on a fixed lo-cal brain region. In order to exploit the global information in brain withoutlosing important details, we construct T2 distribution vectors using neighbor-hood T2 distributions and each voxel is adaptively regularized using the sim-ilarity between T2 distribution vectors, which is estimated by the symmetricKullback-Leibler (SKL) divergence. For validation, six different regulariza-tion methods are compared using three criteria: visual assessment, repro-ducibility as measured by longitudinal coefficient of variation (COV) and theability to distinguish between the myelin maps of healthy subjects and MSpatients.Second, in Chapter 3, we propose a hybrid utilization of multicore CPUsand GPU using C++ programming language for a T2 decay analysis algo-rithm. To optimize overall processing speed, we carefully analyze algorith-mic components of a recently published T2 decay analysis algorithm [31]. Wecompared our implementation with a parallelized MATLAB implementation,which is the most common platform in the MWI techniques [31, 33, 35, 36].10Chapter 2Non-Local SpatialRegularization2.1 IntroductionAs described in Chapter 1, myelin is an essential component of nerve fibersand monitoring its health is important for studying diseases that attackmyelin, such as MS. The amount of water trapped within myelin, whichis a surrogate for myelin content and integrity, can be measured in vivo us-ing MRI relaxation techniques that acquire a series of images at multiple echotimes to produce a T2 decay curve at each voxel. These curves are then ana-lyzed, most commonly using NNLS fitting, to produce T2 distributions fromwhich water measurements are made. Although multi-echo T2 relaxation isa strong candidate for monitoring demyelination and remyelination in MSpatients, one of its current main drawbacks is limited reproducibility. One ofthe main sources of variability is the instability of the NNLS algorithm usedto fit a T2 decay curve with a T2 distribution. This causes the technique towork well only on images with a very low level of noise; unfortunately, thisis often not a practical assumption as T2 relaxation data tend to be noisy.NNLS is unstable with respect to the noise and variations found in typical112.1. IntroductionT2 relaxation images, making some form of regularization inevitable.A number of methods have been proposed to improve the robustnessof MWF computation. Applying noise filtering algorithms after NNLS wasstudied in [37] and a number of recent methods [38?40] precede NNLS withprocessing of the multi-echo images using various denoising strategies, includ-ing anisotropic filtering, temporal filtering, and spatially independent meth-ods applied to smooth the signal across the acquired echoes. Regularizationapproaches for NNLS have been also studied, ranging from the conventionalTikhonov regularization [22, 41], which is non-spatial, to smoothing methodsthat work by averaging the neighboring spectra to constrain the NNLS of agiven voxel [32]. While the spatial regularization methods have been shownto improve MWF reproducibility, the current approaches only use a small lo-cal neighborhood for each voxel, and do not take advantage of more distantneighbors that may have similar T2 distributions. In addition, these methodsassume that neighboring voxels always have similar T2 distributions, whichis not the case in areas with fine details, such as the WM structures near thecortex and the ventricles of the brain, and such details can be lost duringregularization. The current methods of NNLS regularization for measuringmyelin water have two key limitations: 1) they use strictly local neighbor-hood information to regularize each voxel, which limits their effectivenessfor very noisy images, and 2) the neighbors of each voxel contribute to itsregularization equally, which can over-smooth fine details.To overcome these limitations, we propose a new regularization algorithmin which local and non-local information is gathered and used adaptively foreach voxel. We present a spatial regularization method that reduces globalvariability by using information from non-local voxels but also improves finedetails by using weighted averaging that avoids combining T2 distributionsthat are very different. The algorithm starts with an initial T2 distribu-tion estimate using the conventional Tikhonov regularized NNLS (rNNLS)algorithm. These distributions are then used in a non-local means algo-122.2. T2 Decay Analysisrithm [42], using the SKL divergence [43] to quantify the similarity betweendistributions, to compute the priors for a spatially regularized NNLS, whichproduces the final T2 distribution for each voxel. We show that by incorpo-rating non-local means for T2 distributions into a spatially regularized NNLSframework, an accurate and robust MWF map can be produced. We demon-strate on real MRIs that the method improves the consistency of MWF mea-surements and the ability to distinguish between MS patients and healthysubjects.2.2 T2 Decay AnalysisA set of exponential basis functions characterizes the measured signal yiyi =M?j=1sje?ti/T2j , i = 1, 2, . . . , N, (2.1)where ti is the measurement time, M is the number of logarithmically spacedT2 decay times, N represents the total number of data points (number ofacquired T2 echoes, 32 in our data), and sj is the relative amplitude for eachpartitioned T2 time, T2j. The set of sj represents the T2 distribution forwhich we need to solve. Equation 2.1 can be written in the general matrixform such thatyi =M?j=1aijsj, i = 1, 2, . . . , N, (2.2)where A = [ai,j]N?M . The NNLS algorithm [30] is used to minimize?2min = mins?0??N?i=1?????M?j=1aijsj ? yi?????2?? . (2.3)Extra constraints can be incorporated into Equation 2.3 to provide morerobust fits in the presence of noise [22], using the additional term on the132.3. Methodright.?2r = mins?0??N?i=1?????M?j=1aijsj ? yi?????2+ ?K?k=1?????M?j=1hkjsj ? fk?????2?? (2.4)where H = [hk,j]K?M is a matrix representing K additional constraints, andf is the corresponding vector of right-hand side values. With a given targetvalue for the misfit ?2r , the parameter ? is automatically determined by thegeneralized cross-validation approach [44]. The regularization parameter ?is selected such that ?2r in Equation 2.4 is equal to 1.02?2min where ?2min isthe unregularized minimum misfit found in Equation 2.3. The constant mul-tiplier is set to 1.02 incurring a 2% increase in the misfit, to be consistentwith previously reported works using the Tikhonov regularization [34, 37, 45].From Equation 2.4, the T2 distribution can be estimated as the set of sj forthe decay times T2j. If H is the identity matrix and the vector f is set to 0,Equation 2.4 is equivalent to the conventional Tikhonov regularization. TheTikhonov regularization works as a L2 norm regularizer which can producesmoother T2 distributions. If the vector f is not set to zero, then the regu-larizer tries to produce a T2 distribution whose shape is similar to the vector f.2.3 MethodIn the closest related work [32] to our proposed method, the T2 distributionprior f in Equation 2.4 is estimated by averaging the spectra of nine neigh-boring voxels of each voxel. Although spatial regularization is achieved, finedetails can be smoothed over during NNLS with this method, because theneighbors contributions are weighted equally without regard for how differ-ent their T2 distributions are to that of the voxel to be regularized. Wecompute an initial T2 distribution for each voxel using rNNLS as describedabove. The regularization parameter ? is selected such that ?2r in Equa-142.3. Methodtion 2.4 is equal to 1.02?2min where ?2min is the unregularized minimum misfitfound in Equation 2.3. The constant multiplier is set to 1.02 to be consistentwith previously reported works using Tikhonov regularization [32, 45] and isthe only value used for rNNLS in this paper. Given initial T2 distributionss = {s(i) | i ? I} for an image I, a T2 distribution prior f(i), for a voxel po-sition i, is computed using the non-local means algorithm [42] as a weightedaverage of all the T2 distributions in the image,fk(i) =?j?Iw(i, j)sk(j), ?k ? {1, 2, . . . , K}, (2.5)where the weight w(i, j) depends on the similarity between the distributionsat voxel positions i and j, and satisfies the usual conditions 0 ? w ? 1and?j w(i, j) = 1. K is set to be the number of logarithmically spaced T2decay times. The similarity between the two distributions at i and j dependson the similarity of the T2 distribution vectors v(Di) and v(Dj), where Dcdenotes a square neighborhood of fixed size D and centered at voxel positionc. By treating each T2 distribution in a voxel as a probability distribution,the similarity between the T2 distribution vectors v(Di) and v(Dj) can bemeasured by the SKL divergence [43] as followsE(v(Di), v(Dj)) =1V???v(Di)%?v(Dj)SKLdiv(s(?) ? s(%)) (2.6)where V is the number of voxels in a square neighborhood, s(?) and s(%) arenormalized versions of the distributions, and the SKL divergence is defined asSKLdiv(s(?) ? s(%)) =K?k=1(sk(?) log[sk(?)sk(%)]+ sk(%) log[sk(%)sk(?)]). (2.7)152.3. MethodThe weight w used in Equation 2.5 is defined asw(i, j) =1Z(i)exp(?(E(v(Di), v(Dj))h)2), (2.8)where Z(i) is the normalizing constantZ(i) =?j?Iexp(?(E(v(Di), v(Dj))h)2)(2.9)and the parameter h controls the degree of filtering.After computation of the spatial T2 distribution prior f, the final regu-larized T2 distribution is estimated by the NNLS algorithm in Equation 2.4.We use an identity matrix for H. Using these T2 distribution, MWF for eachvoxel is defined as the percentage of signal having a T2 between 15 and 40ms. To vary the degree of regularization, the target value for the misfit ?2r isadjusted relative to ?2min in Equation 2.3 by a multiplier ?, which in turns de-termines ?. Since the parameters largely depend on noise level, we randomlyselected one subject from our data sets for tuning, which should not bias theoverall results. We empirically select the parameters ? and h to produce themost robust MWF maps, as measured by the COV which is one of the mea-sures used to evaluate the reproducibility. In order to observe how the bothparameters affect the COV in MWF maps, we measure COVs for various pa-rameter values. Figure 2.1 shows COV values with various parameters usingour proposed regularization with the Tikhonov regularization for generatinginitial T2 distributions and Figure 2.2 shows COV values using our proposedregularization without the Tikhonov regularization for generating initial T2distributions. The data set used here to evaluate the reproducibility is ex-plained in Section 2.4.2. We measure COV for both with and without theTikhonov regularization to see the effect of the Tikhonov regularization onour method. As shown in Figure 2.1 and Figure 2.2, higher ? and h gen-162.4. Resultserally produce lower COV, but it is important to note that too high valuescan cause over-regularization. We have observed that COV doesn?t signifi-cantly improve as h increases beyond 1.5 and ? increases beyond 1.2 usingour proposed regularization with the Tikhonov regularization. Without theTikhonov regularization, we found that ? = 1.2 and h = 4.5 are reasonableparameters. These parameters are used for the experiments in Section 3.4.Due to the high noise level in an MWF map produced without the Tikhonovregularization, a higher value of h is required. For practical purposes, we usea fixed search window of 21? 21 and set the neighborhood size for comput-ing the SKL divergence to 7 ? 7, as this has been shown to be effective fornon-local denoising of other types of images [42] and structural MRI images[46]. Due to the strong anisotropy of the test data (described in Section 2.4),we apply the regularization in 2D on a slice-by-slice basis. All methods areimplemented in C++ with multicore CPUs and GPUs. The implementationdetails and processing speed comparisons are discussed in Chapter 3.2.4 ResultsIn this section, we compared six methods of NNLS regularization for MWFcomputation: NNLS, NNLS with the basic Tikhonov regularization (rNNLS),the spatially regularized NNLS without the Tikhonov regularization (sNNLS)presented in [32], the spatially regularized NNLS (srNNLS) presented in [32],our proposed non-local spatially regularized NNLS without the Tikhonovregularization (nlNNLS) and our proposed non-local spatially regularizedNNLS (nlsrNNLS). The methods were evaluated under three criteria: thevisual quality of MWF maps, the reproducibility measured by COV withinthe 11 region of interest (ROI)s of healthy subjects, and the ability to dis-tinguish between healthy subjects and MS patients as measured by p-valuesfrom Student?s t-test and overlapping coefficient (OVL). All experimentswere performed on the WM regions of each scan only. The T2 relaxationsequence for measuring COV was a 32 echo 3D-GRASE (combined gradient172.4. Results0.511.522.533.511.11.21.31.41.5102030 hd CoV (%)1617181920212223242526Figure 2.1: COV (%) plot to show the effect of changes in our regu-larization parameter values on the longitudinal reproducibility,which is estimated with various values of ? and h using our pro-posed regularization with the Tikhonov regularization. COVstabilizes at ? = 1.2 and h = 1.5.echo and spin echo) [34]. The scan parameters were TR = 1200ms, TE =10ms ? 320ms, echo spacing 10ms, 20 slices acquired at 5 mm slice thickness,and 40 slices reconstructed at 2.5 mm slice thickness. The T2 relaxation se-quence for evaluating the ability to distinguish between the two groups wasspin-spin 32 echo sequence (TR = 1200ms, TE = 10ms ? 320ms, echo spac-ing 10ms and seven slices acquired at 5 mm slice thickness). All images wereacquired on a Philips Achieva 3T scanner. All examinations were performedwith approval from our institution?s ethical review board. To measure COV,182.4. Results23456711.11.21.31.41.510152025 hd CoV (%)131415161718192021Figure 2.2: COV (%) plot to show the effect of changes in our regu-larization parameter values on the longitudinal reproducibility,estimated with various values of ? and h using our proposedregularization without Tikhonov regularization. COV stabilizesat ? = 1.2 and h = 4.5.the MRIs of four healthy subjects at four time points were used. For evaluat-ing the ability to distinguish between the two groups, NAWM segmentationmasks were created from registered T1-weighted images and fluid attenuatedinversion recovery (FLAIR) using the BET and FAST software [47, 48]. Tomeasure p-values and OVLs, the scans of 18 healthy subjects and 17 MSpatients were used. Each subject was scanned at month 0 (baseline) andmonth 6.192.4. Results2.4.1 Visual assessmentFigure 2.3 and Figure 2.4 show a comparison of the visual quality in theMWF maps of a healthy subject and an MS patient. Both spatial regulariza-tion methods appeared to reduce spatial variability and noise, but our spatialregularization produced greater spatial coherence overall and was clearly bet-ter than both other methods at reconstructing the finer details in the thinWM structures near the cortex and the ventricles of the brain. The MSlesions in the MWF map of the MS patient produced by our regularizationwere more clearly visible and had more distinct boundaries. By comparingbetween Figure 2.3 and Figure 2.4, it was observed that the MWF maps esti-mated with the Tikhonov regularization were less noisy and more consistent,and produced lower MWF values due to the underestimation effect of theTikhonov regularization. We visually compared the MWF map produced byour method with other conventional structural MRIs in order to see if theyagreed with the variations seen in our regularized MWF map. To confirmthat MS lesions shown in the MWF map had a good correspondence to thelesions shown in conventional structural MRI images, we compared thoseimages in Figure 2.5. It was shown that the lesions observed in structuralimages such as T1/T2 weighted images and FLAIR were also found in theMWF map.2.4.2 Coefficient of variationIn the data set used in this section, 4 healthy subjects were scanned at 4time points (baseline, 72 hours, 2 weeks and 2 months). The COVs within11 ROIs were measured and averaged for each subject. The 11 ROIs in ahuman brain are shown in Table 3.4. The number of voxels for each ROIranges from several voxels to several hundreds of voxels depending on localbrain structures. The COV is a measure of the dispersion of data pointsin a data series around the mean. Generally, the COV is calculated bydividing the standard deviation of a number of measurements by the mean202.4. ResultsHealthy subject MS patient rNNLS srNNLS nlsrNNLS 00.050.10.150.20.25Figure 2.3: Visual comparison of MWF maps produced by rNNLS [22](left), srNNLS [32] (middle) and nlsrNNLS (right) algorithms inWM for healthy subject (top) and MS patient (bottom). ThenlsrNNLS shows an improved visibility of the MWF maps withreduced noise and sharp details preserved.and multiplying by 100%, but it can be biased to lower values when thesample size is small [49]. To correct this, the sample COV can be multipliedby (1 + 1/(4 ? n)), where n is the number of points used to calculate COV.In this case n = 4, so COVs were corrected by multiplying by 1.06.We compared the reproducibility of each regularization method by evalu-ating COV. Table 2.1 shows the COV comparison for different regularizationmethods. In overall, nlNNLS showed the lowest mean COV (COV=14.9%).212.4. ResultsHealthy subject MS patient NNLS sNNLS nlNNLS 00.050.10.150.20.25Figure 2.4: Visual comparison of MWF maps produced by NNLS [22](left), sNNLS [32] (middle) and nlNNLS (right) algorithms inWM for healthy subject (top) and MS patient (bottom). ThenlNNLS shows an improved visibility of the MWF maps withreduced noise and sharp details preserved.Although we failed to show statistically significant results (p > 0.05) forgroup differences between sNNLS and nlNNLS, and between srNNLS andnlsrNNLS due to the small number of subjects and the large variation insubjects (especially subject 2 has very high COV), the result suggested thatthe non-local spatial regularization appeared to have a potential to improvescan-rescan reproducibility of MWF computation.222.4. ResultsTable 2.1: COV (%) measured on four healthy subjects scanned at fourtime points (baseline, 72 hours, two weeks and two months) foreach method. The nlNNLS produces the lowest COV value over-all. The p-values are computed by a two-sided Mann-WhitneyU-test between sNNLS and nlNNLS, and between srNNLS andnlsrNNLS.Subject 1 Subject 2 Subject 3 Subject 4 Mean p-valueNNLS [22] 22.66 40.28 18.91 17.07 24.73 -sNNLS [32] 21.35 33.05 17.54 16.01 21.990.1143nlNNLS 14.75 17.94 13.49 13.43 14.90rNNLS [22] 26.31 39.85 17.95 20.29 26.10 -srNNLS [32] 25.01 44.19 17.62 18.38 26.300.2000nlsrNNLS 17.71 24.84 13.61 14.91 17.772.4.3 Student?s t-test and overlapping coefficientThe data for 18 healthy subjects and 17 MS patients were used to comparethe ability of the three regularization methods to distinguish between thesetwo groups. We note that this data set is different from the one used to findout the parameters for our method in Section 2.3, thus the parameters usedhere should not bias the overall results in this section. The six mean MWFvalues computed from each group are shown in Figure 2.6. The mean MWFvalues from the MS patients were lower for all six regularization methods thanthose from the healthy subjects at both month 0 and month 6. However, themean MWF difference between the healthy subjects and the MS patientsin the MWF maps produced by srNNLS at month 0 was not statisticallysignificant (p > 0.05). The p-values computed using t-tests between the MSpatients and healthy subjects are shown in Table 2.2. The nlNNLS methodyielded the lowest p-values (5.75?10?4 at month 0 and 1.33?10?4 at month232.4. Results6) in this data set, which suggest it can produce MWF maps that are moresensitive to the differences between MS and healthy brains. To confirm thisresult, we constructed normal distributions for each group and computedthe OVL as a second measure of how well the patients and normals can bedistinguished. The results from this test for all six regularization methodsare summarized in Table 2.2. Again, the nlNNLS yielded the best results byproducing the lowest overlap values (0.5198 at month 0 and 0.4587 at month6). We also observed from Table 2.2 that srNNLS actually reduced the abilityto distinguish between healthy subjects and MS patients when comparedwith rNNLS. Our supposition on this observation is that rNNLS, which isused for obtaining a spatial prior of srNNLS, already reduces some amountof noise and spatial variations as shown in Figure 2.3 and Figure 2.4, thussrNNLS probably only makes a MWF map more blurred which would leadto aggravate the ability. NNLS, which is used for obtaining a spatial prior ofsNNLS without the Tikhonov regularization, doesn?t reduce noise and spatialvariations and this would be probably why sNNLS doesn?t appear to worsenthe ability to distinguish between healthy subjects and MS patients as inTable 2.2. On the other hand, our regularization preserves brain structureswhile reducing noise and spatial variations which would lead to improve theability to distinguish between healthy subjects and MS patients.We also observed that the MWF values were reduced by around 30%when the Tikhonov regularization was used, which was shown in Figure 2.3and Figure 2.4. Since the Tikhonov regularization produces a smoother T2distribution, some part of short T2 components can go beyond the definedshort T2 range (15 ? T2 ? 40 ms), which would cause the underestimationeffect.242.4. ResultsTable 2.2: Overlapping coefficients for distinguishing between 18healthy subjects and 17 MS patients computed using the MWFmaps produced by the six regularization algorithms. The p-valuesare computed by two-sample t-tests between healthy subjects andMS patients (*p<0.05; **p<0.001 (Bonferroni correction)). Theresult suggests that the mean MWF values produced by nlNNLSare most sensitive to distinguish the two groups.Month 0 Month 6OVL p-value OVL p-valueNNLS [22] 0.5907 3.2?10?3* 0.5450 1.2?10?3*sNNLS [32] 0.5904 3.2?10?3* 0.5452 1.2?10?3*nlNNLS 0.5198 5.7541?10?4** 0.4587 1.3309?10?4**rNNLS [22] 0.6915 0.0256* 0.6276 7.2?10?3*srNNLS [32] 0.7393 0.0595 0.6655 0.0157*nlsrNNLS 0.5419 1.0?10?3* 0.5138 5.7159?10?4**252.4. ResultsMWF (WM) 3D T1 FLAIR WM mask T2 (TE = 50ms) T2 (TE = 150ms) MWF Figure 2.5: Visual comparison of MWF map and structural MRI im-ages such as T1-weighted, T2-weighted and FLAIR. Lesionsobserved in structural images are also found in the MWF map.262.4. ResultsNNLS sNNLS nlNNLS rNNLS srNNLS nlsrNNLS0.0310.0460.0610.0760.0920.1070.1220.1370.153Myelin Water Fraction (%)Month 0 patientcontrolNNLS sNNLS nlNNLS rNNLS srNNLS nlsrNNLS0.0310.0460.0610.0760.0920.1070.1220.1370.153Myelin Water Fraction (%)Month 6 patientcontrolNNLS sNNLS nlNNLS rNNLS srNNLS nlsrNNLS0.0310.0460.0610.0760.0920.1070.1220.1370.153Myelin Water Fraction (%)Month 0 patientcontrolNNLS sNNLS nlNNLS rNNLS srNNLS nlsrNNLS0.0310.0460.0610.0760.0920.1070.1220.1370.153Myelin Water Fraction (%)Month 6 patientcontrolFigure 2.6: Mean MWF values produced by the six regularization algo-rithms from 18 healthy subjects and 17 MS patients at baseline(top) and month 6 (bottom). Error bars are standard errors.The nlNNLS shows the most clear difference between the twogroups.27Chapter 3Fast Myelin WaterMeasurement using MulticoreArchitectures3.1 IntroductionIn the most established approach of MWF measurement [18], a multi-echoMRI sequence is used to produce T2 decay curves, which are then convertedto T2 distributions that can be used to estimate the amount of water trappedwithin myelin as well as the intra- and extra-cellular water. CNS tissue isinhomogeneous and has multiple components. Multi-echo T2 relaxation de-rived MWF computation requires multi-exponential relaxation decay analysis[21], which requires solving an ill-posed optimization problem that is com-putationally demanding, even when applying only the most basic regulariza-tion. For example, our single-threaded reference MATLAB implementationof a current algorithm [31] with only Tikhonov regularization [22] takes over2.5 hours to process a single set of T2 images with 32 echoes and dimen-sions of 256 ? 256 ? 7 voxels using a 2.4 GHz processor. This problem isgreatly increased when using new 3D acquisition methods that produce up283.1. Introductionto 40 slices for much greater anatomical coverage [34], or new regularizationmethods that use neighborhood information [32, 33]. An added complicationfor MWF measurements at magnetic field of 3T or higher is the fact thatthe shape of the T2 decay curve is corrupted by stimulated echo artifactswhich occur when the refocusing pulses differ from 180 degrees due to radiofrequency field homogeneity. Stimulated echo artifacts can be corrected bymaking use of the Extended Phase Algorithm [31, 50, 51], however this placesadditional demands on computation times. In order to make MWF analy-sis with these new methods practical for large clinical studies, some form ofacceleration is necessary.Most T2 relaxation data analyses are currently being done in MATLAB(e.g., [31, 33, 35, 36]), which offers a powerful software environment for al-gorithmic prototyping but carries a significant computational overhead. Anumber of MATLAB extension packages exist to improve efficiency, such asthe parallel computing toolbox (PCT), which can utilize multicore architec-tures such as multicore CPUs and, with more recent versions, GPUs withinone instance of MATLAB by spawning multiple workers. However, there existtechnical limitations within PCT?s functionality such that multiple workerscannot share a CPU core or GPU efficiently, likely because some of the keyresource allocation mechanisms are implemented at the application level toprovide ease of use, rather than being handled natively by the lower-level,architecture-aware layers. In addition, different MATLAB GPU packagescan vary substantially in their performance and functionality provided [52],and none can match the speed and flexibility of lower-level software libraries[53], the most developed of which is NVIDIAs compute unified device archi-tecture (CUDA) [54].Motivated by the above, in this chapter we discuss an optimized utiliza-tion of multicore CPUs and modern GPUs to improve efficiency of our T2decay analysis algorithm presented in Chapter 2. In order to do it, we havetaken the MATLAB (version 7.14 (R2012a)) implementation with PCT (ver-293.2. Flip Angle Optimizationsion 6.0) of a recently published MWF analysis algorithm [31] and convertedit to a hybrid multicore C++ (HMC++) implementation, combined withour spatial regularization, that exploit multicore CPU and GPU parallelismdepending on which architecture would be the most efficient for the particu-lar task. We have built this software on freely available (for non-commercialuse) parallel processing libraries to promote its use.3.2 Flip Angle OptimizationIn order to perform myelin measurement using Equation 2.4, we need tocompute the NNLS basis matrix A that contains the echo decay curves withthe optimized flip angle over T2 space for each voxel. Because of inhomo-geneities in the B1 transmit field, refocusing pulses are often significantlydifferent from 180?, which makes flip angle optimization necessary for eachvoxel [31, 50, 51]. Flip angle optimization is realized by the stimulated echocorrection algorithm [31]. A fixed number P (eight in our case) basis flipangle (FA)s that are evenly spaced between 50? and 180?, inclusive, are ini-tialized. For each FA, a transition matrix T = [ti,j]3N?3N and relaxationmatrix E = [ei,j]3N?3N are computed with the given interecho time, echotrain length, transverse relaxation time and longitudinal relaxation time.The echo decay curve is generated over T2 space by a flip and relaxationwith an initialized magnetization phase state vector (MPSV) g = [gi]3N?1 asshown in Algorithm 1. Additional theoretical background and mathematicaldetails are described in the work by Prasloski et al. [31]. The generated echodecay curves form the NNLS basis matrices {A1,A2, . . . ,AP}, one for eachFA. For each A, ?2min is computed using Equation 2.3. These P ?2min valuesare then interpolated with a B-spline kernel to a granularity corresponding toa change in flip angle of 0.001?. The flip angle corresponding to the smallestinterpolated ?2min value is used to create the NNLS basis matrix A used forcomputing the T2 distribution using Equation 2.4. The pseudo code of thecomplete algorithm (rNNLS) is presented in Algorithm 1.303.3. Implementation Using Multicore CPUs and GPUTable 3.1: Multicore architecture and C++ libraries used for imple-menting each module of our T2 decay analysis algorithm are sum-marized.Multicore Library/architecture platformModule 1 CPUs Intel MKL, OpenMPModule 2 GPUs CUDA CIModule 3 CPUs Intel MKLModule 4 CPUs Intel MKL, OpenMP, BoostModule 5 GPUs CUDA3.3 Implementation Using Multicore CPUsand GPUWe describe our implementation of the T2 decay analysis algorithm, whichwe have divided into five logical modules. The first two modules performthe FA optimization, the third module computes the NNLS basis matrix forthe optimal angle, and the fourth module computes the T2 distribution. Wealso describe our GPU-based implementation of the non-local spatial priorcomputation algorithm presented in Chapter 2 (the last module). Multicorearchitecture and C++ libraries used for each module are summarized in Ta-ble 3.1. The pseudo code of the complete algorithm (nlsrNNLS) is presentedin Algorithm 2.3.3.1 Solving NNLS for each basis flip angle(Module 1)To optimize the FA for each voxel, this module solves P NNLS and matrix-vector products (N?M)? (M?1) to compute ?2min in Equation 2.3, where P313.3. Implementation Using Multicore CPUs and GPUAlgorithm 1: T2 decay curve analysis algorithm with stimulated echocorrection and rNNLSinput : T2 decay curve y for every voxel in an image Ioutput: T2 distribution s for every voxel in an image Iinitialization Compute sparse diagonal matrix parameters for relaxation matrix E over T2 decayspace;// Compute NNLS basis for each FAfor FA? 1 to P dofor j ? 1 to M doInitialize MPSV g;Extract relaxation matrix Ej ;Compute transition matrix T;for i? 1 to N dog? T ? g ; // Perform the flipAi,j ? g(1)e?ti/T2j ; // Record the echo magnitudeg? Ej ? g ; // Allow for relaxationAFA ? A;for ? voxel v ? I do// Perform unregularized NNLS for FA optimizationModule 1 for FA? 1 to P doPerform NNLS with AFA and y using Eq. 2.3 and store ?2min;Module 2 Interpolate the ?2min values using spline fit;Find the FA corresponding to the smallest ?2min;// Compute NNLS basisModule 3 for j ? 1 to M doInitialize MPSV g;Extract relaxation matrix Ej ;Compute flip matrix T;for i? 1 to N dog? T ? g ; // Perform the flipAi,j ? g(1)e?ti/T2j ; // Record the echo magnitudeg? Ej ? g ; // Allow for relaxation// Perform NNLS with the Tikhonov regularizationModule 4 Perform NNLS with A and y using Eq. 2.3 to compute ?2min;?? 0.001;while ?2r < 1.02?2min do?? 2?;Perform NNLS with A, y and ? using Eq. 2.4 and store ?2r;Interpolate the ?2r values using spline fit;Find the ? corresponding to the smallest ?2r;Perform NNLS with A, y and ? using Eq. 2.4 to compute s;323.3. Implementation Using Multicore CPUs and GPUAlgorithm 2: T2 decay curve analysis algorithm with nlsrNNLSinput : T2 decay curve y for every voxel in an image Ioutput: T2 distribution s for every voxel in an image Iinitialization Perform rNNLS using Algorithm 1;Store initial T2 distributions and optimal FAs for every voxel;for ? voxel v ? I doModule 5 Compute a non-local spatial prior f for every voxel as described in Section 2.3and Section 3.3.5;for ? voxel v ? I do// Compute NNLS basis using the stored FAModule 3 for j ? 1 to M doInitialize MPSV g;Extract relaxation matrix Ej ;Compute flip matrix T;for i? 1 to N dog? T ? g ; // Perform the flipAi,j ? g(1)e?ti/T2j ; // Record the echo magnitudeg? Ej ? g ; // Allow for relaxation// Perform NNLS with the non-local spatial regularization priorModule 4 Perform NNLS with A and y using Eq. 2.3 to compute ?2min;?? 0.001;while ?2r < 1.02?2min do?? 2?;Perform NNLS with A, y, ? and f using Eq. 2.4 and store ?2r;Interpolate the ?2r values using spline fit;Find the ? corresponding to the smallest ?2r;Perform NNLS with A, y, ? and f using Eq. 2.4 to compute s;is the number of basis FAs. QR decomposition, which decomposes a matrixinto an orthogonal matrix and an upper triangular matrix, is often used tosolve the linear least square problem. Based on the QR column updatingand downdating scheme that makes a full QR decomposition unnecessary[55], we parallelize NNLS on multicore CPUs by using the Intel math kernellibrary (MKL) (Intel Corporation, Santa Clara, USA) and the open mes-sage passing (OpenMP) [56] application programming interface (API)s. TheIntel MKL is a set of optimized math routines that include versions of thepopular linear algebra package (LAPACK) [57] and basic linear algebra sub-333.3. Implementation Using Multicore CPUs and GPUprograms (BLAS) [58] libraries, and OpenMP is a more general API thatsupports multi-platform shared-memory parallel programming in C/C++.The Intel MKL calls OpenMP internally to parallelize certain computationswhere it would be beneficial, but the programmer can also call OpenMP func-tions directly for higher-level parallel processing. The NNLS problem can besolved independently for each FA, so a straightforward approach would beto set up P NNLS systems via the OpenMP API, and use the Intel MKLto accelerate the QR column updating and downdating operations. Such astrategy has been used with success previously to accelerate the solving ofmutually exclusive linear systems of equations [55], but we have found thatthe relatively small size of our NNLS system (32 linear equations and 40 un-known variables) makes the use of multiple threads with the OpenMP APIinefficient due to the overhead of organizing threads. Thus, we solve the PNNLS systems sequentially and allow the Intel MKL to maximally utilize theavailable CPU cores. After a T2 distribution for each flip angle is computed,The matrix-vector products for calculating P ?2min values in Equation 2.3,one for each FA, are computed in parallel on the GPU by creating P threadblocks using CUDA.3.3.2 Interpolation to find an optimal flip angle(Module 2)The P ?2min samples corresponding to the basis FAs are interpolated by thecubic B-spline kernel to find the optimum angle. We accelerate this processby using a library that provides a CUDA implementation of cubic B-splineinterpolation (CI) [59], which is not natively supported by CUDA. In addi-tion, multiple points along the spline can be interpolated in parallel, and wedo so by creating L threads with CUDA, where L is the number of interpo-lated points (in this case L = 130000 which corresponds to the number ofintermediate FAs between 50? and 180? with a step size of 0.001?). In thisway all of the interpolated points are produced in parallel. The large num-343.3. Implementation Using Multicore CPUs and GPUber of required interpolated points and the fact that they can be computedindependently make the GPU-based method well-suited for accelerating thismodule.3.3.3 NNLS basis matrix computation for an optimalflip angle (Module 3)This module constructs the final NNLS basis matrix A, which will be usedfor the regularized NNLS in Equation 2.4 to find a T2 distribution, corre-sponding to the optimum angle found in the previous section. The mosttime-consuming parts of this module are the transition matrix-MPSV prod-uct (3N ? 3N) ? (3N ? 1) and the relaxation matrix-MPSV product (3N ?3N)?(3N?1) as shown in Algorithm 1. These matrix-vector products occur2?N?M times for each voxel. Since these matrix-vector products need to beperformed recursively and their matrix sizes are not large enough to utilizemany cores, parallelization through GPUs would not be an efficient way toimprove the processing speed. Instead, we convert the relaxation and flipmatrices into two separate sparse diagonal matrices and perform multicoreCPU-accelerated matrix-vector products using the Intel MKL Sparse BLASlevel 2 library. Conversion into a sparse diagonal matrix for every voxel canbe a time expensive operation, but for the relaxation matrix it can be donevery efficiently because the matrix is only defined over a limited number of T2decay times (M), so we pre-compute the parameters needed for the conver-sion and efficiently generate the relaxation matrix using these pre-computedparameters when each voxel is processed as shown in Algorithm 1.3.3.4 Estimating a T2 distribution with theregularized NNLS (Module 4)A continuous T2 distribution is obtained by solving the regularized NNLSin Equation 2.4 with the final NNLS basis matrix computed in the previ-ous section. The multicore NNLS method using Intel MKL and OpenMP as353.3. Implementation Using Multicore CPUs and GPUdescribed above is also applied in this module. For the matrix-vector prod-ucts required to compute ?2r, we apply the Intel MKLs BLAS level 2 library.In order to find the regularization parameter ? corresponding to 1.02?2min,a cubic B-spline interpolation is also performed as for the FA optimization(refer to Algorithm 1 for details), but due to the relatively small number ofrequired interpolation points in this case (typically 10-20 points), a GPU-based method is not well-suited, and we use the cubic spline class of theBoost C++ library [60].3.3.5 Computing a non-local spatial prior (Module 5)Using NVIDIA?s CUDA, computing our non-local spatial prior presented inChapter 2 is parallelized by creating 21 ? 21 (search window size) threadblocks on the number of voxels (for example, 256? 256? 7). Each thread isused to create a 7 ? 7 kernel for each search window. The weights in eachsearch window are stored in GPU?s shared memory which are shared betweenthreads to reduce the processing time and overhead caused by copying datainto GPU?s device memory.3.3.6 Creating multiple instancesThe HMC++ implementation is particularly advantageous when there is aneed to batch process many sets of images, which is common in clinical stud-ies. The HMC++ implementation can make more efficient use of multicorearchitectures than the MATLAB PCT for batch processing, because 1) eachinstance of the MATLAB PCT locks the CPU cores that it allocates until itexits, even when they are idle, which imposes a strong limit on the numberof concurrent MWF analyses, and 2) for versions of the PCT that supportGPU processing, each instance of the PCT can only utilize one GPU at atime. In contrast, the number of instances of our HMC++ implementationis generally only limited by the capacity of the GPUs device memory andthe number of GPUs available.363.4. Results3.4 ResultsWe compared our HMC++ implementation for rNNLS algorithm with theparallelized MATLAB implementation, which utilizes the PCT for rNNLSsolving but does not use the GPU, due to very limited GPU support in ourversion of the PCT. We recorded the processing time individually for everymodule described in Section 3.3 and the processing time for processing afull scan. Table 3.2 shows an estimate of how much the acceleration of eachmodule contributes to the overall speedup. The CPU used in this experimentis the Intel Core i5-2500 3.3GHz (4 cores) with 8GB RAM and the GPU usedis a GeForce GTX 560 Ti. For the experiment in Table 3.2, we selected asingle voxel of T2 relaxation data with 32 echoes and compute it 1000 times.The most significant performance improvement as measured by speed-upfactor was obtained by the multicore-based NNLS parallelization (10.5? forModule 1 and 9.0? for Module 4) and matrix-vector product accelerationby the sparse diagonal matrix multiplications using the Intel MKL (2.6? forModule 3). Although the speed-up factor for Module 3 was smaller thanones for Modules 1 and 4, it was a significant improvement because Module3 consumed about 50% of the entire processing time. Module 3 consumedthe most processing time (2.6108 seconds) in our implementation because ofits recursive computation (refer to Algorithm 1).The overall performance gain was more pronounced when processing abatch of scans. Table 3.3 presents the computation times for processingone 32-echo T2 relaxation scan and 10 32-echo T2 relaxation scans. Theperformance improvement for processing one full scan was 4.1? and 14.4? forprocessing 10 full scans. When processing a batch of scans with MATLAB,all four CPU cores were locked by a single instance of the PCT even ifmultiple instances were created, so the speed-up over the single-threadedversion was linear with respect to the number of cores. In contrast, ourHMC++ implementation did not lock CPU cores unnecessarily, but utilizedthem fully when necessary, which resulted in a overall speed-up of about 14373.4. Resultstimes over the parallelized MATLAB implementation. If more than one GPUis available or if the GPU has a larger device memory and more cores, moreinstances can be created for further speed improvement.The MWF values computed by our HMC++ implementation were practi-cally identical to those computed by the MATLAB version. To demonstratethis, we measured the mean MWF and standard deviation (SD) values of11 ROIs in a human brain using both implementations. The comparisonis shown in Table 3.4. The difference in mean MWF for any ROI betweenthe two implementations was less than < 0.002. There are several differenttype of boundary conditions in cubic spline interpolation, and MATLAB?sinterp1 function and CUDA?s texture interpolation [59] have different bound-ary conditions. The boundary conditions for each interpolation method arepre-defined and they can?t be changed by users. The MWF differences weredue to the different boundary conditions of the B-spline interpolation, usedin Module 2, in MATLAB?s interp1 function and CUDA?s texture inter-polation [59]. Due to the boundary condition difference, interpolated ?2minvalues nearby the FAs 50? and 180? were slightly different, which resulted indifferent mean MWF values.383.4. ResultsTable 3.2: Runtime (seconds) comparisons of our implementation forrNNLS and the parallelized MATLAB implementation for eachmodule. We selected a single voxel of T2 relaxation data with 32echoes and performed the MWF analysis 1000 times.Our MATLAB Speed-upimplementation implementation factorModule 1 0.2105 2.2143 10.52?Module 2 0.5621 1.1994 2.13?Module 3 2.6108 6.7790 2.60?Module 4 0.3702 3.3309 9.00?Table 3.3: Runtime (minutes) comparisons of our implementation forrNNLS and the parallelized MATLAB implementation for a fullscan (256x256 T2 data with 32 echoes and 7 axial slices) and abatch processing of 10 full scans.Our MATLAB Speed-upimplementation implementation factor1 image 9.4674 39.1000 4.13?10 images 27.2496 391.3181 14.36?393.4. ResultsTable 3.4: Mean MWF and SD comparisons of our implementation forrNNLS and the parallelized MATLAB implementation for eachROI in brain.Our implementation MATLAB implementationMean SD Mean SDCaudate Nucleus 0.0098633 0.0199610 0.0095641 0.0189250Cingulate Gyrus 0.0224290 0.0273900 0.0220210 0.0275370Thalamus 0.0030181 0.0097879 0.0030615 0.0099761Insular Cortex 0.0157260 0.0314480 0.0159360 0.0316010Cortical Grey Matter 0.0297900 0.0253520 0.0299770 0.0250640Putamen 0.0528920 0.0421180 0.0539370 0.0420010Minor Forceps 0.0791770 0.0312680 0.0774580 0.0321700Major Forceps 0.1208300 0.0374680 0.1206500 0.0376850Genu 0.1325900 0.0322220 0.1333900 0.0321600Splenium 0.1651300 0.0407170 0.1650100 0.0412570Internal Capsule 0.2043200 0.0408480 0.2043700 0.040984040Chapter 4Conclusions4.1 Non-local Spatial RegularizationThe study of myelin has greatly advanced our understanding of neurologicaldiseases such as MS and monitoring myelin health in vivo has great potentialfor the development of new therapies. MWI is the only MRI technique thatcan specifically measure the amount of myelin. A major challenge in pro-ducing reliable myelin measurements using MWI is that the mathematicalprocess of converting the T2 decay curves to T2 distributions is very sensitiveto image noise, and even small differences in the curves can cause large dif-ferences in the computed myelin values, which has hindered the applicationof the technique to clinical studies.We have presented a non-local spatial regularization method for T2 de-cay analysis to more robustly measure myelin water in WM in vivo. Theproposed spatial T2 distribution prior is adaptively estimated based on theSKL divergence, and exploits information from the global WM region. Weobserved that our spatial regularization strategy can effectively preserve im-portant brain structures while reducing noise and spatial variations, whichimproves the longitudinal reproducibility of MWF and the ability to distin-guish between healthy subjects and MS patients. Six different regularization414.2. Fast Myelin Water Measurement using Multicore Architecturesmethods have been compared using three main criteria (visual quality, re-producibility as measured by longitudinal COV and ability to distinguishbetween MS patients and normals), and it was observed that the proposedregularization algorithm compared favorably with the other methods in allthe three criteria.By providing more robust and accurate myelin measurements, our under-standing of myelin loss in neurological diseases such as MS and its impacton the patients would improve. In addition, we would have a more sensi-tive tool to assess the efficacy of newly developed remyelinating agents forneurological diseases such as MS, resulting in reduced development time.Future work would include further optimization of algorithmic parametersand validation beyond a single scanner. We will further validate our methodon more extensive data sets that have more subjects and larger anatomicalcoverage based on longitudinal and ROIs studies (including lesions). Onecould investigate how well the proposed regularization would work in com-bination with the advanced denoising strategies applied to the multi-echoimages [39, 40]. Finally, it would be important future work to measure thequantitative correlations between the myelin water calculations produced byour proposed method and histopathological data from, for example, myelinstaining [13], in order to determine whether regularization improves the cor-respondence to true pathology.4.2 Fast Myelin Water Measurement usingMulticore ArchitecturesWe have presented an efficient parallelized framework for extracting themyelin water signal from T2 relaxation data. Many CNS studies requirea large number of patients to achieve sufficient statistical power and ourcontribution will make these studies more efficient and practical. More im-portantly, we expect that this work will open the doors for many futureimprovements on MWF analyses due to reduced calculation time.424.2. Fast Myelin Water Measurement using Multicore ArchitecturesWe have compared the performance of the proposed HMC++ methodwith a parallelized MATLAB version, because MATLAB is currently themost popular platform for the development of MWF analysis methods [31,33, 35, 36]. However, a comparison in speed between a C++ implementationof any algorithm and a MATLAB one is inherently unfair due to the over-head incurred by MATLAB in order to provide a user-friendly programmingenvironment. The purpose of the comparison done in this work is not reallyto show an advantage of C++ over MATLAB, but rather to demonstratethat a carefully designed HMC++ implementation for MWF analysis is apractical way forward to improving regularization methods and facilitatingthe use of larger images. For example, some of the recently proposed regu-larization methods (e.g., [32, 33, 38, 61, 62]) would require days to process asingle 256?256?40 32-echo volume if implemented in MATLAB. Many suchregularization methods can be efficiently parallelized and therefore would fitwell into the HMC++ framework.Our goal is to provide a starting point for other MWF research groupsto explore our HMC++ approach. Our implementation is built on freelyavailable (for non-commercial use) software libraries and can be readily com-plied and run on many existing systems. We intend to make the source codepublicly available.Recently proposed regularization methods utilize multi-voxel based pro-cessing [33] or graph-cut algorithm [62], which are computationally demand-ing. Advanced multi-echo T2 images denoising methods [39, 40] have greatpotential to improve myelin measurements, but they require expensive com-putations and long processing time. Our future work will be focused on theparallelization of such advanced myelin measurement methods based on ourHMC++ approach to further reduce the development time.43Bibliography[1] P. Morell, R. H. Quarles, and W. T. Norton. Formation, structure andbiochemistry of myelin. Basic Neurochemistry: Molecular, Cellular andMedical Aspects, pages 109?136, 1989.[2] Kent M. Van D. G. Human anatomy. McGraw-Hill, 2002.[3] C. Laule, I. M. Vavasour, S. H. Kolind, D K B. Li, A. Traboulsee,G. Moore, and A. MacKay. Magnetic resonance imaging of myelin.Neurotherapeutics, 4(3):460?84, July 2007.[4] D. B. Plewes and W. Kucharczyk. Physics of MRI: a primer. Journalof Magnetic Resonance Imaging, 35(5):1038?54, May 2012.[5] W. Stewart, A. MacKay, K. P. Whittall, G. R. Moore, and D. W. Paty.Spin-spin relaxation in experimental allergic Encephalomyelitis.Analysis of CPMG data using a non-linear least squares method andlinear inverse theory. Magnetic Resonance in Medicine, 29(6):767?775,1993.[6] A. Waldman, J.H. Rees, C.S. Brock, M.D. Robson, P.D. Gatehouse,and G.M. Bydder. MRI of the brain with ultra-short echo-time pulsesequences. Neuroradiology, 45(12):887?892, 2003.[7] P. M. Kilby, N. M. Bolas, and G. K. Radda. 31P-NMR study of brainphospholipid structures in vivo. Biochimica et Biophysica Acta(BBA)-Lipids and Lipid Metabolism, 1085(2):257?264, 1991.[8] C. Beaulieu. The basis of anisotropic water diffusion in the nervoussystem?a technical review. NMR in Biomedicine, 15(7-8):435?455,2002.44Bibliography[9] S. Song, S. Sun, W.K. Ju, S.J. Lin, A. H. Cross, and A. H. Neufeld.Diffusion tensor imaging detects and differentiates axon and myelindegeneration in mouse optic nerve after retinal ischemia. NeuroImage,20(3):1714?1722, 2003.[10] S. Song, S. Sun, M. J. Ramsbottom, C. Chang, J. Russell, and A. H.Cross. Dysmyelination revealed through MRI as increased radial (butunchanged axial) diffusion of water. NeuroImage, 17(3):1429?1436,2002.[11] S. D. Wolff and R. S. Balaban. Magnetization transfer contrast (MTC)and tissue water proton relaxation in vivo. Magnetic Resonance inMedicine, 10(1):135?144, 1989.[12] P. J. Gareau, B. K. Rutt, S. J. Karlik, and J. R. Mitchell.Magnetization transfer and multicomponent T2 relaxationmeasurements with histopathologic correlation in an experimentalmodel of MS. Journal of Magnetic Resonance Imaging, 11(6):586?595,2000.[13] C. Laule, E. Leung, D K B. Li, A. Traboulsee, D. Paty, A. MacKay,and G. Moore. Myelin water imaging in multiple sclerosis: quantitativecorrelations with histopathology. Multiple Sclerosis, 12(6):747?753,November 2006.[14] V. Vasilescu, E. Katona, V. Simplaceanu, and D. Demco. Watercompartments in the myelinated nerve. III. Pulsed NMR result.Experientia, 34(11):1443?1444, 1978.[15] C. K. Jones. T2 decay curve acquisition and analysis in MRI noiseconsiderations, short T2, and B1 field encoding. PhD thesis, Universityof British Columbia, 2003.[16] R. M. Kroeker and Mark H. R. Analysis of biological NMR relaxationdata with continuous distributions of relaxation times. Journal ofMagnetic Resonance (1969), 69(2):218?235, 1986.[17] R.S. Menon and P.S. Allen. Application of continuous relaxation timedistributions to the fitting of data from model systmes and excisedtissue. Magnetic Resonance in Medicine, 20(2):214?227, 1991.45Bibliography[18] A. MacKay, K. Whittall, J. Adler, D K B. Li, D. Paty, and D. Graeb.In vivo visualization of myelin water in brain by magnetic resonance.Magnetic Resonance in Medicine, 31(6):673?7, June 1994.[19] K. Whittall, A. MacKay, D. Graeb, R. Nugent, D K B. Li, andD. Paty. In vivo measurement of T2 distributions and water contentsin normal human brain. Magnetic Resonance in Medicine, 37(1):34?43,January 1997.[20] C. Laule, I. Vavasour, G. Moore, J. Oger, D K B. Li, D. Paty, andA. MacKay. Water content and myelin water fraction in multiplesclerosis. A T2 relaxation study. Journal of Neurology, 251(3):284?93,March 2004.[21] A. MacKay, I. Vavasour, A. Rauscher, S. Kolind, B. Ma?dler, G. Moore,A. Traboulsee, D K B. Li, and C. Laule. MR relaxation in multiplesclerosis. Neuroimaging Clinics of North America, 19(1):1?26,February 2009.[22] K. Whittall and A. MacKay. Quantitative interpretation of NMRrelaxation data. Journal of Magnetic Resonance, 84(1):134?152,August 1989.[23] K. Whittall, M. Bronskill, and R. Henkelman. Investigation of analysistechniques for complicated NMR relaxation data. Journal of MagneticResonance, 95(2):221?234, 1991.[24] D. Gardner, J. Gardner, G. Laush, and W. Meinke. Method for theanalysis of multicomponent exponential decay curves. 1959.[25] S. Provencher. A Fourier method for the analysis of exponential decaycurves. Biophysical Journal, 16(1):27?41, 1976.[26] J. McWhirter and E. Pike. On the numerical inversion of the Laplacetransform and similar Fredholm integral equations of the first kind.Journal of Physics A: Mathematical and General, 11(9):1729, 1978.[27] S. Provencher. A constrained regularization method for inverting datarepresented by linear algebraic or integral equations. ComputerPhysics Communications, 27(3):213?227, 1982.46Bibliography[28] H. Barkhuijsen, R. De Beer, W. Bovee, and Van O. Retrieval offrequencies, amplitudes, damping factors, and phases fromtime-domain signals using a linear least-squares procedure. Journal ofMagnetic Resonance, 61(3):465?481, 1985.[29] E. Yeramian and P. Claverie. Analysis of multiexponential functionswithout a hypothesis as to the number of components. Nature, 326(6109):169?174, 1987.[30] C. Lawson and R. Hanson. Solving least squares problems. EnglewoodCliffs: Prentice-Hall, 1974.[31] T. Prasloski, B. Ma?dler, Q. Xiang, A. MacKay, and C. Jones.Applications of stimulated echo correction to multicomponent T2analysis. Magnetic Resonance in Medicine, 67(6):1803?14, June 2012.[32] D. Hwang and Y. Du. Improved myelin water quantification usingspatially regularized non-negative least squares algorithm. Journal ofMagnetic Resonance Imaging, 30(1):203?8, July 2009.[33] D. Kumar, T. Nguyen, S. Gauthier, and A. Raj. Bayesian algorithmusing spatial priors for multiexponential T(2) relaxometry frommultiecho spin echo MRI. Magnetic Resonance in Medicine, 68(5):1536?43, November 2012.[34] T. Prasloski, A. Rauscher, A. MacKay, M. Hodgson, I. Vavasour,C. Laule, and B. Ma?dler. Rapid whole cerebrum myelin water imagingusing a 3D GRASE sequence. NeuroImage, 63(1):533?9, October 2012.[35] T. Bjarnason and J. Mitchell. AnalyzeNNLS: magnetic resonancemultiexponential decay image analysis. Journal of MagneticResonance, 206(2):200?4, October 2010.[36] J. Oh, E. Han, M. Lee, S. Nelson, and D. Pelletier. Multislice brainmyelin water fractions at 3T in multiple sclerosis. Journal ofNeuroimaging, 17(2):156?63, April 2007.[37] C. Jones, K. Whittall, and A. MacKay. Robust myelin waterquantification: averaging vs. spatial filtering. Magnetic Resonance inMedicine, 50(1):206?9, July 2003.47Bibliography[38] D. Hwang, H. Chung, and Y. Nam. Robust mapping of the myelinwater fraction in the presence of noise: Synergic combination ofanisotropic diffusion filter and spatially regularized nonnegative leastsquares algorithm. Journal of Magnetic Resonance Imaging, 195:189?195, 2011.[39] U. Jang and D. Hwang. High-quality multiple T2* contrast MR imagesfrom low-quality multi-echo images using temporal-domain denoisingmethods. Medical Physics, 39(1):468?74, January 2012.[40] O. Kwon, E. Woo, Y. Du, and D. Hwang. A tissue relaxationdependent neighboring method for robust mapping of the myelin waterfraction. NeuroImage, pages 1?10, February 2013.[41] A. Tikhonov. Solution of incorrectly formulated problems and theregularization method. In Soviet Math. Doklady, volume 4, pages1035?1038, 1963.[42] A. Buades, B. Coll, and J. Morel. A non-local algorithm for imagedenoising. In IEEE Computer Society Conference on Computer Visionand Pattern Recognition, volume 2, pages 60?65, Washington, DC,USA, June 2005. IEEE.[43] D. Johnson and S. Sinanovic. Symmetrizing the Kullback-Leiblerdistance. IEEE Transactions Information Theory, January 2001.[44] G. Golub, M. Heath, and G. Wahba. Generalized cross-validation as amethod for choosing a good ridge parameter. Technometrics, 21:215?223, 1979.[45] I. Levesque, C. Chia, and G. Pike. Reproducibility of in vivo magneticresonance imaging-based measurement of myelin water. Journal ofMagnetic Resonance Imaging, 32(1):60?8, July 2010.[46] J. Manjo?n, J. Carbonell-Caballero, J. Lull, G. Garc??a-Mart??,L. Mart??-Bonmat??, and M. Robles. MRI denoising using non-localmeans. Medical Image Analysis, 12(4):514?23, August 2008.[47] S. Smith. Fast robust automated brain extraction. Humam BrainMapping, 17(3):143?155, November 2002.48Bibliography[48] Y. Zhang, M. Brady, and S. Smith. Segmentation of brain MR imagesthrough a hidden Markov random field model and theexpectation-maximization algorithm. IEEE Transactions on MedicalImaging, 20(1):45?57, January 2001.[49] R. Sokal and J. Rohlf. Biometry. W. H. Freeman, 4th edition, 2012.[50] J. Hennig. Multiecho imaging sequences with low refocusing flipangles. Journal of Magnetic Resonance, 78(3):397?407, 1988.[51] R. Marc Lebel and Alan H. Wilman. Transverse relaxometry withstimulated echo compensation. Magnetic Resonance in Medicine, 64(4):1005?1014, 2010.[52] B. Zhang, S. Xu, F. Zhang, Y. Bi, and L. Huang. AcceleratingMATLAB code using GPU: A review of tools and strategies. 2011 2ndInternational Conference on Artificial Intelligence, ManagementScience and Electronic Commerce (AIMSEC), pages 1875?1878,August 2011.[53] M. Malik, T. Li, and U. Sharif. Productivity of GPUs under differentprogramming paradigms. Concurrency and Computation: Practice andExperience, pages 179?191, 2012.[54] D. Kirk and W. Wen-mei. Programming massively parallel processors:a hands-on approach. Morgan Kaufmann, 2010.[55] Y. Luo and R. Duraiswami. Efficient parallel non-negative leastsquares on multi-core architectures. SIAM Journal on ScientificComputing, pages 1?16, 2011.[56] B. Chapman, G. Jost, and R. Van. Using OpenMP: Portable SharedMemory Parallel Programming (Scientific and EngineeringComputation). The MIT Press, 2007.[57] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra,J. Croz, A. Greenbaum, S. Hammarling, A. McKenney, andD. Sorensen. {LAPACK} Users? Guide. Society for Industrial andApplied Mathematics, Philadelphia, PA, third edition, 1999.49Bibliography[58] L. Blackford, J. Demmel, J. Dongarra, I. Duff, S. Hammarling,G. Henry, M. Heroux, L. Kaufman, A. Lumsdaine, A. Petitet, R. Pozo,K. Remington, and R. Whaley. An Updated Set of Basic LinearAlgebra Subprograms (BLAS). ACM Transactions on MathematicalSoftware, 28:135?151, 2001.[59] D. Ruijters and P. Thevenaz. GPU Prefilter for Accurate CubicB-spline Interpolation. The Computer Journal, 55(1):15?20, December2010.[60] B. Karlsson. Beyond the C++ standard library: an introduction toboost. Pearson Education, 2005.[61] Y. Yoo and R. Tam. Non-local spatial regularization of MRI T2relaxation images for myelin water quantification. In Medical ImageComputing and Computer-Assisted Intervention?MICCAI 2013, pages614?621. Springer, 2013.[62] X. Shen, T. D. Nguyen, S. A. Gauthier, and A. Raj. Robust myelinquantitative imaging from multi-echo T2 MRI using edge preservingspatial priors. In Medical Image Computing and Computer-AssistedIntervention?MICCAI 2013, pages 622?630. Springer, 2013.50