Automated Tumor Segmentation in PET/CT Scans of Lymphoma Patients

by

Leyi (Bellinda) Yin

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Bachelor of Science, Combined Honours in Physics and Mathematics

in

THE FACULTY OF SCIENCE
(Physics and Astronomy)

The University of British Columbia
(Vancouver)

April 2021

© Leyi (Bellinda) Yin, 2021

Abstract

Lymphoma is a heterogeneous disease that can manifest in over 500 lymph nodes throughout the body in addition to several lymphatic organs like the bone marrow and spleen. Assessment of disease burden, staging and prognosis is typically done by visually assessing full-body PET/CT scans. With hundreds of tumors possible, manual detection and delineation of each individual lesion is extremely time-consuming, prone to inter- and intra-observer variability, and only results in a qualitative analysis, leaving out vital quantitative metrics. Although not routinely performed in clinics, full segmentation of patient tumors can provide valuable quantitative metrics (such as metabolic tumor volume) that aid in predicting outcome and developing individualized treatment. A fully automatic pipeline for detecting and delineating lymphoma lesions in PET/CT images is therefore of great importance. In recent years, with the advancement of Artificial Intelligence (AI), numerous segmentation schemes based on supervised deep learning models have been proposed, but they require a large number of cases with detailed delineations. While detection techniques only need roughly annotated data, they cannot be used to extract exact tumor boundaries. For this thesis, an automatic segmentation pipeline based on conventional segmentation methods will be implemented in MATLAB with a focus on optimizing delineation algorithms. These results will then be used to refine the output of AI-based object detection techniques, such as the YOLO network.
Using this routine, the work of this thesis intends to present an accurate technique for individual lesion segmentation and enhance the diagnostic workflow.

Preface

This dissertation is original, unpublished, independent work by the author, L. Yin. This thesis is a retrospective study based on PET/CT images provided by the Qurit Lab and Lymphoma CMC South Korea collaboration.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Lymphoma Disease Characteristics
  1.2 Clinical Assessment of Lymphoma Using PET/CT Images
  1.3 Automatic Segmentation
  1.4 Objectives
2 Theory
  2.1 PET/CT Imaging
  2.2 Image Segmentation
  2.3 Image Pre-processing
  2.4 Conventional Segmentation Algorithms
    2.4.1 Thresholding
    2.4.2 Clustering
    2.4.3 Active Contours
    2.4.4 Region Growing
  2.5 Segmentation Evaluation
3 Methods
  3.1 Patient Information and Ground Truth
  3.2 Preprocessing
  3.3 Segmentation Algorithms
    3.3.1 Thresholding
    3.3.2 Clustering
    3.3.3 Active Contours
    3.3.4 Region Growing
    3.3.5 Hybrid Algorithms
  3.4 Performance Evaluation
4 Results
  4.1 Segmentation Results
    4.1.1 Segmentation Parameter Optimization
    4.1.2 Segmentation Accuracy
    4.1.3 Volume Dependence
  4.2 Combination with AI-based Detection
5 Analysis and Discussion
  5.1 Future Work
6 Conclusion
Bibliography
A Supporting Materials

List of Tables

Table 3.1 A summary of pre-processing and segmentation parameters explored. Values attempted are based on literature suggestions and trial and error. Final values were chosen based on mean DSC of the segmentation result to assess delineation accuracy.
Table 4.1 Summary of optimal ranges of lesion volumes for top segmentation methods.
For the methods that appeared to have a distinct volume dependence, the widest possible range with the highest mean DSC was calculated to determine the optimal range of performance for each method. Volume range is per axial slice of the PET scan, calculated as the area of the region multiplied by the separation distance between images obtained from the PET image settings: V = (Area × 0.003) mL.

List of Figures

Figure 1.1 Taken from [13]. An example of inter-physician variability for manual tumor segmentation. Two independent nuclear medicine physicians delineated and ranked lesions for five patients using maximum intensity projections (MIPs) of patient PET scans. Red contours illustrate proposed malignant lesions while yellow contours display likely benign lesions.
Figure 2.1 An example of PET/CT fusion. The PET scan (left) is combined with the CT scan (middle) to produce a fused PET/CT dual modality image (right).
Figure 2.2 Taken from [21]. Illustration of PVE and “spill out”. (a) A circular source of intensity 100 and diameter 10 mm (left) produces the measured image (right). Part of the signal is seen outside the actual source area and the maximum intensity of the source is reduced to 85. (b) Illustration of image output affected by PVE. The object edges produce a “spill out” that expands the area of the signal.
Figure 2.3 Adapted from [13]. Schematic illustrating two steps of image segmentation. A) Recognition (detection) includes both distinguishing an area from its surroundings and determining its relative location. B) Delineation involves defining the spatial boundaries of the desired region.
Figure 2.4 Taken from [25]. The probability distribution of a Gaussian (Normal) distribution centered at mean µ.
68% of the data is within 1 standard deviation, 95% is within 2 standard deviations, and 99.7% is within 3 standard deviations.
Figure 2.5 Taken from [26]. An example application of a Gaussian filter to a PET phantom image, from left to right with (a) σ = 0.5, (b) σ = 1.5, and (c) σ = 2.5.
Figure 2.6 Taken from [16]. Illustration of the Morphological Opening Operation. The structuring element B (green circle) is first rolled all around image A (orange triangle) to create the rough shape of A in terms of B (left). Erosion completely eliminates elements in A lighter/narrower than B (middle). Dilation restores the shape of A (green shape). The final image A has a smoother contour than the original image.
Figure 2.7 Taken from [26]. An example application of a Morphological Opening Operation to remove inhomogeneous background. From left to right: (a) the original image and structuring element (green) of size 9 × 9 pixels, (b) background obtained after applying the opening operation, and (c) resulting image with uniform background after the background image is subtracted from the original image.
Figure 2.8 Taken from [29]. Illustration of ITM: (A) true volume and optimum threshold curve to determine the initial threshold, (B) source/background ratio of the original image, (C-G) threshold iteration until values stabilize, and (H) the final segmented volume.
Figure 2.9 Taken from [31]. Simple illustration of fixed value K-means clustering and Fuzzy C-Means. Three clusters were defined for both methods.
Figure 2.10 Taken from [33]. Visualization of localized consideration at each point along the contour. A ball is considered at each point along the initial curve (green) and split by the contour into local interior and local exterior regions. In both images, the point x is represented by the small yellow dot.
The local neighborhood is represented by the larger red circle. On the left, the local interior is the shaded part of the circle and on the right, the shaded part of the circle indicates the local exterior.
Figure 2.11 Taken from [35]. Example of region growing starting from a seed point inside the object of interest. The ROI is grown by adding surrounding pixels that satisfy a specified similarity condition compared to the seed.
Figure 2.12 Taken from [36]. Illustration of calculating the Dice coefficient (DSC).
Figure 2.13 Taken from [37]. Illustration of calculating the Jaccard Similarity Coefficient (J).
Figure 2.14 Taken from [38]. Illustration of finding the Hausdorff distance (HD) of two curves.
Figure 2.15 Taken from [40]. Schematic illustrating how to calculate the Structural Similarity Index (SSIM) of two images defined by signals x and y.
Figure 4.1 Lesion-level segmentation performance for the top 4 segmentation methods. Round circles indicate the mean value of the metric for 90 lesions segmented by the respective algorithm. Methods are ordered from left to right by mean Dice coefficient (DSC), mean Jaccard Index, mean Hausdorff distance (HD), and mean Structural Similarity Index (SSIM).
Figure 4.2 Patient-level segmentation performance for the top 4 segmentation methods. Round circles indicate the mean value of the metric for 38 patients segmented by the respective algorithm. Methods are ordered from left to right by mean Dice coefficient (DSC), mean Jaccard Index, mean Hausdorff distance (HD), and mean Structural Similarity Index (SSIM).
Figure 4.3 Segmentation results based on lesion size (mL) in individual axial tumor slices.
ITM+AC+region growing had the largest improvement in performance as size increased (as illustrated by the logarithmic curve). The other methods, with fitted hyperbolic curves, tended to have better performance in the middle range of lesion sizes. FCM had no discernible pattern of performance and thus was not fitted to a trend line.
Figure 4.4 Example lesion segmentation results. Five tumors in axial slices containing the tumors, with ground truth from the physician (green) and results from the indicated automated methods (red). Methods ordered from left to right: hybrid method (ITM+AC+region growing), fixed 35% thresholding with pre-processing (BR = background removal), and fixed 35% thresholding alone.
Figure 4.5 Example lesion segmentation results. Five tumors in axial slices containing the tumors, with ground truth from the physician (green) and results from the indicated automated methods (red). Methods ordered from left to right: ITM with pre-processing (BR = background removal), original ITM, and FCM with pre-processing.
Figure 4.6 Taken from [50]. A summary of the lesion detection and segmentation results for the NSCLC axial PET images. (a) and (d) show two representative axial cross-sections of the NSCLC PET scan, with the lesions delineated by a physician (in red). (b) and (e) show the corresponding YOLO v3 predicted bounding boxes (in green) around the lesion region. (c) and (f) show the corresponding segmentation contours using the ITM + AC method (in red) and ground truth contours (in green) inside the YOLO predicted ROIs, with Dice coefficients of 0.94 and 0.95, respectively.

Glossary

AI  Artificial Intelligence
NHL  non-Hodgkin’s lymphoma
DLBCL  Diffuse Large B-cell Lymphoma
18F-FDG  18F-fluorodeoxyglucose
PET  positron emission tomography
CT  computed tomography
ROI  region of interest
SUV  Standardized Uptake Values
PVE  Partial Volume Effect
ITM  Iterative Thresholding Method
S/B  source-to-background ratio
FCM  Fuzzy C-means clustering
AC  Active contours
DSC  Dice Similarity Coefficient
HD  Hausdorff distance
SSIM  Structural Similarity Index
FRFCM  Fast and Robust Fuzzy C-Means Clustering
YOLO  You Only Look Once

Acknowledgments

I would like to sincerely thank my thesis supervisor Dr. Fereshteh Yousefi, who provided invaluable support both scientifically and as a mentor. I am extremely grateful to Dr. Arman Rahmim for giving me the opportunity to work on this thesis project, making me a member of the Qurit team (Quantitative Radiomolecular Imaging and Therapy lab at the BC Cancer Research Centre), and being the second reader for this thesis. I would also like to thank the entire Qurit Lab for providing support and assistance whenever I needed it, even in these difficult times. I would like to thank Professor Rob Kiefl for the dedication and organization of the Phys 449 course.

I would also like to thank my parents, Pengfei Yin and Wei Zhang, for their unwavering support.

Last but not least, I would like to thank Christopher Tao for giving me inspiration and helping me consolidate my ideas.

Chapter 1
Introduction

1.1 Lymphoma Disease Characteristics

Lymphoma is cancer of the lymphatic system, a component of the immune system that spans the entire body [1]. Cells of this system, called lymphocytes, can grow or change to no longer behave normally, producing tumors that can originate almost anywhere in the body. In Canada, lymphoma is the 6th most commonly diagnosed form of cancer with an estimated 10,000 new cases a year [2].
In particular, non-Hodgkin’s lymphoma (NHL) is the most common and deadly form, occurring about ten times more frequently than Hodgkin’s lymphoma and with an approximately 15% lower survival rate [2]. Within NHL, Diffuse Large B-cell Lymphoma (DLBCL), a cancer of the B-cell lymphocytes, is the most common manifestation, with clinical features like rapidly growing tumor masses. Unfortunately, prognosis is poor for this aggressive cancer, with incurable patients dying within a few months to a few years [3]. NHL can stem from a variety of lymphocytes and is treated depending on the origin and stage of the cancer. Treatments can be divided into two categories: targeted and non-targeted therapy. Targeted therapies include introducing specific antibodies that destroy proteins necessary for cancer survival and introducing other proteins that interfere with the cancer. Non-targeted therapies include chemotherapy, radiation therapy, immunotherapy, and stem cell transplants [3]. Since lymphoma manifests from so many different abnormalities, individualized medicine is paramount to produce effective and efficient treatment that removes all traces of the disease to prevent relapse.

1.2 Clinical Assessment of Lymphoma Using PET/CT Images

After searching for swollen lymph nodes in a physical exam, an imaging test may be recommended to look for tumors during the diagnostic stage [4]. Most commonly, 18F-fluorodeoxyglucose (18F-FDG) positron emission tomography (PET) imaging integrated with computed tomography (CT) is used to image lymphoma patients. For lesion segmentation and outcome prediction, the most pertinent information is expected to be in PET images but not limited to them.
PET images are high contrast but lower resolution, which is useful for background and foreground separation, whereas CT images provide higher resolution images that can display valuable organ spatial information.

PET/CT images are normally assessed visually by physicians and analyzed qualitatively, which leaves a lot of room for inter- and intra-physician variability [5]. Since patients may have hundreds of pea-sized tumors, it is often unrealistic for a physician to locate all of them in a timely and efficient manner. Additionally, the widely used visual Deauville criteria for evaluating response to therapy have only five categories, grouping together patients with widely different disease characteristics and treating them as the same [6]. Without detailed disease information, it is very difficult to determine whether or not a patient is going to respond well to treatment regimes. With such diverse disease characteristics, there is a plethora of methods to de-escalate lymphoma disease burden on a patient. This is not only to account for the heterogeneity of lymphoma but also potential drug resistances, acquired treatment resistances, and drug toxicity, which are particularly troubling for younger patients who may develop infertility and chronic secondary malignancies [7]. Having access to more individualized metrics may narrow down treatment options, avoiding potentially harmful regimes.

While imaging is common for staging and prognosis, the emphasized benefits of PET/CT imaging are in evaluating response to therapy and post-treatment follow-up, focusing PET use on response management [8]. However, there is strong evidence that extracting quantitative tumor metrics (like total metabolic tumor volume from FDG-PET) can be extremely informative and improve prognosis, leading to better treatment for patients [9]. Many studies have also shown that quantitative disease information is more predictive of patient outcome than current clinical qualitative assessment. At a patient level, there is great potential for more objective and standardized methods for characterizing lymphoma [10][11].

Figure 1.1: Taken from [13]. An example of inter-physician variability for manual tumor segmentation. Two independent nuclear medicine physicians delineated and ranked lesions for five patients using maximum intensity projections (MIPs) of patient PET scans. Red contours illustrate proposed malignant lesions while yellow contours display likely benign lesions.

1.3 Automatic Segmentation

Automatic segmentation of tumors in PET/CT scans will produce more consistent results and extract accurate lesion-level metrics which quantify patient disease burden based on objective and standardized image metrics. These techniques will allow more detailed extraction of individual tumor metrics (like individual metabolic tumor uptake, texture, and shape information) which can significantly improve prognostic power [12]. There are many challenges to the segmentation of PET images, which makes developing an accurate process extremely desirable. Firstly, since the lymphatic system spans the entire body, there is wide variation in location, size, and texture of lesions, which makes them difficult both to detect and to delineate. Secondly, during the processing stage, the low resolution is another prominent issue that makes it difficult to draw clear boundaries. This problem is also exacerbated by image noise.

Although AI-based deep models have shown some very promising segmentation results, these methods require an abundance of accurately annotated data for training the network, which is problematic in cases where such resources are scarce [5]. Comparatively, semi-supervised methods normally only require an initial seed or contour, which can be estimated quite well using global image characteristics. These methods can also produce similar segmentation results and are simpler to implement [13].
With this in mind, the aim of this thesis project is to produce a tumor segmentation pipeline using semi-supervised methods that will act as a complementary scheme to help the AI-based automatic method become more accurate. An accurate and timely segmentation method will allow the extraction of informative quantitative lesion metrics which can be used to gain a deeper understanding of initial disease characteristics and treatment progression [13].

1.4 Objectives

The work in this thesis is a step in an automatic lymphoma segmentation pipeline that will enable a shift from qualitative disease assessment to using quantitative imaging metrics to develop more individualized treatment. The completed pipeline will reduce subjectivity in individual physician assessment and promote the use of standardized metrics to characterize disease progression. By adapting and optimizing conventional methods, a robust and accurate semi-automatic tumor segmentation algorithm can be developed. This algorithm, in combination with AI-based detection methods to determine the region of interest (ROI), can provide accurate delineations of lesions (see Results). The completed automatic framework will be applicable in a clinical workflow to improve individualized care and free valuable physician time.

Chapter 2
Theory

2.1 PET/CT Imaging

Positron Emission Tomography (PET) is a highly sensitive, non-invasive molecular imaging technique that uses positron-emitting radioisotopes to measure accumulations of radiotracers throughout the body [14]. When the body undergoes normal metabolic processes, the tracers circulate throughout the body, concentrating at highly active organs, inflamed areas, and tumors. A large scanner detects photons emitted by tracers, allowing visualization of the targeted tissues.
For lymphoma, in particular, this holistic imaging technique is a vital tool used to identify abnormal lesions from surrounding healthy tissue [15].

An image can be characterized by a two-dimensional function whose inputs are spatial coordinates and whose output is the pixel intensity. This data can then be manipulated and refined to produce useful metrics that reveal important details about the imaged regions [16]. To standardize PET images, image intensities are converted into Standardized Uptake Values (SUV), which describe the concentration of the radiotracer measured [17]. This can be done by multiplying the intensity of every pixel by the following scale factor, which uses information stored in the PET/CT image file [18]:

$$\mathrm{SUV}_{\text{scale factor}} = \frac{\text{weight (kg)}}{\text{DecayedDose}} \tag{2.1}$$

$$\text{DecayedDose} = \text{InjectedDose} \times 2^{-\text{decayTime}/\text{halfLife}} \tag{2.2}$$

Computerized Tomography (CT), on the other hand, is a computerized combination of X-ray images taken from different angles around the body [4]. Processing is then done to create more detailed cross-sectional images of internal body parts, which allows CT scans to provide accurate anatomical data. The integration of PET and CT information can then be done by fusing the two modalities together, giving access to both metabolic and anatomical information [19]. In recent years, it has become more common in clinics to look at these fused PET/CT scans than at PET images alone [20].

Figure 2.1: An example of PET/CT fusion. The PET scan (left) is combined with the CT scan (middle) to produce a fused PET/CT dual modality image (right).

Although helpful for visual assessment, structural imaging like CT is not well suited for lesion detection applications since functional characterization based on biological activity contributes more significantly than anatomical features [18]. Currently, conventional image segmentation techniques only require high contrast PET scans that provide crucial information closely correlated with the existence of abnormal tissue.
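
As a concrete illustration of the SUV conversion in Equations 2.1 and 2.2, the following minimal Python sketch computes the scale factor and applies it to a small image. It is not the implementation used in this thesis (which is in MATLAB): the function names, the choice of units (dose in Bq, times in seconds), and the example values are assumptions made purely for illustration. In practice the weight, injected dose, and timing would be read from the image file metadata, and unit conventions would need to match the scanner output.

```python
def decayed_dose(injected_dose, decay_time, half_life):
    """Equation 2.2: dose remaining after decay_time (same time unit as half_life)."""
    return injected_dose * 2.0 ** (-decay_time / half_life)


def suv_scale_factor(weight_kg, injected_dose, decay_time, half_life):
    """Equation 2.1: factor by which every pixel intensity is multiplied."""
    return weight_kg / decayed_dose(injected_dose, decay_time, half_life)


def to_suv(image, scale_factor):
    """Apply the scale factor to a 2D image stored as a list of rows."""
    return [[pixel * scale_factor for pixel in row] for row in image]


# Hypothetical example values (assumptions, not patient data).
F18_HALF_LIFE_S = 6586.2  # approximate half-life of 18F (about 109.77 min)
factor = suv_scale_factor(
    weight_kg=70.0,
    injected_dose=370e6,   # assumed injected activity in Bq
    decay_time=3600.0,     # assumed seconds between injection and scan
    half_life=F18_HALF_LIFE_S,
)
suv_image = to_suv([[1200.0, 5400.0], [800.0, 15000.0]], factor)
```

Keeping the decay correction in its own function makes it easy to check against the definition: after exactly one half-life the decayed dose should be half of the injected dose.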
However, the segmentation process is complicated by low spatial resolution, noise, and the Partial Volume Effect (PVE). PVE characterizes the phenomenon of a small source appearing larger than it really is due to low spatial resolution [21]. This “spill out” is particularly challenging for segmentation since many methods rely on the existence of a boundary to separate the ROI from the background [18]. These challenges ultimately hinder the segmentation goals of preserving all lesion information and producing accurate lesion metrics for analysis.

Figure 2.2: Taken from [21]. Illustration of PVE and “spill out”. (a) A circular source of intensity 100 and diameter 10 mm (left) produces the measured image (right). Part of the signal is seen outside the actual source area and the maximum intensity of the source is reduced to 85. (b) Illustration of image output affected by PVE. The object edges produce a “spill out” that expands the area of the signal.

2.2 Image Segmentation

Image segmentation involves taking a holistic look at an entire image and extracting regions of interest based on certain image characteristics. In image processing and analysis, the segmentation process is actually two related steps: detection/recognition and segmentation/delineation [18]. Detection has the goal of locating and distinguishing a desired region from its surroundings. Segmentation is outlining (delineating) the spatial boundaries of an object in the image: drawing a barrier between foreground and background. With these goals in mind, segmentation algorithms are usually based on the similarity of pixels within a group and variation of pixels across different groups. In the context of medical imaging, object segmentation is useful for disease visualization as well as determining burden and prognosis.

Many conventional PET tumor segmentation algorithms have been developed for assessing a variety of cancers.
Due to the heterogeneity of lymphoma, specialized techniques targeting this application are an advancing area of study striving to overcome many challenges. Firstly, lymphoma can be present throughout the body with large variations in FDG uptake, which make it difficult to distinguish from high uptake organs like the liver and bladder [9]. Secondly, there is wide variation in the size of tumors, from small pea-sized lumps to large tumors litres in volume [22]. Finally, due to the lack of full lymphoma tumor delineation in clinics, realistic evaluation of segmentation methods is extremely difficult. Obtaining a large enough data set would require physicians to volunteer many hours of tedious work. In light of this, many studies compare results to phantom studies (models mimicking real clinical situations with synthetic lesions), which are extremely helpful but limited in scope [5]. Nonetheless, these algorithms are highly transferable and important for lymphoma tumor segmentation.

Figure 2.3: Adapted from [13]. Schematic illustrating two steps of image segmentation. A) Recognition (detection) includes both distinguishing an area from its surroundings and determining its relative location. B) Delineation involves defining the spatial boundaries of the desired region.

2.3 Image Pre-processing

Reconstruction and smoothing of PET images is a common pre-processing step to ameliorate the effects of noise and blurring [18]. Some studies have demonstrated that PET noise closely resembles a Gaussian distribution, which can be used to filter and refine the images [23]. The image filtering process consists of a convolution between the starting image and a kernel defined by the 2D Gaussian function

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}} \tag{2.3}$$

centered at zero with standard deviation σ. The resultant value of each pixel is calculated as the weighted average of its neighbouring pixels, with weights given by this Gaussian kernel [24]. The size of this kernel is related to the preservation of the original image: a length of greater than 3σ preserves over 99% of the original image, a length of 2σ preserves about 95%, and a length of σ preserves about 68%. Choosing the right length is vital since there is a trade-off between getting smoother images and preserving detail.

Figure 2.4: Taken from [25]. The probability distribution of a Gaussian (Normal) distribution centered at mean µ. 68% of the data is within 1 standard deviation, 95% is within 2 standard deviations, and 99.7% is within 3 standard deviations.

Figure 2.5: Taken from [26]. An example application of a Gaussian filter to a PET phantom image, from left to right with (a) σ = 0.5, (b) σ = 1.5, and (c) σ = 2.5.

Background correction can also be applied to improve non-uniform background illumination [16]. In particular, the Morphological Opening Operation uses a combination of erosion and dilation operations to preserve objects in an image. A defined boundary is filled with structuring elements (small geometry templates) that specify the shape of the foreground area. The structuring element is then placed at all possible locations in the image, where comparisons are made for the pixels around each element. Light features smaller than the structuring element are first completely eliminated by the erosion operation, then the overall shape of the image is restored by the dilation operation. This smooths the contours and removes the smaller narrow details like white noise.

Figure 2.6: Taken from [16]. Illustration of the Morphological Opening Operation. The structuring element B (green circle) is first rolled all around image A (orange triangle) to create the rough shape of A in terms of B (left). Erosion completely eliminates elements in A lighter/narrower than B (middle). Dilation restores the shape of A (green shape). The final image A has a smoother contour than the original image.

Figure 2.7: Taken from [26]. An example application of a Morphological Opening Operation to remove inhomogeneous background.
From left to right: (a) the original image and structuring element (green) of size 9 × 9 pixels, (b) background obtained after applying the opening operation, and (c) resulting image with uniform background after the background image is subtracted from the original image.

2.4 Conventional Segmentation Algorithms

Based on existing studies in the field of PET image segmentation and inspired by the paper (Weisman et al., 2020), the following four categories of conventional semi-supervised segmentation methods for automatic segmentation will be used:

2.4.1 Thresholding

A commonly used and simple group of techniques is thresholding, which categorizes pixels as above or below a set intensity. A grey-scale image is converted into a binary image by defining pixels with an intensity greater than some value as the foreground and everything else as the background. Fixed thresholds of 40% and 50% of SUVmax are the most common binary thresholds used clinically [18].

Beyond having a single fixed threshold, there is the Iterative Thresholding Method (ITM), which determines the optimal intensities to be applied for thresholding [28]. This method first segments the image using some initial estimate, then tries to minimize the variance in clusters by calculating new threshold values and the corresponding new variance [18]. For the preliminary estimate, the source-to-background ratio (S/B), which enters the formula through its reciprocal B/S, is used to initialize a threshold [29]:

$$T_{1}(B/S) = 61.7\% \cdot \frac{B}{S} + 31.6\% \tag{2.4}$$

The threshold value is then updated using the following equation, where V is the currently segmented volume, until values stabilize and there is less than 5% change (or another stopping condition) in new threshold values:

$$T(V, B/S) = 7.8\% \cdot \frac{\mathrm{mL}}{V} + 61.7\% \cdot \frac{B}{S} + 31.6\% \tag{2.5}$$

This method does not require an estimation of the lesion volume prior to analysis but only uses the source-to-background ratio (S/B) obtainable from any PET image.

Figure 2.8: Taken from [29].
Illustration of ITM: (A) True volume and optimum threshold curve to determine initial threshold, (B) Source/Background ratio of original image, (C-G) threshold iteration until values stabilize, and (H) the final segmented volume.

2.4.2 Clustering

Clustering techniques use the similarity of groups of pixels to cluster them together. These can be further divided by changing the definition of a cluster and the method of optimizing groupings [30]. There are two widely used clustering algorithms: K-means clustering and Fuzzy C-means clustering. K-means uses a predetermined number of clusters k and the cluster means for grouping. The centroid of each group is calculated and pixels are assigned to the group whose centroid is closest. Iterations of regrouping occur until intra-cluster dispersion is minimized. This is the simpler of the two, with limited application to lymphoma data [5].

Fuzzy C-means clustering (FCM) is similar to K-means clustering but allows objects to belong to more than one cluster. With the introduction of a new term m, which controls how strongly a point's membership can be shared between clusters, the method computes a membership value (from 0 to 1) giving the degree to which each pixel belongs to each group. Through iterations, the goal of fuzzy clustering is to make members of one cluster as similar as possible while making members of separate clusters as dissimilar as possible. This method results in well separated, spherically-shaped clusters. Allowing points to span more than one group accounts for clusters being a mixture of many different tissue types, which makes the method better suited to segmenting PET images [18].

Figure 2.9: Taken from [31]. Simple illustration of fixed value K-means clustering and Fuzzy C-Means. Three clusters were defined for both methods.

2.4.3 Active Contours

Active contours (AC) is a more mathematically involved technique that relies on manipulating a curve subject to certain constraints in order to detect the desired image [32]. 
Its merit lies in detecting objects without well-defined boundaries. Manipulation of the curve is described by the energy function, which consists of the internal and external energies [18]. The internal energy term ensures the curve remains smooth while the external term adjusts to satisfy the desired features of the curve (gradient, texture, edges). For use in this thesis, this energy function is treated as a black box. A slight difficulty of the classic active contours method is the need for an initial contour that is as close to the object of interest as possible. Thus the active contours method is often implemented after an initial segmentation method so the initialized curve is already somewhat close to the region of interest.

Figure 2.10: Taken from [33]. Visualization of localized consideration at each point along the contour. A ball is considered at each point along the initial curve (green) and split by the contour into local interior and local exterior regions. In both images, the point x is represented by the small yellow dot. The local neighborhood is represented by the larger red circle. On the left, the local interior is the shaded part of the circle and on the right, the shaded part of the circle indicates the local exterior.

Localized Active Contours is a region-based implementation of active contours. Instead of looking at the entire image, the foreground and background are defined in smaller localized neighbourhoods. This removes the assumption that the area of interest can be described by purely global statistics (e.g. the brightest or darkest spots) [33]. The use of these local regions modifies the single energy of a curve to be a family of local energies at each point along the moving curve. The optimization criterion considers each point individually, moving the contour based on a point’s local region.

2.4.4 Region Growing

Region growing starts with a seed point in the object of interest and expands the foreground region based on a defined criterion. 
This algorithm rests on the assumption that regions of interest are relatively homogeneous with similar or slowly varying intensity values [18]. Normally, region growing is a manual segmentation technique that requires the user to visually select a seed point, because the accuracy of this technique is highly dependent on this initialization and the expansion condition. Many workarounds to this difficulty have been presented, including an iterative technique that finds the optimal confidence interval for the algorithm [34]. Region growing methods have shown good results for homogeneous regions of interest, with less success when it comes to more heterogeneous features. The technique is also not designed to handle multiple lesion detection. An additional challenge of region growing for tumor detection is the partial volume effect. This limits the structural information of the objects, which may eliminate the existence of a sharp boundary separating the ROI from the background.

Figure 2.11: Taken from [35]. Example of region growing starting from a seed point inside the object of interest. The ROI is grown by adding surrounding pixels that satisfy a specified similarity condition compared to the seed.

2.5 Segmentation Evaluation

There are four common metrics to evaluate the performance of segmentation schemes. The most widely used is likely the Dice Similarity Coefficient (DSC). The Dice score measures the spatial overlap between two regions. Given two regions, A and B, for comparison, the formula is:

$$\mathrm{DSC}(A, B) = \frac{2\,|A \cap B|}{|A| + |B|} \tag{2.6}$$

where A∩B indicates the overlap of the two regions and |·| denotes the area of a region.

Figure 2.12: Taken from [36]. Illustration of calculating the Dice coefficient (DSC).

Similarly, the Jaccard Similarity Coefficient (J) is defined as the size of the intersection divided by the size of the union of two regions.

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{2.7}$$

Figure 2.13: Taken from [37]. 
Illustration of calculating the Jaccard Similarity Coefficient (J).

For more complex shapes, geometric metrics can quantify the similarity of contours. The Hausdorff distance (HD) measures the spatial distance between two boundaries. Informally, it is the greatest distance from a point in one set to the closest point in the comparison set. The Hausdorff distance has the mathematical definition:

$$d_H = \max\left\{ \sup_{x \in X} d(x, Y),\ \sup_{y \in Y} d(X, y) \right\} \tag{2.8}$$

where $d(x, Y) = \min_{y \in Y} d(x, y)$ and, likewise, $d(X, y) = \min_{x \in X} d(x, y)$.

Figure 2.14: Taken from [38]. Illustration of finding the Hausdorff distance (HD) of two curves.

Lastly, the Structural Similarity Index (SSIM) measures the similarity of images based on luminance, contrast, and structure [39]. The resulting metric is a value between 0 and +1, with +1 meaning two images are the same (or very similar) and 0 meaning they are completely different (or very different). Given two images A and B, SSIM is calculated by:

$$\mathrm{SSIM}(A, B) = \frac{(2\mu_A \mu_B + c_1)(2\sigma_{AB} + c_2)}{(\mu_A^2 + \mu_B^2 + c_1)(\sigma_A^2 + \sigma_B^2 + c_2)} \tag{2.9}$$

where µ indicates the average, σ² the variance, σ_AB the covariance, and c1, c2 are stability constants [39]. Thus, an accurate segmentation result would achieve a high DSC value and Jaccard coefficient (high regional overlap) and a low HD value (high shape similarity) with a high SSIM.

Figure 2.15: Taken from [40]. Schematic illustrating how to calculate the Structural Similarity Index (SSIM) of two images defined by signals x and y.

Chapter 3
Methods

3.1 Patient Information and Ground Truth

This study utilized PET/CT images of 38 adult patients with Diffuse Large B-cell Lymphoma (DLBCL) from the Qurit Lab’s South Korean collaborators. After undergoing ethics training, image data was obtained from Synaptic Medical Image Drive for analysis.

Lymphoma lesions were segmented manually by a nuclear medicine physician on PET/CT images using the following workflow:

1. 
Select sites consistent with DLBCL lesions
Based on:
• Areas of high FDG uptake
• Known patterns of DLBCL
• Patient history (past scans)
Tumors display more significant FDG uptake than surrounding healthy tissue, normally making them bright spots in dimmer regions. Using this information and the disease characteristics of lymphoma, a physician can narrow down the sites where tumors are likely to be.

2. Draw Volume of Interest (VOI) using best SUV threshold (determined visually)
In a clinical setting, it is standard to use a fixed percent threshold to first filter out the locations where lesions will be. This method is reliable and reproducible, making it possible to have consistent results. The physician will normally select the threshold that visually best captures the boundary of lesions, giving the closest shape and size.

3. Manually delete normal regions
Based on:
• Known areas of physiological high FDG uptake
• CT images
• Patient history (past scans)
Fixed thresholding has a characteristic flaw of producing many false positives. Since it is a purely intensity-based segmentation scheme, highly active organs like the liver will always be selected. To deal with the false positives, physicians must manually delete the regions that are not consistent with lymphoma.

ROIs were drawn on CT images and transferred to PET images after PET/CT coregistration. The contours on the PET images were then converted from RT-struct format to NIfTI (.nii) contour masks for use in MATLAB R2020b (The MathWorks, Inc). These masks were used as the ground truth to evaluate the performance of the segmentation algorithms.

Before applying the algorithms, each image was normalized.

3.2 Preprocessing

To test the effects of noise removal, the MATLAB Gaussian filter function imgaussfilt from the Image Processing Toolbox was applied for various values of σ. The filter was also applied using the Gaussian filter function by Dirk-Jan Kroon to see if there were discrepancies [41]. 
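The trade-off in kernel length noted in Section 2.3 can be checked numerically. The following is a minimal Python sketch for illustration only; it is not the MATLAB code used in this thesis, and the function names `gaussian_mass_within` and `truncated_gaussian_kernel` are hypothetical. It computes the fraction of a one-dimensional Gaussian lying within k standard deviations and builds a normalized, truncated 1-D kernel.

```python
import math


def gaussian_mass_within(k):
    """Fraction of a 1-D Gaussian that lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))


def truncated_gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian weights sampled at integer offsets -radius..radius.

    `radius` (in pixels) is an illustrative parameter, not a setting from the thesis.
    """
    weights = [
        math.exp(-(x * x) / (2.0 * sigma * sigma))
        for x in range(-radius, radius + 1)
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Print the coverage for 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sigma: {gaussian_mass_within(k):.4f}")
```

A wider kernel keeps more of the Gaussian tail (and therefore smooths more faithfully), while a narrow one is cheaper and blurs less of the surrounding detail.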
The images are filtered using a 2-D Gaussian smoothing kernel with specified standard deviation. To remove the background, the MATLAB morphological opening operation imopen from the Image Processing Toolbox was used. Optimization was done by changing the size of the disk structuring element.

These pre-processing steps were implemented for each algorithm that performed the best in its category and compared with the algorithm with no pre-processing steps. A summary of the pre-processing and segmentation parameters explored can be found in Table 3.1.

3.3 Segmentation Algorithms

A total of six conventional segmentation algorithms were implemented as outlined below. With these six classic methods, four hybrid methods were developed to try to make up for some unique shortcomings. Since the work of this thesis will be applied after detection, the following segmentation algorithms were implemented using the physician ROI as the bounding box of the PET images. A four to ten pixel border was given to the lesion mask to define the cropping of each image and simulate a detection output. Borders were chosen to avoid high uptake regions as much as possible while leaving enough background. The techniques were applied on individual axial slices of the PET scans containing lesions.

3.3.1 Thresholding

Fixed percent thresholding is normally thought of as a semi-automatic segmentation method since its clinical application requires the user to manually select the options that look best for the image. To make this method fully automatic, a pre-processing step was applied before binary thresholds ranging from 20% to 60% SUVmax. Removing background noise can reduce the number of false positives since there would be fewer (if any) small scattered regions of high intensity. The range of fixed percent threshold values was chosen based on a clinical average of 40% SUVmax and visual assessment of results.

The iterative threshold method (ITM) was implemented as in (Jentzen et al., 2007). 
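The ITM update loop built on Equations 2.4 and 2.5 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MATLAB implementation used in this thesis: `volume_at` is a hypothetical callable that returns the segmented volume (in mL) for a given threshold, thresholds are expressed as percentages of the maximum intensity, the 5% relative-change stopping rule is taken from Section 2.4.1, and the cap of 10 iterations and the fallback for thresholds above 100% follow the rules described in this section.

```python
def initial_threshold(b_over_s):
    """Equation 2.4: first threshold estimate, in percent of the maximum intensity."""
    return 61.7 * b_over_s + 31.6


def updated_threshold(volume_ml, b_over_s):
    """Equation 2.5: volume-corrected threshold, in percent of the maximum intensity."""
    return 7.8 / volume_ml + 61.7 * b_over_s + 31.6


def iterative_threshold(b_over_s, volume_at, max_iter=10, rel_tol=0.05):
    """Iterate the threshold until it changes by less than rel_tol or max_iter is hit.

    volume_at: hypothetical callable mapping a threshold (percent) to a volume in mL.
    """
    threshold = initial_threshold(b_over_s)
    for _ in range(max_iter):
        volume_ml = volume_at(threshold)
        if volume_ml <= 0:
            # Nothing was segmented, so the update formula is undefined; stop here.
            break
        new_threshold = updated_threshold(volume_ml, b_over_s)
        if new_threshold > 100.0:
            # Very small volumes push the threshold past 100%; fall back to the
            # volume-independent estimate.
            new_threshold = initial_threshold(b_over_s)
        converged = abs(new_threshold - threshold) < rel_tol * threshold
        threshold = new_threshold
        if converged:
            break
    return threshold


# Toy usage: a lesion whose segmented volume stays at 2 mL for any threshold.
print(iterative_threshold(0.1, lambda threshold: 2.0))
```

In practice `volume_at` would threshold the cropped PET slice and convert the foreground area to a volume; keeping it as a parameter keeps the update rule separate from the image handling.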
Following the original implementation of ITM, the source was calculated using the maximum intensity of an image and the background using the mean intensity. This initialized the method using the formula:

$$T_1(B/S) = 61.7\% \times B/S + \mathbf{31.6\%} \tag{3.1}$$

where B/S is the background-to-source ratio. Using this initial threshold, the lesion volume was estimated from the area of the foreground and the PET imaging parameters, and this estimate was used to iteratively update the threshold using the following formula:

$$T(V, B/S) = 7.8\% \times (\mathrm{mL}/V) + 61.7\% \times B/S + \mathbf{31.6\%} \tag{3.2}$$

To prevent the formula from breaking down when volumes are too small, an alternative formula:

$$T(B/S) = 61.7\% \times B/S + \mathbf{31.6\%} \tag{3.3}$$

was used when T became greater than 100%. A stopping criterion of 10 iterations was used.

When adding the pre-processing step, the fixed threshold parameter in the above equations (the constant 31.6%, bolded) was changed to see if results could be improved. Since ITM was not designed for PET images that have undergone background removal, the formulas were modified to adapt the algorithm for these modified images.

3.3.2 Clustering

Two clustering methods were assessed: Fuzzy C-means clustering (FCM) by (Bezdek, 1981) and FRFCM by (Lei et al., 2018) [42][43]. Fuzzy C-means clustering is similar to K-means but classifies pixels into groups based on “fuzzy” levels. Since PET images have low resolution, PVE is especially prominent, so allowing one point to belong to more than one cluster is more useful. FCM was implemented using the MATLAB function fcm, which returns an indexed matrix based on cluster number. Using this matrix, the cluster with the highest intensity was taken and smoothed to give the final segmentation contour. For FCM, the number of clusters was optimized by looking at the DSC of each segmentation output.

As FCM is sensitive to noise, local spatial information can be introduced to increase robustness. 
Fast and Robust Fuzzy C-Means Clustering (FRFCM) was used to test if such modifications would improve performance. FRFCM first uses an image reconstruction technique to smooth the images, then uses membership filtering to classify pixels instead of computing the distance of each pixel to its neighbour. This significantly decreases computation time. The cluster number was taken from FCM, with a fixed radius of structuring element and filtering window.

3.3.3 Active Contours

Active Contours (AC) was implemented based on the localized active contours model by (Lankton and Tannenbaum, 2008) and the MATLAB function by Jincheng Pang [33][44]. This version was used since using localized information obtains better edge detection. The tunable input parameters were optimized visually and by testing segmentation results using DSC. The number of iterations was determined after seeing how long it took for the contour to stabilize.

Since Active Contours relies on the initializing curve to obtain accurate results, various starting curves were tested (as detailed in the Hybrid Algorithms section). To implement AC without combining other methods, the square boundary of the image was used as the initial curve.

3.3.4 Region Growing

Region growing was implemented based on the MATLAB function by Daniel Kellner, which is a recursive expansion algorithm starting from a seed point inside the region of interest [45]. The growth is defined by a confidence interval of image intensity as in (Tan et al., 2017):

$$I \in [\,m - f\sigma,\ m + f\sigma\,] \tag{3.4}$$

where m and σ are the mean and standard deviation of the pixels in the current region and f determines the length of the interval. This f can be optimized in many ways and was chosen as the value that achieved the highest DSC for the largest number of images. Expansion can also be based on a fixed percent of the maximum intensity pixel inside the ROI. 
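The confidence-interval growth rule of Equation 3.4 can be sketched in a few lines. This is a minimal Python illustration, not the recursive MATLAB function cited above: it swaps the recursion for a breadth-first queue, uses 4-connectivity and the population standard deviation, and adds a hypothetical `min_sigma` floor so that the interval is not degenerate while the region holds only the seed. All of these choices are assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


def grow_region(image, seed, f=1.5, min_sigma=1.0):
    """Grow a region from `seed` using the interval [m - f*sigma, m + f*sigma].

    image: 2-D list of intensities; seed: (row, col) inside the object of interest.
    min_sigma: hypothetical floor on sigma (an assumption, not from the thesis).
    """
    rows, cols = len(image), len(image[0])
    region = {seed}
    values = [image[seed[0]][seed[1]]]  # intensities of pixels already accepted
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        # Statistics of the current region, refreshed once per expanded pixel.
        m = mean(values)
        half_width = f * max(pstdev(values), min_sigma)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connected neighbours
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols) or (nr, nc) in region:
                continue
            if m - half_width <= image[nr][nc] <= m + half_width:
                region.add((nr, nc))
                values.append(image[nr][nc])
                frontier.append((nr, nc))
    return region


# Toy usage: a bright 2 x 2 "lesion" on a dim background, seeded at its corner.
toy_image = [
    [1, 1, 1, 1, 1],
    [1, 9, 10, 1, 1],
    [1, 10, 11, 1, 1],
    [1, 1, 1, 1, 1],
]
print(sorted(grow_region(toy_image, (1, 1))))
```

A larger f widens the accepted intensity range and lets the region leak into the background more easily, which is why f is tuned against DSC.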
This fixed-percent expansion criterion was also optimized iteratively.

3.3.5 Hybrid Algorithms

It has been proposed that combining multiple segmentation methods may achieve better final partitions of an image [46]. Following this idea, four hybrid methods were tested to see if better results could be achieved.

Firstly, inspired by (Shyu et al., 2012), FRFCM and AC were combined. The number of iterations was set to 100, just as in AC alone, and the number of clusters was the same as in FRFCM alone [47]. A dilation of 11 pixels of the FRFCM results was used to give an initial mask to AC. This dilation and number of iterations were tested to ensure the AC curve would not completely miss the ROI.

Secondly, ITM and AC were combined. The number of iterations was set to 100, just as in AC alone. A dilation of 11 pixels of the ITM results was used to give an initial mask to AC. As above, this dilation and number of iterations were tested to ensure the AC curve would not completely miss the ROI.

Third, an averaging algorithm of results from region growing and AC was implemented. This idea is based on the intuition that since region growing is an expanding method and AC is a contracting method, they should meet in the middle at the location of the ROI. Both methods were initialized using 35% fixed thresholding, with AC using a dilation of 9 pixels as the initial mask. Dilation of the mask was chosen by trial and error, choosing the one that achieved the best AC results. Region growing used the maximum intensity of pixels inside each disconnected initial segmentation as the starting point. The average of the resulting curves was determined using the distance transform (MATLAB bwdist). The optimized number of averages taken between the curves was computed by analyzing DSC scores of different iteration numbers.

Finally, an ITM + AC + region growing hybrid method was implemented. Like the algorithm detailed above, this method used ITM to initialize AC and locate the seed points for region growing. 
AC once again used a dilation of 9 pixels as the initial mask and region growing used the maximum intensity value inside each disconnected ITM contour.

3.4 Performance Evaluation

Methods were first assessed on individual axial image slices, then at the lesion level and patient level.

For each lesion, four segmentation performance metrics were computed as outlined in the theory section. DSC, Jaccard Index, and SSIM were computed using MATLAB functions from the Image Processing Toolbox package. HD was calculated using the MATLAB function designed by Zachary Danziger [48]. For all the metrics, the physician mask on each PET scan was taken as the ground truth for comparison. Each method was evaluated based on the mean values in each performance metric. All graphing and subsequent statistical analysis was done using Google Sheets.

Table 3.1: A summary of pre-processing and segmentation parameters explored. Values attempted are based on literature suggestions and trial and error. Final values were chosen based on mean DSC of the segmentation result to assess delineation accuracy.

Chapter 4
Results

In total, there were 38 patients with 100 individual lesions. The average lesion was 31.3 mL, with the largest being 403 mL and the smallest 0.299 mL. The number of lesions per patient ranged from 1 to 34. Of these lesions, 10 were discarded because the physician ROI was empty, the conversion of RT-struct to NIfTI clearly altered the contour location, or the mask was too small (an area of one to two pixels). Each lesion was made up of 1-120 axial slices.

4.1 Segmentation Results

Each segmentation scheme was tested on 15 lesions randomly chosen from the set to determine the best performing algorithms. Of these, the best performing method from each category was analyzed in more depth.

4.1.1 Segmentation Parameter Optimization

For fixed thresholding, the percentage with the best mean performance was 35%. However, there appeared to be a heavy volume dependence for all other percentages. 
DSC went down for larger lesions as the threshold value went up. 35% appeared to be the point of convergence where there was no large deviation in results for different volumes. After applying the pre-processing scheme, the 35% fixed threshold still had the best mean DSC performance.

For ITM, the formula defined by (Jentzen et al., 2007) had the best performance on its own. However, lowering the fixed threshold starting point for the iteration formulas from 31.6% to 25% achieved better results when pre-processing was performed first. When coupled with AC and region growing, this threshold was set to 32% since it gave a slightly better mean DSC than 31.6%.

For fuzzy c-means clustering, the optimal number of clusters was 3, while for FRFCM the optimal number of clusters was 8. For active contours, the optimal number of iterations was 100 across all applications, with an energy coefficient of 0.001 and a coefficient of balance of 0.01. Stabilization of the AC curve was most likely for larger lesions. For smaller round lesions, the boundary would frequently shrink to produce an empty result. For region growing, the optimized expansion condition was determined to be 25% of the maximum intensity and an f of 1.5. These parameters were also the optimized ones for the hybrid method.

4.1.2 Segmentation Accuracy

Overall, the methods had variable results, with fixed thresholding performing very consistently and 35% having the best mean score in all four metrics. ITM came in second place, with much better results for larger lesions, which is consistent with its original implementation in (Jentzen et al., 2007). Active contours had the worst performance and was most sensitive to PVE. FRFCM also missed many lesions and performed worse than FCM alone. The hybrid methods ITM+AC and FRFCM+AC had similar results, with ITM+AC performing better for smaller lesions than FRFCM+AC. The first averaging function, using a fixed threshold as the initialization curve, performed worse than using ITM as the initialization curve. 
Of the hybrid methods, ITM + AC + region growing performed the best but still did not have a mean DSC higher than the best of the conventional methods.

Small lesions in high uptake regions were especially difficult to segment, both because of low image resolution and because there is a lot of noise from heterogeneous uptake of the normal organ, which resembles the lesions. This resulted in segmentations that were significantly larger than the desired region and often missed the region altogether. Without organ spatial information or a very tight cropping of the suspected lesion area, it was not possible to obtain good results on images that contained parts of high uptake organs. These cases resulted in very low DSC or a DSC of 0.

The best four methods turned out to be fixed 35% thresholding and ITM, together with their pre-processed variants. Although the mean DSC was 0.65 for the best method, this is close to the DSC of 0.70 for inter-physician variability as reported in (Weisman et al., 2020). All other methods had average DSC ranging from 0.36 to 0.57.

Figure 4.1: Lesion-level segmentation performance for the top 4 segmentation methods. Circles indicate the mean value of each metric for 90 lesions segmented by the respective algorithm. Methods are ordered from left to right by mean Dice coefficient (DSC), mean Jaccard Index, mean Hausdorff distance (HD), and mean Structural Similarity Index (SSIM).

Figure 4.2: Patient-level segmentation performance for the top 4 segmentation methods. Circles indicate the mean value of each metric for 38 patients segmented by the respective algorithm. Methods are ordered from left to right by mean Dice coefficient (DSC), mean Jaccard Index, mean Hausdorff distance (HD), and mean Structural Similarity Index (SSIM).

4.1.3 Volume Dependence

As noted in other segmentation methods, the performance of the algorithms usually has some dependence on the area of the ROI [29][49]. 
This can be seen in Figure 4.5, where each method appeared to have an optimal range where it achieved the best performance. Although ITM+AC+region growing only had a mean lesion segmentation DSC of 0.46, this hybrid method showed the best performance (DSC = 0.74) for axial lesion cross sections with volumes greater than 4 mL. Thresholding methods tended to perform better for a narrower range of lesion volumes in the given data set. Although they performed worse for larger lesions, there were fewer of these, so the average DSC was pulled up by the results of smaller lesions. Since all methods tended to underestimate the ROI for larger regions, a post-processing dilation of the segmentation results was also tested. As expected, the DSC for larger lesions increased, but the mean went down because the majority of lesions are quite small. For instance, the addition of post-processing to ITM decreased mean DSC from 0.60 to 0.59 while the standard deviation went up from 0.17 to 0.21. No other method showed the same strong volume dependence. As illustrated by Figure 4.4, the threshold methods tend to have decreased mean DSC for larger lesions. FCM does not seem to have a clear pattern of performance, with significant variation in results regardless of lesion size.

Table 4.1: Summary of optimal ranges of lesion volumes for top segmentation methods. For the methods that appeared to have a distinct volume dependence, the widest possible range with the highest mean DSC was calculated to determine the optimal range of performance for each method. Volume range is per axial slice of PET scan, calculated as the area of the region multiplied by the separation distance between images obtained from PET image settings: V = (Area × 0.003) mL.

Figure 4.3: Segmentation results based on lesion size (mL) in individual axial tumor slices. 
ITM+AC+region growing had the largest improvement in performance as size increased (as illustrated by the logarithmic curve). The other methods, with fitted hyperbolic curves, tended to have better performance in the middle range of lesion sizes. FCM had no discernible pattern of performance and thus was not fitted to a trend line.

Figure 4.4: Example lesion segmentation results. Five tumors in axial slices, with ground truth from the physician (green) and results from the indicated automated methods (red). Methods ordered from left to right: hybrid method (ITM+AC+region growing), fixed 35% thresholding with pre-processing (BR = background removal), and fixed 35% thresholding alone.

Figure 4.5: Example lesion segmentation results. Five tumors in axial slices, with ground truth from the physician (green) and results from the indicated automated methods (red). Methods ordered from left to right: ITM with pre-processing (BR = background removal), original ITM, and FCM with pre-processing.

4.2 Combination with AI-based Detection

The methods developed are highly applicable to segmenting lesions detected using AI-based methods. To this end, the following combination of AI and automatic segmentation methods was conducted to validate the proposed methodology. A You Only Look Once (YOLO) v3 framework was trained using publicly accessible non-small cell lung carcinoma (NSCLC) PET images from The Cancer Imaging Archive (TCIA) to predict bounding boxes around lesions. Using these bounding boxes, the hybrid of the iterative thresholding method (ITM) and active contours (AC) was applied in the area enclosed. Although only two images were tested, there was very good overlap between the ground truth physician delineation and the ITM+AC contours (DSC > 0.9), indicating promising results.

Figure 4.6: Taken from [50]. A summary of the lesion detection and segmentation results for the NSCLC axial PET images. 
(a) and (d) show two representative axial cross-sections of the NSCLC PET scan, with the lesions delineated by a physician (in red). (b) and (e) show the corresponding YOLO v3 predicted bounding boxes (in green) around the lesion region. (c) and (f) show the corresponding segmentation contours using the ITM + AC method (in red) and ground truth contours (in green) inside the YOLO predicted ROIs, with Dice coefficients of 0.94 and 0.95, respectively.

Chapter 5
Analysis and Discussion

An accurate estimate of lymphoma tumor volume is closely correlated with better patient outcomes [12]. In both lesion and patient level assessments, the simplest method, fixed thresholding, was able to achieve the highest mean segmentation performance with the addition of simple pre-processing steps. It may be surprising that fixed 35% SUVmax thresholding achieved better accuracy than ITM, which was designed to take volume and background/source information into consideration. In theory, this should make for a more robust thresholding algorithm. However, these results may be partially explained by the construction of the ground truth, which is based on the provided workflow from a physician. In this workflow, the ROIs are selected using the threshold value that fits best visually. This is normally 40% SUVmax, whereas ITM outputted a range of thresholds from 32% SUVmax to 79% SUVmax. It does seem plausible that a segmentation algorithm close to the clinical average threshold value would produce the most consistent results. Furthermore, there are potential limitations to the use of physician delineations as the ground truth. These contours are drawn on CT scans, but the transfer of information from CT to PET scans is not perfect. The format and timing of PET and CT images will not always overlap, so mismatches are possible [49]. In light of this, a better comparison may be possible if the segmentation results on the PET scans were fused to CT scans and compared to physician results in their original modality. 
Although complicated, this may result in a more consistent performance evaluation of segmentation results.

Out of the conventional methods tested, fixed 35% thresholding returned the most consistent results, with the bulk of DSC scores near 0.60. The addition of pre-processing also helped to remove false positives, making fixed thresholding more robust. These findings agree with Weisman et al. (2020), in which a median lesion-level DSC of 0.60 was reported using a 40% fixed SUVmax threshold on lymphoma data, which is comparable to the 0.57 found in this study. Although ITM was determined to be the conventional method with the best performance in the Weisman et al. (2020) study, only 40% and 50% fixed thresholding were compared, with no pre-processing steps, on multiple sets of different lymphoma data [5]. For more advanced methods like FRFCM and AC, the results would vary after each implementation. These methods use spatial information to deliver results, but the lack of consistency and reproducibility resulted in lower mean performance metrics [33][43].

ITM was prone to underestimation since it produced threshold results ranging from 32% to 79%. As mentioned in Jentzen et al. (2007), inhomogeneous activity distribution in PET images will cause ITM to underestimate region volumes. This was the case for several images, so background removal was introduced. Although this increased performance for smaller volumes, larger lesions were drastically underestimated since parts of the volume close to the background intensity would be removed. As seen after further analysis, the optimal range for ITM-based methods appeared to be larger than 2 mL but less than 15 mL, while the mean single-slice axial lesion volume was 2.5 mL. This meant less than 50% of the total data set was suited to segmentation using ITM. Small lesions also had very small physician masks, so a contour with a slightly different radius decreased the DSC significantly. 
These smaller regions are more heavily affected by PVE, which meant they were prone to overestimation and produced the lowest DSC scores in the data set.

It is important to note that all the methods are highly sensitive to the initial bounding box. When bounding box coordinates were given by YOLO, the DSC scores were > 0.9, which implies that close cropping of the ROI that minimizes noise and excludes external organs produces much better results. Since these segmentation methods do not use organ spatial information, high uptake regions contained in the input images would come up as false positives, frequently returning a DSC of zero. Additionally, all the algorithms use intensity information in some way, so the existence of any uncharacteristic bright spots would result in poor segmentation results. Although background removal steps were performed, these could not remove larger components like parts of the liver, bladder, and bone marrow. A very low DSC was also achieved when there were multiple lesions/closed contours in an image. All the methods tended to miss the lesion with lower intensity. These events contributed to a large number of low DSC scores, but due to the heterogeneity of the data, it is difficult to say how big of a role this played in the overall performance metrics. The initial input image is a confounding factor that is challenging to separate from the performance of the segmentation methods.

A limitation of this work is that only six conventional segmentation methods were implemented. There are many more that use much more sophisticated computations to achieve better results, but due to limited resources, they could not all be tested. Within the conventional algorithms tested, it was not possible to test all the proposed optimization protocols. Parameters were mostly chosen by reviewing recent publications, trial and error, and experimentation on a small subset of the data. 
There are many studies proposing more systematic ways to tune algorithms, but they could not all be attempted. Furthermore, since lymphoma is such a heterogeneous disease, the sample data used may not be representative of the population, and thus the results may not be a representative analysis of the overall segmentation performance.

5.1 Future Work

As presented by the results, a simple pre-processing step can increase the performance of conventional segmentation algorithms. Hybrid methods will perform well for a higher range of tumor volumes, but thresholding remains one of the most consistent and simple methods. This reinforces the advantages of developing adaptive thresholding methods, suggesting that a more comprehensive study of results may produce a more robust algorithm for choosing the best thresholding value based on PET statistics.

Since the overarching aim of this thesis and accompanying work is to develop an automatic lymphoma lesion segmentation algorithm, further research into combining AI-based detection methods with conventional segmentation methods will be ongoing. There is also still room for improvement, as shown by the segmentation performance of even the four best segmentation techniques. To this end, more methods will be tested to try to determine better algorithms for small lesions with more consistent segmentation output.

Chapter 6

Conclusion

The work of this thesis suggests that a fixed 35% threshold algorithm, although one of the simplest techniques, will give the most accurate and consistent segmentation results for this DLBCL data set. For lesions larger than 4 mL, hybrid algorithms like ITM+AC+region growing will produce the best results. Hybrid algorithms will also obtain better results when used after AI-based detection algorithms that provide a bounding box of the lesion.
These results encourage the combination of conventional methods to achieve more specialized and robust algorithms.

In this dissertation, automatic image segmentation techniques that only require a ROI were developed to aid a fully automatic PET/CT lymphoma segmentation pipeline. In addition, important observations about conventional segmentation algorithms were noted in hopes of improving these well-established methods. Fixed thresholding methods provided consistent results with worse performance for larger lesions, while hybrid methods performed well for larger, homogeneous lesions. It was also noted that a simple addition of pre-processing to the fixed threshold method will result in more accurate delineations. Overall, the techniques explored provide one step forward towards a fully automatic segmentation pipeline for PET/CT lymphoma images to ease physician burden and improve patient care.

Bibliography

[1] Canadian Cancer Society. (n.d.). Non-Hodgkin lymphoma. Retrieved November 1, 2020. https://www.cancer.ca/en/cancer-information/cancer-type/non-hodgkin-lymphoma/non-hodgkin-lymphoma/?region=on.

[2] Canadian Cancer Society. (2020). Canadian Cancer Statistics. Retrieved November 1, 2020. https://www.cancer.ca/en/cancer-information/cancer-101/canadian-cancer-statistics/?region=on.

[3] Leukemia Lymphoma Society of Canada (LLSC). (n.d.). Non-Hodgkin lymphoma (NHL). Retrieved November 1, 2020. https://www.llscanada.org/lymphoma/non-hodgkin-lymphoma?src1=20045src2=.

[4] Mayo Clinic. (n.d.). CT scan. Retrieved March 1, 2021. https://www.mayoclinic.org/tests-procedures/ct-scan/about/pac-20393675.

[5] Weisman A. J., Kieler M. W., Perlman S., Hutchings M., Jeraj R., Kostakoglu L., Bradshaw T. J. Comparison of 11 automated PET segmentation methods in lymphoma. Phys Med Biol. 2020 Nov 27;65(23):235019. doi:10.1088/1361-6560/abb6bd. PMID: 32906088.

[6] Kluge, R., Chavdarova, L., Hoffmann, M., Kobe, C., Malkowski, B., Montravers, F., Kurch, L., Georgi, T., Dietlein, M., Wallace, W. H., Karlen, J., Fernández-Teijeiro, A., Cepelova, M., Wilson, L., Bergstraesser, E., Sabri, O., Mauz-Körholz, C., Körholz, D., Hasenclever, D. (2016). Inter-Reader Reliability of Early FDG-PET/CT Response Assessment Using the Deauville Scale after 2 Cycles of Intensive Chemotherapy (OEPA) in Hodgkin’s Lymphoma. PloS one, 11(3), e0149072. https://doi.org/10.1371/journal.pone.0149072

[7] Johnson P. W. (2016) Response-adapted frontline therapy for Hodgkin lymphoma: are we there yet? Hematology Am Soc Hematol Educ Program. 2016;2016:316-322.

[8] Ansell S. M. and Armitage J. O. (2012) Positron emission tomographic scans in lymphoma: convention and controversy. Mayo Clin Proc. 2012;87(6):571-580. doi:10.1016/j.mayocp.2012.03.006

[9] Barrington S. F. and Meignan M. (2019) Time to Prepare for Risk Adaptation in Lymphoma by Standardizing Measurement of Metabolic Tumor Burden. J Nucl Med 60 1096-102.

[10] Bouallègue F. B., Tabaa Y. A., Kafrouni M., Cartron G., Vauchot F., Mariano-Goulart D. (2017) Association between textural and morphological tumor indices on baseline PET-CT and early metabolic response on interim PET-CT in bulky malignant lymphomas. Med Phys. 2017;44:4608-4619.

[11] Cottereau A. S., El-Galaly T. C., Becker S., Broussais F., Petersen L. J., Bonnet C., Prior J. O., Tilly H., Hutchings M., Casasnovas O., Meignan M. (2018). Predictive Value of PET Response Combined with Baseline Metabolic Tumor Volume in Peripheral T-Cell Lymphoma Patients. Journal of nuclear medicine : official publication, Society of Nuclear Medicine, 59(4), 589–595. https://doi.org/10.2967/jnumed.117.193946

[12] Mettler J., Muller H., Voltin CA, et al. Metabolic Tumour Volume for Response Prediction in Advanced-Stage Hodgkin Lymphoma. J Nucl Med. 2018.

[13] Weisman A. J. Automatic Quantification and Assessment of FDG PET/CT Imaging in Patients with Lymphoma, Doctoral Dissertation, Department of Philosophy, University of Wisconsin-Madison, Madison, WI, 2020.

[14] The Johns Hopkins University. (n.d.). Positron Emission Tomography (PET). Retrieved November 7, 2020. https://www.hopkinsmedicine.org/health/treatment-tests-and-therapies/positron-emission-tomography-pet.

[15] Buchpiguel C. A. Current status of PET/CT in the diagnosis and follow up of lymphomas. Revista brasileira de hematologia e hemoterapia, 2011;33(2):140–147.

[16] Gonzalez R. C. and Woods R. E. (2018) Digital Image Processing, 4th Edition, Pearson, New York, NY, 2018.

[17] Kinahan P. E. and Fletcher J. W. PET/CT Standardized Uptake Values (SUVs) in Clinical Practice and Assessing Response to Therapy. Seminars in ultrasound, 2010;31(6):496–505.

[18] Foster B., Bagci U., Mansoor A., Xu Z., and Daniel J. Mollura. A Review on Segmentation of Positron Emission Tomography Images. Comput Biol Med. 2014 July 1;0:76–96.

[19] Hofman M. S. and Hicks R. J. How We Read Oncologic FDG PET/CT. Cancer Imaging 16, 35 (2016). https://doi.org/10.1186/s40644-016-0091-3

[20] Hicks R. J. (2012). Should positron emission tomography/computed tomography be the first rather than the last test performed in the assessment of cancer?. Cancer imaging : the official publication of the International Cancer Imaging Society, 12(2), 315–323. https://doi.org/10.1102/1470-7330.2012.9005

[21] Soret M., Bacharach S. L., and Buvat I. Partial-Volume Effect in PET Tumor Imaging, Journal of Nuclear Medicine, 2007.

[22] American Society of Clinical Oncology (ASCO). (n.d.). Lymphoma - Hodgkin. Retrieved April 1, 2021. https://www.cancer.net/cancer-types/lymphoma-hodgkin/stages.

[23] Mou T., Huang J., and O’Sullivan F., The Gamma Characteristic of Reconstructed PET Images: Implications for ROI Analysis, IEEE Transactions on Medical Imaging, Volume 37 No. 5, 2018.

[24] Patrice (2010) Gaussian Filtering - 6 [PowerPoint slides]. Microsoft PowerPoint. https://www.cs.auckland.ac.nz/courses/compsci373s1c/PatricesLectures/GaussianFiltering1up.pdf.

[25] Michael Galarnyk. (n.d.). Normal Distribution. Towards Data Science. Retrieved April 2, 2021. https://towardsdatascience.com/understanding-the-68-95-99-7-rule-for-a-normal-distribution-b7b7cbf760c2.

[26] Cupparo I. A Region Growing and Fuzzy C-means algorithm segmentation for PET images of head-neck tumours, Masters Thesis, Department of Physics and Astronomy, Università di Bologna, Via Zamboni, Italy 2018.

[27] OpenCV. (n.d.). Morphological Transformations. Retrieved April 5, 2021. https://docs.opencv.org/master/d9/d61/tutorialpymorphologicalops.html.

[28] Dong L., Yu G., Ogunbona P., and Li W. An Efficient Iterative Algorithm for Image Thresholding. Pattern Recognition Letters 29. 2008;1311–1316.

[29] Jentzen W, Freudenberg L, Eising E G, Heinze M, Brandau W and Bockisch A 2007 Segmentation of PET volumes by iterative image thresholding. J Nucl Med 2007;48:108-14.

[30] Tan S, Li L, Choi W, Kang M K, D’Souza W, and Lu W. (2017). Adaptive region-growing with maximum curvature strategy for tumor segmentation in 18F-FDG PET. Physics in Medicine & Biology. 62. 5383-5402. 10.1088/1361-6560/aa6e20.

[31] Dalmaijer E., Nord C., and Astle D. (2020). Statistical power for cluster analysis.

[32] Chan T. F. and Vese L. A. Active contours without edges. IEEE Transactions on Image Processing. 2001;10:266-77.

[33] Lankton S. and Tannenbaum A. Localizing region-based active contours. IEEE Trans Image Process. 2008;17(11):2029-2039. doi:10.1109/TIP.2008.2004611

[34] Tan S, Li L, Choi W, Kang M K, D’Souza W and Lu W. (2017). Adaptive region-growing with maximum curvature strategy for tumor segmentation in 18F-FDG PET. Physics in Medicine & Biology. 62. 5383-5402. 10.1088/1361-6560/aa6e20.

[35] Marshall, D. (n.d.). Region Growing. Retrieved April 15, 2021. https://users.cs.cf.ac.uk/Dave.Marshall/Visionlecture/node35.html.

[36] Tiu E. (2019, August 9) Metrics to Evaluate your Semantic Segmentation Model. Towards Data Science. Retrieved April 1, 2021. https://towardsdatascience.com/metrics-to-evaluate-your-semantic-segmentation-model-6bcb99639aa2.

[37] Rosebrock A. (2016, November 7) Intersection over Union (IoU) for object detection. Retrieved April 1, 2021. https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/.

[38] Uccheddu F, Servi M, Furferi R, and Governi L. (2018). Comparison of Mesh Simplification Tools in a 3D Watermarking Framework. 60-69. 10.1007/978-3-319-59480-47.

[39] Datta P. (2020, September 3). All about Structural Similarity Index (SSIM): Theory + Code in PyTorch. Retrieved March 15, 2021. https://medium.com/srm-mic/all-about-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e.

[40] Datta P. (2020, September 3). All about Structural Similarity Index (SSIM): Theory + Code in PyTorch. Retrieved March 15, 2021. https://medium.com/srm-mic/all-about-structural-similarity-index-ssim-theory-code-in-pytorch-6551b455541e.

[41] Kroon, D. J. (2010). Hessian based Frangi Vesselness filter. Retrieved April 15, 2021. https://www.mathworks.com/matlabcentral/fileexchange/24409-hessian-based-frangi-vesselness-filter, MATLAB Central File Exchange.

[42] Bezdek, J. (1981). Pattern Recognition With Fuzzy Objective Function Algorithms. 10.1007/978-1-4757-0450-1.

[43] Lei T., Jia X., Zhang Y., He L., Meng H. and Nandi A. K., Significantly Fast and Robust Fuzzy C-Means Clustering Algorithm Based on Morphological Reconstruction and Membership Filtering, in IEEE Transactions on Fuzzy Systems, vol. 26, no. 5, pp. 3027-3041, Oct. 2018, doi: 10.1109/TFUZZ.2018.2796074.

[44] Pang J. (2014). Localized Active Contour. Retrieved February 15, 2021. https://www.mathworks.com/matlabcentral/fileexchange/44906-localized-active-contour, MATLAB Central File Exchange.

[45] Kellner D. (2011). Region Growing (2D/3D grayscale). Retrieved April 5, 2021. https://www.mathworks.com/matlabcentral/fileexchange/32532-region-growing-2d-3d-grayscale, MATLAB Central File Exchange.

[46] Aljahdali, S. and Zanaty, E. A. Combining multiple segmentation methods for improving the segmentation accuracy, 2008 IEEE Symposium on Computers and Communications, Marrakech, Morocco, 2008, pp. 649-653, doi: 10.1109/ISCC.2008.4625766.

[47] Shyu K. K., Tran, T. T., Pham, V. T, Lee, P. L., and Shang, L. J. (2012). Fuzzy distribution fitting energy-based active contours for image segmentation. Nonlinear Dynamics. 69. 10.1007/s11071-011-0265-2.

[48] Danziger Z. (2013). Hausdorff Distance. Retrieved April 6, 2021. https://www.mathworks.com/matlabcentral/fileexchange/26738-hausdorff-distance, MATLAB Central File Exchange.

[49] Hansen S., Kuttner S., Kampffmeyer M., Markussen T, Sundset R, Kjærnes Øen S, Eikenes L, and Jenssen R, Unsupervised supervoxel-based lung tumor segmentation across patient scans in hybrid PET/MRI, Expert Systems with Applications, Volume 167, 2021, 114244, ISSN 0957-4174, https://doi.org/10.1016/j.eswa.2020.114244.

[50] Ahamed S. (2021) An automatic pipeline for lymphoma lesion detection and segmentation in PET/CT images of patients with diffuse large B-cell lymphoma (DLBCL), Research Plan Description by Shadab Ahamed.

[51] Yu J, Li X, Xing L, et al. Comparison of tumor volumes as determined by pathologic examination and FDG-PET/CT images of non-small-cell lung cancer: a pilot study. Int J Radiat Oncol Biol Phys. 2009;75:1468–1474. doi: 10.1016/j.ijrobp.2009.01.019.

Appendix A

Supporting Materials

Please find the MATLAB codes used in this dissertation at the following GitHub link:
https://github.com/Bellinda12/UBCThesis2021Public

The following package was used to convert RT-struct to .nii format:
https://github.com/Sikerdebaard/dcmrtstruct2nii