A Study of Methods for Learning Phylogenies of Cancer Cell Populations from Binary Single Nucleotide Variant Profiles

by

Emily Ann Hindalong

B.Sc. Cognitive Systems, The University of British Columbia, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Master of Science

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Bioinformatics)

The University of British Columbia
(Vancouver)

August 2015

© Emily Ann Hindalong, 2015

Abstract

An accurate phylogeny of a cancer tumour has the potential to shed light on numerous phenomena, such as key oncogenetic events, relationships between clones, and evolutionary responses to treatment. Most work in cancer phylogenetics to date relies on bulk tissue data, which can resolve only a few genotypes unambiguously. Meanwhile, single-cell technologies have considerably improved our ability to resolve intra-tumour heterogeneity. Furthermore, most cancer phylogenetic methods use classical approaches, such as Neighbor-Joining, which put all extant species on the leaves of the phylogenetic tree. But in cancer, ancestral genotypes may be present in extant populations. There is a need for scalable methods that can capture this phenomenon.

We have made progress on this front by developing the Genotype Tree representation of cancer phylogenies, implementing three methods for reconstructing Genotype Trees from binary single-nucleotide variant profiles, and evaluating these methods under a variety of conditions. Additionally, we have developed a tool that simulates the evolution of cancer cell populations, allowing us to systematically vary evolutionary conditions and observe the effects on tree properties and reconstruction accuracy.

Of the methods we tested, Recursive Grouping and Chow-Liu Grouping appear to be well-suited to the task of learning phylogenies over hundreds to thousands of cancer genotypes.
Of the two, Recursive Grouping has the strongest and most stable overall performance, while Chow-Liu Grouping has a superior asymptotic runtime that is competitive with Neighbor-Joining.

Preface

Credit for the idea to apply Latent Tree Models (Chapter 3) to the problem of learning phylogenies from single-cell cancer data goes to Dr. Hossein Farahani. It was my idea to apply Parsimony + Edge Contraction to the same problem (Chapter 3).

Dr. Farahani also proposed the Genotype Tree representation of single-cell cancer phylogenies (Section 1.4.1). I developed the formal definition.

Dr. Farahani advised me on the initial simulator design (Chapter 2), which I implemented in full. I added the constraint on convergent evolution and parameters for mutation loss and dropout. Dr. Farahani provided the initial implementation for tree conversion (Section 2.1.3), which I re-implemented for stylistic consistency.

Aside from the initial idea, most of the work described in Chapter 3 is my own, including implementation, adjustments to the algorithms, guarantees analysis, proofs, development of scoring metrics, and analysis of results. Dr. Farahani suggested the paralinear distance metric and provided the code to compute it. I sought advice from Dr. Farahani and Adi Steif on simulator parameter settings.

Figures 3.1, 3.2, 3.3, and 3.4 in Chapter 3 are adapted with permission from the applicable sources.

There are no publications based on this work at this time.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgments
1 Introduction
  1.1 The Cancer Genome - An Evolving Landscape
  1.2 Classical Phylogenetics
    1.2.1 Common Challenges
    1.2.2 Distance-Based Methods
    1.2.3 Character-Based Methods
  1.3 Cancer Phylogenetics
    1.3.1 Across-Cancer Phylogenetics
    1.3.2 Within-Cancer Phylogenetics
    1.3.3 Inferring Phylogenies from Bulk SNV Data
    1.3.4 The Promise of Single-Cell Sequencing
    1.3.5 Limitations of Current Methods
  1.4 Current Work
    1.4.1 Representing Cancer Phylogenies as Genotype Trees
    1.4.2 Problem Statement
    1.4.3 Project Overview
2 Simulating the Evolution of Cancer Cell Populations
  2.1 Methods
    2.1.1 Parameters
    2.1.2 Process
    2.1.3 Converting the Single-Cell Evolutionary Tree to a Genotype Tree
  2.2 Analysis of Parameter Effects
    2.2.1 Tree Topology
    2.2.2 Number of Homoplasies
  2.3 Limitations
    2.3.1 Cell Division
    2.3.2 Mutation Loss and Convergent Evolution
    2.3.3 Other Limitations
  2.4 Conclusion
3 Evaluation of Novel Methods for Learning Genotype Trees
  3.1 Reconstruction Methods
    3.1.1 Terminology
    3.1.2 Distance Metrics
    3.1.3 Recursive Grouping
    3.1.4 Chow-Liu Grouping
    3.1.5 Parsimony + Edge Contraction
  3.2 Method Evaluation
    3.2.1 Experimental Pipeline
    3.2.2 Macro-Structure Scoring Metrics
    3.2.3 Micro-Structure Scoring Metrics
    3.2.4 Overview of Tests
  3.3 Results
    3.3.1 Aggregate Results
    3.3.2 Results by Parameter Settings
  3.4 Discussion
    3.4.1 Possible Explanations for Various Effects
    3.4.2 Opportunities for Further Investigation
  3.5 Conclusion
4 Conclusion
Bibliography
A Proofs
  A.1 Genotype Trees are Minimal Latent Trees
  A.2 Additivity of the Hamming Distance
B Additional Results
  B.1 Hamming Distance versus Paralinear Distance
    B.1.1 Recursive Grouping
    B.1.2 Chow-Liu Grouping
  B.2 Relaxed Recursive Grouping - Version Comparison
C Dunn’s Test Results
  C.1 Number of Leaf Nodes
  C.2 Number of Positions
  C.3 Asymmetric Division Rate
  C.4 Mutation Rate
  C.5 Mutation Loss Rate
  C.6 Dropout Rate
  C.7 10,000 Positions
  C.8 0.1 Asymmetric Division Rate
  C.9 0.1 Dropout Rate

List of Tables

Table 1.1 A summary of the major contributions to cancer phylogenetics.
Table 3.1 Summary statistics on macro-structure scores for each method.
Table 3.2 Dunn’s z-test-statistic and Bonferroni-adjusted p-value for pairwise comparison of methods by macro-structure scoring metric.
Table 3.3 Dunn’s z-test-statistic and Bonferroni-adjusted p-value for pairwise comparison of macro-structure scoring metrics by method.
Table 3.4 Summary statistics on micro-structure scores for each method.
Table 3.5 Dunn’s z-test-statistic and Bonferroni-adjusted p-value for pairwise comparison of methods by micro-structure scoring metric.

List of Figures

Figure 1.1 A simple phylogenetic tree of species. Each leaf represents an extant species and each internal node represents a shared extinct ancestor. Each edge represents evolutionary divergence over time. In some trees, each edge has an associated length that quantifies the divergence.
Figure 1.2 Distance-based methods. a. Defining a distance matrix.
Here, the pairwise distances are computed using a naive scheme which assigns a cost of 1 to each nucleotide substitution. b. UPGMA illustrated. c. Neighbor-Joining illustrated. At each step, nodes that have already been joined are shown in grey and active nodes are shown in black.
Figure 1.3 An illustration of tree additivity. The distance between 2 and 3 is the sum of the lengths of the edges on the path between them.
Figure 1.4 Character-based methods. a. Parsimony illustrated. b. Calculating the likelihood of a tree. An evolutionary model is supplied, specifying transition and root probabilities. P(data|T, t1, t2) is the product of the probabilities of the assignments at each site. The probability at each site is computed by summing over all possible assignments to the root.
Figure 1.5 a. Across-cancer phylogenetics. Estimates likely event order by looking at data from multiple tumours. b. Within-cancer phylogenetics. Learns evolutionary relationships between subpopulations of an individual cancer.
Figure 1.6 A hypothetical Genotype Tree. Here, the feature set contains binarized SNV data for three locations (1 for mutation present, 0 for mutation absent) and copy-number data for one chromosome, where the possible copy numbers are 1, 2, and 3.
Figure 2.1 A single-cell evolutionary tree. Each node represents a cell, and each pair of children represents the daughter cells after division. Each color is a distinct genotype. (This tree contains convergent evolution due to the small size of the feature set. This is disallowed by the simulator.)
Figure 2.2 On the left is a single-cell evolutionary tree. The edges between identical nodes are in bold. On the right is the Genotype Tree that is the result of contracting the bolded edges.
Figure 2.3 (a) A single-cell evolutionary tree (left) with an asymmetric division rate of 0.1 and the corresponding Genotype Tree (right). The edges between genotypically identical nodes are in red. These edges are contracted to produce the Genotype Tree. (b) The same, but with an asymmetric division rate of 0.6.
Figure 2.4 Number of homoplasies per leaf node by number of leaf nodes. (R^2: 0.9422, F-statistic: 3410 on 1 and 208 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.
Figure 2.5 Total number of homoplasies by number of leaf nodes. Each boxplot summarizes results for thirty random trees.
Figure 2.6 Total number of homoplasies by number of positions. (R^2: 0.9705, F-statistic: 4895 on 1 and 148 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.
Figure 2.7 Total number of homoplasies by mutation rate. (R^2: 0.9076, F-statistic: 1760 on 1 and 178 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.
Figure 2.8 Total number of homoplasies by mutation loss rate. (R^2: 0.9018, F-statistic: 1370 on 1 and 148 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.
Figure 2.9 Total number of homoplasies by dropout rate. (R^2: 0.937, F-statistic: 1.338e+04 on 1 and 898 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.
Figure 3.1 A generic tree-structured graph. (adapted with permission from Mourad et al. [45])
Figure 3.2 In these trees, the shaded nodes are observed and the unshaded nodes are hidden. (a) A tree with no redundant hidden nodes. (b) A tree with two redundant hidden nodes: h1 and h4. (adapted with permission from Choi et al. [12])
Figure 3.3 (a) The original latent tree with hidden (light green) and observed (dark green) nodes. (b) - (d) Outputs after three iterations of RG. Active nodes are in green and newly introduced nodes in yellow. Red circles show the learned families. (adapted with permission from Choi et al. [12])
Figure 3.4 (a) The original latent tree. (b) The MST, with the closed neighbourhood of 3 circled. (c) Output after applying RG to the closed neighbourhood of 3, with the closed neighbourhood of 5 circled. (d) Output after applying RG to the closed neighbourhood of 5. (adapted with permission from Choi et al. [12])
Figure 3.5 (a) The maximum parsimony tree. (b) The result after contracting edges between identical nodes.
Figure 3.6 A single experimental run with Recursive Grouping as the reconstruction method. (1) Generate the single-cell tree. (2) Convert the single-cell tree to a Genotype Tree. (3a) Compute a distance matrix over the leaf nodes. (3b) Learn a Genotype Tree using Recursive Grouping. (4) Score the result.
Figure 3.7 The task of matching up the non-trivial splits of the original tree (left) and the learned tree (right) can be modelled as an assignment problem [37]. The vertices in P_o and P_l represent the non-trivial splits in the original and learned trees respectively. A maximum weight matching pairs up the splits in such a way that the sum of the edge weights in the bipartite graph is maximized. The edge weights in this example (in red) were calculated using the matching-splits variation.
Figure 3.8 Macro-structure scores for each method. Each boxplot summarizes the results of 1170 tests (39 configurations x thirty random trees).
Figure 3.9 Micro-structure scores for each method. Each boxplot summarizes the results of 1170 tests (39 configurations x thirty random trees).
Figure 3.10 Macro-structure scores for each method by number of leaf nodes in the original tree. Each boxplot summarizes the results for thirty random trees. There is no data for Parsimony + Edge Contraction at 256 nodes due to intractability.
Figure 3.11 Micro-structure scores for each method by number of leaf nodes in the original tree. Each boxplot summarizes the results for thirty random trees. There is no data for Parsimony + Edge Contraction at 256 nodes due to intractability.
Figure 3.12 Macro-structure scores for each method by number of sequence positions. Each boxplot summarizes the results for thirty random trees.
Figure 3.13 Micro-structure scores for each method by number of sequence positions. Each boxplot summarizes the results for thirty random trees.
Figure 3.14 Macro-structure scores for each method by asymmetric division rate. Each boxplot summarizes the results for thirty random trees.
Figure 3.15 Micro-structure scores for each method by asymmetric division rate. Each boxplot summarizes the results for thirty random trees.
Figure 3.16 Macro-structure scores for each method by mutation rate. Each boxplot summarizes the results for thirty random trees.
Figure 3.17 Micro-structure scores for each method by mutation rate. Each boxplot summarizes the results for thirty random trees.
Figure 3.18 Macro-structure scores for each method by mutation-loss rate. Each boxplot summarizes the results for thirty random trees.
Figure 3.19 Micro-structure scores for each method by mutation-loss rate. Each boxplot summarizes the results for thirty random trees.
Figure 3.20 Macro-structure scores for each method by dropout rate. Each boxplot summarizes the results for thirty random trees.
Figure 3.21 Micro-structure scores for each method by dropout rate. Each boxplot summarizes the results for thirty random trees.
Figure 3.22 Runtime in seconds for each method by number of leaf nodes in the original tree. Each boxplot summarizes the results for thirty random trees. There are no data points for Parsimony + Edge Contraction at 256 nodes due to intractability.
Figure B.1 Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by number of leaf nodes.
Figure B.2 Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by number of positions.
Figure B.3 Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by mutation rate.
Figure B.4 Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by mutation loss rate.
Figure B.5 Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by dropout rate.
Figure B.6 Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by number of leaf nodes.
Figure B.7 Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by number of positions.
Figure B.8 Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by mutation rate.
Figure B.9 Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by mutation loss rate.
Figure B.10 Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by dropout rate.
Figure B.11 Performance of Relaxed Recursive Grouping - Basic versus Relaxed Recursive Grouping - k-means by number of leaf nodes.
Figure C.1 Dunn’s test on RG MS scores by number of leaf nodes
Figure C.2 Dunn’s test on RG MMS scores by number of leaf nodes
Figure C.3 Dunn’s test on RG RF scores by number of leaf nodes
Figure C.4 Dunn’s test on RG FR scores by number of leaf nodes
Figure C.5 Dunn’s test on RG FP scores by number of leaf nodes
Figure C.6 Dunn’s test on RG FC scores by number of leaf nodes
Figure C.7 Dunn’s test on CLG MS scores by number of leaf nodes
Figure C.8 Dunn’s test on CLG MMS scores by number of leaf nodes
Figure C.9 Dunn’s test on CLG RF scores by number of leaf nodes
Figure C.10 Dunn’s test on CLG FR scores by number of leaf nodes
Figure C.11 Dunn’s test on CLG FP scores by number of leaf nodes
Figure C.12 Dunn’s test on CLG FC scores by number of leaf nodes
Figure C.13 Dunn’s test on PEC MS scores by number of leaf nodes
Figure C.14 Dunn’s test on PEC MMS scores by number of leaf nodes
Figure C.15 Dunn’s test on PEC RF scores by number of leaf nodes
Figure C.16 Dunn’s test on PEC FR scores by number of leaf nodes
Figure C.17 Dunn’s test on PEC FC scores by number of leaf nodes
Figure C.18 Dunn’s test on MST MS scores by number of leaf nodes
Figure C.19 Dunn’s test on MST MMS scores by number of leaf nodes
Figure C.20 Dunn’s test on MST RF scores by number of leaf nodes
Figure C.21 Dunn’s test on MST FR scores by number of leaf nodes
Figure C.22 Dunn’s test on MST FP scores by number of leaf nodes
Figure C.23 Dunn’s test on MST FC scores by number of leaf nodes
Figure C.24 Dunn’s test on NJ MS scores by number of leaf nodes
Figure C.25 Dunn’s test on NJ MMS scores by number of leaf nodes
Figure C.26 Dunn’s test on NJ RF scores by number of leaf nodes
Figure C.27 Dunn’s test on NJ FR scores by number of leaf nodes
Figure C.28 Dunn’s test on NJ FC scores by number of leaf nodes
Figure C.29 Dunn’s test on RG MS scores by number of positions
Figure C.30 Dunn’s test on RG MMS scores by number of positions
Figure C.31 Dunn’s test on RG RF scores by number of positions
Figure C.32 Dunn’s test on RG FR scores by number of positions
Figure C.33 Dunn’s test on RG FP scores by number of positions
Figure C.34 Dunn’s test on RG FC scores by number of positions
Figure C.35 Dunn’s test on CLG MS scores by number of positions
Figure C.36 Dunn’s test on CLG MMS scores by number of positions
Figure C.37 Dunn’s test on CLG RF scores by number of positions
Figure C.38 Dunn’s test on CLG FR scores by number of positions
Figure C.39 Dunn’s test on CLG FP scores by number of positions
Figure C.40 Dunn’s test on CLG FC scores by number of positions
Figure C.41 Dunn’s test on PEC MS scores by number of positions
Figure C.42 Dunn’s test on PEC MMS scores by number of positions
Figure C.43 Dunn’s test on PEC RF scores by number of positions
Figure C.44 Dunn’s test on PEC FR scores by number of positions
Figure C.45 Dunn’s test on PEC FP scores by number of positions
Figure C.46 Dunn’s test on PEC FC scores by number of positions
Figure C.47 Dunn’s test on MST MS scores by number of positions
Figure C.48 Dunn’s test on MST MMS scores by number of positions
Figure C.49 Dunn’s test on MST RF scores by number of positions
Figure C.50 Dunn’s test on MST FR scores by number of positions
Figure C.51 Dunn’s test on MST FP scores by number of positions
Figure C.52 Dunn’s test on MST FC scores by number of positions
Figure C.53 Dunn’s test on NJ MS scores by number of positions
Figure C.54 Dunn’s test on NJ MMS scores by number of positions
Figure C.55 Dunn’s test on NJ RF scores by number of positions
Figure C.56 Dunn’s test on NJ FR scores by number of positions
Figure C.57 Dunn’s test on NJ FP scores by number of positions
Figure C.58 Dunn’s test on NJ FC scores by number of positions
Figure C.59 Dunn’s test on RG MS scores by asymmetric division rate
Figure C.60 Dunn’s test on RG MMS scores by asymmetric division rate
Figure C.61 Dunn’s test on RG RF scores by asymmetric division rate
Figure C.62 Dunn’s test on RG FR scores by asymmetric division rate
Figure C.63 Dunn’s test on RG FP scores by asymmetric division rate
Figure C.64 Dunn’s test on RG FC scores by asymmetric division rate
Figure C.65 Dunn’s test on CLG MS scores by asymmetric division rate
Figure C.66 Dunn’s test on CLG MMS scores by asymmetric division rate
Figure C.67 Dunn’s test on CLG RF scores by asymmetric division rate
Figure C.68 Dunn’s test on CLG FR scores by asymmetric division rate
Figure C.69 Dunn’s test on CLG FP scores by asymmetric division rate
Figure C.70 Dunn’s test on CLG FC scores by asymmetric division rate
Figure C.71 Dunn’s test on PEC MS scores by asymmetric division rate
Figure C.72 Dunn’s test on PEC MMS scores by asymmetric division rate
Figure C.73 Dunn’s test on PEC RF scores by asymmetric division rate
Figure C.74 Dunn’s test on PEC FR scores by asymmetric division rate
Figure C.75 Dunn’s test on PEC FP scores by asymmetric division rate
Figure C.76 Dunn’s test on PEC FC scores by asymmetric division rate
Figure C.77 Dunn’s test on MST MS scores by asymmetric division rate
Figure C.78 Dunn’s test on MST MMS scores by asymmetric division rate
Figure C.79 Dunn’s test on MST RF scores by asymmetric division rate
Figure C.80 Dunn’s test on MST FR scores by asymmetric division rate
Figure C.81 Dunn’s test on MST FP scores by asymmetric division rate
Figure C.82 Dunn’s test on MST FC scores by asymmetric division rate
Figure C.83 Dunn’s test on NJ MS scores by asymmetric division rate
Figure C.84 Dunn’s test on NJ MMS scores by asymmetric division rate
Figure C.85 Dunn’s test on NJ RF scores by asymmetric division rate
Figure C.86 Dunn’s test on NJ FR scores by asymmetric division rate
Figure C.87 Dunn’s test on NJ FP scores by asymmetric division rate
Figure C.88 Dunn’s test on NJ FC scores by asymmetric division rate
Figure C.89 Dunn’s test on RG MS scores by mutation rate
Figure C.90 Dunn’s test on RG MMS scores by mutation rate
Figure C.91 Dunn’s test on RG RF scores by mutation rate
Figure C.92 Dunn’s test on RG FR scores by mutation rate
Figure C.93 Dunn’s test on RG FP scores by mutation rate
Figure C.94 Dunn’s test on RG FC scores by mutation rate
Figure C.95 Dunn’s test on CLG MS scores by mutation rate
Figure C.96 Dunn’s test on CLG MMS scores by mutation rate
Figure C.97 Dunn’s test on CLG RF scores by mutation rate
Figure C.98 Dunn’s test on CLG FR scores by mutation rate
Figure C.99 Dunn’s test on CLG FP scores by mutation rate
Figure C.100 Dunn’s test on CLG FC scores by mutation rate
Figure C.101 Dunn’s test on PEC MS scores by mutation rate
Figure C.102 Dunn’s test on PEC MMS scores by mutation rate
Figure C.103 Dunn’s test on PEC RF scores by mutation rate
Figure C.104 Dunn’s test on PEC FR scores by mutation rate
Figure C.105 Dunn’s test on PEC FP scores by mutation rate
Figure C.106 Dunn’s test on PEC FC scores by mutation rate
Figure C.107 Dunn’s test on MST MS scores by mutation rate
Figure C.108 Dunn’s test on MST MMS scores by mutation rate
Figure C.109 Dunn’s test on MST RF scores by mutation rate
Figure C.110 Dunn’s test on MST FR scores by mutation rate
Figure C.111 Dunn’s test on MST FP scores by mutation rate
Figure C.112 Dunn’s test on MST FC scores by mutation rate
Figure C.113 Dunn’s test on NJ MS scores by mutation rate
Figure C.114 Dunn’s test on NJ MMS scores by mutation rate
Figure C.115 Dunn’s test on NJ RF scores by mutation rate
Figure C.116 Dunn’s test on NJ FR scores by mutation rate
Figure C.117 Dunn’s test on NJ FP scores by mutation rate
Figure C.118 Dunn’s test on NJ FC scores by mutation rate
Figure C.119 Dunn’s test on RG MS scores by mutation loss rate
Figure C.120 Dunn’s test on RG MMS scores by mutation loss rate
Figure C.121 Dunn’s test on RG RF scores by mutation loss rate
Figure C.122 Dunn’s test on RG FR scores by mutation loss rate
Figure C.123 Dunn’s test on RG FP scores by mutation loss rate
Figure C.124 Dunn’s test on RG FC scores by mutation loss rate
Figure C.125 Dunn’s test on CLG MS scores by mutation loss rate
Figure C.126 Dunn’s test on CLG MMS scores by mutation loss rate
Figure C.127 Dunn’s test on CLG RF scores by mutation loss rate
Figure C.128 Dunn’s test on CLG FR scores by mutation loss rate
Figure C.129 Dunn’s test on CLG FP scores by mutation loss rate
Figure C.130 Dunn’s test on CLG FC scores by mutation loss rate
Figure C.131 Dunn’s test on PEC MS scores by mutation loss rate
Figure C.132 Dunn’s test on PEC MMS scores by mutation loss rate
Figure C.133 Dunn’s test on PEC RF scores by mutation loss rate
Figure C.134 Dunn’s test on PEC FR scores by mutation loss rate
Figure C.135 Dunn’s test on PEC FP scores by mutation loss rate
Figure C.136 Dunn’s test on PEC FC scores by mutation loss rate
Figure C.137 Dunn’s test on MST MS scores by mutation loss rate
Figure C.138 Dunn’s test on MST MMS scores by mutation loss rate
Figure C.139 Dunn’s test on MST RF scores by mutation loss rate
Figure C.140 Dunn’s test on MST FR scores by mutation loss rate
Figure C.141 Dunn’s test on MST FP scores by mutation loss rate
Figure C.142 Dunn’s test on MST FC scores by mutation loss rate
Figure C.143 Dunn’s test on NJ MS scores by mutation loss rate
Figure C.144 Dunn’s test on NJ MMS scores by mutation loss rate
Figure C.145 Dunn’s test on NJ RF scores by mutation loss rate
Figure C.146 Dunn’s test on NJ FR scores by mutation loss rate
Figure C.147 Dunn’s test on NJ FP scores by mutation loss rate
Figure C.148 Dunn’s test on NJ FC scores by mutation loss rate
Figure C.149 Dunn’s test on RG MS scores by dropout rate
Figure C.150 Dunn’s test on RG MMS scores by dropout rate
Figure C.151 Dunn’s test on RG RF scores by dropout rate
Figure C.152 Dunn’s test on RG FR scores by dropout rate
Figure C.153 Dunn’s test on RG FP scores by dropout rate
Figure C.154 Dunn’s test on RG FC scores by dropout rate
Figure C.155 Dunn’s test on CLG MS scores by dropout rate
Figure C.156 Dunn’s test on CLG MMS scores by dropout rate
Figure C.157 Dunn’s test on CLG RF scores by dropout rate
Figure C.158 Dunn’s test on CLG FR scores by dropout rate
Figure C.159 Dunn’s test on CLG FP scores by dropout rate
Figure C.160 Dunn’s test on CLG FC scores by dropout rate
Figure C.161 Dunn’s test on PEC MS scores by dropout rate
Figure C.162 Dunn’s test on PEC MMS scores by dropout rate
Figure C.163 Dunn’s test on PEC RF scores by dropout rate
Figure C.164 Dunn’s test on PEC FR scores by dropout rate
Figure C.165 Dunn’s test on PEC FP scores by dropout rate
Figure C.166 Dunn’s test on PEC FC scores by dropout rate
Figure C.167 Dunn’s test on MST MS scores by dropout rate
Figure C.168 Dunn’s test on MST MMS scores by dropout rate
Figure C.169 Dunn’s test on MST RF scores by dropout rate
Figure C.170 Dunn’s test on MST FR scores by dropout rate
Figure C.171 Dunn’s test on MST FP scores by dropout rate
Figure C.172 Dunn’s test on MST FC scores by dropout rate
Figure C.173 Dunn’s test on NJ MS scores by dropout rate
Figure C.174 Dunn’s test on NJ MMS scores by dropout rate
Figure C.175 Dunn’s test on NJ RF scores by dropout rate
Figure C.176 Dunn’s test on NJ FR scores by dropout rate
Figure C.177 Dunn’s test on NJ FP scores by dropout rate
Figure C.178 Dunn’s test on NJ FC scores by dropout rate
Figure C.179 Dunn’s test on MS scores of all methods at 10000 positions
Figure C.180 Dunn’s test on MMS scores of all methods at 10000 positions
Figure C.181 Dunn’s test on RF scores of all methods at 10000 positions
Figure C.182 Dunn’s test on FR scores of all methods at 10000 positions
Figure C.183 Dunn’s test on FP scores of all methods at 10000 positions
Figure C.184 Dunn’s test on FC scores of all methods at 10000 positions
Figure C.185 Dunn’s test on MS scores of all methods at 0.1 asymmetric division rate
Figure C.186 Dunn’s test on MMS scores of all methods at 0.1 asymmetric division rate
Figure C.187 Dunn’s test on RF scores of all methods at 0.1 asymmetric division rate
Figure C.188 Dunn’s test on FR scores of all methods at 0.1 asymmetric division rate
Figure C.189 Dunn’s test on FP scores of all methods at 0.1 asymmetric division rate
Figure C.190 Dunn’s test on FC scores of all methods at 0.1 asymmetric division rate
Figure C.191 Dunn’s test on MS scores of all methods at 0.1 dropout rate
Figure C.192 Dunn’s test on MMS scores of all methods at 0.1 dropout rate
Figure C.193 Dunn’s test on RF scores of all methods at 0.1 dropout rate
Figure C.194 Dunn’s test on FR scores of all methods at 0.1 dropout rate
Figure C.195 Dunn’s test on FP scores of all methods at 0.1 dropout rate
Figure C.196 Dunn’s test on FC scores of all methods at 0.1 dropout rate

Acknowledgments

I would like to thank my supervisor, Dr. Sohrab Shah, for taking me on as a student and offering his continued support. His commitment to high-quality, high-impact science is an inspiration, and I am privileged to be one of his students.

I would also like to thank my committee members, Dr. Alexandre Bouchard-Côté and Dr. Steven Jones, for their incisive questions and contributions to the development of my thesis.

Special thanks go to Dr. Hossein Farahani, who conceived of this project and has offered a tremendous amount of encouragement and guidance along the way. I owe much of my success to his patient tutelage.

Additional thanks go to Dr. Art Poon for advising me on phylogenetics; Dr. Paul Pavlidis for chairing my defense; Carolyn Lui, Shima Kordlou, and Rebecca Gillespie for arranging meetings; and Sharon Ruschkowski for keeping the Bioinformatics Training Program running smoothly.

Finally, I would like to thank my husband, Michael Gottlieb, for being a sounding board for my ideas and taking care of me when I was in the throes of scientific labour. There were times when he was the only thing standing between me and a diet of McDonald's and Pop-Tarts.

I acknowledge the Canadian Institutes of Health Research for funding my studies.

Chapter 1
Introduction

In 1976, Peter Nowell published a groundbreaking paper characterizing cancer as an evolutionary process akin to Darwinian natural selection [49]. According to this theory, cancer begins when mutations to a single cell boost its proliferative potential, triggering a period of rapid growth and genetic instability. Cancer cells divide, diversify, and compete for limited resources in the tumour micro-environment, leading some varieties to proliferate at the expense of others. This theory has been experimentally validated and refined over the years, but there remain many open questions [43] [69] [29].

Understanding how cancer genomes evolve is key to understanding oncogenesis and has been the focus of much work in recent years. However, it is not possible to observe this process directly using current technologies. An alternative approach is to infer evolutionary lineages from snapshots in space and time. This can be done by adopting methods from the field of phylogenetics, which is concerned with learning evolutionary histories of species.
An accurate phylogeny of a cancer tumour has the potential to shed light on numerous phenomena, such as key oncogenetic events, family relationships between groups of cells, patterns of diversification, and evolutionary trajectories in the wake of treatment. Ultimately, this knowledge will leave us better equipped to combat disease.

The remainder of this chapter is devoted to relevant background (Sections 1.1-1.3) and a high-level overview of the work presented in this thesis (Section 1.4).

1.1 The Cancer Genome - An Evolving Landscape

A consequence of the evolutionary nature of cancer is a high degree of intra-tumour genetic heterogeneity [53] [69] [40]. The number of mutations in a cancer can vary from the tens to the thousands [29]. Some of these are driver mutations, which confer selective advantage, but most are passenger mutations that have a neutral or deleterious effect on fitness [29].

Cancer cell populations are often described in terms of clones, or populations of cells that share an ancestor and a genotype¹ [43]. In an evolutionary context, a clone is analogous to a species and a cell to an organism. A subclone is a clone that is a descendant of another clone, typically the dominant clone in the tumour.

¹ In reality, it is unlikely that two cells will be genetically identical, especially in cancer cell populations, which tend to be highly heterogeneous [67]. For this reason, the concept of a clone is not well-defined and may need revision.

It was once believed that cancers evolved through a series of clonal sweeps, whereby a single clone would dominate the population until it spawned a new clone with enough proliferative power to wipe out its predecessor [69]. Recent evidence suggests that this model is applicable in some cases but not in others, as it is not uncommon for competing subclones to prevent each other from establishing dominance [29] [20]. In other cases, distinct subclones may occupy different spatial niches or even co-evolve cooperatively [20]. It remains unclear how many subclones typically coexist in a tumour, but recent evidence suggests that the number varies and could be higher than previously assumed [47] [20] [67].

Another topic of debate is whether driver mutations are acquired in a stepwise or punctuated fashion [69]. The stepwise model postulates that each new mutation is accompanied by a full clonal sweep, whereas the punctuated model states that mutations are acquired in bursts before a full sweep can occur [29] [69]. Evidence suggests that both models may be at work in any given cancer [47] [67], with the punctuated model being especially applicable in the event of genomic crises, such as chromosome breakage [69].

In addition to heterogeneity within cancers, there is also considerable heterogeneity across cancers, even those of the same type [69]. This rules out the possibility of one-size-fits-all therapy. In some types, such as chronic myeloid leukemia, a particular gene is consistently mutated [69], but in other types, such as ovarian cancer, all genes except TP53 are mutated in fewer than ten percent of cases [69]. However, it is often the same underlying biochemical pathways that are affected via different genes [69]. Associating mutations with specific pathway disturbances has proven immensely valuable to the discovery of cancer subtypes and the development of targeted therapeutics [6].

The heterogeneity of cancer poses an enormous challenge to treatment, especially in late stages of the disease [29]. The more heterogeneous a tumour, the more likely it is that some cells harbour treatment resistance [39].
Treatment selects for these varieties by eliminating competition, and so relapses are common [21] [29]. To complicate matters, chemotherapy can sometimes have a genotoxic effect, causing new mutations to arise and increasing heterogeneity even further [39]. Success may depend on tailoring treatments to each cancer individually and continually modifying them as the cancer evolves [29]. In order to do this effectively, therapists will need as complete a picture of a tumour’s evolutionary state as possible. Already, technologies such as high-throughput sequencing [47] and, more recently, single-cell sequencing [67] [25] have dramatically improved our ability to resolve intra-tumour heterogeneity. There is a growing need for robust and scalable algorithms that can make sense of this complexity.

1.2 Classical Phylogenetics

A phylogeny is an evolutionary history of species or other biological entities, and phylogenetics is the discipline of discovering phylogenies using observations on existing varieties. In the past, phylogeneticists relied on morphological observations, but now it is possible to use molecular data instead [24]. A phylogeny may be represented graphically as a phylogenetic tree (Figure 1.1). Phylogenetic trees can be described using concepts and terminology from graph theory.

The most common application of phylogenetics is learning evolutionary histories of species, and so most phylogenetic techniques are tailored to this task. Modern phylogenetic approaches fall into two general classes: distance-based and character-based. The next section will discuss challenges that face all methods, and the subsequent sections will discuss each of these classes in turn.

Figure 1.1: A simple phylogenetic tree of species. Each leaf represents an extant species and each internal node represents a shared extinct ancestor. Each edge represents evolutionary divergence over time. In some trees, each edge has an associated length that quantifies the divergence.

1.2.1 Common Challenges

All phylogenetic methods leverage the fact that shared ancestry reveals itself through shared traits. Perhaps the greatest challenge to accurate phylogenetic reconstruction is homoplasy, which is the presence of shared traits that is not due to shared ancestry. There are two forms of this: convergent evolution, where a feature arises independently along two branches of the phylogeny, and atavism, where a distinguishing trait of an ancestor is lost in a descendant. A perfect phylogeny is one that contains no homoplasy.

Perfect phylogenies are often assumed in situations where homoplasy is believed to be rare, such as when the mutation rate is very low relative to the size of the feature space. In the context of molecular phylogenetics, this is called the infinite sites model, which states that no site is mutated more than once over the entire history of a population of interest [35]. This model was first introduced by population geneticists to describe the evolution of species at the DNA level [35], but it has since been adopted in other domains, such as the evolution of cancer [3].

Homoplasy is responsible for long-branch attraction [26] [5], which is the tendency for certain phylogenetic methods to group species from different long lineages together. This happens because longer lineages have had more opportunity to acquire homoplasies. Some phylogenetic methods are more vulnerable to this than others, as will be discussed in the sections that follow.

Another challenge is choosing an appropriate probabilistic model of evolution [24]. Most phylogenetic methods require some knowledge of the relative likelihoods of differences between an ancestor and an immediate descendant. For instance, if constructing a phylogeny based on DNA sequences, it is sensible to take into account the fact that certain nucleotide substitutions are more likely than others.
The Jukes-Cantor model is one early model of nucleotide substitution rates that is still used today [33]. More sophisticated models might account for insertions, deletions, and other structural changes [24].

Finally, no phylogenetic method is invulnerable to incomplete or erroneous data. Important signals might be missed due to undersampling, and technical errors might result in incorrect inference.

1.2.2 Distance-Based Methods

This class of methods uses a matrix of distances $D$ between extant species to construct a phylogenetic tree (Figure 1.2). The performance of these methods depends on the quality of the distance metric, which is typically based on a suitable model of evolution. Two popular varieties are discussed here.

Unweighted Pair Group Method Using Arithmetic Averages (UPGMA)

This method starts by adding each species to its own cluster and then recursively merging clusters that are most similar (Figure 1.2). The distance between clusters is defined as the average distance between pairs of species from each cluster.

UPGMA is guaranteed to construct a rooted tree correctly if two properties hold: the ultrametric property and additivity. A tree is ultrametric given a distance matrix if all leaves are equidistant from the root. This reflects a situation where the rate of evolution is constant across the whole tree. A tree is additive given a distance metric if the distance between any two nodes is equal to the sum of the lengths of the edges between them [24] (Figure 1.3).

Figure 1.2: Distance-based methods. a. Defining a distance matrix. Here, the pairwise distances are computed using a naive scheme which assigns a cost of 1 to each nucleotide substitution. b. UPGMA illustrated. c. Neighbor-Joining illustrated. At each step, nodes that have already been joined are shown in grey and active nodes are shown in black.

Figure 1.3: An illustration of tree additivity. The distance between 2 and 3 is the sum of the lengths of the edges on the path between them.

Neighbor-Joining

Neighbor-Joining works by finding pairs of species that are likely to be neighbours by examining the distances between them and other species. Specifically, if $D_{ij} \approx D_{ik} - D_{jk}$ for all $k$, then $i$ and $j$ are likely to be neighbours. Neighbor-Joining proceeds by finding two nodes for which this is the most true, joining them, and then adding their parent to the working set (Figure 1.2). Unlike UPGMA, Neighbor-Joining trees are not rooted. Neighbor-Joining is guaranteed to construct a tree correctly if additivity holds [24].

The main strengths of distance-based methods are simplicity and speed [8]. Neighbor-Joining in particular has been shown to perform well as long as violations of additivity are small [1]. However, its dependence on additivity is also a shortcoming. If there is homoplasy, then the calculated distances will underestimate the actual distances, and additivity will be violated. This can become especially problematic if the tree is large or branches are long. Weighted Neighbor-Joining (Weighbor) is a recently proposed variation that alleviates this issue by putting less weight on calculations involving longer branches [8].

Another disadvantage of these methods is that they produce only one tree when several plausible trees may be desired. Finally, these methods do not offer any insight into ancestral states.

1.2.3 Character-Based Methods

Instead of relying on a distance matrix, character-based methods use features directly. Each species is represented as a list of features encoded as discrete characters.

Parsimony

Traditional parsimony searches for a tree that requires the minimal number of substitutions along its branches [28], assigning values to ancestral states in the process (Figure 1.4).
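To make the notion of "minimal number of substitutions" concrete, the following is a minimal sketch of how the score of one fixed, rooted, bifurcating tree might be computed. It uses Fitch's algorithm, a standard small-parsimony procedure that is assumed here for illustration rather than taken from the thesis; the names `fitch_site` and `parsimony_score`, the nested-tuple tree encoding, and the four binary profiles are all hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Assumed encoding: a leaf is a species name, an internal node is a (left, right) pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate ancestral states, minimum substitutions) for one site."""
    if isinstance(tree, str):  # leaf: its observed character, no substitutions yet
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:  # the children can agree, so no substitution is needed at this node
        return shared, left_cost + right_cost
    # the children disagree: keep both options and count one substitution
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, profiles: Dict[str, str]) -> int:
    """Sum the per-site minimum substitution counts over all sites."""
    n_sites = len(next(iter(profiles.values())))
    return sum(
        fitch_site(tree, {name: seq[site] for name, seq in profiles.items()})[1]
        for site in range(n_sites)
    )


# Hypothetical binary profiles for four species (illustrative data only).
profiles = {"a": "110", "b": "100", "c": "011", "d": "001"}
tree_1 = (("a", "b"), ("c", "d"))
tree_2 = (("a", "c"), ("b", "d"))
print(parsimony_score(tree_1, profiles), parsimony_score(tree_2, profiles))
```

On this toy input, `tree_1` requires 4 substitutions and `tree_2` requires 5, so a parsimony search would prefer `tree_1`. The sketch only scores a given tree; searching over tree topologies is a separate and much harder step.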
Weighted parsimony is a variation where different substitutions are penalized differently, and the goal is to find the tree with the minimal cost [59].

Parsimony is typically slower than distance-based methods. The parsimony score for any given tree can be computed in linear time, but the task of finding the most parsimonious tree is NP-complete [16], since in the worst case it requires searching the space of all possible trees. Parsimony tends to perform well when the number of changes in a tree is relatively small, but it falls prey to long-branch attraction as the size of the tree or the lengths of the branches increase [26].

Figure 1.4: Character-based methods. a. Parsimony illustrated. b. Calculating the likelihood of a tree. An evolutionary model is supplied, specifying transition and root probabilities. $P(\text{data} \mid T, t_1, t_2)$ is the product of the probabilities of the assignments at each site. The probability at each site is computed by summing over all possible assignments to the root.

Another drawback of traditional parsimony is that it cannot incorporate an evolutionary model. Weighted parsimony does not have this problem, since an evolutionary model may be used to define substitution costs.

Maximum-Likelihood

Maximum-likelihood is a probabilistic approach that finds a tree that maximizes the likelihood of the data under an evolutionary model. The likelihood score is calculated by summing over all possible assignments to internal nodes (Figure 1.4). Like parsimony, it requires an exhaustive search of all possible trees, but with the added task of tuning the branch lengths. This makes maximum-likelihood less susceptible to long-branch attraction, since it allows for solutions that account for homoplasy by having variable branch lengths [24].
The drawback of this is that maximum-likelihood is considerably slower than parsimony.

Bayesian Methods

Whereas maximum-likelihood approaches seek a tree that maximizes the likelihood of the data, Bayesian methods seek a tree that is most likely given the data. This can be computed using Bayes’ rule:

$$P(\text{tree} \mid \text{data}) = \frac{P(\text{data} \mid \text{tree}) \, P(\text{tree})}{P(\text{data})} \tag{1.1}$$

Since $P(\text{tree})$ is unknown, $P(\text{tree} \mid \text{data})$ cannot be calculated directly. Instead, trees and parameters are sampled from their posterior distributions using a Markov Chain Monte Carlo (MCMC) process. Then, $P(\text{tree} \mid \text{data})$ can be estimated from the frequency at which the tree appears among the samples [31].

Bayesian methods are popular because they allow for more complex evolutionary models and can be faster than maximum-likelihood in some situations [31]. A criticism of Bayesian inference methods in general is that they require an initial specification of priors (in this case, $P(\text{tree})$), which introduces subjectivity [31]. Another complication is that it is difficult to know how many MCMC runs are needed for convergence [31].

1.3 Cancer Phylogenetics

Cancer phylogenetics is a relatively new field that adapts methods from classical phylogenetics to the study of cancer progression. It is motivated by the fact that cancer is essentially Darwinian, making it suitable for phylogenetic analysis. This field has potential applications in both research and clinical settings:

1. Research applications - Identify patterns of progression within and across cancer types, such as mutagenic event order, clonal population dynamics, and common responses to treatment
2. Clinical applications - Use phylogenetics to evaluate specific cases, guide diagnostics, and predict outcomes; use knowledge gained from phylogenetic research to develop treatments

Most contributions to cancer phylogenetic research to date fall into one of two categories (Figure 1.5):

1. Across-cancer phylogenetics (also called oncogenetics) learns statistical phylogenies for particular types of cancer by looking at patterns of mutations across cancers from different patients
2. Within-cancer phylogenetics learns actual phylogenies for specific instances of cancer, treating subclones or single cells as species

There is, arguably, a third category that uses the machinery of phylogenetics to cluster and classify tumours into subtypes [55] [19]. These trees are not true phylogenies as they have no relationship with time.

1.3.1 Across-Cancer Phylogenetics

In 1999, Desper et al. pioneered the field of cancer phylogenetics by introducing the notion of an oncogenetic tree and describing how to learn one from comparative genomic hybridization (CGH) data [17]. An oncogenetic tree captures the order in which mutagenic events are believed to occur in a typical instance of a particular type of cancer [17]. Desper et al. pre-selected a set of interesting copy number alteration events, profiled them across 117 instances of renal carcinoma using CGH, and then constructed a maximum-weight spanning tree over the events using the statistical dependencies as edge weights. The idea is that events that often co-occur are more likely to be evolutionarily linked [17].

Figure 1.5: a. Across-cancer phylogenetics. Estimates likely event order by looking at data from multiple tumours. b. Within-cancer phylogenetics. Learns evolutionary relationships between subpopulations of an individual cancer.

Most early work in cancer phylogenetics is of a similar nature (Table 1.1). In 2000, Desper et al. built upon their previous work by deriving a distance matrix from statistical dependencies and using Neighbor-Joining to build a tree [18]. Heydebreck et al. used single-cell analysis to obtain copy number data for 173 cases of renal carcinoma and built a tree using maximum-likelihood, allowing for a more sophisticated probabilistic model [66]. Finally, Beerenwinkel et al. described an EM algorithm for learning a mixture model from HIV-I copy number and structural rearrangement data [4]. An oncogenetic tree might be a spanning tree [17] [4] or a full bifurcating tree [18] [66] depending on the method used.

Category | Model | Data Source | Data Type | Method | Topology | Tool Name | Publication
Across | Oncogenetic | Bulk (CGH) | CNV | MWST | Spanning | N/A | Desper et al. (1999) [17]
Across | Oncogenetic | Bulk (CGH) | CNV | Neighbor-Joining | Bifurcating | N/A | Desper et al. (2000) [18]
Across | Oncogenetic | Single-cell (karyotypes) | CNV | ML | Bifurcating | N/A | Heydebreck et al. (2004) [66]
Across | Mutagenetic Mixture | HIV-I transcripts | CNV and SR | EM/ML | Bifurcating | mtreemix | Beerenwinkel et al. (2005) [4]
Both | Oncogenetic | Single-cell (FISH) | CNV | EM/ML | Spanning w/ Steiner nodes | N/A | Pennington et al. (2006) [52]
Within | Phylogenetic (over samples) | Bulk (CGH) | CNV | Neighbor-Joining | Bifurcating | N/A | Navin et al. (2010) [46]
Within | Phylogenetic (over samples) | Bulk (CGH) | CNV | Neighbor-Joining | Bifurcating | TuMult | Letouzé et al. (2010) [41]
Within | Phylogenetic (over samples) | Bulk (CGH) | CNV | Distance-based | Bifurcating | MEDICC | Schwarz et al. (2014) [60]
Within | Phylogenetic (over subclones) | Bulk (HTS) | SNV | Original | Spanning | TrAp | Strino et al. (2014) [64]
Within | Phylogenetic (over subclones) | Bulk (HTS) | SNV | Bayesian | Spanning | PhyloSub | Jiao et al. (2014) [32]
Within | Phylogenetic (over subclones) | Single-cell or bulk | Methylation or SNV | Bayesian | Spanning | BitPhylogeny | Yuan et al. (2015) [70]
Within | Phylogenetic (over cells) | Single-cell (CGH) | CNV | Neighbor-Joining | Bifurcating | N/A | Navin et al. (2011) [47]
Within | Phylogenetic (over cells) | Single-cell (FISH) | CNV | Original | Spanning w/ Steiner nodes | FISHtrees | Chowdhury et al. (2014) [14]
Within | Phylogenetic (over cells) | Single-cell (exome seq) | SNV | Neighbor-Joining | Bifurcating | N/A | Xu et al. (2012) [68]
Within | Oncogenetic (over mutations) | Single-cell (exome seq) | SNV | MST | Spanning | N/A | Kim and Simon (2014) [34]

Table 1.1: A summary of the major contributions to cancer phylogenetics.²

² Acronyms: HTS = High-throughput sequencing, SR = Structural rearrangements, MWST = Maximum-weight spanning tree, ML = Maximum likelihood, MST = Minimum spanning tree

Across-cancer phylogenies provide powerful summaries of patterns across cancers, but they do not benefit from the extensive heterogeneity within cancers. The advent of new technologies, such as high-throughput sequencing [47] and single-cell sequencing [67] [25], has made it possible to quantify this heterogeneity, opening up new avenues of research.

1.3.2 Within-Cancer Phylogenetics

Within-cancer phylogenetics is concerned with learning the phylogenetic relationships among subpopulations of a single cancer. The subpopulations may be samples, subclones, or even individual cells (Table 1.1).

The potential applications of within-cancer phylogenies are numerous. First, they could reveal key oncogenetic events and common pathways of progression at a higher resolution than across-cancer methods. Additionally, they could shed light on clonal population dynamics. For instance, a phylogeny over subclones could show how clones emerge, evolve, and expand in relation to each other. This could be especially illuminating when applied to data drawn from different timepoints, such as before and after treatment.

Within-cancer phylogenetic methods typically rely on one of two types of data: copy-number variations (CNVs) or single nucleotide variants (SNVs) [3]. CNVs may be defined in terms of whole chromosomes or windows of fixed size along a chromosome. In phylogenies based on CNVs, each node corresponds to a unique copy-number state observed in a sample [46] [60] or a single cell [47] [14].
These phylogenies tend to be small because the copy-number event space is small, and the main challenge is computing pairwise distances between states.

Naive approaches to calculating distances between copy-number states treat copy-number events as independent [46] [47]. In reality, copy-number events are not independent since they may affect overlapping regions of the genome, and more sophisticated methods take this into account [41] [60] [14]. Furthermore, the true distance between copy-number states depends on the assignment of CNVs to alleles. This information is typically not available from copy-number data and must be inferred (a process called phasing). Schwarz et al. recently described a method for phasing CNVs and computing edit distances jointly [60].

SNVs may be encoded using the nucleic acid alphabet (A, G, C, and T or U) or a binary alphabet (0 for mutation absent, 1 for mutation present). The SNV event space is typically larger than the copy-number event space, and most intra-tumour heterogeneity can be attributed to SNVs [67]. For this reason, phylogenies based on SNVs are better equipped to capture the full clonal make-up of a tumour. The power of these methods depends on whether the SNV data comes from bulk tissue sequencing or single-cell sequencing. The next sections will discuss each of these in turn.

1.3.3 Inferring Phylogenies from Bulk SNV Data

High-throughput sequencing (HTS) technologies have revolutionized the field of cancer genomics, allowing for full characterization of the genomic content of tumour samples [63]. However, bulk sample preparation involves lysing cells and fragmenting their DNA, and so it is impossible for a sequencer to resolve which SNVs came from the same cell. Therefore, there is no immediate picture of the subclones that are present in a sample.

Instead, the clonal make-up of a sample must be inferred from SNV read counts.
Specifically, these counts can be used to estimate the proportion of cells that harbour each SNV, or the cellular prevalence. From this, it is possible to identify combinations of mutations that characterize subclones [3].

Estimating cellular prevalences from SNV counts is not a trivial task, since counts may be affected by nuisance factors such as contamination from normal cells, copy-number variations, and technical noise [57]. For this reason, methods to deconvolve clonal architecture from SNV counts typically rely on sophisticated probabilistic models and inference schemes [3], such as maximum-likelihood [50] and Bayesian [57] [44] mixture models.

A phylogenetic tree may be learned over the subclones after deconvolution, but this might produce suboptimal results since the two processes are mutually informative [70]. A better approach is to deconvolve the subclones and learn the phylogeny simultaneously. A program called TrAp does this by solving a constrained matrix inversion problem [64], but it can only handle up to 25 features. PhyloSub uses a non-parametric Bayesian approach based on PyClone [57] to jointly cluster SNVs into major subclones and learn a phylogeny among them [32].

In summary, bulk SNV profiling is able to capture a substantial portion of the genetic diversity in large chunks of tissue, so large subclones are unlikely to be missed. However, a major disadvantage is that the assignment of SNVs to clonal genotypes is not obvious. Instead, the subclones must be inferred from SNV count data, which is a challenging and computationally intensive process. The ambiguity inherent in the data means that smaller subclones are more difficult to detect and highly similar subclones harder to distinguish [70].
Consequently, the resulting phylogenies tend to be crude pictures of the clonal architecture.

1.3.4 The Promise of Single-Cell Sequencing

In recent years, high-throughput sequencing technologies have improved to the point that only a small amount of DNA is required for analysis, making it possible to sequence the DNA of individual cells [61]. Capturing SNV data at the resolution of the cell bypasses the clonal deconvolution problem because each cell provides an unambiguous snapshot of a distinct genotype present in a tumour.

However, single-cell sequencing technologies have some serious limitations. First, single-cell DNA must be amplified considerably, and this can introduce a host of problems such as uneven coverage, incorrect SNV identification, and missed loci (allelic dropout) [61]. Accuracy can be improved by increasing sequencing depth, but this can be expensive, so deep sequencing is typically limited to a few markers at a time [61]. Cost also limits the number of cells that can be sequenced [61], which means that it is hard to fully capture the heterogeneity of a tumour. Finally, it is difficult to validate mutations detected using single-cell sequencing [67]. This can be done by performing targeted sequencing on bulk tissue [67], but this may fail if the variant is rare.

Nevertheless, single-cell technologies have improved rapidly over the past few years, and this trend is likely to continue into the future [61]. Navin et al. recently pioneered nuc-seq, which isolates cells before they divide but after they replicate their DNA, reducing the amount of amplification required [67]. Nuc-seq was able to achieve 91% mean coverage breadth with allelic dropout rates of around 10% and a false positive rate of 1-2 errors per million bases - a substantial improvement over previous methods [67].
Single-molecule (third-generation) sequencing technologies also show promise as they do not require amplification at all [61], but they currently suffer from relatively low accuracy (around 85%) [11].

With enough cells, a single-cell phylogeny could provide an intricate picture of the clonal architecture of a tumour. Major subclonal groups and their evolutionary relationships would be apparent in the clusters and the branching patterns. But until technological advances drastically increase the number of cells that can be sequenced at once, this picture will be incomplete.

So far, there have only been a few attempts to learn phylogenies from single-cell SNV data. One used Neighbor-Joining to build a tree from a matrix of Euclidean distances as a part of a larger biological study [68]. Kim and Simon describe a Bayesian method for building oncogenetic trees over mutations, which can then be converted into bifurcating phylogenies [34]. Finally, a tool called BitPhylogeny uses non-parametric Bayesian clustering to jointly learn major subclones and a phylogeny over the subclones from single-cell SNV data [70].

1.3.5 Limitations of Current Methods

Current methods for learning tumour phylogenies are limited in a few key respects. First, most of these methods either put all observed varieties on the leaves of the tree, as in classical phylogenetics, or build a spanning tree with no hidden nodes, as in oncogenetics. Neither of these models can fully capture the range of possible relationships among extant genotypes. For instance, it is possible for ancestral genotypes to be present among extant genotypes. This can happen in the event of asymmetric cell division, which results in all new mutations going to one of the two daughter cells. Asymmetric cell division is associated with stem cells and is implicated in cancer [36].
This can also happen if the observed feature set is incomplete.

Second, many of these methods are designed to work on small feature sets, such as a few variants over a few major subclones or samples. As such, it is unclear how well current methods will scale as advances in single-cell sequencing continue to improve our ability to detect unique genotypes.

We have attempted to make progress on this front by developing a Genotype Tree representation of cancer phylogenies, implementing three methods for learning Genotype Trees from binary SNV profiles, and testing these methods under a variety of simulated conditions.

1.4 Current Work

1.4.1 Representing Cancer Phylogenies as Genotype Trees

In this work, a within-cancer phylogeny is represented as a Genotype Tree. A Genotype Tree is a simple and intuitive way to model the evolutionary relationships between genotypes (Figure 1.6). Each node represents a distinct genotype as defined over a set of features of interest, and each edge represents a parent-child relationship between two genotypes. Its constituents are as follows:

• A vector of features F = {f_0, f_1, ..., f_n}, where each feature f_i is a discrete alphabet or distribution specifying possible values for that feature
• A set of unique genotypes G, where each g_i ∈ G is a vector of values {g_{i0}, g_{i1}, ..., g_{in}}, where g_{ij} is the value for feature j at genotype i
• A set of edges E, where each e_{ij} ∈ E represents a parent-child relationship between genotypes g_i and g_j

Note that the expressive power of a Genotype Tree is constrained by F. In practice, it may not be feasible to capture all relevant features, and so the Genotype Tree should be interpreted as a partial picture only.

Possible constituents of F include copy-number profiles and SNV profiles. In theory, F could capture any genotypic feature of interest.

Figure 1.6: A hypothetical Genotype Tree. Here, the feature set contains binarized SNV data for three locations (1 for mutation present, 0 for mutation absent) and copy-number data for one chromosome, where the possible copy numbers are 1, 2, and 3.

Differences from Phylogenetic Trees

Genotype Trees differ from phylogenetic trees in two key respects:

1. Genotype Trees are not strictly bifurcating - that is, each internal node may have one to many children
2. Genotype Trees may have extant internal nodes

Both of these differences stem from the fact that when a cancer cell divides, there is a chance that all new mutations to features of interest will go to one of the daughter cells. In this case, a new genotype emerges in the one daughter while the ancestral genotype persists in the other. Hence, it is possible for a genotype to spawn one or more new genotypes over the course of its lifetime.

1.4.2 Problem Statement

The present work is concerned with the task of learning a Genotype Tree over a set of known genotypes where the feature set is a series of binary SNV indicators. That is, each genotype is defined by a string of 0s and 1s indicating absence or presence of a variant at each position of interest. In theory, genotypes may be derived from single-cell sequencing or bulk sequencing, but for simplicity, the current work assumes that each genotype is derived from a single cell.

1.4.3 Project Overview

The project has three main constituents:

1. A cancer cell population simulator (Chapter 2)
2. Implementations of methods for learning Genotype Trees from binary SNV profiles, namely:
   • Recursive Grouping (Section 3.1.3)
   • Chow-Liu Grouping (Section 3.1.4)
   • Parsimony + Edge Contraction (Section 3.1.5)
3. A procedure for evaluating these methods using the simulated data (Section 3.2)

The next two chapters will discuss these constituents in depth.

Chapter 2

Simulating the Evolution of Cancer Cell Populations

In order to evaluate different methods for learning Genotype Trees from binary SNV profiles, we built a tool that simulates the evolution of cancer cell populations. This is necessary because the evolutionary histories of real cancer cell populations are seldom known in full. Furthermore, a simulator allows us to systematically vary evolutionary conditions and observe the effects on tree properties and reconstruction accuracy. In addition to the present work, this tool could also be used in other endeavors that require similar data.

The remainder of this chapter will describe the design and workings of the simulator (Section 2.1), the effect of parameter settings on certain properties of the output data (Section 2.2), and the limitations of the current design and implementation (Section 2.3).

2.1 Methods

The output of the simulator is a single-cell evolutionary tree with sequence data for each leaf node (Figure 2.1), which can then be converted into a Genotype Tree (Section 2.1.3).

The simulator generates a single-cell evolutionary tree in a step-wise fashion, introducing new mutations at each division in accordance with a mutation rate. It only models divisions that introduce new mutations. The simulator was implemented in Python.

2.1.1 Parameters

The simulator has a number of configurable settings.
These include:

• Number of leaf nodes n - the number of distinct genotypes present in the final population
• Number of sequence positions p
• Asymmetric division rate a - the probability of any given division being asymmetric, which means that all mutations go to one child
• Mutation rate m - the probability of any given mutable location being mutated upon division
• Mutation loss rate l - the probability of any mutated location reverting to the un-mutated state upon division
• Dropout rate d - the probability of any given mutated location present in a leaf node reverting to the un-mutated state at the end

The simulator does not expect m to reflect the mutation rate across the entire genome. This is because the p sequence positions are understood to be heavily biased toward those that will ultimately acquire mutations, and m will be set artificially high to reflect this. A side-effect of this is that convergent evolution may happen when it should not, if we adopt the widely used infinite sites model (Section 1.2.1). For this reason, convergent evolution is explicitly disallowed.

For the same reason, mutation-loss events are not meant to model actual reversions at the base-pair level, but rather, loss of a variant due to a large structural deletion. The asymmetric division rate accounts for the presence of cancer stem cells, which have the capacity to divide asymmetrically [36]. Dropout rate models allelic dropout, which is the failure of sequencing technologies to detect variants.

Figure 2.1: A single-cell evolutionary tree. Each node represents a cell, and each pair of children represents the daughter cells after division. Each color is a distinct genotype. (This tree contains convergent evolution due to the small size of the feature set. This is disallowed by the simulator.)

2.1.2 Process

First, the set mutable_global is initialized to contain all positions. The root node is initialized and assigned a string of 0s of length p.
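As a minimal sketch of the settings in Section 2.1.1 and the initialization step just described, the following Python fragment may help. It is not the simulator's actual code: the names SimulatorConfig, Node, and init_simulation are hypothetical, and it assumes that the root's local mutable set starts out containing every position.

```python
from dataclasses import dataclass, field


@dataclass
class SimulatorConfig:
    """Settings from Section 2.1.1 (field names are illustrative only)."""
    n_leaves: int         # n: number of leaf nodes
    n_positions: int      # p: number of sequence positions
    asym_rate: float      # a: asymmetric division rate
    mutation_rate: float  # m: per-position mutation probability
    loss_rate: float      # l: per-position mutation loss probability
    dropout_rate: float   # d: per-position dropout probability at the leaves

    def __post_init__(self):
        # Every rate is a probability, so it must lie in [0, 1].
        for name in ("asym_rate", "mutation_rate", "loss_rate", "dropout_rate"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1], got {value}")


@dataclass
class Node:
    seq: list                                   # binary SNV profile
    mutable: set                                # positions this lineage may still mutate
    mutated: set = field(default_factory=set)   # positions mutated in this lineage


def init_simulation(cfg):
    """Return the global mutable set and an all-zero root node."""
    mutable_global = set(range(cfg.n_positions))
    # Assumption: the root starts with its own copy of the full mutable set.
    root = Node(seq=[0] * cfg.n_positions, mutable=set(mutable_global))
    return mutable_global, root
```

The parameter values passed to such a configuration would be chosen by the experimenter; none are implied here.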
The root node then undergoes division (Algorithm 1), and each of its children undergoes division, and this process continues in a depth-first fashion until there are n leaf nodes in the tree.

Note that positions are not added back to the local or global mutable sets once they are lost. This ensures that no position is ever mutated more than once over the whole evolutionary history.

Algorithm 1 Cell Divide

function DIVIDE(node)
    new_muts ← {}
    new_losses ← {}
    new_muts_c1 ← {}
    new_muts_c2 ← {}
    new_losses_c1 ← {}
    new_losses_c2 ← {}
    mutable ← mutable_global ∩ node.mutable
    for pos in mutable do
        Add pos to new_muts with probability m
    end for
    if new_muts is empty then
        return DIVIDE(node)    // retry: only divisions that introduce new mutations are modelled
    end if
    for pos in node.mutated do
        Add pos to new_losses with probability l
    end for
    asym_divide ← True with probability a
    if asym_divide then
        new_muts_c1 ← new_muts
        new_losses_c1 ← new_losses
    else
        Split new_muts evenly over new_muts_c1 and new_muts_c2
        Split new_losses evenly over new_losses_c1 and new_losses_c2
    end if
    child1 ← copy(node)
    child2 ← copy(node)
    for pos in new_muts_c1 do
        child1.seq[pos] ← 1
    end for
    for pos in new_losses_c1 do
        child1.seq[pos] ← 0
    end for
    for pos in new_muts_c2 do
        child2.seq[pos] ← 1
    end for
    for pos in new_losses_c2 do
        child2.seq[pos] ← 0
    end for
    child1.mutable ← node.mutable − new_muts_c1
    child1.mutated ← node.mutated ∪ new_muts_c1
    child2.mutable ← node.mutable − new_muts_c2
    child2.mutated ← node.mutated ∪ new_muts_c2
    mutable_global ← mutable_global − new_muts
end function

2.1.3 Converting the Single-Cell Evolutionary Tree to a Genotype Tree

The resulting single-cell tree can be converted into a Genotype Tree simply by contracting edges between nodes with identical sequences, starting with the leaf nodes and working up the tree. Figure 2.2 illustrates the result of this process.

Figure 2.2: On the left is a single-cell evolutionary tree. The edges between identical nodes are in bold.
On the right is the Genotype Tree that is the result of contracting the bolded edges.

2.2 Analysis of Parameter Effects

This section describes how different parameter settings influence other important properties that are not explicitly varied. Specifically, Section 2.2.1 describes how asymmetric division rate affects tree topology, and Section 2.2.2 describes how other parameters affect the number of homoplasies in the final Genotype Tree.

2.2.1 Tree Topology

Aside from the number of leaf nodes, the asymmetric division rate is the only parameter that influences tree topology. Each asymmetric division amounts to one edge contraction during conversion from single-cell evolutionary tree to Genotype Tree, and so the number of hidden nodes in the Genotype Tree decreases as the asymmetric division rate increases. Consequently, the asymmetric division rate directly influences the size of the tree and the ratio of hidden nodes to observed nodes. Figure 2.3 illustrates this relationship.

Figure 2.3: (a) A single-cell evolutionary tree (left) with an asymmetric division rate of 0.1 and the corresponding Genotype Tree (right). The edges between genotypically identical nodes are in red. These edges are contracted to produce the Genotype Tree. (b) The same, but with an asymmetric division rate of 0.6.

Another consequence of this is that at lower asymmetric division rates, a Genotype Tree more closely resembles a full bifurcating tree (as in traditional phylogenetics). At higher asymmetric division rates, a Genotype Tree is more similar to a minimum spanning tree over the observed nodes.

2.2.2 Number of Homoplasies

As discussed in Section 1.2.1, homoplasy is a serious challenge for phylogenetic reconstruction. Although convergent evolution is disallowed by our simulator, homoplasies may still arise due to mutation loss and dropout. For the purposes of this section, each reversion from 1 to 0 in the single-cell evolutionary tree is counted as a single homoplasy.
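The counting rule above can be sketched in a few lines of Python. This is an illustration only, not the code used to produce the figures: the Cell class and count_homoplasies function are hypothetical, and the sketch assumes that every reversion, including dropout, is recorded as a 1-to-0 change between a parent's sequence and a child's sequence.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    seq: list                                     # binary SNV profile of this cell
    children: list = field(default_factory=list)  # daughter cells


def count_homoplasies(root):
    """Count 1 -> 0 reversions over all parent-child edges of the tree."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node.children:
            # A reversion is a position that is 1 in the parent and 0 in the child.
            count += sum(1 for p, c in zip(node.seq, child.seq) if p == 1 and c == 0)
            stack.append(child)
    return count
```

An explicit stack is used instead of recursion so that deep trees do not run into Python's recursion limit.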
The following figures show how the number of homoplasies |h| varies with different parameter settings. We used the same defaults and tested the same range of values as described in Section 3.2.4. Thirty random trees were generated for each setting using the same thirty seeds.

Figure 2.4 shows that |h|/n scales linearly with lg n, where n is the number of leaf nodes in the single-cell evolutionary tree. Equivalently, |h| scales linearly with n lg n (Figure 2.5). This can also be interpreted to mean that |h|/n scales linearly with the depth of the tree. This is as expected, since each level corresponds to a round of cell division.

|h| also scales linearly with number of positions, mutation rate, mutation loss rate, and dropout rate (Figures 2.6 - 2.9). This is also expected. The number of homoplastic events is directly proportional to the mutation loss rate, the dropout rate, and the number of candidate positions. The number of candidate positions, in turn, is directly proportional to the number of positions and the mutation rate.

Figure 2.4: Number of homoplasies per leaf node by number of leaf nodes. (R^2: 0.9422, F-statistic: 3410 on 1 and 208 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.

Figure 2.5: Total number of homoplasies by number of leaf nodes. Each boxplot summarizes results for thirty random trees.

Figure 2.6: Total number of homoplasies by number of positions. (R^2: 0.9705, F-statistic: 4895 on 1 and 148 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.

Figure 2.7: Total number of homoplasies by mutation rate. (R^2: 0.9076, F-statistic: 1760 on 1 and 178 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.

Figure 2.8: Total number of homoplasies by mutation loss rate. (R^2: 0.9018, F-statistic: 1370 on 1 and 148 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.

Figure 2.9: Total number of homoplasies by dropout rate.
(R^2: 0.937, F-statistic: 1.338e+04 on 1 and 898 DF, p-value: < 2.2e-16). Each boxplot summarizes results for thirty random trees.

2.3 Limitations

The viability of the present work depends on how well the simulator models real cancer cell populations. Although the current simulator is highly customizable, it does not model all possible conditions completely and accurately.

2.3.1 Cell Division

One shortcoming of the simulator is that it treats asymmetric division as a random occurrence independent of lineage. In reality, the ability to divide asymmetrically is a characteristic of cancer stem cells that is passed down from parent to child [9] [48].

A better approach would be to model cells as either stem cells or differentiated cells. Stem cells would have the ability to divide asymmetrically, producing one identical stem cell and one differentiated cell, or symmetrically, producing either two stem cells or two differentiated cells. Stem cells are known to be capable of all three types of divisions, and the relative rates vary across populations [42]. The simulator would need to have configurable parameters specifying these rates. Differentiated cells would only be capable of symmetric division.

Another shortcoming is how the simulator treats symmetric division. It is not realistic to divide the mutations evenly over the daughter cells at every division. In fact, it is possible for all mutations to go to one daughter cell simply by chance. Ideally, the way new mutations are partitioned should vary around some sensibly chosen norm.

2.3.2 Mutation Loss and Convergent Evolution

Another deficiency is how mutation loss is modelled. Currently, the rate of loss is the same at every cell division. This is not realistic, since structural alteration events tend to occur in bursts and vary in severity [62] [2] [67]. One solution would be to have a parameter specifying the probability of loss at any division and another specifying severity. Alternatively, the severity could be determined at random.
A more sophisticated solution would be to expand the simulator to model structural changes as well as point mutations.

It is also worth considering whether or not the constraint on convergent evolution is justified in the context of cancer evolution. Although it is widely adopted [3], a strong case for it has yet to be made in the literature.

Some amount of convergent evolution is guaranteed to occur in sizeable tumours. To illustrate, consider a one cubic centimeter tumour that contains around ten billion cells. To get a conservative estimate of the convergent evolution rate, we can assume that the current population is the product of exactly five billion divisions, that there were no mutations in the ancestral cells, and that the mutation rate is that of a non-cancer cell - 0.6 new mutations per division [67]. In this scenario, one can expect five billion × 0.6 mutational events, which is three billion - just larger than the size of the human genome [65]. So, by the pigeonhole principle, some convergent evolution is inevitable.

However, the real question is whether or not two convergent lineages will thrive to the extent that they are both likely to appear in a sample of just a few thousand cells. Since far more lineages perish than survive [29], the chances are slim, unless the convergent mutation confers a strong selective advantage. Even then it would seem unlikely, considering the highly competitive nature of the tumour environment and the fact that there are multiple ways to affect the same gene. Nevertheless, this is a matter that warrants further consideration.

2.3.3 Other Limitations

Another notable limitation of the simulator is that it does not model fitness and cell death. Including cell death would result in more topological variety in the single-cell evolutionary trees and, consequently, the Genotype Trees.
Adding this functionality would require us to extend the logic of the converter (Section 2.1.3) to collapse redundant hidden nodes (see Section 3.1.1).

A possible weakness in the simulator design is that it operates on a set of positions that are understood to be heavily biased toward those that will ultimately acquire mutations. For this reason, m is set artificially high and convergent evolution must be explicitly disallowed. A more natural approach might be to model the entire genome and use an appropriate genome-level mutation rate. This would increase the simulator runtime, but the cost might not be prohibitive since each dataset needs to be generated only once.

Another potential criticism of the simulator is that all leaf nodes are used to construct the Genotype Tree, when in reality, only a subset of these nodes will be observed. A possible response to this is that we can envision the single-cell tree as modelling the evolutionary history of just that subset. The problem with this answer is that it assumes that we will always have every node's sibling.

Alternatively, we could simply introduce a parameter that specifies what percentage of leaf nodes are observed. However, this would not allow for any new topologies that the simulator would never produce in its current form. It would, in fact, have the same effect as cell death applied to the leaf level only. For this reason, the best approach might be to bypass this parameterization and model cell death instead.

2.4 Conclusion

This chapter has presented a tool for simulating the evolution of cancer cell populations. It has a number of configurable settings that capture both technical limitations (number of observed genotypes, number of sequence positions, dropout rate) and natural conditions (asymmetric division rate, mutation loss).
Although it is an essential component of the present work, it could also have value to other endeavors that require similar data.

A single-cell evolutionary tree is generated by simulating cell division and introducing new mutations according to specified rates. This tree is then converted into a Genotype Tree by contracting edges between genotypically identical nodes. The sequences at the leaves of the single-cell tree may then be given as inputs to the reconstruction algorithms presented in Chapter 3, with the Genotype Tree supplied as ground-truth.

The current simulator is able to produce a rich array of systematically varied data, but further developments are needed to fully capture the complexities of cancer evolution.

Chapter 3

Evaluation of Novel Methods for Learning Genotype Trees

This chapter presents three methods for learning Genotype Trees from binary SNV profiles and evaluates their performance in terms of reconstruction accuracy and speed. The primary motivation behind this is to develop and test methods that are suited to learning phylogenies over cancer cell populations, with special consideration given to scalability in light of recent advances in single-cell technology.

Sections 3.1 and 3.2 describe the three reconstruction methods and the evaluation procedure respectively. Section 3.3 presents the results and Section 3.4 provides a discussion.

3.1 Reconstruction Methods

This section describes three methods for learning Genotype Trees given a set of observed genotypes in the form of binary SNV profiles. Sections 3.1.1 and 3.1.2 introduce key terminology and distance metrics used in our experiments.

The first two methods, Recursive Grouping (Section 3.1.3) and Chow-Liu Grouping (Section 3.1.4), are distance-based methods for learning Latent Tree Models developed by Choi et al. [12] that have been adapted to the problem of learning Genotype Trees.
The third is an extension of the classical phylogenetic method Parsimony wherein edges between identical nodes in the resulting tree are collapsed in a post-processing step (Section 3.1.5).

3.1.1 Terminology

Latent Tree

For the purposes of this paper, a latent tree is a tree-structured graph (Figure 3.1) where leaf nodes are observed and internal nodes may be either hidden or observed. All Genotype Trees are latent trees.

Figure 3.1: A generic tree-structured graph. (adapted with permission from Mourad et al. [45])

Minimal Latent Tree

A minimal latent tree is a latent tree that contains no redundant hidden nodes [12]. A hidden node is redundant if and only if it (a) has fewer than three neighbours or (b) is perfectly dependent on or perfectly independent of one of its neighbours [51].

The methods described below can only learn minimal latent trees, since redundant hidden nodes are impossible to detect from observed node data alone (Figure 3.2).

A Genotype Tree is a minimal latent tree as long as the root node is observed. See Appendix A.1 for proof.

Figure 3.2: In these trees, the shaded nodes are observed and the unshaded nodes are hidden. (a) A tree with no redundant hidden nodes. (b) A tree with two redundant hidden nodes: h1 and h4. (adapted with permission from Choi et al. [12])

Latent Tree Model

According to Choi et al., a Latent Tree Model is a tree-structured graphical Markov model where leaf nodes are observed and internal nodes may be hidden or observed [45]. In other words, it is a latent tree with an associated Markov model. Its key constituents are:

• A set of vertices V that represent observed and latent random variables
• A set of edges E that capture the dependencies between these variables
• A set of parameters Θ consisting of probability distributions for each variable

Each variable V_i takes a value from an alphabet χ according to the probabilities specified in Θ_i [12]. If V_i has a parent then Θ_i specifies a conditional distribution depending only on the value of the parent.
Otherwise, Θ_i specifies a marginal distribution. If the graph is undirected, Θ_i specifies a conditional distribution depending only on V_i's neighbours.

It cannot be said that all Genotype Trees are Latent Tree Models, as Genotype Trees do not necessarily satisfy the Markov property, which requires that the assignment of values to each node depend only on the values of the parent node. Constraints on convergent evolution introduce dependencies between non-neighbouring nodes. The methods described below were developed to learn Latent Tree Models, but they can be used to learn latent trees in general as long as certain conditions hold, as described in the next section.

3.1.2 Distance Metrics

The guarantees of these methods depend on the distance metric. We have tested two different metrics, described below.

Hamming Distance

In information theory, the Hamming distance between two strings of equal length is simply the number of positions at which the two strings differ [30].

The Hamming distance metric is additive for the trees produced by our simulator as long as the mutation loss and dropout rates are zero (see Appendix A.2 for proof). Recursive Grouping is guaranteed to produce the correct tree under these conditions [12].

Paralinear Distance

The paralinear distance metric is an information-efficient way to calculate the distance between two sequences [38]. Specifically, the distance d_{ij} between two sequences i and j is defined as:

    d_{ij} = -\log_e \frac{\det J_{ij}}{(\det D_i)^{1/2} (\det D_j)^{1/2}}    (3.1)

where J_{ij} is the joint probability matrix and D_i and D_j are diagonal matrices containing residue frequencies (see footnote 1) for i and j respectively.

The paralinear distance is additive under certain conditions, one of which is that the evolutionary process used to produce the sequences satisfies the Markov property [38].
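To make the two metrics concrete, here is a minimal Python sketch for binary profiles. It is not the code used in our experiments (the paralinear implementation was provided separately), and it rests on stated assumptions: the function names are hypothetical, J_{ij} and the diagonal entries of D_i and D_j are all expressed as proportions of positions, and the distance is reported as infinite when a determinant is not positive.

```python
import math


def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have equal length")
    return sum(x != y for x, y in zip(a, b))


def paralinear(a, b):
    """Paralinear distance (Equation 3.1) between two binary profiles."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    # joint[x][y]: proportion of positions where a has value x and b has value y.
    joint = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(a, b):
        joint[x][y] += 1.0 / n
    det_j = joint[0][0] * joint[1][1] - joint[0][1] * joint[1][0]
    # Residue frequencies of 0 and 1 in each profile (row and column sums).
    freq_a = [joint[0][0] + joint[0][1], joint[1][0] + joint[1][1]]
    freq_b = [joint[0][0] + joint[1][0], joint[0][1] + joint[1][1]]
    det_da = freq_a[0] * freq_a[1]  # determinant of a diagonal matrix
    det_db = freq_b[0] * freq_b[1]
    # Assumption of this sketch: treat an undefined logarithm as infinite distance.
    if det_j <= 0 or det_da <= 0 or det_db <= 0:
        return math.inf
    return -math.log(det_j / math.sqrt(det_da * det_db))
```

For two identical profiles that contain both 0s and 1s, the ratio inside the logarithm is 1 and the distance is zero, as one would expect of a distance measure.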
As mentioned earlier, our simulator does not satisfy the Markov property, as constraints on convergent evolution introduce dependencies between non-neighbouring nodes. Hence, Recursive Grouping is not guaranteed to produce the correct tree when the paralinear distance metric is used. However, in practice, it seems that additivity is nearly satisfied, and the results reflect this (Appendix B.1).

Footnote 1: In this case, counts for 0 and 1.

3.1.3 Recursive Grouping

Recursive Grouping is a distance-based method for learning Latent Tree Models, which is described in full in Choi et al. [12]. It takes as input a set of observed nodes and a matrix of distances between them. It is similar in spirit to the classical phylogenetic method Neighbor-Joining [24] in that it uses a distance matrix to identify related nodes and reconstructs the tree from the bottom up.

The distance matrix can be used to identify parent-child and sibling relationships between nodes. In particular, where d_{ij} is the distance between nodes i and j:

• i and j are siblings if and only if d_{ik} − d_{jk} = d_{ik'} − d_{jk'} for all other nodes k and k'
• i is the parent of j if and only if d_{ij} = d_{jk} − d_{ik} for all other nodes k (every path from j to another node k passes through i, so d_{jk} = d_{ij} + d_{ik})

Using these criteria, Recursive Grouping proceeds as follows (Figure 3.3):

1. Initialize the active node set to be the set of observed variables
2. Identify sibling and parent-child relationships among the active nodes
3. For every group of siblings that has no parent, introduce a new hidden node to be the parent
4. Compute the distances between each of the new hidden nodes and all other nodes, and then update the distance matrix accordingly
5. Update the active node set to include only parent nodes (hidden or observed) and ungrouped nodes
6. Repeat steps 2 - 6 if more than two nodes remain; connect the remaining nodes otherwise

Figure 3.3: (a) The original latent tree with hidden (light green) and observed (dark green) nodes. (b) - (d) Outputs after three iterations of RG. Active nodes are in green and newly introduced nodes in yellow.
Red circles show the learned families. (adapted with permission from Choi et al. [12])

The distance between a new hidden node h and any other node i can be computed as follows:

• If i is a child of h, then d_{ih} = (d_{ij} + d_{ik} − d_{jk}) / 2, where j is any other child of h and k is any other node in the active set
• If i is not a child of h but is in the active set, d_{ih} = d_{ij} − d_{jh}, where j is a child of h
• Otherwise, d_{ih} = d_{jk} − d_{jh} − d_{ik}, where j is a child of h and k is a child of i

Like Neighbor-Joining [24], Recursive Grouping is guaranteed to recover any minimal latent tree from a distance matrix over an observed node set if and only if the original tree is additive given the distance metric (proven in Choi et al. [12]). For the trees produced by our simulator, this holds for the Hamming distance metric when the mutation loss and dropout rates are zero, but not for the paralinear distance metric (see Section 3.1.2).

Relaxed Recursive Grouping

Sometimes additivity cannot be guaranteed, as discussed in Section 3.1.2. In this case, the equality tests that Recursive Grouping uses to identify relationships will not always hold. To alleviate this problem, Choi et al. propose a few modifications to the procedure, namely:

1. Use softer measures to identify nodes of the same family
2. Only consider distances that are under some threshold τ when looking for family relationships
3. Use multiple nodes j and k to calculate d_{ih} as described in the previous section, and then average the results

The first of these is necessary to find relationships at all, whereas the latter two improve precision. Choi et al. propose a couple of ways to go about item 1, each of which is described below.

Relaxed Recursive Grouping - Basic

In this version, two nodes i and j are determined to be in the same family if

    \max_k (d_{ik} - d_{jk}) - \min_k (d_{ik} - d_{jk}) < \epsilon,    (3.2)

where k is any other node and ε is a user-supplied threshold.
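The test in Equation 3.2 can be sketched as follows. This is an illustration under simplifying assumptions rather than our implementation: the function names are hypothetical, the distance matrix is a plain nested list indexed by node, and the distance threshold τ from item 2 is omitted, so every other node k is considered.

```python
def family_statistic(d, i, j):
    """Spread of d[i][k] - d[j][k] over all other nodes k (left side of Eq. 3.2)."""
    diffs = [d[i][k] - d[j][k] for k in range(len(d)) if k != i and k != j]
    if not diffs:
        raise ValueError("at least one other node k is required")
    return max(diffs) - min(diffs)


def same_family(d, i, j, eps):
    """Relaxed test: i and j are grouped when the spread is below eps."""
    return family_statistic(d, i, j) < eps


# Additive distances for a small tree with unit edge lengths: leaves 0 and 1
# share one hidden parent, leaves 2 and 3 share another, and the two hidden
# nodes are adjacent.
d = [
    [0, 2, 3, 3],
    [2, 0, 3, 3],
    [3, 3, 0, 2],
    [3, 3, 2, 0],
]
print(same_family(d, 0, 1, eps=0.5))  # siblings: spread is 0
print(same_family(d, 0, 2, eps=0.5))  # different families: spread is 2
```

With exactly additive distances the spread for true siblings is zero, so any positive ε groups them; noise in the distances is what makes a non-zero ε necessary.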
The rationale is that the difference in Equation 3.2 should be close to zero for nodes in the same family.

Furthermore, a node k is identified as the parent of its family if

    |d_{ij} - d_{kj} - d_{ik}| \le \epsilon,    (3.3)

for all nodes i and j in the same family.

Relaxed Recursive Grouping - k-means Clustering

This is a modified version of k-means clustering that clusters nodes based on the statistic

    \Lambda_{ij} = \max_k (d_{ik} - d_{jk}) - \min_k (d_{ik} - d_{jk}),    (3.4)

where k is any other node. If Λ_{ij} is close to zero, then i and j are likely to be in the same family. Potential clusters are enumerated and scored using the silhouette method [58] with Λ_{ij} as a dissimilarity measure.

Implementation Details

We implemented both versions of Relaxed Recursive Grouping as described, with the following adjustments and enhancements that were either reverse engineered from the accompanying Matlab code or developed in house.

1. (Both versions) Adaptive thresholding for τ. For each pair of nodes (i, j), we need to consider d_{ik} − d_{jk} for at least three other nodes k to get accurate groupings. So, we iteratively relax τ until this condition is met.
2. (Basic version) Instead of having a user-supplied ε, we start it at 0 and increase it incrementally every time an iteration of Recursive Grouping fails to produce any new groups (see footnote 2). When a new group is finally found, ε is reset to 0. The optimal increment depends on the distance metric and other factors that influence the amount of error expected in the calculations. Currently, we use an increment of 1 with the Hamming distance metric and 0.1 with the paralinear distance metric.
3. (k-means version) A mechanism to distinguish parent-child relationships from sibling relationships. Specifically, node k is identified as a parent if and only if

    \sum_{i \in \mathrm{family},\, i \neq k} \Upsilon_{ik} < \alpha \, (|\mathrm{family}| - 1),    (3.5)

   where \Upsilon_{ik} = \mathrm{mean}_j (d_{kj} - d_{ij}) + d_{ki}, j is any other node, and α is a fixed parameter. Like ε, the optimal value for α depends on expected error.
4. (k-means version) An improved method for selecting initial cluster centers based on dissimilarity.

Footnote 2: Note that by initializing ε to zero, we guarantee that Relaxed Recursive Grouping reverts to regular Recursive Grouping when tree additivity is observed and τ is larger than the longest branch length.

3.1.4 Chow-Liu Grouping

Chow-Liu Grouping (Figure 3.4) is another distance-based method for learning latent trees that significantly improves upon the computational complexity of Recursive Grouping by introducing a pre-processing step [12]. First, a minimum spanning tree (MST) [54] is built over the set of observed nodes using the distance matrix as edge weights. Then, Recursive Grouping is applied to the neighbourhood of each internal node of the MST in turn. In detail:

1. Set T = MST(V; D), where MST(V; D) is the minimum spanning tree given observed vertices V and a distance matrix D
2. Identify the set of internal nodes I of T
3. For each i ∈ I
   • Identify the closed neighbourhood C_i
   • Let S_i be the output of RG applied over C_i
   • Replace the subtree over node set C_i in T with S_i
4. Return T

It is worth noting that Chow-Liu Grouping is a slight misnomer. It is called this because the MST over a set of observed nodes given information distances (as defined in [12]) is equivalent to the Chow-Liu tree [13] for the same, as long as the underlying distribution is Gaussian or symmetric discrete [12]. The procedure is still applicable if these conditions do not apply (as in this project), but the MST will not be a Chow-Liu tree [12].

Chow-Liu Grouping is guaranteed to reconstruct a minimal latent tree correctly if the distance metric used is an additive information distance [12]. Neither of the
Neither of themetrics we tested satisfy this, as the paralinear distance metric is not additive for41Figure 3.4: (a) The original latent tree. (b) The MST, with the closed neigh-bourhood of 3 circled. (c) Output after applying RG to the closed neigh-bourhood of 3, with the closed neighbourhood of 5 circled. (d) Outputafter applying RG to the closed neighbourhood of 5. (adapted with per-mission from Choi et al. [12])the trees produced by our simulator (Section 3.1.2), and the Hamming distancemetric is not an information distance.Implementation DetailsChow-Liu Grouping was implemented as described, using Relaxed Recursive Group-ing - Basic as the Recursive Grouping procedure.423.1.5 Parsimony + Edge ContractionOne alternative to distance-based methods for learning Genotype Trees is to builda tree over the observed sequences using classical Parsimony [24] and then apply asimple post-processing step to the result (Figure 3.5). Because Parsimony assignssequences to internal nodes as it learns the tree, it is possible to convert the resultingtree into a Genotype Tree simply by collapsing edges between nodes with identicalsequence assignments in the manner described in Section 2.2.Figure 3.5: (a) The maximum parsimony tree. (b) The result after contractingedges between identical nodes.For our tests, we used the Phylip Mix [27] implementation of Camin-Sokalparsimony for discrete characters [10], which is heavily biased against changesfrom 1 to 0. This method returns several equally good trees, and we simply takethe first as there is no reason to prefer one over the other.433.2 Method Evaluation3.2.1 Experimental PipelineA single experimental run, or test, involves the following steps:1. Generate a single-cell evolutionary tree in accordance with simulator settings(Chapter 2)2. Convert the single-cell tree into a Genotype Tree (Section 2.1.3)3. Learn a Genotype Tree using a method of choice, taking the sequence dataat the leaves of the single-cell tree as input4. 
Score the result by comparing the learned tree from 3 with the actual treefrom 2 (Sections 3.2.2 and 3.2.3)This process is illustrated in Figure 3.6.3.2.2 Macro-Structure Scoring MetricsTo measure the overall similarity between two trees, we have developed three vari-ations on a matching-based method presented in Chowdhury, et al. [14]. All threefollow the same general procedure:1. Compute a maximum weight matching between the non-trivial splits3 of theoriginal tree and the learned tree (Figure 3.7).2. Calculate the reconstruction accuracy: WWo+Wl−W , where W is the weight ofthe maximum matching between the original and learned trees, Wo is theweight of the maximum matching between the original tree and itself, and Wlis the weight of the maximum matching between the learned tree and itself.If the tree is reconstructed perfectly, then W = Wo = Wl and the formulaevaluates to 1.Where the variations differ is in the schemes they use to determine the edge weightsin the bipartite graph in step 1. These are described below.3A split is the partitioning of labeled nodes into two groups by a single edge in the tree. Anon-trivial split is a split induced by an edge between two internal nodes.44Figure 3.6: A single experimental run with Recursive Grouping as the re-construction method. (1) Generate the single-cell tree. (2) Convert thesingle-cell tree to a Genotype Tree. (3a) Compute a distance matrix overthe leaf nodes. (3b) Learn a Genotype Tree using Recursive Grouping.(4) Score the result.Robinson-Foulds VariationThis variation is inspired by the widely-used Robinson-Foulds metric of tree dis-similarity, which simply counts the number of splits (or equivalently, taxa) thatare present in one of the trees but not both [56]. Each pair of splits is assigned aweight of 1 for perfect agreement or 0 for imperfect agreement. 
The reconstruction accuracy can then be calculated as follows:

\[ R_{RF} = \frac{W}{W_o + W_l - W} = \frac{W}{|P_o| + |P_l| - W} \tag{3.6} \]

where $P_o$ and $P_l$ are the sets of non-trivial splits in the original and learned trees respectively. This metric is quite stringent, as a few serious errors can dramatically affect the score even if the rest of the result is mostly correct.

Figure 3.7: The task of matching up the non-trivial splits of the original tree (left) and the learned tree (right) can be modelled as an assignment problem [37]. The vertices in $P_o$ and $P_l$ represent the non-trivial splits in the original and learned trees respectively. A maximum weight matching pairs up the splits in such a way that the sum of the edge weights in the bipartite graph is maximized. The edge weights in this example (in red) were calculated using the matching-splits variation.

Matching-Splits Variation

This variation is based on the matching-splits metric described in Bogdanowicz et al. [7]. Each pair of splits is assigned a weight equal to the number of labeled (i.e. observed) nodes that fall on the same side of the split in both trees. The two sides of the splits are aligned based on whichever yields a higher weight. The reconstruction accuracy can then be calculated as follows:

\[ R_{MS} = \frac{W}{W_o + W_l - W} = \frac{W}{|T||P_o| + |T||P_l| - W} \tag{3.7} \]

where $T$ is the set of observed nodes and $P_o$ and $P_l$ are the sets of non-trivial splits in the original and learned trees respectively.

Modified Matching-Splits Variation

The matching-splits weighting scheme is more sensitive to differences that arise deep in the tree than differences that arise closer to the leaves. This is because the closer a split is to the leaves, the smaller its smallest side will be, and this reduces the chances that any given node will be on the wrong side of the split. This is arguably desirable because differences that arise deeper in the tree reflect more grievous placement errors.
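As an aside on the two weighting schemes defined so far, the following is a minimal sketch of the reconstruction accuracy under the Robinson-Foulds and matching-splits weights. It is not the scoring code used in this work. It assumes each non-trivial split is represented by one of its two sides as a frozenset of labels, and it finds the maximum weight matching by brute force over permutations, which is only practical for a handful of splits. The function names and the six-label example are hypothetical.

```python
from itertools import permutations


def rf_weight(a, b, labels):
    """Robinson-Foulds weight: 1 if the two splits are identical, else 0."""
    return 1 if a == b or a == labels - b else 0


def ms_weight(a, b, labels):
    """Matching-splits weight: number of labels on the same side in both
    splits, using whichever alignment of the two sides agrees more."""
    same = len(a & b) + len((labels - a) & (labels - b))
    return max(same, len(labels) - same)


def max_matching_weight(xs, ys, weight, labels):
    """Weight of a maximum weight bipartite matching (brute force).

    Assumption: the split sets are small. A practical implementation
    would use an assignment-problem solver instead.
    """
    small, large = (xs, ys) if len(xs) <= len(ys) else (ys, xs)
    best = 0
    for perm in permutations(large, len(small)):
        total = sum(weight(s, t, labels) for s, t in zip(small, perm))
        best = max(best, total)
    return best


def reconstruction_accuracy(orig, learned, weight, labels):
    """W / (Wo + Wl - W); assumes at least one non-trivial split exists."""
    w = max_matching_weight(orig, learned, weight, labels)
    w_o = max_matching_weight(orig, orig, weight, labels)
    w_l = max_matching_weight(learned, learned, weight, labels)
    return w / (w_o + w_l - w)


# Hypothetical example: six observed nodes, two non-trivial splits per tree.
labels = frozenset("abcdef")
orig = [frozenset("ab"), frozenset("abc")]
learned = [frozenset("ab"), frozenset("abd")]

rf_accuracy = reconstruction_accuracy(orig, learned, rf_weight, labels)
ms_accuracy = reconstruction_accuracy(orig, learned, ms_weight, labels)
print(round(rf_accuracy, 3), round(ms_accuracy, 3))  # 0.333 0.714
```

In this example only one of the two splits is recovered exactly, so the Robinson-Foulds variation gives 1/(2 + 2 - 1), while the matching-splits variation gives partial credit for the near-miss and evaluates to 10/(12 + 12 - 10).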
However, if we expect that most errors will arise close to the leaves, we might want to adjust the weighting scheme so that these errors are penalized more harshly. To this end, we have devised a modified version of the matching-splits weighting scheme that assigns weights to edges based on the fractional agreement between the smaller sides of the splits. The reconstruction accuracy can then be calculated as:

\[ R_{MMS} = \frac{W}{W_o + W_l - W} = \frac{W}{|P_o| + |P_l| - W} \tag{3.8} \]

where $P_o$ and $P_l$ are the sets of non-trivial splits in the original and learned trees respectively.

3.2.3 Micro-Structure Scoring Metrics

These metrics assess how well a learned tree captures the immediate family relationships between pairs of observed nodes. They emphasize exact preservation of low-level relationships between observed nodes. Specifically, we have:

• Family Recall = $\frac{|F_o \cap F_l|}{|F_o|}$
• Family Precision = $\frac{|F_o \cap F_l|}{|F_l|}$
• Family Correctness = $\frac{|S_o \cap S_l| + |P_o \cap P_l|}{|F_o \cap F_l|}$

where

$S_o$ = sibling-sibling pairs among observed nodes in the original Genotype Tree
$S_l$ = sibling-sibling pairs among observed nodes in the learned Genotype Tree
$P_o$ = parent-child pairs among observed nodes in the original Genotype Tree
$P_l$ = parent-child pairs among observed nodes in the learned Genotype Tree
$F_o = S_o \cup P_o$
$F_l = S_l \cup P_l$

In words, Family Recall is the fraction of family relationships (sibling-sibling or parent-child) between observed nodes in the original tree that are recovered in the learned tree, while Family Precision is the fraction of family relationships between observed nodes in the learned tree that are also in the original tree. Family Correctness is the fraction of correctly identified family relationships that also have the correct type (sibling-sibling versus parent-child).

3.2.4 Overview of Tests

Tests were run on trees generated under different simulator conditions. Specifically, each simulator parameter (Section 2.1.1) was varied while keeping the others constant.
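Returning briefly to the micro-structure metrics of Section 3.2.3, the three family scores reduce to a few set operations. The sketch below is illustrative only. It assumes every relationship is stored as an unordered pair of observed node labels, so that a pair counts as shared whenever it is a family relationship of either type in both trees; whether parent-child direction should also be compared is not addressed here. The function names and the five-node example are hypothetical.

```python
def pair(x, y):
    """An unordered pair of observed node labels."""
    return frozenset((x, y))


def family_scores(sib_o, par_o, sib_l, par_l):
    """Return (Family Recall, Family Precision, Family Correctness).

    sib_* and par_* are sets of unordered pairs for the original (o) and
    learned (l) trees. Assumes both trees have at least one family pair
    and that at least one pair is shared, otherwise a ratio is undefined.
    """
    fam_o = sib_o | par_o
    fam_l = sib_l | par_l
    shared = fam_o & fam_l
    recall = len(shared) / len(fam_o)
    precision = len(shared) / len(fam_l)
    correctness = (len(sib_o & sib_l) + len(par_o & par_l)) / len(shared)
    return recall, precision, correctness


# Hypothetical original tree: A is the parent of B and C; D and E are siblings.
sib_o = {pair("B", "C"), pair("D", "E")}
par_o = {pair("A", "B"), pair("A", "C")}

# Hypothetical learned tree: A, B and C are all siblings; D and E are unrelated.
sib_l = {pair("A", "B"), pair("A", "C"), pair("B", "C")}
par_l = set()

recall, precision, correctness = family_scores(sib_o, par_o, sib_l, par_l)
print(recall, precision, round(correctness, 3))  # 0.75 1.0 0.333
```

Here the learned tree recovers three of the four original family pairs and proposes no spurious ones, but only one of the three shared pairs has the correct type, which is the situation Family Correctness is designed to expose.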
For each configuration, thirty random trees were generated using the same set of seeds.

In addition to each of the three methods for Genotype Tree reconstruction, we also tested the following for comparison:

1. Classic Neighbor-Joining, as implemented by Biopython [15].
2. The minimum spanning tree produced in Step 1 of Chow-Liu Grouping.

For all these tests, we used the Hamming distance metric (Section 3.1.2) and the basic version of Relaxed Recursive Grouping (Section ??), with τ set to ∞. Additional analyses are provided in Appendix B. Appendix B.1 compares the performance of the Hamming distance metric and the paralinear distance metric. Appendix B.2 compares the two versions of Relaxed Recursive Grouping.

Parameter Settings

The variations tested for each parameter were:

• Number of leaf nodes n: 4, 8, 16, 32, 64, 128, 256
• Number of sequence positions p: 100, 300, 1000, 3000, 10000
• Asymmetric division rate a: 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9
• Mutation rate m: 0.002, 0.004, 0.008, 0.01, 0.02
• Mutation-loss rate l: 0, 0.01, 0.02, 0.05, 0.1, 0.2
• Dropout rate d: 0, 0.05, 0.1, 0.2, 0.4, 0.6

The default settings (when the parameter was not being varied) were:

• Number of leaf nodes n: 32
• Number of sequence positions p: 1000
• Asymmetric division rate a: 0.5
• Mutation rate m: 0.004
• Mutation-loss rate l: 0.05
• Dropout rate d: 0

Rationale for Parameter Settings

The values for number of leaf nodes cover a range that we can reasonably expect of single-cell sequencing in the near future [61]. We did not test beyond 256 nodes because the result scoring becomes intractable beyond that, but it may be worthwhile to optimize the scoring code to allow tests for more nodes. The dropout rates were selected to cover everything from the ideal case (0) to the current state-of-the-art (around 0.1 for nuc-seq [67]) to unrealistically poor (0.6).
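For reference, the one-parameter-at-a-time design listed under Parameter Settings can be enumerated mechanically. The sketch below is not the experiment driver used in this work. The dictionary layout and names are hypothetical, and it does not model the adjusted defaults used for the leaf-node tests in Section 3.3.2. It only confirms that the listed values give 39 configurations, and therefore 1170 tests at thirty trees per configuration, matching the totals reported in Section 3.3.1.

```python
# Default value of each simulator parameter (Section 3.2.4).
DEFAULTS = {"n": 32, "p": 1000, "a": 0.5, "m": 0.004, "l": 0.05, "d": 0}

# Values tested for each parameter while the others stay at their defaults.
SWEEPS = {
    "n": [4, 8, 16, 32, 64, 128, 256],
    "p": [100, 300, 1000, 3000, 10000],
    "a": [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "m": [0.002, 0.004, 0.008, 0.01, 0.02],
    "l": [0, 0.01, 0.02, 0.05, 0.1, 0.2],
    "d": [0, 0.05, 0.1, 0.2, 0.4, 0.6],
}

TREES_PER_CONFIG = 30  # thirty random trees per configuration


def configurations():
    """Yield (varied parameter, full settings) for every configuration."""
    for name, values in SWEEPS.items():
        for value in values:
            settings = dict(DEFAULTS)
            settings[name] = value
            yield name, settings


configs = list(configurations())
print(len(configs), len(configs) * TREES_PER_CONFIG)  # 39 1170
```

Note that the default value of each parameter appears once in its own sweep, so the default configuration is counted six times in the total of 39.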
The range of values for mutation-loss rate is arbitrary because a flat mutation-loss rate is likely unrealistic anyway (Section 2.3).

Most of the default settings were selected arbitrarily, as much remains unknown about cancer evolution.

3.3 Results

For succinctness, this section uses the following acronyms for scoring metrics and reconstruction methods:

• MS: Matching-Splits
• MMS: Modified Matching-Splits
• RF: Robinson-Foulds
• FR: Family Recall
• FP: Family Precision
• FC: Family Correctness
• RG: Recursive Grouping
• CLG: Chow-Liu Grouping
• PEC: Parsimony + Edge Contraction
• MST: Minimum Spanning Tree
• NJ: Neighbor-Joining

3.3.1 Aggregate Results

This section compares the reconstruction accuracies of each of the five methods by pooling results from across all configurations. Each boxplot summarizes the results of 1170 tests (39 configurations x thirty random trees). In order to identify significant differences between conditions, we used Dunn's test for non-parametric pairwise multiple comparisons [23] with Bonferroni correction, as implemented by Alexis Dinno [22]. We set a significance threshold of p < 0.01.

Recursive Grouping Has the Best Overall Performance

The three methods designed to learn Genotype Trees (RG, CLG, and PEC) significantly outperform the other two methods across all macro-structure scoring metrics, with the exception of CLG under the RF metric (Figure 3.8, Table 3.2). Of the three, RG significantly outperforms CLG and PEC across all macro-structure scoring metrics (Table 3.2), with a particularly strong dominance over CLG under the RF metric (median = 0.88 for RG, 0.68 for CLG) (Table 3.1).

There are also significant differences between the three macro-structure scores for each method (Table 3.3). The MS scores for each method are significantly higher than the RF scores. The MS scores are also significantly higher than the MMS scores for all methods except NJ. The MMS scores are significantly higher than the RF scores for all methods except RG.
These differences are most pronounced for CLG.

Nearly every pair of methods differs significantly with respect to each micro-structure scoring metric (Table 3.5); the one exception at our threshold is CLG versus PEC under FR (p = 4.2e-2). There is a clear trade-off between FR and FP that is most pronounced in the MST and NJ results (Figure 3.9). This reflects the tendency for methods to overgroup or undergroup. Overgrouping clusters leaf nodes into fewer, larger families than in the original tree, while undergrouping does the opposite. NJ and PEC are undergroupers: they outperform the other methods in terms of FP (Table 3.5). This is expected, since NJ produces a bifurcating tree and PEC simply makes small adjustments to a bifurcating tree. At the other extreme is the MST, which outperforms the other methods in terms of FR.

Family Correctness captures how well each method distinguishes parent-child from sibling-sibling relationships. NJ has the worst performance of the methods (median = 0.48) (Table 3.4), which is expected because it misses all parent-child relationships. MST also does poorly (median = 0.58), because it learns too many parent-child relationships.

On the whole, CLG and PEC perform moderately well in terms of micro-structure scoring metrics (median ≥ 0.75 across all metrics) (Table 3.4). RG performs best all around (median = 0.83, 1.00, and 0.88 for FR, FP, and FC respectively).

Figure 3.8: Macro-structure scores for each method.
Each boxplot summarizes the results of 1170 tests (39 configurations x thirty random trees).

        MS            MMS           RF
        Med    IQR    Med    IQR    Med    IQR
RG      0.92   0.09   0.87   0.11   0.88   0.18
CLG     0.90   0.08   0.84   0.08   0.68   0.18
PEC     0.85   0.15   0.81   0.14   0.76   0.17
MST     0.70   0.13   0.69   0.13   0.64   0.14
NJ      0.71   0.11   0.71   0.11   0.68   0.15

Table 3.1: Summary statistics on macro-structure scores for each method.

(a) MS
        RG                  CLG                 PEC                 MST
CLG     -4.00, p = 4e-4
PEC     -12.70, p < 1e-4    -8.89, p < 1e-4
MST     -39.68, p < 1e-4    -35.68, p < 1e-4    -25.05, p < 1e-4
NJ      -41.41, p < 1e-4    -37.41, p < 1e-4    -26.69, p < 1e-4    -1.72, p = 0.43

(b) MMS
        RG                  CLG                 PEC                 MST
CLG     -3.39, p = 3.6e-3
PEC     -9.60, p < 1e-4     -6.37, p < 1e-4
MST     -33.70, p < 1e-4    -31.60, p < 1e-4    -23.68, p < 1e-4
NJ      -26.72, p < 1e-4    -30.31, p < 1e-4    -22.45, p < 1e-4    1.29, p = 0.98

(c) RF
        RG                  CLG                 PEC                 MST
CLG     -23.71, p < 1e-4
PEC     -12.44, p < 1e-4    -10.11, p < 1e-4
MST     -33.17, p < 1e-4    -9.46, p < 1e-4     -19.11, p < 1e-4
NJ      -26.72, p < 1e-4    -3.01, p = 1.3e-2   -12.97, p < 1e-4    6.45, p < 1e-4

Table 3.2: Dunn's z-test-statistic and Bonferroni-adjusted p-value for pairwise comparison of methods by macro-structure scoring metric.

(a) RG
        MS                  MMS
MMS     -14.57, p < 1e-4
RF      -14.05, p < 1e-4    0.52, p = 0.90

(b) CLG
        MS                  MMS
MMS     -14.45, p < 1e-4
RF      -38.19, p < 1e-4    -23.75, p < 1e-4

(c) PEC
        MS                  MMS
MMS     -6.72, p < 1e-4
RF      -17.21, p < 1e-4    -10.48, p < 1e-4

(d) MST
        MS                  MMS
MMS     -3.37, p = 1e-3
RF      -14.64, p < 1e-4    -11.28, p < 1e-4

(e) NJ
        MS                  MMS
MMS     0.25, p = 1
RF      -5.70, p < 1e-4     -5.95, p < 1e-4

Table 3.3: Dunn's z-test-statistic and Bonferroni-adjusted p-value for pairwise comparison of macro-structure scoring metrics by method.

Figure 3.9: Micro-structure scores for each method.
Each boxplot summarizes the results of 1170 tests (39 configurations x thirty random trees).

        FR            FP            FC
        Med    IQR    Med    IQR    Med    IQR
RG      0.83   0.20   1.00   0.12   0.88   0.18
CLG     0.78   0.19   0.80   0.18   0.75   0.25
PEC     0.76   0.23   1.00   0.05   0.80   0.30
MST     1.00   0.04   0.35   0.14   0.58   0.22
NJ      0.59   0.26   1.00   0.00   0.48   0.23

Table 3.4: Summary statistics on micro-structure scores for each method.

(a) FR
        RG                  CLG                 PEC                 MST
CLG     -8.32, p < 1e-4
PEC     -10.56, p < 1e-4    -2.64, p = 4.2e-2
MST     22.31, p < 1e-4     30.63, p < 1e-4     31.78, p < 1e-4
NJ      -30.03, p < 1e-4    -21.70, p < 1e-4    -18.00, p < 1e-4    -52.34, p < 1e-4

(b) FP
        RG                  CLG                 PEC                 MST
CLG     -19.28, p < 1e-4
PEC     6.22, p < 1e-4      24.56, p < 1e-4
MST     -43.46, p < 1e-4    -24.18, p < 1e-4    -47.56, p < 1e-4
NJ      14.31, p < 1e-4     33.58, p < 1e-4     7.39, p < 1e-4      57.78, p < 1e-4

(c) FC
        RG                  CLG                 PEC                 MST
CLG     -15.65, p < 1e-4
PEC     -10.80, p < 1e-4    4.08, p = 3e-4
MST     -32.24, p < 1e-4    -16.60, p < 1e-4    -19.87, p < 1e-4
NJ      -42.80, p < 1e-4    -27.15, p < 1e-4    -29.91, p < 1e-4    -10.56, p < 1e-4

Table 3.5: Dunn's z-test-statistic and Bonferroni-adjusted p-value for pairwise comparison of methods by micro-structure scoring metric.

3.3.2 Results by Parameter Settings

This section examines how the performance of each of the five methods varies with different parameter settings. Each boxplot summarizes results on thirty random trees generated using the same thirty seeds. In order to identify significant differences between conditions, we used Dunn's test with Bonferroni correction and set a significance threshold of p < 0.01. All differences, improvements, and declines described in the following sections were significant at this threshold. See Appendix C for exact figures.

Performance Stabilizes as Number of Leaf Nodes Increases

The tests in this section used the following exceptions to the default parameter settings.
This was necessary because larger trees ran out of mutable locations when the usual defaults were used.

• Number of sequence positions p: 10000 (instead of 1000)
• Mutation rate m: 0.002 (instead of 0.004)

For all methods and metrics, the range of scores decreases as number of leaf nodes increases (Figures 3.10 and 3.11). This reflects the law of large numbers operating on populations of nodes. Increasing the size of a tree balances out idiosyncrasies that might influence the score.

The performances of RG and CLG stabilize^4 at 64 nodes across all metrics, with the exception of the RF scores for CLG, which decline as the number of leaf nodes increases in increments of decreasing size (0.1 decline in median score from 16 to 64 nodes, p = 5e-4; 0.05 decline in median score from 64 to 256 nodes, p = 7.3e-3) (Figure 3.10).

The performances of PEC and NJ stabilize at 16 nodes, and MST stabilizes at 32 nodes across all metrics (Figures 3.10 and 3.11).

At the lower range, there is a decline in macro-structure scores for all methods between 4 nodes and 32 nodes, with the exception of NJ, which shows an improvement in the same range (Figure 3.10). FR shows the same trend in this range, except with no significant changes for MST (Figure 3.11). FP declines in this range for CLG and MST only. FC declines in this range for all methods except NJ.

^4 Here, stabilize means that there are no significant differences between any pair of groups at or beyond the parameter level specified.

Figure 3.10: Macro-structure scores for each method by number of leaf nodes in the original tree. Each boxplot summarizes the results for thirty random trees. There is no data for Parsimony + Edge Contraction at 256 nodes due to intractability.

Figure 3.11: Micro-structure scores for each method by number of leaf nodes in the original tree. Each boxplot summarizes the results for thirty random trees.
There is no data for Parsimony + Edge Contraction at 256 nodes due to intractability.

The Optimal Number of Positions Varies by Method

RG shows a decline in performance between 3000 and 10000 positions across all scoring metrics except FP (Figure 3.12). Additionally, it shows an improvement between 1000 and 3000 according to the RF metric.

The performance of CLG declines between 100 and 1000 and then stabilizes across all macro-structure scoring metrics (Figure 3.12). The micro-structure scores for CLG show a similar trend, except that there is also a decline in FR and FC between 1000 and 10000 positions (Figure 3.13).

The MS and MMS scores for PEC decline slightly as number of positions increases (Figure 3.12). There are no significant changes in the RF scores for PEC. The FR and FC scores for PEC show larger declines as the number of positions increases, which is most pronounced for FC (0.38 decline in median score from 1000 to 10000, p = 4.8e-7) (Figure 3.13).

The performance of MST declines between 100 and 1000 and then stabilizes for all metrics except FR and FP (Figure 3.13). There are no significant changes in the FR or FP scores for MST.

In contrast, Neighbor-Joining shows an improvement in performance between 100 and 1000 positions across all metrics except FP, which is consistently very high (Figure 3.13). However, it is inferior to or no better than any of the other methods across all numbers of positions and metrics except FP (Appendix C.2).

Figure 3.12: Macro-structure scores for each method by number of sequence positions. Each boxplot summarizes the results for thirty random trees.

Figure 3.13: Micro-structure scores for each method by number of sequence positions.
Each boxplot summarizes the results for thirty random trees.

Performance by Asymmetric Division Rate Reflects Result Topology

There is a nearly linear decline in macro-structure scores for NJ as the asymmetric division rate increases (MS: α = -0.43, R² = 0.86; MMS: α = -0.44, R² = 0.87; RF: α = -0.4, R² = 0.8) (Figure 3.14). MST shows the opposite trend (MS: α = 0.41, R² = 0.75; MMS: α = 0.42, R² = 0.76; RF: α = 0.41, R² = 0.65).

This result reflects the relationship between asymmetric division rate and Genotype Tree topology, as discussed in Section 2.2.1. If there are no asymmetric divisions, then the Genotype Tree is a full bifurcating tree, just like an NJ result. If all divisions are asymmetric, then the Genotype Tree contains only observed nodes, just like an MST.

The PEC macro-structure scores all decline gradually as asymmetric division rate increases (Figure 3.14). This is a less extreme reflection of the trends for NJ, which is not surprising because PEC simply makes small adjustments to the bifurcating tree produced by Parsimony.

The macro-structure scores for RG are robust to asymmetric division rate, with at most one significant difference occurring within each condition (Figure 3.14). The MMS and RF scores for CLG each show gradual improvements with increases in asymmetric division rate, although the improvements for MMS are very small (0.05 increase in median score from 0 to 0.9, p = 3e-4).

The FR and FP trends for MST and NJ are as expected (Figure 3.15). MST overgroups when asymmetric division rates are low, while NJ undergroups when asymmetric division rates are high. The opposing FC trends for MST and NJ reflect the fact that the ratio of parent-child to sibling-sibling relationships increases as the asymmetric division rate increases (see Figure 2.3). Again, the trends in micro-structure scores for PEC are less extreme reflections of those for NJ.

RG shows a gradual decline in FR starting at 0.5 asymmetric division rate (Figure 3.15).
The FC scores for RG decline slightly between 0.1 and 0.5 and then stabilize. The FP and FC scores for CLG both improve with increasing asymmetric division, while the FR scores peak in the 0.1 to 0.8 range.

It is worth noting that while the scores for RG, CLG, and PEC are still generally superior to MST and NJ, at asymmetric division rates of 0.1 and lower, NJ is superior to CLG across all metrics except MS (Appendix C.8).

Figure 3.14: Macro-structure scores for each method by asymmetric division rate. Each boxplot summarizes the results for thirty random trees.

Figure 3.15: Micro-structure scores for each method by asymmetric division rate. Each boxplot summarizes the results for thirty random trees.

Performance is Robust to Mutation Rate

The scores for all five methods are generally robust to mutation rate (Figures 3.16 and 3.17) with a few exceptions.

The MS and MMS scores for PEC show small declines between 0.002 and 0.006 and then stabilize (Figure 3.16). The RF scores for RG show an improvement between 0.002 and 0.008 and then stabilize, whereas the RF scores for NJ show gradual improvement from 0.002 to 0.02.

The FR scores for NJ also improve gradually from 0.002 to 0.02 (Figure 3.17). The FP scores for RG improve between 0.002 and 0.006 and then stabilize. The FC scores for PEC decline gradually from 0.002 to 0.02, and the FC scores for CLG decline between 0.002 and 0.004 and then again between 0.01 and 0.02.

All other combinations of methods and metrics have no more than one significant difference across mutation rates.

Figure 3.16: Macro-structure scores for each method by mutation rate. Each boxplot summarizes the results for thirty random trees.

Figure 3.17: Micro-structure scores for each method by mutation rate. Each boxplot summarizes the results for thirty random trees.

Neighbor-Joining is Most Robust to Mutation-Loss and Dropout

Here, we see that RG and PEC perform perfectly in the absence of mutation loss (Figures 3.18 and 3.19).
This is as expected (see Section 3.1.2).

RG, CLG, and PEC all decline in performance across all metrics as the rate of mutation loss increases (Figures 3.18 and 3.19), although this is somewhat less pronounced for CLG. The decline is most severe for the RF metric. The one exception is that FP remains high for PEC (Figure 3.19).

MST and NJ are both fairly robust to mutation loss (Figures 3.18 and 3.19). MST exhibits slight declines for RF, FR, and FP beginning at a mutation-loss rate of 0.05. The only significant decline for NJ is between 0.05 and 0.2 under the RF metric (Figure 3.18).

All methods show declines in performance across metrics as dropout increases (Figures 3.20 and 3.21) with the exception of FC for NJ, which is low to begin with.

Overall, NJ is the most robust to dropout, with no significant performance declines up to a dropout rate of 0.1 (Figures 3.20 and 3.21). MST also has no significant declines up to a dropout rate of 0.1. However, at higher dropout rates, the MS and MMS scores decline more severely for MST than for NJ (Figure 3.20).

Competition is Closer at 0.1 Dropout Rate

It is worth giving special attention to how the methods perform at 0.1 dropout rate, since this is the current state-of-the-art for single-cell sequencing [67]. At this dropout rate, there are no significant differences between the macro-structure scores for RG, CLG, and PEC (Figure 3.20, Appendix C.9). NJ is inferior to all other methods in terms of MS scores but superior to or competitive with all other methods in terms of RF scores (Figure 3.20, Appendix C.9).

Figure 3.18: Macro-structure scores for each method by mutation-loss rate. Each boxplot summarizes the results for thirty random trees.

Figure 3.19: Micro-structure scores for each method by mutation-loss rate. Each boxplot summarizes the results for thirty random trees.

Figure 3.20: Macro-structure scores for each method by dropout rate.
Each boxplot summarizes the results for thirty random trees.

Figure 3.21: Micro-structure scores for each method by dropout rate. Each boxplot summarizes the results for thirty random trees.

Chow-Liu Grouping is Competitive with Neighbor-Joining in Terms of Speed

Of the three methods designed to learn Genotype Trees, Chow-Liu Grouping has the best computational complexity and is even somewhat superior to Neighbor-Joining (median runtime at 256 nodes = 75 seconds for CLG, 101 seconds for NJ; p = 4.3e-8 by Dunn's test on all methods at 256 nodes). These results agree with the complexities reported in Choi et al. [12].

Recursive Grouping is reasonably fast and, extrapolating from the current trajectory, should run on 1000 nodes in under a day.

As expected, PEC becomes intractable as the number of leaf nodes increases, since finding a most parsimonious tree is known to be NP-Hard [16].

Figure 3.22: Runtime in seconds for each method by number of leaf nodes in the original tree. Each boxplot summarizes the results for thirty random trees. There are no data points for Parsimony + Edge Contraction at 256 nodes due to intractability.

3.4 Discussion

Overall, the three methods for reconstructing Genotype Trees (Recursive Grouping, Chow-Liu Grouping, and Parsimony + Edge Contraction) outperform the minimum spanning tree and Neighbor-Joining (Section 3.3.1). They show superior performance according to the macro-structure scoring metrics (Figure 3.8), as well as a better balance between Family Recall and Family Precision (Figure 3.9). Recursive Grouping is arguably the best all-around, as it has the highest median score across all macro-structure metrics (Table 3.1) and a competitive median score across micro-structure metrics (Table 3.4). Furthermore, it is reasonably robust to changes in conditions such as number of leaf nodes, number of positions, asymmetric division rate, and mutation rate.

However, the relative strengths of the methods depend on the specific conditions.
For instance, at a dropout rate of 0.1, there are no significant differences in performance between the three methods for learning Genotype Trees, and even Neighbor-Joining is competitive under certain metrics (Section 3.3.2). Additionally, Neighbor-Joining outperforms Chow-Liu Grouping at low asymmetric division rates (≤ 0.1) (Section 3.3.2).

Another performance measure to consider is runtime. This is a major weakness of Parsimony and a major strength of Chow-Liu Grouping. Recursive Grouping is also reasonably fast, but Chow-Liu Grouping might be more suitable if a researcher wishes to obtain a phylogeny over several thousand genotypes very quickly. Alternatively, if asymmetric division is known to be uncommon in a particular cancer type (occurring in fewer than 10% of divisions), then Neighbor-Joining might be a good choice because it is also fast, well-understood, and outperforms Chow-Liu Grouping at these rates (Section 3.3.2).

Although Recursive Grouping and Chow-Liu Grouping appear to be strong candidates for learning phylogenies over many genotypes, it is difficult to predict how they will compete on a smaller scale because they have yet to be compared to slower but more powerful methods such as maximum-likelihood and Bayesian approaches (Section 1.2). Furthermore, we have only tested the Camin-Sokal model of Parsimony, which is heavily biased against 1 to 0 transitions. We might get better reconstruction accuracies if we used classical Parsimony instead.

3.4.1 Possible Explanations for Various Effects

This section proposes possible explanations for various characteristics of the results. These are speculative and will require further testing to verify.
This section uses the acronyms defined in Section 3.3.

Sensitivity of Performance to Number of Leaf Nodes

The number of homoplasies per leaf node increases linearly with the depth of the tree (Figure 2.4), so we might expect performance to decline as the tree grows. Indeed, this is what we observe in the lower range of values for all methods except NJ. However, the scores converge at around 16 to 64 nodes, depending on the method and metric. For this reason, the performance decline cannot be attributed to total number of homoplasies alone.

A possible explanation for this is that reconstruction accuracy is not sensitive to the total number of homoplasies, but rather to the number of perfect lineages (that is, leaf nodes that contain no homoplasies). If this is the case, then performance should decline up to the point where the expected number of perfect lineages approaches zero. More work is needed to determine this threshold, but considering that each homoplasy affects at least one leaf node, it may be in the range of around 1 to 2 homoplasies per leaf node. Interestingly, this is around the number we would expect at 16 to 64 leaf nodes (Figure 2.4).

The fact that NJ shows the opposite trend is compelling and warrants further investigation.

Sensitivity of Performance to Number of Positions

Like number of leaf nodes, number of positions directly influences the number of homoplasies in the tree (Figure 2.6), so we might expect performance to decline as the number of positions increases. However, each method responds to changes in number of positions in subtle but significant ways.

The macro-structure scores for RG peak at 3000 positions (Section 3.3.2). This may have something to do with the increment used for relaxation of ε, as described in Section 3.1.3. Because ε is used to group nodes in the presence of additivity violations, the ideal increment will depend on how much error is expected in distance calculations.
In the case of the Hamming distance, this error is directly proportional to the number of homoplasies expected in each sequence. This, in turn, depends on mutation rate, mutation-loss rate, dropout rate, and number of positions. This could also explain the decline in FR from 3000 to 10000 positions: if ε is too low for the expected error rate, RG may undergroup.

Additionally, the FC scores for RG decline as the number of positions increases. This may be because ε is also used to distinguish parent-child from sibling-sibling relationships. If ε is too low for the expected error rate, then an actual parent-child pair may fail the parent-child test.

The macro-structure scores for the MST decline up to 1000 positions. Interestingly, the macro-structure scores for MST follow a similar trend to the FP scores, suggesting that the performance decline between 100 and 1000 positions might be due to overgrouping. A possible reason for this is that homoplasies cause nodes to look more similar to ancestors, resulting in overgrouped trees.

Again, it is interesting that this effect is only observed up to 1000 positions. This might also reflect a sensitivity to number of perfect lineages instead of total homoplasy count. The expected number of perfect lineages should approach zero beyond some threshold as the number of positions increases, and a threshold anywhere between 300 and 3000 would be consistent with the data.

Not surprisingly, the scores for CLG in response to number of positions appear to be a compromise between the scores for RG and MST, its two constituents.

PEC shows a gradual decline in performance with more positions under the MS and MMS metrics. This is not surprising because Camin-Sokal parsimony is heavily biased against 1 to 0 transitions, so increasing the number of homoplasies should negatively impact performance.
However, this raises the question as to why PEC does not suffer performance losses beyond 16 leaf nodes.

The FR and FC scores for PEC also decline as the number of positions increases. This is likely because homoplasy obscures parent-child relationships, resulting in fewer edge contractions and undergrouped trees.^5

^5 To illustrate why homoplasy interferes with edge contraction, imagine a = {001} is the parent of b = {011} in the original tree. The correct parsimony solution would join them as siblings with a parent c = {001}. Then, the edge between a and c would be contracted. But if mutation loss had resulted in b = {010}, then the two would be joined as siblings under parent c = {000} (if at all). Then, there would be no edge contraction, and the parent-child relationship would be misclassified as sibling-sibling.

In contrast, Neighbor-Joining shows an improvement in FR and FC with more positions, while FP is consistently perfect or near perfect. This means that NJ is learning few families at all when the number of positions is low, which is indicative of an unbalanced tree structure where leaf nodes are more often joined with internal nodes than with each other. FC is also low when there are fewer positions, which means that the families NJ is learning are mostly parent-child in the original tree. These are interesting trends that warrant further investigation.

Sensitivity of Performance to Mutation Rate

With the exception of FC, the scores for RG, CLG, and MST are reasonably robust to mutation rate (Section 3.3.2). This is somewhat surprising because mutation rate also influences the total number of homoplasies in the tree (Figure 2.7). Moreover, the range of homoplasy counts covered by the mutation rate tests overlaps with the ranges covered by the number of leaf nodes tests (Figure 2.5) and number of positions tests (Figure 2.6).
This calls into question the theories presented in the previous sections attributing performance declines to increases in homoplasy.

The fact that PEC does exhibit a few small performance losses as mutation rate increases is encouraging, as it is consistent with the findings for number of positions.

Sensitivity of Performance to Mutation-Loss Rate and Dropout

Mutation-loss rate has a more pronounced effect on the macro-structure scores for RG and PEC than mutation rate, even though the homoplasy counts at the upper end of the mutation rate tests exceed those at the upper end of the mutation-loss rate tests (around 35 to 60 for a mutation rate of 0.02 versus 25 to 45 for a mutation-loss rate of 0.2) (Figures 2.7 and 2.8). A possible explanation for this is that informative mutations (that is, 0 to 1 transitions that are never lost) partially offset the detrimental effect of homoplasy. In other words, the signal to noise ratio is conserved when the mutation rate is increased but not when the mutation-loss rate is increased. This makes sense for PEC because even though Camin-Sokal parsimony is heavily biased against 1 to 0 transitions, it might permit such a transition if doing otherwise would require a large number of 0 to 1 transitions.

Dropout negatively affects performance for the same reasons as mutation loss. Additionally, the homoplasy counts at the upper end of the dropout tests are larger than for any other test set (Figure 2.9), and this likely explains the relative size of the effect.

Differences Between Macro-Structure Scoring Metrics

As discussed in Section 3.3.1, there are several significant differences between the MS, MMS, and RF scores for each method. This is expected because each of these metrics is more conservative than the last. What is surprising is that this effect is most pronounced for CLG. In particular, the difference between the MMS scores and the RF scores for CLG is much greater than it is for the other methods (Table 3.3).
This is also apparent in the tests for number of leaf nodes, where the RF scores for CLG decay with increasing number of leaf nodes while the MS and MMS scores increase or remain constant (Section 3.3.2). The macro-structure scores for MST also follow this pattern but to a lesser extent (Figure 3.10).

It is possible that this is related to the MST construction phase of CLG. The RF metric penalizes errors harshly, especially if they occur deep in the tree. So, if just a few nodes are seriously misplaced in the MST, then the RF score for the CLG result will be disproportionately affected. Larger trees are likely more vulnerable to this because the number of homoplasies per node increases with the size of the tree (Section 2.2.2). The reason that the MST does not show this pattern as strongly may be that inaccuracies due to size dominate its score.

3.4.2 Opportunities for Further Investigation

The results presented in this chapter raise many interesting questions for further study. First, Section 3.4.1 often postulated that the relationship between performance and some parameter might be due to the mediating effect that parameter has on another tree property, such as the number of homoplasies or perfect lineages. To test these theories, we could vary these properties directly and observe how reconstruction accuracy varies. Furthermore, we could conduct more analyses in the style of Section 2.2 to quantify the relationship between different configurable parameters and these properties. Finally, there could be value in investigating the differences between macro-structure scoring metrics.
To do this, we could generate pairs of trees that differ from each other in systematic ways and compare how the three metrics score them.

It might also be worthwhile to conduct additional analyses on the current datasets. For instance, there are two varieties of reconstruction error: size errors (too many or too few nodes in the learned tree) and placement errors (nodes incorrectly placed relative to each other). It might be enlightening to tease these types of errors apart. The MST and Neighbor-Joining are automatically at a disadvantage because they learn a tree of the wrong size, but they might be performing well in terms of node placement. This could also reveal the extent to which Chow-Liu Grouping’s reconstruction error can be attributed to placement errors in the MST.

Another avenue for testing would be to investigate the effects of the two Relaxed Recursive Grouping parameters, ε and τ. Currently, ε is initially set to 0 and is relaxed by a fixed increment after each iteration that fails to produce new groups. The ideal increment will depend on how much error is expected in distance calculations, which depends on the distance metric and other parameters. Additional tests are needed to see how much of an effect this has on performance.

Similarly, the parameter τ can be used to disregard distances that exceed a certain length when determining family relationships, since greater distances tend to have more error. Currently, τ is set to ∞, which effectively means that it is not being used. Recursive Grouping probably does not stand to gain much from setting τ considering that it already performs quite well under most tested conditions. Still, setting τ might alleviate the performance decline at high mutation-loss and dropout rates. This could be worth exploring in the next phase of testing.

3.5 Conclusion

This chapter has described the design, implementation, and evaluation of three methods for learning Genotype Trees from binary SNV profiles.
These methods include Recursive Grouping and Chow-Liu Grouping (collectively, Latent Tree Model-based approaches) and Parsimony + Edge Contraction.

We tested each of these methods on a variety of datasets generated by the simulator presented in Chapter 2. Additionally, we tested Neighbor-Joining and the minimum spanning tree produced in Step 1 of Chow-Liu Grouping for comparison.

In summary, Recursive Grouping and Chow-Liu Grouping appear to be strong candidates for learning phylogenies over hundreds to thousands of cancer genotypes. Chow-Liu Grouping is particularly fast and only slightly weaker than Recursive Grouping in terms of overall accuracy. However, the relative strengths of the methods depend on the specific conditions. It remains to be seen how these methods will compare to slower but more powerful methods such as maximum-likelihood and Bayesian approaches.

Chapter 4

Conclusion

The future of cancer phylogenetics is likely to be heavily shaped by single-cell sequencing technologies, which continue to improve in terms of coverage, quality, and cost [67]. Single-cell sequencing has the power to reveal genotypic diversity of cancer cell populations at a finer resolution than ever before.

In light of this, there is a demand for phylogenetic methods that can construct phylogenies over a large number of unique genotypes quickly and accurately. Classical phylogenetics offers a number of scalable methods, such as Neighbor-Joining. However, these methods put all observed varieties on the leaves of the tree, which is not a valid assumption in the context of cancer.

The current work addresses this need by describing and evaluating three methods for learning cancer Genotype Trees from binary SNV profiles. The reconstruction accuracy and runtime of these methods were systematically evaluated using datasets produced by our custom simulator.
Of these methods, Recursive Grouping and Chow-Liu Grouping appear to be strong candidates for learning phylogenies over hundreds to thousands of genotypes.

Nevertheless, an accurate phylogeny of a cancer cell population must account for structural variations as well as SNVs. In the next phase, we hope to test these methods on datasets that model both types of variation. This will require us to extend the capabilities of the simulator and devise a distance metric that considers structural variations and SNVs jointly. There is already an extensive body of literature on this topic to draw from (Section 1.3).

Bibliography

[1] K. Atteson. The performance of the neighbor-joining method of phylogeny reconstruction. In B. Mirkin, F. R. McMorris, F. S. Roberts, and A. Rzhetsky, editors, Mathematical hierarchies and biology. DIMACS Series of Discrete Mathematics and Theoretical Computer Science, volume 37, pages 133–147. American Mathematical Society, 1997.
[2] S. Baca et al. Punctuated evolution of prostate cancer genomes. Cell, 153:666–677, 2013.
[3] N. Beerenwinkel, R. Schwarz, M. Gerstung, and F. Markowetz. Cancer evolution: Mathematical models and computational inference. Systematic Biology, 64(1):e1–e25, 2015.
[4] N. Beerenwinkel et al. Learning multiple evolutionary pathways from cross-sectional data. Journal of Computational Biology, 12(6):584–598, 2005.
[5] J. Bergsten. A review of long-branch attraction. Cladistics, 21:163–193, 2005.
[6] A. Bild, A. Potti, and J. Nevins. Linking oncogenic pathways with therapeutic opportunities. Nature Reviews Cancer, 6:735–741, 2006.
[7] D. Bogdanowicz and K. Giaro. Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 9(1):150–160, 2012.
[8] W. Bruno, N. Sock, and A. Halpern.
Weighted neighbour joining: A likelihood-based approach to distance-based phylogeny reconstruction. Systematic Zoology, 17(1):189–197, 1999.
[9] J. Cairns. Mutation selection and the natural history of cancer. Nature, 255:197–200, 1975.
[10] J. Camin and R. Sokal. A method for deducing branching sequences in phylogeny. Evolution, 19(3):311–326, 1965.
[11] M. Chaisson. Resolving the complexity of the human genome using single-molecule sequencing. Nature, 517:608–611, 2014.
[12] M. J. Choi, V. Y. F. Tan, A. Anandkumar, and A. S. Willsky. Learning latent tree graphical models. Journal of Machine Learning Research, 12:1771–1812, 2011.
[13] C. Chow and C. Liu. Approximating discrete probability distributions with dependence trees. IEEE Trans. on Information Theory, 3:462–467, 1968.
[14] S. A. Chowdhury, S. E. Shackney, K. Heselmeyer-Haddad, T. Ried, A. A. Schaffer, and R. Schwartz. Algorithms to model single gene, single chromosome, and whole genome copy number changes jointly in tumor phylogenetics. PLOS Computational Biology, 10:e1003740, 2014.
[15] P. Cock et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25:1422–1423, 2009.
[16] W. Day and D. Sankoff. Computational complexity of inferring phylogenies by compatibility. Systematic Biology, 35(2):224–229, 1986.
[17] R. Desper, F. Jiang, O. P. Kallioniemi, H. Moch, C. H. Papadimitriou, and A. A. Schaffer. Inferring tree models for oncogenesis from comparative genome hybridization data. Journal of Computational Biology, 6:37–51, 1999.
[18] R. Desper, F. Jiang, O. P. Kallioniemi, H. Moch, C. H. Papadimitriou, and A. A. Schaffer. Distance-based reconstruction of tree models for oncogenesis. Journal of Computational Biology, 7:789–803, 2000.
[19] R. Desper, J. Khan, and A. Schaffer.
Tumor classification using phylogenetic methods on expression data. Journal of Theoretical Biology, 228:477–496, 2004.
[20] L. Ding, B. Raphael, F. Chen, and M. Wendi. Tumour evolution inferred by single-cell sequencing. Cancer Letters, 3117(2):90–94, 2013.
[21] L. Ding et al. Clonal evolution in relapsed acute myeloid leukaemia revealed by whole-genome sequencing. Nature, 481:506–510, 2012.
[22] A. Dinno. dunn.test: Dunn’s Test of Multiple Comparisons Using Rank Sums, 2015. URL http://CRAN.R-project.org/package=dunn.test. R package version 1.2.4.
[23] O. Dunn. Multiple comparisons using rank sums. Technometrics, 6(3):241–252, 1964.
[24] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological sequence analysis - Probabilistic models of proteins and nucleic acids. Cambridge University Press, 1998.
[25] P. Eirew et al. Dynamics of genomic clones in breast cancer patient xenografts at single-cell resolution. Nature, 518:422–426, 2015.
[26] J. Felsenstein. Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology, 27(4):401–410, 1978.
[27] J. Felsenstein. Phylip (phylogeny inference package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle, 2005.
[28] W. Fitch. Toward defining the course of evolution: minimum change for a specified tree topology. Systematic Zoology, 20:406–416, 1971.
[29] M. Greaves and C. Maley. Clonal evolution in cancer. Nature, 481:306–313, 2012.
[30] R. Hamming. Error detecting and error correcting codes. The Bell System Technical Journal, 29(2), 1950.
[31] M. Holder and P. Lewis. Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews Genetics, 4:275–284, 2003.
[32] W. Jiao, S. Vembu, A. Deshwar, L. Stein, and Q. Morris.
Inferring clonal evolution of tumors from single nucleotide somatic mutations. BMC Bioinformatics, 15, 2014.
[33] T. Jukes and C. Cantor. Evolution of protein molecules. Mammalian Protein Metabolism, Academic Press:21–132, 1969.
[34] K. Kim and R. Simon. Using single cell sequencing data to model the evolutionary history of a tumor. BMC Bioinformatics, 15(27), 2014.
[35] M. Kimura. The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics, 61:893–903, 1969.
[36] J. A. Knoblich. Asymmetric cell division: recent developments and their implications for tumour biology. Nature Reviews Molecular Cell Biology, 11:849–860, 2010.
[37] H. Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, pages 83–97, 1955.
[38] J. A. Lake. Reconstructing evolutionary trees from DNA sequences: Paralinear distances. Proceedings of the National Academy of Sciences, 91:1455–1459, 1994.
[39] D. Landau, S. Carter, G. Getz, and C. Wu. Clonal evolution in hematological malignancies and therapeutic implications. Leukemia, 28:34–43, 2014.
[40] M. Lawrence et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature, 499:214–218, 2013.
[41] E. Letouzé, Y. Allory, M. Bollet, F. Radvanyi, and F. Guyon. Analysis of the copy number profiles of several tumor samples from the same patient reveals the successive steps in tumorigenesis. Genome Biology, 11:1–19, 2010.
[42] P. McHale and A. Lander. The protective role of symmetric stem cell division on the accumulation of heritable damage. PLOS Computational Biology, 10:e1003802, 2014.
[43] L. Merlo, J. Pepper, B. Reid, and C. Maley. Cancer as an evolutionary and ecological process. Nature Reviews Cancer, 6:924–935, 2006.
[44] C. Miller.
Sciclone: Inferring clonal architecture and tracking the spatial and temporal patterns of tumor evolution. PLOS Computational Biology, 10:e1003665, 2014.
[45] M. Mourad, S. C., N. L. Zhang, T. Liu, and P. Leray. A survey on latent tree models and applications. Journal of Artificial Intelligence Research, 47:157–203, 2013.
[46] N. Navin et al. Inferring tumour progression from genomic heterogeneity. Genome Research, 20(1):68–80, 2010.
[47] N. Navin et al. Tumour evolution inferred by single-cell sequencing. Nature, 472(7341):90–94, 2011.
[48] R. Neumüller and J. Knoblich. Dividing cellular asymmetry: asymmetric cell division and its implications for stem cells and cancer. Genes and Development, 23:2675–2699, 2009.
[49] P. C. Nowell. The clonal evolution of tumor cell populations. Science, 196:23–28, 1976.
[50] L. Oesper, A. Mahmoody, and B. Raphael. Theta: Inferring intra-tumor heterogeneity from high-throughput DNA sequencing data. Genome Biology, 14:R80, 2013.
[51] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[52] G. Pennington, C. Smith, S. Shackney, and R. Schwartz. Expectation-maximization for the reconstruction of tumor phylogenies from single-cell data. Computational Systems Bioinformatics Conference (CSB), pages 371–380, 2006.
[53] K. Polyak. Breast cancer: origins and evolution. The Journal of Clinical Investigation, 117(11):3155–3163, 2011.
[54] R. Prim. Shortest connection networks and some generalizations. Bell System Technical Journal, 36(6):1389–1401, 1957.
[55] M. Riester, C. Attolini, R. Downey, S. Singer, and F. Michor. A differentiation-based phylogeny of cancer subtypes. PLOS Computational Biology, 6(5), 2010.
[56] D. Robinson and L. Foulds. Comparison of phylogenetic trees. Mathematical Biosciences, 53:131–147, 1981.
[57] A. Roth et al.
Pyclone: statistical inference of clonal population structure in cancer. Nature Methods, 11:396–398, 2014.
[58] P. J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987.
[59] D. Sankoff and R. Cedergren. Simultaneous comparison of three or more sequences related by a tree. In D. Sankoff and J. Kruskal, editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, chapter 9, pages 253–264. 1983.
[60] R. Schwarz et al. Phylogenetic quantification of intra-tumour heterogeneity. PLOS Computational Biology, 10(4):e1003535, 2014.
[61] E. Shapiro, T. Biezuner, and S. Linnarsson. Single-cell sequencing-based technologies will revolutionize whole-organism science. Nature Reviews Genetics, 14:618–630, 2013.
[62] P. Stephens et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell, 144:27–40, 2011.
[63] M. Stratton, P. Campbell, and P. Futreal. The cancer genome. Nature Reviews, 458:719–724, 2009.
[64] F. Strino, F. Parisi, M. Micsinai, and Y. Kluger. Trap: a tree approach for fingerprinting subclonal tumor composition. Nucleic Acids Research, 41:e165, 2013.
[65] J. Venter et al. The sequence of the human genome. Science, 291(5507):1304–1351, 2001.
[66] A. von Heydebreck, B. Gunawan, and L. Fuzesi. Maximum likelihood estimation of oncogenetic tree models. Biostatistics, 5:545–556, 2004.
[67] Y. Wang et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature, 512(7513):155–160, 2014.
[68] X. Xu et al. Single-cell exome sequencing reveals single-nucleotide mutation characteristics of a kidney tumor. Cell, 148(5):886–895, 2012.
[69] L. R. Yates and P. J. Campbell.
The clonal evolution of tumor cell populations. Nature Reviews Genetics, 13:795–806, 2012.
[70] K. Yuan, T. Sakoparnig, F. Markowetz, and N. Beerenwinkel. Bitphylogeny: a probabilistic framework for reconstructing intra-tumor phylogenies. Genome Biology, 16, doi:10.1186/s13059-015-0592-6, 2015.

Appendix A

Proofs

A.1 Genotype Trees are Minimal Latent Trees

The Genotype Trees produced by our simulator are Minimal Latent Trees as defined in Section 3.1.1 as long as the root node is observed:

    A Minimal Latent Tree is a Latent Tree that contains no redundant hidden nodes [12]. A hidden node is redundant if and only if it (a) has fewer than three neighbours or (b) is perfectly dependent on or perfectly independent of one of its neighbours [51].

Clearly, a Genotype Tree is a Latent Tree as defined in Section 3.1.1. To show that it also contains no redundant hidden nodes, let h be any hidden node in a Genotype Tree G, without loss of generality.

(a) h has three or more neighbours

Proof. Let T be a full bifurcating tree produced by the simulator that corresponds to G. The root node and the leaf nodes are observed, so h must be a non-root internal node in T. All non-root internal nodes of full bifurcating trees have three neighbours: a parent and two children. So h, too, must have had a parent and two children in T.

When T is converted into G, all changes are made via edge contraction operations. Let $e_c$ be an edge in T between h and either of its children c. We know that $e_c$ is never contracted, because if it were, h would be replaced by c and would not exist in G. Alternatively, $e_c$ might be affected when an adjacent edge contraction causes h or c to be replaced by another node. But if h were replaced it would not exist in G, and if c were replaced, h would simply have a new child.
So h must have at least two children in G.

Similar logic applies to the edge $e_p$ between h and its parent p in T. If $e_p$ were contracted, then h would replace p and either (a) inherit p’s parent or (b) become the root. But (b) cannot be, since we know h is not the root. Alternatively, an adjacent edge contraction might cause p to be replaced by another node, in which case h would simply have a new parent. So h must also have a parent.

(b) h is neither perfectly dependent on nor perfectly independent of any of its neighbours.

Proof. If h were perfectly dependent on one of its neighbours, then they would have the same sequence. However, all edges between nodes with identical sequences are contracted during conversion to a Genotype Tree, so this could never be.

h is not perfectly independent of any of its neighbours. The sequence assigned to h is inherited from its parent, with a few chance modifications. The sequences assigned to h’s children are derived from h in the same fashion.

A.2 Additivity of the Hamming Distance

Here, the Hamming distance metric is shown to be additive for trees produced by our simulator as long as the mutation loss and dropout rates are 0. The Hamming distance between any two sequences of the same length is simply the number of positions at which they differ.

    A tree is additive given a distance metric if the distance between any two nodes is equal to the sum of the lengths of the edges between them [24].

Let i and j be two nodes in the Genotype Tree, without loss of generality. Let D be a matrix of Hamming distances. It needs to be shown that:

$$D_{ij} = \sum_{(k,k') \in E(i,j)} D_{kk'} \tag{A.1}$$

where $E(i,j)$ is the set of edges on the path from i to j.

Proof. Let A be the most recent common ancestor of i and j. If i is an ancestor of j, then A = i, and vice versa. First, I will show that:

$$D_{Ai} = \sum_{(k,k') \in E(A,i)} D_{kk'} \tag{A.2}$$

$D_{Ai}$ is the number of positions at which A and i differ. Because no mutations are lost, A and i must be identical except at locations that were mutated since A.
Therefore, $D_{Ai}$ is equal to the number of mutations acquired along the path from A to i.

Because no mutations are lost, the length of each edge on the path from A to i is the number of mutations acquired at that edge. The sum of these lengths equals the total number of mutations acquired along the path from A to i.

By the same logic:

$$D_{Aj} = \sum_{(k,k') \in E(A,j)} D_{kk'} \tag{A.3}$$

Now, it will be shown that:

$$D_{ij} = D_{Ai} + D_{Aj} \tag{A.4}$$

As previously established, $D_{Ai}$ and $D_{Aj}$ are the total number of mutations acquired on the paths from A to i and A to j respectively.

$D_{ij}$ is the number of positions at which i and j differ. Since there is no mutation loss, both i and j must be identical to A, and therefore to each other, at all positions except those that were mutated since A. Because convergent evolution is disallowed, all these new mutations must have occurred at separate locations. Therefore, the total number of differences between i and j is equal to the sum of the number of mutations acquired from A to i and from A to j, or $D_{Ai} + D_{Aj}$.

By A.2, A.3, and A.4:

$$D_{ij} = D_{Ai} + D_{Aj} = \sum_{(k,k') \in E(A,i)} D_{kk'} + \sum_{(k,k') \in E(A,j)} D_{kk'} = \sum_{(k,k') \in E(i,j)} D_{kk'} \tag{A.5}$$

Appendix B

Additional Results

This appendix uses the acronyms defined in Section 3.3.

B.1 Hamming Distance versus Paralinear Distance

This section compares the performance of Recursive Grouping and Chow-Liu Grouping when using the Hamming distance versus the paralinear distance.

B.1.1 Recursive Grouping

On the whole, performance under the two distance metrics is comparable, but there are a few differences worth discussing.

First, the two metrics show somewhat different trends as the number of leaf nodes increases (Figure B.1). In particular, the paralinear distance shows an improvement in Family Recall and a corresponding decline in Family Precision, along with a slight decline in the macro-structure scores.

Also interesting is the slight improvement in overall performance for the paralinear distance with more positions (Figure B.2).
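The contrast between a length-dependent and a length-normalized distance can be sketched in a few lines of Python. This is not the code used for the experiments in this work, which is not reproduced here. It assumes the standard two-state form of the paralinear distance [38], d = -ln(|det J| / sqrt(det D_x · det D_y)), where J is the 2x2 table of joint state counts and D_x, D_y are diagonal matrices of marginal state counts. The function names and toy sequences are hypothetical.

```python
import math


def hamming(x, y):
    """Number of positions at which two equal-length binary sequences differ."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))


def paralinear(x, y):
    """Two-state paralinear distance (assumed standard form, after [38]).

    Returns math.inf when a determinant is zero and the distance is undefined.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    # j[a][b] counts sites where x has state a and y has state b.
    j = [[0, 0], [0, 0]]
    for a, b in zip(x, y):
        j[a][b] += 1
    det_j = j[0][0] * j[1][1] - j[0][1] * j[1][0]
    # Determinants of the diagonal matrices of marginal state counts.
    det_dx = (j[0][0] + j[0][1]) * (j[1][0] + j[1][1])
    det_dy = (j[0][0] + j[1][0]) * (j[0][1] + j[1][1])
    if det_j == 0 or det_dx == 0 or det_dy == 0:
        return math.inf
    return -math.log(abs(det_j) / math.sqrt(det_dx * det_dy))


# Toy profiles (hypothetical); x * 2 repeats every site once more.
x = [0, 0, 0, 1, 1, 1, 0, 1]
y = [0, 0, 1, 1, 1, 0, 0, 1]
print(hamming(x, y), hamming(x * 2, y * 2))
print(paralinear(x, y), paralinear(x * 2, y * 2))
```

In this sketch, repeating every site doubles the Hamming distance but leaves the paralinear distance unchanged, which is the sense in which the latter is normalized for sequence length.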
Because the paralinear distance is normalized for sequence length (unlike the Hamming distance), the expected error in distance calculations should not depend on the number of positions. As such, the ideal relaxation increment for ε should not depend on the number of positions. Hence, there is no obvious explanation for this trend.

B.1.2 Chow-Liu Grouping

As with Recursive Grouping, the trends and overall performance under the two metrics are comparable, with a few slight differences. The most striking difference is that the Hamming distance is less resistant to dropout by the MS and MMS metrics (Figure B.10). This is interesting considering that this effect is not observed in the Recursive Grouping tests (Figure B.5).

Figure B.1: Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by number of leaf nodes.
Figure B.2: Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by number of positions.
Figure B.3: Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by mutation rate.
Figure B.4: Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by mutation loss rate.
Figure B.5: Performance of Recursive Grouping using the Hamming distance versus the Paralinear distance by dropout rate.
Figure B.6: Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by number of leaf nodes.
Figure B.7: Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by number of positions.
Figure B.8: Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by mutation rate.
Figure B.9: Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by mutation loss rate.
Figure B.10: Performance of Chow-Liu Grouping using the Hamming distance versus the Paralinear distance by dropout rate.

B.2 Relaxed Recursive Grouping
- Version ComparisonFigure B.11 demonstrates the superiority of the basic version of Relaxed RecursiveGrouping over the k-means clustering version across all metrics except Family Re-call. This is especially apparent at larger numbers of nodes. This reflects the ten-dency of the k-means version to overgroup. Although not shown, this superiorityis observed across all other configurations.Figure B.11: Performance of Relaxed Recursive Grouping - Basic versus Re-laxed Recursive Grouping - k-means by number of leaf nodes.103Appendix CDunn’s Test ResultsThis appendix uses the acronyms defined in Section 3.3.C.1 Number of Leaf Nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 138.7693, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -3.648107 | 0.0028 | 16 | -7.231891 -3.583783 | 0.0000 0.0036 | 32 | -8.308558 -4.660450 -1.076666 | 0.0000 0.0000 1.0000 | 64 | -8.951801 -5.303693 -1.719909 -0.643243 | 0.0000 0.0000 0.8972 1.0000 | 128 | -9.011531 -5.363423 -1.779639 -0.702972 -0.059729 | 0.0000 0.0000 0.7889 1.0000 1.0000 | 256 | -8.196756 -4.548648 -0.964864 0.111801 0.755045 0.814774 | 0.0000 0.0001 1.0000 1.0000 1.0000 1.0000Figure C.1: Dunn’s test on RG MS scores by number of leaf nodes104 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 154.6546, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -10.28144 | 0.0000 | 16 | -9.954712 0.326731 | 0.0000 1.0000 | 32 | -9.455540 0.825903 0.499172 | 0.0000 1.0000 1.0000 | 64 | -8.805103 1.476340 1.149609 0.650437 | 0.0000 1.0000 1.0000 1.0000 | 128 | -7.079175 3.202268 2.875537 2.376364 1.725927 | 0.0000 0.0143 0.0424 0.1836 0.8858 | 256 | -6.201842 4.079601 3.752870 3.253698 2.603261 0.877333 | 0.0000 0.0005 0.0018 0.0120 
0.0970 1.0000Figure C.2: Dunn’s test on RG MMS scores by number of leaf nodes105 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 147.6935, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -3.648145 | 0.0028 | 16 | -6.683672 -3.035526 | 0.0000 0.0252 | 32 | -7.578096 -3.929950 -0.894423 | 0.0000 0.0009 1.0000 | 64 | -8.470988 -4.822842 -1.787315 -0.892892 | 0.0000 0.0000 0.7758 1.0000 | 128 | -8.947299 -5.299154 -2.263627 -1.369203 -0.476311 | 0.0000 0.0000 0.2478 1.0000 1.0000 | 256 | -10.02091 -6.372768 -3.337241 -2.442818 -1.549925 -1.073614 | 0.0000 0.0000 0.0089 0.1530 1.0000 1.0000Figure C.3: Dunn’s test on RG RF scores by number of leaf nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 26.2854, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -1.978567 | 0.5026 | 16 | -2.820704 -0.842136 | 0.0503 1.0000 | 32 | -3.815270 -1.836702 -0.994566 | 0.0014 0.6957 1.0000 | 64 | -3.765466 -1.786898 -0.944762 0.049803 | 0.0017 0.7765 1.0000 1.0000 | 128 | -4.204645 -2.226077 -1.383941 -0.389374 -0.439178 | 0.0003 0.2731 1.0000 1.0000 1.0000 | 256 | -3.667368 -1.688800 -0.846664 0.147902 0.098098 0.537277 | 0.0026 0.9582 1.0000 1.0000 1.0000 1.0000Figure C.4: Dunn’s test on RG FR scores by number of leaf nodes106 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 52.4394, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -2.352305 | 0.1959 | 16 | -2.279579 0.072725 | 0.2376 1.0000 | 32 | -0.551713 1.800592 1.727866 | 1.0000 0.7536 0.8821 | 64 | -1.095903 1.256401 1.183676 -0.544190 | 1.0000 1.0000 1.0000 1.0000 | 128 | 
-2.643208 -0.290903 -0.363629 -2.091495 -1.547305 | 0.0862 1.0000 1.0000 0.3831 1.0000 | 256 | -6.349720 -3.997414 -4.070140 -5.798006 -5.253816 -3.706511 | 0.0000 0.0007 0.0005 0.0000 0.0000 0.0022Figure C.5: Dunn’s test on RG FP scores by number of leaf nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 27.2441, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -2.761426 | 0.0604 | 16 | -3.224682 -0.463255 | 0.0132 1.0000 | 32 | -3.407268 -0.645841 -0.182586 | 0.0069 1.0000 1.0000 | 64 | -4.335289 -1.573862 -1.110606 -0.928020 | 0.0002 1.0000 1.0000 1.0000 | 128 | -4.597850 -1.836424 -1.373168 -1.190582 -0.262561 | 0.0000 0.6961 1.0000 1.0000 1.0000 | 256 | -3.242790 -0.481363 -0.018107 0.164478 1.092498 1.355060 | 0.0124 1.0000 1.0000 1.0000 1.0000 1.0000Figure C.6: Dunn’s test on RG FC scores by number of leaf nodes107 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 199.082, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -6.508661 | 0.0000 | 16 | -12.04321 -5.534555 | 0.0000 0.0000 | 32 | -11.78456 -5.275903 0.258652 | 0.0000 0.0000 1.0000 | 64 | -8.379731 -1.871070 3.663485 3.404833 | 0.0000 0.6440 0.0026 0.0070 | 128 | -7.133359 -0.624698 4.909857 4.651204 1.246371 | 0.0000 1.0000 0.0000 0.0000 1.0000 | 256 | -5.926315 0.582345 6.116901 5.858249 2.453415 1.207044 | 0.0000 1.0000 0.0000 0.0000 0.1486 1.0000Figure C.7: Dunn’s test on CLG MS scores by number of leaf nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 180.3248, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -4.670955 | 0.0000 | 16 | 
-7.050299 -2.379343 | 0.0000 0.1821 | 32 | -9.703426 -5.032470 -2.653126 | 0.0000 0.0000 0.0837 | 64 | -9.974184 -5.303228 -2.923885 -0.270758 | 0.0000 0.0000 0.0363 1.0000 | 128 | -9.774519 -5.103563 -2.724219 -0.071092 0.199665 | 0.0000 0.0000 0.0677 1.0000 1.0000 | 256 | -10.60343 -5.932476 -3.553133 -0.900006 -0.629247 -0.828913 | 0.0000 0.0000 0.0040 1.0000 1.0000 1.0000Figure C.8: Dunn’s test on CLG MMS scores by number of leaf nodes108 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 278.9024, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -3.269001 | 0.0113 | 16 | -5.866352 -2.597350 | 0.0000 0.0986 | 32 | -7.123427 -3.854426 -1.257075 | 0.0000 0.0012 1.0000 | 64 | -9.926509 -6.657507 -4.060157 -2.803081 | 0.0000 0.0000 0.0005 0.0531 | 128 | -12.27577 -9.006771 -6.409420 -5.152345 -2.349263 | 0.0000 0.0000 0.0000 0.0000 0.1975 | 256 | -13.31955 -10.05055 -7.453201 -6.196126 -3.393044 -1.043781 | 0.0000 0.0000 0.0000 0.0000 0.0073 1.0000Figure C.9: Dunn’s test on CLG RF scores by number of leaf nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 97.0206, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | 1.380790 | 1.0000 | 16 | -2.240386 -3.621177 | 0.2632 0.0031 | 32 | -4.279847 -5.660638 -2.039461 | 0.0002 0.0000 0.4347 | 64 | -5.048800 -6.429591 -2.808414 -0.768952 | 0.0000 0.0000 0.0523 1.0000 | 128 | -6.196941 -7.577732 -3.956555 -1.917093 -1.148141 | 0.0000 0.0000 0.0008 0.5799 1.0000 | 256 | -4.923411 -6.304202 -2.683024 -0.643563 0.125389 1.273530 | 0.0000 0.0000 0.0766 1.0000 1.0000 1.0000Figure C.10: Dunn’s test on CLG FR scores by number of leaf nodes109 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 154.0264, df = 6, 
p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64        128
---------+------------------------------------------------------------------
       8 |  -4.670253
         |     0.0000
         |
      16 |  -7.788312  -3.118058
         |     0.0000     0.0191
         |
      32 |  -9.701977  -5.031723  -1.913664
         |     0.0000     0.0000     0.5845
         |
      64 |  -9.982951  -5.312698  -2.194639  -0.280974
         |     0.0000     0.0000     0.2960     1.0000
         |
     128 |  -9.307093  -4.636840  -1.518781   0.394883   0.675857
         |     0.0000     0.0000     1.0000     1.0000     1.0000
         |
     256 |  -7.666809  -2.996556   0.121502   2.035167   2.316141   1.640284
         |     0.0000     0.0287     1.0000     0.4393     0.2158     1.0000

Figure C.11: Dunn’s test on CLG FP scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 65.2498, df = 6, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64        128
---------+------------------------------------------------------------------
       8 |  -2.646983
         |     0.0853
         |
      16 |  -5.453660  -2.806676
         |     0.0000     0.0526
         |
      32 |  -6.121055  -3.474071  -0.667395
         |     0.0000     0.0054     1.0000
         |
      64 |  -6.002039  -3.355055  -0.548379   0.119016
         |     0.0000     0.0083     1.0000     1.0000
         |
     128 |  -5.881516  -3.234532  -0.427856   0.239539   0.120522
         |     0.0000     0.0128     1.0000     1.0000     1.0000
         |
     256 |  -5.616365  -2.969382  -0.162705   0.504689   0.385673   0.265150
         |     0.0000     0.0313     1.0000     1.0000     1.0000     1.0000

Figure C.12: Dunn’s test on CLG FC scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 81.8957, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64
---------+-------------------------------------------------------
       8 |  -4.095514
         |     0.0003
         |
      16 |  -5.121887  -1.026372
         |     0.0000     1.0000
         |
      32 |  -7.177127  -3.081612  -2.055239
         |     0.0000     0.0154     0.2989
         |
      64 |  -7.412831  -3.317316  -2.290943  -0.235704
         |     0.0000     0.0068     0.1647     1.0000
         |
     128 |  -7.170891  -3.075377  -2.049004   0.006235   0.241939
         |     0.0000     0.0158     0.3035     1.0000     1.0000

Figure C.13: Dunn’s test on PEC MS scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x
and group
Kruskal-Wallis chi-squared = 82.229, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64
---------+-------------------------------------------------------
       8 |  -4.095514
         |     0.0003
         |
      16 |  -5.079485  -0.983971
         |     0.0000     1.0000
         |
      32 |  -7.157173  -3.061658  -2.077687
         |     0.0000     0.0165     0.2830
         |
      64 |  -7.362946  -3.267432  -2.283461  -0.205773
         |     0.0000     0.0081     0.1680     1.0000
         |
     128 |  -7.283131  -3.187617  -2.203646  -0.125958   0.079815
         |     0.0000     0.0108     0.2066     1.0000     1.0000

Figure C.14: Dunn’s test on PEC MMS scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 82.721, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64
---------+-------------------------------------------------------
       8 |  -4.075645
         |     0.0003
         |
      16 |  -5.057142  -0.981497
         |     0.0000     1.0000
         |
      32 |  -7.128638  -3.052992  -2.071495
         |     0.0000     0.0170     0.2873
         |
      64 |  -7.313214  -3.237569  -2.256071  -0.184576
         |     0.0000     0.0090     0.1805     1.0000
         |
     128 |  -7.404255  -3.328610  -2.347112  -0.275617  -0.091041
         |     0.0000     0.0065     0.1419     1.0000     1.0000

Figure C.15: Dunn’s test on PEC RF scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 8.3892, df = 5, p-value = 0.14

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64
---------+-------------------------------------------------------
       8 |   0.919651
         |     1.0000
         |
      16 |  -1.382579  -2.302230
         |     1.0000     0.1599
         |
      32 |  -1.527787  -2.447438  -0.145208
         |     0.9492     0.1079     1.0000
         |
      64 |  -0.572144  -1.491795   0.810434   0.955642
         |     1.0000     1.0000     1.0000     1.0000
         |
     128 |  -0.184923  -1.104574   1.197656   1.342864   0.387221
         |     1.0000     1.0000     1.0000     1.0000     1.0000

Figure C.16: Dunn’s test on PEC FR scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 25.9445, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64
---------+-------------------------------------------------------
       8 |  -1.058283
         |     1.0000
         |
      16 |  -3.467989  -2.409706
         |     0.0039     0.1197
         |
      32 |  -3.630707  -2.572423  -0.162717
         |     0.0021     0.0757     1.0000
         |
      64 |  -3.543758  -2.485475  -0.075769   0.086948
         |     0.0030     0.0970     1.0000     1.0000
         |
     128 |  -3.666728  -2.608445  -0.198738  -0.036021  -0.122969
         |     0.0018     0.0682     1.0000     1.0000     1.0000

Figure C.17: Dunn’s test on PEC FC scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 55.8502, df = 6, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64        128
---------+------------------------------------------------------------------
       8 |  -3.042197
         |     0.0247
         |
      16 |  -3.885578  -0.843381
         |     0.0011     1.0000
         |
      32 |  -5.094927  -2.052730  -1.209348
         |     0.0000     0.4210     1.0000
         |
      64 |  -6.158190  -3.115993  -2.272611  -1.063262
         |     0.0000     0.0192     0.2420     1.0000
         |
     128 |  -5.837404  -2.795206  -1.951825  -0.742476   0.320786
         |     0.0000     0.0545     0.5351     1.0000     1.0000
         |
     256 |  -5.436797  -2.394600  -1.551219  -0.341870   0.721392   0.400606
         |     0.0000     0.1747     1.0000     1.0000     1.0000     1.0000

Figure C.18: Dunn’s test on MST MS scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 59.7559, df = 6, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          4          8         16         32         64        128
---------+------------------------------------------------------------------
       8 |  -2.988127
         |     0.0295
         |
      16 |  -3.583041  -0.594914
         |     0.0036     1.0000
         |
      32 |  -4.827090  -1.838963  -1.244048
         |     0.0000     0.6922     1.0000
         |
      64 |  -6.010895  -3.022767  -2.427853  -1.183804
         |     0.0000     0.0263     0.1595     1.0000
         |
     128 |  -6.203677  -3.215550  -2.620635  -1.376586  -0.192782
         |     0.0000     0.0137     0.0922     1.0000     1.0000
         |
     256 |  -5.843716  -2.855589  -2.260674  -1.016625   0.167178   0.359960
         |     0.0000     0.0451     0.2497     1.0000     1.0000     1.0000

Figure C.19: Dunn’s test on MST MMS scores by number of leaf nodes

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 100.9626, df = 6, p-value = 0
Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -1.546951 | 1.0000 | 16 | -2.690220 -1.143268 | 0.0750 1.0000 | 32 | -4.719182 -3.172230 -2.028962 | 0.0000 0.0159 0.4459 | 64 | -5.585295 -4.038343 -2.895074 -0.866112 | 0.0000 0.0006 0.0398 1.0000 | 128 | -7.245218 -5.698266 -4.554997 -2.526035 -1.659923 | 0.0000 0.0000 0.0001 0.1211 1.0000 | 256 | -7.673002 -6.126050 -4.982782 -2.953819 -2.087707 -0.427784 | 0.0000 0.0000 0.0000 0.0330 0.3867 1.0000Figure C.20: Dunn’s test on MST RF scores by number of leaf nodes114 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 159.6136, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -0.500727 | 1.0000 | 16 | -0.990648 -0.489920 | 1.0000 1.0000 | 32 | -2.806236 -2.305508 -1.815588 | 0.0526 0.2220 0.7291 | 64 | -5.218014 -4.717286 -4.227366 -2.411778 | 0.0000 0.0000 0.0002 0.1667 | 128 | -7.011988 -6.511260 -6.021340 -4.205752 -1.793973 | 0.0000 0.0000 0.0000 0.0003 0.7646 | 256 | -9.571463 -9.070735 -8.580814 -6.765226 -4.353448 -2.559474 | 0.0000 0.0000 0.0000 0.0000 0.0001 0.1101Figure C.21: Dunn’s test on MST FR scores by number of leaf nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 100.8597, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -3.617968 | 0.0031 | 16 | -5.168956 -1.550987 | 0.0000 1.0000 | 32 | -6.569509 -2.951540 -1.400552 | 0.0000 0.0332 1.0000 | 64 | -7.261511 -3.643542 -2.092554 -0.692002 | 0.0000 0.0028 0.3821 1.0000 | 128 | -7.803078 -4.185109 -2.634122 -1.233569 -0.541567 | 0.0000 0.0003 0.0886 1.0000 1.0000 | 256 | -8.141558 -4.523589 -2.972601 -1.572048 -0.880046 -0.338479 | 0.0000 0.0001 
0.0310 1.0000 1.0000 1.0000Figure C.22: Dunn’s test on MST FP scores by number of leaf nodes115 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 47.2365, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -1.752429 | 0.8369 | 16 | -3.163104 -1.410675 | 0.0164 1.0000 | 32 | -3.967054 -2.214624 -0.803949 | 0.0008 0.2813 1.0000 | 64 | -4.888434 -3.136005 -1.725329 -0.921380 | 0.0000 0.0180 0.8869 1.0000 | 128 | -5.073613 -3.321184 -1.910509 -1.106559 -0.185179 | 0.0000 0.0094 0.5887 1.0000 1.0000 | 256 | -5.394290 -3.641861 -2.231185 -1.427236 -0.505855 -0.320676 | 0.0000 0.0028 0.2695 1.0000 1.0000 1.0000Figure C.23: Dunn’s test on MST FC scores by number of leaf nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 148.4375, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | 4.054962 | 0.0005 | 16 | 6.844631 2.789669 | 0.0000 0.0554 | 32 | 8.076785 4.021823 1.232154 | 0.0000 0.0006 1.0000 | 64 | 9.028767 4.973805 2.184136 0.951982 | 0.0000 0.0000 0.3040 1.0000 | 128 | 9.363166 5.308204 2.518535 1.286381 0.334398 | 0.0000 0.0000 0.1237 1.0000 1.0000 | 256 | 9.510784 5.455822 2.666152 1.433998 0.482016 0.147617 | 0.0000 0.0000 0.0806 1.0000 1.0000 1.0000Figure C.24: Dunn’s test on NJ MS scores by number of leaf nodes116 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 148.4376, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | 4.054963 | 0.0005 | 16 | 6.844633 2.789670 | 0.0000 0.0554 | 32 | 8.076788 4.021824 1.232154 | 0.0000 0.0006 1.0000 | 64 | 9.028770 4.973807 2.184137 0.951982 | 0.0000 0.0000 0.3040 1.0000 | 128 | 
9.363169 5.308206 2.518536 1.286381 0.334398 | 0.0000 0.0000 0.1237 1.0000 1.0000 | 256 | 9.510787 5.455823 2.666153 1.433998 0.482016 0.147617 | 0.0000 0.0000 0.0806 1.0000 1.0000 1.0000Figure C.25: Dunn’s test on NJ MMS scores by number of leaf nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 148.4325, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | 4.054962 | 0.0005 | 16 | 6.844631 2.789669 | 0.0000 0.0554 | 32 | 8.076785 4.021823 1.232154 | 0.0000 0.0006 1.0000 | 64 | 9.028767 4.973805 2.184136 0.951982 | 0.0000 0.0000 0.3040 1.0000 | 128 | 9.372204 5.317242 2.527573 1.295418 0.343436 | 0.0000 0.0000 0.1206 1.0000 1.0000 | 256 | 9.501746 5.446784 2.657114 1.424960 0.472978 0.129541 | 0.0000 0.0000 0.0828 1.0000 1.0000 1.0000Figure C.26: Dunn’s test on NJ RF scores by number of leaf nodes117 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 35.3807, df = 6, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | 3.457917 | 0.0057 | 16 | 5.238813 1.780895 | 0.0000 0.7868 | 32 | 3.825236 0.367319 -1.413576 | 0.0014 1.0000 1.0000 | 64 | 4.031477 0.573559 -1.207335 0.206240 | 0.0006 1.0000 1.0000 1.0000 | 128 | 3.578350 0.120432 -1.660462 -0.246886 -0.453127 | 0.0036 1.0000 1.0000 1.0000 1.0000 | 256 | 4.937732 1.479814 -0.301081 1.112495 0.906254 1.359381 | 0.0000 1.0000 1.0000 1.0000 1.0000 1.0000Figure C.27: Dunn’s test on NJ FR scores by number of leaf nodes Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 12.2989, df = 6, p-value = 0.06 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 4 8 16 32 64 128---------+------------------------------------------------------------------ 8 | -2.055901 | 0.4178 | 16 | -2.911397 -0.855496 | 0.0378 
1.0000 | 32 | -2.689993 -0.634091 0.221404 | 0.0750 1.0000 1.0000 | 64 | -2.890311 -0.834409 0.021086 -0.200318 | 0.0404 1.0000 1.0000 1.0000 | 128 | -2.462563 -0.406661 0.448834 0.227429 0.427748 | 0.1448 1.0000 1.0000 1.0000 1.0000 | 256 | -2.108617 -0.052715 0.802780 0.581375 0.781694 0.353946 | 0.3673 1.0000 1.0000 1.0000 1.0000 1.0000Figure C.28: Dunn’s test on NJ FC scores by number of leaf nodes118C.2 Number of Positions Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 17.8143, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | -0.640968 | 1.0000 | 1000 | -0.172511 0.468457 | 1.0000 1.0000 | 3000 | 2.019570 2.660539 2.192082 | 0.2171 0.0390 0.1419 | 10000 | -2.135569 -1.494601 -1.963058 -4.155140 | 0.1636 0.6751 0.2482 0.0002Figure C.29: Dunn’s test on RG MS scores by number of positions Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 19.1365, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | -0.337424 | 1.0000 | 1000 | 1.529559 1.866984 | 0.6306 0.3095 | 3000 | 2.280217 2.617642 0.750658 | 0.1130 0.0443 1.0000 | 10000 | -1.614287 -1.276862 -3.143846 -3.894504 | 0.5323 1.0000 0.0083 0.0005Figure C.30: Dunn’s test on RG MMS scores by number of positions119 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 45.0274, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | -0.611336 | 1.0000 | 1000 | 0.414994 1.026330 | 1.0000 1.0000 | 3000 | 5.066206 5.677542 4.651212 | 0.0000 0.0000 0.0000 | 10000 | 2.887113 3.498449 2.472118 -2.179093 | 0.0194 0.0023 0.0672 0.1466Figure C.31: Dunn’s test on RG RF scores by number of positions Kruskal-Wallis rank sum testdata: x and 
group
Kruskal-Wallis chi-squared = 47.7314, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -0.251625
         |     1.0000
         |
    1000 |   0.004466   0.256091
         |     1.0000     1.0000
         |
    3000 |  -0.887388  -0.635763  -0.891854
         |     1.0000     1.0000     1.0000
         |
   10000 |  -5.684644  -5.433019  -5.689111  -4.797256
         |     0.0000     0.0000     0.0000     0.0000

Figure C.32: Dunn’s test on RG FR scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 75.918, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -0.532054
         |     1.0000
         |
    1000 |   2.411022   2.943076
         |     0.0795     0.0162
         |
    3000 |   5.664064   6.196119   3.253042
         |     0.0000     0.0000     0.0057
         |
   10000 |   6.069896   6.601950   3.658874   0.405831
         |     0.0000     0.0000     0.0013     1.0000

Figure C.33: Dunn’s test on RG FP scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 64.6174, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -0.935892
         |     1.0000
         |
    1000 |  -1.539453  -0.603561
         |     0.6185     1.0000
         |
    3000 |  -2.457462  -1.521570  -0.918009
         |     0.0700     0.6406     1.0000
         |
   10000 |  -7.265088  -6.329196  -5.725635  -4.807625
         |     0.0000     0.0000     0.0000     0.0000

Figure C.34: Dunn’s test on RG FC scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 34.5593, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -1.564758
         |     0.5882
         |
    1000 |  -3.804160  -2.239402
         |     0.0007     0.1256
         |
    3000 |  -3.310808  -1.746050   0.493352
         |     0.0047     0.4040     1.0000
         |
   10000 |  -5.377834  -3.813076  -1.573674  -2.067026
         |     0.0000     0.0007     0.5778     0.1937

Figure C.35: Dunn’s test on CLG MS scores by number of positions
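
The adjusted p-values in these figures can be recomputed from the z statistics printed above them. The following is a minimal sketch, not the analysis code used for this thesis. It assumes that each reported p-value is a one-sided normal tail probability multiplied by the number of pairwise comparisons (the Bonferroni adjustment named in the output) and capped at 1. This assumption is consistent with the entries of Figure C.35, but it is an inference from the printed values. The function name `bonferroni_dunn_p` is hypothetical.

```python
import math


def bonferroni_dunn_p(z, n_groups):
    """Bonferroni-adjusted p-value for a single Dunn's test z statistic.

    Assumption (inferred from the printed tables, not stated in the text):
    the unadjusted p-value is the one-sided tail probability P(Z >= |z|).
    """
    # Number of pairwise comparisons among n_groups groups.
    n_comparisons = n_groups * (n_groups - 1) // 2
    # Upper tail of the standard normal distribution at |z|.
    tail = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    # Bonferroni: multiply by the number of comparisons, cap at 1.
    return min(1.0, n_comparisons * tail)


if __name__ == "__main__":
    # z statistics copied from Figure C.35 (five groups: 100 to 10000 positions).
    for z in (-1.564758, -3.804160, 0.493352, -2.067026):
        print(z, round(bonferroni_dunn_p(z, 5), 4))
```

With five groups there are 10 pairwise comparisons, with six there are 15, and with seven there are 21, so the same z statistic yields a larger adjusted p-value in the figures that compare more groups.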

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 39.4161, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -1.781782
         |     0.3739
         |
    1000 |  -4.214457  -2.432675
         |     0.0001     0.0749
         |
    3000 |  -4.551792  -2.770010  -0.337334
         |     0.0000     0.0280     1.0000
         |
   10000 |  -5.345346  -3.563564  -1.130889  -0.793554
         |     0.0000     0.0018     1.0000     1.0000

Figure C.36: Dunn’s test on CLG MMS scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 38.2158, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -1.089404
         |     1.0000
         |
    1000 |  -4.203051  -3.113646
         |     0.0001     0.0092
         |
    3000 |  -3.614505  -2.525100   0.588546
         |     0.0015     0.0578     1.0000
         |
   10000 |  -5.189906  -4.100501  -0.986855  -1.575401
         |     0.0000     0.0002     1.0000     0.5758

Figure C.37: Dunn’s test on CLG RF scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 86.3997, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -1.401220
         |     0.8057
         |
    1000 |  -4.911710  -3.510489
         |     0.0000     0.0022
         |
    3000 |  -5.230034  -3.828813  -0.318324
         |     0.0000     0.0006     1.0000
         |
   10000 |  -8.255600  -6.854379  -3.343889  -3.025565
         |     0.0000     0.0000     0.0041     0.0124

Figure C.38: Dunn’s test on CLG FR scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 30.3754, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -0.514275
         |     1.0000
         |
    1000 |  -4.797926  -4.283650
         |     0.0000     0.0001
         |
    3000 |  -3.012824  -2.498548   1.785102
         |     0.0129     0.0624     0.3712
         |
   10000 |  -2.517870  -2.003595   2.280055   0.494953
         |     0.0590     0.2256     0.1130     1.0000

Figure C.39: Dunn’s
test on CLG FP scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 102.7929, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -1.029133
         |     1.0000
         |
    1000 |  -4.851203  -3.822070
         |     0.0000     0.0007
         |
    3000 |  -5.768797  -4.739664  -0.917594
         |     0.0000     0.0000     1.0000
         |
   10000 |  -8.769940  -7.740807  -3.918737  -3.001143
         |     0.0000     0.0000     0.0004     0.0134

Figure C.40: Dunn’s test on CLG FC scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 93.5074, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -0.805459
         |     1.0000
         |
    1000 |  -3.097006  -2.291546
         |     0.0098     0.1097
         |
    3000 |  -6.532840  -5.727380  -3.435834
         |     0.0000     0.0000     0.0030
         |
   10000 |  -7.739543  -6.934083  -4.642536  -1.206702
         |     0.0000     0.0000     0.0000     1.0000

Figure C.41: Dunn’s test on PEC MS scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 54.7058, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -0.282371
         |     1.0000
         |
    1000 |  -1.468331  -1.185959
         |     0.7101     1.0000
         |
    3000 |  -4.498622  -4.216251  -3.030291
         |     0.0000     0.0001     0.0122
         |
   10000 |  -5.840629  -5.558258  -4.372298  -1.342007
         |     0.0000     0.0000     0.0001     0.8980

Figure C.42: Dunn’s test on PEC MMS scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 3.1316, df = 4, p-value = 0.54

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -0.985511
         |     1.0000
         |
    1000 |   0.245263   1.230774
         |     1.0000     1.0000
         |
    3000 |   0.622819   1.608331   0.377556
         |     1.0000     0.5388     1.0000
         |
   10000 |  -0.462283   0.523227  -0.707546  -1.085103
         |     1.0000     1.0000     1.0000     1.0000

Figure C.43: Dunn’s test on PEC RF scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 69.4554, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -2.020620
         |     0.2166
         |
    1000 |  -4.099228  -2.078607
         |     0.0002     0.1883
         |
    3000 |  -6.345849  -4.325228  -2.246621
         |     0.0000     0.0001     0.1233
         |
   10000 |  -7.056561  -5.035940  -2.957332  -0.710711
         |     0.0000     0.0000     0.0155     1.0000

Figure C.44: Dunn’s test on PEC FR scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 73.847, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |   0.553147
         |     1.0000
         |
    1000 |   4.388654   3.835506
         |     0.0001     0.0006
         |
    3000 |   6.256833   5.703685   1.868178
         |     0.0000     0.0000     0.3087
         |
   10000 |   6.256833   5.703685   1.868178   0.000000
         |     0.0000     0.0000     0.3087     1.0000

Figure C.45: Dunn’s test on PEC FP scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 119.4689, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -0.909956
         |     1.0000
         |
    1000 |  -4.353517  -3.443561
         |     0.0001     0.0029
         |
    3000 |  -6.588263  -5.678307  -2.234746
         |     0.0000     0.0000     0.1272
         |
   10000 |  -9.254198  -8.344242  -4.900681  -2.665934
         |     0.0000     0.0000     0.0000     0.0384

Figure C.46: Dunn’s test on PEC FC scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 60.3183, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -2.436910
         |     0.0741
         |
    1000 |  -6.986799  -4.549889
         |     0.0000     0.0000
         |
    3000 |  -5.702964  -3.266054   1.283835
         |     0.0000     0.0055     0.9960
         |
   10000 |  -4.108571  -1.671660   2.878228   1.594393
         |     0.0002     0.4730     0.0200     0.5542

Figure C.47: Dunn’s test on MST MS scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 64.6232, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -1.873931
         |     0.3047
         |
    1000 |  -6.987489  -5.113558
         |     0.0000     0.0000
         |
    3000 |  -5.615849  -3.741918   1.371640
         |     0.0000     0.0009     0.8509
         |
   10000 |  -4.640988  -2.767057   2.346500   0.974860
         |     0.0000     0.0283     0.0948     1.0000

Figure C.48: Dunn’s test on MST MMS scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 36.9544, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -1.131897
         |     1.0000
         |
    1000 |  -5.512236  -4.380338
         |     0.0000     0.0001
         |
    3000 |  -3.641110  -2.509212   1.871126
         |     0.0014     0.0605     0.3066
         |
   10000 |  -2.558296  -1.426398   2.953939   1.082813
         |     0.0526     0.7688     0.0157     1.0000

Figure C.49: Dunn’s test on MST RF scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 3.9855, df = 4, p-value = 0.41

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |   1.277228
         |     1.0000
         |
    1000 |   0.817491  -0.459737
         |     1.0000     1.0000
         |
    3000 |   1.900464   0.623235   1.082973
         |     0.2869     1.0000     1.0000
         |
   10000 |   1.298273   0.021044   0.480781  -0.602191
         |     0.9710     1.0000     1.0000     1.0000

Figure C.50: Dunn’s test on MST FR scores by number of positions

Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 89.8674, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |        100        300       1000       3000
---------+--------------------------------------------
     300 |  -2.472370
         |     0.0671
         |
    1000 |  -8.064918  -5.592548
         |     0.0000     0.0000
         |
    3000 |  -7.041203  -4.568833   1.023715
| 0.0000 0.0000 1.0000 | 10000 | -5.741128 -3.268758 2.323790 1.300074 | 0.0000 0.0054 0.1007 0.9679Figure C.51: Dunn’s test on MST FP scores by number of positions Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 107.9058, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | -2.239362 | 0.1257 | 1000 | -6.527882 -4.288520 | 0.0000 0.0001 | 3000 | -8.193659 -5.954296 -1.665776 | 0.0000 0.0000 0.4788 | 10000 | -7.966305 -5.726942 -1.438422 0.227353 | 0.0000 0.0000 0.7516 1.0000Figure C.52: Dunn’s test on MST FC scores by number of positions130 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 102.4191, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | 2.145853 | 0.1594 | 1000 | 6.942378 4.796525 | 0.0000 0.0000 | 3000 | 7.832885 5.687032 0.890506 | 0.0000 0.0000 1.0000 | 10000 | 7.515698 5.369845 0.573319 -0.317187 | 0.0000 0.0000 1.0000 1.0000Figure C.53: Dunn’s test on NJ MS scores by number of positions Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 100.6989, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | 2.164501 | 0.1521 | 1000 | 7.033510 4.869009 | 0.0000 0.0000 | 3000 | 7.734624 5.570122 0.701113 | 0.0000 0.0000 1.0000 | 10000 | 7.404951 5.240450 0.371440 -0.329672 | 0.0000 0.0000 1.0000 1.0000Figure C.54: Dunn’s test on NJ MMS scores by number of positions131 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 105.4569, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | 2.250372 | 0.1221 | 1000 | 6.322191 4.071818 | 0.0000 0.0002 | 3000 | 
8.162999 5.912626 1.840807 | 0.0000 0.0000 0.3282 | 10000 | 7.905345 5.654973 1.583154 -0.257653 | 0.0000 0.0000 0.5669 1.0000Figure C.55: Dunn’s test on NJ RF scores by number of positions Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 108.854, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | 2.266261 | 0.1172 | 1000 | 6.642745 4.376484 | 0.0000 0.0001 | 3000 | 7.827145 5.560884 1.184400 | 0.0000 0.0000 1.0000 | 10000 | 8.356187 6.089926 1.713441 0.529041 | 0.0000 0.0000 0.4332 1.0000Figure C.56: Dunn’s test on NJ FR scores by number of positions132 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 3.0204, df = 4, p-value = 0.55 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | 1.114300 | 1.0000 | 1000 | -0.014957 -1.129258 | 1.0000 1.0000 | 3000 | 1.114300 0.000000 1.129258 | 1.0000 1.0000 1.0000 | 10000 | 1.114300 0.000000 1.129258 0.000000 | 1.0000 1.0000 1.0000 1.0000Figure C.57: Dunn’s test on NJ FP scores by number of positions Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 98.6914, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 100 300 1000 3000---------+-------------------------------------------- 300 | 2.403726 | 0.0811 | 1000 | 6.384154 3.980428 | 0.0000 0.0003 | 3000 | 7.847809 5.444083 1.463655 | 0.0000 0.0000 0.7164 | 10000 | 7.765999 5.362273 1.381845 -0.081809 | 0.0000 0.0000 0.8351 1.0000Figure C.58: Dunn’s test on NJ FC scores by number of positions133C.3 Asymmetric Division Rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 25.1941, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 
0.523879 | 1.0000 | 0.2 | 1.087943 0.564063 | 1.0000 1.0000 | 0.3 | 0.680150 0.156271 -0.407792 | 1.0000 1.0000 1.0000 | 0.4 | 0.652617 0.128737 -0.435326 -0.027533 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | 2.052359 1.528479 0.964415 1.372208 1.399741 | 0.9030 1.0000 1.0000 1.0000 1.0000 | 0.6 | 3.996072 3.472193 2.908129 3.315922 3.343455 1.943713 | 0.0014 0.0116 0.0818 0.0206 0.0186 1.0000 | 0.7 | 2.254767 1.730887 1.166823 1.574616 1.602149 0.202408 | 0.5433 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.8 | 2.300904 1.777024 1.212960 1.620753 1.648287 0.248545 | 0.4814 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.9 | 1.817952 1.294072 0.730008 1.137801 1.165335 -0.234406 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -1.741305 2.300904 1.777024 | 1.0000 0.4814 1.0000 | 0.8 | -1.695168 0.046137 1.817952 | 1.0000 1.0000 1.0000 | 0.9 | -2.178120 -0.436814 -0.482951 | 0.6614 1.0000 1.0000Figure C.59: Dunn’s test on RG MS scores by asymmetric division rate134 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 13.9298, df = 9, p-value = 0.12 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.205387 | 1.0000 | 0.2 | -0.539512 -0.334125 | 1.0000 1.0000 | 0.3 | -0.136180 0.069206 0.403332 | 1.0000 1.0000 1.0000 | 0.4 | -1.695187 -1.489800 -1.155674 -1.559006 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | 0.020836 0.226223 0.560349 0.157016 1.716023 | 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.6 | -0.159993 0.045393 0.379519 -0.023812 1.535193 -0.180829 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.7 | -1.739836 -1.534449 -1.200323 -1.603655 -0.044649 -1.760672 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.8 | -1.236042 -1.030655 -0.696529 -1.099862 0.459144 -1.256879 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.9 | -2.382787 -2.177400 -1.843274 -2.246606 -0.687600 -2.403623 | 
0.3866 0.6626 1.0000 0.5550 1.0000 0.3653Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -1.579842 -1.236042 -1.030655 | 1.0000 1.0000 1.0000 | 0.8 | -1.076049 0.503793 -2.382787 | 1.0000 1.0000 0.3866 | 0.9 | -2.222793 -0.642950 -1.146744 | 0.5902 1.0000 1.0000Figure C.60: Dunn’s test on RG MMS scores by asymmetric division rate135 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 6.394, df = 9, p-value = 0.7 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.842559 | 1.0000 | 0.2 | -1.269793 -0.427234 | 1.0000 1.0000 | 0.3 | -0.943785 -0.101226 0.326007 | 1.0000 1.0000 1.0000 | 0.4 | -1.971678 -1.129118 -0.701884 -1.027892 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | -1.697027 -0.854468 -0.427234 -0.753242 0.274650 | 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.6 | -0.552278 0.290281 0.717515 0.391507 1.419400 1.144749 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.7 | -1.470013 -0.627453 -0.200219 -0.526227 0.501665 0.227014 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.8 | -1.453638 -0.611078 -0.183844 -0.509852 0.518040 0.243389 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.9 | -1.603989 -0.761429 -0.334195 -0.660203 0.367689 0.093038 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.917734 -1.453638 -0.611078 | 1.0000 1.0000 1.0000 | 0.8 | -0.901359 0.016374 -1.603989 | 1.0000 1.0000 1.0000 | 0.9 | -1.051710 -0.133975 -0.150350 | 1.0000 1.0000 1.0000Figure C.61: Dunn’s test on RG RF scores by asymmetric division rate136 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 151.0459, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.278437 | 1.0000 | 0.2 | -1.772966 
-1.494529 | 1.0000 1.0000 | 0.3 | -1.157394 -0.878957 0.615571 | 1.0000 1.0000 1.0000 | 0.4 | -1.453140 -1.174703 0.319826 -0.295745 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | -3.422517 -3.144080 -1.649551 -2.265122 -1.969377 | 0.0140 0.0375 1.0000 0.5289 1.0000 | 0.6 | -4.997568 -4.719131 -3.224601 -3.840173 -3.544427 -1.575050 | 0.0000 0.0001 0.0284 0.0028 0.0089 1.0000 | 0.7 | -6.145932 -5.867495 -4.372966 -4.988537 -4.692792 -2.723415 | 0.0000 0.0000 0.0003 0.0000 0.0001 0.1454 | 0.8 | -6.965440 -6.687003 -5.192474 -5.808045 -5.512300 -3.542922 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0089 | 0.9 | -7.866221 -7.587784 -6.093255 -6.708827 -6.413081 -4.443704 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -1.148364 -6.965440 -6.687003 | 1.0000 0.0000 0.0000 | 0.8 | -1.967872 -0.819507 -7.866221 | 1.0000 1.0000 0.0000 | 0.9 | -2.868653 -1.720289 -0.900781 | 0.0928 1.0000 1.0000Figure C.62: Dunn’s test on RG FR scores by asymmetric division rate137 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 32.0644, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.954507 | 1.0000 | 0.2 | -1.244776 -0.290268 | 1.0000 1.0000 | 0.3 | -3.620679 -2.666171 -2.375902 | 0.0066 0.1726 0.3939 | 0.4 | -2.794412 -1.839904 -1.549635 0.826267 | 0.1170 1.0000 1.0000 1.0000 | 0.5 | -2.389725 -1.435217 -1.144948 1.230954 0.404686 | 0.3794 1.0000 1.0000 1.0000 1.0000 | 0.6 | -2.294504 -1.339996 -1.049728 1.326174 0.499907 0.095220 | 0.4896 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.7 | -3.540817 -2.586309 -2.296040 0.079862 -0.746405 -1.151091 | 0.0090 0.2183 0.4877 1.0000 1.0000 1.0000 | 0.8 | -3.642948 -2.688440 -2.398172 -0.022269 -0.848536 -1.253223 | 0.0061 0.1615 0.3707 1.0000 1.0000 1.0000 | 0.9 | -3.937056 -2.982549 -2.692280 -0.316377 -1.142644 -1.547331 | 
0.0019 0.0643 0.1597 1.0000 1.0000 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -1.246312 -3.642948 -2.688440 | 1.0000 0.0061 0.1615 | 0.8 | -1.348443 -0.102131 -3.937056 | 1.0000 1.0000 0.0019 | 0.9 | -1.642552 -0.396239 -0.294108 | 1.0000 1.0000 1.0000Figure C.63: Dunn’s test on RG FP scores by asymmetric division rate138 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 104.2992, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -1.258244 | 1.0000 | 0.2 | -2.839388 -1.581143 | 0.1017 1.0000 | 0.3 | -3.451077 -2.192832 -0.611688 | 0.0126 0.6372 1.0000 | 0.4 | -3.468511 -2.210266 -0.629122 -0.017433 | 0.0118 0.6095 1.0000 1.0000 | 0.5 | -5.181543 -3.923298 -2.342154 -1.730465 -1.713032 | 0.0000 0.0020 0.4314 1.0000 1.0000 | 0.6 | -6.253325 -4.995080 -3.413936 -2.802248 -2.784814 -1.071782 | 0.0000 0.0000 0.0144 0.1142 0.1205 1.0000 | 0.7 | -5.807634 -4.549389 -2.968245 -2.356556 -2.339122 -0.626090 | 0.0000 0.0001 0.0674 0.4150 0.4349 1.0000 | 0.8 | -6.833179 -5.574934 -3.993790 -3.382101 -3.364668 -1.651636 | 0.0000 0.0000 0.0015 0.0162 0.0172 1.0000 | 0.9 | -7.065879 -5.807634 -4.226490 -3.614801 -3.597367 -1.884335 | 0.0000 0.0000 0.0005 0.0068 0.0072 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.445691 -6.833179 -5.574934 | 1.0000 0.0000 0.0000 | 0.8 | -0.579853 -1.025545 -7.065879 | 1.0000 1.0000 0.0000 | 0.9 | -0.812553 -1.258244 -0.232699 | 1.0000 1.0000 1.0000Figure C.64: Dunn’s test on RG FC scores by asymmetric division rate139 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 8.4267, df = 9, p-value = 0.49 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 0.361648 | 1.0000 | 0.2 
| 0.238866 -0.122782 | 1.0000 1.0000 | 0.3 | 0.672696 0.311047 0.433829 | 1.0000 1.0000 1.0000 | 0.4 | 0.176359 -0.185289 -0.062507 -0.496337 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | 1.637837 1.276188 1.398970 0.965141 1.461478 | 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.6 | 1.804523 1.442874 1.565656 1.131826 1.628163 0.166685 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.7 | 1.041786 0.680137 0.802919 0.369090 0.865427 -0.596050 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.8 | -0.077389 -0.439038 -0.316256 -0.750086 -0.253749 -1.715227 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.9 | 0.052089 -0.309559 -0.186777 -0.620607 -0.124270 -1.585748 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.762736 -0.077389 -0.439038 | 1.0000 1.0000 1.0000 | 0.8 | -1.881913 -1.119176 0.052089 | 1.0000 1.0000 1.0000 | 0.9 | -1.752434 -0.989697 0.129479 | 1.0000 1.0000 1.0000Figure C.65: Dunn’s test on CLG MS scores by asymmetric division rate140 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 54.288, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 0.445748 | 1.0000 | 0.2 | -0.227711 -0.673460 | 1.0000 1.0000 | 0.3 | 0.371333 -0.074415 0.599044 | 1.0000 1.0000 1.0000 | 0.4 | 0.916054 0.470305 1.143765 0.544721 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | 2.444548 1.998799 2.672259 2.073214 1.528493 | 0.3263 1.0000 0.1695 0.8584 1.0000 | 0.6 | 3.807839 3.362090 4.035550 3.436506 2.891785 1.363291 | 0.0032 0.0174 0.0012 0.0133 0.0862 1.0000 | 0.7 | 2.806207 2.360458 3.033918 2.434874 1.890152 0.361659 | 0.1128 0.4107 0.0543 0.3352 1.0000 1.0000 | 0.8 | 3.629242 3.183493 3.856953 3.257909 2.713188 1.184694 | 0.0064 0.0327 0.0026 0.0253 0.1499 1.0000 | 0.9 | 4.321306 3.875557 4.549017 3.949973 3.405251 1.876758 | 0.0003 0.0024 0.0001 0.0018 0.0149 
1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -1.001632 3.629242 3.183493 | 1.0000 0.0064 0.0327 | 0.8 | -0.178597 0.823035 4.321306 | 1.0000 1.0000 0.0003 | 0.9 | 0.513466 1.515098 0.692063 | 1.0000 1.0000 1.0000Figure C.66: Dunn’s test on CLG MMS scores by asymmetric division rate141 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 140.3138, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 0.997268 | 1.0000 | 0.2 | 0.800047 -0.197221 | 1.0000 1.0000 | 0.3 | 2.367396 1.370128 1.567349 | 0.4031 1.0000 1.0000 | 0.4 | 2.554198 1.556929 1.754150 0.186801 | 0.2395 1.0000 1.0000 1.0000 | 0.5 | 3.548490 2.551221 2.748442 1.181093 0.994291 | 0.0087 0.2415 0.1347 1.0000 1.0000 | 0.6 | 5.839975 4.842707 5.039928 3.472578 3.285777 2.291485 | 0.0000 0.0000 0.0000 0.0116 0.0229 0.4935 | 0.7 | 6.149575 5.152306 5.349527 3.782178 3.595376 2.601085 | 0.0000 0.0000 0.0000 0.0035 0.0073 0.2091 | 0.8 | 6.806731 5.809462 6.006683 4.439334 4.252532 3.258240 | 0.0000 0.0000 0.0000 0.0002 0.0005 0.0252 | 0.9 | 7.872468 6.875200 7.072421 5.505072 5.318270 4.323978 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.309599 6.806731 5.809462 | 1.0000 0.0000 0.0000 | 0.8 | 0.966755 0.657155 7.872468 | 1.0000 1.0000 0.0000 | 0.9 | 2.032493 1.722893 1.065737 | 0.9473 1.0000 1.0000Figure C.67: Dunn’s test on CLG RF scores by asymmetric division rate142 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 43.3897, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 2.887158 | 0.0875 | 0.2 | 2.500764 -0.386393 | 0.2788 1.0000 | 0.3 | 4.618114 1.730955 2.117349 | 0.0001 1.0000 
0.7702 | 0.4 | 4.721599 1.834440 2.220834 0.103485 | 0.0001 1.0000 0.5931 1.0000 | 0.5 | 3.758964 0.871805 1.258199 -0.859149 -0.962634 | 0.0038 1.0000 1.0000 1.0000 1.0000 | 0.6 | 3.784277 0.897118 1.283512 -0.833836 -0.937321 0.025312 | 0.0035 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.7 | 2.909493 0.022334 0.408728 -1.708620 -1.812105 -0.849470 | 0.0815 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.8 | 1.666928 -1.220230 -0.833836 -2.951185 -3.054670 -2.092036 | 1.0000 1.0000 1.0000 0.0712 0.0507 0.8198 | 0.9 | 0.907541 -1.979617 -1.593223 -3.710572 -3.814057 -2.851423 | 1.0000 1.0000 1.0000 0.0047 0.0031 0.0979Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.874783 1.666928 -1.220230 | 1.0000 1.0000 1.0000 | 0.8 | -2.117349 -1.242565 0.907541 | 0.7702 1.0000 1.0000 | 0.9 | -2.876735 -2.001952 -0.759386 | 0.0904 1.0000 1.0000Figure C.68: Dunn’s test on CLG FR scores by asymmetric division rate143 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 210.2701, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 0.967057 | 1.0000 | 0.2 | 1.954959 0.987902 | 1.0000 1.0000 | 0.3 | 2.832681 1.865623 0.877721 | 0.1039 1.0000 1.0000 | 0.4 | 3.551087 2.584030 1.596128 0.718406 | 0.0086 0.2197 1.0000 1.0000 | 0.5 | 4.603758 3.636700 2.648798 1.771077 1.052670 | 0.0001 0.0062 0.1818 1.0000 1.0000 | 0.6 | 6.907126 5.940069 4.952166 4.074445 3.356038 2.303368 | 0.0000 0.0000 0.0000 0.0010 0.0178 0.4783 | 0.7 | 7.506418 6.539361 5.551459 4.673737 3.955331 2.902660 | 0.0000 0.0000 0.0000 0.0001 0.0017 0.0833 | 0.8 | 8.982688 8.015631 7.027729 6.150007 5.431600 4.378930 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0003 | 0.9 | 9.766608 8.799550 7.811648 6.933926 6.215520 5.162849 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 
0.599292 8.982688 8.015631 | 1.0000 0.0000 0.0000 | 0.8 | 2.075562 1.476269 9.766608 | 0.8535 1.0000 0.0000 | 0.9 | 2.859481 2.260189 0.783919 | 0.0955 0.5357 1.0000Figure C.69: Dunn’s test on CLG FP scores by asymmetric division rate144 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 204.0847, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 1.109606 | 1.0000 | 0.2 | 2.782574 1.672968 | 0.1213 1.0000 | 0.3 | 4.323818 3.214212 1.541244 | 0.0003 0.0294 1.0000 | 0.4 | 4.978717 3.869110 2.196142 0.654898 | 0.0000 0.0025 0.6318 1.0000 | 0.5 | 6.603312 5.493705 3.820737 2.279493 1.624594 | 0.0000 0.0000 0.0030 0.5093 1.0000 | 0.6 | 7.746408 6.636801 4.963833 3.422589 2.767690 1.143095 | 0.0000 0.0000 0.0000 0.0140 0.1270 1.0000 | 0.7 | 8.636474 7.526868 5.853900 4.312655 3.657757 2.033162 | 0.0000 0.0000 0.0000 0.0004 0.0057 0.9458 | 0.8 | 8.934155 7.824549 6.151581 4.610337 3.955438 2.330843 | 0.0000 0.0000 0.0000 0.0001 0.0017 0.4446 | 0.9 | 9.449888 8.340282 6.667313 5.126069 4.471171 2.846576 | 0.0000 0.0000 0.0000 0.0000 0.0002 0.0994Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.890066 8.934155 7.824549 | 1.0000 0.0000 0.0000 | 0.8 | 1.187747 0.297681 9.449888 | 1.0000 1.0000 0.0000 | 0.9 | 1.703480 0.813413 0.515732 | 1.0000 1.0000 1.0000Figure C.70: Dunn’s test on CLG FC scores by asymmetric division rate145 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 188.3833, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.382483 | 1.0000 | 0.2 | -1.170518 -0.788034 | 1.0000 1.0000 | 0.3 | -2.094728 -1.712245 -0.924210 | 0.8144 1.0000 1.0000 | 0.4 | -3.346357 -2.963873 -2.175839 -1.251628 | 0.0184 0.0684 0.6653 1.0000 
| 0.5 | -4.648586 -4.266103 -3.478068 -2.553857 -1.302229 | 0.0001 0.0004 0.0114 0.2397 1.0000 | 0.6 | -6.018531 -5.636048 -4.848013 -3.923803 -2.672174 -1.369945 | 0.0000 0.0000 0.0000 0.0020 0.1696 1.0000 | 0.7 | -6.988134 -6.605651 -5.817616 -4.893405 -3.641777 -2.339548 | 0.0000 0.0000 0.0000 0.0000 0.0061 0.4344 | 0.8 | -8.169070 -7.786587 -6.998552 -6.074341 -4.822713 -3.520484 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0097 | 0.9 | -8.741307 -8.358824 -7.570789 -6.646578 -5.394950 -4.092720 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0010Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.969602 -8.169070 -7.786587 | 1.0000 0.0000 0.0000 | 0.8 | -2.150538 -1.180935 -8.741307 | 0.7090 1.0000 0.0000 | 0.9 | -2.722775 -1.753172 -0.572236 | 0.1457 1.0000 1.0000Figure C.71: Dunn’s test on PEC MS scores by asymmetric division rate146 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 160.7072, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.913063 | 1.0000 | 0.2 | -1.646788 -0.733725 | 1.0000 1.0000 | 0.3 | -2.365630 -1.452567 -0.718842 | 0.4050 1.0000 1.0000 | 0.4 | -3.591234 -2.678171 -1.944445 -1.225603 | 0.0074 0.1666 1.0000 1.0000 | 0.5 | -4.772189 -3.859126 -3.125401 -2.406558 -1.180955 | 0.0000 0.0026 0.0400 0.3623 1.0000 | 0.6 | -5.428524 -4.515460 -3.781735 -3.062893 -1.837289 -0.656334 | 0.0000 0.0001 0.0035 0.0493 1.0000 1.0000 | 0.7 | -6.762772 -5.849709 -5.115983 -4.397141 -3.171537 -1.990582 | 0.0000 0.0000 0.0000 0.0002 0.0341 1.0000 | 0.8 | -7.976470 -7.063406 -6.329681 -5.610839 -4.385235 -3.204280 | 0.0000 0.0000 0.0000 0.0000 0.0003 0.0305 | 0.9 | -8.557645 -7.644582 -6.910857 -6.192014 -4.966411 -3.785456 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0035Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -1.334248 -7.976470 -7.063406 | 
1.0000 0.0000 0.0000 | 0.8 | -2.547946 -1.213697 -8.557645 | 0.2438 1.0000 0.0000 | 0.9 | -3.129121 -1.794873 -0.581175 | 0.0394 1.0000 1.0000Figure C.72: Dunn’s test on PEC MMS scores by asymmetric division rate147 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 72.8186, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -1.285277 | 1.0000 | 0.2 | -1.158014 0.127262 | 1.0000 1.0000 | 0.3 | -1.676740 -0.391462 -0.518725 | 1.0000 1.0000 1.0000 | 0.4 | -3.398133 -2.112856 -2.240118 -1.721393 | 0.0153 0.7788 0.5644 1.0000 | 0.5 | -3.663822 -2.378544 -2.505807 -1.987082 -0.265688 | 0.0056 0.3911 0.2749 1.0000 1.0000 | 0.6 | -3.575259 -2.289981 -2.417244 -1.898519 -0.177125 0.088562 | 0.0079 0.4955 0.3519 1.0000 1.0000 1.0000 | 0.7 | -4.559125 -3.273847 -3.401110 -2.882385 -1.160991 -0.895303 | 0.0001 0.0239 0.0151 0.0888 1.0000 1.0000 | 0.8 | -5.904685 -4.619407 -4.746670 -4.227945 -2.506551 -2.240862 | 0.0000 0.0001 0.0000 0.0005 0.2743 0.5633 | 0.9 | -5.761049 -4.475772 -4.603034 -4.084309 -2.362915 -2.097227 | 0.0000 0.0002 0.0001 0.0010 0.4080 0.8094Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.983866 -5.904685 -4.619407 | 1.0000 0.0000 0.0001 | 0.8 | -2.329425 -1.345559 -5.761049 | 0.4463 1.0000 0.0000 | 0.9 | -2.185790 -1.201924 0.143635 | 0.6487 1.0000 1.0000Figure C.73: Dunn’s test on PEC RF scores by asymmetric division rate148 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 194.6317, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.542351 | 1.0000 | 0.2 | -1.867269 -1.324918 | 1.0000 1.0000 | 0.3 | -1.768795 -1.226444 0.098473 | 1.0000 1.0000 1.0000 | 0.4 | -2.922877 -2.380526 -1.055607 -1.154081 | 0.0780 
0.3890 1.0000 1.0000 | 0.5 | -4.705101 -4.162750 -2.837831 -2.936305 -1.782223 | 0.0001 0.0007 0.1022 0.0747 1.0000 | 0.6 | -5.872610 -5.330259 -4.005341 -4.103815 -2.949733 -1.167509 | 0.0000 0.0000 0.0014 0.0009 0.0716 1.0000 | 0.7 | -6.752905 -6.210554 -4.885636 -4.984109 -3.830028 -2.047804 | 0.0000 0.0000 0.0000 0.0000 0.0029 0.9130 | 0.8 | -8.449338 -7.906987 -6.582068 -6.680542 -5.526461 -3.744237 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0041 | 0.9 | -9.268459 -8.726108 -7.401190 -7.499664 -6.345582 -4.563358 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.880294 -8.449338 -7.906987 | 1.0000 0.0000 0.0000 | 0.8 | -2.576727 -1.696432 -9.268459 | 0.2244 1.0000 0.0000 | 0.9 | -3.395849 -2.515554 -0.819121 | 0.0154 0.2674 1.0000Figure C.74: Dunn’s test on PEC FR scores by asymmetric division rate149 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 5.9494, df = 9, p-value = 0.74 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.449394 | 1.0000 | 0.2 | -0.055924 0.393469 | 1.0000 1.0000 | 0.3 | -1.383136 -0.933742 -1.327212 | 1.0000 1.0000 1.0000 | 0.4 | -1.442057 -0.992662 -1.386132 -0.058920 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | -0.627155 -0.177760 -0.571230 0.755981 0.814902 | 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.6 | -0.378490 0.070904 -0.322565 1.004646 1.063567 0.248665 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.7 | -0.428422 0.020971 -0.372498 0.954713 1.013634 0.198732 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.8 | -1.056576 -0.607182 -1.000652 0.326560 0.385480 -0.429421 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.9 | 0.238678 0.688073 0.294603 1.621815 1.680735 0.865833 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.049932 -1.056576 -0.607182 | 
1.0000 1.0000 1.0000 | 0.8 | -0.678086 -0.628153 0.238678 | 1.0000 1.0000 1.0000 | 0.9 | 0.617168 0.667101 1.295255 | 1.0000 1.0000 1.0000Figure C.75: Dunn’s test on PEC FP scores by asymmetric division rate150 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 127.1582, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -1.731254 | 1.0000 | 0.2 | -3.249557 -1.518303 | 0.0260 1.0000 | 0.3 | -4.226890 -2.495636 -0.977332 | 0.0005 0.2829 1.0000 | 0.4 | -5.092143 -3.360889 -1.842586 -0.865253 | 0.0000 0.0175 1.0000 1.0000 | 0.5 | -6.914555 -5.183301 -3.664998 -2.687665 -1.822412 | 0.0000 0.0000 0.0056 0.1619 1.0000 | 0.6 | -6.535727 -4.804473 -3.286169 -2.308836 -1.443583 0.378828 | 0.0000 0.0000 0.0229 0.4714 1.0000 1.0000 | 0.7 | -7.061006 -5.329752 -3.811448 -2.834116 -1.968862 -0.146450 | 0.0000 0.0000 0.0031 0.1034 1.0000 1.0000 | 0.8 | -7.480930 -5.749676 -4.231373 -3.254040 -2.388786 -0.566374 | 0.0000 0.0000 0.0005 0.0256 0.3803 1.0000 | 0.9 | -7.792511 -6.061257 -4.542954 -3.565621 -2.700367 -0.877955 | 0.0000 0.0000 0.0001 0.0082 0.1558 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.525279 -7.480930 -5.749676 | 1.0000 0.0000 0.0000 | 0.8 | -0.945203 -0.419924 -7.792511 | 1.0000 1.0000 0.0000 | 0.9 | -1.256784 -0.731505 -0.311580 | 1.0000 1.0000 1.0000Figure C.76: Dunn’s test on PEC FC scores by asymmetric division rate151 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 230.1078, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 1.032891 | 1.0000 | 0.2 | 1.646078 0.613186 | 1.0000 1.0000 | 0.3 | 2.584462 1.551570 0.938383 | 0.2194 1.0000 1.0000 | 0.4 | 3.823485 2.790594 2.177407 1.239023 | 0.0030 0.1184 
0.6626 1.0000 | 0.5 | 4.931537 3.898645 3.285459 2.347075 1.108051 | 0.0000 0.0022 0.0229 0.4257 1.0000 | 0.6 | 7.190802 6.157910 5.544724 4.606340 3.367316 2.259264 | 0.0000 0.0000 0.0000 0.0001 0.0171 0.5370 | 0.7 | 7.822593 6.789701 6.176514 5.238131 3.999107 2.891055 | 0.0000 0.0000 0.0000 0.0000 0.0014 0.0864 | 0.8 | 8.857717 7.824825 7.211639 6.273255 5.034231 3.926179 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0019 | 0.9 | 10.33933 9.306445 8.693258 7.754874 6.515851 5.407799 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.631790 8.857717 7.824825 | 1.0000 0.0000 0.0000 | 0.8 | 1.666914 1.035124 10.33933 | 1.0000 1.0000 0.0000 | 0.9 | 3.148534 2.516743 1.481619 | 0.0369 0.2665 1.0000Figure C.77: Dunn’s test on MST MS scores by asymmetric division rate152 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 233.5122, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 1.169928 | 1.0000 | 0.2 | 1.833781 0.663852 | 1.0000 1.0000 | 0.3 | 2.762579 1.592650 0.928798 | 0.1290 1.0000 1.0000 | 0.4 | 3.826827 2.656898 1.993046 1.064248 | 0.0029 0.1774 1.0000 1.0000 | 0.5 | 4.963265 3.793337 3.129484 2.200686 1.136438 | 0.0000 0.0033 0.0394 0.6246 1.0000 | 0.6 | 7.426665 6.256736 5.592883 4.664085 3.599837 2.463399 | 0.0000 0.0000 0.0000 0.0001 0.0072 0.3097 | 0.7 | 8.047352 6.877423 6.213571 5.284772 4.220524 3.084086 | 0.0000 0.0000 0.0000 0.0000 0.0005 0.0459 | 0.8 | 9.016338 7.846410 7.182557 6.253759 5.189511 4.053073 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0011 | 0.9 | 10.42218 9.252259 8.588407 7.659608 6.595360 5.458922 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.620687 9.016338 7.846410 | 1.0000 0.0000 0.0000 | 0.8 | 1.589673 0.968986 10.42218 | 1.0000 
1.0000 0.0000 | 0.9 | 2.995523 2.374835 1.405849 | 0.0616 0.3950 1.0000Figure C.78: Dunn’s test on MST MMS scores by asymmetric division rate153 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 206.3323, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 0.830679 | 1.0000 | 0.2 | 1.371067 0.540388 | 1.0000 1.0000 | 0.3 | 2.445144 1.614465 1.074077 | 0.3258 1.0000 1.0000 | 0.4 | 3.351001 2.520322 1.979934 0.905857 | 0.0181 0.2638 1.0000 1.0000 | 0.5 | 3.990386 3.159707 2.619319 1.545242 0.639384 | 0.0015 0.0355 0.1982 1.0000 1.0000 | 0.6 | 6.757084 5.926405 5.386017 4.311940 3.406082 2.766697 | 0.0000 0.0000 0.0000 0.0004 0.0148 0.1274 | 0.7 | 7.351809 6.521130 5.980741 4.906664 4.000807 3.361422 | 0.0000 0.0000 0.0000 0.0000 0.0014 0.0174 | 0.8 | 8.100611 7.269932 6.729544 5.655467 4.749609 4.110224 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0009 | 0.9 | 9.770157 8.939478 8.399090 7.325013 6.419156 5.779771 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.594724 8.100611 7.269932 | 1.0000 0.0000 0.0000 | 0.8 | 1.343526 0.748802 9.770157 | 1.0000 1.0000 0.0000 | 0.9 | 3.013073 2.418348 1.669546 | 0.0582 0.3508 1.0000Figure C.79: Dunn’s test on MST RF scores by asymmetric division rate154 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 23.0449, df = 9, p-value = 0.01 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 0.634504 | 1.0000 | 0.2 | -0.714491 -1.348996 | 1.0000 1.0000 | 0.3 | -0.028759 -0.663263 0.685732 | 1.0000 1.0000 1.0000 | 0.4 | -0.382859 -1.017364 0.331631 -0.354100 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | -2.183918 -2.818422 -1.469426 -2.155158 -1.801058 | 0.6518 0.1086 1.0000 0.7009 1.0000 
| 0.6 | -1.562894 -2.197398 -0.848402 -1.534135 -1.180034 0.621023 | 1.0000 0.6298 1.0000 1.0000 1.0000 1.0000 | 0.7 | -1.802855 -2.437360 -1.088364 -1.774096 -1.419996 0.381062 | 1.0000 0.3329 1.0000 1.0000 1.0000 1.0000 | 0.8 | -2.239639 -2.874143 -1.525147 -2.210879 -1.856779 -0.055721 | 0.5651 0.0912 1.0000 0.6085 1.0000 1.0000 | 0.9 | -2.620701 -3.255206 -1.906209 -2.591942 -2.237841 -0.436783 | 0.1974 0.0255 1.0000 0.2147 0.5677 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.239961 -2.239639 -2.874143 | 1.0000 0.5651 0.0912 | 0.8 | -0.676744 -0.436783 -2.620701 | 1.0000 1.0000 0.1974 | 0.9 | -1.057807 -0.817845 -0.381062 | 1.0000 1.0000 1.0000Figure C.80: Dunn’s test on MST FR scores by asymmetric division rate155 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 227.0922, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 0.401086 | 1.0000 | 0.2 | 1.343155 0.942069 | 1.0000 1.0000 | 0.3 | 1.724150 1.323064 0.380994 | 1.0000 1.0000 1.0000 | 0.4 | 3.297986 2.896900 1.954830 1.573836 | 0.0219 0.0848 1.0000 1.0000 | 0.5 | 4.139598 3.738511 2.796442 2.415447 0.841611 | 0.0008 0.0042 0.1163 0.3536 1.0000 | 0.6 | 6.394313 5.993227 5.051158 4.670163 3.096327 2.254715 | 0.0000 0.0000 0.0000 0.0001 0.0441 0.5434 | 0.7 | 7.129514 6.728428 5.786359 5.405364 3.831528 2.989916 | 0.0000 0.0000 0.0000 0.0000 0.0029 0.0628 | 0.8 | 8.557501 8.156415 7.214345 6.833350 5.259514 4.417903 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0002 | 0.9 | 10.03832 9.637234 8.695165 8.314170 6.740334 5.898722 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.735200 8.557501 8.156415 | 1.0000 0.0000 0.0000 | 0.8 | 2.163187 1.427986 10.03832 | 0.6869 1.0000 0.0000 | 0.9 | 3.644007 2.908806 1.480819 | 0.0060 0.0816 1.0000Figure 
C.81: Dunn’s test on MST FP scores by asymmetric division rate156 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 263.3725, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 1.011452 | 1.0000 | 0.2 | 2.309446 1.297993 | 0.4707 1.0000 | 0.3 | 3.868676 2.857223 1.559230 | 0.0025 0.0962 1.0000 | 0.4 | 4.603263 3.591811 2.293817 0.734587 | 0.0001 0.0074 0.4905 1.0000 | 0.5 | 6.291995 5.280542 3.982548 2.423318 1.688731 | 0.0000 0.0000 0.0015 0.3460 1.0000 | 0.6 | 7.867599 6.856146 5.558152 3.998922 3.264335 1.575603 | 0.0000 0.0000 0.0000 0.0014 0.0247 1.0000 | 0.7 | 8.421330 7.409878 6.111884 4.552654 3.818066 2.129335 | 0.0000 0.0000 0.0000 0.0001 0.0030 0.7476 | 0.8 | 10.07359 9.062140 7.764146 6.204916 5.470329 3.781598 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0035 | 0.9 | 11.15649 10.14504 8.847048 7.287818 6.553231 4.864500 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.553731 10.07359 9.062140 | 1.0000 0.0000 0.0000 | 0.8 | 2.205994 1.652262 11.15649 | 0.6161 1.0000 0.0000 | 0.9 | 3.288896 2.735164 1.082902 | 0.0226 0.1403 1.0000Figure C.82: Dunn’s test on MST FC scores by asymmetric division rate157 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 258.2899, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -1.354480 | 1.0000 | 0.2 | -2.396216 -1.041736 | 0.3727 1.0000 | 0.3 | -4.049293 -2.694812 -1.653076 | 0.0012 0.1585 1.0000 | 0.4 | -4.465541 -3.111060 -2.069324 -0.416247 | 0.0002 0.0419 0.8666 1.0000 | 0.5 | -6.423915 -5.069435 -4.027699 -2.374622 -1.958374 | 0.0000 0.0000 0.0013 0.3953 1.0000 | 0.6 | -7.942214 -6.587734 -5.545997 -3.892921 -3.476673 -1.518298 | 0.0000 
0.0000 0.0000 0.0022 0.0114 1.0000 | 0.7 | -8.906509 -7.552028 -6.510292 -4.857215 -4.440968 -2.482593 | 0.0000 0.0000 0.0000 0.0000 0.0002 0.2935 | 0.8 | -9.875271 -8.520791 -7.479055 -5.825978 -5.409730 -3.451356 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0125 | 0.9 | -11.11880 -9.764322 -8.722585 -7.069509 -6.653261 -4.694886 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.964294 -9.875271 -8.520791 | 1.0000 0.0000 0.0000 | 0.8 | -1.933057 -0.968762 -11.11880 | 1.0000 1.0000 0.0000 | 0.9 | -3.176587 -2.212293 -1.243530 | 0.0335 0.6063 1.0000Figure C.83: Dunn’s test on NJ MS scores by asymmetric division rate158 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 260.2732, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -1.340045 | 1.0000 | 0.2 | -2.501865 -1.161819 | 0.2780 1.0000 | 0.3 | -4.043252 -2.703207 -1.541387 | 0.0012 0.1545 1.0000 | 0.4 | -4.589859 -3.249814 -2.087994 -0.546607 | 0.0001 0.0260 0.8280 1.0000 | 0.5 | -6.404178 -5.064132 -3.902312 -2.360925 -1.814318 | 0.0000 0.0000 0.0021 0.4102 1.0000 | 0.6 | -7.970919 -6.630874 -5.469054 -3.927667 -3.381059 -1.566741 | 0.0000 0.0000 0.0000 0.0019 0.0162 1.0000 | 0.7 | -8.978377 -7.638332 -6.476512 -4.935124 -4.388517 -2.574199 | 0.0000 0.0000 0.0000 0.0000 0.0003 0.2261 | 0.8 | -9.937363 -8.597318 -7.435498 -5.894110 -5.347503 -3.533185 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0092 | 0.9 | -11.18419 -9.844149 -8.682329 -7.140941 -6.594334 -4.780016 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -1.007457 -9.937363 -8.597318 | 1.0000 0.0000 0.0000 | 0.8 | -1.966443 -0.958986 -11.18419 | 1.0000 1.0000 0.0000 | 0.9 | -3.213274 -2.205817 -1.246831 | 0.0295 0.6164 1.0000Figure C.84: Dunn’s test on NJ MMS 
scores by asymmetric division rate159 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 243.067, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -1.368038 | 1.0000 | 0.2 | -1.985405 -0.617367 | 1.0000 1.0000 | 0.3 | -3.799527 -2.431488 -1.814121 | 0.0033 0.3383 1.0000 | 0.4 | -3.986450 -2.618411 -2.001044 -0.186923 | 0.0015 0.1988 1.0000 1.0000 | 0.5 | -6.479750 -5.111711 -4.494344 -2.680223 -2.493299 | 0.0000 0.0000 0.0002 0.1655 0.2848 | 0.6 | -7.735336 -6.367298 -5.749931 -3.935809 -3.748886 -1.255586 | 0.0000 0.0000 0.0000 0.0019 0.0040 1.0000 | 0.7 | -8.481539 -7.113501 -6.496133 -4.682012 -4.495089 -2.001789 | 0.0000 0.0000 0.0000 0.0001 0.0002 1.0000 | 0.8 | -9.443708 -8.075670 -7.458303 -5.644181 -5.457258 -2.963958 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0683 | 0.9 | -10.63748 -9.269445 -8.652078 -6.837957 -6.651034 -4.157734 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0007Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.746202 -9.443708 -8.075670 | 1.0000 0.0000 0.0000 | 0.8 | -1.708372 -0.962169 -10.63748 | 1.0000 1.0000 0.0000 | 0.9 | -2.902147 -2.155944 -1.193775 | 0.0834 0.6995 1.0000Figure C.85: Dunn’s test on NJ RF scores by asymmetric division rate160 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 257.7784, df = 9, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | -0.798261 | 1.0000 | 0.2 | -2.260004 -1.461742 | 0.5360 1.0000 | 0.3 | -3.301021 -2.502759 -1.041016 | 0.0217 0.2773 1.0000 | 0.4 | -4.081411 -3.283149 -1.821407 -0.780390 | 0.0010 0.0231 1.0000 1.0000 | 0.5 | -5.849948 -5.051686 -3.589944 -2.548927 -1.768537 | 0.0000 0.0000 0.0074 0.2431 1.0000 | 0.6 | -7.549978 -6.751716 -5.289973 -4.248956 -3.468566 
-1.700029 | 0.0000 0.0000 0.0000 0.0005 0.0118 1.0000 | 0.7 | -8.337070 -7.538808 -6.077065 -5.036049 -4.255658 -2.487121 | 0.0000 0.0000 0.0000 0.0000 0.0005 0.2898 | 0.8 | -9.614884 -8.816623 -7.354880 -6.313863 -5.533473 -3.764936 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0037 | 0.9 | -11.01035 -10.21209 -8.750349 -7.709332 -6.928942 -5.160405 | 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | -0.787092 -9.614884 -8.816623 | 1.0000 0.0000 0.0000 | 0.8 | -2.064906 -1.277814 -11.01035 | 0.8760 1.0000 0.0000 | 0.9 | -3.460375 -2.673283 -1.395468 | 0.0121 0.1690 1.0000Figure C.86: Dunn’s test on NJ FR scores by asymmetric division rate161 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 7.0472, df = 9, p-value = 0.63 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.1 0.2 0.3 0.4 0.5---------+------------------------------------------------------------------ 0.1 | 0.000000 | 1.0000 | 0.2 | 0.000000 0.000000 | 1.0000 1.0000 | 0.3 | 0.000000 0.000000 0.000000 | 1.0000 1.0000 1.0000 | 0.4 | 0.000000 0.000000 0.000000 0.000000 | 1.0000 1.0000 1.0000 1.0000 | 0.5 | -1.299637 -1.299637 -1.299637 -1.299637 -1.299637 | 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.6 | -1.299637 -1.299637 -1.299637 -1.299637 -1.299637 0.000000 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.7 | -1.286683 -1.286683 -1.286683 -1.286683 -1.286683 0.012953 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.8 | 0.000000 0.000000 0.000000 0.000000 0.000000 1.299637 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 | 0.9 | 0.000000 0.000000 0.000000 0.000000 0.000000 1.299637 | 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000Col Mean-|Row Mean | 0.6 0.7 0.8---------+--------------------------------- 0.7 | 0.012953 0.000000 0.000000 | 1.0000 1.0000 1.0000 | 0.8 | 1.299637 1.286683 0.000000 | 1.0000 1.0000 1.0000 | 0.9 | 1.299637 1.286683 0.000000 | 1.0000 1.0000 1.0000Figure C.87: Dunn’s test on NJ FP scores by 
asymmetric division rate

  Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 256.5053, df = 9, p-value = 0

                  Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0        0.1        0.2        0.3        0.4        0.5
---------+------------------------------------------------------------------
     0.1 |  -1.155975
         |     1.0000
         |
     0.2 |  -2.688592  -1.532617
         |     0.1614     1.0000
         |
     0.3 |  -3.527475  -2.371499  -0.838882
         |     0.0094     0.3986     1.0000
         |
     0.4 |  -5.338479  -4.182503  -2.649886  -1.811003
         |     0.0000     0.0006     0.1812     1.0000
         |
     0.5 |  -6.447561  -5.291585  -3.758968  -2.920085  -1.109081
         |     0.0000     0.0000     0.0038     0.0787     1.0000
         |
     0.6 |  -7.596837  -6.440862  -4.908245  -4.069362  -2.258358  -1.149276
         |     0.0000     0.0000     0.0000     0.0011     0.5383     1.0000
         |
     0.7 |  -8.621808  -7.465832  -5.933215  -5.094332  -3.283328  -2.174246
         |     0.0000     0.0000     0.0000     0.0000     0.0231     0.6679
         |
     0.8 |  -10.10753  -8.951555  -7.418938  -6.580055  -4.769051  -3.659969
         |     0.0000     0.0000     0.0000     0.0000     0.0000     0.0057
         |
     0.9 |  -11.09379  -9.937819  -8.405202  -7.566319  -5.755315  -4.646233
         |     0.0000     0.0000     0.0000     0.0000     0.0000     0.0001

Col Mean-|
Row Mean |        0.6        0.7        0.8
---------+---------------------------------
     0.7 |  -1.024970  -10.10753  -8.951555
         |     1.0000     0.0000     0.0000
         |
     0.8 |  -2.510693  -1.485722  -11.09379
         |     0.2711     1.0000     0.0000
         |
     0.9 |  -3.496957  -2.471986  -0.986263
         |     0.0106     0.3023     1.0000

Figure C.88: Dunn’s test on NJ FC scores by asymmetric division rate

C.4 Mutation Rate

  Kruskal-Wallis rank sum test

data: x and group
Kruskal-Wallis chi-squared = 14.9528, df = 5, p-value = 0.01

            Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   0.793268
         |     1.0000
         |
   0.006 |   1.950945   1.157676
         |     0.3830     1.0000
         |
   0.008 |   2.883036   2.089767   0.932090
         |     0.0295     0.2748     1.0000
         |
    0.01 |   3.215217   2.421948   1.264272   0.332181
         |     0.0098     0.1158     1.0000     1.0000
         |
    0.02 |   1.955903   1.162634   0.004957  -0.927132  -1.259314
         |     0.3786     1.0000     1.0000     1.0000     1.0000

Figure C.89: Dunn’s test on RG MS scores by mutation rate
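In each table of this appendix, the upper number in a cell is Dunn's z statistic for the row group versus the column group and the lower number is its Bonferroni-adjusted p-value. The sketch below shows how the two appear to be related. It assumes, as an inference from the reported values rather than something stated in the text, that the p-value is the one-sided normal tail probability of |z| multiplied by the number of pairwise comparisons and capped at 1. The function names are hypothetical and are not taken from any statistics package.

```python
import math


def one_sided_p(z):
    """Upper-tail standard normal probability P(Z >= |z|)."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni_adjust(z, n_groups):
    """Scale the one-sided p-value by the number of pairwise comparisons.

    Assumption: m = k(k - 1)/2 comparisons for k groups, result capped at 1.
    """
    m = n_groups * (n_groups - 1) // 2
    return min(1.0, one_sided_p(z) * m)


# z statistics copied from Figure C.89 (six mutation-rate groups, so m = 15).
z_values = {
    ("0.008", "0.002"): 2.883036,
    ("0.01", "0.002"): 3.215217,
    ("0.004", "0.002"): 0.793268,
}

adjusted = {pair: bonferroni_adjust(z, 6) for pair, z in z_values.items()}
for pair, p in adjusted.items():
    print(pair, f"{p:.4f}")
```

Under these assumptions the three values should come out close to the 0.0295, 0.0098, and 1.0000 reported in Figure C.89. The tables for asymmetric division rate compare ten groups, so the same calculation would use 45 comparisons there.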
Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 8.6481, df = 5, p-value = 0.12

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   1.458309
         |     1.0000
   0.006 |   2.382607   0.924297
         |     0.1289     1.0000
   0.008 |   2.318178   0.859869  -0.064428
         |     0.1533     1.0000     1.0000
    0.01 |   2.246316   0.788007  -0.136290  -0.071862
         |     0.1851     1.0000     1.0000     1.0000
    0.02 |   1.154752  -0.303556  -1.227854  -1.163425  -1.091563
         |     1.0000     1.0000     1.0000     1.0000     1.0000

Figure C.90: Dunn’s test on RG MMS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 75.2519, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   1.396058
         |     1.0000
   0.006 |   3.620329   2.224270
         |     0.0022     0.1960
   0.008 |   5.648705   4.252646   2.028376
         |     0.0000     0.0002     0.3189
    0.01 |   6.469478   5.073420   2.849149   0.820773
         |     0.0000     0.0000     0.0329     1.0000
    0.02 |   6.432283   5.036224   2.811954   0.783578  -0.037195
         |     0.0000     0.0000     0.0369     1.0000     1.0000

Figure C.91: Dunn’s test on RG RF scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 18.115, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -0.245976
         |     1.0000
   0.006 |  -0.432322  -0.186345
         |     1.0000     1.0000
   0.008 |  -0.520526  -0.274549  -0.088203
         |     1.0000     1.0000     1.0000
    0.01 |  -2.572816  -2.326839  -2.140493  -2.052289
         |     0.0757     0.1498     0.2424     0.3011
    0.02 |  -3.145519  -2.899542  -2.713196  -2.624993  -0.572703
         |     0.0124     0.0280     0.0500     0.0650     1.0000

Figure C.92: Dunn’s test on RG FR scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 67.0958, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   3.109987
         |     0.0140
   0.006 |   4.638440   1.528453
         |     0.0000     0.9480
   0.008 |   4.916093   1.806106   0.277653
         |     0.0000     0.5318     1.0000
    0.01 |   6.426853   3.316866   1.788412   1.510759
         |     0.0000     0.0068     0.5528     0.9814
    0.02 |   7.244841   4.134854   2.606400   2.328747   0.817987
         |     0.0000     0.0003     0.0686     0.1490     1.0000

Figure C.93: Dunn’s test on RG FP scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 41.0419, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -0.775261
         |     1.0000
   0.006 |  -0.755414   0.019846
         |     1.0000     1.0000
   0.008 |  -1.302438  -0.527177  -0.547024
         |     1.0000     1.0000     1.0000
    0.01 |  -2.902577  -2.127316  -2.147163  -1.600139
         |     0.0278     0.2505     0.2384     0.8218
    0.02 |  -5.502493  -4.727232  -4.747079  -4.200055  -2.599915
         |     0.0000     0.0000     0.0000     0.0002     0.0699

Figure C.94: Dunn’s test on RG FC scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 3.5468, df = 5, p-value = 0.62

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -0.065658
         |     1.0000
   0.006 |  -0.879579  -0.813920
         |     1.0000     1.0000
   0.008 |   0.727201   0.792860   1.606780
         |     1.0000     1.0000     0.8108
    0.01 |  -0.676408  -0.610750   0.203170  -1.403610
         |     1.0000     1.0000     1.0000     1.0000
    0.02 |  -0.629332  -0.563674   0.250246  -1.356534   0.047076
         |     1.0000     1.0000     1.0000     1.0000     1.0000

Figure C.95: Dunn’s test on CLG MS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 3.3302, df = 5, p-value = 0.65

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -0.074333
         |     1.0000
   0.006 |  -0.033450   0.040883
         |     1.0000     1.0000
   0.008 |   0.377861   0.452195   0.411311
         |     1.0000     1.0000     1.0000
    0.01 |  -1.309507  -1.235173  -1.276057  -1.687368
         |     1.0000     1.0000     1.0000     0.6865
    0.02 |  -0.350606  -0.276272  -0.317155  -0.728467   0.958901
         |     1.0000     1.0000     1.0000     1.0000     1.0000

Figure C.96: Dunn’s test on CLG MMS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 5.1217, df = 5, p-value = 0.4

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -2.070308
         |     0.2882
   0.006 |  -1.366155   0.704152
         |     1.0000     1.0000
   0.008 |  -1.601699   0.468608  -0.235544
         |     0.8192     1.0000     1.0000
    0.01 |  -1.677322   0.392986  -0.311166  -0.075622
         |     0.7011     1.0000     1.0000     1.0000
    0.02 |  -1.585583   0.484724  -0.219427   0.016116   0.091738
         |     0.8463     1.0000     1.0000     1.0000     1.0000

Figure C.97: Dunn’s test on CLG RF scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 27.9114, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -2.705967
         |     0.0511
   0.006 |  -2.027925   0.678041
         |     0.3193     1.0000
   0.008 |  -3.470777  -0.764810  -1.442851
         |     0.0039     1.0000     1.0000
    0.01 |  -4.272775  -1.566808  -2.244849  -0.801997
         |     0.0001     0.8787     0.1858     1.0000
    0.02 |  -4.502094  -1.796127  -2.474168  -1.031316  -0.229319
         |     0.0001     0.5436     0.1002     1.0000     1.0000

Figure C.98: Dunn’s test on CLG FR scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 8.7445, df = 5, p-value = 0.12

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -2.874961
         |     0.0303
   0.006 |  -2.027343   0.847617
         |     0.3197     1.0000
   0.008 |  -1.582467   1.292493   0.444875
         |     0.8516     1.0000     1.0000
    0.01 |  -1.589903   1.285058   0.437440  -0.007435
         |     0.8389     1.0000     1.0000     1.0000
    0.02 |  -1.539095   1.335865   0.488247   0.043372   0.050807
         |     0.9284     1.0000     1.0000     1.0000     1.0000

Figure C.99: Dunn’s test on CLG FP scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 69.2523, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -3.332786
         |     0.0064
   0.006 |  -4.582116  -1.249330
         |     0.0000     1.0000
   0.008 |  -4.620538  -1.287751  -0.038421
         |     0.0000     1.0000     1.0000
    0.01 |  -4.848590  -1.515804  -0.266474  -0.228052
         |     0.0000     0.9718     1.0000     1.0000
    0.02 |  -8.137997  -4.805210  -3.555880  -3.517459  -3.289406
         |     0.0000     0.0000     0.0028     0.0033     0.0075

Figure C.100: Dunn’s test on CLG FC scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 48.7141, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -1.422436
         |     1.0000
   0.006 |  -3.601936  -2.179499
         |     0.0024     0.2197
   0.008 |  -4.878164  -3.455727  -1.276227
         |     0.0000     0.0041     1.0000
    0.01 |  -5.500170  -4.077734  -1.898234  -0.622006
         |     0.0000     0.0003     0.4325     1.0000
    0.02 |  -4.900467  -3.478030  -1.298530  -0.022303   0.599703
         |     0.0000     0.0038     1.0000     1.0000     1.0000

Figure C.101: Dunn’s test on PEC MS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 19.5482, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -0.147447
         |     1.0000
   0.006 |  -1.783004  -1.635556
         |     0.5594     0.7645
   0.008 |  -2.924177  -2.776729  -1.141172
         |     0.0259     0.0412     1.0000
    0.01 |  -3.274830  -3.127382  -1.491826  -0.350653
         |     0.0079     0.0132     1.0000     1.0000
    0.02 |  -2.412446  -2.264998  -0.629441   0.511731   0.862384
         |     0.1188     0.1763     1.0000     1.0000     1.0000

Figure C.102: Dunn’s test on PEC MMS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 7.5105, df = 5, p-value = 0.19

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   0.817948
         |     1.0000
   0.006 |   1.467349   0.649401
         |     1.0000     1.0000
   0.008 |   1.340939   0.522991  -0.126410
         |     1.0000     1.0000     1.0000
    0.01 |   1.632178   0.814230   0.164828   0.291239
         |     0.7698     1.0000     1.0000     1.0000
    0.02 |   2.601322   1.783374   1.133973   1.260383   0.969144
         |     0.0696     0.5589     1.0000     1.0000     1.0000

Figure C.103: Dunn’s test on PEC RF scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 18.9884, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -0.743772
         |     1.0000
   0.006 |  -0.440065   0.303707
         |     1.0000     1.0000
   0.008 |  -3.002361  -2.258589  -2.562296
         |     0.0201     0.1793     0.0780
    0.01 |  -3.188305  -2.444532  -2.748239  -0.185943
         |     0.0107     0.1088     0.0449     1.0000
    0.02 |  -2.205285  -1.461513  -1.765220   0.797076   0.983019
         |     0.2058     1.0000     0.5815     1.0000     1.0000

Figure C.104: Dunn’s test on PEC FR scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 63.8122, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   3.777165
         |     0.0012
   0.006 |   6.030702   2.253536
         |     0.0000     0.1817
   0.008 |   5.978850   2.201684  -0.051851
         |     0.0000     0.2077     1.0000
    0.01 |   6.369729   2.592563   0.339027   0.390878
         |     0.0000     0.0714     1.0000     1.0000
    0.02 |   6.369729   2.592563   0.339027   0.390878   0.000000
         |     0.0000     0.0714     1.0000     1.0000     1.0000

Figure C.105: Dunn’s test on PEC FP scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 90.063, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -2.324064
         |     0.1509
   0.006 |  -3.392514  -1.068449
         |     0.0052     1.0000
   0.008 |  -5.839288  -3.515224  -2.446774
         |     0.0000     0.0033     0.1081
    0.01 |  -6.477631  -4.153567  -3.085117  -0.638342
         |     0.0000     0.0002     0.0153     1.0000
    0.02 |  -8.137323  -5.813259  -4.744809  -2.298034  -1.659691
         |     0.0000     0.0000     0.0000     0.1617     0.7273

Figure C.106: Dunn’s test on PEC FC scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 10.0918, df = 5, p-value = 0.07

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -2.924390
         |     0.0259
   0.006 |  -2.297381   0.627009
         |     0.1620     1.0000
   0.008 |  -1.982637   0.941753   0.314743
         |     0.3556     1.0000     1.0000
    0.01 |  -2.035921   0.888469   0.261460  -0.053283
         |     0.3132     1.0000     1.0000     1.0000
    0.02 |  -2.358100   0.566290  -0.060718  -0.375462  -0.322178
         |     0.1378     1.0000     1.0000     1.0000     1.0000

Figure C.107: Dunn’s test on MST MS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 11.2415, df = 5, p-value = 0.05

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -3.226638
         |     0.0094
   0.006 |  -1.509813   1.716824
         |     0.9832     0.6451
   0.008 |  -1.643689   1.582949  -0.133875
         |     0.7518     0.8507     1.0000
    0.01 |  -2.192825   1.033813  -0.683011  -0.549135
         |     0.2124     1.0000     1.0000     1.0000
    0.02 |  -2.092418   1.134219  -0.582604  -0.448729   0.100406
         |     0.2730     1.0000     1.0000     1.0000     1.0000

Figure C.108: Dunn’s test on MST MMS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 13.146, df = 5, p-value = 0.02

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -2.623216
         |     0.0653
   0.006 |  -0.305111   2.318104
         |     1.0000     0.1533
   0.008 |   0.351002   2.974218   0.656114
         |     1.0000     0.0220     1.0000
    0.01 |  -0.310072   2.313143  -0.004961  -0.661075
         |     1.0000     0.1554     1.0000     1.0000
    0.02 |   0.543247   3.166463   0.848359   0.192245   0.853320
         |     1.0000     0.0116     1.0000     1.0000     1.0000

Figure C.109: Dunn’s test on MST RF scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 9.5866, df = 5, p-value = 0.09

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -0.574385
         |     1.0000
   0.006 |   1.879524   2.453909
         |     0.4513     0.1060
   0.008 |   1.334555   1.908940  -0.544969
         |     1.0000     0.4220     1.0000
    0.01 |   1.204505   1.778891  -0.675018  -0.130049
         |     1.0000     0.5644     1.0000     1.0000
    0.02 |   1.655034   2.229419  -0.224490   0.320479   0.450528
         |     0.7344     0.1934     1.0000     1.0000     1.0000

Figure C.110: Dunn’s test on MST FR scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 17.8132, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -3.762547
         |     0.0013
   0.006 |  -2.894076   0.868470
         |     0.0285     1.0000
   0.008 |  -2.875492   0.887054   0.018583
         |     0.0303     1.0000     1.0000
    0.01 |  -2.104895   1.657651   0.789180   0.770597
         |     0.2648     0.7304     1.0000     1.0000
    0.02 |  -3.281853   0.480694  -0.387776  -0.406360  -1.176957
         |     0.0077     1.0000     1.0000     1.0000     1.0000

Figure C.111: Dunn’s test on MST FP scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 48.7305, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -3.413782
         |     0.0048
   0.006 |  -4.406319  -0.992537
         |     0.0001     1.0000
   0.008 |  -5.126250  -1.712467  -0.719930
         |     0.0000     0.6511     1.0000
    0.01 |  -4.692557  -1.278774  -0.286237   0.433692
         |     0.0000     1.0000     1.0000     1.0000
    0.02 |  -6.471937  -3.058154  -2.065617  -1.345687  -1.779380
         |     0.0000     0.0167     0.2915     1.0000     0.5638

Figure C.112: Dunn’s test on MST FC scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 45.6365, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   3.750402
         |     0.0013
   0.006 |   4.186350   0.435948
         |     0.0002     1.0000
   0.008 |   4.192578   0.442176   0.006227
         |     0.0002     1.0000     1.0000
    0.01 |   5.197751   1.447348   1.011400   1.005172
         |     0.0000     1.0000     1.0000     1.0000
    0.02 |   6.296341   2.545939   2.109990   2.103762   1.098590
         |     0.0000     0.0817     0.2614     0.2655     1.0000

Figure C.113: Dunn’s test on NJ MS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 40.0114, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   3.710503
         |     0.0016
   0.006 |   3.960705   0.250202
         |     0.0006     1.0000
   0.008 |   3.844361   0.133858  -0.116344
         |     0.0009     1.0000     1.0000
    0.01 |   4.845171   1.134668   0.884465   1.000809
         |     0.0000     1.0000     1.0000     1.0000
    0.02 |   5.917289   2.206786   1.956583   2.072927   1.072117
         |     0.0000     0.2050     0.3780     0.2863     1.0000

Figure C.114: Dunn’s test on NJ MMS scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 61.7257, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   3.475714
         |     0.0038
   0.006 |   4.704049   1.228334
         |     0.0000     1.0000
   0.008 |   5.151282   1.675568   0.447233
         |     0.0000     0.7037     1.0000
    0.01 |   6.044503   2.568789   1.340454   0.893221
         |     0.0000     0.0765     1.0000     1.0000
    0.02 |   7.099675   3.623961   2.395626   1.948393   1.055172
         |     0.0000     0.0022     0.1244     0.3853     1.0000

Figure C.115: Dunn’s test on NJ RF scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 59.5986, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   3.442927
         |     0.0043
   0.006 |   4.818611   1.375683
         |     0.0000     1.0000
   0.008 |   4.983445   1.540517   0.164834
         |     0.0000     0.9258     1.0000
    0.01 |   5.118535   1.675607   0.299923   0.135089
         |     0.0000     0.7036     1.0000     1.0000
    0.02 |   7.328304   3.885377   2.509693   2.344859   2.209769
         |     0.0000     0.0008     0.0906     0.1428     0.2034

Figure C.116: Dunn’s test on NJ FR scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 4.0225, df = 5, p-value = 0.55

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |  -0.013646
         |     1.0000
   0.006 |   1.221338   1.234984
         |     1.0000     1.0000
   0.008 |   1.221338   1.234984   0.000000
         |     1.0000     1.0000     1.0000
    0.01 |   1.221338   1.234984   0.000000   0.000000
         |     1.0000     1.0000     1.0000     1.0000
    0.02 |   1.221338   1.234984   0.000000   0.000000   0.000000
         |     1.0000     1.0000     1.0000     1.0000     1.0000

Figure C.117: Dunn’s test on NJ FP scores by mutation rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 45.5312, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |      0.002      0.004      0.006      0.008       0.01
---------+-------------------------------------------------------
   0.004 |   3.410333
         |     0.0049
   0.006 |   5.224630   1.814297
         |     0.0000     0.5222
   0.008 |   5.147743   1.737409  -0.076887
         |     0.0000     0.6174     1.0000
    0.01 |   4.934442   1.524108  -0.290188  -0.213300
         |     0.0000     0.9561     1.0000     1.0000
    0.02 |   5.658673   2.248339   0.434042   0.510929   0.724230
         |     0.0000     0.1842     1.0000     1.0000     1.0000

Figure C.118: Dunn’s test on NJ FC scores by
mutation rate178C.5 Mutation Loss Rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 99.761, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -2.373388 | 0.0881 | 0.05 | -5.986763 -3.613375 | 0.0000 0.0015 | 0.1 | -6.902492 -4.529103 -0.915728 | 0.0000 0.0000 1.0000 | 0.2 | -8.688913 -6.315525 -2.702150 -1.786421 | 0.0000 0.0000 0.0344 0.3702Figure C.119: Dunn’s test on RG MS scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 117.6981, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -2.747141 | 0.0301 | 0.05 | -5.099053 -2.351912 | 0.0000 0.0934 | 0.1 | -7.319222 -4.572081 -2.220169 | 0.0000 0.0000 0.1320 | 0.2 | -9.835812 -7.088671 -4.736759 -2.516590 | 0.0000 0.0000 0.0000 0.0592Figure C.120: Dunn’s test on RG MMS scores by mutation loss rate179 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 119.2951, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -2.019401 | 0.2172 | 0.05 | -5.275968 -3.256566 | 0.0000 0.0056 | 0.1 | -7.346418 -5.327016 -2.070449 | 0.0000 0.0000 0.1921 | 0.2 | -9.508453 -7.489052 -4.232485 -2.162035 | 0.0000 0.0000 0.0001 0.1531Figure C.121: Dunn’s test on RG RF scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 97.8567, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -2.221280 | 0.1317 | 0.05 | -4.789303 -2.568023 | 0.0000 0.0511 | 0.1 | -7.160486 -4.939206 -2.371182 | 0.0000 0.0000 0.0887 | 0.2 | -8.541404 -6.320124 -3.752100 -1.380918 | 0.0000 0.0000 0.0009 
0.8365Figure C.122: Dunn’s test on RG FR scores by mutation loss rate180 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 52.9765, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -2.233306 | 0.1276 | 0.05 | -3.937672 -1.704365 | 0.0004 0.4416 | 0.1 | -5.003496 -2.770189 -1.065824 | 0.0000 0.0280 1.0000 | 0.2 | -6.695154 -4.461848 -2.757482 -1.691658 | 0.0000 0.0000 0.0291 0.4536Figure C.123: Dunn’s test on RG FP scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 106.1884, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -1.728831 | 0.4192 | 0.05 | -4.322078 -2.593247 | 0.0001 0.0475 | 0.1 | -7.024899 -5.296068 -2.702821 | 0.0000 0.0000 0.0344 | 0.2 | -8.838955 -7.110124 -4.516876 -1.814055 | 0.0000 0.0000 0.0000 0.3483Figure C.124: Dunn’s test on RG FC scores by mutation loss rate181 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 18.7252, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.277845 | 1.0000 | 0.05 | -1.169327 -0.891481 | 1.0000 1.0000 | 0.1 | -2.109840 -1.831995 -0.940513 | 0.1744 0.3348 1.0000 | 0.2 | -3.767997 -3.490151 -2.598669 -1.658156 | 0.0008 0.0024 0.0468 0.4864Figure C.125: Dunn’s test on CLG MS scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 50.081, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.499241 | 1.0000 | 0.05 | -1.634421 -1.135179 | 0.5109 1.0000 | 0.1 | -3.216838 -2.717597 -1.582417 | 0.0065 0.0329 0.5678 | 0.2 | -6.201886 -5.702644 -4.567464 
-2.985047 | 0.0000 0.0000 0.0000 0.0142Figure C.126: Dunn’s test on CLG MMS scores by mutation loss rate182 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 72.9804, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.338902 | 1.0000 | 0.05 | -2.847974 -2.509071 | 0.0220 0.0605 | 0.1 | -4.387901 -4.048998 -1.539927 | 0.0001 0.0003 0.6179 | 0.2 | -7.296818 -6.957915 -4.448844 -2.908917 | 0.0000 0.0000 0.0000 0.0181Figure C.127: Dunn’s test on CLG RF scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 73.4712, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.401489 | 1.0000 | 0.05 | -3.366565 -2.965075 | 0.0038 0.0151 | 0.1 | -5.182190 -4.780701 -1.815625 | 0.0000 0.0000 0.3471 | 0.2 | -7.034991 -6.633501 -3.668426 -1.852800 | 0.0000 0.0000 0.0012 0.3196Figure C.128: Dunn’s test on CLG FR scores by mutation loss rate183 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 12.0218, df = 4, p-value = 0.02 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.488921 | 1.0000 | 0.05 | -1.114562 -0.625641 | 1.0000 1.0000 | 0.1 | -0.395298 0.093623 0.719264 | 1.0000 1.0000 1.0000 | 0.2 | -3.091053 -2.602132 -1.976490 -2.695755 | 0.0100 0.0463 0.2405 0.0351Figure C.129: Dunn’s test on CLG FP scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 48.3979, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.282371 | 1.0000 | 0.05 | -2.101441 -1.819069 | 0.1780 0.3445 | 0.1 | -2.999087 -2.716715 -0.897645 | 0.0135 0.0330 1.0000 | 0.2 
| -6.082885 -5.800513 -3.981444 -3.083798 | 0.0000 0.0000 0.0003 0.0102Figure C.130: Dunn’s test on CLG FC scores by mutation loss rate184 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 102.0517, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -2.856167 | 0.0214 | 0.05 | -6.550183 -3.694016 | 0.0000 0.0011 | 0.1 | -8.013427 -5.157259 -1.463243 | 0.0000 0.0000 0.7170 | 0.2 | -8.164539 -5.308371 -1.614355 -0.151112 | 0.0000 0.0000 0.5323 1.0000Figure C.131: Dunn’s test on PEC MS scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 120.5393, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -2.485142 | 0.0647 | 0.05 | -5.853026 -3.367884 | 0.0000 0.0038 | 0.1 | -7.615517 -5.130375 -1.762491 | 0.0000 0.0000 0.3899 | 0.2 | -9.630861 -7.145719 -3.777835 -2.015344 | 0.0000 0.0000 0.0008 0.2193Figure C.132: Dunn’s test on PEC MMS scores by mutation loss rate185 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 125.3216, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -2.329725 | 0.0991 | 0.05 | -5.780172 -3.450447 | 0.0000 0.0028 | 0.1 | -7.728343 -5.398618 -1.948171 | 0.0000 0.0000 0.2570 | 0.2 | -9.748336 -7.418611 -3.968164 -2.019993 | 0.0000 0.0000 0.0004 0.2169Figure C.133: Dunn’s test on PEC RF scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 113.9228, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -1.993792 | 0.2309 | 0.05 | -5.677874 -3.684082 | 0.0000 0.0011 | 0.1 | -7.936103 -5.942311 
-2.258228 | 0.0000 0.0000 0.1197 | 0.2 | -8.732418 -6.738626 -3.054543 -0.796314 | 0.0000 0.0000 0.0113 1.0000Figure C.134: Dunn’s test on PEC FR scores by mutation loss rate186 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 23.0697, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -1.107573 | 1.0000 | 0.05 | -2.128740 -1.021167 | 0.1664 1.0000 | 0.1 | -3.534809 -2.427235 -1.406068 | 0.0020 0.0761 0.7985 | 0.2 | -4.127871 -3.020297 -1.999130 -0.593062 | 0.0002 0.0126 0.2280 1.0000Figure C.135: Dunn’s test on PEC FP scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 125.2189, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -1.822606 | 0.3418 | 0.05 | -5.407718 -3.585111 | 0.0000 0.0017 | 0.1 | -7.518817 -5.696210 -2.111098 | 0.0000 0.0000 0.1738 | 0.2 | -9.592351 -7.769744 -4.184633 -2.073534 | 0.0000 0.0000 0.0001 0.1906Figure C.136: Dunn’s test on PEC FC scores by mutation loss rate187 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 4.8794, df = 4, p-value = 0.3 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.017834 | 1.0000 | 0.05 | -0.135248 -0.117413 | 1.0000 1.0000 | 0.1 | 0.790680 0.808515 0.925928 | 1.0000 1.0000 1.0000 | 0.2 | -1.380718 -1.362883 -1.245470 -2.171399 | 0.8368 0.8646 1.0000 0.1495Figure C.137: Dunn’s test on MST MS scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 17.2116, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.257165 | 1.0000 | 0.05 | -1.178799 -0.921634 | 1.0000 
1.0000 | 0.1 | -0.343383 -0.086217 0.835416 | 1.0000 1.0000 1.0000 | 0.2 | -3.572075 -3.314909 -2.393275 -3.228692 | 0.0018 0.0046 0.0835 0.0062Figure C.138: Dunn’s test on MST MMS scores by mutation loss rate188 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 80.2652, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.666137 | 1.0000 | 0.05 | -3.443692 -2.777555 | 0.0029 0.0274 | 0.1 | -4.633224 -3.967086 -1.189531 | 0.0000 0.0004 1.0000 | 0.2 | -7.828602 -7.162465 -4.384909 -3.195378 | 0.0000 0.0000 0.0001 0.0070Figure C.139: Dunn’s test on MST RF scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 63.8917, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.221921 | 1.0000 | 0.05 | -3.156032 -2.934110 | 0.0080 0.0167 | 0.1 | -5.027968 -4.806046 -1.871935 | 0.0000 0.0000 0.3061 | 0.2 | -6.332394 -6.110472 -3.176361 -1.304425 | 0.0000 0.0000 0.0075 0.9604Figure C.140: Dunn’s test on MST FR scores by mutation loss rate189 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 31.2221, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | 0.059433 | 1.0000 | 0.05 | -1.014828 -1.074262 | 1.0000 1.0000 | 0.1 | -0.378889 -0.438322 0.635939 | 1.0000 1.0000 1.0000 | 0.2 | -4.646222 -4.705656 -3.631393 -4.267333 | 0.0000 0.0000 0.0014 0.0001Figure C.141: Dunn’s test on MST FP scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 4.0237, df = 4, p-value = 0.4 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | 0.405707 | 1.0000 | 0.05 | 
0.359638 -0.046069 | 1.0000 1.0000 | 0.1 | 1.659983 1.254275 1.300344 | 0.4846 1.0000 0.9674 | 0.2 | -0.129291 -0.534999 -0.488929 -1.789274 | 1.0000 1.0000 1.0000 0.3679Figure C.142: Dunn’s test on MST FC scores by mutation loss rate190 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 3.2142, df = 4, p-value = 0.52 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | 0.528749 | 1.0000 | 0.05 | -0.067024 -0.595773 | 1.0000 1.0000 | 0.1 | -1.105160 -1.633909 -1.038135 | 1.0000 0.5114 1.0000 | 0.2 | -0.667266 -1.196015 -0.600242 0.437893 | 1.0000 1.0000 1.0000 1.0000Figure C.143: Dunn’s test on NJ MS scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 1.6527, df = 4, p-value = 0.8 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | 0.640899 | 1.0000 | 0.05 | 0.319699 -0.321200 | 1.0000 1.0000 | 0.1 | -0.585364 -1.226264 -0.905064 | 1.0000 1.0000 1.0000 | 0.2 | 0.007504 -0.633394 -0.312194 0.592869 | 1.0000 1.0000 1.0000 1.0000Figure C.144: Dunn’s test on NJ MMS scores by mutation loss rate191 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 34.3378, df = 4, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | 0.311374 | 1.0000 | 0.05 | -1.103962 -1.415336 | 1.0000 0.7848 | 0.1 | -2.899205 -3.210580 -1.795243 | 0.0187 0.0066 0.3631 | 0.2 | -4.606549 -4.917923 -3.502586 -1.707343 | 0.0000 0.0000 0.0023 0.4388Figure C.145: Dunn’s test on NJ RF scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 1.1755, df = 4, p-value = 0.88 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | 
0.200682 | 1.0000 | 0.05 | 0.761106 0.560423 | 1.0000 1.0000 | 0.1 | 0.145680 -0.055001 -0.615425 | 1.0000 1.0000 1.0000 | 0.2 | 0.839892 0.639210 0.078786 0.694212 | 1.0000 1.0000 1.0000 1.0000Figure C.146: Dunn’s test on NJ FR scores by mutation loss rate192 Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 3.5212, df = 4, p-value = 0.47 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | 0.000000 | 1.0000 | 0.05 | -0.814539 -0.814539 | 1.0000 1.0000 | 0.1 | -0.798568 -0.798568 0.015971 | 1.0000 1.0000 1.0000 | 0.2 | -1.581165 -1.581165 -0.766625 -0.782596 | 0.5692 0.5692 1.0000 1.0000Figure C.147: Dunn’s test on NJ FP scores by mutation loss rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 4.3618, df = 4, p-value = 0.36 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.01 0.05 0.1---------+-------------------------------------------- 0.01 | -0.708375 | 1.0000 | 0.05 | -0.922674 -0.214298 | 1.0000 1.0000 | 0.1 | -1.442050 -0.733674 -0.519376 | 0.7464 1.0000 1.0000 | 0.2 | 0.409250 1.117626 1.331924 1.851300 | 1.0000 1.0000 0.9144 0.3206Figure C.148: Dunn’s test on NJ FC scores by mutation loss rate193C.6 Dropout Rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 138.7595, df = 5, p-value = 0 Comparison of x by group (Bonferroni) Col Mean-|Row Mean | 0 0.05 0.1 0.2 0.4---------+------------------------------------------------------- 0.05 | -1.579508 | 0.8566 | 0.1 | -3.201136 -1.621628 | 0.0103 0.7866 | 0.2 | -4.623313 -3.043805 -1.422176 | 0.0000 0.0175 1.0000 | 0.4 | -7.800912 -6.221404 -4.599775 -3.177598 | 0.0000 0.0000 0.0000 0.0111 | 0.6 | -9.791711 -8.212203 -6.590575 -5.168398 -1.990799 | 0.0000 0.0000 0.0000 0.0000 0.3488Figure C.149: Dunn’s test on RG MS scores by dropout rate Kruskal-Wallis rank sum testdata: x and groupKruskal-Wallis chi-squared = 153.3542, df = 5, p-value = 0 
Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.863225
         |     0.4682
     0.1 |  -3.696718  -1.833492
         |     0.0016     0.5005
     0.2 |  -5.354294  -3.491069  -1.657576
         |     0.0000     0.0036     0.7305
     0.4 |  -8.359736  -6.496511  -4.663018  -3.005441
         |     0.0000     0.0000     0.0000     0.0199
     0.6 |  -10.36171  -8.498487  -6.664994  -5.007417  -2.001975
         |     0.0000     0.0000     0.0000     0.0000     0.3397

Figure C.150: Dunn’s test on RG MMS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 158.2059, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.967403
         |     0.3685
     0.1 |  -3.820826  -1.853423
         |     0.0010     0.4787
     0.2 |  -5.893538  -3.926134  -2.072711
         |     0.0000     0.0006     0.2865
     0.4 |  -8.757919  -6.790516  -4.937092  -2.864381
         |     0.0000     0.0000     0.0000     0.0313
     0.6 |  -10.35736  -8.389960  -6.536537  -4.463825  -1.599444
         |     0.0000     0.0000     0.0000     0.0001     0.8229

Figure C.151: Dunn’s test on RG RF scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 123.2269, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -2.235239
         |     0.1905
     0.1 |  -2.658992  -0.423753
         |     0.0588     1.0000
     0.2 |  -5.462954  -3.227714  -2.803961
         |     0.0000     0.0094     0.0379
     0.4 |  -8.024062  -5.788823  -5.365069  -2.561108
         |     0.0000     0.0000     0.0000     0.0783
     0.6 |  -8.917414  -6.682175  -6.258421  -3.454460  -0.893352
         |     0.0000     0.0000     0.0000     0.0041     1.0000

Figure C.152: Dunn’s test on RG FR scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 149.9003, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.728710
         |     0.6290
     0.1 |  -3.301279  -1.572568
         |     0.0072     0.8686
     0.2 |  -5.312531  -3.583820  -2.011252
         |     0.0000     0.0025     0.3322
     0.4 |  -8.001636  -6.272925  -4.700357  -2.689104
         |     0.0000     0.0000     0.0000     0.0537
     0.6 |  -10.25205  -8.523347  -6.950778  -4.939526  -2.250421
         |     0.0000     0.0000     0.0000     0.0000     0.1832

Figure C.153: Dunn’s test on RG FP scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 105.0952, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -2.140082
         |     0.2426
     0.1 |  -4.024891  -1.884809
         |     0.0004     0.4459
     0.2 |  -5.940680  -3.800598  -1.915788
         |     0.0000     0.0011     0.4154
     0.4 |  -7.483473  -5.343390  -3.458581  -1.542792
         |     0.0000     0.0000     0.0041     0.9216
     0.6 |  -8.463673  -6.323590  -4.438781  -2.522992  -0.980199
         |     0.0000     0.0000     0.0001     0.0873     1.0000

Figure C.154: Dunn’s test on RG FC scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 112.1957, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.114952
         |     1.0000
     0.1 |  -0.597118   0.517833
         |     1.0000     1.0000
     0.2 |  -2.855516  -1.740564  -2.258397
         |     0.0322     0.6132     0.1794
     0.4 |  -5.911723  -4.796771  -5.314605  -3.056207
         |     0.0000     0.0000     0.0000     0.0168
     0.6 |  -8.422843  -7.307891  -7.825725  -5.567327  -2.511119
         |     0.0000     0.0000     0.0000     0.0000     0.0903

Figure C.155: Dunn’s test on CLG MS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 139.8108, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.648896
         |     0.7438
     0.1 |  -1.802512  -0.153616
         |     0.5360     1.0000
     0.2 |  -4.089411  -2.440515  -2.286898
         |     0.0003     0.1100     0.1665
     0.4 |  -7.480116  -5.831220  -5.677604  -3.390705
         |     0.0000     0.0000     0.0000     0.0052
     0.6 |  -9.612160  -7.963264  -7.809648  -5.522749  -2.132043
         |     0.0000     0.0000     0.0000     0.0000     0.2475

Figure C.156: Dunn’s test on CLG MMS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 150.6152, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.733283
         |     0.6228
     0.1 |  -3.149397  -1.416113
         |     0.0123     1.0000
     0.2 |  -5.248169  -3.514886  -2.098772
         |     0.0000     0.0033     0.2688
     0.4 |  -8.274911  -6.541627  -5.125514  -3.026741
         |     0.0000     0.0000     0.0000     0.0185
     0.6 |  -10.08005  -8.346770  -6.930656  -4.831884  -1.805142
         |     0.0000     0.0000     0.0000     0.0000     0.5329

Figure C.157: Dunn’s test on CLG RF scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 119.703, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -2.173478
         |     0.2231
     0.1 |  -3.884751  -1.711273
         |     0.0008     0.6527
     0.2 |  -5.617090  -3.443612  -1.732338
         |     0.0000     0.0043     0.6241
     0.4 |  -7.557606  -5.384128  -3.672855  -1.940516
         |     0.0000     0.0000     0.0018     0.3924
     0.6 |  -9.361816  -7.188338  -5.477065  -3.744726  -1.804209
         |     0.0000     0.0000     0.0000     0.0014     0.5340

Figure C.158: Dunn’s test on CLG FR scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 132.6469, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.315732
         |     1.0000
     0.1 |  -2.036783  -0.721050
         |     0.3125     1.0000
     0.2 |  -3.954630  -2.638897  -1.917846
         |     0.0006     0.0624     0.4135
     0.4 |  -7.210509  -5.894777  -5.173726  -3.255879
         |     0.0000     0.0000     0.0000     0.0085
     0.6 |  -9.381096  -8.065363  -7.344312  -5.426466  -2.170586
         |     0.0000     0.0000     0.0000     0.0000     0.2247

Figure C.159: Dunn’s test on CLG FP scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 74.5196, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.946150
         |     0.3873
     0.1 |  -4.520771  -2.574621
         |     0.0000     0.0753
     0.2 |  -4.617459  -2.671308  -0.096687
         |     0.0000     0.0567     1.0000
     0.4 |  -5.529795  -3.583644  -1.009023  -0.912335
         |     0.0000     0.0025     1.0000     1.0000
     0.6 |  -7.751134  -5.804983  -3.230362  -3.133674  -2.221338
         |     0.0000     0.0000     0.0093     0.0129     0.1975

Figure C.160: Dunn’s test on CLG FC scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 79.1117, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -2.175387
         |     0.2220
     0.1 |  -2.029205   0.146182
         |     0.3183     1.0000
     0.2 |  -2.301748  -0.126360  -0.272542
         |     0.1601     1.0000     1.0000
     0.4 |  -3.840376  -1.664989  -1.811171  -1.538628
         |     0.0009     0.7194     0.5259     0.9292
     0.6 |  -8.272914  -6.097527  -6.243709  -5.971166  -4.432538
         |     0.0000     0.0000     0.0000     0.0000     0.0001

Figure C.161: Dunn’s test on PEC MS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 150.0647, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -2.081245
         |     0.2806
     0.1 |  -2.979401  -0.898156
         |     0.0217     1.0000
     0.2 |  -5.593345  -3.512100  -2.613944
         |     0.0000     0.0033     0.0671
     0.4 |  -8.010315  -5.929070  -5.030914  -2.416969
         |     0.0000     0.0000     0.0000     0.1174
     0.6 |  -10.30959  -8.228350  -7.330194  -4.716249  -2.299280
         |     0.0000     0.0000     0.0000     0.0000     0.1612

Figure C.162: Dunn’s test on PEC MMS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 154.6536, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -2.585604
         |     0.0729
     0.1 |  -3.844336  -1.258732
         |     0.0009     1.0000
     0.2 |  -6.250299  -3.664694  -2.405962
         |     0.0000     0.0019     0.1210
     0.4 |  -8.788824  -6.203220  -4.944488  -2.538525
         |     0.0000     0.0000     0.0000     0.0835
     0.6 |  -10.46506  -7.879465  -6.620732  -4.214770  -1.676244
         |     0.0000     0.0000     0.0000     0.0002     0.7027

Figure C.163: Dunn’s test on PEC RF scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 110.0751, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -2.799087
         |     0.0384
     0.1 |  -3.888240  -1.089153
         |     0.0008     1.0000
     0.2 |  -5.140952  -2.341865  -1.252712
         |     0.0000     0.1439     1.0000
     0.4 |  -7.931366  -5.132279  -4.043125  -2.790413
         |     0.0000     0.0000     0.0004     0.0395
     0.6 |  -8.952370  -6.153283  -5.064129  -3.811417  -1.021003
         |     0.0000     0.0000     0.0000     0.0010     1.0000

Figure C.164: Dunn’s test on PEC FR scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 141.059, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.287091
         |     1.0000
     0.1 |  -1.709867  -0.422776
         |     0.6547     1.0000
     0.2 |  -4.367855  -3.080763  -2.657987
         |     0.0001     0.0155     0.0590
     0.4 |  -7.359810  -6.072719  -5.649943  -2.991955
         |     0.0000     0.0000     0.0000     0.0208
     0.6 |  -9.523725  -8.236634  -7.813857  -5.155870  -2.163914
         |     0.0000     0.0000     0.0000     0.0000     0.2285

Figure C.165: Dunn’s test on PEC FP scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 84.5833, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -2.662001
         |     0.0583
     0.1 |  -4.748970  -2.086969
         |     0.0000     0.2767
     0.2 |  -5.978348  -3.316347  -1.229378
         |     0.0000     0.0068     1.0000
     0.4 |  -6.830982  -4.168981  -2.082012  -0.852633
         |     0.0000     0.0002     0.2801     1.0000
     0.6 |  -7.842245  -5.180244  -3.093275  -1.863896  -1.011262
         |     0.0000     0.0000     0.0148     0.4675     1.0000

Figure C.166: Dunn’s test on PEC FC scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 112.3023, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |   0.179634
         |     1.0000
     0.1 |  -0.229188  -0.408823
         |     1.0000     1.0000
     0.2 |  -1.152137  -1.331772  -0.922949
         |     1.0000     1.0000     1.0000
     0.4 |  -5.563711  -5.743345  -5.334522  -4.411573
         |     0.0000     0.0000     0.0000     0.0001
     0.6 |  -7.640037  -7.819671  -7.410848  -6.487899  -2.076325
         |     0.0000     0.0000     0.0000     0.0000     0.2840

Figure C.167: Dunn’s test on MST MS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 129.8757, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |   0.194503
         |     1.0000
     0.1 |  -0.849869  -1.044372
         |     1.0000     1.0000
     0.2 |  -2.345193  -2.539696  -1.495323
         |     0.1426     0.0832     1.0000
     0.4 |  -6.334375  -6.528879  -5.484506  -3.989182
         |     0.0000     0.0000     0.0000     0.0005
     0.6 |  -8.437987  -8.632491  -7.588118  -6.092794  -2.103612
         |     0.0000     0.0000     0.0000     0.0000     0.2656

Figure C.168: Dunn’s test on MST MMS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 146.8254, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.320768
         |     1.0000
     0.1 |  -2.936418  -1.615649
         |     0.0249     0.7963
     0.2 |  -4.801105  -3.480337  -1.864687
         |     0.0000     0.0038     0.4667
     0.4 |  -8.062140  -6.741372  -5.125722  -3.261034
         |     0.0000     0.0000     0.0000     0.0083
     0.6 |  -9.775670  -8.454902  -6.839252  -4.974565  -1.713530
         |     0.0000     0.0000     0.0000     0.0000     0.6496

Figure C.169: Dunn’s test on MST RF scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 112.3129, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.754713
         |     0.5948
     0.1 |  -2.988224  -1.233511
         |     0.0210     1.0000
     0.2 |  -4.910466  -3.155753  -1.922242
         |     0.0000     0.0120     0.4093
     0.4 |  -7.938401  -6.183688  -4.950177  -3.027934
         |     0.0000     0.0000     0.0000     0.0185
     0.6 |  -8.282146  -6.527433  -5.293922  -3.371679  -0.343745
         |     0.0000     0.0000     0.0000     0.0056     1.0000

Figure C.170: Dunn’s test on MST FR scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 131.3882, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -0.968771
         |     1.0000
     0.1 |  -1.802508  -0.833737
         |     0.5360     1.0000
     0.2 |  -3.509626  -2.540855  -1.707118
         |     0.0034     0.0829     0.6585
     0.4 |  -7.268260  -6.299489  -5.465751  -3.758633
         |     0.0000     0.0000     0.0000     0.0013
     0.6 |  -9.010065  -8.041294  -7.207557  -5.500438  -1.741805
         |     0.0000     0.0000     0.0000     0.0000     0.6116

Figure C.171: Dunn’s test on MST FP scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 17.7458, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -0.235464
         |     1.0000
     0.1 |  -0.128885   0.106578
         |     1.0000     1.0000
     0.2 |  -1.525561  -1.290097  -1.396675
         |     0.9534     1.0000     1.0000
     0.4 |  -1.680472  -1.445008  -1.551586  -0.154910
         |     0.6965     1.0000     0.9057     1.0000
     0.6 |  -3.441498  -3.206034  -3.312612  -1.915936  -1.761026
         |     0.0043     0.0101     0.0069     0.4153     0.5868

Figure C.172: Dunn’s test on MST FC scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 21.5491, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -0.141259
         |     1.0000
     0.1 |  -0.524148  -0.382888
         |     1.0000     1.0000
     0.2 |  -1.086709  -0.945449  -0.562561
         |     1.0000     1.0000     1.0000
     0.4 |  -2.204396  -2.063136  -1.680248  -1.117687
         |     0.2062     0.2932     0.6968     1.0000
     0.6 |  -3.805341  -3.664081  -3.281193  -2.718632  -1.600944
         |     0.0011     0.0019     0.0078     0.0492     0.8204

Figure C.173: Dunn’s test on NJ MS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 58.2547, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -0.024830
         |     1.0000
     0.1 |  -0.180022  -0.155191
         |     1.0000     1.0000
     0.2 |  -0.548757  -0.523927  -0.368735
         |     1.0000     1.0000     1.0000
     0.4 |  -2.882841  -2.858010  -2.702819  -2.334083
         |     0.0296     0.0320     0.0516     0.1469
     0.6 |  -5.995366  -5.970536  -5.815344  -5.446609  -3.112525
         |     0.0000     0.0000     0.0000     0.0000     0.0139

Figure C.174: Dunn’s test on NJ MMS scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 144.0042, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -1.048403
         |     1.0000
     0.1 |  -2.125310  -1.076906
         |     0.2517     1.0000
     0.2 |  -4.883877  -3.835473  -2.758566
         |     0.0000     0.0009     0.0435
     0.4 |  -7.508604  -6.460200  -5.383294  -2.624727
         |     0.0000     0.0000     0.0000     0.0650
     0.6 |  -9.558321  -8.509917  -7.433010  -4.674444  -2.049716
         |     0.0000     0.0000     0.0000     0.0000     0.3029

Figure C.175: Dunn’s test on NJ RF scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 95.8033, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -0.774936
         |     1.0000
     0.1 |  -1.196502  -0.421565
         |     1.0000     1.0000
     0.2 |  -2.688100  -1.913163  -1.491598
         |     0.0539     0.4180     1.0000
     0.4 |  -5.652698  -4.877761  -4.456196  -2.964597
         |     0.0000     0.0000     0.0001     0.0227
     0.6 |  -7.854758  -7.079822  -6.658256  -5.166658  -2.202060
         |     0.0000     0.0000     0.0000     0.0000     0.2075

Figure C.176: Dunn’s test on NJ FR scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 125.8961, df = 5, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -0.468434
         |     1.0000
     0.1 |  -1.051641  -0.583207
         |     1.0000     1.0000
     0.2 |  -3.217650  -2.749216  -2.166008
         |     0.0097     0.0448     0.2273
     0.4 |  -6.329870  -5.861435  -5.278228  -3.112219
         |     0.0000     0.0000     0.0000     0.0139
     0.6 |  -8.758789  -8.290354  -7.707147  -5.541138  -2.428919
         |     0.0000     0.0000     0.0000     0.0000     0.1136

Figure C.177: Dunn’s test on NJ FP scores by dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 1.0534, df = 5, p-value = 0.96

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |          0       0.05        0.1        0.2        0.4
---------+-------------------------------------------------------
    0.05 |  -0.014895
         |     1.0000
     0.1 |  -0.263151  -0.248256
         |     1.0000     1.0000
     0.2 |   0.278046   0.292942   0.541198
         |     1.0000     1.0000     1.0000
     0.4 |   0.662843   0.677739   0.925995   0.384796
         |     1.0000     1.0000     1.0000     1.0000
     0.6 |  -0.052133  -0.037238   0.211017  -0.330180  -0.714977
         |     1.0000     1.0000     1.0000     1.0000     1.0000

Figure C.178: Dunn’s test on NJ FC scores by dropout rate

C.7 10,000 Positions

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 96.5565, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -0.780198
         |     1.0000
Parsimon |  -6.026105  -5.245907
         |     0.0000     0.0000
 Minimum |  -6.133104  -5.352905  -0.106998
         |     0.0000     0.0000     1.0000
Neighbou |  -7.650405  -6.870206  -1.624299  -1.517300
         |     0.0000     0.0000     0.5216     0.6460

Figure C.179: Dunn’s test on MS scores of all methods at 10000 positions

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 80.6409, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |   0.537980
         |     1.0000
Parsimon |  -4.003647  -4.541628
         |     0.0003     0.0000
 Minimum |  -5.870233  -6.408213  -1.866585
         |     0.0000     0.0000     0.3098
Neighbou |  -6.097611  -6.635592  -2.093964  -0.227378
         |     0.0000     0.0000     0.1813     1.0000

Figure C.180: Dunn’s test on MMS scores of all methods at 10000 positions

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 89.0512, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -7.991430
         |     0.0000
Parsimon |  -4.255061   3.736369
         |     0.0001     0.0009
 Minimum |  -7.995889  -0.004458  -3.740827
         |     0.0000     1.0000     0.0009
Neighbou |  -6.212419   1.779011  -1.957358   1.783469
         |     0.0000     0.3762     0.2515     0.3725

Figure C.181: Dunn’s test on RF scores of all methods at 10000 positions

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 70.1941, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -0.599624
         |     1.0000
Parsimon |  -0.188963   0.410660
         |     1.0000     1.0000
 Minimum |   5.994754   6.594378   6.183717
         |     0.0000     0.0000     0.0000
Neighbou |  -1.330183  -0.730559  -1.141220  -7.324937
         |     0.9173     1.0000     1.0000     0.0000

Figure C.182: Dunn’s test on FR scores of all methods at 10000 positions

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 144.4921, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -5.896784
         |     0.0000
Parsimon |   0.158695   6.055479
         |     1.0000     0.0000
 Minimum |  -8.911996  -3.015211  -9.070691
         |     0.0000     0.0128     0.0000
Neighbou |   0.158695   6.055479   0.000000   9.070691
         |     1.0000     0.0000     1.0000     0.0000

Figure C.183: Dunn’s test on FP scores of all methods at 10000 positions

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 29.9221, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -4.189308
         |     0.0001
Parsimon |  -4.415355  -0.226047
         |     0.0001     1.0000
 Minimum |  -3.856186   0.333122   0.559169
         |     0.0006     1.0000     1.0000
Neighbou |  -4.641402  -0.452094  -0.226047  -0.785216
         |     0.0000     1.0000     1.0000     1.0000

Figure C.184: Dunn’s test on FC scores of all methods at 10000 positions

C.8 0.1 Asymmetric Division Rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 96.1366, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -0.909359
         |     1.0000
Parsimon |   3.344718   4.254078
         |     0.0041     0.0001
 Minimum |  -6.309051  -5.399692  -9.653770
         |     0.0000     0.0000     0.0000
Neighbou |  -0.925704  -0.016344  -4.270422   5.383347
         |     1.0000     1.0000     0.0001     0.0000

Figure C.185: Dunn’s test on MS scores of all methods at 0.1 asymmetric division rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 98.964, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -2.749300
         |     0.0299
Parsimon |   2.358454   5.107754
         |     0.0918     0.0000
 Minimum |  -6.587919  -3.838618  -8.946373
         |     0.0000     0.0006     0.0000
Neighbou |   0.789123   3.538424  -1.569330   7.377042
         |     1.0000     0.0020     0.5829     0.0000

Figure C.186: Dunn’s test on MMS scores of all methods at 0.1 asymmetric division rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 108.4126, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -5.205569
         |     0.0000
Parsimon |  -0.197642   5.007927
         |     1.0000     0.0000
 Minimum |  -7.642661  -2.437092  -7.445019
         |     0.0000     0.0740     0.0000
Neighbou |   0.481474   5.687043   0.679116   8.124136
         |     1.0000     0.0000     1.0000     0.0000

Figure C.187: Dunn’s test on RF scores of all methods at 0.1 asymmetric division rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 73.9628, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -6.394310
         |     0.0000
Parsimon |  -1.668420   4.725889
         |     0.4762     0.0000
 Minimum |   1.646569   8.040880   3.314990
         |     0.4982     0.0000     0.0046
Neighbou |  -2.659483   3.734826  -0.991063  -4.306053
         |     0.0391     0.0009     1.0000     0.0001

Figure C.188: Dunn’s test on FR scores of all methods at 0.1 asymmetric division rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 130.6721, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -4.940341
         |     0.0000
Parsimon |   0.559134   5.499475
         |     1.0000     0.0000
 Minimum |  -7.859562  -2.919220  -8.418696
         |     0.0000     0.0175     0.0000
Neighbou |   1.454066   6.394407   0.894932   9.313628
         |     0.7296     0.0000     1.0000     0.0000

Figure C.189: Dunn’s test on FP scores of all methods at 0.1 asymmetric division rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 126.2909, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -7.049104
         |     0.0000
Parsimon |  -1.006372   6.042732
         |     1.0000     0.0000
 Minimum |  -9.361811  -2.312706  -8.355439
         |     0.0000     0.1037     0.0000
Neighbou |  -3.460060   3.589044  -2.453688   5.901750
         |     0.0027     0.0017     0.0707     0.0000

Figure C.190: Dunn’s test on FC scores of all methods at 0.1 asymmetric division rate

C.9 0.1 Dropout Rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 86.6582, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |   2.677391
         |     0.0371
Parsimon |   0.414535  -2.262856
         |     1.0000     0.1182
 Minimum |  -4.876358  -7.553750  -5.290893
         |     0.0000     0.0000     0.0000
Neighbou |  -4.478167  -7.155558  -4.892702   0.398191
         |     0.0000     0.0000     0.0000     1.0000

Figure C.191: Dunn’s test on MS scores of all methods at 0.1 dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 57.5208, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |   2.860288
         |     0.0212
Parsimon |   0.181275  -2.679012
         |     1.0000     0.0369
 Minimum |  -4.435304  -7.295592  -4.616579
         |     0.0000     0.0000     0.0000
Neighbou |  -1.652280  -4.512569  -1.833556   2.783023
         |     0.4924     0.0000     0.3336     0.0269

Figure C.192: Dunn’s test on MMS scores of all methods at 0.1 dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 38.7762, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -2.498295
         |     0.0624
Parsimon |  -1.911247   0.587047
         |     0.2799     1.0000
 Minimum |  -4.611665  -2.113370  -2.700417
         |     0.0000     0.1728     0.0346
Neighbou |   1.018044   3.516339   2.929291   5.629709
         |     1.0000     0.0022     0.0170     0.0000

Figure C.193: Dunn’s test on RF scores of all methods at 0.1 dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 106.366, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -3.020127
         |     0.0126
Parsimon |  -4.271577  -1.251450
         |     0.0001     1.0000
 Minimum |   2.574242   5.594369   6.845819
         |     0.0502     0.0000     0.0000
Neighbou |  -6.741779  -3.721652  -2.470202  -9.316021
         |     0.0000     0.0010     0.0675     0.0000

Figure C.194: Dunn’s test on FR scores of all methods at 0.1 dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 121.6114, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -0.510883
         |     1.0000
Parsimon |   3.830131   4.341015
         |     0.0006     0.0001
 Minimum |  -4.559111  -4.048228  -8.389243
         |     0.0000     0.0003     0.0000
Neighbou |   5.332905   5.843789   1.502773   9.892017
         |     0.0000     0.0000     0.6645     0.0000

Figure C.195: Dunn’s test on FP scores of all methods at 0.1 dropout rate

Kruskal-Wallis rank sum test
data: x and group
Kruskal-Wallis chi-squared = 60.978, df = 4, p-value = 0

Comparison of x by group (Bonferroni)
Col Mean-|
Row Mean |   Recursiv   Chow-Liu   Parsimon    Minimum
---------+--------------------------------------------
Chow-Liu |  -3.043831
         |     0.0117
Parsimon |  -2.941280   0.102550
         |     0.0163     1.0000
 Minimum |  -4.091635  -1.047803  -1.150354
         |     0.0002     1.0000     1.0000
Neighbou |  -7.669029  -4.625197  -4.727748  -3.577394
         |     0.0000     0.0000     0.0000     0.0017

Figure C.196: Dunn’s test on FC scores of all methods at 0.1 dropout rate
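
In each table of this appendix, the upper entry of a cell is Dunn's z statistic and the lower entry is the Bonferroni-adjusted p-value. The following is a minimal sketch of how the lower entry appears to follow from the upper one. It assumes that the reported p-value is the one-sided normal tail probability P(Z ≥ |z|) multiplied by the number of pairwise comparisons m = k(k − 1)/2 and capped at 1, which is consistent with the values shown (for example, Figures C.147 and C.149). The function name `bonferroni_dunn_p` is hypothetical and is not taken from the code used to produce the tables.

```python
import math


def bonferroni_dunn_p(z: float, n_groups: int) -> float:
    """Bonferroni-adjusted one-sided p-value for a Dunn z statistic.

    Assumption: the adjustment is p * m, capped at 1, where m is the
    number of pairwise comparisons among n_groups groups.
    """
    m = n_groups * (n_groups - 1) // 2  # number of pairwise comparisons
    # Upper-tail probability of the standard normal at |z|
    p_one_sided = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
    return min(1.0, m * p_one_sided)


# z statistics taken from Figure C.149 (six dropout rates, so m = 15)
for z in (-1.579508, -3.201136, -1.990799):
    print(f"z = {z:+.6f}  adjusted p = {bonferroni_dunn_p(z, 6):.4f}")
```

The cap at 1 explains why so many cells read exactly 1.0000: with 10 or 15 comparisons, any unadjusted one-sided p-value above 0.1 or about 0.067, respectively, saturates. Under a two-sided convention the unadjusted values would be doubled before the adjustment.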
Item Metadata
Title | A study of methods for learning phylogenies of cancer cell populations from binary single nucleotide variant profiles |
Creator | Hindalong, Emily Ann |
Publisher | University of British Columbia |
Date Issued | 2015 |
Genre | Thesis/Dissertation |
Type | Text |
Language | eng |
Date Available | 2015-08-20 |
Provider | Vancouver : University of British Columbia Library |
Rights | Attribution-NonCommercial-NoDerivs 2.5 Canada |
DOI | 10.14288/1.0166592 |
URI | http://hdl.handle.net/2429/54550 |
Degree | Master of Science - MSc |
Program | Bioinformatics |
Affiliation | Science, Faculty of |
Degree Grantor | University of British Columbia |
GraduationDate | 2015-09 |
Campus | UBCV |
Scholarly Level | Graduate |
Rights URI | http://creativecommons.org/licenses/by-nc-nd/2.5/ca/ |
AggregatedSourceRepository | DSpace |