Open Collections

UBC Theses and Dissertations

Polymorphisms of CF modifier genes : their relationship to Pseudomonas aeruginosa infection and severity… Yung, Rossitta Pui Ki 2008

Polymorphisms of CF modifier genes: Their relationship to Pseudomonas aeruginosa infection and severity of disease in CF patients

by

Rossitta Pui Ki Yung

B.Sc., University of British Columbia, 2002
BMLSc., University of British Columbia, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Experimental Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)
April 2008

© Rossitta Pui Ki Yung, 2008

Abstract

Cystic fibrosis (CF) is one of the most common recessive genetic diseases among Caucasians and is caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene on chromosome 7. There are different classes of CFTR mutation, leading to differences in disease severity among patients. In addition to the CFTR genotype, secondary genetic factors, known as modifier genes, also influence CF phenotypes. Dysfunction of the CFTR protein and the resulting thickened mucus favor bacterial infection in the lungs, which can lead to further clinical complications in CF patients. Pseudomonas aeruginosa is one of the most common bacteria detected among patients. The aim of this project was to investigate four candidate modifier genes, Factor B, Complement factor 3, Toll-like receptor 4 and Heme oxygenase-1, which might affect the status of Pseudomonas aeruginosa infection. A total of 22 single nucleotide polymorphisms (SNPs) were selected in these four genes and tested against five phenotypic traits: age of diagnosis, FEV1% predicted value, FEV1 standard deviation value, age of first Pseudomonas aeruginosa infection and Pseudomonas aeruginosa infection status. For the selected SNPs, both case-control studies and family-based analyses were performed in order to establish any correlation between the genotypes and the phenotypes.
In addition, haplotype analysis was performed to determine whether there was interaction between SNPs or whether there were unidentified SNPs in the vicinity of the selected ones that might contribute to the observed phenotypic traits. Of the 22 chosen SNPs, 13 were found to be significantly associated with one or more of the tested phenotypes. The three most significant associations were BF_2557 with lung function, HMOX1_9531 with lung function and BF_7202 with age of diagnosis. Several haplotypes were significantly associated with one of the five phenotypes. There was no evidence for the presence of unidentified SNPs or for interaction between SNPs. Most of the haplotype associations were likely due to the presence of a single SNP that was itself significantly associated with the phenotype. In conclusion, both the SNP and the haplotype analyses suggest that the four candidate genes are modifiers of disease severity in CF.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

Chapter 1: Introduction
  1.1  Cystic fibrosis transmembrane regulator (CFTR) mutation
  1.2  CFTR protein and its function
  1.3  Diagnosis and clinical symptoms of CF disease
  1.4  Pseudomonas aeruginosa infection in CF patients
  1.5  Candidate modifier genes
       (a) Factor B and Complement factor 3
       (b) Toll-like receptor 4
       (c) Heme oxygenase-1
  1.6  Single nucleotide polymorphism (SNP)
  1.7  Association study
  1.8  Thesis objectives

Chapter 2: Materials and Methods
  2.1  Patients recruitment
  2.2  Quality control
  2.3  Selection of SNPs
  2.4  TaqMan assays
  2.5  Primers and probes for genotyping
  2.6  Real time PCR reactions
  2.7  Sequencing
  2.8  Dilution of samples
  2.9  Preparation for polymerase chain reaction and genotyping
  2.10 Polymerase chain reaction
  2.11 Genotyping
  2.12 Quantification of DNA samples
  2.13 Re-genotyping
  2.14 Statistical data analysis

Chapter 3: Results
  3.1  PicoGreen reaction
  3.2  Analysis of the genotypic data for Mendelian inconsistencies
  3.3  Sequencing result by the University of British Columbia
  3.4  Genotypes of the participating individuals
  3.5  Phenotypic characteristics of the study subjects
  3.6  Determination of Hardy-Weinberg equilibrium in the patient population
  3.7  Comparison of genotype frequencies in the parent population and in online database
  3.8  ANOVA analysis of the influence of genotype on the phenotypes
       (a) Factor B: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (b) Complement factor 3: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (c) Toll-like receptor 4: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (d) Heme oxygenase-1: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
  3.9  Regression analysis
       (a) BF_2557
       (b) BF_7202
       (c) TLR4_1859
       (d) HMOX1_2790
       (e) HMOX1_9531
  3.10 Age of onset analysis for the age of first Pseudomonas aeruginosa infection
       (a) Factor B
       (b) Complement factor 3
       (c) Toll-like receptor 4
       (d) Heme oxygenase-1
  3.11 Pseudomonas aeruginosa infection status
       (a) Factor B
       (b) Complement factor 3
       (c) Toll-like receptor 4
       (d) Heme oxygenase-1
  3.12 Re-genotyping
  3.13 FBAT analysis of phenotypic characteristics
       (a) Age of diagnosis
       (b) FEV1 predicted value
       (c) FEV1 standard deviation value
  3.14 Haplotype analysis by RGui program
       (a) Factor B: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (b) Complement factor 3: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (c) Toll-like receptor 4: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (d) Heme oxygenase-1: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
  3.15 Haplotype analysis by the FBAT program
       (a) Factor B: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (b) Complement factor 3: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (c) Toll-like receptor 4: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
       (d) Heme oxygenase-1: (I) Age of diagnosis; (II) FEV1 predicted value; (III) FEV1 standard deviation value
  3.16 Haplotype analysis of the age of first Pseudomonas aeruginosa infection by Hapstat
       (a) Factor B
       (b) Complement factor 3
       (c) Toll-like receptor 4
       (d) Heme oxygenase-1

Chapter 4: Discussion
  4.1  Analysis of the genotypic data for Mendelian inconsistencies
  4.2  Genotypes of all participating individuals
  4.3  Genotyping analysis of the parental population
  4.4  ANOVA, regression analysis and FBAT
  4.5  Age of onset analysis
  4.6  Pseudomonas aeruginosa infection status
  4.7  Haplotype analysis by the RGui and FBAT programs
       (a) Haplotype analysis by RGui program
       (b) Haplotype analysis by the FBAT program
  4.8  Haplotype analysis of the age of first Pseudomonas aeruginosa infection by Hapstat
  4.9  Position of SNPs with significant association and their effect of gene function
  4.10 Summary
  4.11 Future studies

References
Appendix

List of Tables

Table 1   Factor B polymorphism selection
Table 2   Complement factor 3 polymorphism selection
Table 3   Toll-like receptor 4 polymorphism selection
Table 4   Heme oxygenase-1 polymorphism selection
Table 5   The sequence of all primers used for SNP assays
Table 6   The sequence of all probes used for SNP assays
Table 7   Genotype discrepancies between the real time PCR results in this study and those posted by the Seattle SNPs website
Table 8   The sequence of both left and right primers used for sequencing of BF_7202 for Coriell samples
Table 9   Layout of a 96-well plate with standard DNA samples and DNA samples from source plates sent by Toronto
Table 10  Hardy-Weinberg equilibrium among the parental population
Table 11  Comparison of allele frequencies between the genotyping results and the reported values on either IIPGA or Seattle SNPs websites
Table 12  The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Factor B
Table 13  The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Factor B
Table 14  The ANOVA result of examining FEV1 standard deviation value among different genotypes of the selected SNPs in Factor B
Table 15  The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Toll-like receptor 4
Table 16  The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Heme oxygenase-1
Table 17  The ANOVA result of examining FEV1 standard deviation value among different genotypes of the selected SNPs in Heme oxygenase-1
Table 18  Regression analysis of the association of BF_2557 and FEV1 predicted value with confounding factors
Table 19  Regression analysis of the association of BF_2557 and FEV1 standard deviation value with confounding factors
Table 20  Regression analysis of the association of BF_7202 and age of diagnosis with confounding factors
Table 21  Regression analysis of the association of BF_7202 and FEV1 predicted value with confounding factors
Table 22  Regression analysis of the association of BF_7202 and FEV1 standard deviation value with confounding factors
Table 23  Regression analysis of the association of TLR4_1859 and age of diagnosis with confounding factors
Table 24  Regression analysis of the association of HMOX1_2790 and FEV1 predicted value with confounding factors
Table 25  Regression analysis of the association of HMOX1_2790 and FEV1 standard deviation value with confounding factors
Table 26  Regression analysis of the association of HMOX1_9531 and FEV1 predicted value with confounding factors
Table 27  Regression analysis of the association of HMOX1_9531 and FEV1 standard deviation value with confounding factors
Table 28  Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Toll-like receptor 4
Table 29  Chi square test for investigating the relationship between different genotypes of the selected SNPs in Complement factor 3 and Pseudomonas aeruginosa infection status
Table 30  Comparing the result of genotyping and re-genotyping
Table 31  Detailed results of FBAT analysis of age of diagnosis and SNPs under additive model
Table 32  Detailed results of FBAT analysis of age of diagnosis and SNPs under dominant model
Table 33  Detailed results of FBAT analysis of FEV1 predicted value and SNPs under additive model
Table 34  Detailed results of FBAT analysis of FEV1 predicted value and SNPs under dominant model
Table 35  Detailed results of FBAT analysis of FEV1 standard deviation value and SNPs under additive model
Table 36  Detailed results of FBAT analysis of FEV1 standard deviation value and SNPs under dominant model
Table 37  Haplotype analysis for investigation of correlation relationship between haplotype of Factor B and age of diagnosis with no adjustment for confounding factors
Table 38  Haplotype analysis for investigation of correlation relationship between haplotype of Factor B and age of diagnosis with adjustment for confounding factors
Table 39  Haplotype analysis for investigation of correlation relationship between haplotype of Factor B and FEV1 standard deviation value with no adjustment for confounding factors
Table 40  Haplotype analysis for investigation of correlation relationship between haplotype of Factor B and FEV1 standard deviation value with adjustment for confounding factors
Table 41  Haplotype analysis for investigation of correlation relationship between haplotype of Complement factor 3 and age of diagnosis with no adjustment for confounding factors
Table 42  Haplotype analysis for investigation of correlation relationship between haplotype of Complement factor 3 and age of diagnosis with adjustment for confounding factors
Table 43  Haplotype analysis for investigation of correlation relationship between haplotype of Toll-like receptor 4 and age of diagnosis with no adjustment for confounding factors
Table 44  Haplotype analysis for investigation of correlation relationship between haplotype of Toll-like receptor 4 and age of diagnosis with adjustment for confounding factors
Table 45  Haplotype analysis for investigation of correlation relationship between haplotype of Heme oxygenase-1 and FEV1 predicted value with no adjustment for confounding factors
Table 46  Haplotype analysis for investigation of correlation relationship between haplotype of Heme oxygenase-1 and FEV1 predicted value with adjustment for confounding factors
Table 47  Haplotype analysis for investigation of correlation relationship between haplotype of Heme oxygenase-1 and FEV1 standard deviation value with no adjustment for confounding factors
Table 48  Haplotype analysis for investigation of correlation relationship between haplotype of Heme oxygenase-1 and FEV1 standard deviation value with adjustment for confounding factors
Table 49  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and FEV1 predicted value by the FBAT program
Table 50  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and FEV1 standard deviation value by the FBAT program
Table 51  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and age of diagnosis by the FBAT program
Table 52  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value by the FBAT program
Table 53  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value by the FBAT program
Table 54  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and FEV1 predicted value by the FBAT program
Table 55  Summary of all significant SNP-phenotype association by ANOVA and FBAT analyses
Table 56  Summary of all significant haplotype-phenotype association when testing by RGui and FBAT analyses
Table 57  Summary Table of SNPs which revealed a significant association with the Pseudomonas aeruginosa infection status and the age of first Pseudomonas aeruginosa infection
Table 58  Summary table of the conservation score of the selected SNPs in the four candidate genes
Table A1  Concentration of a subset of the DNA samples in the original source plates
Table A2  Families with non-Mendelian inheritance
Table A3  Genotype frequency of each of the SNPs examined in the gene of Factor B
Table A4  Genotype frequency of each of the SNPs examined in the gene of Complement factor 3
Table A5  Genotype frequency of each of the SNPs examined in the gene of Toll-like receptor 4
Table A6  Genotype and allele frequency of each of the SNP examined in the gene of Heme oxygenase-1
Table A7  Phenotypic characteristics of the CF patients
Table A8  The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Complement factor 3
Table A9  The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Complement factor 3
Table A10 The ANOVA result of examining FEV1 sd value among different genotypes of the selected SNPs in Complement factor 3
Table A11 The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Toll-like receptor 4
Table A12 The ANOVA result of examining FEV1 sd value among different genotypes of the selected SNPs in Toll-like receptor 4
Table A13 The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Heme oxygenase-1
Table A14 Age of Onset Analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Factor B
Table A15 Age of Onset Analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Complement factor 3
Table A16 Age of Onset Analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Heme oxygenase 1
Table A17 Chi squared test for investigating the relationship between different genotypes of the selected SNPs in Factor B and Pseudomonas aeruginosa infection status
Table A18 Chi square test for investigating the relationship between different genotypes of the selected SNPs in Toll-like receptor 4 and Pseudomonas aeruginosa infection status
Table A19 Chi square test for investigating the relationship between different genotypes of the selected SNPs in Heme oxygenase-1 and Pseudomonas aeruginosa infection status
Table A20 FBAT analysis of the age of diagnosis under the additive model
Table A21 FBAT analysis of the age of diagnosis under the dominant model
Table A22 FBAT analysis of FEV1 predicted value under the additive model
Table A23 FBAT analysis of FEV1 predicted value under the dominant model
Table A24 FBAT analysis of FEV1 standard deviation value under the additive model
Table A25 FBAT analysis of FEV1 standard deviation value under the dominant model
Table A26 Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and age of Diagnosis
Table A27 Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A28 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Factor B and FEV1 predicted value, with no adjustment for the confounding factors
Table A29 Haplotype analysis for investigation of possible correlation between combinations of selected SNPs in Factor B and FEV1 predicted value, with adjustment for the confounding factors
Table A30 Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value
Table A31 Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and age of diagnosis
Table A32 Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A33 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value, with no adjustment for the confounding factors
Table A34 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value, with adjustment for the confounding factors
Table A35 Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value
Table A36 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value, with no adjustment for the confounding factors
Table A37 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value, with adjustment for the confounding factors
Table A38 Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and age of diagnosis
Table A39 Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A40 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value, with no adjustment for the confounding factors
Table A41 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value, with adjustment for the confounding factors
Table A42 Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value
Table A43 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value, with no adjustment for the confounding factors
Table A44 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value, with adjustment for the confounding factors
Table A45 Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and age of diagnosis
Table A46 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Heme oxygenase-1 and the age of diagnosis, with no adjustment for the confounding factors
Table A47 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and the age of diagnosis, with adjustment for the confounding factors
Table A48 Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A49 Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A50 Frequencies of possible haplotypes generated for the Factor B gene by the FBAT program
Table A51 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and age of diagnosis by the FBAT program
Table A52 Frequencies of possible haplotypes generated for the Complement factor 3 gene by the FBAT program
Table A53 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value by FBAT program
Table A54 Frequencies of possible haplotypes generated for the gene of Toll-like receptor 4 by the FBAT program
Table A55 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and age of diagnosis by the FBAT program
Table A56 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value by the FBAT program
Table A57 Frequencies of possible haplotypes generated for the Heme-oxygenase-1 gene by the FBAT program
Table A58 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and age of diagnosis by the FBAT program
Table A59 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and FEV1 standard deviation value by the FBAT program
Table A60 Haplotype analysis between the haplotypes formed by the five selected SNPs in the Factor B gene and age of Pseudomonas aeruginosa infection by Hapstat
Table A61 Haplotype analysis between the haplotypes formed by the four selected SNPs in the Complement factor 3 gene and age of Pseudomonas aeruginosa infection by Hapstat
Table A62 Haplotype analysis between the haplotypes formed by the seven selected SNPs in the Toll-like receptor 4 gene and age of Pseudomonas aeruginosa infection by Hapstat
Table A63 Haplotype analysis between the haplotypes formed by the six selected SNPs in the Heme oxygenase-1 gene and age of Pseudomonas aeruginosa infection by Hapstat

List of Figures

Figure 1   Structure of the CFTR protein
Figure 2   Function of CFTR protein
Figure 3   The classic complement pathway
Figure 4   The alternative pathway
Figure 5   Function of Heme oxygenase-1 in lung disease
Figure 6   The mechanism underlying TaqMan assays
Figure 7   Sample genotyping result of Factor B SNP 2557
Figure 8   Example genotyping result for SNP 149 of HMOX1
Figure 9   Normalization of the FEV1 percent predicted value by age
Figure 10  Standard curve of optical density versus DNA concentration of the serially diluted λ DNA samples

Acknowledgments

I would like to take this opportunity to express my appreciation to those who have given me their valuable help and support:

Dr. Sandford, for his supervision, patience and precious time in guiding and helping me, from the brainstorming of this project through the execution of the laboratory work to the write-up of this thesis. The successful completion of this thesis would have been impossible without his guidance and support.

Dr. Peter Pare, Dr. Pearce Wilcox and Dr. John Hill, for being members of my supervisory committee and providing me with valuable opinions on my thesis.
The CF Modifier Gene Project team at UBC, especially Roxanne Rousseau, and at The Hospital for Sick Children in Toronto, for supplying the DNA samples and phenotypic data of the participating patients and providing relevant information.

Loubna Akhabir and Dorota Stefanowicz, for helping me with the laboratory work and giving me their professional technical advice.

Karey Shumansky, for her expert knowledge and patient guidance in statistics.

My family and friends, for their understanding and support.

Chapter 1
Introduction

Cystic fibrosis (CF) is one of the most common recessive genetic disorders among Caucasians. In 2000, the estimated rate of CF in Canada was 1 in 3608 births, although the rate may decline if antenatal carrier screening of the general population is implemented. [1] Mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene, located on chromosome 7, were identified as the cause of the disease by positional cloning. [2] A wide diversity of pulmonary phenotypes is observed among CF patients. Furthermore, patients may suffer from other CF-related medical conditions, for example, obstructive azoospermia and pancreatitis. [3] It is clear that CFTR genotype alone does not account for the broad spectrum of disease severity. It is likely that secondary genetic factors separate from the CFTR locus, termed modifier genes, significantly influence CF phenotypes. [4] A number of putative modifier genes have been identified, for example, mannose-binding lectin, glutathione S-transferases, transforming growth factor beta-1, tumor necrosis factor-alpha, beta-2 adrenergic receptor and HLA class II. [5] The investigation of CF modifier genes may offer new insights into the pathophysiology of CF and provide leads for new therapeutic interventions.

1. Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) Mutations

Cystic fibrosis is caused by a defect in the CFTR protein; the function of this chloride channel is discussed in detail in the next section. At the molecular level, the CFTR gene is located at chromosome 7q31 and comprises 27 exons spanning a total of 230 kb. There are more than 1000 CFTR mutations [6], and five categories have been established to describe them with regard to their effect on CFTR function as a chloride channel. [7, 8] This classification scheme is summarized in the following table:

Class    Description
Class 1  Mutations that cause defective synthesis and thus total or partial absence of CFTR protein. Consequently, there is a loss of conductance of Cl- by the channel.
Class 2  Mutations that affect maturation of CFTR protein. CFTR mRNA is formed, but the protein fails to mature and does not traffic correctly to the cell membrane. The protein is either absent or present in very small quantities in the membrane.
Class 3  Mutations that disturb the regulation of the Cl- channel; these usually occur in the ATP binding domains. CFTR protein is produced and traffics to the apical cell membrane.
Class 4  Mutations that affect the conductance of the Cl- channel, which result in the reduction of ion flux and/or modification of selectivity. CFTR protein is produced and traffics to the apical cell membrane.
Class 5  Mutations that alter the stability of mRNA or the stability of mature CFTR protein. Lower levels of CFTR protein are produced and traffic to the apical cell membrane is reduced.

Among the more than 1000 CFTR mutations, delta F508 is the most common; it is detected in about 70% of CF genes in North America and northern Europe. [9] Delta F508 results in the deletion of a phenylalanine residue at position 508, within the hydrophilic ATP-binding region called the nucleotide binding fold (NBF).
This mutation is categorized as a Class 2 mutation. In patients with this type of CFTR mutation, little or no CFTR protein is produced in the lungs, pancreas or other organs. Chloride transport is abolished or reduced, which results in the thickening of mucus in the affected organs. This condition favors bacterial infection, and CF patients are consequently more susceptible to infection in the respiratory tract because of their compromised defense mechanism. [10]

2. CFTR Protein and its Function

The CFTR protein is a membrane-associated, N-linked glycoprotein located at the apical membrane (Figure 1). CFTR is composed of two repeated motifs with a total size of 1480 amino acids. Each motif consists of a hydrophobic membrane-spanning domain with six helices and a hydrophilic NBF region which is responsible for ATP binding. The two motifs are connected by a cytoplasmic regulatory region (R domain) with a number of charged residues. [9]

Figure 1: Structure of the CFTR protein. There are three major regions in a CFTR protein: (1) two membrane-spanning domains (TMD1 and TMD2), each composed of six transmembrane segments; (2) two nucleotide binding domains (NBD1 and NBD2), where binding of ATP occurs; and (3) a regulatory region (R domain), which is phosphorylated by protein kinase A. As indicated in the diagram, some of the common CFTR mutations are located in the NBFs. [11]

CFTR has been classified as a member of the family of ATP-binding cassette (ABC) transporters. CFTR mainly functions as a chloride channel in the lungs, pancreas, liver, digestive tract, reproductive tract and skin. Upon phosphorylation of the regulatory region by protein kinase A, a conformational change takes place that allows chloride ions to be transported across the membrane, with ATP consumed as the energy source.
In addition to working as a cAMP-induced chloride channel, CFTR is also capable of regulating other ion channels: an outwardly rectifying chloride channel (ORCC), an epithelial sodium ion channel (ENaC), and at least two inwardly rectifying potassium ion channels (ROMK1 and ROMK2), as shown in Figure 2. [12]

Figure 2: Function of the CFTR protein. CFTR protein can be found at the apical membrane of epithelial cells in the lungs, pancreas, liver, digestive tract, reproductive tract and skin. With normally functioning CFTR protein, chloride ions are transported across the membrane, with ATP as the energy source, upon phosphorylation by protein kinase A. In addition, CFTR regulates the transport of chloride ions through an outwardly rectifying chloride channel (ORCC) in the presence of ATP, and the transport of sodium ions through an epithelial sodium ion channel (ENaC) in the absence of ATP. CFTR also regulates the export of potassium ions through a potassium ion channel (ROMK). [13]

3. Diagnosis and Clinical Symptoms of CF Disease

Detection of CFTR mutations by genotyping is a powerful method for diagnosing CF, and it is especially important for infants whose parents are both known to be CF carriers. Clinical complications (for example, meconium ileus and failure to thrive) may be present in some newborn babies and can be clues for the diagnosis of CF in those infants. Testing the sweat chloride level can confirm the diagnosis in such situations, since CFTR is the only channel for reabsorption of chloride ions in the sweat glands. CF patients with abnormal chloride channels are found to have a sweat chloride level five times higher than that of unaffected individuals. [14] However, patients with some forms of CFTR mutation show milder clinical symptoms and a borderline sweat test result.
Consequently, DNA screening should be used for confirmation of the disease.

In addition to the lungs, CFTR is expressed in other organs, for example, the liver, pancreas, intestines and reproductive tract. Consequently, CF patients commonly suffer from problems in those organs. Approximately 85% of CF patients have pancreatic insufficiency. [15] The lack of functional CFTR protein in the ductules of the pancreas leads to failure to produce adequate bicarbonate ions and water; blockage of the ductules further prevents the pancreatic enzymes from reaching the intestine to facilitate normal digestion. However, exogenous enzymes can be prescribed to patients to correct for problems in this exocrine organ. [16] Intestinal obstructions, both meconium ileus at birth and distal intestinal obstruction syndrome (DIOS) later in life, are another frequently seen manifestation in CF patients. Patients with meconium ileus fail to pass meconium at birth, while patients with DIOS suffer from excessive mucus production. Constipation is a common outcome. [17] Some CF patients may suffer from CF-related liver disease. Without functional CFTR in the epithelial cells lining the biliary ductules, the ductules become blocked with mucous secretion and the accumulation of eosinophils. [18] Further symptoms include obstructive cirrhosis, splenomegaly and hypersplenism. A total of 1-2% of deaths in CF patients are liver-related. [19] Other CF complications include azoospermia in males [20] and chronic sinusitis.

4. Pseudomonas aeruginosa Infection in CF Patients

Staphylococcus aureus, Pseudomonas aeruginosa, Burkholderia cepacia, Haemophilus influenzae and Xanthomonas maltophilia are common bacteria which can be found in the lungs of many CF patients.
Among them, Pseudomonas aeruginosa is found in nearly 80% of CF patients, and morbidity and mortality rates increase when individuals are chronically colonized with this pathogen. [21] The bacteria adhere to host tissues via their lipopolysaccharide layer, which also protects them from digestion and lysis by leukocytes. The organisms mainly grow in the bronchioles of CF patients as a biofilm, and they are relatively resistant to the host's defense system and to antibiotics because they elicit only a low phagocytic response. [22] Furthermore, Pseudomonas aeruginosa can secrete several toxins and chemicals upon infection, for example, lipopolysaccharide, exotoxin A, exoenzyme S, elastase, alkaline proteinase, and phospholipase C. [23] The body's immune system tries to eradicate the bacteria; however, CF patients may suffer from lung infection due to failure of recruitment of innate defense mechanisms and involvement of immune complex formation. [24] This inflammatory situation induces a phenotypic shift of the pathogen from the non-mucoid to the mucoid type, which leads to exacerbation of the disease. [25] The reason why antibiotics or other therapeutic methods cannot eradicate Pseudomonas aeruginosa infection in CF patients is still unknown. [26]

Some individuals are diagnosed with CF shortly after birth due to a positive family history of CF and/or acute clinical symptoms, while others are diagnosed at a later age. The former group of patients usually exhibits more serious CF-related problems, and the type of CFTR mutations present may be one of the causes of this increased severity, in addition to modifier genes. Not every CF patient becomes infected by Pseudomonas aeruginosa; however, the rates of morbidity and mortality increase significantly once the individual is chronically colonized with this pathogen.
We hypothesized that the genotype at any of the polymorphisms in the four candidate genes that we investigated might contribute to differences between patients in the status of Pseudomonas aeruginosa infection and therefore in the severity of the disease.

5. Candidate Modifier Genes

Previous studies have identified some putative modifier genes which may contribute to the diversity of disease severity in CF patients. The aim of this project was to investigate specific genes that are involved in Pseudomonas aeruginosa infection in CF patients and to investigate relationships between polymorphisms in those genes and disease severity. Four candidate modifier genes were chosen for this project, namely, Factor B, Complement Factor 3, Toll-like Receptor 4 and Heme Oxygenase-1.

(a) Factor B and Complement Factor 3

The human immune system provides several layers of defense which protect the body from being harmed by foreign substances or antigens. One of its many components is the cascade of complement reactions, which helps in fighting infections. Complement reactions are a series of non-specific reactions in the host defense system that destroy invading microorganisms. They can be sub-divided into two pathways: the classical pathway and the alternative pathway. The classical pathway is initiated by the binding of antibodies to complement protein C1q, which results in activation of the associated proteases C1s and C1r. The complete set of reactions in the classical pathway is shown in Figure 3:

Figure 3: The classical complement pathway. Upon binding of the antibody to the microbe, C1q activates C1r and C1s, which catalyze the formation of the C4b2b complex. After combining with C3b, the C4b2b3b complex catalyzes the cleavage of C5 into C5b, which eventually results in the synthesis of the membrane attack complex (MAC).
(Note: each dotted line shows a catalytic effect on the corresponding reaction in the box.)

In comparison, the alternative pathway is antibody-independent. Consequently, the presence of microbes can directly trigger the alternative pathway. However, the two pathways share some of the same proteins. The complete set of reactions in the alternative pathway is shown in Figure 4:

Figure 4: The alternative pathway. The presence of the microbe (light beige ellipses) can directly trigger the formation of the C3b-Factor B complex. After modification of the complex by Factor D and joining of another C3b unit, the resulting C3bBb3b complex cleaves C5 into C5b and subsequently leads to the synthesis of the membrane attack complex (MAC). (Note: each dotted line shows a catalytic effect on the corresponding reaction in the box.)

Both complement pathways result in the generation of the membrane attack complex (MAC), which leads to neutrophil activation and bacterial cell lysis. In addition to their triggering factors, the two pathways also differ in the reactions that generate complement C3. The key players are C4 in the classical pathway and factor B in the alternative pathway. It has been found that the alternative pathway is critical for the generation of a protective response against Pseudomonas aeruginosa in a murine model of pneumonia. [27] In that study, C3 and factor B deficient mice were infected with Pseudomonas aeruginosa via intranasal inoculation and were found to have a higher mortality rate than wild-type mice. However, the same outcome was not seen for C4 deficient mice. This analysis indicated that the alternative pathway was critical for the eradication of Pseudomonas aeruginosa infection from the host. This result motivated us to investigate the role of the alternative pathway in host defense against Pseudomonas aeruginosa infection in CF patients.
Consequently, we hypothesized that the alternative pathway also plays a key role in eradication of Pseudomonas aeruginosa from the lungs of CF patients and that polymorphisms in the genes encoding factor B and C3 are associated with severity of disease in CF patients.

(b) Toll-like Receptor 4

The lipopolysaccharide (LPS) molecules of Pseudomonas aeruginosa are composed of three main parts, namely lipid A, core and O-polysaccharide. [28] Modifications of lipid A are observed in Pseudomonas aeruginosa from CF patients. For example, the pathogen synthesizes a unique hexa-acylated lipid A containing palmitate and aminoarabinose during adaptation to the CF airway, while it produces a novel hepta-acylated lipid A in CF patients with severe pulmonary disease. [29] Lipid A of Pseudomonas aeruginosa is recognized by toll-like receptor 4 on the epithelial surface of the lungs. A 222 amino acid region in the extracellular portion of human toll-like receptor 4 has been identified using Pseudomonas aeruginosa lipid A with different levels of acylation, and this region has been found to be crucial for recognition of the lipid A that is specifically found in CF patients. [30] Toll-like receptor 4 is a transmembrane LPS receptor. Binding of Pseudomonas aeruginosa triggers the release of antimicrobial peptides, inflammatory cytokines and chemokines, and co-stimulatory molecules. All of these result in activation of the innate immune response to the pathogen. Therefore, we hypothesized that polymorphisms in the gene encoding toll-like receptor 4 could lead to failure in recognition of Pseudomonas aeruginosa via its lipid A, accumulation of the pathogen in the lungs of CF patients, and therefore increased disease severity.

(c) Heme Oxygenase-1

Heme oxygenase-1 (HO-1) is an enzyme which catalyzes the oxidative degradation of heme to the anti-oxidant molecules bilirubin and carbon monoxide.
[31] There are two isoenzymes of heme oxygenase: an inducible HO-1 and a constitutively expressed HO-2. [32] HO-1 is induced by a wide variety of stimuli, including conditions of oxidative stress, inflammatory agents, transforming growth factor beta and heat shock. [33] Because of their defective immune defense system, CF patients usually develop secondary acute or chronic bacterial infections with recruitment of the inflammatory defense mechanisms. Excessive inflammation can result in tissue damage, which makes the condition of CF patients more severe. However, HO-1 plays a major role in resolving this situation. The major roles of HO-1 have been shown to include anti-inflammatory, anti-apoptotic, and anti-proliferative effects [34], which are illustrated in Figure 5:

Figure 5: Function of heme oxygenase-1 in lung diseases. Patients with lung diseases like cystic fibrosis may suffer from inflammation, oxidative stress and apoptosis. Tissue injury can be the final result. However, these complications can be modulated by the production of heme oxygenase, which catalytically breaks down heme into carbon monoxide, iron and biliverdin-IXa. All of these components have a negative effect on the detrimental processes mentioned above and can therefore reduce tissue injury. Heme oxygenase has been conclusively shown to be anti-inflammatory, anti-apoptotic, and anti-proliferative.

As mentioned before, the recruitment of the inflammatory defense mechanisms in CF patients against Pseudomonas aeruginosa infection can lead to tissue damage in the lungs. Owing to its anti-inflammatory, anti-apoptotic and anti-proliferative effects, HO-1 can modulate these adverse outcomes.
[35] It has been demonstrated that the HO-1 level is raised in CF patients and that it is responsible for cytoprotective effects against Pseudomonas aeruginosa infection. [36] The levels of HO-1 and related by-products in CF patients and controls were examined by different methods [36]: (1) HO-1 expression in the lungs was measured by both immunochemistry and quantitative reverse transcription PCR; (2) the level of acute inflammation and an increase in oxidant stress were measured by myeloperoxidase staining; and (3) iron status was assessed by ferritin staining. All of these tests revealed an increase in HO-1 level in diseased lungs when compared with normal controls. Furthermore, an investigation was performed to determine whether Pseudomonas aeruginosa infection was a direct cause of upregulation of HO-1 expression. A cell line of human CF bronchial epithelial cells (IB3.1) was treated with either Pseudomonas aeruginosa or LPS; however, no significant increase of HO-1 protein could be detected in either sample. The same authors also evaluated the survival of IB3.1 cells overexpressing HO-1 in response to Pseudomonas aeruginosa infection by transfecting the IB3.1 cells with either the pcDNA3.1 empty vector or the pcDNA3.1/HO-1 vector. [36] Cells transfected with the HO-1 vector were shown to have a higher survival rate than those transfected with the empty vector. On the basis of this study, we hypothesized that polymorphisms in the HO-1 gene could result in defective synthesis of HO-1, which in turn could lead to failure to inhibit tissue injury due to Pseudomonas aeruginosa infection and to increased severity of disease in CF patients.

6. Single Nucleotide Polymorphisms

This project includes the above four candidate genes, i.e., Factor B, Complement Factor 3, Toll-like Receptor 4 and Heme Oxygenase-1. A total of 22 single nucleotide polymorphisms (SNPs) in these genes were selected. A detailed description of the SNP selection is presented in the Materials and Methods section.
A SNP is defined as a single-base variation in DNA sequence with an allele frequency of at least 1% in a population. Not all SNPs result in alteration of protein structure and function. For example, a synonymous SNP usually occurs when the third nucleotide in the codon is altered, but both the synthesis and function of the resulting protein remain unchanged. However, the structure and function of the protein are often changed if either a nonsense SNP or a non-synonymous SNP is present. A nonsense SNP leads to premature termination of the elongation of the polypeptide chain, while a non-synonymous SNP results in substitution of one amino acid for another. In addition, the position of the SNP in the gene is also important when analyzing its effect. SNPs in coding regions are a common cause of many monogenic disorders; however, SNPs in promoters or other regulatory regions can also be significant in determining the timing, location and level of gene expression. Even if no impact is found for SNPs in non-coding regions, they may be important in the investigation of the genetic aspects of a disease, since they can be used as disease markers for further investigation and/or analysis.

7. Association Studies

Linkage analysis can be used to determine the genetic location of a disease gene. The goal of the analysis is to identify an allele of a gene that is co-inherited with the disease. Linkage analysis is a powerful technique for the identification of genes that cause monogenic disorders. For example, it was used for the identification of CFTR as the causal gene in cystic fibrosis. However, this approach is less powerful when a more complex phenotype is to be studied, i.e. a trait influenced by several genetic and environmental factors. Therefore, we used an association approach in this project, since severity of the disease is likely a complex phenotype. Case-control studies and family based trios (i.e.
offspring and both his/her parents) are two commonly used types of association studies. An example of an analysis that uses trios is the transmission disequilibrium test (TDT). The TDT requires genetic information from both parents and the affected individual, since it tests for the transmission of alleles from heterozygous parents to the offspring. The TDT provides a better approach to the investigation and establishment of genetic associations under certain circumstances. If a case-control method is used for analysis, false-positive results may be generated due to insufficient matching of the case and control groups for genetic background. However, there are still some drawbacks associated with the TDT method. First, the marker has to be in close proximity to the disease-causing gene; otherwise, no association can be identified. Second, at least one of the parents has to be heterozygous, because a homozygous parent provides no information about which allele is transmitted and nothing can be concluded from the analysis. Therefore, the case-control approach was also included in the analysis of this project. Case-control studies test for the prevalence of a specific allele in the two groups, and two approaches can be taken. In the first, the cases are defined as CF patients with severe lung disease and the controls as CF patients with mild lung disease, with the division between mild and severe lung disease based on pulmonary function test data. In the second, the patients are stratified by genotype and differences in the means of the outcome variables are investigated. In this study we utilized the latter case-control approach and supplemented the analysis with the family-based design.

8. Thesis Objectives

1. To perform a literature review to select candidate CF modifier genes. The following genes were selected:
(a) Complement factor 3 and factor B protein - deficiency in recruitment of complement reactions against Pseudomonas aeruginosa infection in CF patients.
(b) Toll-like receptor 4 - attachment of Pseudomonas aeruginosa onto host cells via binding of its LPS in CF patients.
(c) Heme oxygenase-1 - cytoprotective effects against Pseudomonas aeruginosa infection in CF patients.
2. To pick potential SNPs within each candidate modifier gene utilizing the LD Select software.
3. To genotype DNA samples of participating individuals (trio members) at the selected SNPs by TaqMan assays.
4. To analyze the genotyping results with respect to phenotypic traits in an attempt to determine the presence of any association with disease severity.

Chapter 2 Materials and Methods

1. Patient Recruitment

This is a sub-study of a large, Canada-wide and international endeavor: the Canadian Consortium for Cystic Fibrosis Modifiers (http://www.cfmod.ca). This sub-study included 1674 individuals from 558 trios (558 patients and both of their parents), who were recruited from Cystic Fibrosis clinics in participating hospitals in Canada. Patients with a diagnosis of CF on the basis of clinical signs, elevated sweat chloride values and/or positive genotyping for the CFTR gene were recruited for the study. CF patients who had received a lung transplant were also recruited; for these individuals, pulmonary function data collected prior to transplantation were used. The recruitment of patients and their family members was based upon several criteria: (a) satisfying ethical requirements (for example, willingness to give informed consent); (b) willingness to provide a blood/DNA sample for the study; (c) availability of verified clinical information. Blood samples from patients in different provinces were sent to the Hospital for Sick Children in Toronto for extraction of DNA and, for the patient samples, establishment of cell lines.

2. Quality Control

As mentioned before, this project was a sub-study of the Canadian Consortium for Cystic Fibrosis Modifiers.
Genotyping was also carried out for all samples for other analyses by our colleagues in Toronto. Non-Mendelian inheritance was found in some of the families, both by the Toronto group and by the FBAT program used in this sub-study. In some families, Mendelian errors were detected for many SNPs, which implied sample mislabeling or non-paternity. Consequently, a total of 23 families were removed from this study. However, when Mendelian inconsistency was detected at only one SNP in a given family, the family was retained and only the genotypes for that SNP were excluded. In total, DNA samples from 1605 individuals in 535 trios (535 patients and both of their parents) were used for the analysis.

3. Selection of SNPs

Several polymorphic sites were picked for each candidate modifier gene. Information about polymorphisms for each candidate gene was obtained from the websites of the Innate Immunity Programs for Genomic Applications (IIPGA; http://innateimmunity.net//) and the UW-FHCRC Variation Discovery Resource (SeattleSNPs; http://pga.mbt.washington.edu/). By using the LD Select software (http://droog.gs.washington.edu/ldSelect.html) [37], all sites of polymorphism in each gene were divided into groups (bins) according to the specific settings listed below. Only one SNP was selected and examined for each group. An assumption was made that phenotypic characteristics were more likely to be affected if the polymorphism brought about a change of amino acid. Based upon this assumption, a two-stage rule was applied for the selection of a particular SNP in each group: (a) if one of the SNPs in the group resulted in a change of amino acid, it was selected for experimentation; (b) if none of the SNPs in the group resulted in a change of amino acid, an arbitrary tagSNP was picked for experimentation.
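The two-stage rule for choosing one SNP per bin can be sketched in Python as follows. This is only an illustrative sketch, not the procedure's actual implementation: the function name `select_snp_for_bin`, the dictionary layout and the `assayable` flag (standing in for whether a TaqMan assay could be manufactured for the SNP) are assumptions introduced here. The example values are the two Factor B SNPs that the Table 1 caption places in Bin 1, SNP1802 (arginine to glutamine, no assay available) and SNP8311.

```python
from typing import Dict, List, Optional


def select_snp_for_bin(bin_snps: List[Dict]) -> Optional[int]:
    """Return the position of the SNP chosen to represent one LD bin.

    Each SNP is a dict with hypothetical keys:
      "position"  - position of the SNP in the gene sequence
      "aa_change" - description of the amino acid change, or None
      "assayable" - True if a genotyping assay could be made (assumption)
    """
    # Only SNPs for which an assay exists can be genotyped at all.
    candidates = [snp for snp in bin_snps if snp["assayable"]]

    # Stage (a): prefer a SNP that changes an amino acid.
    for snp in candidates:
        if snp["aa_change"] is not None:
            return snp["position"]

    # Stage (b): otherwise take an arbitrary tagSNP (here simply the first).
    if candidates:
        return candidates[0]["position"]

    # No assay could be made for any SNP in the bin: the bin is skipped.
    return None


# Factor B, Bin 1: the amino-acid-changing SNP has no assay, so a tagSNP is used.
factor_b_bin1 = [
    {"position": 1802, "aa_change": "Arg to Gln", "assayable": False},
    {"position": 8311, "aa_change": None, "assayable": True},
]
chosen = select_snp_for_bin(factor_b_bin1)
```

Filtering on assay availability before applying stages (a) and (b) is a design choice made for this sketch; it mirrors the situations reported in the tables, where a bin is represented by a tagSNP, or by no SNP at all, when no assay could be produced.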
A tagSNP is a representative SNP in each bin which shows a high degree of linkage disequilibrium with all the other SNPs in the bin.

Linkage disequilibrium (LD) is the non-random association of alleles at two or more loci on the same or different chromosomes. A strong allelic association is termed complete LD, which means the alleles involved are found together at a high frequency in the population. In general, linkage disequilibrium can be represented by the inequality $P_{A1} P_{B1} \neq P_{A1B1}$, whereas linkage equilibrium can be shown by the equation $P_{A1} P_{B1} = P_{A1B1}$ ($P_{A1}$ and $P_{B1}$ are the probabilities of allele A1 and allele B1, respectively, in the population and $P_{A1B1}$ is the probability of the A1B1 haplotype). There are several parameters for representing the degree of linkage disequilibrium, but the two most commonly used are Lewontin's $D'$ and a measure that can be interpreted as a squared correlation coefficient, $r^2$. $D$ is determined by the difference between the observed haplotype frequency and the expected haplotype frequency, $D = P_{A1B1} - P_{A1} P_{B1}$. Then $D'$ is calculated by the equation $D' = D / D_{max}$, where $D_{max}$ is the maximum difference between the observed haplotype frequency and the expected haplotype frequency that can be observed in a given population. The absolute value of $D'$ ranges from 0 to 1: 0 represents linkage equilibrium and 1 represents complete linkage disequilibrium, i.e. one or more haplotypes are not present in the population. A disadvantage of the $D'$ measure is that it overestimates LD in small samples. The value of $r^2$ depends on the value of $D$ and the allele frequency at every locus involved. It is calculated by dividing $D^2$ by the product of all allele frequencies, as illustrated by the equation $r^2 = D^2 / (a_1 a_2 b_1 b_2)$, where $a_1$, $a_2$ and $b_1$, $b_2$ are the allele frequencies at the two loci. $r^2$ also lies between 0 and 1, with 0 indicating linkage equilibrium and 1 indicating perfect linkage disequilibrium.
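As a worked illustration of these definitions, the following Python sketch computes D, D' and r² for two biallelic loci from the A1B1 haplotype frequency and the two allele frequencies. The function name `ld_measures` and its argument names are assumptions introduced here, and the expression used for the maximum attainable D is the standard one for two biallelic loci rather than a formula taken from the text above.

```python
def ld_measures(p_a1b1: float, p_a1: float, p_b1: float) -> dict:
    """Compute D, |D'| and r^2 for two biallelic loci.

    p_a1b1 - frequency of the A1B1 haplotype
    p_a1   - frequency of allele A1 (allele A2 has frequency 1 - p_a1)
    p_b1   - frequency of allele B1 (allele B2 has frequency 1 - p_b1)
    """
    # D: observed minus expected haplotype frequency.
    d = p_a1b1 - p_a1 * p_b1

    # D_max: largest |D| attainable with these allele frequencies
    # (standard definition; depends on the sign of D).
    if d >= 0:
        d_max = min(p_a1 * (1 - p_b1), (1 - p_a1) * p_b1)
    else:
        d_max = min(p_a1 * p_b1, (1 - p_a1) * (1 - p_b1))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0

    # r^2: D squared divided by the product of all four allele frequencies.
    denom = p_a1 * (1 - p_a1) * p_b1 * (1 - p_b1)
    r2 = d * d / denom if denom > 0 else 0.0

    return {"D": d, "D_prime": d_prime, "r2": r2}


# Linkage equilibrium: haplotype frequency equals the product of allele frequencies.
equilibrium = ld_measures(p_a1b1=0.25, p_a1=0.5, p_b1=0.5)

# Only two haplotypes present (A1B1 and A2B2): perfect LD.
perfect = ld_measures(p_a1b1=0.3, p_a1=0.3, p_b1=0.3)

# Three haplotypes present (A1 always occurs with B1): complete but not perfect LD.
complete = ld_measures(p_a1b1=0.2, p_a1=0.2, p_b1=0.5)
```

In the third example one haplotype is absent, so D' equals 1 while r² is only 0.25, which illustrates why the two measures can disagree for the same pair of SNPs.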
The r² measure takes the value 1 if only two haplotypes are present. For low allele frequencies, r² is a more reliable measure of LD than D'. r² measures statistical association, and there is a simple inverse relationship between this measure and the sample size required to detect association between a susceptibility locus and a marker SNP. Genotyping every SNP in a gene would be costly in time and effort. By grouping the SNPs in each candidate gene according to the specific settings listed below, it was possible to achieve the tasks of this project without genotyping every single SNP. Using the approach described above, we selected a panel of SNPs for genotyping in each of our candidate genes. For the factor B gene, five SNPs were selected, including a tagSNP for an amino acid changing polymorphism (Table 1).

Table 1: Factor B polymorphism selection (the criteria for grouping all SNPs by the LD Select software package: r² = 0.8 and minor allele frequency > 0.1). The positions of the SNPs are described in reference to the coding sequence of the factor B gene (accession number AF551848). No TaqMan assay could be created for testing of SNPs in Bin 3. Therefore, only five SNPs were included in the analysis for Factor B. Although SNP1802 in Bin 1 results in an amino acid change from arginine to glutamine, SNP8311 was selected because an appropriate TaqMan assay could not be produced for SNP1802.

Positions of SNPs (in the order listed for Bins 1 to 6): 1802, 4573, 5180, 7541, 7580, 8311, 4022, 9878, 5162, 9099, 2557, 6484, 7202
Amino acid change: Arg to Gln for SNP 1802; n/a for all other SNPs

Bin  Selected SNP for the group
1    8311
2    4022
3    n/a
4    2557
5    6484
6    7202

The same tagSNP approach was applied to the complement factor 3 gene. However, using the parameters of r² = 0.8 and minor allele frequency > 0.1, more than 40 bins resulted. This large number of SNPs was beyond the scope of this study.
Therefore, the minor allele frequency cutoff was increased to > 0.45 in order to study only common SNPs. This resulted in the selection of four polymorphisms (Table 2).

Table 2: Complement factor 3 polymorphism selection (the criteria for grouping all SNPs by the LD Select software package: r² = 0.8 and minor allele frequency > 0.45). The positions of the SNPs are described in reference to the coding sequence of the complement factor 3 gene (accession number AY513239). Due to the large size of this gene (46 kb) and the consequently large number of SNPs, a higher minor allele frequency filter was chosen so that a feasible number of tagSNPs could be identified. No SNPs were picked for Bins 2 and 5 because TaqMan assays could not be manufactured for the SNPs in those bins.

Positions of SNPs (in the order listed for Bins 1 to 6): 43118, 43179, 43928, 44692, 27159, 27678, 28433, 28795, 963, 25884, 36735
Amino acid change: n/a for all SNPs

Bin  Selected SNP for the group
1    43118
2    n/a
3    28795
4    963
5    n/a
6    36735

The original selection criteria, i.e. those utilized for the factor B gene, resulted in the identification of seven tagSNPs in the TLR4 gene; these are detailed in Table 3.

Table 3: TLR4 polymorphism selection (the criteria for grouping all SNPs by the LD Select software package: r² = 0.8 and minor allele frequency > 0.1). The positions of the SNPs are described in reference to the coding sequence of the TLR4 gene (accession number NM_138556).

Positions of SNPs (in the order listed for Bins 1 to 7): 1893, 2032, 2437, 7764, 11912, 16649, 17050, 17447, 17923, 2856, 10478, 11541, 851, 11995, 1859, 10329, 9263, 15844
Amino acid change: n/a for all SNPs

Bin  Selected SNP for the group
1    11912
2    17050
3    2856
4    851
5    1859
6    9263
7    15844

Finally, we identified eight tagSNP bins in the heme oxygenase-1 gene, and of these, six SNPs could be assayed by the TaqMan methodology (Table 4).
Table 4: Heme oxygenase-1 polymorphism selection (the criteria for grouping all SNPs by the LD Select software package: r² = 0.8 and minor allele frequency > 0.1). The positions of the SNPs are described in reference to the coding sequence of the heme oxygenase-1 gene (accession number AY460337). Only six SNPs were chosen for this project because TaqMan assays could not be synthesized for the SNPs in Bins 6 and 7.

Positions of SNPs (in the order listed for Bins 1 to 8): 149, 7325, 12825, 12832, 12860, 12992, 13286, 13354, 15028, 3303, 5079, 13449, 2007, 9531, 16442, 17893, 2790, 15382, 17922, 1038
Amino acid change: n/a for all SNPs

Bin  Selected SNP for the group
1    149
2    3303
3    9531
4    16442
5    2790
6    n/a
7    n/a
8    1038

4. TaqMan Assays

The TaqMan allelic discrimination assay is a qualitative analytic method for detecting the presence of different alleles at any nucleotide in a gene. The assay utilizes fluorescently labeled allele-specific probes and the 5'->3' exonuclease activity of Taq DNA polymerase (Figure 6). A DNA sample is first amplified by the Polymerase Chain Reaction (PCR) (discussed in a later section). In addition to the TaqMan Universal PCR master mix, a specific primer/probe mixture is added to the DNA sample for the PCR. After the double-stranded template DNA is denatured into single strands, the complementary PCR primers hybridize to the template DNA 150-200 nucleotides apart, encompassing the SNP site. The allele-specific probes that are complementary to the SNP site also hybridize to the template at this step.

Generally, the probes are about 20 nucleotides long, with a reporter dye attached to the 5' end and a quencher dye attached to the 3' end. Two types of probes are included, one for each allele of the SNP being assayed. The two probes are differentiated by their reporter dyes: one probe contains VIC and the other FAM.
The two reporter molecules absorb light and then release it at different wavelengths. Before polymerization, when light hits the reporter molecule its energy is transferred to the quencher molecule, since the two are located close to each other. In this process, called Fluorescence Resonance Energy Transfer, no fluorescence can be detected because the energy is released in the form of heat. However, the probe is degraded by the 5'->3' exonuclease activity of Taq DNA polymerase as polymerization is initiated from the primer and proceeds along the strand. Since the reporter molecule is then no longer linked to the quencher molecule, it absorbs light energy and emits it as light of a specific wavelength.

Therefore, different fluorescence is detected depending on whether the VIC-labeled probe, the FAM-labeled probe, or both are degraded. From this, the alleles present in each individual can be deduced. In addition to the two probes, a third dye called ROX is included in the TaqMan assay. ROX is free in the master mix solution and is used for normalizing well-to-well variation in volume. Therefore, the signal intensity detected for either the FAM or VIC molecules was corrected by the ROX signal (although this is less of a concern than in quantitative real-time PCR applications).

Figure 6: The mechanism underlying TaqMan assays. Before the reaction, the primer is base-paired to a complementary segment about 150-200 nucleotides away from the SNP, while the probe is attached at the site of the SNP. As elongation proceeds, the probe is degraded, which separates the dye molecule from the quencher.

5. Primers and Probes for Genotyping

Primers and probes for each SNP were ordered from Applied Biosystems (Foster City, CA).
The sequences of all primers and probes are listed in Tables 5 and 6:

Table 5: The sequence of all primers used for SNP assays

Gene  Gene_SNP  Left Primer  Right Primer  TCGCACCTGCCAAGTGAA GCGACAGGGAGGACCAC CAGAACCTAGCTCTAGAAGGG CTTA TGGGTCCCTAGTCTGATTCCT TTAG GTAGCTTTGGCCCTCACCAT CTGCAGTGAGCTGTGATTGC  CTCACCTCCGTTGTCACAGATC CTCCATTGCCCAACGATCCT  Factor B  BF_2557 BF_4022 BF_6484 BF_7202 BF_8311 C3_963  Complement Factor 3  C3_28795 C3_36735 C3_43118 TLR4_851 TLR4_1859 TLR4_2856  Toll-like receptor 4  TLR4_9263 TLR4_11912 TLR4_15844 TLR4_17050  Heme oxygenase-1  CCATCCCTGTTGACAGCATATT CTTT GGGTGTGGCCTTGAGAAGATG AGAAGGCTCAACACACAGCTT GGCAGATGTGATGTGAAGATGA GA  TGTATTTGACACATGGTCTGC CTT AGGGATAGGACTGGCTAGTTT GAAT GCAAGCTTCTGCTATGATTAA AAGTGA  GGGATTAACACTCAATCATTTA CTGACCT CTGGCTTTTACACCCAAGTAGA CA CACAAATGGTGTACAGGAGTTC TCA  CTGGAAACTGATATAAAGATA GCGACATATAACA GCTGTCATGTAAGCACTTTTC ATAAACA GTTGGGCAATGCTCCTTGAC  CTTGACTACCCACCACAGAGAA G GTTGGTAGCCAAGATAAATGAC TGGTA ACCCCATTAATTCCAGACACAT TGT TTCAAATACACACAGCCCTGAT AGG GGAGCCCCAGACTTCGTTAG  HMOX1_149 HMOX1_1038  CGACAAGCACAGGGAGAGA  HMOX1_2790  GGGTTGCTAAGTTCCTGATGT TG CACCGGCCGGATGGA  HMOX1_16442  TCCTGGAAGCATGGCTGTTC GGTGGAGGGAGGAAAGAGGAA  CGAGACCCTGCCTCTTTTCAA TGGAGTTGAGAATCAGTTTTT ATTACTTGCA CTCCAGGTCTGAAGACTGAGA AC  CTGGGTGTGTTTCCATGTCTC AT GGCCTGGCCTCTTGCA  HMOX1_3303 HMOX1_9531  GCTCCCACCACTGTCATCTC  CCCTATCTGTAAAATAGGGAT AATAATGGTACCT TCGGTAGGAGAAGTGGTGATA GG  CTGTCTCAAAGGAAAAAAAGAC TTAACACA CCAGAAAGCTGGGAGGCA AAGGCGCCCGTCCC CTTCCTCTGTGCCAGACACT CCTGGGTGACAGAGTGAGACT

Table 6: The sequences of all probes used for the SNP assays

Gene                  Gene_SNP     VIC                      FAM
Factor B              BF_2557      TGGCCGGTGGAGTG           TGGCCGATGGAGTG
Factor B              BF_4022      TTTGTAGTCAAAGGTTGAAC     TTGTAGTCAAAAGTTGAAC
Factor B              BF_6484      CATTGCCTTTGTCACTC        ATTGCCTTCGTCACTC
Factor B              BF_7202      CAGCTAAGACGCAAGCA        CAGCTAAGACACAAGCA
Factor B              BF_8311      CTAGAGGCTTGAGAGAGA       CTAGAGGCTTAAGAGAGA
Complement Factor 3   C3_963       CTGAGTGACAGAATGA         TGAGTGACAGAAGGA
Complement Factor 3   C3_28795     TCCCCTGAGTCCCCA          CCCCTGGGTCCCCA
Complement Factor 3   C3_36735     AATACAATCTGGGTACTCC      ACAATCTGGATACTCC
Complement Factor 3   C3_43118     CAGGCGTGGTCTT            CCAGGCATGGTCTT
Toll-like receptor 4  TLR4_851     CTGGAAGAGCAACATAGA       TGGAAGAGCAGCATAGA
Toll-like receptor 4  TLR4_1859    CAGGTACCAGACAAC          CCAGGTATCAGACAAC
Toll-like receptor 4  TLR4_2856    CTTCACCAACACTTATT        TTCACCAACGCTTATT
Toll-like receptor 4  TLR4_9263    TTTAAACTAAAGGTAACTAATTG  AACTAAAGGTAAATAATTG
Toll-like receptor 4  TLR4_11912   ACTTATGTGTAATGTTTCG      TTATGTGTAATTTTTCG
Toll-like receptor 4  TLR4_15844   ACATCCACTCTTCCC          CATCCACTGTTCCC
Toll-like receptor 4  TLR4_17050   CACAAATGCACACATC         CACAAATGCGCACATC
Heme oxygenase-1      HMOX1_149    CAGCCCCCCACACAG          ACAGCCCTCCACACAG
Heme oxygenase-1      HMOX1_1038   CCTTATCTGATCAAGAAC       CTTATCTGACCAAGAAC
Heme oxygenase-1      HMOX1_2790   ACCAGGCTATTGCTCT         ACCAGGCTTTTGCTCT
Heme oxygenase-1      HMOX1_3303   CAACCCGACAGGCAA          CAACCCCACAGGCAA
Heme oxygenase-1      HMOX1_9531   TTACTGCTGTAAACTCACTC     CTGCTGTAAATTCACTC
Heme oxygenase-1      HMOX1_16442  TCACCTTCTGTATTCTCAA      CACCTTCTGTAATCTCAA

6. Real-time Polymerase Chain Reaction

Testing of the new real-time PCR assays created by Applied Biosystems was done utilizing the 7900HT Sequence Detection System (Applied Biosystems). Stock Coriell samples at a concentration of 100ng/µL were diluted to 1ng/µL using 1X TE buffer (Tris-EDTA buffer). TE is a commonly used buffer that helps to inactivate DNA nucleases: Tris adjusts the pH of DNA samples to around 8, at which DNA nucleases are less active, and EDTA chelates the metal cations that these enzymes require. Coriell samples are DNA samples from individuals of both African American and European descent that have been employed by the IIPGA and SeattleSNPs groups for gene sequencing. These samples are available from a central repository (http://ccr.coriell.org/). Since the genotypes of those African American and European individuals were known, their DNA samples were suitable as quality controls to test the accuracy of the ordered TaqMan assays. 5µL of each diluted Coriell sample was put into a 384-well clear optical reaction plate (Applied Biosystems) utilizing a Biomek FX Laboratory Automation Workstation (Beckman Coulter, Fullerton, CA).
The 1ng/µL Coriell samples were prepared in a 0.8mL 96-well storage plate (ABgene House, Rochester, NY) and the plate was put onto the designated position on the workstation. The appropriate volume of solution was pipetted by the automated syringes from the 96-well storage plate into the designated wells of the 384-well plates. The plates were then placed in a fume hood for lyophilization of the DNA samples. In total, 24 samples of African American DNA and 23 samples of DNA from the Centre d'Etude du Polymorphisme Humain (CEPH) were tested for each assay. The CEPH samples are from Utah residents with ancestry from Northern and Western Europe. In addition to the Coriell samples, wells of 1X TE buffer were included as negative controls.

175µL of 2X TaqMan Universal PCR master mix - No AmpErase UNG (Applied Biosystems), 8.75µL of 40X primer/probe solution (Applied Biosystems) and 166.25µL of water were mixed to form the master mix for real-time PCR with a total volume of 350µL. During amplification by TaqMan assay, the nucleotide uracil (U) was incorporated instead of thymine (T) into the DNA molecules. AmpErase is an enzyme (uracil-N-glycosylase) that cleaves DNA molecules at sites containing a U; it can therefore be used to eliminate any carryover contaminating PCR products from a previous PCR reaction. However, the additional expense of the AmpErase was not justified in the reactions here because no contamination was detected in the negative controls. For each assay, a 5µL aliquot of master mix was added into the appropriate wells in the 384-well plate and the plate was sealed with a clear optical adhesive cover. The plate was placed into an Allegra 6 Centrifuge (Beckman Coulter) and centrifuged for about 5 minutes at a speed of about 2500rpm.

Real-time PCR was performed by the 7900HT Sequence Detection System (Applied Biosystems).
In the linked computer, the program SDS2.1 was selected and the following settings were chosen: assay = absolute quantification (standard curve); container = 384-well clear plate; template = blank template. The file containing the Coriell sample layout was imported into the program and an equivalent layout was shown on the screen. All wells that were assigned for testing an assay were highlighted and the appropriate testing marker and detector were chosen from the list. The blank and TE wells were then highlighted and their setting altered to "NTC" (non-template control). Under the instrument tab, the following settings were amended: sample volume = 5µL; the box indicating 9600 emulation was unchecked; temperature in stage 3 = 92°C and repeats = 50. The modified program was connected to the 7900HT Sequence Detection System and the door of the machine was opened. The 384-well plate was placed according to the markings on the tray. The door was then closed and the reaction was initiated.

The real-time PCR was finished in about 2 hours. An allelic discrimination analysis was started once the real-time PCR was done. The allelic discrimination analysis was used to determine which alleles were present in the DNA samples by shining light onto the plate and then detecting the quantity of light of different wavelengths emitted by the reporter dye molecules. Since the genotypes of all Coriell samples were known, the reporter molecule corresponding to each of the alleles could be identified. The SDS2.1 program was opened and the following settings were selected: assay = allelic discrimination; container = 384-well clear plate; template = blank template. The file of the layout of the Coriell samples was imported into the program. All wells that were assigned for testing an assay were highlighted and the appropriate testing marker (the name of the SNP) and detector (FAM/VIC) were chosen from the list.
The blank and TE wells were then highlighted and their setting altered to "NTC". The program was connected to the 7900HT Sequence Detection System and the analysis was begun. After the analysis was completed, a graph of the fluorescence from allele Y (VIC) plotted against that from allele X (FAM) was generated. Each point on the graph was compared with the genotype of the corresponding Coriell sample provided by either the IIPGA or SeattleSNPs websites. The alleles corresponding to FAM and VIC could then be determined, and the names of both detectors were changed to the appropriate alleles.

7. Sequencing

Discrepancies were detected for the Factor B gene SNP 7202, as indicated in Table 7:

Table 7: Genotype discrepancies between the real-time PCR results in this study and those posted by the Seattle SNPs website (http://pga.mbt.washington.edu/)

Sample  Result by real-time PCR  Result posted by Seattle SNPs
D008    GG                       AG
D016    AG                       GG
E016    AG                       GG

In order to determine the genotypes of the above three Coriell samples with regard to SNP 7202, sequencing was done at the University of British Columbia sequencing facility. PCR products of around 500 nucleotides, with the 7202 SNP approximately in the middle of each product, were submitted to the UBC Oligonucleotide Synthesis Facility. Both left and right primers were used for sequencing and they are listed in Table 8.

Table 8: The sequence of both left and right primers used for sequencing of BF_7202 for Coriell samples D008, D016 and E016

Name      Sequence (5' to 3')
BF_7202L  TGG GTC CCT AGT CTG ATT TTA G
BF_7202R  TCC TGG AAG CAT GGC TGT TC

The D008, D016 and E016 Coriell samples were diluted from a concentration of 2ng/µL to 1ng/µL with 1X TE buffer.
A solution with a total volume of 20µL was prepared for each Coriell sample by mixing 2µL of 10X PCR mix, 2µL dNTP (200µM each of dGTP, dCTP, dTTP and dATP), 1µL of primer L (900µM), 1µL of primer R (900µM), 0.1µL of TAQHOTSTART (QIAGEN Inc., Mississauga, Ontario), 1µL of Coriell sample and 12.9µL of distilled water. The three solutions were placed in three separate, labeled 0.5mL Eppendorf tubes. The PCR was performed utilizing a PCR Express thermal cycler (Thermo Hybaid, Ashford, Middlesex, UK) with the following cycling conditions: 95°C for 10 minutes; 40 cycles of 92°C for 15 seconds, 55°C for 30 seconds and 72°C for 30 seconds; and 72°C for 5 minutes. The PCR products were stored at -20°C.

Gel electrophoresis was performed to detect the presence of PCR product for each Coriell sample. 2% agarose in 0.5% TBE (Tris Borate EDTA) was prepared by mixing 2g of agarose powder into 100mL TBE buffer, and the mixture was heated in a microwave until the agarose powder was completely dissolved. 10µL of ethidium bromide was added into the solution and mixed well. The gel was poured into the tray of a Horizon 20.25 GIBCO BRL Horizontal Gel Electrophoresis Apparatus (Life Technologies, St. Paul, MN) with the comb in place. 5µL of each PCR product and 2µL of sample buffer were mixed together, and each mixture was loaded into a separate well after the gel solidified. 8µL of DNA ladder was loaded into another well. The gel electrophoresis apparatus was connected to a Power-PAC 300 (BIO-RAD, Hercules, CA) power supply. The gel was run for about 45 minutes at 150V. The presence and separation of bands in the gel were checked with a Spectroline hand-held UV lamp (Model EF-140C, Spectronics Corporation, Westbury, NY). The gel was then inspected with an Eagle Eye II imager (Stratagene, La Jolla, CA) and a picture of the gel was taken.
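As a rough planning aid, the thermal cycling program described above for the sequencing PCR can be totalled as follows. This is only a sketch: the function name is illustrative, and ramp times between temperatures are ignored (an assumption), so a real run on the thermal cycler would take longer.

```python
# Hold-time estimate for the sequencing PCR program described in the text.
# Assumption: ramping between temperatures is ignored.

def program_hold_seconds(initial_s, cycle_steps_s, n_cycles, final_s):
    """Total programmed hold time in seconds, excluding ramp times."""
    return initial_s + n_cycles * sum(cycle_steps_s) + final_s

total_s = program_hold_seconds(
    initial_s=10 * 60,           # 95 °C for 10 minutes
    cycle_steps_s=[15, 30, 30],  # 92 °C, 55 °C and 72 °C steps
    n_cycles=40,
    final_s=5 * 60,              # 72 °C for 5 minutes
)
print(total_s / 60)  # hold time in minutes
```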
The PCR products were then purified according to the QIAquick PCR Purification Kit Protocol as outlined on page 18 of the QIAquick Spin Handbook (QIAGEN Inc., Mississauga, Ontario). The purified PCR products and the sequencing primers listed in Table 8 were sent to the Oligonucleotide Synthesis Facility of the University of British Columbia and used for sequencing.

8. Dilution of Samples

A total of 1605 DNA samples from 535 trios (two parents and the patient) at a concentration of 10ng/µL were received from our collaborators at the Hospital for Sick Children in Toronto. All samples were diluted to a final concentration of 1ng/µL with 1X TE buffer. The diluted samples were pipetted into 0.8mL 96-well storage plates (ABgene House), sealed and stored at -20°C.

9. Preparation for Polymerase Chain Reaction and Genotyping

5µL of diluted samples were plated from 96-well plates into 384-well clear optical reaction plates (Applied Biosystems) using a Biomek FX (Beckman Coulter). In addition to the diluted samples, 5µL of Coriell samples were included as positive controls and TE buffer as negative controls in each 384-well plate. All 384-well plates were dried in a fume hood overnight and then wrapped with aluminum foil and stored at 4°C.

10. Polymerase Chain Reaction

5000µL of 2X TaqMan Universal PCR master mix - No AmpErase UNG (Applied Biosystems), 250µL of 40X primer/probe solution (Applied Biosystems) and 4750µL of water were mixed to form the assay for the PCR. The total of 10000µL of mixture was sufficient for performing PCR for five 384-well plates. 5µL of this mixture was added into each well in the 384-well plates and they were sealed with clear optical adhesive covers. They were then placed into an Allegra 6 Centrifuge (Beckman Coulter) and centrifuged for about 5 minutes at a speed of approximately 2500rpm. The plates were then put into a GeneAmp PCR System 9700 (Applied Biosystems) for polymerase chain reaction.
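The master mix recipe above follows from diluting a 2X master mix and a 40X primer/probe stock to 1X in a chosen total volume. The sketch below reproduces that arithmetic; the function names are illustrative only, and the per-well volume and plate size are taken from the text.

```python
# Sketch of the master mix arithmetic described in the text.
# Function names are hypothetical; stock strengths (2X, 40X) come from the text.

def master_mix(total_ul, master_mix_stock_x=2, primer_probe_stock_x=40):
    """Component volumes (µL) giving 1X final strength in total_ul."""
    mm = total_ul / master_mix_stock_x
    pp = total_ul / primer_probe_stock_x
    return {"master_mix_2x": mm, "primer_probe_40x": pp, "water": total_ul - mm - pp}

def volume_needed_ul(n_plates, wells_per_plate=384, ul_per_well=5):
    """Mix volume dispensed into n_plates full plates, with no dead volume."""
    return n_plates * wells_per_plate * ul_per_well

print(master_mix(10000))    # batch used for genotyping
print(volume_needed_ul(5))  # volume dispensed into five 384-well plates
```

Under these assumptions, five full plates consume slightly less than the 10000µL batch, which leaves a small margin for pipetting losses.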
The PCR cycles were set to be a 10-minute period at 95°C for denaturation of double-stranded DNA and activation of the enzyme, 40 repeats of a 15-second period at 92°C and a 1-minute period at 60°C for annealing of primers and extension of the DNA strands, and a final 10-minute period at 72°C for extension of the DNA strands.

11. Genotyping

After the PCR, the plates were placed into a 7900HT Sequence Detection System (Applied Biosystems). In the linked computer, the SDS2.1 program was opened, and the settings were set as follows: assay = allelic discrimination; container = 384-well clear plate; template = blank template. The appropriate plate layout file was imported into the program and the testing detector and markers were chosen. The wells containing TE buffer were assigned as "NTC". The adjusted program was connected to the detection machine and the door was opened to place the plate in the correct orientation as marked on the tray. The door was closed and the analysis was started. The genotyping results were displayed in both table and diagram forms. In the diagram, the three types of genotype (2 forms of homozygote and 1 form of heterozygote) were designated by the auto-caller in the program; a sample diagram is shown in Figure 7. The results were saved and later used for analysis.

Figure 7: Sample genotyping result of Factor B SNP 2557. An allelic discrimination plot of BF_2557 A versus BF_2557 G is shown. Each of the points on the diagram represents a study sample, a Coriell sample or TE buffer. As can be seen in the figure, a cluster of several points was detected near the origin, representing wells with TE buffer, since neither of the two types of fluorescent light (from FAM or VIC) could be measured. In addition, three clusters of points were observed.
Those points in the top left of the plot, with a lower emission of light by the reporter molecule denoted BF_2557G and a higher emission of light by the reporter molecule denoted BF_2557A, represent those individuals who had a homozygous AA genotype at SNP BF_2557. Those points in the middle, with detection of emission by both of the reporter molecules denoted BF_2557A and BF_2557G, represent those individuals who were heterozygous AG at SNP BF_2557. Those points in the lower right of the plot, with a lower emission of light by the reporter molecule denoted BF_2557A and a higher emission of light by the reporter molecule denoted BF_2557G, represent those individuals who had a homozygous GG genotype at SNP BF_2557. There were two points between the clusters of AG and GG for which the genotype could not be determined by the auto-caller.

[Figure 7 plot: Allelic Discrimination Plot; x-axis: Allele X (BF_2557G)]

12. Quantification of DNA Samples

Unsatisfactory results were obtained during the first round of genotyping samples from the first two 384-well plates for SNP 149 of heme oxygenase-1, as shown in Figure 8.

Figure 8: Example genotyping result for SNP 149 of HMOX1. An allelic discrimination plot of HMOX1_149 A versus HMOX1_149 G is shown. Each of the points on the diagram represents a study sample, a Coriell sample or TE buffer. As can be seen in the figure, no distinct clusters could be observed to classify the individuals into one of the three types of genotype, since most of the points were gathered at the lower left-hand corner of the plot. This might indicate that most of the DNA samples were not amplified during PCR and therefore the probes were still intact. All light absorbed by the reporter molecule was then transmitted to the quencher molecule, since they were still located in close proximity to each other, and no emission of fluorescent light was detected.

[Figure 8 plot: Allelic Discrimination Plot; x-axis: Allele X (HMOX1_149G)]

Genotyping reactions were repeated for the samples in the first two 384-well plates; however, unsatisfactory results were again obtained. One explanation for these results was that the concentration of the samples was lower than expected. Therefore, the PicoGreen assay (Invitrogen, Burlington, Ontario) was used to check whether the concentration of the original DNA samples from Toronto was 10ng/µL. The PicoGreen dsDNA Quantitation Reagent is an ultra-sensitive fluorescent nucleic acid stain for quantification of double-stranded DNA. Its advantage over conventional absorbance measurement at 260nm is selectivity: nucleotides, single-stranded nucleic acids and proteins, which are commonly found contaminants in DNA preparations, all contribute to absorbance at 260nm. The PicoGreen reagent is more selective for double-stranded DNA and does not fluoresce in the presence of protein, in addition to being more sensitive. The amount of fluorescence detected is directly proportional to the concentration of DNA, and therefore the concentration of an unknown DNA sample can be deduced once a standard curve has been obtained.

A standard DNA sample (λ DNA) with a concentration of 100ng/µL was diluted to 2ng/µL with 1X TE buffer. 50µL of diluted λ DNA was put into well A1 of a black polystyrene 96-well plate (Corning Incorporated, Corning, NY). 25µL of the solution in well A1 was pipetted into well B1 and mixed with another 25µL of 1X TE buffer. This serial dilution was repeated for the entire column, except the last well. In well H1, only 25µL of 1X TE buffer was added with no standard DNA sample.

Each DNA sample from the source plates sent by Toronto was diluted to three different concentrations (1/20, 1/40 and 1/80). For each concentration of source DNA samples, a serial dilution into other columns of the black 96-well plates was done as outlined above for the standard DNA samples.
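The standard series described above is a twofold serial dilution ending in a TE-only blank. A minimal sketch of the expected concentrations is given below, assuming the values refer to the DNA before PicoGreen reagent is added and that wells A1 to G1 hold DNA while H1 is the blank; the function name is illustrative.

```python
# Twofold serial dilution of the DNA standard, as described in the text.
# Assumption: concentrations are in ng/µL before PicoGreen reagent is added.

def twofold_series(start, n_dilutions):
    """Concentrations from start through n_dilutions successive 1:2 dilutions."""
    return [start / 2 ** i for i in range(n_dilutions + 1)]

# Wells A1..G1 hold the standard; well H1 holds TE buffer only.
standards = twofold_series(2.0, 6) + [0.0]
for well, conc in zip("ABCDEFGH", standards):
    print(f"{well}1: {conc:.3f} ng/µL")
```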
The layout of the black 96-well plates is shown in Table 9.

Table 9: Layout of a 96-well plate with standard DNA samples and DNA samples from source plates sent by Toronto

Lambda DNA      1/20                    1/40                    1/80
2      2        000301 000302 000303    000301 000302 000303    000301 000302 000303
1      1        001901 001902 001903    001901 001902 001903    001901 001902 001903
0.5    0.5      005201 005202 005203    005201 005202 005203    005201 005202 005203
0.25   0.25     006301 006302 006303    006301 006302 006303    006301 006302 006303
0.125  0.125    006901 006902 006903    006901 006902 006903    006901 006902 006903
0.063  0.063    008301 008302 008303    008301 008302 008303    008301 008302 008303
0.031  0.031    010101 010102 010103    010101 010102 010103    010101 010102 010103
0      0        015001 015002 015003    015001 015002 015003    015001 015002 015003

200X PicoGreen reagent was diluted to 1X with TE buffer. 25µL of 1X PicoGreen reagent was added to each well in the black 96-well plate. The plate was covered with aluminum foil since the PicoGreen reagent is sensitive to light.

The black 96-well plate was placed into a GENios fluorimeter (Tecan Group Ltd., Durham, NC) for the detection of fluorescence emitted by the solution of PicoGreen reagent. In the linked computer, the Tecan-XFluor4 program was opened. Under the edit measurement parameter tab, the following settings were chosen: general = fluorescence; plate = GRE96fb.pdf; excitation λ = 485nm; emission λ = 535nm; gain manual = 60; integration time = 40µs; and the box indicating fluorescein was checked. The measurement was started and the results were saved.

13. Re-genotyping

As a quality control measure for the genotyping results obtained from TaqMan assays, re-genotyping was performed for about 30% of the samples.
In this project, the concentration of the DNA samples in source plates #1-8 was found to be lower than the expected value of 10ng/µL and unsatisfactory results were obtained from the TaqMan assays. While waiting for a new batch of samples to be sent from Toronto, genotyping of plates #9-20 was performed and the data were used for quality control. Specifically, the data from the first round of genotyping of plates #9-20 were compared with the data from the new DNA samples.

14. Statistical Data Analysis

Before performing any statistical analysis of the genotyping data, the pattern of Mendelian inheritance was first examined for all participating trio members by the Family Based Association Test (FBAT) program (www.biostat.harvard.edu/~fbat/default.html). The presence of Mendelian errors in any one of the families for any one of the SNPs being tested would identify potential problems with the samples and/or genotyping assays, which would lead to misinterpretation of the data.

After the families with Mendelian inconsistencies were removed from the analysis, chi-square tests were used to test for Hardy-Weinberg Equilibrium among the parents and to compare the genotype frequencies of all SNPs between the parental population and the online databases. These two procedures helped to identify any inaccurate genotyping assays and/or whether the study population was genetically heterogeneous.

Analysis of Variance (ANOVA) was employed in attempting to establish any relationship between the genotypes and the severity of the disease. Three phenotypic characteristics (age of diagnosis, Forced Expiratory Volume in one second (FEV1) percent predicted and FEV1 standard deviation (see below)) were selected as measurements which indicated the disease status of each patient.

FEV1 is defined as the volume exhaled during the first second of a forced expiratory maneuver.
FEV1 values were expressed as a percentage of the expected values for age, sex and height, calculated using the formula of Knudson et al [38] for those over the age of 10 years and according to the Hospital for Sick Children formula [39] for those 6 to 10 years of age. In obstructive lung diseases such as CF, a decrease in FEV1 will be evident and therefore it is a common index for assessing pulmonary function. Consequently it was chosen as one of the phenotypic traits in this study in order to measure the severity of the disease.

As an alternate measure of lung function we used the difference between a patient's lung function and the average expected for a cohort of Canadian CF patients. A regression analysis of age versus FEV1 percent predicted was conducted using the longitudinal data of patients homozygous for the ΔF508 mutation from the Canadian CF Patient Data Registry (2002) (see Figure 9). The FEV1 standard deviation score (FEV1_SD) was calculated by comparing each patient's FEV1 percent predicted value with that from the regression curve for that age and gender. A positive value for FEV1_SD indicates the number of standard deviations that the individual's FEV1 percent predicted is above that predicted from this relationship, and a negative value is the number of standard deviations the subject's value is below that predicted for that age and gender.

Figure 9: Normalization of the FEV1 percent predicted values by age. Longitudinal data from the Canadian CF Patient Data Registry was used to calculate the linear regression curve for patients homozygous for the deltaF508 mutation. The standard deviation scores were calculated for all patients according to the FEV1 percent predicted values and age. There is a deficiency of patients in age groups over 30 with low scores due to mortality.

By performing the ANOVA test using the program JMP v5.1 (SAS Institute Inc.
Cary, NC), it was assumed that the sample was normally distributed and that the patients were unrelated (i.e. were not from the same extended pedigree, so that the observations would be independent). The normality of a sample can be checked by the goodness of fit test in the JMP program. A normal quantile plot was done by plotting the phenotypic variables (i.e. age of diagnosis, FEV1 predicted and standard deviation values) as the y-axis according to the SNP genotype. Although t-tests could be used for identifying significant differences between two genotypic groups, ANOVA was preferable to the t-test in this analysis since multiple t-tests would be required for the necessary comparisons. In contrast, by using ANOVA we could compare more than two genotypic groups and thus generate a single P value.

Before we concluded that a significant association had been established between one of the three traits and the genotype of any SNP of the four candidate genes, regression analysis was performed to adjust the data for potentially confounding factors such as sex, age and CFTR genotype.

One of the traits, the age of first Pseudomonas aeruginosa infection, was examined by survival analysis or age of onset analysis. Although about 80% of CF patients were infected with Pseudomonas aeruginosa, not every participant recruited in this study had been colonized with this bacterium. However, it is likely that these individuals will become colonized by Pseudomonas aeruginosa in the future. Consequently it was not appropriate to ignore these patients in this part of the study, as would be the case if the analysis was done by ANOVA. When performing the age of onset analysis using JMP, these individuals were assigned their current age as their age of first Pseudomonas aeruginosa infection. However, since this assigned age was not a true age of first Pseudomonas aeruginosa infection, a censor column was added which stated 0=no Pseudomonas aeruginosa infection and 1=Pseudomonas aeruginosa infection.
Therefore these individuals would be accounted for in the calculation by the program automatically according to the value in the censor column.

As mentioned before, Pseudomonas aeruginosa infection was not found in all patients recruited in this study. One of four categories was assigned to each CF patient with respect to their Pseudomonas aeruginosa infection status, namely no Pseudomonas aeruginosa infection (code 0), grew Pseudomonas aeruginosa once (code 1), sporadic Pseudomonas aeruginosa growth (code 2) and chronic Pseudomonas aeruginosa infection (code 3). Susceptibility to Pseudomonas aeruginosa infection was hypothesized to be influenced by one or more of the SNPs of the four candidate modifier genes, and the chi-square test was used to test for such a relationship.

Another analytical method, the Family Based Association Test (FBAT) [40], was employed for examining the presence of association between the selected phenotypes and SNPs. The FBAT software implements the TDT and other family-based tests of association. Although FBAT is also a test of genetic association, it is markedly different from ANOVA: FBAT tests for association within the family trio members, while ANOVA considered only the patients. Both phenotype and genotype files needed to be prepared in the correct format for analysis by the FBAT program. Both files were prepared in the Notepad program, with the genotype file saved as a .ped file and the phenotype file saved as a .phe file. In the genotype file, the names of all 22 SNPs were placed in the first row. There were also restrictions for the columns. The first column was the pedigree ID and the second column was the individual ID. Father ID and mother ID were shown in the third and fourth columns, respectively. Gender was indicated in the fifth column with 1=male and 2=female. The next column represented the affection status. The remaining columns showed the alleles at each of the SNPs.
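The genotype file layout described above can be sketched as follows. This is an illustration only, not the files used in the study: the pedigree and individual IDs, the single marker and the numeric allele codes are hypothetical, and coding founders' parents as 0 and affection status as 2=affected, 1=unaffected are assumptions that should be checked against the FBAT documentation.

```python
# Sketch of a genotype (.ped-style) file with the column order given in the text:
# pedigree ID, individual ID, father ID, mother ID, sex, affection, then alleles.
# All IDs, allele codes and status codes below are hypothetical.

def ped_lines(markers, individuals):
    """Return the file as a list of lines; the first line lists marker names."""
    lines = [" ".join(markers)]
    for ind in individuals:
        fields = [ind["ped"], ind["id"], ind["father"], ind["mother"],
                  str(ind["sex"]), str(ind["affection"])]
        for marker in markers:
            fields.extend(ind["genotypes"][marker])  # two alleles per marker
        lines.append(" ".join(fields))
    return lines

trio = [
    {"ped": "F001", "id": "1", "father": "0", "mother": "0", "sex": 1,
     "affection": 1, "genotypes": {"BF_2557": ("1", "2")}},
    {"ped": "F001", "id": "2", "father": "0", "mother": "0", "sex": 2,
     "affection": 1, "genotypes": {"BF_2557": ("2", "2")}},
    {"ped": "F001", "id": "3", "father": "1", "mother": "2", "sex": 1,
     "affection": 2, "genotypes": {"BF_2557": ("1", "2")}},
]
print("\n".join(ped_lines(["BF_2557"], trio)))
```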
Haplotype analysis was the last part of the analysis in this study. Possible combinations of all the SNPs in each candidate gene were tested as a group. This was performed to determine whether there was any interaction effect between SNPs, which may have resulted in a new association with the outcome variable that had not been detected when each SNP was analyzed individually. In addition, examination of haplotypes may have identified causal SNPs that were not detected in resequencing programs such as Seattle SNPs. These causal SNPs could be outside the region that was resequenced but found preferentially on a specific haplotype. Thus, the haplotype would act as a marker for the causal SNP. However, some of the haplotypes were found to have a low frequency and those were pooled together as a group. Also, the confounding factors, e.g. sex and CFTR genotype, were put into the analysis for adjustment of the data.

The haplotype analysis was first done by the RGui program version 2.5.0, provided by The R Foundation for Statistical Computing. Both genotypic and phenotypic input files were required to be in the correct format recognized by the program and saved as .csv files in Notepad. In the genotypic file, the first row contained the heading of each column: family ID, individual ID, father ID, mother ID and genotype of each allele of every SNP. The data of each participant were then arranged in a row below according to the headings listed. In the phenotypic file, the first row showed the name of each column: person ID and each of the phenotypic characteristics being collected (including the target phenotypic traits and confounding factors in this study: age of diagnosis, FEV1 predicted and standard deviation values, gender, age and CFTR mutation). The patients' data were put below in the order of the headings named in the first row. The phenotypic trait of age of diagnosis was also logarithmically transformed in order to correct for normality.
All SNPs within a single gene were grouped together as a haplotype for determination of any association, and the analysis was done twice, once with and once without adjustment for confounding factors such as sex and CFTR genotype. Results from both analyses could then be evaluated in the presence and absence of those confounding factors.

Haplotype analysis between the genotype and phenotype (age of diagnosis, FEV1 predicted and standard deviation values) was also performed by the FBAT program. The generation of possible haplotypes was determined according to the implementation of the TDT and other family-based tests of association offered by the program. The format of both genotypic and phenotypic input files was exactly the same as that used in previous FBAT calculations. Only the additive model was included in this part of the study. Due to the different computational basis of the two programs, direct comparison between results obtained from the RGui and FBAT programs could not be done. However, if consistent results were obtained by both programs, this would suggest a stronger association.

For the age of first Pseudomonas aeruginosa infection, the haplotype analysis was done by the program Hapstat (http://www.bios.unc.edu/~lin/hapstat/) [41]. The input file was saved as a .txt file in Notepad. The first row represented the headings of the data listed below: age of first Pseudomonas aeruginosa infection, status of Pseudomonas aeruginosa infection, gender, CFTR genotype and the genotype of each of the involved SNPs. As the file was loaded as a cohort file into the program, the following settings were utilized: observation time = age of first Pseudomonas aeruginosa infection; event indicator = status of Pseudomonas aeruginosa infection; genotype = all the SNPs in each of the candidate modifier genes; environment = confounding factors (gender and CFTR genotype: deltaF508/deltaF508, deltaF508/other mutation, and other mutation/other mutation).
Only the additive model was used in this part of the study. The haplotype frequency and haplotype effect boxes both had to be checked. The haplotype frequency threshold, which is calculated from the number of samples, was left at its default of 0.009, while the haplotype effect threshold was changed to a P value of 0.05.

Chapter 3 Results

1. PicoGreen Reaction

In the initial experiments for this project, unsatisfactory results were obtained from the genotyping of HMOX1 SNP149 for the original source plates #1-8. It was therefore suspected that the initial DNA concentration in each of the wells was less than the expected 10 ng/µL, so that there may not have been enough DNA for PCR amplification to reach the level needed for genotyping. PicoGreen reactions were performed to determine the concentration of DNA in source plates #1-8. A standard curve was plotted with serially diluted λ DNA samples as indicated in the table layout in the "Materials and Methods" section. The standard curve derived from this serial dilution is shown in Figure 10.

Figure 10: Standard curve of optical density (OD) versus DNA concentration of the serially diluted λ DNA samples (fitted line: y = 3319.8x, R² = 0.9999; x-axis: DNA [µg/mL])

With the aid of the above standard curve, the concentration of DNA in the original source plates #1-8 was determined, and a representative subset of the results is summarized in Table A1 in the Appendix.

As observed from Table A1 in the Appendix, the concentration of samples provided by Toronto ranged from -0.03 ng/µL to 9.77 ng/µL and most of the concentrations were lower than expected. This was presumably the major reason for the failed genotyping assays. However, this inconsistent concentration of DNA samples was not detected in source plates #9-20.
DNA samples in those plates were genotyped for most of the SNPs while waiting for a new batch of DNA samples from all participants to be sent from Toronto. The results from plates #9-20 were to be used as a quality control measure for comparison with genotypes from the new DNA samples (see Section 12, Re-genotyping).

2. Analysis of the Genotypic Data for Mendelian Inconsistencies

Analysis of the parent-offspring trios for Mendelian inconsistencies was performed using the Family Based Association Test (FBAT) program [42]. This program tests the inheritance pattern from both parents to their child. Each parent should contribute one of their two alleles to the child, in accordance with Mendel's laws of inheritance. The presence of Mendelian errors in any family for any SNP identifies potential problems with the samples and/or genotyping assays. Mendelian errors were detected in some families for particular SNPs and the results are summarized in Table A2 in the Appendix.

There were many potential causes of the observed Mendelian errors, including contamination of samples by other DNA during sample collection or genotyping, incorrect labeling of individuals, non-paternity and random genotyping errors. In this study, families with more than one Mendelian error were removed from the study, while those with only one error were excluded only from the analysis of the SNP for which the error was detected.

3. Sequencing Results

Due to discrepancies between genotypes of three of the control individuals when testing the assay for BF_7202, DNA samples from Coriell controls D0008, D016 and E016 were sent to the University of British Columbia for sequencing, together with the sequence of the primers used.
These sequencing reactions were not successful, for unknown technical reasons. However, the genotyping of these three individuals was repeated two more times with the TaqMan assay designed for BF_7202, and the genotypes were found to match those on the SeattleSNPs website. The genotyping result was therefore considered reliable despite the failure of the sequencing by UBC, and the analyses of the BF_7202 SNP were continued.

4. Genotypes of the Participating Individuals

This was a sub-study of a large, Canada-wide and international project: the Canadian Consortium for CF Modifiers. DNA samples from 535 families with a total of 1605 subjects (2 parents and 1 patient per family) across Canada were included in this sub-study. A DNA sample of each participant was sent from Toronto at a concentration of 10 ng/µL. Four candidate genes with a total of 22 SNPs were chosen and all samples were genotyped for those SNPs by TaqMan assays according to the procedures detailed in the "Materials and Methods" section.

Five SNPs were selected for the Factor B gene. Genotyping results for this gene are summarized in Table A3 in the Appendix. As noted in the table, non-call rates ("undetermined" in Table A3) for all SNPs were about 2% or less. One possible reason for genotyping failure was a low concentration of DNA (less than the expected 1 ng/µL after dilution), so that not enough DNA was amplified for those individuals and their genotypes were undetectable. Two families were removed from the analysis of SNP 2557 because errors were detected in these families when the pattern of inheritance of the alleles from parents to offspring was examined with the FBAT program.

Four SNPs were selected for genotyping in the Complement factor 3 gene (Table A4 in the Appendix). Genotyping was unsuccessful for less than 2% of the samples for each SNP.
Furthermore, three families and one family were removed from the analysis of SNP 963 and SNP 28795, respectively, because of the non-Mendelian inheritance pattern observed at these SNPs in those families.

Seven SNPs were selected for analysis in the toll-like receptor 4 gene (Table A5 in the Appendix). In the TLR4 gene, only one family was found to have an error in the inheritance pattern (for SNP 11912) and it was removed from all analyses of this SNP. The genotypes of less than 2% of the samples were undetermined.

Six SNPs were included in the analysis of the heme oxygenase-1 gene (Table A6 in the Appendix). For SNP 149, 17 families were excluded due to Mendelian errors and a high number of individuals had an undetermined genotype. With the exception of SNP 149, only about 1% of genotypes could not be determined. Furthermore, only 2 families were removed from the analysis due to errors in the inheritance pattern (one each for SNP 2790 and SNP 9531).

5. Phenotypic Characteristics of the Study Subjects

The study consisted of a total of 535 families recruited at CF clinics across Canada. Unfortunately, it was not possible to collect the characteristics of every individual enrolled in the study. The available variables are summarized in Table A7 in the Appendix.

6. Determination of Hardy-Weinberg Equilibrium in the Parent Population

Hardy-Weinberg analysis determines how the allele frequency of a given SNP corresponds to the genotype distribution. The Hardy-Weinberg Law states that the frequency of an allele should remain constant over time unless outside forces act on the population, so that a SNP with allele frequencies p and q is expected to show genotype frequencies of p², 2pq and q². A Hardy-Weinberg analysis was performed on the parent population, since any deviation from equilibrium could indicate an inaccurate genotyping assay or a genetically heterogeneous study population. A summary of the Hardy-Weinberg analysis of the parent population is shown in Table 10.
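The Hardy-Weinberg check described above can be sketched as follows. This is a minimal illustration, not the procedure used in the thesis: the function name `hwe_test` and the choice of degrees of freedom are assumptions (1 df is the conventional choice for a biallelic SNP, and the thesis does not state which was used). The example counts are the observed parental genotype counts reported for HMOX1_149 in Table 10; because rounding and the exact test are not documented, the values it produces may differ slightly from those in the table.

```python
import math


def hwe_test(n_aa, n_ab, n_bb, df=1):
    """Chi-square test of Hardy-Weinberg equilibrium for one biallelic SNP.

    Returns (expected genotype counts, chi-square statistic, P value).
    The df argument is an assumption of this sketch, not taken from the thesis.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    if df == 1:
        p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 df
    elif df == 2:
        p_value = math.exp(-chi2 / 2.0)  # upper tail, 2 df
    else:
        raise ValueError("only df=1 or df=2 are supported in this sketch")
    return expected, chi2, p_value


# Observed parental genotype counts for HMOX1_149 (AA, AG, GG) from Table 10.
expected, chi2, p_value = hwe_test(122, 404, 479)
print([round(e, 1) for e in expected], round(chi2, 2), round(p_value, 4))
```

With either choice of degrees of freedom, this SNP falls below the 0.05 threshold, which agrees with its being the one SNP flagged as out of equilibrium.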
Table 10: Hardy-Weinberg Equilibrium among the parental population. Observed and expected genotype counts are listed in the order given in the Genotypes column.

Gene                 SNP    Alleles (frequency)  Genotypes  Observed      Expected      Total  χ² test
Factor B             2557   A 0.18, G 0.82       AA/AG/GG   36/314/707    34/312/711    1057   0.94
Factor B             4022   A 0.32, G 0.68       AA/AG/GG   107/458/489   108/459/487   1054   0.99
Factor B             6484   A 0.89, G 0.11       AA/AG/GG   858/189/19    844/209/13    1066   0.08
Factor B             7202   A 0.43, G 0.57       AA/AG/GG   197/508/344   194/514/341   1049   0.93
Factor B             8311   C 0.91, T 0.09       CC/CT/TT   862/182/7     870/172/9     1051   0.58
Complement factor 3  963    G 0.51, T 0.49       GG/GT/TT   270/527/249   272/523/251   1046   0.97
Complement factor 3  28795  A 0.58, G 0.42       AA/AG/GG   355/521/177   354/513/186   1053   0.75
Complement factor 3  36735  A 0.44, G 0.56       AA/AG/GG   202/519/330   203/518/330   1051   1.00
Complement factor 3  43118  A 0.48, G 0.52       AA/AG/GG   239/529/292   244/529/287   1060   0.91
TLR4                 851    A 0.72, G 0.28       AA/AG/GG   551/434/81    553/430/83    1066   0.95
TLR4                 1859   A 0.37, G 0.63       AA/AG/GG   143/509/410   145/495/422   1062   0.68
TLR4                 2856   C 0.15, T 0.85       CC/CT/TT   24/266/770    24/270/766    1060   0.96
TLR4                 9263   A 0.11, C 0.89       AA/AC/CC   16/207/841    13/208/843    1064   0.70
TLR4                 11912  G 0.67, T 0.33       GG/GT/TT   478/467/114   476/468/115   1059   0.99
TLR4                 15884  C 0.15, G 0.85       CC/CG/GG   23/270/764    24/269/764    1057   0.98
TLR4                 17050  C 0.15, T 0.85       CC/CT/TT   20/277/765    24/271/767    1062   0.67
Heme oxygenase-1     149    A 0.32, G 0.68       AA/AG/GG   122/404/479   103/437/465   1005   0.04
Heme oxygenase-1     1038   C 0.96, T 0.04       CC/CT/TT   975/83/1      976/81/2      1059   0.76
Heme oxygenase-1     2790   A 0.57, T 0.43       AA/AT/TT   341/517/200   344/519/195   1058   0.92
Heme oxygenase-1     3303   C 0.05, G 0.95       CC/CG/GG   2/106/954     3/101/958     1062   0.74
Heme oxygenase-1     9531   A 0.46, G 0.54       AA/AG/GG   225/528/302   223/524/308   1055   0.92
Heme oxygenase-1     16442  A 0.95, T 0.05       AA/AT/TT   952/109/1     958/101/3     1062   0.37

Hardy-Weinberg Equilibrium was established for all the selected SNPs among the parental population, except SNP HMOX1_149.

7. Comparison of Genotype Frequencies in the Parent Population and in Online Databases

As another check of the genotyping assays, the allele frequencies obtained in this study were compared with those recorded for Caucasian individuals on the Innate Immunity Programs for Genomic Applications (IIPGA) and UW-FHCRC Variation Discovery Resource (Seattle SNPs) websites (Table 11). Wide disparities between these two datasets could indicate an inaccurate genotyping assay in this study.

Table 11: Comparison of allele frequencies between the genotyping results and the reported values on either the IIPGA or Seattle SNPs websites. Frequencies are listed in the order given in the Alleles column.

Gene                 SNP    Alleles  Frequency in this study  Reported frequency  χ² test
Factor B             2557   A/G      0.18/0.82                0.15/0.85           0.60
Factor B             4022   A/G      0.32/0.68                0.29/0.71           0.65
Factor B             6484   A/G      0.89/0.11                0.89/0.11           0.96
Factor B             7202   A/G      0.43/0.57                0.35/0.65           0.27
Factor B             8311   C/T      0.91/0.09                0.89/0.11           0.72
Complement factor 3  963    G/T      0.51/0.49                0.55/0.45           0.61
Complement factor 3  28795  A/G      0.58/0.42                0.5/0.5             0.31
Complement factor 3  36735  A/G      0.44/0.56                0.45/0.55           0.86
Complement factor 3  43118  A/G      0.48/0.52                0.47/0.53           0.99
TLR4                 851    A/G      0.72/0.28                0.72/0.28           0.96
TLR4                 1859   A/G      0.37/0.63                0.39/0.61           0.87
TLR4                 2856   C/T      0.15/0.85                0.13/0.87           0.74
TLR4                 9263   A/C      0.11/0.89                0.13/0.87           0.70
TLR4                 11912  G/T      0.67/0.33                0.70/0.30           0.73
TLR4                 15884  C/G      0.15/0.85                0.11/0.89           0.44
TLR4                 17050  C/T      0.15/0.85                0.17/0.83           0.64
HMOX1                149    A/G      0.32/0.68                0.28/0.72           0.57
HMOX1                1038   C/T      0.96/0.04                0.89/0.11           0.02
HMOX1                2790   A/T      0.57/0.43                0.61/0.39           0.57
HMOX1                3303   C/G      0.05/0.95                0.11/0.89           0.09
HMOX1                9531   A/G      0.46/0.54                0.48/0.52           0.87
HMOX1                16442  A/T      0.95/0.05                0.87/0.13           0.02

Inspection of Table 11 shows that the allele frequencies from this study and the reported allele frequencies were generally similar.
However, statistical analysis by 2 x 2 Chi-square test demonstrated significant differences for the heme oxygenase-1 SNPs 1038 and 16442. There were discrepancies of approximately 7% in the allele frequencies for each of those two SNPs. This may indicate that the assays for these SNPs are unreliable. However, the fact that the discrepancies are small and that the SNPs are in Hardy-Weinberg equilibrium argues against this.

Samples with different genetic backgrounds would be another possible reason for the discrepancy. The reported heme oxygenase-1 data shown in Table 11 are from the Seattle SNPs website. All the DNA samples used in this study and those used by Seattle SNPs were from the Caucasian population. However, the Caucasian individuals genotyped by Seattle SNPs were Utah residents with ancestry from Northern and Western Europe, while those in this study may represent a different subset of the Caucasian population. Another possible reason for the discrepancy could be the small sample size used by Seattle SNPs, i.e. only 23 European samples were sequenced.

8. ANOVA Analysis of the Influence of Genotype on the Phenotypes

Three phenotypic characteristics (age of diagnosis, FEV1 predicted value and FEV1 standard deviation value) were included in the one-way ANOVA to determine whether any of the genotypes contributed to the variation in the selected traits. Each trait provides a measure of the severity of disease in the patients. The use of ANOVA assumes that the dependent variable is normally distributed. Various transformations (quadratic, cubic, common logarithm and natural logarithm) were therefore tried in order to normalize the distributions where needed. In this study, only the age of diagnosis was slightly skewed, and it was transformed by common logarithm before ANOVA was performed.

(a) Factor B

(I) Age of diagnosis

Table 12: ANOVA of age of diagnosis among different genotypes of the selected SNPs in Factor B. The age of diagnosis was logarithmically transformed for normality.

SNP      Genotype  Number  Mean   Standard Error  P value
BF_2557  AA        18      -0.27  0.20            0.74
         AG        111     -0.16  0.08
         GG        307     -0.23  0.05
BF_4022  AA        55      -0.20  0.12            0.23
         AG        167     -0.30  0.07
         GG        217     -0.15  0.06
BF_6484  AA        353     -0.19  0.05            0.51
         AG        83      -0.31  0.09
         GG        7       -0.13  0.33
BF_7202  AA        85      -0.02  0.09            0.0057
         AG        210     -0.34  0.06
         GG        146     -0.13  0.07
BF_8311  CC        360     -0.25  0.05            0.12
         CT        73      -0.04  0.10
         TT        4       0.11   0.43

(II) FEV1 predicted value

Table 13: ANOVA of FEV1 predicted value among different genotypes of the selected SNPs in Factor B

SNP      Genotype  Number  Mean   Standard Error  P value
BF_2557  AA        14      78.31  6.98            0.01
         AG        104     68.90  2.56
         GG        279     77.86  1.56
BF_4022  AA        48      69.89  3.79            0.16
         AG        150     74.86  2.14
         GG        205     77.72  1.83
BF_6484  AA        329     75.22  1.45            0.41
         AG        70      79.19  3.14
         GG        7       68.97  9.92
BF_7202  AA        82      69.43  2.89            0.04
         AG        185     76.30  1.93
         GG        137     78.47  2.24
BF_8311  CC        321     75.77  1.47            0.91
         CT        72      74.58  3.11
         TT        5       72.67  11.79

(III) FEV1 standard deviation value

Table 14: ANOVA of FEV1 sd value among different genotypes of the selected SNPs in Factor B

SNP      Genotype  Number  Mean   Standard Error  P value
BF_2557  AA        14      0.70   0.25            0.01
         AG        104     0.23   0.09
         GG        279     0.53   0.06
BF_4022  AA        48      0.19   0.14            0.08
         AG        150     0.46   0.08
         GG        205     0.53   0.07
BF_6484  AA        329     0.45   0.05            0.12
         AG        70      0.57   0.11
         GG        7       -0.18  0.36
BF_7202  AA        82      0.23   0.10            0.03
         AG        185     0.48   0.07
         GG        137     0.58   0.08
BF_8311  CC        321     0.46   0.05            0.66
         CT        72      0.47   0.11
         TT        5       0.08   0.42

In the tables above, the age of diagnosis was log-transformed in order to obtain a normally distributed data set. Three of the five SNPs (BF_4022, BF_6484 and BF_8311) had P values greater than 0.05 in the ANOVA for all three phenotypes. Therefore, it was unlikely that there was a relationship between these SNPs and CF disease severity.
However, for the two SNPs (BF_2557 and BF_7202) with P values less than 0.05, the analyses were repeated with adjustment for confounding factors in order to confirm a significant association between the tested SNPs and phenotypic traits.

(b) Complement Factor 3

ANOVA tests of the three selected phenotypic traits were completed for the SNPs of the candidate CF modifier gene Complement factor 3, and the results are shown in Tables A8, A9 and A10 in the Appendix. None of the analyses revealed a relationship with the measured traits, since all of the P values were greater than 0.05. No adjustment for confounding factors was therefore required for any of the analyses in this part.

(c) Toll-like Receptor 4

(I) Age of diagnosis

Table 15: ANOVA of age of diagnosis among different genotypes of the selected SNPs in Toll-like receptor 4. The age of diagnosis was logarithmically transformed for normality.

SNP         Genotype  Number  Mean   Standard Error  P value
TLR4_851    AA        242     -0.19  0.06            0.57
            AG        168     -0.25  0.07
            GG        31      -0.08  0.15
TLR4_1859   AA        62      -0.29  0.11            0.04
            AG        206     -0.10  0.06
            GG        175     -0.31  0.06
TLR4_2856   CC        9       0.03   0.29            0.20
            CT        121     -0.32  0.08
            TT        314     -0.17  0.05
TLR4_9263   AA        7       0.19   0.33            0.47
            AC        86      -0.21  0.09
            CC        351     -0.22  0.05
TLR4_11912  GG        190     -0.19  0.06            0.47
            GT        191     -0.19  0.06
            TT        59      -0.34  0.11
TLR4_15884  CC        9       -0.25  0.29            0.92
            CG        102     -0.24  0.09
            GG        325     -0.21  0.05
TLR4_17050  CC        12      -0.15  0.25            0.80
            CT        109     -0.26  0.08
            TT        322     -0.20  0.05

ANOVA tests of the three selected phenotypic traits were completed for the SNPs of the candidate CF modifier gene Toll-like receptor 4. As shown in the table above and in Tables A11 and A12 in the Appendix, all the P values were greater than 0.05, so no adjustment for confounding factors was performed, except for the ANOVA of SNP TLR4_1859 and the age of diagnosis. Consequently, regression analysis was performed for this association.
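The one-way ANOVA used throughout this section can be sketched as follows. This is only an illustration of the calculation, not the statistical package used in the thesis: the function name `one_way_anova_f` is hypothetical, the ages are made-up values, and the sketch stops at the F statistic because turning it into a P value requires the F distribution, which the Python standard library does not provide.

```python
import math


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by genotype (between groups).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Residual variation (within groups).
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical ages of diagnosis (years) grouped by genotype; not thesis data.
ages_by_genotype = {
    "AA": [0.3, 1.5, 4.0],
    "AG": [0.2, 0.8, 2.5, 6.0],
    "GG": [0.1, 0.5, 1.0],
}
# Common-logarithm transformation, as applied to age of diagnosis in the thesis.
log_groups = [[math.log10(a) for a in ages] for ages in ages_by_genotype.values()]
f_stat, df1, df2 = one_way_anova_f(log_groups)
print(round(f_stat, 3), df1, df2)
```

A large F statistic relative to its degrees of freedom corresponds to a small P value, which is the quantity reported in the ANOVA tables.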
(d) Heme Oxygenase-1

(I) FEV1 predicted value

Table 16: The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in heme oxygenase-1

SNP          Genotype  Number  Mean   Standard Error  P value
HMOX1_149    AA        49      71.14  3.81            0.25
             AG        159     74.08  2.12
             GG        169     77.64  2.05
HMOX1_1038   CC        367     75.16  1.37            0.43
             CT        37      81.07  4.31
             TT        2       75.30  18.55
HMOX1_2790   AA        128     78.59  2.31            0.045
             AT        193     76.39  1.88
             TT        83      69.57  2.87
HMOX1_3303   CC        0       n/a    n/a             0.39
             CG        39      79.13  4.19
             GG        366     75.37  1.37
HMOX1_9531   AA        87      79.82  2.80            0.017
             AG        188     77.55  1.90
             GG        126     70.43  2.33
HMOX1_16442  AA        366     75.46  1.37            0.71
             AT        40      77.09  4.13
             TT        0       n/a    n/a

(II) FEV1 standard deviation value

Table 17: The ANOVA result of examining FEV1 sd value among different genotypes of the selected SNPs in heme oxygenase-1

SNP          Genotype  Number  Mean  Standard Error  P value
HMOX1_149    AA        49      0.27  0.14            0.43
             AG        159     0.44  0.08
             GG        169     0.48  0.07
HMOX1_1038   CC        367     0.45  0.05            0.61
             CT        37      0.59  0.16
             TT        2       0.20  0.67
HMOX1_2790   AA        128     0.49  0.08            0.029
             AT        193     0.54  0.07
             TT        83      0.22  0.10
HMOX1_3303   CC        0       n/a   n/a             0.96
             CG        39      0.46  0.15
             GG        366     0.46  0.05
HMOX1_9531   AA        87      0.55  0.10            0.038
             AG        188     0.59  0.07
             GG        126     0.24  0.08
HMOX1_16442  AA        366     0.46  0.05            0.79
             AT        40      0.42  0.15
             TT        0       n/a   n/a

From the two tables above and Table A13 in the Appendix, it is unlikely that any of the six SNPs were associated with the three measured phenotypic traits, except for HMOX1_2790 and HMOX1_9531. These SNPs had a P value of less than 0.05 in the ANOVA with FEV1 predicted value and FEV1 sd value. Before a significant relationship could be concluded, adjustment for confounding factors was required.

9. Regression Analysis

The ANOVA results indicated possible associations for the following SNPs: (1) BF_2557 and FEV1 predicted value/sd value; (2) BF_7202 and age of diagnosis; (3) BF_7202 and FEV1 predicted value/sd value; (4) TLR4_1859 and age of diagnosis; (5) HMOX1_2790 and FEV1 predicted value/sd value; and (6) HMOX1_9531 and FEV1 predicted value/sd value. The ANOVA of those five SNPs with the specified phenotypes had a P value of less than 0.05, which indicated that there might be an association between the SNP and the trait. However, before concluding that a correlation existed, the association was corrected for confounding factors, which included sex, age and CFTR genotype. These factors were hypothesized to affect the measured traits; for example, different CFTR genotypes lead to a wide variety of clinical symptoms. Regression analysis was performed with those confounding factors and the results are shown in Tables 18-27.
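The idea of adjusting a genotype effect for confounding factors can be sketched with an ordinary least-squares fit. This is a minimal illustration under stated assumptions, not the thesis's analysis: the helper names (`solve`, `fit_ols`, `design_row`), the dummy coding with GG and male as reference categories, and all patient values are hypothetical, CFTR genotype is left out for brevity, and the sketch returns only coefficients rather than the adjusted means and P values reported in the thesis tables.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def design_row(genotype, sex, age):
    """Intercept, AA and AG dummies (GG is the reference), female dummy, age."""
    return [1.0, float(genotype == "AA"), float(genotype == "AG"),
            float(sex == "F"), float(age)]


# Hypothetical patients: (genotype, sex, age in years, FEV1 % predicted).
patients = [
    ("AA", "F", 10, 74.0), ("AA", "M", 20, 66.5), ("AG", "F", 15, 70.0),
    ("AG", "M", 8, 69.0), ("GG", "F", 12, 78.5), ("GG", "M", 25, 68.0),
    ("AG", "F", 30, 60.5), ("GG", "M", 5, 79.0),
]
rows = [design_row(g, s, a) for g, s, a, _ in patients]
fev1 = [p[3] for p in patients]
names = ["intercept", "AA vs GG", "AG vs GG", "female vs male", "age"]
for name, coef in zip(names, fit_ols(rows, fev1)):
    print(name, round(coef, 2))
```

The genotype coefficients are the differences from the reference genotype after the effects of sex and age have been accounted for, which is the sense in which the association is "adjusted".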
(a) BF_2557

Table 18: Regression analysis of the association of BF_2557 and FEV1 predicted value with confounding factors

Confounding factor  Sub-group    Mean   Standard Error  P value
BF_2557             AA           77.40  6.63            0.014
                    AG           69.25  2.63
                    GG           78.53  1.75
Sex                 F            76.19  3.01            0.64
                    M            75.98  2.80
Age                 n/a          n/a    n/a             <0.0001
CFTR genotype       ΔF508/ΔF508  75.61  2.51            0.38
                    ΔF508/other  77.92  2.92
                    other/other  72.05  4.68

Table 19: Regression analysis of the association of BF_2557 and the FEV1 sd value with confounding factors

Confounding factor  Sub-group    Mean  Standard Error  P value
BF_2557             AA           0.72  0.27            0.016
                    AG           0.25  0.11
                    GG           0.55  0.07
Sex                 F            0.43  0.12            0.55
                    M            0.51  0.11
CFTR genotype       ΔF508/ΔF508  0.44  0.10            0.38
                    ΔF508/other  0.57  0.12
                    other/other  0.37  0.19

From the above two tables, it can be seen that there were associations between the Factor B SNP 2557 and both the FEV1 predicted and standard deviation values. The associations were independent of the potential confounding factors of sex and CFTR genotype. However, a P value of less than 0.0001 was found for the effect of age on the FEV1 predicted value. This observation could be explained by the fact that lung function, measured as percent predicted, decreases with increasing age.
(b) BF_7202

Table 20: Regression analysis of the association of BF_7202 and the age of diagnosis with confounding factors

Confounding factor  Sub-group    Mean  Standard Error  P value
BF_7202             AA           2.75  0.63            0.0274
                    AG           2.00  0.45
                    GG           3.70  0.48
Sex                 F            2.31  0.45            0.13
                    M            3.04  0.44
CFTR genotype       ΔF508/ΔF508  1.59  0.34            <0.0001
                    ΔF508/other  3.80  0.45
                    other/other  6.11  0.90

Table 21: Regression analysis of the association of BF_7202 and the FEV1 predicted value with confounding factors

Confounding factor  Sub-group    Mean   Standard Error  P value
BF_7202             AA           70.12  2.92            0.019
                    AG           76.73  2.10
                    GG           79.73  2.23
Sex                 F            76.75  2.07            0.66
                    M            76.07  2.05
Age                 n/a          n/a    n/a             <0.0001
CFTR genotype       ΔF508/ΔF508  75.91  1.59            0.25
                    ΔF508/other  78.31  2.11
                    other/other  72.05  4.16

Table 22: Regression analysis of the association of BF_7202 and the FEV1 standard deviation value

Confounding factor  Sub-group    Mean  Standard Error  P value
BF_7202             AA           0.26  0.12            0.0215
                    AG           0.49  0.08
                    GG           0.61  0.09
Sex                 F            0.45  0.08            0.57
                    M            0.52  0.08
Age                 n/a          n/a   n/a             0.12
CFTR genotype       ΔF508/ΔF508  0.45  0.06            0.26
                    ΔF508/other  0.58  0.08
                    other/other  0.37  0.17

A P value of 0.0274 was found for the regression analysis of BF_7202 and age of diagnosis, which confirmed the presence of a significant relationship. The associations between BF_7202 and both the FEV1 predicted and standard deviation values were also verified.
(c) TLR4_1859

Table 23: Regression analysis of the association of TLR4_1859 and the age of diagnosis with confounding factors

Confounding factor  Sub-group    Mean  Standard Error  P value
TLR4_1859           AA           2.17  0.73            0.68
                    AG           2.95  0.44
                    GG           2.59  0.47
Sex                 F            2.31  0.45            0.12
                    M            3.05  0.46
CFTR genotype       ΔF508/ΔF508  1.59  0.36            <0.0001
                    ΔF508/other  3.80  0.46
                    other/other  6.11  0.90

The presence of an association between TLR4_1859 and age of diagnosis was not confirmed by this analysis, as the P value for TLR4_1859 was 0.68 after adjustment for confounding factors.

(d) HMOX1_2790

Table 24: Regression analysis of the association of HMOX1_2790 and the FEV1 predicted value with confounding factors

Confounding factor  Sub-group    Mean   Standard Error  P value
HMOX1_2790          AA           78.39  2.41            0.05
                    AT           77.08  2.04
                    TT           70.74  2.91
Sex                 F            76.47  2.09            0.71
                    M            76.08  2.08
Age                 n/a          n/a    n/a             <0.0001
CFTR genotype       ΔF508/ΔF508  75.87  1.62            0.34
                    ΔF508/other  78.19  2.19
                    other/other  71.14  4.21

Table 25: Regression analysis of the association of HMOX1_2790 and the FEV1 standard deviation value with confounding factors

Confounding factor  Sub-group    Mean  Standard Error  P value
HMOX1_2790          AA           0.49  0.097           0.045
                    AT           0.57  0.082
                    TT           0.24  0.12
Sex                 F            0.44  0.08            0.62
                    M            0.51  0.08
CFTR genotype       ΔF508/ΔF508  0.44  0.07            0.33
                    ΔF508/other  0.58  0.09
                    other/other  0.36  0.17

For HMOX1_2790, both analyses indicated a correlation between the SNP and the two phenotypic traits. However, the P values were only borderline (0.05 for FEV1 predicted value and 0.045 for FEV1 sd value), and the association must therefore be viewed with caution.
(e) HMOX1_9531

Table 26: Regression analysis of the association of HMOX1_9531 and the FEV1 predicted value with confounding factors

Confounding factor  Sub-group    Mean   Standard Error  P value
HMOX1_9531          AA           79.37  2.83            0.007
                    AG           78.36  2.00
                    GG           71.26  2.43
Sex                 F            76.68  2.09            0.53
                    M            76.13  2.03
Age                 n/a          n/a    n/a             <0.0001
CFTR genotype       ΔF508/ΔF508  75.92  1.61            0.39
                    ΔF508/other  78.32  2.10
                    other/other  72.05  4.15

Table 27: Regression analysis of the association of HMOX1_9531 and the FEV1 standard deviation value with confounding factors

Confounding factor  Sub-group    Mean  Standard Error  P value
HMOX1_9531          AA           0.54  0.11            0.006
                    AG           0.61  0.08
                    GG           0.26  0.10
Sex                 F            0.45  0.08            0.45
                    M            0.52  0.08
CFTR genotype       ΔF508/ΔF508  0.45  0.06            0.39
                    ΔF508/other  0.59  0.08
                    other/other  0.37  0.17

For HMOX1_9531, both analyses indicated a strong correlation between the SNP and the two phenotypic traits, with P values of 0.007 and 0.006.

10. Age of Onset Analysis for the Age of First Pseudomonas aeruginosa Infection

Approximately 80% of CF patients have either acute or chronic Pseudomonas aeruginosa infection. This was true for most of the patients recruited in this study. However, there were individuals who had never been infected with Pseudomonas aeruginosa. Assuming that there were no errors in data collection (i.e. that the age of first Pseudomonas aeruginosa infection was not simply missing), it was not possible to predict whether these individuals would have Pseudomonas aeruginosa infection in the future. Survival analysis was therefore performed to investigate the association between each SNP and the age of first Pseudomonas aeruginosa infection.
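The reason survival analysis suits this trait is that never-infected patients still carry information: they are known to have remained infection-free up to some age. The sketch below illustrates that idea with a Kaplan-Meier estimate, which is swapped in here purely for illustration; the thesis itself used the Hapstat program for this analysis. The function name `kaplan_meier` and the data are hypothetical, and treating uninfected patients as censored at their age at last follow-up is an assumption that the thesis does not state.

```python
def kaplan_meier(times, events):
    """Return [(time, probability of remaining infection-free)] at each event time.

    times  : age at first infection, or age at last follow-up if never infected
    events : 1 if infection was observed at that age, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_at_this_time = 0
        # Group all patients sharing the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_at_this_time += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        # Both infected and censored patients leave the risk set after time t.
        at_risk -= n_at_this_time
    return curve


# Hypothetical cohort: ages in years and whether infection was observed.
ages = [1.5, 2.0, 2.0, 4.5, 6.0, 9.0, 12.0]
infected = [1, 1, 0, 1, 0, 1, 0]
for age, prob in kaplan_meier(ages, infected):
    print(age, round(prob, 3))
```

Censored patients contribute to the number at risk until their last known age and then drop out without counting as events, which is how the observation time and event indicator fields work together.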
(a) Factor B

In Table A14 in the Appendix, one subgroup of each variable is not shown; for example, only two (AA and AG) of the three possible genotypes appear in the table. The missing subgroup was the one with the greatest proportion. It was used as the baseline against which the other subgroups were compared, in order to reveal the effect of the genotype of the selected SNPs on the phenotypic trait. These analyses indicated that none of the selected SNPs in the candidate gene Factor B played a role in the age of first Pseudomonas aeruginosa infection.

(b) Complement Factor 3

From Table A15 in the Appendix, none of the tested SNPs showed a correlation with the age of first Pseudomonas aeruginosa infection.

(c) Toll-like receptor 4

Table 28: Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Toll-like receptor 4

SNP         Variable       Sub-group    Estimated Value  Standard Error  P value
TLR4_851    TLR4_851       AA           0.169            0.155           0.5269
                           AG           0.0614           0.161
            Sex            Female       0.156            0.086           0.0712
            CFTR mutation  ΔF508/ΔF508  -0.0192          0.133           0.9773
                           ΔF508/other  -0.0245          0.143
TLR4_1859   TLR4_1859      AA           0.196            0.154           0.4581
                           AG           -0.069           0.119
            Sex            Female       0.139            0.087           0.1109
            CFTR mutation  ΔF508/ΔF508  -0.036           0.134           0.9603
                           ΔF508/other  -0.017           0.143
TLR4_2856   TLR4_2856      CC           0.566            0.393           0.286
                           CT           -0.198           0.225
            Sex            Female       0.163            0.086           0.0598
            CFTR mutation  ΔF508/ΔF508  -0.0293          0.133           0.975
                           ΔF508/other  -0.0104          -0.287
TLR4_9263   TLR4_9263      AA           -0.292           0.693           0.0425
                           AC           -0.132           0.376
            Sex            Female       0.157            0.087           0.071
            CFTR mutation  ΔF508/ΔF508  -0.066           0.136           0.8834
                           ΔF508/other  -0.028           0.144
TLR4_11912  TLR4_11912     GG           0.073            0.128           0.5244
                           GT           -0.131           0.131
            Sex            Female       0.141            0.088           0.1102
            CFTR mutation  ΔF508/ΔF508  -0.032           0.135           0.9537
                           ΔF508/other  -0.031           0.144
TLR4_15884  TLR4_15884     CC           0.444            0.397           0.1965
                           CG           -0.077           0.231
            Sex            Female       0.146            0.088           0.0983
            CFTR mutation  ΔF508/ΔF508  0.0042           0.135           0.9515
                           ΔF508/other  -0.0455          0.146
TLR4_17050  TLR4_17050     CC           -0.065           0.397           0.3172
                           CT           -0.121           0.233
            Sex            Female       0.157            0.087           0.0714
            CFTR mutation  ΔF508/ΔF508  -0.037           0.135           0.9143
                           ΔF508/other  -0.051           0.144

Among the seven selected SNPs in the candidate gene toll-like receptor 4, one (TLR4_9263) had a P value of 0.0425. This indicated that this SNP was significantly associated with the age of first Pseudomonas aeruginosa infection.

(d) Heme oxygenase-1

From the data in Table A16 in the Appendix, it can be seen that none of the six selected SNPs of heme oxygenase-1 was statistically associated with the age of first Pseudomonas aeruginosa infection.

11. Pseudomonas aeruginosa Infection Status

Not all patients recruited in this study had Pseudomonas aeruginosa infection. All the patients had been classified into one of the following categories according to their Pseudomonas aeruginosa infection status: no Pseudomonas aeruginosa infection (code 0), grew Pseudomonas aeruginosa once (code 1), sporadic Pseudomonas aeruginosa growth (code 2) and chronic Pseudomonas aeruginosa infection (code 3). It was hypothesized that one or more of the SNPs of the four candidate modifier genes might contribute to the susceptibility of CF patients to Pseudomonas aeruginosa infection. Chi-square tests were performed to test for such a relationship.

(a) Factor B

The data in Table A17 in the Appendix show the chi-square test results for all SNPs of the Factor B gene. The P value of the chi-square test for each SNP was greater than 0.05, which indicated that Pseudomonas aeruginosa infection status was not related to any of the five selected SNPs in the Factor B gene.
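The chi-square test of genotype against infection status can be sketched as follows. This is a minimal illustration, not the software used in the thesis: the function names are hypothetical and the counts are made up, since the thesis tables report percentages rather than counts. With three genotypes and four status categories the test has (3 - 1) x (4 - 1) = 6 degrees of freedom, and for an even number of degrees of freedom the upper-tail P value has a simple closed form.

```python
import math


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi2_sf_even_df(chi2, df):
    """Upper-tail P value of the chi-square distribution for even df (closed form)."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical counts: rows are genotypes, columns are status codes 0, 1, 2, 3.
counts = [
    [25, 8, 15, 19],
    [60, 39, 40, 87],
    [28, 17, 31, 32],
]
chi2, df = chi_square_independence(counts)
print(round(chi2, 2), df, round(chi2_sf_even_df(chi2, df), 4))
```

As a check on the closed form, a statistic of 12.60 with 6 degrees of freedom gives a P value of approximately 0.05, in line with the value of 0.0498 reported for C3_963 in Table 29.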
(b) Complement factor 3

Table 29: Chi square test for investigating the relationship between different genotypes of the selected SNPs in Complement factor 3 and Pseudomonas aeruginosa infection status. PA status columns 0 to 3 give the percentage of patients in each status category.

SNP       Genotype  PA 0   PA 1   PA 2   PA 3   Chi Test  P value
C3_963    GG        37.31  11.94  22.39  28.36  12.60     0.0498
          GT        26.55  17.26  17.70  38.50
          TT        25.93  15.74  28.70  29.63
C3_28795  AA        28.33  17.78  21.67  32.22  6.13      0.49
          AG        33.81  13.33  20.48  32.38
          GG        21.69  14.46  22.89  40.96
C3_36735  AA        28.24  15.29  16.47  40.00  3.86      0.70
          AG        31.38  15.48  21.34  31.80
          GG        27.27  13.39  25.17  33.57
C3_43118  AA        34.55  14.55  18.18  32.73  7.79      0.25
          AG        31.72  13.66  19.82  34.80
          GG        22.06  18.38  26.47  33.09

The above table demonstrates that none of the C3_28795, C3_36735 and C3_43118 SNPs was associated with Pseudomonas aeruginosa infection status in CF patients. However, a borderline association (P = 0.0498) was observed with the C3_963 SNP. Thus, genotype at the C3_963 polymorphism might contribute to Pseudomonas aeruginosa infection status in CF patients.

(c) Toll-like receptor 4

From Table A18 in the Appendix, it can be seen that all the SNPs in TLR4 had a P value greater than 0.05, and it was therefore unlikely that any of the SNPs influenced Pseudomonas aeruginosa infection status among the CF patients.

(d) Heme oxygenase-1

The chi-square test was carried out for each of the SNPs in the HMOX1 gene and the results are summarized in Table A19 in the Appendix. As with the TLR4 gene, there was no relationship between genotypes and Pseudomonas aeruginosa infection status. However, for some SNPs, 0% of patients with a given genotype fell into some categories of Pseudomonas aeruginosa infection status. For example, all patients with the TT genotype for SNP HMOX1_1038 were classified as either "grew Pseudomonas aeruginosa once" or "chronic Pseudomonas aeruginosa infection".
Also, all patients with the TT genotype for SNP HMOX1_16442 were found to be "no Pseudomonas aeruginosa infection". Logically one might think that those genotypes influenced the observed Pseudomonas aeruginosa infection status. However, this was not supported by the statistical test. Since the numbers of individuals with genotype TT at HMOX1_9531 and genotype TT at HMOX1_16442 were only 2 and 1, respectively, it was not possible to draw valid conclusions regarding the influence of these SNPs on Pseudomonas aeruginosa infection status.

12. Re-genotyping

Re-genotyping was performed as a quality control measure. The selection of SNPs and individuals for re-genotyping is described in the "Materials and Methods". The comparison is shown in Table 30.

Table 30: Comparing the result of genotyping and re-genotyping

Gene                   SNP      Number of individuals   Errors found
Factor B               2557     618                     1
                       4022     618                     1
                       6484     618                     0
Complement Factor 3    963      609                     3
Toll-like Receptor 4   851      950                     0
                       1859     618                     2
                       2856     618                     1
                       9263     950                     3
                       11912    950                     4
                       15884    661                     0
                       17050    618                     1
Heme oxygenase-1       1038     621                     0
                       2790     657                     2
                       3303     949                     0
                       9531     947                     3
                       16442    618                     1

Although not all SNPs and individuals were included for re-genotyping, the above selected SNPs and individuals were sufficient to represent the whole population in this study. As can be seen from the table, the number of errors found per SNP ranged from 0 to 4. Such a low frequency of errors was acceptable, and the possible reasons for the mismatch in genotypes included inconsistent concentration, contamination of the different DNA aliquots or random errors.

13. FBAT Analysis of Phenotypic Characteristics

FBAT was another analytical method for determining the association between a particular SNP and a phenotypic trait. FBAT differs from ANOVA in that it examines the inheritance pattern within a family, i.e., from parents to their child. FBAT employs a type of transmission disequilibrium test (TDT).
Two of the three models (additive, dominant and recessive) were performed, since both dominant and recessive should give the same result. For those SNPs with a P value of less than 0.05, a detailed summary of the test is shown below. The complete FBAT analysis is shown in Tables A20-A25 of the Appendix.

(a) Age of diagnosis

Table 31: Detailed results of FBAT analysis of age of diagnosis and SNPs which had a P value of less than 0.05 under the additive model

Marker     Allele   Allele frequency   # of families   S        E(S)     Var(S)   Z       P
C3_28795   1        0.583              332             -40.89   -60.10   82.74    2.11    0.0347
           2        0.417              332             -73.47   -54.26   82.74    -2.11   0.0347
C3_43118   1        0.474              342             -58.60   -77.78   94.39    1.98    0.0484
           2        0.526              342             -87.27   -68.09   94.39    -1.98   0.0484

From the data table, only two of the 22 selected SNPs (C3_28795 and C3_43118) were shown to have a significant association with the age of diagnosis. All SNPs were tested again under the dominant model.

Table 32: Detailed results of FBAT analysis of age of diagnosis and SNPs which had a P value of less than 0.05 under the dominant model

Marker     Allele   Allele frequency   # of families   S        E(S)     Var(S)   Z       P
BF_7202    1        0.431              231             -41.85   -40.94   43.19    -0.14   0.889
           2        0.569              196             -40.07   -27.23   34.55    -2.19   0.029
C3_28795   1        0.583              189             -15.82   -20.84   30.73    0.90    0.366
           2        0.417              244             -37.96   -23.75   42.450   -2.18   0.029

As under the additive model, C3_28795 was found to have an association with the age of diagnosis. However, a different second SNP, BF_7202 instead of C3_43118, was associated under the dominant model.

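The Z and P columns of Tables 31 and 32 appear to follow from the S, E(S) and Var(S) columns as Z = (S - E(S)) / sqrt(Var(S)), with a two-sided P value taken from the standard normal distribution. The sketch below reproduces the first row of Table 31 under that assumption; the function name is illustrative only and is not part of the FBAT program.

```python
import math


def fbat_z_and_p(s, expected_s, var_s):
    """Standardize an FBAT-style statistic and return (Z, two-sided P).

    Assumes Z = (S - E(S)) / sqrt(Var(S)) and a standard normal reference
    distribution, which matches the values reported in the tables.
    """
    z = (s - expected_s) / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# C3_28795, allele 1, additive model (values taken from Table 31).
z, p = fbat_z_and_p(-40.89, -60.10, 82.74)
print(f"Z = {z:.2f}, P = {p:.4f}")
```

For a biallelic marker the two alleles give Z values of equal magnitude and opposite sign under the additive model, which is why each pair of rows in Table 31 shares the same P value.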
(b) FEV1 predicted value

Table 33: Detailed results of FBAT analysis of FEV1 predicted value and SNPs which had a P value of less than 0.05 under the additive model

Marker       Allele   Allele frequency   # of families   S          E(S)       Var(S)      Z       P value
BF_2557      1        0.184              201             8083.95    9512.89    371921.59   -2.34   0.019
             2        0.816              201             21452.06   20023.12   371921.59   2.34    0.019
TLR4_15884   1        0.149              170             6723.98    7927.42    327605.14   -2.10   0.036
             2        0.851              170             20027.89   18824.45   327605.14   2.10    0.036
HMOX1_1038   1        0.959              59              6037.98    6846.47    110788.10   -2.43   0.015
             2        0.041              59              3324.07    2515.58    110788.10   2.43    0.015

Among the 22 selected SNPs, three (BF_2557, TLR4_15884 and HMOX1_1038) were found to have an association with the FEV1 predicted value when analyzed under the additive model by the FBAT program.

Table 34: Detailed results of FBAT analysis of FEV1 predicted value and SNPs which had a P value of less than 0.05 under the dominant model

Marker       Allele   Allele frequency   # of families   S         E(S)       Var(S)      Z       P value
BF_2557      1        0.184              193             6329.50   7829.51    279688.25   -2.84   0.005
             2        0.816              49              2503.32   2574.39    60601.62    -0.29   0.773
BF_4022      1        0.319              232             8853.45   10359.97   341446.47   -2.58   0.010
             2        0.681              118             5761.40   5993.83    156520.15   -0.59   0.556
C3_28795     1        0.583              166             8337.34   8171.95    240014.45   0.34    0.736
             2        0.417              233             9551.50   10781.25   350400.00   -2.08   0.038
TLR4_15884   1        0.149              165             5859.52   6945.67    271528.68   -2.08   0.037
             2        0.851              27              1614.44   1497.16    36498.35    0.61    0.539
HMOX1_1038   1        0.959              4               n/a       n/a        n/a         n/a     n/a
             2        0.041              59              3173.48   2428.05    99750.38    2.36    0.018

In addition to the three SNPs (BF_2557, TLR4_15884 and HMOX1_1038) which were determined to have a relationship with the FEV1 predicted value when analyzed under the additive model, two more SNPs (BF_4022 and C3_28795) also achieved a P value of less than 0.05 when analyzed under the dominant model.

(c) FEV1 standard deviation value

Table 35: Detailed results of FBAT analysis of FEV1 standard deviation value and SNPs which had a P value of less than 0.05 under the additive model

Marker    Allele   Allele frequency   # of families   S        E(S)     Var(S)   Z       P
BF_4022   1        0.319              264             66.89    90.96    92.05    -2.51   0.012
          2        0.681              264             165.27   141.19   92.05    2.51    0.012
BF_7202   1        0.431              293             99.02    122.02   106.17   -2.23   0.0256
          2        0.569              293             164.93   141.92   106.17   2.23    0.0256

When investigated by the FBAT program under the additive model, two SNPs, BF_4022 and BF_7202, had a P value of less than 0.05.

Table 36: Detailed results of FBAT analysis of FEV1 standard deviation value and SNPs which had a P value of less than 0.05 under the dominant model

Marker       Allele   Allele frequency   # of families   S       E(S)    Var(S)   Z       P
BF_2557      1        0.184              193             19.32   36.10   48.58    -2.41   0.016
             2        0.816              49              11.60   14.89   9.63     -1.06   0.290
BF_4022      1        0.319              231             39.14   56.00   54.16    -2.29   0.022
             2        0.681              118             38.09   30.88   25.43    1.43    0.153
BF_6484      1        0.893              22              8.41    4.24    4.70     1.93    0.054
             2        0.107              131             34.39   32.62   40.18    0.28    0.779
BF_7202      1        0.431              217             48.80   64.80   52.25    -2.21   0.027
             2        0.569              179             61.86   54.85   38.46    1.13    0.258
HMOX1_9531   1        0.463              215             78.88   55.91   53.01    3.16    0.0016
             2        0.537              194             56.87   52.56   54.17    0.59    0.558

In addition to the two SNPs found to have a relationship with the FEV1 standard deviation value under the additive model, two more SNPs (BF_2557 and HMOX1_9531) had a P value of less than 0.05 when analyzed by the FBAT program under the dominant model, and BF_6484 showed a borderline result (P = 0.054).

14. Haplotype Analysis by the RGui Program

In the haplotype analysis, combinations of alleles of the selected SNPs of each candidate gene were examined as a group, in order to investigate any association between the haplotype and the phenotypic characteristics. Single SNP analysis had been done by both ANOVA and FBAT, as described in the previous sections.
However, combinations of SNPs might offer additional insights into the correlation between the genotype and the phenotype, since the SNPs might interact with one another.

(a) Factor B

(I) Age of diagnosis

Table 37: Analysis of the correlation between haplotypes of selected SNPs in Factor B and the age of diagnosis, with no adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h11111      0.343                                0.168            2.045     0.0409
h21211      -0.019                               0.217            -0.088    0.9299
h22112      0.552                                0.230            2.405     0.0162
pooled      -0.0506                              0.2632           -0.1922   0.8476

With the five selected SNPs, 32 haplotypes were theoretically possible for the Factor B gene. However, only 8 haplotypes were reported by the program, as shown in Table A26 in the Appendix. Since the frequencies of five of the 8 haplotypes were too low, they were combined as a pooled haplotype for the analysis. Two haplotypes (h11111 and h22112) were found to have a P value of less than 0.05, suggesting that an association was present. The investigation was continued with further analysis including confounding factors to confirm the association.

Table 38: Analysis of the correlation between haplotypes of selected SNPs in Factor B and the age of diagnosis, with adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality

Term         Estimate of regression coefficient   Standard error   Z-score   P
h11111       0.3386                               0.1675           2.0210    0.0433
h21211       -0.0169                              0.2158           -0.0784   0.9375
h22112       0.5353                               0.2287           2.3405    0.0193
pooled       -0.0926                              0.2666           -0.3472   0.7284
SEXM         0.0247                               0.1820           0.1359    0.8919
genotypeFO   0.4342                               0.2077           2.0910    0.0365
genotypeOO   0.4385                               0.2903           1.5106    0.1309

The same eight haplotypes were found when analyzing with confounding factors.
In addition, both h11111 and h22112 were again found to have a P value of less than 0.05, which confirmed the association between these two haplotypes in the Factor B gene and age of diagnosis.

(II) FEV1 predicted value

Among the eight reported haplotypes shown in Table A27 in the Appendix, none demonstrated a relationship with the FEV1 predicted value. Similar results were found when examining with the confounding factors. From Tables A28 and A29 in the Appendix, none of the eight haplotypes in the Factor B gene showed an association with the FEV1 predicted value.

(III) FEV1 standard deviation value

Table 39: Analysis of the correlation between haplotypes of selected SNPs in Factor B and FEV1 standard deviation value, with no adjustment for confounding factors

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h11111      -0.1350                              0.0914           -1.4772   0.1396
h21211      -0.1512                              0.1163           -1.3002   0.1935
h22112      -0.1050                              0.1156           -0.9084   0.3636
pooled      -0.3546                              0.1366           -2.5957   0.0094

As indicated in the above table, the pooled haplotypes had a P value of 0.0094, which suggested an association with the phenotype. However, the pooled group included five low-frequency haplotypes (Table A30 of the Appendix). Therefore, it was not possible to determine which haplotype in the group generated the observed result.

Table 40: Analysis of the correlation between haplotypes of selected SNPs in Factor B and FEV1 standard deviation value, with adjustment for the confounding factors

Term         Estimate of regression coefficient   Standard error   Z-score   P
h11111       -0.1421                              0.0915           -1.5538   0.1202
h21211       -0.1574                              0.1160           -1.3567   0.1749
h22112       -0.1225                              0.1159           -1.0569   0.2906
pooled       -0.3946                              0.1389           -2.8402   0.0045
SEXM         0.0053                               0.0981           0.0537    0.9572
genotypeFO   0.1348                               0.1128           1.1943    0.2323
genotypeOO   -0.1104                              0.1537           -0.7182   0.4726

As shown in Table 40, a significant P value was found for the pooled haplotypes. This confirmed the results obtained when analyzing without confounding factors.
However, this analysis was not able to narrow down the relationship to just one of the five haplotypes in the group.

(b) Complement Factor 3

(I) Age of diagnosis

Table 41: Analysis of the correlation between haplotypes of selected SNPs in Complement factor 3 and the age of diagnosis, with no adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h1112       -0.3883                              0.3514           -1.1051   0.2691
h1122       0.0310                               0.3892           0.0798    0.9364
h1212       -0.9269                              0.4435           -2.0899   0.0366
h2121       -0.1406                              0.4514           -0.3114   0.7555
h2122       0.0846                               0.3576           0.2366    0.8130
h2212       -0.1365                              0.3439           -0.3969   0.6915
pooled      -0.1417                              0.2889           -0.4905   0.6238

Four SNPs were included in the analysis of the Complement factor 3 gene. There were 16 possible haplotypes and all were detected in the participants (Table A31 of the Appendix). However, only six haplotypes with high frequencies were examined individually while the others were grouped together. As indicated in the table, only h1212 had a P value of less than 0.05, and therefore a significant relationship between this haplotype and age of diagnosis was indicated. The analysis was repeated taking the confounding factors into consideration.

Table 42: Analysis of the correlation between haplotypes of selected SNPs in Complement factor 3 and the age of diagnosis, with adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality

Term         Estimate of regression coefficient   Standard error   Z-score   P
h1112        -0.3764                              0.3484           -1.0804   0.2800
h1122        0.0112                               0.3849           0.0290    0.9769
h1212        -1.1090                              0.4397           -2.5223   0.0117
h2121        -0.1822                              0.4379           -0.4160   0.6774
h2122        0.0093                               0.3570           0.0261    0.9792
h2212        -0.1949                              0.3399           -0.5735   0.5663
pooled       -0.1463                              0.2859           -0.5117   0.6088
SEXM         0.1086                               0.1812           0.5994    0.5489
genotypeFO   0.4907                               0.2076           2.3639    0.0181
genotypeOO   0.4984                               0.2899           1.7190    0.0856

A similar result was obtained when examining the haplotypes with correction for the confounding factors, as shown in the above table. This indicated an association of h1212 with the age of diagnosis.

(II) FEV1 predicted value

As shown in Table A32 in the Appendix, 16 haplotypes were detected when determining the presence of any correlation between the haplotypes in Complement Factor 3 and FEV1 predicted value. As shown in Table A33 in the Appendix, neither the six individual haplotypes nor the pooled group demonstrated an association with the FEV1 predicted value when no adjustments were made for confounding factors.

This conclusion was confirmed when analyzing with the confounding factors. Table A34 in the Appendix matched the result in the preceding paragraph, which suggested that none of the haplotypes in Complement factor 3 was related to the observed FEV1 predicted value after adjustments were made for confounding factors.

(III) FEV1 standard deviation value

16 haplotypes were found when determining the presence of any correlation between the haplotypes in Complement Factor 3 and FEV1 standard deviation, as indicated in Table A35 in the Appendix. In Table A36 in the Appendix, all of the haplotypes were found to have a P value of greater than 0.05 when no adjustments were made for confounding factors. That is, none of the haplotypes was related to the FEV1 standard deviation value observed in the patients.

From Table A37 in the Appendix, none of the haplotypes in Complement factor 3 showed a significant relationship with the FEV1 standard deviation value, even when the confounding factors were included.

(c) Toll-like Receptor 4

(I) Age of diagnosis

Table 43: Analysis of the correlation between haplotypes of selected SNPs in Toll-like receptor 4 and the age of diagnosis, with no adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h1122112    -0.1485                              0.2048           -0.7250   0.4684
h1122122    0.1282                               0.1835           0.6990    0.4846
h1212222    -0.3101                              0.2334           -1.3284   0.1840
h1221221    -0.0963                              0.2422           -0.3977   0.6908
h1222222    -0.3622                              0.2673           -1.3554   0.1753
pooled      -0.6202                              0.2912           -2.1296   0.0332

With the seven SNPs included in the Toll-like receptor 4 gene, 128 haplotypes were theoretically possible. However, only 15 haplotypes were detected in the sample (Table A38 in the Appendix). Five of the detected haplotypes were analyzed individually whereas the rest were grouped. Only this pooled group of haplotypes showed a P value of less than 0.05; however, it was not possible to determine which haplotype in the pooled group was the driving force behind the observed result.

Table 44: Analysis of the correlation between haplotypes of selected SNPs in Toll-like receptor 4 and the age of diagnosis, with adjustment for the confounding factors. The age of diagnosis was logarithmically transformed for normality

Term         Estimate of regression coefficient   Standard error   Z-score   P
h1122112     -0.1676                              0.2049           -0.8178   0.4135
h1122122     0.1098                               0.1828           0.6006    0.5481
h1212222     -0.3163                              0.2325           -1.3604   0.1737
h1221221     -0.1469                              0.2428           -0.6052   0.5451
h1222222     -0.4295                              0.2667           -1.6105   0.1073
pooled       -0.6439                              0.2887           -2.2302   0.0257
SEXM         0.0433                               0.1817           0.2381    0.8118
genotypeFO   0.4723                               0.2067           2.2852    0.0223
genotypeOO   0.4487                               0.2894           1.5505    0.1210

As indicated in the above table, the pooled group demonstrated a P value of 0.0257. That is, one or more of the haplotypes in the group might be related to the observed age of diagnosis in the patients.

(II) FEV1 predicted value

15 haplotypes were detected when determining the presence of any correlation between haplotypes in Toll-like receptor 4 and FEV1 predicted value, as shown in Table A39 in the Appendix. Among the five reported haplotypes and the pooled group shown in Table A40 in the Appendix, none had a P value of less than 0.05, and therefore none showed a significant relationship with the FEV1 predicted value.

From Table A41 in the Appendix, the same conclusion was reached as in the analysis without confounding factors. None of the haplotypes was shown to be related to the FEV1 predicted value.

(III) FEV1 standard deviation value

As indicated in Table A42 in the Appendix, 15 haplotypes were included when determining the presence of any correlation between these haplotypes in Toll-like receptor 4 and FEV1 standard deviation value. As in the analysis of the FEV1 predicted value, none of the haplotypes was related to the FEV1 standard deviation value, as shown in Table A43 in the Appendix.

Table A44 in the Appendix further strengthened the conclusion in the preceding paragraph, since none of the haplotypes was found to have a significant association with the FEV1 standard deviation value after adjustments were made for confounding factors.

(d) Heme Oxygenase-1

(I) Age of diagnosis

As indicated in Table A45 in the Appendix, 64 haplotypes were theoretically possible from the six SNPs selected in the heme oxygenase-1 gene. However, only 15 were observed in the sample. Four of the haplotypes with high frequency were considered separately while the rest were grouped together. None of them was found to be associated with the age of diagnosis when no adjustments were made for confounding factors (Table A46 of the Appendix).

In Table A47 in the Appendix, no haplotype was found to be related to the observed age of diagnosis in the sample after confounding factors were adjusted for.

(II) FEV1 predicted value

Table 45: Analysis of the correlation between haplotypes of selected SNPs in heme oxygenase-1 and FEV1 predicted value, with no adjustment for the confounding factors

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h112221     -6.3390                              2.2781           -2.7826   0.0054
h211221     -4.2824                              3.3487           -1.2788   0.2010
h212221     -4.9749                              4.4609           -1.1152   0.2648
pooled      2.3963                               3.1362           0.7641    0.4448

14 haplotypes were detected when determining the presence of any correlation between haplotypes in Heme oxygenase-1 and FEV1 predicted value, as shown in Table A48 in the Appendix. The above table showed that h112221 had a P value of 0.0054, which indicated that this haplotype was associated with the FEV1 predicted value.

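In the haplotype regression tables of this section, the reported Z-score appears to be the estimated regression coefficient divided by its standard error, with a two-sided P value from the standard normal distribution. The following sketch checks this reading against the rows of Table 45; the function name and the 0.05 threshold variable are illustrative, and the sketch does not reproduce the regression fit itself.

```python
import math


def wald_z_and_p(estimate, standard_error):
    """Return (Z, two-sided P) for a coefficient, assuming Z = estimate / SE."""
    z = estimate / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# (estimate, standard error) pairs taken from Table 45.
table_45 = {
    "h112221": (-6.3390, 2.2781),
    "h211221": (-4.2824, 3.3487),
    "h212221": (-4.9749, 4.4609),
    "pooled": (2.3963, 3.1362),
}

ALPHA = 0.05  # significance threshold used throughout the Results
results = {name: wald_z_and_p(est, se) for name, (est, se) in table_45.items()}
significant = [name for name, (_, p) in results.items() if p < ALPHA]
for name, (z, p) in results.items():
    print(f"{name}: Z = {z:.4f}, P = {p:.4f}")
```

Only h112221 falls below the threshold in this check, which is consistent with the interpretation given for Table 45.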
Table 46: Analysis of the correlation between haplotypes of selected SNPs in heme oxygenase-1 and FEV1 predicted value, with adjustment for the confounding factors

Term         Estimate of regression coefficient   Standard error   Z-score   P
h112221      -6.3269                              2.2775           -2.7780   0.0055
h211221      -3.9819                              3.3483           -1.1893   0.2343
h212221      -5.0514                              4.4708           -1.1298   0.2585
pooled       2.4056                               3.1343           0.7675    0.4428
SEXM         -3.1139                              2.7183           -1.1455   0.2520
genotypeFO   2.0136                               3.1297           0.6434    0.5200
genotypeOO   0.2597                               4.2367           0.0613    0.9511

When taking the confounding factors into account, the h112221 haplotype also demonstrated a P value of less than 0.05. Therefore, the association between the haplotype h112221 and the FEV1 predicted value was confirmed.

(III) FEV1 standard deviation value

Table 47: Analysis of the correlation between haplotypes of selected SNPs in heme oxygenase-1 and FEV1 standard deviation value, with no adjustment for the confounding factors

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h112221     -0.2351                              0.0815           -2.8838   0.0039
h211221     -0.2813                              0.1198           -2.3488   0.0188
h212221     -0.0639                              0.1611           -0.3965   0.6917
pooled      -0.0584                              0.1124           -0.5196   0.6033

As shown in Table A49 in the Appendix, 14 haplotypes were included when determining the presence of any correlation between haplotypes in Heme oxygenase-1 and FEV1 standard deviation value. Two of the haplotypes, h112221 and h211221, showed a P value of less than 0.05. Therefore, these two haplotypes had a significant association with the FEV1 standard deviation value.

Table 48: Analysis of the correlation between haplotypes of selected SNPs in heme oxygenase-1 and FEV1 standard deviation value, with adjustment for the confounding factors

Term         Estimate of regression coefficient   Standard error   Z-score   P
h112221      -0.2336                              0.0816           -2.8635   0.0042
h211221      -0.2796                              0.1200           -2.3300   0.0198
h212221      -0.0537                              0.1614           -0.3325   0.7395
pooled       -0.0644                              0.1125           -0.5720   0.5673
SEXM         0.0203                               0.0973           0.2086    0.8347
genotypeFO   0.0576                               0.1120           0.5143    0.6070
genotypeOO   -0.0837                              0.1516           -0.5520   0.5810

The above table led to the same conclusion as Table 47: both h112221 and h211221 were shown to have a significant relationship with the FEV1 standard deviation value.

15. Haplotype Analysis by the FBAT Program

Haplotype analysis was repeated utilizing another program, the FBAT program. Results different from those obtained with the RGui program might be expected, since the FBAT analysis is a transmission disequilibrium test that considers the inheritance pattern within a family. However, a stronger conclusion could be drawn if consistent results were obtained in these two parts of the project.

(a) Factor B

With the five SNPs selected in the Factor B gene, only ten of the possible 32 haplotypes were detected in the sample (Table A50 in the Appendix).

(I) Age of diagnosis

As can be seen from Table A51 in the Appendix, none of the haplotypes in Factor B was found to be significantly associated with the age of diagnosis.

(II) FEV1 predicted value

Table 49: Analysis of the correlation between haplotypes of selected SNPs in Factor B and FEV1 predicted value by the FBAT program

Haplotype   # of families   S           E(S)       Var(S)       Z        P value
h1          233             24199.545   23181.74   571383.658   1.346    0.178
h2          162             7280.643    8812.768   360007.335   -2.554   0.011
h3          108             5376.241    5192.011   197300.266   0.415    0.678
h4          114             5409.311    5022.368   203605.092   0.858    0.391
h5          50              2192.113    2266.254   89422.653    -0.248   0.804
h6          26              1098.942    1109.629   44874.197    -0.050   0.960
h7          19              692.77      703.602    28218.472    -0.064   0.949
h8          0               n/a         n/a        n/a          n/a      n/a
h9          1               n/a         n/a        n/a          n/a      n/a
h10         0               n/a         n/a        n/a          n/a      n/a

Among the haplotypes analyzed, h2 (11111) was shown to have a P value of 0.011. This indicated that there was a significant relationship with the FEV1 predicted value.

(III) FEV1 standard deviation value

Table 50: Analysis of the correlation between haplotypes of selected SNPs in Factor B and FEV1 standard deviation value by the FBAT program

Haplotype   # of families   S         E(S)      Var(S)   Z        P value
h1          233             157.872   131.288   86.494   2.858    0.004
h2          162             23.583    39.301    59.438   -2.039   0.041
h3          107             28.777    31.268    39.812   -0.395   0.693
h4          113             27.559    28.826    27.411   -0.242   0.809
h5          50              5.966     11.568    11.986   -1.618   0.106
h6          26              0.443     3.711     8.222    -1.140   0.254
h7          19              4.849     3.14      5.123    0.755    0.450
h8          0               n/a       n/a       n/a      n/a      n/a
h9          1               n/a       n/a       n/a      n/a      n/a
h10         0               n/a       n/a       n/a      n/a      n/a

Two haplotypes, h11111 and h22121, were shown to have a P value of less than 0.05, which indicated that both were associated with the FEV1 standard deviation value.

(b) Complement Factor 3

Of the 16 possible haplotypes formed by the four selected SNPs in the complement factor 3 gene, all were observed in the sample (Table A52 in the Appendix).

(I) Age of diagnosis

Table 51: Analysis of the correlation between haplotypes of selected SNPs in Complement factor 3 and age of diagnosis by the FBAT program. Age of diagnosis was logarithmically transformed for normality

Haplotype   # of families   S         E(S)      Var(S)   Z        P value
h1          164             -20.120   -24.035   32.335   0.689    0.491
h2          139             -13.746   -18.993   31.238   0.939    0.348
h3          129             -11.276   -11.547   20.088   0.06     0.952
h4          121             -16.908   -15.696   22.35    -0.256   0.798
h5          100             -11.562   -17.531   19.37    1.356    0.175
h6          98              -21.861   -14.178   18.548   -1.784   0.074
h7          105             -13.969   -13.781   15.621   -0.048   0.962
h8          76              -4.641    -5.385    13.185   0.205    0.838
h9          77              -9.517    -9.925    14.561   0.107    0.915
h10         71              -9.672    -8.116    12.435   -0.441   0.659
h11         72              -19.591   -12.871   15.543   -1.704   0.088
h12         69              -3.127    -4.449    12.926   0.368    0.713
h13         65              -3.385    -9.696    8.873    2.119    0.034
h14         37              -6.114    -1.435    11.447   -1.383   0.167
h15         30              -6.596    -3.471    4.687    -1.443   0.149
h16         34              -2.501    -3.474    4.957    0.437    0.662

Among the 16 observed haplotypes, h13 (1111) was found to have a P value of 0.034, indicating a significant relationship with age of diagnosis.

(II) FEV1 predicted value

As indicated in Table A53 in the Appendix, none of the 16 haplotypes showed a P value of less than 0.05 when analyzed against the FEV1 predicted value.

(III) FEV1 standard deviation value

Table 52: Analysis of the correlation between haplotypes of selected SNPs in Complement factor 3 and FEV1 standard deviation value by the FBAT program

Haplotype   # of families   S        E(S)     Var(S)   Z        P value
h1          146             54.271   53.023   37.776   0.203    0.839
h2          132             46.542   42.171   52.292   0.605    0.545
h3          124             35.346   30.546   27.929   0.908    0.364
h4          107             24.055   28.920   35.015   -0.822   0.411
h5          92              32.645   31.652   23.609   0.204    0.838
h6          89              12.940   24.835   24.024   -2.427   0.015
h7          97              24.051   22.509   19.450   0.350    0.727
h8          63              27.133   23.455   30.673   0.664    0.507
h9          69              15.999   14.696   17.247   0.314    0.754
h10         66              14.908   15.584   16.326   -0.167   0.867
h11         63              14.001   16.883   15.587   -0.730   0.465
h12         61              21.521   17.252   16.101   1.064    0.287
h13         61              21.365   19.758   15.987   0.402    0.688
h14         33              5.338    8.541    10.861   -0.972   0.331
h15         30              6.126    5.778    5.522    0.148    0.882
h16         29              4.254    4.891    4.121    -0.313   0.754

Haplotype h6 (1212) was shown to have a P value of 0.015, suggesting an association between the haplotype 1212 and the FEV1 standard deviation value.

(c) Toll-like Receptor 4

With the seven selected SNPs in the Toll-like receptor 4 gene, 17 out of 128 possible haplotypes were reported by the FBAT program (Table A54 in the Appendix).

(I) Age of diagnosis

From Table A55 in the Appendix, none of the haplotypes showed an association with the age of diagnosis.

(II) FEV1 predicted value

Table 53: Analysis of the correlation between haplotypes of selected SNPs in Toll-like receptor 4 and FEV1 predicted value by the FBAT program

Haplotype   # of families   S           E(S)        Var(S)       Z        P value
h1          215             13935.028   14499.250   503036.881   -0.796   0.426
h2          198             13166.095   12221.084   434711.069   1.433    0.152
h3          149             6730.848    7850.667    302636.978   -2.036   0.042
h4          114             5983.613    5480.313    181054.614   1.183    0.237
h5          112             6168.453    5826.104    212860.982   0.742    0.458
h6          76              4058.668    3499.135    122261.301   1.600    0.110
h7          57              2226.345    2409.595    98033.212    -0.585   0.558
h8          18              406.307     727.964     33417.196    -1.760   0.078
h9          10              172.570     330.288     12434.211    -1.414   0.157
h10         2               n/a         n/a         n/a          n/a      n/a
h11         2               n/a         n/a         n/a          n/a      n/a
h12         2               n/a         n/a         n/a          n/a      n/a
h13         1               n/a         n/a         n/a          n/a      n/a
h14         1               n/a         n/a         n/a          n/a      n/a
h15         1               n/a         n/a         n/a          n/a      n/a
h16         0               n/a         n/a         n/a          n/a      n/a
h17         1               n/a         n/a         n/a          n/a      n/a
h18         1               n/a         n/a         n/a          n/a      n/a
h19         0               n/a         n/a         n/a          n/a      n/a
h20         1               n/a         n/a         n/a          n/a      n/a
h21         1               n/a         n/a         n/a          n/a      n/a
h22         0               n/a         n/a         n/a          n/a      n/a
h23         0               n/a         n/a         n/a          n/a      n/a

One of the haplotypes, h3 (1122112), was demonstrated to have a P value of less than 0.05, indicating that this haplotype was associated with the FEV1 predicted value observed in patients.

(III) FEV1 standard deviation value

As indicated in Table A56 in the Appendix, none of the observed haplotypes in Toll-like receptor 4 showed a significant relationship with the FEV1 standard deviation value.

(d) Heme Oxygenase-1

With the six selected SNPs in the heme oxygenase-1 gene, 64 haplotypes were possible. Only 20 of them were observed in the participants, as indicated in Table A57 in the Appendix.

(I) Age of diagnosis

Among the haplotypes detected in the sample, none had a P value of less than 0.05, as shown in Table A58 in the Appendix.

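The counts of theoretically possible haplotypes quoted in this chapter (16, 32, 64 and 128) follow from each biallelic SNP contributing two alleles, so that n SNPs give 2^n combinations. The sketch below enumerates these combinations using the 1/2 allele coding of the tables and shows one way of grouping low-frequency haplotypes into a pooled category, as was done for the RGui analysis. The haplotype frequencies and the cutoff in the example are hypothetical and are not taken from the study data.

```python
from itertools import product


def possible_haplotypes(n_snps, alleles="12"):
    """List every allele combination for n biallelic SNPs (2**n haplotypes)."""
    return ["".join(combo) for combo in product(alleles, repeat=n_snps)]


def pool_rare(frequencies, cutoff):
    """Keep haplotypes at or above `cutoff`; sum the rest into a 'pooled' entry."""
    kept = {hap: freq for hap, freq in frequencies.items() if freq >= cutoff}
    pooled = sum(freq for freq in frequencies.values() if freq < cutoff)
    if pooled > 0:
        kept["pooled"] = pooled
    return kept


# Number of selected SNPs per candidate gene, as stated in the text.
snps_per_gene = {
    "Factor B": 5,
    "Complement Factor 3": 4,
    "Toll-like Receptor 4": 7,
    "Heme oxygenase-1": 6,
}
for gene, n in snps_per_gene.items():
    print(gene, len(possible_haplotypes(n)))

# Hypothetical frequencies (NOT study data) to illustrate the pooling step.
hypothetical_freqs = {
    "112221": 0.50,
    "211221": 0.30,
    "212221": 0.15,
    "111111": 0.03,
    "222222": 0.02,
}
print(pool_rare(hypothetical_freqs, cutoff=0.05))
```

In practice far fewer haplotypes are observed than are theoretically possible, because alleles at nearby SNPs tend to be inherited together.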
(II) FEV1 predicted value

Table 54: Analysis of the correlation between haplotypes of selected SNPs in Heme oxygenase-1 and FEV1 predicted value by the FBAT program

Haplotype   # of families   S           E(S)        Var(S)       Z        P value
h1          234             17514.369   18099.875   516761.732   -0.814   0.415
h2          200             14639.825   14051.769   445759.360   0.881    0.378
h3          114             4934.857    5071.520    205314.256   -0.302   0.763
h4          69              2971.849    2932.472    114032.044   0.117    0.907
h5          44              2769.310    2171.419    78544.271    2.133    0.033
h6          47              1833.378    2138.615    73738.049    -1.124   0.261
h7          15              613.016     631.542     20264.602    -0.130   0.896
h8          8               n/a         n/a         n/a          n/a      n/a
h9          8               n/a         n/a         n/a          n/a      n/a
h10         7               n/a         n/a         n/a          n/a      n/a
h11         3               n/a         n/a         n/a          n/a      n/a
h12         2               n/a         n/a         n/a          n/a      n/a
h13         2               n/a         n/a         n/a          n/a      n/a
h14         0               n/a         n/a         n/a          n/a      n/a
h15         1               n/a         n/a         n/a          n/a      n/a
h16         1               n/a         n/a         n/a          n/a      n/a
h17         1               n/a         n/a         n/a          n/a      n/a
h18         0               n/a         n/a         n/a          n/a      n/a
h19         0               n/a         n/a         n/a          n/a      n/a
h20         1               n/a         n/a         n/a          n/a      n/a
h21         0               n/a         n/a         n/a          n/a      n/a

One haplotype, h3 (221211), showed a P value of 0.033. Therefore, this haplotype was determined to have a significant relationship with the FEV1 predicted value.

(III) FEV1 standard deviation value

None of the observed haplotypes had a P value of less than 0.05, as shown in Table A59 in the Appendix. Consequently, no haplotypes in the heme oxygenase-1 gene were found to be significantly associated with the FEV1 standard deviation value.

16. Haplotype Analysis of the Age of First Pseudomonas aeruginosa Infection Utilizing Hapstat

In addition to examining single locus association between the 22 selected SNPs and the age of first Pseudomonas aeruginosa infection, the relationship between the haplotypes formed by the SNPs within each gene and this phenotypic trait was investigated. This test could not be done by either the RGui program or FBAT, because the age of first Pseudomonas aeruginosa infection was not available if the organism had not colonized the patient during the data collection period.
Therefore, this analysis was performed utilizing the Hapstat program.

(a) Factor B

With the five selected SNPs in the Factor B gene, 32 haplotypes were theoretically possible. However, only 3 haplotypes were analyzed utilizing a cutoff haplotype frequency of greater than 0.05. As indicated in Table A60 in the Appendix, none of the haplotypes showed a relationship to the age of first Pseudomonas aeruginosa infection.

(b) Complement Factor 3

With the four selected SNPs in the Complement factor 3 gene, 16 haplotypes were possible. However, only 8 haplotypes were analyzed utilizing a cutoff haplotype frequency of greater than 0.05. As indicated in Table A61 in the Appendix, none of the haplotypes showed a relationship to the age of first Pseudomonas aeruginosa infection.

(c) Toll-like Receptor 4

With the seven selected SNPs in the Toll-like receptor 4 gene, 128 haplotypes were possible. However, only 5 haplotypes were analyzed utilizing a cutoff haplotype frequency of greater than 0.05. As indicated in Table A62 in the Appendix, none of the haplotypes showed a relationship to the age of first Pseudomonas aeruginosa infection.

(d) Heme Oxygenase-1

With the six selected SNPs in the Heme oxygenase-1 gene, 64 haplotypes were possible. However, only 4 haplotypes were analyzed utilizing a cutoff haplotype frequency of greater than 0.05. As indicated in Table A63 in the Appendix, none of the haplotypes showed a relationship to the age of first Pseudomonas aeruginosa infection.

Chapter 4 Discussion

This project is a sub-study of a large, Canada-wide and international endeavor: the Canadian Consortium for Cystic Fibrosis Modifiers (http://www.cfmod.ca). It is well-established that Cystic Fibrosis is an autosomal recessive disease which is caused by mutations in the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene located on chromosome 7.
There is a wide range of clinical symptoms among affected individuals, for example, deterioration of the lungs, pancreatic insufficiency, and liver disease. Heterogeneity in symptoms is not only seen in patients from different families, as siblings from the same lineage can have very different clinical presentations. As mentioned before, there are various CFTR mutations which can be classified into five categories, and these help to explain some of the differences in phenotypic characteristics of CF patients. However, as the previous literature illustrates, there are other factors, both environmental and genetic, which can affect the course of the disease. The main aim of this project is to identify these secondary genetic factors, or so-called modifier genes, which influence the severity of the disease.

1. Analysis of the Genotypic Data for Mendelian Inconsistencies

There were 1605 individuals from 535 trios (535 patients and both of their parents) that participated in this sub-study. According to Gregor Mendel's theory of heredity, each of the parents should contribute one of their alleles to their child. Since the major purpose of this study was to investigate the relationship between polymorphisms of the candidate genes and the disease status, this inheritance pattern had to be tested in order to avoid Mendelian errors which might have led to misinterpretation of the results. Errors might be caused by contamination of samples during sample collection and/or genotyping, incorrect labeling of the identities of the individuals, non-paternity and random genotyping errors.

Mendelian inconsistency was detected in some of the SNPs for some of the families as indicated in the Results section, which indicated that one or more of the above suggested factors were present in the study. Genotyping results of those families were compared with the analysis done by our colleagues in Toronto and the same conclusion was obtained.
Since our colleagues in Toronto used genotyping protocols (Illumina and Luminex) different from the TaqMan assays used in this study and reached the same outcome, this helped to minimize the possibility of genotyping errors due to incorrect methodology. Out of the 23 families that had been excluded completely from this study, 22 were also found to have Mendelian inconsistencies by our colleagues in Toronto. For example, for pedigree #9 Toronto had found 16 and 26 errors by Illumina and Luminex respectively, while we detected 7 errors by TaqMan assays. Inclusion of those families in the study would have reduced the accuracy of the results and led to weaker statistical power; therefore, those families were deleted from the rest of the study.

2. Genotypes of all Participating Individuals

Genotyping of the selected SNPs in the four candidate modifier genes was done with the appropriate TaqMan assays. The genotype of some individuals could not be detected and those were indicated as "undetermined" in the tables in the Results section. One of the possible explanations for the undetermined genotypes was an inadequate concentration of DNA in the source plates for amplification by PCR. This reason seemed likely since the same individuals often had an "undetermined" genotype for several of the selected SNPs. Fortunately, this problem only occurred in a small portion of all individuals.

A total of 22 SNPs in the four candidate modifier genes were selected for testing. There were three genotype groups observed for each SNP and a coherent pattern was observed in both parents and patients for the same SNP (as illustrated in Tables A3-A6), and this was a further indication of the reliability of the genotyping results.

3. Genotyping Analysis of the Parental Population

In addition to using the Family Based Association Test (FBAT) program to check for Mendelian inheritance, two additional analyses were performed on the parental population as quality control procedures.

The first approach was to test for the existence of Hardy-Weinberg Equilibrium among the parental samples. A population is said to be in Hardy-Weinberg Equilibrium if allele and genotype frequencies remain constant through successive generations. This will be the case if there are no outside driving forces acting upon the population. There are generally five assumptions for the establishment of Hardy-Weinberg Equilibrium: (1) a large population; (2) no mutation at the locus of interest; (3) no migration; (4) random mating and (5) no natural selection at the locus of interest. Hardy-Weinberg Equilibrium in the parental population was assessed by comparing the observed genotype frequencies with those expected. With the establishment of Hardy-Weinberg Equilibrium among parental individuals, we confirmed that the data were consistent with the above five assumptions and the study population was not subject to any major bias due to an inappropriate sampling method. In addition, the presence of Hardy-Weinberg Equilibrium suggested that the genotyping assays were accurate. Among all 22 SNPs, Hardy-Weinberg Equilibrium was found except for SNP 149 of the HMOX1 gene. As observed in both the parental and patient populations, the number of successful genotypings for SNP 149 was lower when compared with other SNPs. This suggested that the assay was not as robust as the others and might be one of the explanations for the failure of Hardy-Weinberg Equilibrium in the parental samples for this SNP. As a result, the observed genotyping results of SNP 149 should be viewed with caution.
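The Hardy-Weinberg check described above can be illustrated with a chi-square goodness-of-fit test on the genotype counts of a single biallelic SNP. The Python sketch below is a minimal illustration only and is not meant to reproduce the software used in this study: the function name hwe_chi_square and the genotype counts in the usage example are hypothetical. It assumes one degree of freedom (three genotype classes, minus one, minus one allele frequency estimated from the data) and also reports whether heterozygotes are in excess or in deficit relative to expectation.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test of genotype counts against
    Hardy-Weinberg proportions for one biallelic SNP.

    Returns (chi2, p_value, het_excess), where het_excess is the observed
    minus the expected number of heterozygotes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p                      # estimated frequency of allele B
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic; the test is undefined")

    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # Upper-tail probability of a chi-square variable with 1 degree of
    # freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    het_excess = n_ab - expected[1]
    return chi2, p_value, het_excess


if __name__ == "__main__":
    # Hypothetical counts (AA, AB, BB), not data from this study.
    chi2, p_value, het_excess = hwe_chi_square(10, 80, 10)
    print(chi2, p_value, het_excess)
```

A positive het_excess together with a small P value would correspond to a departure driven by too many heterozygotes, while a negative value would correspond to a heterozygote deficit.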
The departure from Hardy-Weinberg Equilibrium was due to an excess of heterozygotes and therefore was unlikely to be due to unidentified population stratification in the study population. This conclusion is derived from the Wahlund Effect, which was established in 1928 by the Swedish scientist Sten Wahlund. This theory states that the number of heterozygotes is lower than expected when populations with different allele frequencies are mixed together for analysis.[43]

The second approach was to compare the allele frequency of each SNP in the parents with the values listed on the IIPGA and Seattle SNPs websites. Both websites show genotyping results of 23 Caucasian individuals from the State of Utah. The allele frequency of each SNP on these websites was compared with the detected genotypes of the parental group in this study by the 2x2 Chi-square test, and this served as another tool to check the reliability of the TaqMan assays for genotyping. When compared with the first method described above, a different conclusion was obtained: SNPs 1038 and 16442 of the HMOX1 gene showed a P value of less than 0.05, indicating that there might be unidentified errors in the genotyping results of these two SNPs. However, the magnitude of the difference was small (<10% for both SNPs) and the discrepancy in allele frequency between the participating parents and the Caucasian samples listed by the Seattle SNPs website might be explained by two reasons. Firstly, only 23 Caucasian samples were included on the Seattle SNPs website. This small sample size would reduce the accuracy of the allele frequency calculated from these data. It would be ideal if other research centers had genotyped these 2 SNPs so that another comparison would be possible. However, only Seattle SNPs contains data for SNP HMOX1_1038.
For the other SNP, HMOX1_16442, an additional 60 DNA samples from another European population were genotyped by Seattle SNPs and the allele frequency was found to be 0.92 and 0.08 for alleles A and T, respectively. These values are closer to the ones found for the samples in this study, which favored the accuracy of the genotyping data. Secondly, Caucasian is a general racial classification and people in this group could possess different genetic backgrounds. Therefore, there could be genuine allele frequency differences between our study group and the individuals genotyped for the Seattle SNPs project. For these reasons the results from both SNP 1038 and SNP 16442 of the HMOX1 gene were considered acceptable.

4. ANOVA, Regression Analysis and FBAT

Two approaches were employed for testing the existence of any correlation between the 22 selected SNPs and the 3 phenotypic traits. Results from these two methods were then compared to reinforce the conclusions.

The first method was the one-way ANOVA. The dependent variables in an ANOVA, i.e. the selected phenotypic characteristics, must be normally distributed before any statement can be deduced. A goodness of fit test was used to check the normality of the data, and it was found that only the dataset of age of diagnosis was skewed. Therefore, age of diagnosis was transformed by taking the logarithm to the base 10 in order to achieve a normally distributed variable. In the ANOVA, 10 genotype-phenotype pairs were found to have a P value of less than 0.05, which suggested the existence of an association. These 10 pairs were: BF_2557 with FEV1 predicted value and standard deviation; BF_7202 with age of diagnosis, FEV1 predicted value and standard deviation; TLR4_1859 with age of diagnosis; HMOX1_2790 with FEV1 predicted value and standard deviation; and lastly, HMOX1_9531 with FEV1 predicted value and standard deviation.
However, as FEV1 predicted value and standard deviation were highly correlated (r² = 0.64), associations with both variables were to be expected and could not be considered as independent observations. Further investigation was done to confirm these significant associations.

Regression analysis was performed for those 10 SNP-phenotype pairs, with inclusion of age, sex and CFTR genotype as the confounding factors. Those factors were chosen since they might contribute to the observed phenotypes in addition to the tested SNP. For example, it is known that FEV1 percent predicted values decline with age in CF patients. Therefore, it was essential to take those confounding factors into consideration before a conclusion could be finalized. Sex had no confounding effect on the observed phenotypes as indicated in the tables in the Results section. CFTR mutation contributed to part of the association for BF_7202 when tested with age of diagnosis and for TLR4_1859 when tested with age of diagnosis. CFTR genotype was related to the age of diagnosis since it was observed that individuals were diagnosed at an older age when they had a milder form of CFTR mutation. Of course, this is one of the reasons that a CFTR mutation is classified as mild. Age had a significant effect on FEV1 percent predicted value, but not on the FEV1 standard deviation values. This could be explained by the fact that FEV1 percent predicted is a measurement of lung function that is derived from a comparison to a random sample of the healthy population. Therefore, as CF progresses this is reflected by a decrease in the FEV1 percent predicted. However, such a decrease would not be detected in the FEV1 standard deviation values because the FEV1 standard deviation value collected for this study was recorded as the difference from the average of CF patients in the same age group.
Among the 10 pairs of genotype-phenotype associations, only (1) BF_2557 and FEV1 standard deviation value, (2) BF_7202 and FEV1 standard deviation value, (3) HMOX1_2790 and FEV1 standard deviation value and (4) HMOX1_9531 and FEV1 standard deviation value demonstrated a significant association after adjustment for the confounding factors. Such multiple tests may generate significant results by chance. This is mitigated to some degree by the fact that the outcome variables were related instead of independent in the multiple tests. Nevertheless, the results of this study should be considered as hypothesis generating and need to be confirmed by repeating the experiment with a larger group of samples.  FBAT was another approach used in this study for determining the existence of a correlation between selected SNPs and phenotypes. Two models (additive and dominant) were included in the FBAT analyses. Under the additive model, 7 pairs of genotype-phenotype associations were found to be significant. They were: (1) BF_2557 with FEV1 predicted value, (2) BF_4022 with FEV1 standard deviation value, (3) BF_7202 with FEV1 standard deviation value, (4) C3_28795 with age of diagnosis, (5) C3_43118 with age of diagnosis, (6) TLR4_15884 with FEV1 predicted value and (7) HMOX1_1038 with FEV1 predicted value. On the other hand, there were 12 pairs of genotype-phenotype associations determined to be significant under the dominant model. They were (1) BF_7202 with age of diagnosis and FEV1 standard deviation, (2) BF_2557 with FEV1 predicted value and FEV1 standard deviation, (3) BF_4022 with FEV1 predicted value and standard deviation, (4) BF_6484 with FEV1 standard deviation, (5) C3_28795 with age of diagnosis and FEV1 predicted value, (6) TLR4_15884 with FEV1 predicted value, (7) HMOX1_1038 with FEV1 predicted value and (8) HMOX1_9531 with FEV1 standard deviation. Thus, there was a slight difference in the conclusions between the additive and dominant models. 
This was likely due to the different underlying assumptions of each model. The dominant model assumes that one of the alleles is dominant over the other. In other words, it was assumed that the phenotypic trait of patients with either the dominant homozygous genotype or the heterozygous genotype was significantly different from that of patients with the recessive homozygous genotype. Under the additive model it was assumed that the phenotypic trait of heterozygous patients was intermediate between the two groups of homozygous patients. Both models were fitted to the data as there was no a priori information regarding the mode of inheritance for any of the selected polymorphisms.

FBAT analysis involved families with parents who had a heterozygous genotype since this test analyzed the inheritance pattern from parents to the affected individual. Although the theory, calculation and criteria for ANOVA and FBAT were different, it was rational to expect that the results from FBAT should overlap with those obtained from ANOVA and lead to a consistent conclusion. However, only a few matches were observed when the results of the two methods were compared. For some SNPs, a significant association was detected by ANOVA but not by FBAT. There are several possible reasons that could explain this inconsistency. Firstly, a false positive might have resulted from the recruitment of patients of different ethnicity. Although all patients were self-reported as Caucasians, they might have had subtly different genetic backgrounds. In other words, there might have been unrecognized subgroups in the study; for example, patients with ancestry in England might be genetically dissimilar to those from Quebec. Different ethnic subgroups could show different allele frequencies and therefore this population stratification might lead to a false positive conclusion in the ANOVA.
Secondly, it might be inappropriate to compare results from ANOVA with results from FBAT since they differed with regard to the groups of individuals being analyzed. ANOVA analyzed all individuals recruited in this study while FBAT excluded families with homozygous parents. Lastly, the significant relationship seen in ANOVA may have resulted by chance.

On the other hand, significant associations were detected by FBAT for some other SNPs which were not significant in the ANOVA. FBAT was used to analyze the inheritance pattern of alleles of patients from their parents. There should not be a problem of false positives when considering a sample population with mixed ethnicity, because the genetic background does not affect the inheritance pattern. In addition, the mean values were compared in the ANOVA in order to determine the presence of any association between the SNPs and phenotypic characteristics. In this type of analysis, it is optimal if the number of individuals in each group is identical. However, this was not the case for most of the SNPs. Therefore, the ANOVA might have been underpowered to detect an association. Furthermore, some important information might have been missed by only looking at the mean values. For example, two groups might have the same mean but one group might have had the patients clustered around the mean value and the other group might have included some patients who had extreme values.

5. Age of Onset Analysis

Pseudomonas aeruginosa is one of the most frequent pathogens found in the lungs of CF patients. It is common to see that some patients get colonized in their early childhood while others are not infected with this bacterium until adulthood. However, testing for association between the selected SNPs and the age of patients when they first get colonized with Pseudomonas aeruginosa requires a different type of analysis from those described above. Some of the recruited individuals had never been infected with Pseudomonas aeruginosa.
ANOVA was inappropriate for this type of censored data and therefore this analysis was performed by age of onset analysis, or so-called survival analysis. In general, this test includes two phases: (1) the Kaplan-Meier technique is used to analyze the time to an endpoint (in this case first infection with Pseudomonas aeruginosa); and (2) the log-rank test is then used to compare two or more Kaplan-Meier "curves" (in this case the curves for each genotype). When all 22 SNPs were examined by this survival analysis, only TLR4_9263 was revealed to be significantly associated with the age of first Pseudomonas aeruginosa infection. In this survival analysis the data were censored as some individuals had not experienced their first Pseudomonas aeruginosa infection at the time of the data collection. Therefore, the current age was entered for patients recorded as having no Pseudomonas aeruginosa infection so that they would contribute to the analysis. However, inclusion of this group of patients might cause misinterpretation of the data since they might have had one or two colonizations many years ago and the documentation could not be traced. Additionally, this group of patients might get infected in the future, so it would be inappropriate to group them as having no Pseudomonas aeruginosa infection and conclude that their genotype was not linked to the Pseudomonas aeruginosa infection at the time of recruitment.

6. Pseudomonas aeruginosa Infection Status

As described in the Introduction section, colonization with Pseudomonas aeruginosa occurs in about 80% of CF patients and infection by this pathogen greatly deteriorates the health status of the patient. In this study, all the participating patients were classified into one of four groups according to their exposure to Pseudomonas aeruginosa. A chi-square test was performed in order to determine if one or more SNPs contributed to susceptibility to Pseudomonas aeruginosa infection in CF patients.
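A chi-square test of this kind compares the observed counts in a genotype-by-infection-group table with the counts expected if genotype and infection status were independent. The Python sketch below is a minimal illustration under stated assumptions, not the procedure used in this study: the function names and the counts in the usage example are hypothetical, the table is assumed to have three genotype rows and four infection-status columns, and the P value uses a closed-form expression that is valid only when the degrees of freedom are even (here (3 - 1) x (4 - 1) = 6).

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a
    rows x columns table of counts (list of equal-length lists)."""
    n_rows, n_cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column must contain at least one count")

    chi2 = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, (n_rows - 1) * (n_cols - 1)


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with an even number of degrees of
    freedom: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    term, series = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        series += term
    return math.exp(-half) * series


if __name__ == "__main__":
    # Hypothetical counts: rows are the three genotypes of one SNP,
    # columns are four infection-status groups. Not data from this study.
    counts = [[30, 25, 40, 60],
              [20, 30, 35, 70],
              [5, 4, 10, 12]]
    chi2, df = chi_square_independence(counts)
    print(chi2, df, chi2_upper_tail_even_df(chi2, df))
```

The chi-square approximation becomes unreliable when expected counts in some cells are small (a common rule of thumb is below 5), which is relevant when one of the groups contains few patients.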
Among the 22 selected SNPs of the four candidate modifier genes, only C3_963 showed a borderline P value, which suggested that it might play a role in the patients' ability to eradicate Pseudomonas aeruginosa infection. Thus, no informative conclusions could be drawn in this section. Usually, microbial testing of patients' sputum is one of the routine tests in CF clinics and there should be minimal error in the testing protocol for the determination of Pseudomonas aeruginosa infection status for CF patients. However, errors might have arisen from the procedures in data collection. It was time-consuming and complicated to look for relevant data if the patients had only one or a few Pseudomonas aeruginosa infections in the past. This is especially true for patients who had been transferred from different clinics and those who were seriously ill and had voluminous clinic/hospital charts. For example, it would be easy to misclassify an adult patient as group 0 with no Pseudomonas aeruginosa infection if no related information could be found in his/her charts. However, the patient might have been colonized once many years ago in childhood and there was no way to trace back to the previous records. In addition, the number of patients in one of the four categories was small for most SNPs and such a small sample size might reduce the power of the analysis.

7. Haplotype Analysis Utilizing the RGui and FBAT programs

The selected SNPs in each candidate modifier gene were grouped together as a haplotype in the haplotype analysis. There were two main reasons for performing this investigation: (1) to test for any interaction between SNPs which might result in the observed phenotype even though no association could be found when considering SNPs separately, and (2) to see if there was another unidentified SNP in the vicinity of the selected ones which might have contributed to the phenotypic trait and was in linkage disequilibrium with the tested SNPs.
Haplotype analysis was done using two programs, RGui and FBAT, and the results were then compared for consistency in order to strengthen the conclusion of an association between the haplotype and the phenotypic trait. The major difference between these two programs in the haplotype analysis was similar to that between ANOVA and FBAT in the single locus analysis: the RGui program included all participants in the study while the FBAT program looked for the inheritance pattern and only considered those trios with heterozygous parents.

(a) Haplotype analysis utilizing the RGui program

In the haplotype analysis done utilizing the RGui program, only two haplotypes in Factor B, h11111 and h22112, were found to be significantly associated with the age of diagnosis. By visual inspection of the two haplotypes, it was apparent that they shared the same alleles at SNPs 6484 and 7202. By examining the results table it could be observed that only three haplotypes were analyzed individually while the rest (with lower frequencies) were grouped together as a pool. The third individual haplotype (with a P value of greater than 0.05) did not have the same alleles at SNPs 6484 and 7202 as the two significant haplotypes. It was possible that SNPs 6484 and 7202 were interacting and affecting the age of diagnosis. However, this result might be a false positive as two additional haplotypes in the pooled group also had the same alleles at SNPs 6484 and 7202 but an association with age of diagnosis was not apparent for these two haplotypes. In addition, it was more likely that SNP 7202 was the cause of the significant relationship since it was found to be significant by both ANOVA and FBAT when the SNPs were considered independently.

An association was only found for one of the 16 haplotypes of the Complement factor 3 gene, i.e., h1212.
Comparison of this haplotype with the others suggested that the first two SNPs, SNP 963 and 28795, might be interacting and thus responsible for the significant result. C3_28795 was observed to be significant in the single locus FBAT analysis but none of the SNPs in Complement factor 3 were significant in the ANOVA. Thus, it may be that C3_28795 was the true driving force in the detected haplotype result.

Even though TLR4_1859 and TLR4_15884 were found to be significantly associated with age of diagnosis and FEV1 predicted value respectively, haplotypes consisting of these two SNPs showed no significant association in the analysis. Only the pooled group was shown to have a P value of less than 0.05 and thus no specific haplotype could be concluded to be responsible.

In the last candidate gene, Heme oxygenase-1, one haplotype (h112221) and two haplotypes (h112221 and h211221) were discovered to be associated with the FEV1 predicted value and the FEV1 standard deviation, respectively. Therefore, the combination of the last three SNPs of Heme oxygenase-1, SNPs 3303, 9531 and 16442, might have been responsible for the observed associations. However, it was interesting to note that another haplotype with the same alleles at those three positions did not have a P value of less than 0.05. Therefore, it might be that another unobserved SNP in the HMOX1 gene found on the h112221 and h211221 backgrounds was responsible for the observed association.

(b) Haplotype analysis utilizing the FBAT program

More haplotypes were found to be significantly associated with the phenotypes by FBAT analysis. In Factor B, haplotype h11111 was found to be associated with both FEV1 predicted value and standard deviation, while another haplotype, h22121, was observed to be strongly associated with the FEV1 standard deviation value. When comparing all the haplotypes seen in Factor B, no consistent pattern was observed in the haplotypes which had a P value of less than 0.05.
In the single locus analysis of the SNPs in Factor B by FBAT, almost all of them were found to be related to FEV1 predicted value, FEV1 standard deviation or both. This might be a reason for the lack of a consistent pattern of Factor B haplotypes associated with these phenotypic traits. It is possible that several Factor B SNPs independently influence the level of FEV1.

In Complement factor 3, haplotypes h1111 and h1212 were discovered to be significantly associated with age of diagnosis and FEV1 standard deviation value, respectively. When comparing the results of individual SNPs and haplotypes, it was difficult to discern a consistent pattern. Sixteen haplotypes were detected and each had a sample size sufficient for individual haplotype analysis. Such a large number of haplotypes made it difficult to search for any interaction among SNPs and the relationship to the phenotypic traits.

Twenty-three haplotypes were seen in Toll-like receptor 4 in the patients and only 9 of them were used for the analysis since the frequency of the rest was too low. One of the haplotypes, h1122112, was shown to have a P value of less than 0.05 for its correlation with FEV1 predicted value. The only difference in this haplotype compared with the remaining 8 lay in the sixth SNP, which was SNP 15884. This conclusion was consistent with the single SNP FBAT analysis, and suggested that there was no interaction of SNPs or unidentified SNPs contributing to the phenotypic trait.

A similar observation was made in the heme oxygenase-1 gene analysis. Among the 21 haplotypes found in this candidate gene, only 7 of them were frequent enough for analysis. One haplotype, h221211, was determined to be associated with the FEV1 % predicted value. The only noticeable difference between this haplotype and the rest was in the second SNP, SNP 1038.
This SNP was significantly associated with the FEV1 predicted value in the single locus FBAT analysis and thus was likely to be the driving factor for the haplotype association seen here.

There were some differences between the results obtained by the two programs. Firstly, the possible haplotypes determined in the patients were slightly different between the two programs due to the selection procedure of participants eligible for analysis. Even if the same haplotype was found in both programs, the haplotype frequency was not the same due to the difference in the number of patients in each analysis. Secondly, the RGui program grouped the haplotypes with low frequency together while the FBAT program ignored the haplotypes with fewer than 10 families. Obviously, some information would be lost in both cases. On the other hand, a similar pattern was observed in the results obtained by both types of analysis. For most of the haplotypes that were found to have a significant association with an outcome variable, it appeared that the driving factor behind the detected association was a single SNP, not the interaction of SNPs. This was because the haplotype results were consistent with the single SNP analyses by both ANOVA and FBAT.

8. Haplotype Analysis of the Age of First Pseudomonas aeruginosa Infection utilizing Hapstat

Haplotype analysis was also performed for investigation of the age of first Pseudomonas aeruginosa infection, and this was done using the Hapstat program. The same concerns applied to this analysis as to the single SNP analysis with respect to the difficulty of recording the age of first Pseudomonas aeruginosa infection accurately.

Although TLR4_9263 was concluded to be associated with the age of first Pseudomonas aeruginosa infection, none of the haplotypes in the four candidate genes were found to have a significant relationship with this outcome.
As illustrated in the Results section, the number of haplotypes analyzed by the program was smaller than in the haplotype analyses done with RGui and FBAT, because the program was set to ignore haplotypes with a frequency of less than 0.05. Inevitably this decreased the number of participants and therefore the power of this part of the study.

9. Position of SNPs with Significant Association and their Effect on Gene Function

Among the 22 selected SNPs, 12 of them were found to be significantly linked to one of the tested phenotypic traits and the results are summarized in Tables 55-57.

Table 55: Summary of all significant SNP-phenotype associations by ANOVA and FBAT analyses

Analytical test  Gene                  Polymorphism  Phenotypic trait                                                 Position in the gene
ANOVA            Factor B              BF_2557       FEV1 predicted value; FEV1 standard deviation                    Exon
ANOVA            Factor B              BF_7202       Age of diagnosis; FEV1 predicted value; FEV1 standard deviation  Intron
ANOVA            Heme oxygenase-1      HMOX1_2790    FEV1 predicted value; FEV1 standard deviation                    Intron
ANOVA            Heme oxygenase-1      HMOX1_9531    FEV1 predicted value; FEV1 standard deviation                    Intron
FBAT             Factor B              BF_2557       FEV1 predicted value; FEV1 standard deviation                    Exon
FBAT             Factor B              BF_4022       FEV1 predicted value; FEV1 standard deviation                    Intron
FBAT             Factor B              BF_6484       FEV1 standard deviation                                          Intron
FBAT             Factor B              BF_7202       Age of diagnosis; FEV1 standard deviation                        Intron
FBAT             Complement Factor 3   C3_28795      Age of diagnosis; FEV1 predicted value                           Intron
FBAT             Complement Factor 3   C3_43118      Age of diagnosis                                                 Intron
FBAT             Toll-like Receptor 4  TLR4_15844    FEV1 predicted value                                             Exon
FBAT             Heme oxygenase-1      HMOX1_1038    FEV1 predicted value
FBAT             Heme oxygenase-1      HMOX1_9531    FEV1 standard deviation                                          Intron

Table 56: Summary of all significant haplotype-phenotype associations when testing by RGui and FBAT analyses

Analytical test  Gene                  Haplotype  Phenotypic trait
RGui             Factor B              h11111     Age of diagnosis
RGui             Factor B              h22112     Age of diagnosis
RGui             Complement Factor 3   h1212      Age of diagnosis
RGui             Heme oxygenase-1      h221211    Age of diagnosis
RGui             Heme oxygenase-1      h112221    FEV1 predicted value
RGui             Heme oxygenase-1      h112221    FEV1 standard deviation
RGui             Heme oxygenase-1      h211221    FEV1 standard deviation
FBAT             Factor B              h11111     FEV1 predicted value
FBAT             Factor B              h11111     FEV1 standard deviation
FBAT             Factor B              h22121     FEV1 standard deviation
FBAT             Complement Factor 3   h1111      Age of diagnosis
FBAT             Complement Factor 3   h1112      FEV1 predicted value
FBAT             Complement Factor 3   h1212      FEV1 standard deviation
FBAT             Toll-like Receptor 4  h1122112   FEV1 predicted value
FBAT             Heme oxygenase-1      h221211    FEV1 predicted value

Table 57: Summary table of SNPs which revealed a significant association with the Pseudomonas aeruginosa infection status and the age of first Pseudomonas aeruginosa infection

Analytical test    Gene                  Polymorphism / Haplotype  Phenotypic trait                                 Position in the gene
Survival analysis  Complement Factor 3   C3_963                    Age of first Pseudomonas aeruginosa infection    Intron
Survival analysis  Toll-like Receptor 4  TLR4_9263                 Age of first Pseudomonas aeruginosa infection    Intron
Chi-squared test   Complement Factor 3   C3_963                    Pseudomonas aeruginosa infection status          Intron

As indicated in the above tables, those 13 SNPs which demonstrated a significant relationship with one of the phenotypic traits could be divided into two main groups according to their position in the gene: intron or exon. Most of the SNPs are situated in the intronic regions. Although introns are spliced out and therefore not translated into amino acids, polymorphisms in this region could also affect the protein products. It is common to find the nucleotides G and T at the beginning of an intron, whereas nucleotides A and G are found at the end of it. [44] However, there are other sequences that control the exact position of the cut and the process of splicing. [45] This signal is included in the intron; therefore, a splicing error might occur if there was a mutation in the intron and this might influence the function of the gene. In addition, two of the SNPs, C3_963 and HMOX1_1038, are in a promoter region, which might affect the initiation step of transcription.
On the other hand, two SNPs, BF_2557 and TLR4_15844, are in an exonic region. For example, BF_2557 is located in exon 3 of the Factor B gene; it changes the third base of the codon from G to A without altering the amino acid produced. This SNP lies near the end of exon 3, in a region that may act as an exonic splicing enhancer. That is, it is in a region of the exon that promotes the splicing of the adjacent intron, and it might therefore affect the structure and function of the protein product. This is similar to the C3_43118 SNP, which is a tagSNP of C3_44692 in this study, since C3_44692 is located near the beginning of exon 41 of the Complement factor 3 gene.

The large number of statistical tests performed in this study posed a potential problem in the analysis. A P value of less than 0.05 was reported as significant in this project. However, with so many tests, such an observation might have been due to chance rather than reflecting a true association. To keep the overall probability of a false-positive result at the level that applies when only one hypothesis is investigated, the Bonferroni correction can be used: the statistical significance level is divided by the number of tests performed (equivalently, each P value is multiplied by the number of tests) when independent hypotheses are examined on the same dataset. [46] Bonferroni correction might therefore address this risk. However, it was not used in this project because the correction is overly conservative here: the phenotypic outcomes were correlated with each other and some of the SNPs showed low levels of linkage disequilibrium, i.e. the hypotheses being tested were not independent. [47]

One way to assess the potential functional consequences of a SNP is to determine the degree of conservation of the surrounding sequence between species.
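As a brief aside on the multiple-testing point above, the Bonferroni adjustment can be sketched in a few lines of Python. This is a minimal illustration and not the analysis used in this project: the function name `bonferroni` and the example P values are hypothetical, and the sketch assumes independent tests, which does not hold for correlated phenotypes or for SNPs in linkage disequilibrium.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni adjustment for m tests (assumes independent hypotheses).

    Returns the per-test threshold alpha/m, the adjusted P values
    min(1, p * m), and a flag per test for p < alpha/m.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("at least one P value is required")
    threshold = alpha / m
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p < threshold for p in p_values]
    return threshold, adjusted, significant


# Hypothetical P values, for illustration only (not results from this study).
example_p = [0.001, 0.012, 0.034, 0.20]
threshold, adjusted, significant = bonferroni(example_p)
print("per-test threshold:", threshold)
print("significant after correction:", significant)
```

With four tests, a nominal P value of 0.034 no longer passes the corrected threshold, which shows how quickly the correction becomes stringent as the number of tests grows.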
Among the 22 SNPs in the four candidate genes, a few were conserved among 17 vertebrate species [48]; these are summarized in Table 58.

Table 58: Summary table of the conservation score of the selected SNPs in the four candidate genes

Gene                  SNP    Conservation Score
Factor B              2557   0.961
Factor B              6484   0.054
Toll-like Receptor 4  11912  0.011
Toll-like Receptor 4  17050  0.120
Heme oxygenase-1      3303   0.269

A conservation score from 0 (not conserved) to 1 (highly conserved) was assigned by comparing the sequence across the 17 vertebrate species. This conservation score can be interpreted as the probability that any given base is in a conserved element. The SNPs in the above table were found to be conserved among those 17 vertebrate species, which means that they have remained the same through evolution. In other words, they might have an important function in the gene. A significant association between one of the SNPs listed above and one or more of the tested phenotypes is therefore noteworthy, since conservation further supports the conclusion that the SNP affects gene function. For example, BF_2557 was shown to be highly conserved among those 17 vertebrate species and is therefore likely to be functionally important. This is interesting because BF_2557 was one of the SNPs shown to have an association with the tested phenotypes.

10.  Summary

In summary, 23 families were excluded from the study because of Mendelian errors. With the remaining 1605 individuals, or 535 trios, only 13 of the selected 22 SNPs were found to have a significant association by either ANOVA or FBAT with one or more of the four selected phenotypic traits (age of diagnosis, FEV1 predicted and standard deviation value, age of first Pseudomonas aeruginosa infection). In addition, only one of the SNPs had a borderline P value when testing for the Pseudomonas aeruginosa infection status among the participants.
In the haplotype analyses done by RGui or FBAT, some haplotypes were found to have a P value of less than 0.05. Most of these haplotypes contained a SNP that had a significant relationship with the phenotype by either ANOVA or FBAT, and the inclusion of this SNP is probably the driving force behind the observed result.

11.  Future studies

This is a sub-study of a large, Canada-wide and international research project on CF modifier genes. A total of 1605 individuals from 535 trios participated in this part of the study. The recruitment of patients was ongoing at the time when this sub-study started. Therefore, the trends that we observed can be investigated in additional trios who consented later in the project and in families with more than one affected child. This is particularly critical for some of the selected SNPs where borderline P values were obtained. If the same result is found in a larger dataset, then such a consistent pattern would be more convincing. Such a replication study is ongoing.

Replication of these results could also be sought in the two large US CF modifier gene consortia. The other two cohorts of patients are from the University of North Carolina/Case Western Reserve University and Johns Hopkins University.

For those SNPs that were found to be significantly associated with one or more of the tested phenotypes, it is possible that they affect gene function and therefore lead to differences in disease severity. This hypothesis could be tested by further investigation. For example, if the SNP is in the promoter region, it lies in a region that serves as a control point for regulating transcription. The amount of mRNA and protein produced would therefore be affected, which could be measured by real-time PCR and ELISA, respectively.
If the SNP is in the intronic region, it may affect splicing, and this could be tested either by sequencing or by measuring the length of the transcript after amplifying the cDNA sequence.

Although the frequency of Burkholderia cepacia infection in CF patients is lower than that of Pseudomonas aeruginosa, it is even more difficult to eradicate the pathogen once infection is established. An experiment could be done using the SNPs shown to be associated with disease severity in this project in order to determine the relationship between these SNPs and Burkholderia cepacia infection. For example, candidate genes could include those responsible for the attachment of the pathogen in the lungs (TLR4 would be a possible choice, since Burkholderia cepacia is a Gram-negative bacterium whose LPS also binds to TLR4, as is the case for Pseudomonas aeruginosa) and other immune-system genes involved in clearing the micro-organism.

Finally, it is possible to perform a genome-wide association analysis in which 300,000 - 500,000 SNPs are analyzed in a single experiment for each study participant. This can be done with recently developed high-throughput genotyping technologies such as the Illumina or Affymetrix systems. With appropriate adjustments for multiple comparisons and the use of several populations to provide replication, this approach provides a systematic assessment of the entire human genome.

References

1.  Dupuis, A., et al., Cystic fibrosis birth rates in Canada: a decreasing trend since the onset of genetic testing. J Pediatr, 2005. 147(3): p. 312-5.

2.  Riordan, J.R., et al., Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science, 1989. 245(4922): p. 1066-73.

3.  Turcios, N.L., Cystic fibrosis: an overview. J Clin Gastroenterol, 2005. 39(4): p. 307-17.

4.  Haston, C.K. and T.J. Hudson, Finding genetic modifiers of cystic fibrosis. N Engl J Med, 2005. 353(14): p. 1509-11.

5.  Merlo, C.A. and M.P. Boyle, Modifier genes in cystic fibrosis lung disease. J Lab Clin Med, 2003. 141(4): p. 237-41.

6.  Zielenski, J., Genotype and phenotype in cystic fibrosis. Respiration, 2000. 67(2): p. 117-33.

7.  Rowntree, R.K. and A. Harris, The phenotypic consequences of CFTR mutations. Ann Hum Genet, 2003. 67(Pt 5): p. 471-85.

8.  McKone, E.F., et al., Effect of genotype on phenotype and mortality in cystic fibrosis: a retrospective cohort study. Lancet, 2003. 361(9370): p. 1671-6.

9.  Harris, A., Cystic fibrosis gene. Br Med Bull, 1992. 48(4): p. 738-53.

10.  Moraes, T.J., et al., Abnormalities in the Pulmonary Innate Immune System in Cystic Fibrosis. Am J Respir Cell Mol Biol, 2005.

11.  Morales, M.M., M.A. Capella, and A.G. Lopes, Structure and function of the cystic fibrosis transmembrane conductance regulator. Braz J Med Biol Res, 1999. 32(8): p. 1021-8.

12.  Davis, P.B., M. Drumm, and M.W. Konstan, Cystic fibrosis. Am J Respir Crit Care Med, 1996. 154(5): p. 1229-56.

13.  Fanen, P., et al., Structure-function analysis of a double-mutant cystic fibrosis transmembrane conductance regulator protein occurring in disorders related to cystic fibrosis. FEBS Lett, 1999. 452(3): p. 371-4.

14.  Stern, R.C., The diagnosis of cystic fibrosis. N Engl J Med, 1997. 336(7): p. 487-91.

15.  Kerem, B.S., et al., DNA marker haplotype association with pancreatic sufficiency in cystic fibrosis. Am J Hum Genet, 1989. 44(6): p. 827-34.

16.  Davis, P.B., Pathophysiology of cystic fibrosis with emphasis on salivary gland involvement. J Dent Res, 1987. 66 Spec No: p. 667-71.

17.  Rubinstein, S., R. Moss, and N. Lewiston, Constipation and meconium ileus equivalent in patients with cystic fibrosis. Pediatrics, 1986. 78(3): p. 473-9.

18.  Craig, J.M., H. Haddad, and H. Shwachman, The pathological changes in the liver in cystic fibrosis of the pancreas. AMA J Dis Child, 1957. 93(4): p. 357-69.

19.  Cox, K.L., et al., Orthotopic liver transplantation in patients with cystic fibrosis. Pediatrics, 1987. 80(4): p. 571-4.

20.  Taussig, L.M., et al., Fertility in males with cystic fibrosis. N Engl J Med, 1972. 287(12): p. 586-9.

21.  Currie, A.J., D.P. Speert, and D.J. Davidson, Pseudomonas aeruginosa: role in the pathogenesis of the CF lung lesion. Semin Respir Crit Care Med, 2003. 24(6): p. 671-80.

22.  Hoiby, N., Understanding bacterial biofilms in patients with cystic fibrosis: current and innovative approaches to potential therapies. J Cyst Fibros, 2002. 1(4): p. 249-54.

23.  Kharazmi, A., Mechanisms involved in the evasion of the host defence by Pseudomonas aeruginosa. Immunol Lett, 1991. 30(2): p. 201-5.

24.  Doring, G., A. Albus, and N. Hoiby, Immunologic aspects of cystic fibrosis. Chest, 1988. 94(2 Suppl): p. 109S-115S.

25.  Friedl, P., B. Konig, and W. Konig, Effects of mucoid and non-mucoid Pseudomonas aeruginosa isolates from cystic fibrosis patients on inflammatory mediator release from human polymorphonuclear granulocytes and rat mast cells. Immunology, 1992. 76(1): p. 86-94.

26.  Sedlak-Weinstein, E., et al., Pseudomonas aeruginosa: the potential to immunise against infection. Expert Opin Biol Ther, 2005. 5(7): p. 967-82.

27.  Mueller-Ortiz, S.L., S.M. Drouin, and R.A. Wetsel, The alternative activation pathway and complement component C3 are critical for a protective immune response against Pseudomonas aeruginosa in a murine model of pneumonia. Infect Immun, 2004. 72(5): p. 2899-906.

28.  Kronborg, G., et al., Antibody responses to lipid A, core, and O sugars of the Pseudomonas aeruginosa lipopolysaccharide in chronically infected cystic fibrosis patients. J Clin Microbiol, 1992. 30(7): p. 1848-55.

29.  Ernst, R.K., et al., Specific lipopolysaccharide found in cystic fibrosis airway Pseudomonas aeruginosa. Science, 1999. 286(5444): p. 1561-5.

30.  Ernst, R.K., et al., Pseudomonas aeruginosa lipid A diversity and its recognition by Toll-like receptor 4. J Endotoxin Res, 2003. 9(6): p. 395-400.

31.  Wilks, A., Heme oxygenase: evolution, structure, and mechanism. Antioxid Redox Signal, 2002. 4(4): p. 603-14.

32.  Shibahara, S., The heme oxygenase dilemma in cellular homeostasis: new insights for the feedback regulation of heme catabolism. Tohoku J Exp Med, 2003. 200(4): p. 167-86.

33.  Bach, F.H., Heme oxygenase-1 as a protective gene. Wien Klin Wochenschr, 2002. 114 Suppl 4: p. 1-3.

34.  Slebos, D.J., S.W. Ryter, and A.M. Choi, Heme oxygenase-1 and carbon monoxide in pulmonary medicine. Respir Res, 2003. 4(1): p. 7.

35.  Sabra, W., E.J. Kim, and A.P. Zeng, Physiological responses of Pseudomonas aeruginosa PAO1 to oxidative stress in controlled microaerobic and aerobic cultures. Microbiology, 2002. 148(Pt 10): p. 3195-202.

36.  Zhou, H., et al., Heme oxygenase-1 expression in human lungs with cystic fibrosis and cytoprotective effects against Pseudomonas aeruginosa in vitro. Am J Respir Crit Care Med, 2004. 170(6): p. 633-40.

37.  Carlson, C.S., et al., Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet, 2004. 74(1): p. 106-20.

38.  Knudson, R.J., et al., Changes in the normal maximal expiratory flow-volume curve with growth and aging. Am Rev Respir Dis, 1983. 127(6): p. 725-34.

39.  Corey, M., H. Levison, and D. Crozier, Five- to seven-year course of pulmonary function in cystic fibrosis. Am Rev Respir Dis, 1976. 114(6): p. 1085-92.

40.  Horvath, S., X. Xu, and N.M. Laird, The family based association test method: strategies for studying general genotype-phenotype associations. Eur J Hum Genet, 2001. 9(4): p. 301-6.

41.  Zeng, D., et al., Efficient semiparametric estimation of haplotype-disease associations in case-cohort and nested case-control studies. Biostatistics, 2006. 7(3): p. 486-502.

42.  Lake, S.L., D. Blacker, and N.M. Laird, Family-based tests of association in the presence of linkage. Am J Hum Genet, 2000. 67(6): p. 1515-25.

43.  Law, B., et al., Effects of population structure and admixture on exact tests for association between Loci. Genetics, 2003. 164(1): p. 381-7.

44.  Simpson, A.G., E.K. MacQuarrie, and A.J. Roger, Eukaryotic evolution: early origin of canonical introns. Nature, 2002. 419(6904): p. 270.

45.  Havlioglu, N., et al., An intronic signal for alternative splicing in the human genome. PLoS ONE, 2007. 2(11): p. e1246.

46.  Perneger, T.V., What's wrong with Bonferroni adjustments. BMJ, 1998. 316(7139): p. 1236-8.

47.  Nyholt, D.R., A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet, 2004. 74(4): p. 765-9.

48.  Siepel, A., et al., Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 2005. 15(8): p. 1034-50.
Appendix

Table A1: Concentration of a subset of the DNA samples in the original source plates #1-8

Individual ID  DNA concentration (ng/uL)
0003-01        4.43
0003-02        3.93
0003-03        1.13
0019-01        2.52
0019-02        1.16
0019-03        0.25
0052-01        1.95
0052-02        0.87
0052-03        0.72
0063-01        1.94
0063-02        0.93
0063-03        0.46
0069-01        2.54
0069-02        1.60
0069-03        -0.03
0083-01        2.63
0083-02        1.47
0083-03        0.55
0101-01        3.17
0101-02        2.47
0101-03        1.29
0150-01        3.40
0150-02        9.77
0150-03        2.75

Table A2: Families with non-Mendelian inheritance when genotyping a particular SNP

Family ID: 0167, 0313, 0374, 0445, 0451, 0453, 0478, 0482, 1251, 1300, 1324, 1350, 1383, 1431, 1444, 1878, 1881, 2543, 2800, 2809, 2821, 4120

SNP: HMOX1_2790, HMOX1_149, HMOX1_149, BF_2557, HMOX1_149, HMOX1_149, TLR4_11912, HMOX1_149, BF_2557, C3_28795, HMOX1_9531, HMOX1_149, HMOX1_149, HMOX1_149, HMOX1_149, HMOX1_149, HMOX1_149, C3_963, HMOX1_149, HMOX1_149, HMOX1_149, C3_963, C3_963, HMOX1_149, HMOX1_149, HMOX1_149

Table A3: Genotype frequency of each of the SNPs examined in the gene of Factor B

SNP      Group     Total # of Individuals  Genotype      # of Individuals  Percentage
BF_2557  Parents   1066                    AA            36                3.4
                                           AG            314               29.5
                                           GG            705               66.1
                                           undetermined  11                1.0
         Patients  533                     AA            20                3.8
                                           AG            135               25.3
                                           GG            367               68.9
                                           undetermined  11                2.1
BF_4022  Parents   1070                    AA            107               10.0
                                           AG            458               42.8
                                           GG            489               45.7
                                           undetermined  16                1.5
         Patients  535                     AA            67                12.5
                                           AG            201               37.6
                                           GG            259               48.4
                                           undetermined  8                 1.5
BF_6484  Parents   1070                    AA            858               80.2
                                           AG            189               17.7
                                           GG            19                1.8
                                           undetermined  4                 0.4
         Patients  535                     AA            423               79.1
                                           AG            100               18.7
                                           GG            8                 1.5
                                           undetermined  4                 0.8
BF_7202  Parents   1070                    AA            197               18.4
                                           AG            508               47.5
                                           GG            344               32.2
                                           undetermined  21                2.0
         Patients  535                     AA            105               19.6
                                           AG            252               47.1
                                           GG            172               32.2
                                           undetermined  6                 1.1
BF_8311  Parents   1070                    CC            862               80.6
                                           CT            182               17.0
                                           TT            7                 0.7
                                           undetermined  19                1.8
         Patients  535                     CC            428               80.0
                                           CT            90                16.8
                                           TT            5                 0.9
                                           undetermined  12                2.2

Table A4: Genotype frequency of each of the SNPs examined in the gene of Complement factor 3

SNP       Group     Total # of Individuals  Genotype      # of Individuals  Percentage
C3_963    Parents   1064                    GG            270               25.4
                                            GT            527               49.5
                                            TT            249               23.4
                                            undetermined  18                1.7
          Patients  532                     GG            152               28.6
                                            GT            255               47.9
                                            TT            119               22.4
                                            undetermined  6                 1.1
C3_28795  Parents   1068                    AA            355               33.2
                                            AG            521               48.8
                                            GG            177               16.6
                                            undetermined  15                1.4
          Patients  534                     AA            202               37.8
                                            AG            237               44.4
                                            GG            92                17.2
                                            undetermined  3                 0.6
C3_36735  Parents   1070                    AA            202               18.9
                                            AG            519               48.5
                                            GG            330               30.8
                                            undetermined  19                1.8
          Patients  535                     AA            96                17.9
                                            AG            266               49.7
                                            GG            163               30.5
                                            undetermined  10                1.9
C3_43118  Parents   1070                    AA            239               22.3
                                            AG            529               49.4
                                            GG            292               27.3
                                            undetermined  10                0.9
          Patients  535                     AA            123               23.0
                                            AG            251               46.9
                                            GG            157               29.4
                                            undetermined  4                 0.8

Table A5: Genotype frequency of each of the SNPs examined in the gene of Toll-like receptor 4

SNP         Group     Total # of Individuals  Genotype      # of Individuals  Percentage
TLR4_851    Parents   1070                    AA            551               51.5
                                              AG            434               40.6
                                              GG            81                7.6
                                              undetermined  4                 0.4
            Patients  535                     AA            285               53.3
                                              AG            203               37.9
                                              GG            38                7.1
                                              undetermined  9                 1.7
TLR4_1859   Parents   1070                    AA            143               13.4
                                              AG            509               47.6
                                              GG            410               38.3
                                              undetermined  8                 0.8
            Patients  535                     AA            77                14.4
                                              AG            244               45.6
                                              GG            209               39.1
                                              undetermined  5                 0.9
TLR4_2856   Parents   1070                    CC            24                2.2
                                              CT            266               24.9
                                              TT            770               72.0
                                              undetermined  10                0.9
            Patients  535                     CC            10                1.9
                                              CT            140               26.2
                                              TT            382               71.4
                                              undetermined  3                 0.6
TLR4_9263   Parents   1070                    AA            16                1.5
                                              AC            207               19.4
                                              CC            841               78.6
                                              undetermined  6                 0.6
            Patients  535                     AA            8                 1.5
                                              AC            106               19.8
                                              CC            418               78.1
                                              undetermined  3                 0.6
TLR4_11912  Parents   1068                    GG            478               44.8
                                              GT            467               43.7
                                              TT            114               10.7
                                              undetermined  9                 0.8
            Patients  534                     GG            234               43.7
                                              GT            229               42.8
                                              TT            65                12.2
                                              undetermined  6                 1.1
TLR4_15884  Parents   1070                    CC            23                2.2
                                              CG            270               25.2
                                              GG            764               71.4
                                              undetermined  13                1.2
            Patients  535                     CC            12                2.2
                                              CG            120               22.4
                                              GG            392               73.3
                                              undetermined  11                2.1
TLR4_17050  Parents   1070                    CC            20                1.9
                                              CT            277               25.9
                                              TT            765               71.5
                                              undetermined  8                 0.8
            Patients  535                     CC            15                2.8
                                              CT            132               24.7
                                              TT            384               71.8
                                              undetermined  4                 0.8

Table A6: Genotype and allele frequency of each of the SNPs examined in the gene of Heme oxygenase-1

SNP          Group     Total # of Individuals  Genotype      # of Individuals  Percentage
HMOX1_149    Parents   1036                    AA            122               11.8
                                               AG            404               39.0
                                               GG            479               46.2
                                               undetermined  31                3.0
             Patients  518                     AA            61                11.8
                                               AG            208               40.2
                                               GG            227               43.8
                                               undetermined  22                4.2
HMOX1_1038   Parents   1070                    CC            975               91.1
                                               CT            83                7.8
                                               TT            1                 0.1
                                               undetermined  11                1.0
             Patients  535                     CC            478               89.4
                                               CT            51                9.5
                                               TT            2                 0.4
                                               undetermined  4                 0.8
HMOX1_2790   Parents   1068                    AA            341               31.9
                                               AT            517               48.4
                                               TT            200               18.7
                                               undetermined  10                0.9
             Patients  534                     AA            166               31.1
                                               AT            253               47.4
                                               TT            108               20.2
                                               undetermined  7                 1.3
HMOX1_3303   Parents   1070                    CC            2                 0.2
                                               CG            106               9.9
                                               GG            954               89.2
                                               undetermined  8                 0.8
             Patients  535                     CC            0                 0.0
                                               CG            53                9.9
                                               GG            476               89.0
                                               undetermined  6                 1.1
HMOX1_9531   Parents   1068                    AA            225               21.1
                                               AG            528               49.4
                                               GG            302               28.3
                                               undetermined  13                1.2
             Patients  534                     AA            108               20.2
                                               AG            255               47.8
                                               GG            163               30.5
                                               undetermined  8                 1.5
HMOX1_16442  Parents   1070                    AA            952               89.0
                                               AT            109               10.2
                                               TT            1                 0.1
                                               undetermined  8                 0.8
             Patients  535                     AA            475               88.8
                                               AT            55                10.3
                                               TT            1                 0.2
                                               undetermined  4                 0.8

Table A7: Phenotypic characteristics of the CF patients

Characteristics                    # of Individuals  Average  Range           # (%)
Age (years)                        488               16.2     1.0 - 61.1      na
Age at Diagnosis (years)           464               2.7      0 - 59.4        na
Age of First PA Infection (years)  284               8.5      0.2 - 39.3      na
FEV1 % Predicted                   409               75.7     14.2 - 140.1    na
FEV1 SD                            409               0.45     (-1.93) - 4.08  na
Sex                                495               na       na              236 (47.7%) female, 259 (52.3%) male
Pancreatic Function                478               na       na              428 (89.5%) PI, 49 (10.3%) PS, 1 (0.2%) unknown
Meconium Ileus                     473               na       na              380 (80.3%) no, 90 (19%) yes, 3 (0.7%) unknown
Pseudomonas aeruginosa status      476               na       na              141 (29.6%) none, 73 (15.4%) once, 102 (21.4%) sporadic, 160 (33.6%) chronic
Genotype                           463               na       na              272 (58.7%) ΔF508/ΔF508, 155 (33.4%) ΔF508/other, 36 (7.9%) other/other

Table A8: The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Complement factor 3.
The age of diagnosis was logarithmically transformed for normality.

SNPs      Genotype  Number  Mean   Standard Error  P value
C3_963    GG        128     -0.28  0.08            0.59
          GT        208     -0.19  0.06
          TT        102     -0.18  0.09
C3_28795  AA        170     -0.14  0.07            0.35
          AG        194     -0.25  0.06
          GG        80      -0.28  0.10
C3_36735  AA        79      -0.37  0.10            0.17
          AG        225     -0.15  0.06
          GG        134     -0.19  0.07
C3_43118  AA        107     -0.16  0.08            0.81
          AG        208     -0.23  0.06
          GG        128     -0.23  0.08

Table A9: The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Complement factor 3

SNPs      Genotype  Number  Mean   Standard Error  P value
C3_963    GG        120     73.86  2.39            0.56
          GT        195     76.40  1.88
          TT        86      77.59  2.83
C3_28795  AA        161     77.33  2.07            0.62
          AG        175     74.74  1.99
          GG        70      74.72  3.14
C3_36735  AA        73      73.00  69.67           0.07
          AG        202     77.89  1.84
          GG        126     75.65  2.34
C3_43118  AA        96      77.37  2.68            0.34
          AG        193     76.86  1.89
          GG        117     72.85  2.42

Table A10: The ANOVA result of examining FEV1 sd value among different genotypes of the selected SNPs in Complement factor 3

SNPs      Genotype  Number  Mean  Standard Error  P value
C3_963    GG        120     0.31  0.09            0.09
          GT        195     0.53  0.07
          TT        86      0.56  0.10
C3_28795  AA        161     0.54  0.07            0.35
          AG        175     0.39  0.07
          GG        70      0.47  0.11
C3_36735  AA        73      0.24  0.11            0.10
          AG        202     0.52  0.07
          GG        126     0.49  0.08
C3_43118  AA        96      0.59  0.10            0.15
          AG        193     0.48  0.07
          GG        117     0.34  0.09

Table A11: The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Toll-like receptor 4

SNPs        Genotype  Number  Mean   Standard Error  P value
TLR4_851    AA        210     76.04  1.82            0.70
            AG        163     75.96  2.07
            GG        30      71.76  4.81
TLR4_1859   AA        55      79.18  3.54            0.46
            AG        184     74.31  1.94
            GG        166     76.19  2.04
TLR4_2856   CC        7       66.41  9.92            0.56
            CT        106     77.02  2.55
            TT        294     75.54  1.53
TLR4_9263   AA        6       81.71  10.73           0.86
            AC        85      75.89  2.85
            CC        316     75.66  1.48
TLR4_11912  GG        176     74.78  1.98            0.71
            GT        177     77.02  1.98
            TT        50      75.11  3.72
TLR4_15884  CC        8       71.57  9.35            0.89
            CG        90      76.16  2.79
            GG        301     75.69  1.52
TLR4_17050  CC        10      91.25  8.28            0.16
            CT        105     74.90  2.56
            TT        291     75.52  1.54

Table A12: The ANOVA result of examining FEV1 sd value among different genotypes of the selected SNPs in Toll-like receptor 4

SNPs        Genotype  Number  Mean   Standard Error  P value
TLR4_851    AA        210     0.44   0.07            0.85
            AG        163     0.49   0.07
            GG        30      0.44   0.17
TLR4_1859   AA        55      0.57   0.13            0.66
            AG        184     0.43   0.07
            GG        166     0.46   0.07
TLR4_2856   CC        7       -0.02  0.36            0.39
            CT        106     0.47   0.09
            TT        294     0.47   0.06
TLR4_9263   AA        6       0.66   0.39            0.81
            AC        85      0.50   0.10
            CC        316     0.45   0.05
TLR4_11912  GG        176     0.47   0.07            0.41
            GT        177     0.51   0.07
            TT        50      0.31   0.13
TLR4_15884  CC        8       0.13   0.34            0.56
            CG        90      0.50   0.10
            GG        301     0.46   0.06
TLR4_17050  CC        10      1.02   0.30            0.16
            CT        105     0.43   0.09
            TT        291     0.45   0.06

Table A13: The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Heme oxygenase-1. The age of diagnosis was logarithmically transformed for normality

SNPs         Genotype  Number  Mean   Standard Error  P value
HMOX1_149    AA        54      -0.20  0.12            0.17
             AG        167     -0.13  0.07
             GG        195     -0.30  0.06
HMOX1_1038   CC        398     -0.20  0.04            0.23
             CT        43      -0.30  0.13
             TT        2       -1.15  0.61
HMOX1_2790   AA        142     -0.26  0.07            0.60
             AT        207     -0.17  0.06
             TT        91      -0.21  0.09
HMOX1_3303   CC        0       n/a    n/a             0.26
             CG        41      -0.36  0.13
             GG        401     -0.20  0.04
HMOX1_9531   AA        96      -0.27  0.09            0.44
             AG        205     -0.15  0.06
             GG        138     -0.25  0.07
HMOX1_16442  AA        403     -0.20  0.04            0.42
             AT        39      -0.34  0.14
             TT        1       0.57   0.86

Table A14: Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Factor B

SNPs     Variable       Sub-group    Estimated Value  Standard Error  P value
BF_2557  BF_2557        AA           0.098            0.284           0.322
                        AG           -0.204           0.189
         Sex            Female       0.148            0.088           0.094
         CFTR mutation  ΔF508/ΔF508  -0.036           0.135           0.965
                        ΔF508/other  -0.009           0.143
BF_4022  BF_4022        AA           0.0246           0.175           0.620
                        AG           -0.105           0.133
         Sex            Female       0.146            0.087           0.0965
         CFTR mutation  ΔF508/ΔF508  -0.029           0.135           0.974
                        ΔF508/other  -0.015           0.143
BF_6484  BF_6484        AA           -0.0445          0.253           0.983
                        AG           -0.0465          0.273
         Sex            Female       0.156            0.086           0.071
         CFTR mutation  ΔF508/ΔF508  -0.0275          0.133           0.976
                        ΔF508/other  -0.0148          0.143
BF_7202  BF_7202        AA           -0.0684          0.146           0.859
                        AG           0.058            0.118
         Sex            Female       0.152            0.087           0.082
         CFTR mutation  ΔF508/ΔF508  -0.0465          0.135           0.942
                        ΔF508/other  -0.00099         0.144
BF_8311  BF_8311        CC           -0.135           0.257           0.719
                        CT           -0.24            0.28
         Sex            Female       0.149            0.087           0.0887
         CFTR mutation  ΔF508/ΔF508  -0.0406          0.134           0.955
                        ΔF508/other  -0.005           0.144

Table A15: Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Complement factor 3

SNPs      Variable       Sub-group    Estimated Value  Standard Error  P value
C3_963    C3_963         GG           0.26             0.125           0.0786
                         GT           -0.177           0.119
          Sex            Female       0.16             0.087           0.0668
          CFTR mutation  ΔF508/ΔF508  0.0014           0.137           0.9988
                         ΔF508/other  0.007            0.146
C3_28795  C3_28795       AA           -0.025           0.131           0.2087
                         AG           0.216            0.123
          Sex            Female       0.15             0.0869          0.0842
          CFTR mutation  ΔF508/ΔF508  -0.0068          0.133           0.9981
                         ΔF508/other  0.0045           0.144
C3_36735  C3_36735       AA           0.0735           0.157           0.1213
                         AG           0.168            0.12
          Sex            Female       0.173            0.0876          0.0495
          CFTR mutation  ΔF508/ΔF508  -0.031           0.134           0.9605
                         ΔF508/other  -0.027           0.144
C3_43118  C3_43118       AA           0.112            0.136           0.2875
                         AG           0.107            0.116
          Sex            Female       0.153            0.0866          0.0958
          CFTR mutation  ΔF508/ΔF508  -0.03            0.133           0.4232
                         ΔF508/other  -0.0128          0.142

Table A16: Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Heme oxygenase-1

SNPs         Variable       Sub-group    Estimated Value  Standard Error  P value
HMOX1_149    HMOX1_149      AA           0.0189           0.18            0.9514
                            AG           -0.0377          0.131
             Sex            Female       0.132            0.089           0.14
             CFTR mutation  ΔF508/ΔF508  -0.032           0.135           0.9654
                            ΔF508/other  -0.02            0.146
HMOX1_1038   HMOX1_1038     CC           3.283            61.03           0.5595
                            CT           3.31             61.03
             Sex            Female       0.148            0.086           0.0861
             CFTR mutation  ΔF508/ΔF508  -0.0262          0.133           0.9797
                            ΔF508/other  -0.0093          0.142
HMOX1_2790   HMOX1_2790     AA           0.077            0.131           0.7633
                            AT           0.031            0.118
             Sex            Female       0.145            0.087           0.0966
             CFTR mutation  ΔF508/ΔF508  0.0039           0.136           0.9996
                            ΔF508/other  0.00041          0.146
HMOX1_3303   HMOX1_3303     CG           0.02             0.158           0.8987
             Sex            Female       0.151            0.0867          0.0819
             CFTR mutation  ΔF508/ΔF508  -0.0346          0.133           0.9663
                            ΔF508/other  -0.0095          0.142
HMOX1_9531   HMOX1_9531     AA           0.0048           0.143           0.9069
                            AG           -0.047           0.118
             Sex            Female       0.162            0.088           0.0647
             CFTR mutation  ΔF508/ΔF508  -0.033           0.134           0.9628
                            ΔF508/other  -0.021           0.143
HMOX1_16442  HMOX1_16442    AA           -0.886           0.363           0.17
                            AT           -0.938           0.396
             Sex            Female       0.144            0.087           0.0986
             CFTR mutation  ΔF508/ΔF508  -0.03            0.133           0.9688
                            ΔF508/other  -0.0198          0.143

Table A17: Chi squared test for investigating the relationship between different genotypes of the selected SNPs in Factor B and Pseudomonas aeruginosa infection status

                   PA status (in percentage)
SNPs     Genotype  0      1      2      3      Chi Test  P value
BF_2557  AA        36.84  10.53  15.79  36.84  2.91      0.82
         AG        26.09  13.91  20.87  39.13
         GG        30.00  16.36  21.21  32.42
BF_4022  AA        30.51  10.71  25.42  33.90  4.79      0.57
         AG        28.25  12.99  20.34  38.42
         GG        29.61  17.60  21.89  30.90
BF_6484  AA        29.79  16.22  20.48  33.51  2.11      0.91
         AG        30.00  11.11  24.44  34.44
         GG        28.57  14.29  28.57  28.57
BF_7202  AA        30.43  11.96  22.83  34.78  5.19      0.52
         AG        29.15  14.35  19.28  37.22
         GG        29.03  18.71  23.87  28.39
BF_8311  CC        29.92  14.44  21.52  34.12  3.25      0.78
         CT        27.50  20.00  20.00  32.50
         TT        40.00  0.00   20.00  40.00

Table A18: Chi square test for investigating the relationship between different genotypes of the selected SNPs in Toll-like receptor 4 and Pseudomonas aeruginosa infection status

                      PA status (in percentage)
SNPs        Genotype  0      1      2      3      Chi Test  P value
TLR4_851    AA        32.16  16.08  21.96  29.80  7.32      0.29
            AG        28.89  13.33  20.56  37.22
            GG        20.59  8.82   20.59  50.00
TLR4_1859   AA        35.29  17.56  22.06  25.00  4.72      0.58
            AG        30.91  13.64  20.00  35.45
            GG        26.09  16.30  22.83  34.78
TLR4_2856   CC        37.50  12.50  25.00  25.00  1.78      0.94
            CT        32.80  12.80  21.60  32.80
            TT        28.45  16.13  21.11  34.31
TLR4_9263   AA        14.29  14.29  28.57  42.86  4.13      0.66
            AC        22.83  18.48  21.74  36.96
            CC        31.47  14.40  21.33  32.80
TLR4_11912  GG        31.10  13.88  20.10  34.93  2.37      0.88
            GT        26.96  16.18  22.06  34.80
            TT        29.82  17.54  24.56  28.07
TLR4_15884  CC        30.00  20.00  30.00  20.00  3.62      0.73
            CG        33.33  11.11  22.22  33.33
            GG        27.87  16.38  21.26  34.48
TLR4_17050  CC        27.27  9.09   27.27  36.36  1.82      0.94
            CT        25.64  16.24  23.08  35.04
            TT        31.01  15.07  20.58  33.33

Table A19: Chi square test for investigating the relationship between different genotypes of the selected SNPs in Heme oxygenase-1 and Pseudomonas aeruginosa infection status

                       PA status (in percentage)
SNPs         Genotype  0       1      2      3      Chi Test  P value
HMOX1_149    AA        29.63   11.11  22.22  37.04  6.03      0.42
             AG        30.39   12.71  19.34  37.57
             GG        30.14   18.66  22.49  28.71
HMOX1_1038   CC        30.35   15.29  21.41  32.94  4.05      0.67
             CT        26.09   13.04  21.74  39.13
             TT        0.00    50.00  0.00   50.00
HMOX1_2790   AA        30.07   18.95  19.61  31.37  6.86      0.33
             AT        31.82   14.09  20.00  34.09
             TT        23.71   11.34  27.84  37.11
HMOX1_3303   CC        0.00    0.00   0.00   0.00   6.41      0.094
             CG        23.40   19.15  34.04  23.40
             GG        30.35   14.82  20.00  34.82
HMOX1_9531   AA        29.13   19.42  20.39  31.07  4.84      0.56
             AG        30.91   15.91  20.45  32.73
             GG        27.59   11.03  22.76  38.62
HMOX1_16442  AA        30.05   14.79  20.19  34.98  7.86      0.25
             AT        23.91   19.57  32.61  23.91
             TT        100.00  0.00   0.00   0.00

Table A20: FBAT analysis of the age of diagnosis under the additive model.
The age of diagnosis was first log-transformed before entering into the program.

Gene                  SNPs         Allele  # of Individuals  P value
Factor B              BF_2557      A       224               0.377
                      BF_2557      G       224               0.377
                      BF_4022      A       298               0.755
                      BF_4022      G       298               0.755
                      BF_6484      A       148               0.288
                      BF_6484      G       148               0.288
                      BF_7202      A       317               0.207
                      BF_7202      G       317               0.207
                      BF_8311      C       139               0.136
                      BF_8311      T       139               0.136
Complement factor 3   C3_963       G       344               0.443
                      C3_963       T       344               0.443
                      C3_28795     A       332               0.034
                      C3_28795     G       332               0.034
                      C3_36735     A       320               0.250
                      C3_36735     G       320               0.250
                      C3_43118     A       342               0.048
                      C3_43118     G       342               0.048
Toll-like receptor 4  TLR4_851     A       300               0.289
                      TLR4_851     G       300               0.289
                      TLR4_1859    A       333               0.730
                      TLR4_1859    G       333               0.730
                      TLR4_2856    C       207               0.366
                      TLR4_2856    T       207               0.366
                      TLR4_9263    A       153               0.855
                      TLR4_9263    C       153               0.855
                      TLR4_11912   G       317               0.366
                      TLR4_11912   T       317               0.366
                      TLR4_15884   C       194               0.389
                      TLR4_15884   G       194               0.389
                      TLR4_17050   C       196               0.344
                      TLR4_17050   T       196               0.344
Heme oxygenase-1      HMOX1_149    A       250               0.289
                      HMOX1_149    G       250               0.289
                      HMOX1_1038   C       65                0.086
                      HMOX1_1038   T       65                0.086
                      HMOX1_2790   A       333               0.688
                      HMOX1_2790   T       333               0.688
                      HMOX1_3303   C       80                0.114
                      HMOX1_3303   G       80                0.114
                      HMOX1_9531   A       334               0.553
                      HMOX1_9531   G       334               0.553
                      HMOX1_16442  A       79                0.194
                      HMOX1_16442  T       79                0.194

Table A21: FBAT analysis of the age of diagnosis under the dominant model.
The age of diagnosis was first log-transformed before entering into the program.

Gene                  SNP           Allele   # of Individuals   P value
Factor B              BF_2557       A        214                0.370
                      BF_2557       G        57                 0.832
                      BF_4022       A        253                0.818
                      BF_4022       G        136                0.802
                      BF_6484       A        22                 0.545
                      BF_6484       G        142                0.342
                      BF_7202       A        231                0.889
                      BF_7202       G        196                0.029
                      BF_8311       C        8                  n/a
                      BF_8311       T        138                0.124
Complement factor 3   C3_963        G        228                0.942
                      C3_963        T        227                0.213
                      C3_28795      A        189                0.366
                      C3_28795      G        244                0.029
                      C3_36735      A        243                0.891
                      C3_36735      G        179                0.084
                      C3_43118      A        239                0.177
                      C3_43118      G        223                0.096
Toll-like receptor 4  TLR4_851      A        94                 0.401
                      TLR4_851      G        275                0.413
                      TLR4_1859     A        277                0.533
                      TLR4_1859     G        165                0.144
                      TLR4_2856     C        201                0.131
                      TLR4_2856     T        34                 0.231
                      TLR4_9263     A        149                0.607
                      TLR4_9263     C        20                 0.332
                      TLR4_11912    G        136                0.133
                      TLR4_11912    T        261                0.930
                      TLR4_15884    C        187                0.457
                      TLR4_15884    G        31                 0.577
                      TLR4_17050    C        192                0.318
                      TLR4_17050    T        37                 0.884
Heme oxygenase-1      HMOX1_149     A        205                0.059
                      HMOX1_149     G        117                0.495
                      HMOX1_1038    C        4                  n/a
                      HMOX1_1038    T        65                 0.198
                      HMOX1_2790    A        185                0.304
                      HMOX1_2790    T        246                0.762
                      HMOX1_3303    C        80                 0.064
                      HMOX1_3303    G        7                  n/a
                      HMOX1_9531    A        226                0.195
                      HMOX1_9531    G        208                0.648
                      HMOX1_16442   A        7                  n/a
                      HMOX1_16442   T        79                 0.080

Table A22: FBAT analysis of FEV1 predicted value under the additive model

Gene                  SNP           Allele   # of Individuals   P value
Factor B              BF_2557       A        201                0.019
                      BF_2557       G        201                0.019
                      BF_4022       A        265                0.090
                      BF_4022       G        265                0.090
                      BF_6484       A        138                0.965
                      BF_6484       G        138                0.965
                      BF_7202       A        293                0.598
                      BF_7202       G        293                0.598
                      BF_8311       C        132                0.573
                      BF_8311       T        132                0.573
Complement factor 3   C3_963        G        321                0.124
                      C3_963        T        321                0.124
                      C3_28795      A        307                0.087
                      C3_28795      G        307                0.087
                      C3_36735      A        293                0.307
                      C3_36735      G        293                0.307
                      C3_43118      A        310                0.823
                      C3_43118      G        310                0.823
Toll-like receptor 4  TLR4_851      A        273                0.247
                      TLR4_851      G        273                0.247
                      TLR4_1859     A        297                0.610
                      TLR4_1859     G        297                0.610
                      TLR4_2856     C        187                0.876
                      TLR4_2856     T        187                0.876
                      TLR4_9263     A        146                0.721
                      TLR4_9263     C        146                0.721
                      TLR4_11912    G        285                0.092
                      TLR4_11912    T        285                0.092
                      TLR4_15884    C        170                0.036
                      TLR4_15884    G        170                0.036
                      TLR4_17050    C        185                0.940
                      TLR4_17050    T        185                0.940
Heme oxygenase-1      HMOX1_149     A        228                0.830
                      HMOX1_149     G        228                0.830
                      HMOX1_1038    C        59                 0.015
                      HMOX1_1038    T        59                 0.015
                      HMOX1_2790    A        308                0.469
                      HMOX1_2790    T        308                0.469
                      HMOX1_3303    C        77                 0.333
                      HMOX1_3303    G        77                 0.333
                      HMOX1_9531    A        316                0.886
                      HMOX1_9531    G        316                0.886
                      HMOX1_16442   A        73                 0.693
                      HMOX1_16442   T        73                 0.693

Table A23: FBAT analysis of FEV1 predicted value under the dominant model

Gene                  SNP           Allele   # of Individuals   P value
Factor B              BF_2557       A        193                0.005
                      BF_2557       G        49                 0.773
                      BF_4022       A        232                0.010
                      BF_4022       G        118                0.556
                      BF_6484       A        22                 0.847
                      BF_6484       G        132                0.908
                      BF_7202       A        217                0.132
                      BF_7202       G        179                0.397
                      BF_8311       C        9                  n/a
                      BF_8311       T        131                0.812
Complement factor 3   C3_963        G        207                0.267
                      C3_963        T        220                0.226
                      C3_28795      A        166                0.736
                      C3_28795      G        233                0.038
                      C3_36735      A        224                0.574
                      C3_36735      G        166                0.305
                      C3_43118      A        216                0.616
                      C3_43118      G        200                0.865
Toll-like receptor 4  TLR4_851      A        93                 0.487
                      TLR4_851      G        250                0.312
                      TLR4_1859     A        247                0.448
                      TLR4_1859     G        150                0.919
                      TLR4_2856     C        180                0.575
                      TLR4_2856     T        31                 0.298
                      TLR4_9263     A        142                0.748
                      TLR4_9263     C        21                 0.849
                      TLR4_11912    G        124                0.294
                      TLR4_11912    T        236                0.151
                      TLR4_15884    C        165                0.037
                      TLR4_15884    G        27                 0.539
                      TLR4_17050    C        181                0.838
                      TLR4_17050    T        36                 0.762
Heme oxygenase-1      HMOX1_149     A        190                0.984
                      HMOX1_149     G        103                0.668
                      HMOX1_1038    C        4                  n/a
                      HMOX1_1038    T        59                 0.018
                      HMOX1_2790    A        177                0.654
                      HMOX1_2790    T        230                0.528
                      HMOX1_3303    C        77                 0.509
                      HMOX1_3303    G        6                  n/a
                      HMOX1_9531    A        216                0.800
                      HMOX1_9531    G        195                0.625
                      HMOX1_16442   A        5                  n/a
                      HMOX1_16442   T        73                 0.887

Table A24: FBAT analysis of FEV1 standard deviation value under the additive model

Gene                  SNP           Allele   # of Individuals   P value
Factor B              BF_2557       A        201                0.089
                      BF_2557       G        201                0.089
                      BF_4022       A        264                0.012
                      BF_4022       G        264                0.012
                      BF_6484       A        137                0.727
                      BF_6484       G        137                0.727
                      BF_7202       A        293                0.026
                      BF_7202       G        293                0.026
                      BF_8311       C        131                0.859
                      BF_8311       T        131                0.859
Complement factor 3   C3_963        G        320                0.848
                      C3_963        T        320                0.848
                      C3_28795      A        306                0.316
                      C3_28795      G        306                0.316
                      C3_36735      A        293                0.128
                      C3_36735      G        293                0.128
                      C3_43118      A        309                0.224
                      C3_43118      G        309                0.224
Toll-like receptor 4  TLR4_851      A        272                0.999
                      TLR4_851      G        272                0.999
                      TLR4_1859     A        296                0.473
                      TLR4_1859     G        296                0.473
                      TLR4_2856     C        187                0.807
                      TLR4_2856     T        187                0.807
                      TLR4_9263     A        146                0.356
                      TLR4_9263     C        146                0.356
                      TLR4_11912    G        285                0.377
                      TLR4_11912    T        285                0.377
                      TLR4_15884    C        170                0.163
                      TLR4_15884    G        170                0.163
                      TLR4_17050    C        185                0.643
                      TLR4_17050    T        185                0.643
Heme oxygenase-1      HMOX1_149     A        227                0.690
                      HMOX1_149     G        227                0.690
                      HMOX1_1038    C        59                 0.111
                      HMOX1_1038    T        59                 0.111
                      HMOX1_2790    A        307                0.723
                      HMOX1_2790    T        307                0.723
                      HMOX1_3303    C        77                 0.521
                      HMOX1_3303    G        77                 0.521
                      HMOX1_9531    A        315                0.089
                      HMOX1_9531    G        315                0.089
                      HMOX1_16442   A        73                 0.614
                      HMOX1_16442   T        73                 0.614

Table A25: FBAT analysis of FEV1 standard deviation value under the dominant model

Gene                  SNP           Allele   # of Individuals   P value
Factor B              BF_2557       A        193                0.016
                      BF_2557       G        49                 0.290
                      BF_4022       A        231                0.022
                      BF_4022       G        118                0.153
                      BF_6484       A        22                 0.054
                      BF_6484       G        131                0.779
                      BF_7202       A        217                0.027
                      BF_7202       G        179                0.258
                      BF_8311       C        9                  n/a
                      BF_8311       T        130                0.964
Complement factor 3   C3_963        G        207                0.602
                      C3_963        T        219                0.803
                      C3_28795      A        166                0.763
                      C3_28795      G        232                0.100
                      C3_36735      A        224                0.502
                      C3_36735      G        166                0.087
                      C3_43118      A        215                0.438
                      C3_43118      G        199                0.280
Toll-like receptor 4  TLR4_851      A        92                 0.694
                      TLR4_851      G        249                0.826
                      TLR4_1859     A        246                0.195
                      TLR4_1859     G        149                0.648
                      TLR4_2856     C        180                0.911
                      TLR4_2856     T        31                 0.320
                      TLR4_9263     A        142                0.509
                      TLR4_9263     C        21                 0.236
                      TLR4_11912    G        124                0.827
                      TLR4_11912    T        236                0.322
                      TLR4_15884    C        165                0.220
                      TLR4_15884    G        27                 0.404
                      TLR4_17050    C        181                0.955
                      TLR4_17050    T        36                 0.176
Heme oxygenase-1      HMOX1_149     A        189                0.812
                      HMOX1_149     G        102                0.664
                      HMOX1_1038    C        4                  n/a
                      HMOX1_1038    T        59                 0.069
                      HMOX1_2790    A        176                0.089
                      HMOX1_2790    T        229                0.404
                      HMOX1_3303    C        77                 0.608
                      HMOX1_3303    G        6                  n/a
                      HMOX1_9531    A        215                0.0016
                      HMOX1_9531    G        194                0.558
                      HMOX1_16442   A        5                  n/a
                      HMOX1_16442   T        73                 0.628

Table A26: Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and age of diagnosis.
430 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h11111     0.1850                            0.0133
f.h12112     0.0005                            0.0019
f.h12121     0.0116                            0.0041
f.h21111     0.0385                            0.0067
f.h21211     0.1040                            0.0104
f.h22111     0.0180                            0.0046
f.h22112     0.0947                            0.0101
f.h22121     0.5477                            0.0170

Table A27: Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value. 380 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h11111     0.1804                            0.0140
f.h12112     0.0004                            0.0022
f.h12121     0.0134                            0.0047
f.h21111     0.0411                            0.0074
f.h21211     0.0979                            0.0108
f.h22111     0.0195                            0.0051
f.h22112     0.1030                            0.0112
f.h22121     0.5445                            0.0181

Table A28: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Factor B and FEV1 predicted value, with no adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h11111       -2.0478                              3.7388           -0.5477    0.5839
h21211       1.4678                               4.1147           0.3567     0.7213
h22121       2.8597                               3.2461           0.8810     0.3783
pooled       -4.6232                              4.6197           -1.0008    0.3169

Table A29: Haplotype analysis for investigation of possible correlation between combinations of selected SNPs in Factor B and FEV1 predicted value, with adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h11111       -1.4185                              3.7412           -0.3791    0.7046
h21211       1.9765                               4.1065           0.4813     0.6303
h22121       3.4425                               3.2503           1.0591     0.2895
pooled       -5.2740                              4.6159           -1.1426    0.2532
SEXM         -2.8884                              2.7585           -1.0471    0.2951
genotypeFO   4.5233                               3.1719           1.4261     0.1538
genotypeOO   -0.4898                              4.3212           -0.1134    0.9098

Table A30: Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value.
380 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h11111     0.1804                            0.0140
f.h12112     0.0003                            0.0022
f.h12121     0.0135                            0.0047
f.h21111     0.0410                            0.0074
f.h21211     0.0979                            0.0108
f.h22111     0.0194                            0.0051
f.h22112     0.1031                            0.0112
f.h22121     0.5444                            0.0181

Table A31: Frequencies of possible haplotypes generated for the Complement Factor 3 gene when determining the presence of any correlation between the haplotypes and age of diagnosis. 434 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h1111      0.0468                            0.0112
f.h1112      0.0814                            0.0122
f.h1121      0.1127                            0.0143
f.h1122      0.1105                            0.0142
f.h1211      0.0492                            0.0116
f.h1212      0.0596                            0.0115
f.h1221      0.0484                            0.0117
f.h1222      0.0146                            0.0076
f.h2111      0.0395                            0.0099
f.h2112      0.0174                            0.0077
f.h2121      0.0877                            0.0141
f.h2122      0.1057                            0.0141
f.h2211      0.0496                            0.0135
f.h2212      0.0968                            0.0134
f.h2221      0.0407                            0.0125
f.h2222      0.0396                            0.0107

Table A32: Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value.
384 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h1111      0.0428                            0.0120
f.h1112      0.0853                            0.0133
f.h1121      0.1191                            0.0153
f.h1122      0.1087                            0.0147
f.h1211      0.0545                            0.0125
f.h1212      0.0656                            0.0122
f.h1221      0.0461                            0.0119
f.h1222      0.0092                            0.0063
f.h2111      0.0359                            0.0099
f.h2112      0.0111                            0.0063
f.h2121      0.0895                            0.0147
f.h2122      0.1158                            0.0149
f.h2211      0.0492                            0.0134
f.h2212      0.0875                            0.0131
f.h2221      0.0418                            0.0132
f.h2222      0.0377                            0.0102

Table A33: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value, with no adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h1112        -3.1249                              4.4955           -0.6951    0.4870
h1122        -5.7612                              4.2983           -1.3403    0.1801
h1211        -3.2693                              6.0141           -0.5436    0.5867
h1212        -7.1414                              5.6725           -1.2590    0.2080
h2122        2.7698                               4.1147           0.6731     0.5009
h2212        -5.0863                              4.5827           -1.1099    0.2670
pooled       -1.8448                              3.1835           -0.5795    0.5623

Table A34: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value, with adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h1112        -3.1173                              4.5129           -0.6908    0.4897
h1122        -5.7416                              4.3112           -1.3318    0.1829
h1211        -3.2399                              6.1242           -0.5290    0.5968
h1212        -7.5332                              5.8414           -1.2896    0.1972
h2122        2.3798                               4.1276           0.5765     0.5642
h2212        -5.2010                              4.6088           -1.1285    0.2591
pooled       -1.7596                              3.1975           -0.5503    0.5821
SEXM         -2.2927                              2.7407           -0.8365    0.4029
genotypeFO   3.4788                               3.1592           1.1012     0.2708
genotypeOO   1.8634                               4.3248           0.4309     0.6666

Table A35: Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value.
384 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h1111      0.0438                            0.0121
f.h1112      0.0850                            0.0133
f.h1121      0.1180                            0.0153
f.h1122      0.1090                            0.0147
f.h1211      0.0534                            0.0123
f.h1212      0.0662                            0.0122
f.h1221      0.0468                            0.0117
f.h1222      0.0091                            0.0061
f.h2111      0.0349                            0.0098
f.h2112      0.0112                            0.0064
f.h2121      0.0897                            0.0147
f.h2122      0.1166                            0.0151
f.h2211      0.0506                            0.0134
f.h2212      0.0870                            0.0131
f.h2221      0.0415                            0.0130
f.h2222      0.0371                            0.0101

Table A36: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value, with no adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h1112        -0.1705                              0.1811           -0.9411    0.3467
h1122        -0.2052                              0.2000           -1.0263    0.3048
h1211        -0.1801                              0.2443           -0.7372    0.4610
h1212        -0.1882                              0.2235           -0.8421    0.3998
h2121        0.1773                               0.2090           0.8486     0.3961
h2122        0.1110                               0.1619           0.6858     0.4929
h2212        -0.1177                              0.1899           -0.6196    0.5355
pooled       -0.0260                              0.1428           -0.1818    0.8557

Table A37: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value, with adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h1112        -0.1686                              0.1817           -0.9278    0.3535
h1122        -0.1966                              0.2001           -0.9826    0.3258
h1211        -0.1766                              0.2469           -0.7152    0.4745
h1212        -0.2058                              0.2274           -0.9051    0.3654
h2121        0.1790                               0.2093           0.8550     0.3926
h2122        0.1032                               0.1623           0.6362     0.5246
h2212        -0.1142                              0.1901           -0.6009    0.5479
pooled       -0.0204                              0.1430           -0.1427    0.8865
SEXM         0.0384                               0.0975           0.3940     0.6936
genotypeFO   0.0948                               0.1120           0.8463     0.3974
genotypeOO   -0.0210                              0.1534           -0.1366    0.8914

Table A38: Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and age of diagnosis.
432 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h1122112   0.1447                            0.0121
f.h1122122   0.2303                            0.0144
f.h1212112   0.0012                            0.0012
f.h1212221   0.0428                            0.0070
f.h1212222   0.1042                            0.0104
f.h1221212   0.0016                            0.0015
f.h1221221   0.1026                            0.0104
f.h1221222   0.0057                            0.0027
f.h1222112   0.0013                            0.0014
f.h1222122   0.0097                            0.0035
f.h1222221   0.0010                            0.0015
f.h1222222   0.0729                            0.0089
f.h2222112   0.0015                            0.0015
f.h2222121   0.0017                            0.0017
f.h2222122   0.2788                            0.0153

Table A39: Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value. 382 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h1122112   0.1350                            0.0125
f.h1122122   0.2354                            0.0154
f.h1212112   0.0013                            0.0013
f.h1212221   0.0405                            0.0072
f.h1212222   0.1022                            0.0110
f.h1221212   0.0018                            0.0017
f.h1221221   0.1043                            0.0111
f.h1221222   0.0065                            0.0031
f.h1222112   0.0013                            0.0013
f.h1222122   0.0070                            0.0031
f.h1222221   0.0011                            0.0017
f.h1222222   0.0733                            0.0095
f.h2222112   0.0017                            0.0017
f.h2222121   0.0019                            0.0019
f.h2222122   0.2867                            0.0164

Table A40: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value, with no adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h1122112     -0.6825                              3.1893           -0.2140    0.8306
h1122122     1.7004                               2.7327           0.6222     0.5338
h1212222     -1.1586                              3.6077           -0.3212    0.7481
h1221221     -0.0822                              3.6590           -0.0225    0.9821
h1222222     2.0593                               4.0267           0.5114     0.6091
pooled       1.7087                               4.4352           0.3853     0.7001

Table A41: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value, with adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h1122112     -0.9215                              3.2089           -0.2872    0.7740
h1122122     1.7599                               2.7359           0.6433     0.5201
h1212222     -1.1055                              3.6083           -0.3064    0.7593
h1221221     -0.1823                              3.6909           -0.0494    0.9606
h1222222     2.1089                               4.0556           0.5200     0.6031
pooled       1.7826                               4.4310           0.4023     0.6875
SEXM         -3.0651                              2.7789           -1.1030    0.2700
genotypeFO   2.8110                               3.1823           0.8833     0.3771
genotypeOO   0.5437                               4.3542           0.1249     0.9006

Table A42: Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value. 382 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h1122112   0.1350                            0.0125
f.h1122122   0.2354                            0.0154
f.h1212112   0.0013                            0.0013
f.h1212221   0.0405                            0.0072
f.h1212222   0.1022                            0.0110
f.h1221212   0.0017                            0.0017
f.h1221221   0.1043                            0.0111
f.h1221222   0.0065                            0.0031
f.h1222112   0.0013                            0.0013
f.h1222122   0.0070                            0.0031
f.h1222221   0.0011                            0.0017
f.h1222222   0.0733                            0.0095
f.h2222112   0.0017                            0.0017
f.h2222121   0.0020                            0.0020
f.h2222122   0.2867                            0.0164

Table A43: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value, with no adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h1122112     -0.0400                              0.1136           -0.3521    0.7248
h1122122     -0.0148                              0.0974           -0.1518    0.8793
h1212222     -0.1856                              0.1285           -1.4440    0.1487
h1221221     -0.0675                              0.1303           -0.5182    0.6043
h1222222     -0.1347                              0.1437           -0.9371    0.3487
pooled       -0.1072                              0.1580           -0.6782    0.4976

Table A44: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value, with adjustment for the confounding factors

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h1122112     -0.0315                              0.1144           -0.2756    0.7828
h1122122     -0.0125                              0.0977           -0.1280    0.8981
h1212222     -0.1776                              0.1287           -1.3798    0.1676
h1221221     -0.0629                              0.1316           -0.4782    0.6325
h1222222     -0.1501                              0.1449           -1.0354    0.3005
pooled       -0.0991                              0.1581           -0.6268    0.5308
SEXM         0.0269                               0.0991           0.2714     0.7861
genotypeFO   0.0901                               0.1135           0.7938     0.4273
genotypeOO   -0.0533                              0.1553           -0.3433    0.7314

Table A45: Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and age of diagnosis. 433 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h111211    0.0074                            0.0031
f.h111221    0.0015                            0.0015
f.h112122    0.0013                            0.0013
f.h112221    0.3273                            0.0160
f.h211211    0.3730                            0.0166
f.h211212    0.0100                            0.0040
f.h211221    0.1085                            0.0107
f.h211222    0.0036                            0.0022
f.h212121    0.0135                            0.0040
f.h212122    0.0363                            0.0064
f.h212221    0.0612                            0.0084
f.h212222    0.0019                            0.0016
f.h221211    0.0522                            0.0077
f.h221212    0.0010                            0.0018
f.h221221    0.0013                            0.0013

Table A46: Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Heme oxygenase-1 and the age of diagnosis, with no adjustment for the confounding factors. The age of diagnosis was logarithmically transformed for normality.

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h112221      -0.1633                              0.1525           -1.0710    0.2842
h211221      -0.1412                              0.2224           -0.6348    0.5255
h212221      -0.2404                              0.3017           -0.7967    0.4256
h221211      -0.5300                              0.3007           -1.7626    0.0780
pooled       -0.0541                              0.2641           -0.2049    0.8377

Table A47: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and the age of diagnosis, with adjustment for the confounding factors.
The age of diagnosis was logarithmically transformed for normality.

             Estimate of Regression Coefficient   Standard Error   Z-score    P
h112221      -0.1535                              0.1517           -1.0116    0.3117
h211221      -0.1162                              0.2216           -0.5245    0.6000
h212221      -0.2301                              0.3008           -0.7647    0.4445
h221211      -0.5149                              0.3017           -1.7068    0.0879
pooled       -0.0804                              0.2632           -0.3053    0.7602
SEXM         0.0633                               0.1820           0.3477     0.7281
genotypeFO   0.4132                               0.2078           1.9883     0.0468
genotypeOO   0.3573                               0.2910           1.2278     0.2195

Table A48: Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value. 383 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h111211    0.0085                            0.0035
f.h111221    0.0018                            0.0018
f.h112122    0.0014                            0.0014
f.h112221    0.3308                            0.0171
f.h211211    0.3744                            0.0176
f.h211212    0.0100                            0.0039
f.h211221    0.1072                            0.0113
f.h211222    0.0026                            0.0022
f.h212121    0.0097                            0.0036
f.h212122    0.0413                            0.0072
f.h212221    0.0592                            0.0088
f.h212222    0.0020                            0.0018
f.h221211    0.0499                            0.0079
f.h221221    0.0013                            0.0013

Table A49: Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value.
383 participants were included.

Haplotype    Estimate of Haplotype Frequency   Standard Error
f.h111211    0.0085                            0.0035
f.h111221    0.0018                            0.0018
f.h112122    0.0014                            0.0014
f.h112221    0.3307                            0.0171
f.h211211    0.3745                            0.0176
f.h211212    0.0098                            0.0039
f.h211221    0.1073                            0.0113
f.h211222    0.0026                            0.0022
f.h212121    0.0097                            0.0036
f.h212122    0.0413                            0.0072
f.h212221    0.0591                            0.0088
f.h212222    0.0021                            0.0018
f.h221211    0.0499                            0.0079
f.h221221    0.0013                            0.0013

Table A50: Frequencies of possible haplotypes generated for the Factor B gene by the FBAT program

       Haplotype   Estimate of frequency
h1     22121       0.556
h2     11111       0.177
h3     21211       0.104
h4     22112       0.089
h5     21111       0.039
h6     22111       0.021
h7     12121       0.011
h8     22122       0.001
h9     22221       0.000
h10    21221       0.000

Table A51: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and age of diagnosis by the FBAT program. Age of diagnosis was logarithmically transformed for normality.

Haplotype   # of Families   S          E(S)      Var(S)   Z        P value
h1          251             -104.948   -92.713   76.79    -1.396   0.163
h2          182             -27.32     -33.303   54.699   0.809    0.418
h3          117             -14.405    -17.557   27.811   0.598    0.550
h4          120             0.241      -7.648    21.898   1.686    0.092
h5          55              -15.496    -10.599   11.493   -1.444   0.149
h6          31              -1.428     -2.042    6.609    0.239    0.811
h7          19              -0.349     -0.24     2.519    -0.068   0.946
h8          1               n/a        n/a       n/a      n/a      n/a
h9          1               n/a        n/a       n/a      n/a      n/a
h10         0               n/a        n/a       n/a      n/a      n/a

Table A52: Frequencies of possible haplotypes generated for the Complement factor 3 gene by the FBAT program

       Haplotype   Estimate of frequency
h1     2122        0.123
h2     1121        0.105
h3     1122        0.102
h4     2212        0.097
h5     2121        0.072
h6     1212        0.071
h7     1112        0.070
h8     2221        0.055
h9     1221        0.054
h10    2111        0.048
h11    1211        0.046
h12    2211        0.046
h13    1111        0.044
h14    2222        0.025
h15    1222        0.021
h16    2112        0.020

Table A53: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value by FBAT program

Haplotype   # of Families   S          E(S)       Var(S)       Z        P value
h1          146             7487.676   7443.953   235348.436   0.09     0.928
h2          133             6542.610   6142.430   223082.326   0.847    0.397
h3          124             6316.309   5745.721   180879.428   1.34     0.180
h4          108             4553.519   5070.933   176068.694   -1.233   0.218
h5          93              4105.946   4166.365   145694.779   -0.158   0.874
h6          89              3466.416   3999.288   141407.77    -1.417   0.156
h7          97              4850.073   4148.990   137093.53    1.893    0.058
h8          63              2794.117   3115.907   113221.903   -0.956   0.339
h9          70              2983.738   2909.471   114481.785   0.219    0.826
h10         66              2559.862   2640.701   95908.587    -0.261   0.794
h11         64              2678.309   2723.217   92187.825    -0.148   0.882
h12         61              2630.163   2708.422   100397.066   -0.247   0.805
h13         62              2878.470   2702.198   98142.93     0.563    0.574
h14         33              1396.275   1337.111   51932.961    0.26     0.795
h15         30              998.844    1197.097   37983.311    -1.017   0.309
h16         29              807.060    997.585    28156.775    -1.135   0.256

Table A54: Frequencies of possible haplotypes generated for the gene of Toll-like receptor 4 by the FBAT program

       Haplotype   Estimate of frequency
h1     2222122     0.277
h2     1122122     0.225
h3     1122112     0.145
h4     1212222     0.104
h5     1221221     0.104
h6     1222222     0.068
h7     1212221     0.042
h8     1222122     0.016
h9     1221222     0.008
h10    2222112     0.002
h11    1222221     0.002
h12    1122121     0.001
h13    1212122     0.001
h14    1212121     0.001
h15    1212212     0.001
h16    1222112     0.001
h17    1112122     0.001
h18    1222212     0.000
h19    1122222     0.000
h20    1221212     0.000
h21    2222121     0.000
h22    1221111     0.000
h23    1221121     0.000

Table A55: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and age of diagnosis by the FBAT program.
Age of diagnosis was logarithmically transformed for normality.

Haplotype   # of Families   S         E(S)      Var(S)   Z        P value
h1          238             -43.878   -53.555   61.458   1.234    0.217
h2          221             -31.153   -22.668   51.439   -1.183   0.237
h3          170             -24.737   -30.509   34.862   0.978    0.328
h4          133             -20.116   -16.067   26.067   -0.793   0.428
h5          117             -17.313   -12.126   25.422   -1.029   0.304
h6          90              -20.642   -19.783   21.797   -0.184   0.854
h7          61              -15.974   -14.514   15.556   -0.370   0.711
h8          21              -3.244    -4.737    4.760    0.684    0.494
h9          13              -2.272    -5.136    4.983    1.283    0.200
h10         2               n/a       n/a       n/a      n/a      n/a
h11         3               n/a       n/a       n/a      n/a      n/a
h12         2               n/a       n/a       n/a      n/a      n/a
h13         1               n/a       n/a       n/a      n/a      n/a
h14         1               n/a       n/a       n/a      n/a      n/a
h15         1               n/a       n/a       n/a      n/a      n/a
h16         1               n/a       n/a       n/a      n/a      n/a
h17         1               n/a       n/a       n/a      n/a      n/a
h18         0               n/a       n/a       n/a      n/a      n/a
h19         0               n/a       n/a       n/a      n/a      n/a
h20         1               n/a       n/a       n/a      n/a      n/a
h21         1               n/a       n/a       n/a      n/a      n/a
h22         0               n/a       n/a       n/a      n/a      n/a
h23         0               n/a       n/a       n/a      n/a      n/a

Table A56: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value by the FBAT program

Haplotype   # of Families   S        E(S)     Var(S)   Z        P value
h1          215             89.665   87.686   83.367   0.217    0.828
h2          198             86.617   78.920   81.316   0.854    0.393
h3          149             38.735   50.966   56.578   -1.626   0.104
h4          114             28.836   28.579   33.093   0.045    0.964
h5          112             44.000   41.269   37.156   0.448    0.654
h6          76              20.103   16.635   22.980   0.724    0.469
h7          57              13.377   13.234   20.769   0.031    0.975
h8          18              0.364    5.476    7.912    -1.817   0.069
h9          10              -0.178   0.898    1.261    -0.959   0.338
h10         2               n/a      n/a      n/a      n/a      n/a
h11         2               n/a      n/a      n/a      n/a      n/a
h12         2               n/a      n/a      n/a      n/a      n/a
h13         1               n/a      n/a      n/a      n/a      n/a
h14         1               n/a      n/a      n/a      n/a      n/a
h15         1               n/a      n/a      n/a      n/a      n/a
h16         0               n/a      n/a      n/a      n/a      n/a
h17         1               n/a      n/a      n/a      n/a      n/a
h18         1               n/a      n/a      n/a      n/a      n/a
h19         0               n/a      n/a      n/a      n/a      n/a
h20         1               n/a      n/a      n/a      n/a      n/a
h21         1               n/a      n/a      n/a      n/a      n/a
h22         0               n/a      n/a      n/a      n/a      n/a
h23         0               n/a      n/a      n/a      n/a      n/a

Table A57: Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene by the FBAT program

       Haplotype   Estimate of frequency
h1     211211      0.404
h2     112221      0.308
h3     211221      0.102
h4     212221      0.068
h5     221211      0.039
h6     212122      0.037
h7     212121      0.013
h8     211212      0.009
h9     111211      0.005
h10    111221      0.004
h11    112211      0.002
h12    211222      0.001
h13    221221      0.001
h14    112122      0.001
h15    212222      0.001
h16    112222      0.001
h17    122221      0.001
h18    212211      0.001
h19    221212      0.001
h20    121211      0.001
h21    111222      0.000

Table A58: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and age of diagnosis by the FBAT program. Age of diagnosis was logarithmically transformed for normality.

Haplotype   # of Families   S         E(S)      Var(S)   Z        P value
h1          255             -41.328   -49.460   71.522   0.962    0.336
h2          221             -22.466   -30.480   58.029   1.052    0.293
h3          124             -13.264   -14.073   23.826   0.166    0.868
h4          83              -11.206   -8.905    17.249   -0.554   0.580
h5          52              -16.957   -12.072   12.119   -1.403   0.161
h6          49              -5.699    -2.119    8.887    -1.201   0.230
h7          18              -8.772    -5.713    3.132    -1.728   0.084
h8          10              -1.623    -2.611    2.088    0.684    0.494
h9          9               n/a       n/a       n/a      n/a      n/a
h10         8               n/a       n/a       n/a      n/a      n/a
h11         2               n/a       n/a       n/a      n/a      n/a
h12         2               n/a       n/a       n/a      n/a      n/a
h13         1               n/a       n/a       n/a      n/a      n/a
h14         1               n/a       n/a       n/a      n/a      n/a
h15         2               n/a       n/a       n/a      n/a      n/a
h16         1               n/a       n/a       n/a      n/a      n/a
h17         1               n/a       n/a       n/a      n/a      n/a
h18         0               n/a       n/a       n/a      n/a      n/a
h19         1               n/a       n/a       n/a      n/a      n/a
h20         1               n/a       n/a       n/a      n/a      n/a
h21         0               n/a       n/a       n/a      n/a      n/a

Table A59: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and FEV1 standard deviation value by the FBAT program

Haplotype   # of Families   S         E(S)      Var(S)   Z        P
h1          233             115.283   105.434   89.373   1.042    0.298
h2          198             75.282    76.587    69.100   -0.157   0.875
h3          114             17.461    26.854    33.189   -1.630   0.103
h4          69              14.362    14.004    14.980   0.092    0.926
h5          44              19.581    15.284    12.130   1.234    0.217
h6          47              7.874     10.227    12.511   -0.665   0.506
h7          15              1.953     2.011     1.592    -0.046   0.963
h8          8               n/a       n/a       n/a      n/a      n/a
h9          8               n/a       n/a       n/a      n/a      n/a
h10         7               n/a       n/a       n/a      n/a      n/a
h11         3               n/a       n/a       n/a      n/a      n/a
h12         2               n/a       n/a       n/a      n/a      n/a
h13         2               n/a       n/a       n/a      n/a      n/a
h14         0               n/a       n/a       n/a      n/a      n/a
h15         1               n/a       n/a       n/a      n/a      n/a
h16         1               n/a       n/a       n/a      n/a      n/a
h17         1               n/a       n/a       n/a      n/a      n/a
h18         0               n/a       n/a       n/a      n/a      n/a
h19         0               n/a       n/a       n/a      n/a      n/a
h20         1               n/a       n/a       n/a      n/a      n/a
h21         0               n/a       n/a       n/a      n/a      n/a

Table A60: Haplotype analysis between the haplotypes formed by the five selected SNPs in the Factor B gene and age of Pseudomonas aeruginosa infection by Hapstat, in order to determine any association between the haplotypes and the phenotype

                Estimated value   Standard Error   Z score   P value
00000           -0.306            0.196            -1.549    0.121
11001           0.014             0.179            0.076     0.939
11001           -0.174            0.229            -0.759    0.448
Sex             -0.117            0.171            -0.687    0.492
CFTR genotype   0.089             0.120            0.739     0.460

Table A61: Haplotype analysis between the haplotypes formed by the four selected SNPs in the Complement factor 3 gene and age of Pseudomonas aeruginosa infection by Hapstat, in order to determine any association between the haplotypes and the phenotype

                Estimated value   Standard Error   Z score   P value
0001            -0.341            0.317            -1.077    0.282
0010            -0.401            0.238            -1.687    0.092
0011            -0.264            0.268            -0.983    0.326
0100            -0.601            0.383            -1.571    0.116
0101            -0.506            0.385            -1.315    0.188
1010            -0.526            0.487            -1.078    0.281
1101            -0.070            0.323            -0.216    0.829
1110            -0.474            0.360            -1.317    0.188
Sex             -0.090            0.184            -0.489    0.625
CFTR genotype   0.0589            0.1305           0.4512    0.6518

Table A62: Haplotype analysis between the haplotypes formed by the seven selected SNPs in the Toll-like receptor 4 gene and age of Pseudomonas aeruginosa infection by Hapstat, in order to determine any association between the haplotypes and the phenotype

                Estimated value   Standard Error   Z score   P value
0011001         0.133             0.198            0.670     0.503
0011011         -0.070            0.160            -0.435    0.664
0101111         -0.080            0.236            -0.340    0.734
0110110         0.089             0.249            0.358     0.720
0111111         0.096             0.209            0.457     0.648
Sex             -0.188            0.178            -1.060    0.289
CFTR genotype   0.169             0.134            1.261     0.207

Table A63: Haplotype analysis between the haplotypes formed by the six selected SNPs in the Heme oxygenase-1 gene and age of Pseudomonas aeruginosa infection by Hapstat, in order to determine any association between the haplotypes and the phenotype

                Estimated value   Standard Error   Z score   P value
001110          0.072             0.156            0.464     0.643
100110          0.034             0.220            0.156     0.876
101011          0.363             0.271            1.338     0.181
110100          0.173             0.265            0.653     0.514
Sex             0.171             0.182            0.943     0.346
CFTR genotype   0.107             0.129            0.827     0.409
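The Z-scores and P values in the haplotype tables can be cross-checked by hand. The sketch below is only an illustration and rests on assumptions the appendix does not state: that the regression Z-score is the coefficient estimate divided by its standard error, that the FBAT Z is (S - E(S)) / sqrt(Var(S)), and that P values are two-sided tail probabilities of a standard normal distribution. The function names are hypothetical and do not come from FBAT or Hapstat. The inputs are the first data rows of Table A28 and Table A51.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided P value for a Z-score under a standard normal (assumed)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def wald_z(estimate: float, std_error: float) -> float:
    """Z-score as estimate / standard error (assumed form for the regression tables)."""
    return estimate / std_error


def fbat_z(s: float, expected_s: float, var_s: float) -> float:
    """Z-score as (S - E(S)) / sqrt(Var(S)) (assumed form for the FBAT tables)."""
    return (s - expected_s) / math.sqrt(var_s)


# First row of Table A28: estimate -2.0478, standard error 3.7388.
z_reg = wald_z(-2.0478, 3.7388)

# First row of Table A51: S = -104.948, E(S) = -92.713, Var(S) = 76.79.
z_fbat = fbat_z(-104.948, -92.713, 76.79)

print(f"Table A28 row 1: Z = {z_reg:.4f}, P = {two_sided_p(z_reg):.4f}")
print(f"Table A51 row 1: Z = {z_fbat:.3f}, P = {two_sided_p(z_fbat):.3f}")
```

Under these assumptions both rows should land close to the tabulated values (Z of about -0.548 with P of about 0.584, and Z of about -1.396 with P of about 0.163), which suggests the tables follow these conventions, although small rounding differences are to be expected.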
