UBC Theses and Dissertations

Polymorphisms of CF modifier genes : their relationship to Pseudomonas aeruginosa infection and severity… Yung, Rossitta Pui Ki 2008

Polymorphisms of CF modifier genes: Their relationship to Pseudomonas aeruginosa infection and severity of disease in CF patients

by

Rossitta Pui Ki Yung

B.Sc., University of British Columbia, 2002
BMLSc., University of British Columbia, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in

THE FACULTY OF GRADUATE STUDIES

(Experimental Medicine)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

April 2008

© Rossitta Pui Ki Yung, 2008

Abstract

Cystic fibrosis (CF) is one of the most common recessive genetic diseases among Caucasians and is caused by mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene on chromosome 7. Different classes of CFTR mutation lead to differences in disease severity among patients. In addition to the CFTR genotype, secondary genetic factors, termed modifier genes, also influence CF phenotypes. The dysfunction of the CFTR protein and the resulting production of thickened mucus favor bacterial infection in the lungs, which can lead to further clinical complications in CF patients. Pseudomonas aeruginosa is one of the most common bacteria detected in these patients. The aim of this project was to investigate four candidate modifier genes, Factor B, Complement factor 3, Toll-like receptor 4 and Heme oxygenase-1, that might affect the status of Pseudomonas aeruginosa infection. A total of 22 single nucleotide polymorphisms (SNPs) were selected in these four genes and tested against five phenotypic traits: age of diagnosis, FEV1 percent predicted value, FEV1 standard deviation value, age of first Pseudomonas aeruginosa infection and Pseudomonas aeruginosa infection status. For the selected SNPs, both case-control and family-based analyses were performed to establish any correlation between genotypes and phenotypes.
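To make the genotype-phenotype comparison concrete, the following is a minimal Python sketch of a one-way ANOVA, the test listed in Section 3.8 for comparing a quantitative phenotype across the genotype groups of a single biallelic SNP. It is an illustration under stated assumptions, not the software or data used in this thesis: the function name `one_way_anova`, the genotype labels and the phenotype values in the usage example are hypothetical. The sketch returns only the F statistic and its degrees of freedom; the p-value would come from an F-distribution routine such as `scipy.stats.f.sf`.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA of a quantitative phenotype across genotype groups.

    groups: mapping from a (hypothetical) genotype label, e.g. "AA", "AG", "GG",
    to the list of phenotype values observed in patients with that genotype.
    Returns (F statistic, between-group df, within-group df).
    """
    # Drop empty genotype classes (e.g. a rare homozygote not observed)
    samples = [list(values) for values in groups.values() if values]
    k = len(samples)
    n = sum(len(s) for s in samples)
    if k < 2 or n <= k:
        raise ValueError("need at least two non-empty groups and n > k")

    grand_mean = mean(x for s in samples for x in s)
    group_means = [mean(s) for s in samples]

    # Between-group sum of squares: spread of genotype means around the grand mean
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, group_means))
    # Within-group sum of squares: spread of individuals around their genotype mean
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, group_means) for x in s)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical FEV1 percent predicted values grouped by genotype (illustrative only)
example = {
    "AA": [62.0, 70.0, 66.0, 74.0],
    "AG": [71.0, 78.0, 75.0],
    "GG": [80.0, 85.0, 83.0],
}
f_stat, df1, df2 = one_way_anova(example)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

Repeating such a test once per SNP and phenotype mirrors the design summarized above; a larger F indicates that mean phenotype differs more between genotype groups than within them.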
In addition, haplotype analysis was performed to determine whether there was interaction between SNPs, or whether unidentified SNPs in the vicinity of the selected ones might contribute to the observed phenotypic traits. Of the 22 chosen SNPs, 13 were significantly associated with one or more of the tested phenotypes. The three most significant associations were BF_2557 with lung function, HMOX1_9531 with lung function and BF_7202 with age of diagnosis. Several haplotypes were significantly associated with one of the five phenotypes. There was no evidence for the presence of unidentified SNPs or for interaction between SNPs; most of the haplotype associations were likely due to the presence of a single SNP that was itself significantly associated with the phenotype. In conclusion, both the SNP and haplotype analyses suggest that the four candidate genes are modifiers of disease severity in CF.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction
  1.1 Cystic fibrosis transmembrane conductance regulator (CFTR) mutation
  1.2 CFTR protein and its function
  1.3 Diagnosis and clinical symptoms of CF disease
  1.4 Pseudomonas aeruginosa infection in CF patients
  1.5 Candidate modifier genes
      (a) Factor B and Complement factor 3
      (b) Toll-like receptor 4
      (c) Heme oxygenase-1
  1.6 Single nucleotide polymorphism (SNP)
  1.7 Association study
  1.8 Thesis objectives
Chapter 2: Materials and Methods
  2.1 Patient recruitment
  2.2 Quality control
  2.3 Selection of SNPs
  2.4 TaqMan assays
  2.5 Primers and probes for genotyping
  2.6 Real time PCR reactions
  2.7 Sequencing
  2.8 Dilution of samples
  2.9 Preparation for polymerase chain reaction and genotyping
  2.10 Polymerase chain reaction
  2.11 Genotyping
  2.12 Quantification of DNA samples
  2.13 Re-genotyping
  2.14 Statistical data analysis
Chapter 3: Results
  3.1 PicoGreen reaction
  3.2 Analysis of the genotypic data for Mendelian inconsistencies
  3.3 Sequencing result by the University of British Columbia
  3.4 Genotypes of the participating individuals
  3.5 Phenotypic characteristics of the study subjects
  3.6 Determination of Hardy-Weinberg equilibrium in the patient population
  3.7 Comparison of genotype frequencies in the parent population and in online database
  3.8 ANOVA analysis of the influence of genotype on the phenotypes
      (a) Factor B
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (b) Complement factor 3
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (c) Toll-like receptor 4
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (d) Heme oxygenase-1
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
  3.9 Regression analysis
      (a) BF_2557
      (b) BF_7202
      (c) TLR4_1859
      (d) HMOX1_2790
      (e) HMOX1_9531
  3.10 Age of onset analysis for the age of first Pseudomonas aeruginosa infection
      (a) Factor B
      (b) Complement factor 3
      (c) Toll-like receptor 4
      (d) Heme oxygenase-1
  3.11 Pseudomonas aeruginosa infection status
      (a) Factor B
      (b) Complement factor 3
      (c) Toll-like receptor 4
      (d) Heme oxygenase-1
  3.12 Re-genotyping
  3.13 FBAT analysis of phenotypic characteristics
      (a) Age of diagnosis
      (b) FEV1 predicted value
      (c) FEV1 standard deviation value
  3.14 Haplotype analysis by the RGui program
      (a) Factor B
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (b) Complement factor 3
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (c) Toll-like receptor 4
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (d) Heme oxygenase-1
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
  3.15 Haplotype analysis by the FBAT program
      (a) Factor B
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (b) Complement factor 3
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (c) Toll-like receptor 4
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
      (d) Heme oxygenase-1
          (I) Age of diagnosis
          (II) FEV1 predicted value
          (III) FEV1 standard deviation value
  3.16 Haplotype analysis of the age of first Pseudomonas aeruginosa infection by Hapstat
      (a) Factor B
      (b) Complement factor 3
      (c) Toll-like receptor 4
      (d) Heme oxygenase-1
Chapter 4: Discussion
  4.1 Analysis of the genotypic data for Mendelian inconsistencies
  4.2 Genotypes of all participating individuals
  4.3 Genotyping analysis of the parental population
  4.4 ANOVA, regression analysis and FBAT
  4.5 Age of onset analysis
  4.6 Pseudomonas aeruginosa infection status
  4.7 Haplotype analysis by the RGui and FBAT programs
      (a) Haplotype analysis by the RGui program
      (b) Haplotype analysis by the FBAT program
  4.8 Haplotype analysis of the age of first Pseudomonas aeruginosa infection by Hapstat
  4.9 Position of SNPs with significant association and their effect on gene function
  4.10 Summary
  4.11 Future studies
References
Appendix

List of Tables

Table 1   Factor B polymorphism selection
Table 2   Complement factor 3 polymorphism selection
Table 3   Toll-like receptor 4 polymorphism selection
Table 4   Heme oxygenase-1 polymorphism selection
Table 5   The sequence of all primers used for SNP assays
Table 6   The sequence of all probes used for SNP assays
Table 7   Genotype discrepancies between the real time PCR results in this study and those posted by the Seattle SNPs website
Table 8   The sequence of both left and right primers used for sequencing of BF_7202 for Coriell samples
Table 9   Layout of a 96-well plate with standard DNA samples and DNA samples from source plates sent by Toronto
Table 10  Hardy-Weinberg equilibrium among the parental population
Table 11  Comparison of allele frequencies between the genotyping results and the reported values on either IIPGA or Seattle SNPs websites
Table 12  The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Factor B
Table 13  The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Factor B
Table 14  The ANOVA result of examining FEV1 standard deviation value among different genotypes of the selected SNPs in Factor B
Table 15  The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Toll-like receptor 4
Table 16  The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Heme oxygenase-1
Table 17  The ANOVA result of examining FEV1 standard deviation value among different genotypes of the selected SNPs in Heme oxygenase-1
Table 18  Regression analysis of the association of BF_2557 and FEV1 predicted value with confounding factors
Table 19  Regression analysis of the association of BF_2557 and FEV1 standard deviation value with confounding factors
Table 20  Regression analysis of the association of BF_7202 and age of diagnosis with confounding factors
Table 21  Regression analysis of the association of BF_7202 and FEV1 predicted value with confounding factors
Table 22  Regression analysis of the association of BF_7202 and FEV1 standard deviation value with confounding factors
Table 23  Regression analysis of the association of TLR4_1859 and age of diagnosis with confounding factors
Table 24  Regression analysis of the association of HMOX1_2790 and FEV1 predicted value with confounding factors
Table 25  Regression analysis of the association of HMOX1_2790 and FEV1 standard deviation value with confounding factors
Table 26  Regression analysis of the association of HMOX1_9531 and FEV1 predicted value with confounding factors
Table 27  Regression analysis of the association of HMOX1_9531 and FEV1 standard deviation value with confounding factors
Table 28  Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Toll-like receptor 4
Table 29  Chi square test for investigating the relationship between different genotypes of the selected SNPs in Complement factor 3 and Pseudomonas aeruginosa infection status
Table 30  Comparing the result of genotyping and re-genotyping
Table 31  Detailed results of FBAT analysis of age of diagnosis and SNPs under additive model
Table 32  Detailed results of FBAT analysis of age of diagnosis and SNPs under dominant model
Table 33  Detailed results of FBAT analysis of FEV1 predicted value and SNPs under additive model
Table 34  Detailed results of FBAT analysis of FEV1 predicted value and SNPs under dominant model
Table 35  Detailed results of FBAT analysis of FEV1 standard deviation value and SNPs under additive model
Table 36  Detailed results of FBAT analysis of FEV1 standard deviation value and SNPs under dominant model
Table 37  Haplotype analysis for investigation of correlation relationship between haplotype of Factor B and age of diagnosis with no adjustment for confounding factors
Table 38  Haplotype analysis for investigation of correlation relationship between haplotype of Factor B and age of diagnosis with adjustment for confounding factors
Table 39  Haplotype analysis for investigation of correlation relationship between haplotype of Factor B and FEV1 standard deviation value with no adjustment for confounding factors
Table 40  Haplotype analysis for investigation of correlation relationship between haplotype of Factor B and FEV1 standard deviation value with adjustment for confounding factors
Table 41  Haplotype analysis for investigation of correlation relationship between haplotype of Complement factor 3 and age of diagnosis with no adjustment for confounding factors
Table 42  Haplotype analysis for investigation of correlation relationship between haplotype of Complement factor 3 and age of diagnosis with adjustment for confounding factors
Table 43  Haplotype analysis for investigation of correlation relationship between haplotype of Toll-like receptor 4 and age of diagnosis with no adjustment for confounding factors
Table 44  Haplotype analysis for investigation of correlation relationship between haplotype of Toll-like receptor 4 and age of diagnosis with adjustment for confounding factors
Table 45  Haplotype analysis for investigation of correlation relationship between haplotype of Heme oxygenase-1 and FEV1 predicted value with no adjustment for confounding factors
Table 46  Haplotype analysis for investigation of correlation relationship between haplotype of Heme oxygenase-1 and FEV1 predicted value with adjustment for confounding factors
Table 47  Haplotype analysis for investigation of correlation relationship between haplotype of Heme oxygenase-1 and FEV1 standard deviation value with no adjustment for confounding factors
Table 48  Haplotype analysis for investigation of correlation relationship between haplotype of Heme oxygenase-1 and FEV1 standard deviation value with adjustment for confounding factors
Table 49  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and FEV1 predicted value by the FBAT program
Table 50  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and FEV1 standard deviation value by the FBAT program
Table 51  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and age of diagnosis by the FBAT program
Table 52  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value by the FBAT program
Table 53  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value by the FBAT program
Table 54  Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and FEV1 predicted value by the FBAT program
Table 55  Summary of all significant SNP-phenotype associations by ANOVA and FBAT analyses
Table 56  Summary of all significant haplotype-phenotype associations when testing by RGui and FBAT analyses
Table 57  Summary table of SNPs which revealed a significant association with the Pseudomonas aeruginosa infection status and the age of first Pseudomonas aeruginosa infection
Table 58  Summary table of the conservation score of the selected SNPs in the four candidate genes
Table A1  Concentration of a subset of the DNA samples in the original source plates
Table A2  Families with non-Mendelian inheritance
Table A3  Genotype frequency of each of the SNPs examined in the gene of Factor B
Table A4  Genotype frequency of each of the SNPs examined in the gene of Complement factor 3
Table A5  Genotype frequency of each of the SNPs examined in the gene of Toll-like receptor 4
Table A6  Genotype and allele frequency of each of the SNPs examined in the gene of Heme oxygenase-1
Table A7  Phenotypic characteristics of the CF patients
Table A8  The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Complement factor 3
Table A9  The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Complement factor 3
Table A10 The ANOVA result of examining FEV1 standard deviation value among different genotypes of the selected SNPs in Complement factor 3
Table A11 The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Toll-like receptor 4
Table A12 The ANOVA result of examining FEV1 standard deviation value among different genotypes of the selected SNPs in Toll-like receptor 4
Table A13 The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Heme oxygenase-1
Table A14 Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Factor B
Table A15 Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Complement factor 3
Table A16 Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Heme oxygenase-1
Table A17 Chi square test for investigating the relationship between different genotypes of the selected SNPs in Factor B and Pseudomonas aeruginosa infection status
Table A18 Chi square test for investigating the relationship between different genotypes of the selected SNPs in Toll-like receptor 4 and Pseudomonas aeruginosa infection status
Table A19 Chi square test for investigating the relationship between different genotypes of the selected SNPs in Heme oxygenase-1 and Pseudomonas aeruginosa infection status
Table A20 FBAT analysis of the age of diagnosis under the additive model
Table A21 FBAT analysis of the age of diagnosis under the dominant model
Table A22 FBAT analysis of FEV1 predicted value under the additive model
Table A23 FBAT analysis of FEV1 predicted value under the dominant model
Table A24 FBAT analysis of FEV1 standard deviation value under the additive model
Table A25 FBAT analysis of FEV1 standard deviation value under the dominant model
Table A26 Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and age of diagnosis
Table A27 Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A28 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Factor B and FEV1 predicted value, with no adjustment for the confounding factors
Table A29 Haplotype analysis for investigation of possible correlation between combinations of selected SNPs in Factor B and FEV1 predicted value, with adjustment for the confounding factors
Table A30 Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value
Table A31 Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and age of diagnosis
Table A32 Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A33 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value, with no adjustment for the confounding factors
Table A34 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value, with adjustment for the confounding factors
Table A35 Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value
Table A36 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value, with no adjustment for the confounding factors
Table A37 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value, with adjustment for the confounding factors
Table A38 Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and age of diagnosis
Table A39 Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A40 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value, with no adjustment for the confounding factors
Table A41 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value, with adjustment for the confounding factors
Table A42 Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value
Table A43 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value, with no adjustment for the confounding factors
Table A44 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value, with adjustment for the confounding factors
Table A45 Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and age of diagnosis
Table A46 Haplotype analysis for investigation of correlation relationship between combinations of selected SNPs in Heme oxygenase-1 and the age of diagnosis, with no adjustment for the confounding factors
Table A47 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and the age of diagnosis, with adjustment for the confounding factors
Table A48 Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A49 Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value
Table A50 Frequencies of possible haplotypes generated for the Factor B gene by the FBAT program
Table A51 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and age of diagnosis by the FBAT program
Table A52 Frequencies of possible haplotypes generated for the Complement factor 3 gene by the FBAT program
Table A53 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value by the FBAT program
Table A54 Frequencies of possible haplotypes generated for the gene of Toll-like receptor 4 by the FBAT program
Table A55 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and age of diagnosis by the FBAT program
Table A56 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value by the FBAT program
Table A57 Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene by the FBAT program
Table A58 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and age of diagnosis by the FBAT program
Table A59 Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and FEV1 standard deviation value by the FBAT program
Table A60 Haplotype analysis between the haplotypes formed by the five selected SNPs in the Factor B gene and age of Pseudomonas aeruginosa infection by Hapstat
Table A61 Haplotype analysis between the haplotypes formed by the four selected SNPs in the Complement factor 3 gene and age of Pseudomonas aeruginosa infection by Hapstat
Table A62 Haplotype analysis between the haplotypes formed by the seven selected SNPs in the Toll-like receptor 4 gene and age of Pseudomonas aeruginosa infection by Hapstat
Table A63 Haplotype analysis between the haplotypes formed by the six selected SNPs in the Heme oxygenase-1 gene and age of Pseudomonas aeruginosa infection by Hapstat

List of Figures

Figure 1  Structure of the CFTR protein
Figure 2  Function of CFTR protein
Figure 3  The classic complement pathway
Figure 4  The alternative pathway
Figure 5  Function of Heme oxygenase-1 in lung disease
Figure 6  The mechanism underlying TaqMan assays
Figure 7  Sample genotyping result of Factor B SNP 2557
Figure 8  Example genotyping result for SNP 149 of HMOX1
Figure 9  Normalization of the FEV1 percent predicted value by age
Figure 10 Standard curve of optical density versus DNA concentration of the serially diluted λ DNA samples

Acknowledgments

I would like to take this opportunity to express my appreciation to those who have given me their valuable help and support:

Dr. Sandford, for his supervision, patience and precious time in guiding and helping me, from the brainstorming of this project through the execution of the laboratory work and finally to the write-up of this thesis.
The successful completion of this thesis would have been impossible without his guidance and support.

Dr. Peter Pare, Dr. Pearce Wilcox and Dr. John Hill, for serving on my supervisory committee and providing valuable comments on my thesis.

The CF Modifier Gene Project team at UBC, especially Roxanne Rousseau, and at The Hospital for Sick Children in Toronto, for supplying the DNA samples and phenotypic data of the participating patients and providing relevant information.

Loubna Akhabir and Dorota Stefanowicz, for helping me with the laboratory work and providing their professional technical advice.

Karey Shumansky, for her expert knowledge and patient guidance in statistics.

My family and friends, for their understanding and support.

Chapter 1
Introduction

Cystic fibrosis (CF) is one of the most common recessive genetic disorders among Caucasians. In 2000, the estimated incidence of CF in Canada was 1 in 3608 births, although the rate may decline if antenatal carrier screening of the general population is implemented. [1] Mutations in the cystic fibrosis transmembrane conductance regulator (CFTR) gene on chromosome 7 were identified as the cause of the disease by positional cloning. [2] A wide diversity of pulmonary phenotypes is observed among CF patients, who may also suffer from other CF-related conditions, for example obstructive azoospermia and pancreatitis. [3] Clearly, CFTR genotype alone does not account for the broad spectrum of disease severity. It is likely that secondary genetic factors separate from the CFTR locus, termed modifier genes, significantly influence CF phenotypes. [4] A number of putative modifier genes have been identified, for example mannose-binding lectin, glutathione S-transferases, transforming growth factor beta-1, tumor necrosis factor-alpha, beta-2 adrenergic receptor and HLA class II.
[5] The investigation of CF modifier genes may offer new insights into the pathophysiology of CF and provide leads for new therapeutic interventions.

1. Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) Mutations

Cystic fibrosis is caused by a defect in the CFTR protein; the function of this chloride channel is discussed in detail in the next section. At the molecular level, the CFTR gene is located at chromosome 7q31 and comprises 27 exons spanning 230 kb. More than 1000 CFTR mutations have been described [6], and five classes have been established to describe them according to their effect on CFTR function as a chloride channel. [7, 8] This classification scheme is summarized in the following table:

Class     Description
Class 1   Mutations that cause defective synthesis and thus total or partial absence of CFTR protein. Consequently, Cl- conductance by the channel is lost.
Class 2   Mutations that affect maturation of CFTR protein. CFTR mRNA is formed, but the protein fails to mature and does not traffic correctly to the cell membrane. The protein is either absent or present in very small quantities in the membrane.
Class 3   Mutations that disturb the regulation of the Cl- channel; these usually occur in the ATP-binding domains. CFTR protein is produced and traffics to the apical cell membrane.
Class 4   Mutations that affect the conductance of the Cl- channel, resulting in reduced ion flux and/or modified selectivity. CFTR protein is produced and traffics to the apical cell membrane.
Class 5   Mutations that alter the stability of the mRNA or of the mature CFTR protein. Lower levels of CFTR protein are produced and traffic to the apical cell membrane is reduced.

Among the more than 1000 CFTR mutations, delta F508 is the most common; it is detected in about 70% of CF alleles in North America and Northern Europe.
[9] Delta F508 is a deletion of the phenylalanine at position 508, within the hydrophilic ATP-binding region known as the nucleotide binding fold (NBF). This mutation is categorized as a Class 2 mutation. In patients with this type of CFTR mutation, little or no functional CFTR protein reaches the cell membrane in the lungs, pancreas or other organs. Chloride transport is therefore abolished or reduced, resulting in thickened mucus in the affected organs. This condition favors bacterial infection, and CF patients are consequently more susceptible to respiratory tract infection because of their compromised defense mechanisms. [10]

2. CFTR Protein and its Function

The CFTR protein is a membrane-associated, N-linked glycoprotein located at the apical membrane (Figure 1). CFTR is composed of two repeated motifs with a total size of 1480 amino acids. Each motif consists of a hydrophobic membrane-spanning domain with six helices and a hydrophilic NBF region responsible for ATP binding. The two motifs are connected by a cytoplasmic regulatory region (R domain) containing a number of charged residues. [9]

Figure 1: Structure of the CFTR protein. There are three major regions in a CFTR protein: (1) two membrane-spanning domains (TMD1 and TMD2), each composed of six transmembrane segments; (2) two nucleotide binding domains where ATP binds (NBD1 and NBD2); and (3) a regulatory region (R domain), which is phosphorylated by protein kinase A. As indicated in the diagram, some of the common CFTR mutations are located in the NBFs. [11]

CFTR has been classified as a member of the ATP-binding cassette (ABC) transporter family. CFTR functions mainly as a chloride channel in the lungs, pancreas, liver, digestive tract, reproductive tract and skin. Upon phosphorylation of the regulatory region by protein kinase A, a conformational change takes place that allows chloride ions to be transported across the membrane, with ATP consumed as the energy source. In addition to acting as a cAMP-activated chloride channel, CFTR also regulates other ion channels, for example an outwardly rectifying chloride channel (ORCC), the epithelial sodium channel (ENaC) and at least two renal outer medullary potassium channels (ROMK1 and ROMK2), as shown in Figure 2. [12]

Figure 2: Function of the CFTR protein. CFTR protein is found at the apical membrane of epithelial cells in the lungs, pancreas, liver, digestive tract, reproductive tract and skin. With normally functioning CFTR protein, chloride ions are transported across the membrane upon phosphorylation by protein kinase A, with ATP as the energy source. In addition, CFTR regulates the transport of chloride ions through an outwardly rectifying chloride channel (ORCC) in the presence of ATP, and the transport of sodium ions through the epithelial sodium channel (ENaC) in the absence of ATP. CFTR also regulates the efflux of potassium ions through a renal outer medullary potassium channel (ROMK). [13]

3. Diagnosis and Clinical Symptoms of CF Disease

Detection of CFTR mutations by genotyping is a powerful method for diagnosing CF, and it is especially important for infants whose parents are both known to be CF carriers. Clinical complications (for example, meconium ileus and failure to thrive) may be present in some newborns and can be clues to the diagnosis of CF. Testing the sweat chloride level can confirm the diagnosis in such situations, since CFTR is the only channel for reabsorption of chloride ions in the sweat glands. CF patients with abnormal chloride channels have a sweat chloride level about five times higher than that of unaffected individuals.
[14] However, patients with some forms of CFTR mutation show milder clinical symptoms and a borderline sweat test result. In these cases, DNA testing should be used to confirm the diagnosis. In addition to the lungs, CFTR is expressed in other organs, for example the liver, pancreas, intestines and reproductive tract, and CF patients commonly suffer from problems in those organs. Approximately 85% of CF patients have pancreatic insufficiency. [15] The lack of functional CFTR protein in the pancreatic ductules leads to inadequate secretion of bicarbonate ions and water; the ductules become blocked, and pancreatic enzymes can no longer reach the intestine to support normal digestion. However, exogenous pancreatic enzymes can be prescribed to compensate for this exocrine deficiency. [16] Intestinal obstruction, both meconium ileus at birth and distal intestinal obstruction syndrome (DIOS) later in life, is another frequent manifestation in CF patients. Infants with meconium ileus fail to pass meconium at birth, whereas patients with DIOS suffer from excessive mucus production. Constipation is a common outcome. [17] Some CF patients develop CF-related liver disease. Without functional CFTR in the epithelial cells lining the biliary ductules, the ductules become blocked by mucus secretions and the accumulation of eosinophils. [18] Further manifestations include obstructive cirrhosis, splenomegaly and hypersplenism. Liver disease accounts for 1-2% of deaths in CF patients. [19] Other CF complications include azoospermia in males [20] and chronic sinusitis.

4. Pseudomonas aeruginosa Infection in CF Patients

Staphylococcus aureus, Pseudomonas aeruginosa, Burkholderia cepacia, Haemophilus influenzae and Xanthomonas maltophilia are common bacteria found in the lungs of many CF patients.
Among them, Pseudomonas aeruginosa is found in nearly 80% of CF patients, and morbidity and mortality increase when individuals are chronically colonized with this pathogen. [21] The bacteria adhere to host tissues via their lipopolysaccharide layer, which also protects them from digestion and lysis by leukocytes. In the bronchioles of CF patients the organisms grow mainly as a biofilm, in which they are relatively resistant to host defenses and antibiotics and elicit only a weak phagocytic response. [22] Furthermore, Pseudomonas aeruginosa produces several toxins and other virulence factors during infection, for example lipopolysaccharide, exotoxin A, exoenzyme S, elastase, alkaline proteinase and phospholipase C. [23] The immune system attempts to eradicate the bacteria; however, lung infection may persist in CF patients because of a failure to recruit innate defense mechanisms and the formation of immune complexes. [24] This inflammatory environment induces a phenotypic shift of the pathogen from the non-mucoid to the mucoid type, which exacerbates the disease. [25] Why antibiotics and other therapies cannot eradicate Pseudomonas aeruginosa infection in CF patients is still unknown. [26] Some individuals are diagnosed with CF shortly after birth because of a positive family history and/or acute clinical symptoms, while others are diagnosed at a later age. The former group usually exhibits more serious CF-related problems; the type of CFTR mutation present, in addition to modifier genes, may be one cause of this increased severity. Not every CF patient becomes infected by Pseudomonas aeruginosa; however, morbidity and mortality increase significantly once an individual is chronically colonized with this pathogen.
We hypothesized that genotypes at polymorphisms in the four candidate genes investigated might contribute to differences between patients in Pseudomonas aeruginosa infection status, and therefore in disease severity.

5. Candidate Modifier Genes

Previous studies have identified several putative modifier genes that may contribute to the diversity of disease severity in CF patients. The aim of this project was to investigate specific genes involved in Pseudomonas aeruginosa infection in CF patients and the relationships between polymorphisms in those genes and disease severity. Four candidate modifier genes were chosen for study: Factor B, Complement Factor 3, Toll-like Receptor 4 and Heme Oxygenase-1.

(a) Factor B and Complement Factor 3

The human immune system provides several layers of defense that protect the body from foreign substances and antigens. One of its many components is the complement cascade, which helps to fight infection. Complement reactions are a series of non-specific host defense reactions that destroy invading microorganisms. They can be sub-divided into two pathways: the classical pathway and the alternative pathway. The classical pathway is initiated when complement protein C1q binds to antibodies on the surface of a microbe; this activates the associated proteases C1r and C1s. The complete set of reactions in the classical pathway is shown in Figure 3:

Figure 3: The classical complement pathway. Upon binding of C1q to antibody bound to the microbe, C1r and C1s are activated, and C1s catalyzes the formation of the C4b2b complex. After combining with C3b, the C4b2b3b complex catalyzes the cleavage of C5 to C5b, which eventually results in the assembly of the membrane attack complex (MAC).
(Note: each dotted line shows a catalytic effect on the corresponding reaction in a box.)

In contrast, the alternative pathway is antibody-independent, so the presence of microbes can trigger it directly. However, the two pathways share some of the same proteins. The complete set of reactions in the alternative pathway is shown in Figure 4:

Figure 4: The alternative pathway. The presence of the microbe (light beige ellipses) can directly trigger the formation of the C3b-Factor B complex. After modification of the complex by Factor D and joining of another C3b unit, the resulting C3bBb3b complex cleaves C5 to C5b and subsequently leads to the assembly of the membrane attack complex (MAC). (Note: each dotted line shows a catalytic effect on the corresponding reaction in a box.)

Both complement pathways result in the generation of the MAC, which leads to neutrophil activation and bacterial cell lysis. Apart from their triggering factors, the two pathways also differ in the reactions that generate the C3 convertase: the key component is C4 in the classical pathway and factor B in the alternative pathway. The alternative pathway has been found to be critical for generating a protective response against Pseudomonas aeruginosa in a murine model of pneumonia. [27] In that study, C3-deficient and factor B-deficient mice infected with Pseudomonas aeruginosa by intranasal inoculation had a higher mortality rate than wild-type mice, whereas C4-deficient mice did not. This indicated that the alternative pathway is critical for eradicating Pseudomonas aeruginosa infection from the host, and it motivated us to investigate the role of the alternative pathway in host defense against Pseudomonas aeruginosa infection in CF patients.
Consequently, we hypothesized that the alternative pathway also plays a key role in eradicating Pseudomonas aeruginosa from the lungs of CF patients, and that polymorphisms in the genes encoding factor B and C3 are associated with disease severity in CF patients.

(b) Toll-like Receptor 4

The lipopolysaccharide (LPS) molecules of Pseudomonas aeruginosa are composed of three main parts: lipid A, the core and the O-polysaccharide. [28] Modifications of lipid A are observed in Pseudomonas aeruginosa isolated from CF patients. For example, the pathogen synthesizes a unique hexa-acylated lipid A containing palmitate and aminoarabinose during adaptation to the CF airway, and it produces a novel hepta-acylated lipid A in CF patients with severe pulmonary disease. [29] Lipid A of Pseudomonas aeruginosa is recognized by toll-like receptor 4 (TLR4) on the epithelial surface of the lungs. Using Pseudomonas aeruginosa lipid A with different levels of acylation, a 222-amino-acid region in the extracellular portion of human TLR4 has been identified as crucial for recognition of the lipid A found specifically in CF patients. [30] TLR4 is a transmembrane LPS receptor. Binding of Pseudomonas aeruginosa triggers the release of antimicrobial peptides, inflammatory cytokines and chemokines and the expression of co-stimulatory molecules, all of which activate the innate immune response to the pathogen. We therefore hypothesized that polymorphisms in the gene encoding TLR4 could lead to failure to recognize Pseudomonas aeruginosa via its lipid A, accumulation of the pathogen in the lungs of CF patients, and therefore increased disease severity.
(c) Heme Oxygenase-1

Heme oxygenase-1 (HO-1) is an enzyme that catalyzes the oxidative breakdown of heme into biliverdin, free iron and carbon monoxide; biliverdin is subsequently reduced to the antioxidant bilirubin. [31] There are two isoenzymes of heme oxygenase: the inducible HO-1 and the constitutively expressed HO-2. [32] HO-1 is induced by a wide variety of stimuli, including oxidative stress, inflammatory agents, transforming growth factor beta and heat shock. [33] Because of their defective immune defense system, CF patients usually develop secondary acute or chronic bacterial infections, which recruit inflammatory defense mechanisms. Excessive inflammation can cause tissue damage that worsens the condition of CF patients, and HO-1 plays a major role in resolving this situation. The major roles of HO-1 have been shown to include anti-inflammatory, anti-apoptotic and anti-proliferative effects [34], which are illustrated in Figure 5:

Figure 5: Function of heme oxygenase-1 in lung diseases. Patients with lung diseases such as cystic fibrosis may suffer from inflammation, oxidative stress and apoptosis, with tissue injury as the final result. These complications can be modulated by heme oxygenase, which catalytically breaks down heme into carbon monoxide, iron and biliverdin-IXα. These products counteract the detrimental processes mentioned above and can therefore reduce tissue injury. Heme oxygenase has been shown to be anti-inflammatory, anti-apoptotic and anti-proliferative.

As mentioned before, the recruitment of inflammatory defense mechanisms in CF patients against Pseudomonas aeruginosa infection can lead to tissue damage in the lungs.
Through its anti-inflammatory, anti-apoptotic and anti-proliferative effects, HO-1 can modulate these adverse outcomes. [35] HO-1 levels have been shown to be raised in CF patients, and HO-1 is responsible for cytoprotective effects against Pseudomonas aeruginosa infection. [36] In that study, the levels of HO-1 and related by-products in CF patients and controls were examined by several methods [36]: (1) HO-1 expression in the lungs was measured by both immunohistochemistry and quantitative reverse transcription PCR; (2) acute inflammation and increased oxidant stress were assessed by myeloperoxidase staining; and (3) iron status was assessed by ferritin staining. All of these tests revealed increased HO-1 levels in diseased lungs compared with normal controls. The authors further investigated whether Pseudomonas aeruginosa infection directly upregulated HO-1 expression. A human CF bronchial epithelial cell line (IB3.1) was treated with either Pseudomonas aeruginosa or LPS; however, no significant increase in HO-1 protein was detected in either case. The same authors also evaluated the survival of IB3.1 cells overexpressing HO-1 in response to Pseudomonas aeruginosa infection by transfecting the cells with either the empty pcDNA3.1 vector or a pcDNA3.1/HO-1 vector. [36] Cells transfected with the HO-1 vector had a higher survival rate than those transfected with the empty vector. On the basis of this study, we hypothesized that polymorphisms in the HO-1 gene could result in defective synthesis of HO-1, which in turn could lead to failure to limit tissue injury due to Pseudomonas aeruginosa infection and to increased disease severity in CF patients.

6. Single Nucleotide Polymorphisms

This project includes the four candidate genes described above, i.e., Factor B, Complement factor 3, Toll-like receptor 4 and Heme Oxygenase-1.
A total of 22 single nucleotide polymorphisms (SNPs) in these genes were selected; the SNP selection is described in detail in the Materials and Methods section. A SNP is defined as a single-base variation in DNA sequence with an allele frequency of at least 1% in a population. Not all SNPs alter protein structure and function. For example, a synonymous SNP usually occurs when the third nucleotide of a codon is altered but the encoded amino acid, and therefore the synthesis and function of the resulting protein, remain unchanged. In contrast, the structure and function of the protein are often changed by a nonsense SNP or a non-synonymous SNP. A nonsense SNP creates a premature stop codon and thus terminates elongation of the polypeptide chain early, while a non-synonymous SNP results in the substitution of one amino acid for another. The position of the SNP within the gene is also important when analyzing its effect. SNPs in coding regions are a common cause of many monogenic disorders; however, SNPs in promoters or other regulatory regions can also be significant in determining the timing, location and level of gene expression. Even when SNPs in non-coding regions have no apparent functional impact, they may be important in investigating the genetic aspects of a disease, since they can be used as disease markers for further investigation and/or analysis.

7. Association Studies

Linkage analysis can be used to determine the genetic location of a disease gene. The goal of the analysis is to identify an allele at a marker locus that is co-inherited with the disease within families. Linkage analysis is a powerful technique for identifying genes that cause monogenic disorders; for example, it was used to identify CFTR as the causal gene in cystic fibrosis. However, this approach is less powerful when a more complex phenotype is studied, i.e. a trait influenced by several genetic and environmental factors.
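The distinction between synonymous, non-synonymous and nonsense SNPs described above can be made concrete with a short Python sketch that classifies a single-base substitution within one codon using the standard genetic code. It is illustrative only: the function name `classify_coding_snp` and its arguments are hypothetical, the reference codon is assumed to be given 5'→3' on the coding strand with the SNP position counted from 0, splicing and regulatory effects are ignored, and a fourth label, stop-loss, is included only so that every substitution receives a category.

```python
"""Classify a single-base coding substitution (illustrative sketch)."""

# Standard genetic code: codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def classify_coding_snp(codon: str, position: int, alt_base: str) -> str:
    """Return 'synonymous', 'non-synonymous', 'nonsense' or 'stop-loss'.

    codon: reference codon on the coding strand, 5'->3' (hypothetical input format).
    position: 0, 1 or 2, the index of the substituted base within the codon.
    alt_base: the alternative allele at that position.
    """
    codon, alt_base = codon.upper(), alt_base.upper()
    if len(codon) != 3 or any(b not in BASES for b in codon):
        raise ValueError("codon must be three bases from T, C, A, G")
    if position not in (0, 1, 2) or alt_base not in BASES:
        raise ValueError("position must be 0-2 and alt_base one of T, C, A, G")
    if codon[position] == alt_base:
        raise ValueError("alternative base is identical to the reference base")

    alt_codon = codon[:position] + alt_base + codon[position + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"        # same amino acid (or stop -> stop)
    if alt_aa == "*":
        return "nonsense"          # premature stop codon
    if ref_aa == "*":
        return "stop-loss"         # stop codon replaced by an amino acid
    return "non-synonymous"        # one amino acid substituted for another


if __name__ == "__main__":
    print(classify_coding_snp("GAA", 2, "G"))  # GAA -> GAG, Glu -> Glu: synonymous
    print(classify_coding_snp("CGA", 1, "A"))  # CGA -> CAA, Arg -> Gln: non-synonymous
    print(classify_coding_snp("TGG", 2, "A"))  # TGG -> TGA, Trp -> stop: nonsense
```

The second call happens to mirror an arginine-to-glutamine change of the kind listed for a factor B SNP later in this thesis, but the codon is made up and not taken from the factor B sequence.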
Because disease severity is likely a complex phenotype, we used an association approach in this project. Case-control studies and family-based studies of trios (an affected offspring and both of his/her parents) are two commonly used types of association study. An example of an analysis that uses trios is the transmission disequilibrium test (TDT). The TDT requires genotypes from both parents and the affected individual, since it tests whether heterozygous parents transmit a particular allele to the affected offspring more often than the 50% expected under Mendelian inheritance. Under certain circumstances the TDT provides a better approach for investigating and establishing genetic associations: if a case-control method is used, false-positive results may be generated when the case and control groups are insufficiently matched for genetic background, whereas the TDT compares transmitted with untransmitted alleles within the same families. However, the TDT method also has some drawbacks. First, the marker has to be in close proximity to the disease-causing gene; otherwise, no association can be identified. Second, only heterozygous parents are informative: a homozygous parent always transmits the same allele, so a trio in which both parents are homozygous contributes nothing to the analysis. Therefore, the case-control approach was also included in the analysis of this project. Within a case-control framework, two approaches can be taken. First, the cases are defined as CF patients with severe lung disease and the controls as CF patients with mild lung disease, with the division between mild and severe lung disease based on pulmonary function test data; the prevalence of a specific allele is then compared between the two groups. Second, the patients are stratified by genotype and differences in the means of the outcome variables are investigated. In this study we used the latter approach and supplemented the analysis with the family-based design.

8. Thesis Objectives

1. To perform a literature review to select candidate CF modifier genes.
The following genes were selected:
(a) Complement factor 3 and factor B: deficiency in recruitment of complement reactions against Pseudomonas aeruginosa infection in CF patients.
(b) Toll-like receptor 4: attachment of Pseudomonas aeruginosa to host cells via binding of its LPS in CF patients.
(c) Heme oxygenase-1: cytoprotective effects against Pseudomonas aeruginosa infection in CF patients.
2. To select potential SNPs within each candidate modifier gene using the LD Select software.
3. To genotype DNA samples of participating individuals (trio members) at the selected SNPs by TaqMan assays.
4. To analyze the genotyping results with respect to phenotypic traits in order to determine whether any association with disease severity exists.

Chapter 2 Materials and Methods

1. Patient Recruitment

This is a sub-study of a large Canada-wide and international endeavor, the Canadian Consortium for Cystic Fibrosis Modifiers (http://www.cfmod.ca). A total of 1674 individuals from 558 trios (558 patients and both of their parents) were recruited for this sub-study from Cystic Fibrosis clinics in participating hospitals in Canada. Patients with a diagnosis of CF on the basis of clinical signs, elevated sweat chloride values and/or positive genotyping for the CFTR gene were recruited. CF patients who had received a lung transplant were also recruited, and pulmonary function data for these individuals were collected prior to transplantation. The recruitment of patients and their family members was based upon several criteria: (a) satisfying ethical requirements (for example, willingness to give informed consent); (b) willingness to provide a blood/DNA sample for the study; and (c) availability of verified clinical information. Blood samples from patients in different provinces were sent to the Hospital for Sick Children in Toronto for DNA extraction and, for the patient samples, establishment of cell lines.

2. Quality Control

As mentioned before, this project was a sub-study of the Canadian Consortium for Cystic Fibrosis Modifiers, and genotyping of all samples was also carried out for other analyses by our colleagues in Toronto. Non-Mendelian inheritance was detected in some families both by the Toronto group and by the FBAT program in this sub-study. In some families, Mendelian errors were detected for many SNPs, implying either sample mislabeling or non-paternity; consequently, a total of 23 families were removed from this study. However, when a Mendelian inconsistency was detected at only one SNP in a given family, that family was retained in the analysis and only its genotypes for the inconsistent SNP were excluded. In total, DNA samples from 1605 individuals in 535 trios (535 patients and both of their parents) were used for the analysis.

3. Selection of SNPs

Several polymorphic sites were picked for each candidate modifier gene. Information about polymorphisms in each candidate gene was obtained from the websites of the Innate Immunity Programs for Genomic Applications, or IIPGA (http://innateimmunity.net//), and the UW-FHCRC Variation Discovery Resource, or SeattleSNPs (http://pga.mbt.washington.edu/). Using the LD Select software (http://droog.gs.washington.edu/ldSelect.html) [37], all polymorphic sites in each gene were divided into groups (bins) according to the settings given below. Only one SNP was selected and examined for each group. We assumed that phenotypic characteristics were more likely to be affected if the polymorphism caused an amino acid change.
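The binning idea behind tagSNP selection can be illustrated with a simplified greedy procedure in Python. This is a hypothetical sketch in the spirit of LD Select, not the program's actual code: it assumes pairwise r² values (the LD measure defined later in this section) are already available, discards SNPs at or below the minor allele frequency cutoff, repeatedly forms a bin around the SNP that captures the most unbinned SNPs at r² ≥ 0.8, and then prefers an amino-acid-changing member of the bin as the SNP to genotype. The function names, SNP identifiers, frequencies and r² values are all made up for illustration.

```python
"""Simplified greedy tagSNP binning (illustrative sketch, not LD Select itself)."""


def pair_r2(r2, a, b):
    """Look up r^2 for a SNP pair; a SNP is in perfect LD with itself."""
    if a == b:
        return 1.0
    return r2.get(frozenset((a, b)), 0.0)  # unlisted pairs treated as unlinked


def greedy_tag_bins(snps, maf, r2, coding_changes, r2_min=0.8, maf_min=0.1):
    """Group SNPs into bins and pick one SNP to genotype per bin.

    snps: SNP identifiers; maf: identifier -> minor allele frequency;
    r2: frozenset({a, b}) -> pairwise r^2; coding_changes: identifiers of
    SNPs that change an amino acid. Returns a list of (selected SNP, bin members).
    """
    remaining = sorted(s for s in snps if maf[s] > maf_min)
    bins = []
    while remaining:
        # For every unbinned SNP, collect the unbinned SNPs it captures at r2 >= r2_min.
        partners = {
            s: [t for t in remaining if pair_r2(r2, s, t) >= r2_min]
            for s in remaining
        }
        # Seed the bin with the SNP that captures the most others
        # (max() keeps the first such SNP in sorted order on ties).
        seed = max(remaining, key=lambda s: len(partners[s]))
        members = partners[seed]
        # Prefer an amino-acid-changing SNP; otherwise use the seed as the tagSNP.
        coding = [s for s in members if s in coding_changes]
        selected = coding[0] if coding else seed
        bins.append((selected, members))
        remaining = [s for s in remaining if s not in members]
    return bins


if __name__ == "__main__":
    snps = ["s1", "s2", "s3", "s4", "s5"]
    maf = {"s1": 0.30, "s2": 0.25, "s3": 0.20, "s4": 0.15, "s5": 0.05}
    r2 = {
        frozenset(("s1", "s2")): 0.90,
        frozenset(("s1", "s3")): 0.85,
        frozenset(("s2", "s3")): 0.82,
        frozenset(("s3", "s4")): 0.30,
    }
    for selected, members in greedy_tag_bins(snps, maf, r2, coding_changes={"s2"}):
        print(selected, members)
```

With these made-up values, s5 is dropped by the frequency filter, s1, s2 and s3 form one bin in which the amino-acid-changing s2 is chosen, and s4 forms a singleton bin. Unlike this sketch, the real selection also had to fall back to another SNP whenever no TaqMan assay could be designed for the preferred one.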
SNP selection within each group proceeded in two stages, reflecting the assumption that amino acid changes are more likely to affect phenotype: (a) if one of the SNPs in the group resulted in an amino acid change, it was selected for genotyping; (b) if none of the SNPs in the group resulted in an amino acid change, an arbitrary tagSNP was picked. A tagSNP is a representative SNP in a bin that shows a high degree of linkage disequilibrium with all the other SNPs in the bin. Linkage disequilibrium (LD) is the non-random association of alleles at two or more loci on the same or different chromosomes. When alleles at two loci are strongly associated, i.e. found together on the same haplotype much more often than expected by chance, the loci are said to be in strong LD. In general, linkage disequilibrium is present when $P_{A1} P_{B1} \neq P_{A1B1}$, whereas linkage equilibrium corresponds to $P_{A1} P_{B1} = P_{A1B1}$, where $P_{A1}$ and $P_{B1}$ are the population frequencies of alleles A1 and B1, respectively, and $P_{A1B1}$ is the frequency of the A1B1 haplotype. Several parameters represent the degree of linkage disequilibrium, but the two most commonly used are Lewontin's D' and a measure that can be interpreted as a squared correlation coefficient, r². D is the difference between the observed and the expected haplotype frequency, $D = P_{A1B1} - P_{A1} P_{B1}$. D' is then calculated as $D' = |D| / D_{\max}$, where $D_{\max}$ is the largest value that |D| can take given the allele frequencies in the population. The value of D' ranges from 0 to 1: 0 represents linkage equilibrium and 1 represents complete linkage disequilibrium, i.e. one or more haplotypes are absent from the population. A disadvantage of the D' measure is that it overestimates LD in small samples.
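As a worked illustration of these definitions, the Python sketch below computes D, D' and r² (the last using the formula given next) from the frequency of the A1B1 haplotype and the two allele frequencies. It is a minimal sketch assuming two biallelic loci with known population frequencies; the function name `ld_stats` is hypothetical, and the sign-dependent expression used for D_max is the standard one for Lewontin's D' rather than something stated in this thesis.

```python
"""Pairwise LD statistics for two biallelic loci (illustrative sketch)."""


def ld_stats(p_a1b1, p_a1, p_b1):
    """Return (D, D', r^2) for loci A and B.

    p_a1b1: frequency of the A1B1 haplotype;
    p_a1, p_b1: frequencies of alleles A1 and B1 (so a2 = 1 - p_a1, b2 = 1 - p_b1).
    """
    if not (0 < p_a1 < 1 and 0 < p_b1 < 1):
        raise ValueError("both loci must be polymorphic")
    p_a2, p_b2 = 1 - p_a1, 1 - p_b1

    # D: observed minus expected (under equilibrium) haplotype frequency.
    d = p_a1b1 - p_a1 * p_b1

    # D_max: the largest |D| allowed by the allele frequencies, given the sign of D.
    if d >= 0:
        d_max = min(p_a1 * p_b2, p_a2 * p_b1)
    else:
        d_max = min(p_a1 * p_b1, p_a2 * p_b2)
    d_prime = abs(d) / d_max

    # r^2 = D^2 / (a1 * a2 * b1 * b2)
    r2 = d * d / (p_a1 * p_a2 * p_b1 * p_b2)
    return d, d_prime, r2


if __name__ == "__main__":
    # Only A1B1 and A2B2 haplotypes present: complete LD, D' = 1 and r^2 = 1.
    print(ld_stats(0.5, 0.5, 0.5))
    # Partial LD: D = 0.1, D' = 0.5, r^2 is about 0.167.
    print(ld_stats(0.4, 0.6, 0.5))
```

The second example shows why the two measures can disagree: D' is halfway to its maximum, yet r² is low because the allele frequencies at the two loci differ, which limits how well one SNP can predict the other.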
The value of r² depends on D and on the allele frequencies at every locus involved. It is calculated by dividing D² by the product of all four allele frequencies, $r^2 = D^2 / (a_1 a_2 b_1 b_2)$, where $a_1 = P_{A1}$ and $a_2 = 1 - P_{A1}$ are the frequencies of the two alleles at locus A, and $b_1 = P_{B1}$ and $b_2 = 1 - P_{B1}$ those at locus B. r² also lies between 0 and 1, with 0 indicating linkage equilibrium and 1 indicating perfect linkage disequilibrium; it reaches 1 only when just two of the four possible haplotypes are present. For low allele frequencies, r² is a more reliable measure of LD than D'. Because r² measures statistical association, there is a simple inverse relationship between r² and the sample size required to detect association between a susceptibility locus and a marker SNP. Genotyping every SNP in a gene would be time-consuming and costly; by grouping the SNPs in each candidate gene according to the settings listed below, it was possible to achieve the aims of this project without genotyping every single SNP. Using the approach described above, we selected a panel of SNPs for genotyping in each candidate gene. For the factor B gene, five SNPs were selected, including a tagSNP for an amino acid changing polymorphism (Table 1).

Table 1: Factor B polymorphism selection (criteria for grouping all SNPs with the LD Select software package: r² = 0.8 and minor allele frequency > 0.1). The positions of the SNPs are given relative to the coding sequence of the factor B gene (accession number AF551848). No TaqMan assay could be created for the SNPs in Bin 3; therefore, only five SNPs were included in the analysis for Factor B. Although SNP1802 in Bin 1 results in an amino acid change from arginine to glutamine, SNP8311 was selected because an appropriate TaqMan assay could not be produced for SNP1802.
Bin   SNP positions (amino acid change)                  Selected SNP
1     1802 (Arg to Gln), 4573, 5180, 7541, 7580, 8311     8311
2     4022, 9878                                         4022
3     5162, 9099                                         none
4     2557                                               2557
5     6484                                               6484
6     7202                                               7202

The same tagSNP approach was applied to the complement factor 3 gene. However, the parameters r² = 0.8 and minor allele frequency > 0.1 produced more than 40 bins, and genotyping this many SNPs was beyond the scope of this study. The minor allele frequency cutoff was therefore increased to > 0.45 in order to study only common SNPs. This resulted in the selection of four polymorphisms (Table 2).

Table 2: Complement factor 3 polymorphism selection (criteria for grouping all SNPs with the LD Select software package: r² = 0.8 and minor allele frequency > 0.45). The positions of the SNPs are given relative to the coding sequence of the complement factor 3 gene (accession number AY513239). Because of the large size of this gene (46 kb) and the consequently large number of SNPs, a higher minor allele frequency filter was chosen so that a feasible number of tagSNPs could be identified. No SNPs from Bins 2 and 5 were included in the analysis because TaqMan assays could not be manufactured for them.

Bin   SNP positions (no amino acid change)    Selected SNP
1     43118, 43179, 43928, 44692              43118
2     27159, 27678                            none
3     28433, 28795                            28795
4     963                                     963
5     25884                                   none
6     36735                                   36735

The original selection criteria, i.e. those used for the factor B gene, identified seven tagSNPs in the TLR4 gene; these are detailed in Table 3.

Table 3: TLR4 polymorphism selection (criteria for grouping all SNPs with the LD Select software package: r² = 0.8 and minor allele frequency > 0.1). The positions of the SNPs are given relative to the coding sequence of the TLR4 gene (accession number NM_138556).
Bin   SNP positions (no amino acid change)    Selected SNP
1     1893, 2032, 2437, 7764, 11912           11912
2     16649, 17050, 17447, 17923              17050
3     2856, 10478, 11541                      2856
4     851, 11995                              851
5     1859, 10329                             1859
6     9263                                    9263
7     15844                                   15844

Finally, we identified eight tagSNP bins in the heme oxygenase-1 gene; SNPs from six of these could be assayed with TaqMan methodology (Table 4).

Table 4: Heme oxygenase-1 polymorphism selection (criteria for grouping all SNPs with the LD Select software package: r² = 0.8 and minor allele frequency > 0.1). The positions of the SNPs are given relative to the coding sequence of the heme oxygenase-1 gene (accession number AY460337). Only six SNPs were chosen for this project because TaqMan assays could not be synthesized for the SNPs in Bins 6 and 7.

Bin   SNP positions (no amino acid change)                         Selected SNP
1     149, 7325, 12825, 12832, 12860, 12992, 13286, 13354, 15028   149
2     3303, 5079, 13449                                            3303
3     2007, 9531                                                   9531
4     16442, 17893                                                 16442
5     2790                                                         2790
6     15382                                                        none
7     17922                                                        none
8     1038                                                         1038

4. TaqMan Assays

The TaqMan allelic discrimination assay is a qualitative method for detecting which alleles are present at a given nucleotide position in a gene. The assay uses fluorescently labeled allele-specific probes and the 5'→3' exonuclease activity of Taq DNA polymerase (Figure 6). The DNA sample is amplified by the polymerase chain reaction (PCR) (discussed in a later section): in addition to the TaqMan Universal PCR master mix, a SNP-specific primer/probe mix is added to the DNA sample. After the double-stranded template DNA is denatured into single strands, the complementary PCR primers hybridize to the template DNA 150-200 nucleotides apart, flanking the SNP site.
The allele-specific probes, which are complementary to the SNP site, also hybridize to the template at this step. The probes are generally about 20 nucleotides long, with a reporter dye attached to the 5' end and a quencher dye attached to the 3' end. Two probes are included, one for each allele of the SNP being assayed, and they are distinguished by their reporter dyes: one probe carries VIC and the other FAM. The two reporter molecules absorb light and re-emit it at different wavelengths. Before polymerization, the reporter and quencher are close to each other, so when light excites the reporter its energy is transferred to the quencher, a process called Förster (fluorescence) resonance energy transfer (FRET); no fluorescence is detected because the energy is released as heat. As polymerization proceeds from the primer along the strand, however, the 5'→3' exonuclease activity of Taq DNA polymerase degrades a probe that is hybridized to the template. A probe that perfectly matches the template hybridizes more stably than a mismatched probe and is therefore preferentially cleaved. Once the reporter molecule is no longer linked to the quencher, it emits the absorbed energy as light of a specific wavelength. Different fluorescence is therefore detected depending on whether the VIC-labeled probe, the FAM-labeled probe, or both are degraded, and from this the alleles present in each individual can be deduced (for example, a strong signal from both dyes indicates a heterozygote). A third dye, ROX, is also included in the TaqMan assay. ROX is free in the master mix solution and is used to normalize for well-to-well variation in volume, so the signal intensity detected for the FAM or VIC molecules was corrected by the ROX signal (although this is less of a concern than in quantitative real-time PCR applications).

Figure 6: The mechanism underlying TaqMan assays.
Before the reaction, the primer base-pairs with a segment about 150-200 nucleotides away from the SNP, while the probe binds at the SNP site. As the new strand is elongated, the probe is degraded, which separates the reporter dye from the quencher.

5. Primers and Probes for Genotyping
Primers and probes for each SNP were ordered from Applied Biosystems (Foster City, CA). The sequences of all primers and probes are listed in Tables 5 and 6.

Table 5: The sequences of all primers used for SNP assays
SNP | Left primer | Right primer
Factor B
BF_2557 | TCGCACCTGCCAAGTGAA | CTCACCTCCGTTGTCACAGATC
BF_4022 | GCGACAGGGAGGACCAC | CTCCATTGCCCAACGATCCT
BF_6484 | CAGAACCTAGCTCTAGAAGGGCTTA | GCTCCCACCACTGTCATCTC
BF_7202 | TGGGTCCCTAGTCTGATTCCTTTAG | TCCTGGAAGCATGGCTGTTC
BF_8311 | GTAGCTTTGGCCCTCACCAT | GGTGGAGGGAGGAAAGAGGAA
Complement Factor 3
C3_963 | CTGCAGTGAGCTGTGATTGC | CCATCCCTGTTGACAGCATATTCTTT
C3_28795 | CGAGACCCTGCCTCTTTTCAA | GGGTGTGGCCTTGAGAAGATG
C3_36735 | TGGAGTTGAGAATCAGTTTTTATTACTTGCA | AGAAGGCTCAACACACAGCTT
C3_43118 | CTCCAGGTCTGAAGACTGAGAAC | GGCAGATGTGATGTGAAGATGAGA
Toll-like receptor 4
TLR4_851 | TGTATTTGACACATGGTCTGCCTT | GGGATTAACACTCAATCATTTACTGACCT
TLR4_1859 | AGGGATAGGACTGGCTAGTTTGAAT | CTGGCTTTTACACCCAAGTAGACA
TLR4_2856 | GCAAGCTTCTGCTATGATTAAAAGTGA | CACAAATGGTGTACAGGAGTTCTCA
TLR4_9263 | CTGGAAACTGATATAAAGATAGCGACATATAACA | CTTGACTACCCACCACAGAGAAG
TLR4_11912 | GCTGTCATGTAAGCACTTTTCATAAACA | GTTGGTAGCCAAGATAAATGACTGGTA
TLR4_15844 | GTTGGGCAATGCTCCTTGAC | ACCCCATTAATTCCAGACACATTGT
TLR4_17050 | CTGGGTGTGTTTCCATGTCTCAT | TTCAAATACACACAGCCCTGATAGG
Heme oxygenase-1
HMOX1_149 | GGCCTGGCCTCTTGCA | GGAGCCCCAGACTTCGTTAG
HMOX1_1038 | CGACAAGCACAGGGAGAGA | CTGTCTCAAAGGAAAAAAAGACTTAACACA
HMOX1_2790 | GGGTTGCTAAGTTCCTGATGTTG | CCAGAAAGCTGGGAGGCA
HMOX1_3303 | CACCGGCCGGATGGA | AAGGCGCCCGTCCC
HMOX1_9531 | CCCTATCTGTAAAATAGGGATAATAATGGTACCT | CTTCCTCTGTGCCAGACACT
HMOX1_16442 | TCGGTAGGAGAAGTGGTGATAGG | CCTGGGTGACAGAGTGAGACT

Table 6: The sequences of all probes used for the SNP assays
SNP | VIC probe | FAM probe
Factor B
BF_2557 | TGGCCGGTGGAGTG | TGGCCGATGGAGTG
BF_4022 | TTTGTAGTCAAAGGTTGAAC | TTGTAGTCAAAAGTTGAAC
BF_6484 | CATTGCCTTTGTCACTC | ATTGCCTTCGTCACTC
BF_7202 | CAGCTAAGACGCAAGCA | CAGCTAAGACACAAGCA
BF_8311 | CTAGAGGCTTGAGAGAGA | CTAGAGGCTTAAGAGAGA
Complement Factor 3
C3_963 | CTGAGTGACAGAATGA | TGAGTGACAGAAGGA
C3_28795 | TCCCCTGAGTCCCCA | CCCCTGGGTCCCCA
C3_36735 | AATACAATCTGGGTACTCC | ACAATCTGGATACTCC
C3_43118 | CAGGCGTGGTCTT | CCAGGCATGGTCTT
Toll-like receptor 4
TLR4_851 | CTGGAAGAGCAACATAGA | TGGAAGAGCAGCATAGA
TLR4_1859 | CAGGTACCAGACAAC | CCAGGTATCAGACAAC
TLR4_2856 | CTTCACCAACACTTATT | TTCACCAACGCTTATT
TLR4_9263 | TTTAAACTAAAGGTAACTAATTG | AACTAAAGGTAAATAATTG
TLR4_11912 | ACTTATGTGTAATGTTTCG | TTATGTGTAATTTTTCG
TLR4_15844 | ACATCCACTCTTCCC | CATCCACTGTTCCC
TLR4_17050 | CACAAATGCACACATC | CACAAATGCGCACATC
Heme oxygenase-1
HMOX1_149 | CAGCCCCCCACACAG | ACAGCCCTCCACACAG
HMOX1_1038 | CCTTATCTGATCAAGAAC | CTTATCTGACCAAGAAC
HMOX1_2790 | ACCAGGCTATTGCTCT | ACCAGGCTTTTGCTCT
HMOX1_3303 | CAACCCGACAGGCAA | CAACCCCACAGGCAA
HMOX1_9531 | TTACTGCTGTAAACTCACTC | CTGCTGTAAATTCACTC
HMOX1_16442 | TCACCTTCTGTATTCTCAA | CACCTTCTGTAATCTCAA

6. Real-time Polymerase Chain Reaction
The new real-time PCR assays created by Applied Biosystems were tested on the 7900HT Sequence Detection System (Applied Biosystems). Stock Coriell samples at a concentration of 100 ng/µL were diluted to 1 ng/µL with 1X TE buffer (Tris-EDTA buffer). TE is a commonly used buffer that protects DNA from nucleases: Tris holds the pH of the DNA solution at around 8, where DNA nucleases are less active, and EDTA chelates the divalent metal cations that these enzymes require. Coriell samples are DNA samples from individuals of African American and European descent that have been used by the IIPGA and SeattleSNPs groups for gene sequencing. These samples are available from a central repository (http://ccr.coriell.org/).
Because the genotypes of these African American and European individuals were known, their DNA samples were suitable quality controls for testing the accuracy of the ordered TaqMan assays. 5 µL of each diluted Coriell sample was dispensed into a 384-well clear optical reaction plate (Applied Biosystems) using a Biomek FX Laboratory Automation Workstation (Beckman Coulter, Fullerton, CA). The 1 ng/µL Coriell samples were prepared in a 0.8 mL 96-well storage plate (ABgene House, Rochester, NY), and the plate was placed in the designated position on the workstation. The automated syringes transferred the appropriate volume from the 96-well storage plate into the designated wells of the 384-well plates. The plates were then placed in a fume hood to dry the DNA samples. In total, 24 African American DNA samples and 23 DNA samples from the Centre d'Etude du Polymorphisme Humain (CEPH) were tested for each assay. The CEPH samples are from Utah residents with ancestry from Northern and Western Europe. In addition to the Coriell samples, wells of 1X TE buffer were included as negative controls. 175 µL of 2X TaqMan Universal PCR master mix - No AmpErase UNG (Applied Biosystems), 8.75 µL of 40X primer/probe solution (Applied Biosystems) and 166.25 µL of water were mixed to form 350 µL of master mix for real-time PCR. During amplification with the TaqMan master mix, uracil (U) is incorporated into the DNA in place of thymine (T). AmpErase is an enzyme (uracil-N-glycosylase) that excises uracil from DNA, so it can be used to destroy carry-over PCR products from previous reactions. However, the additional expense of AmpErase was not justified here because no contamination was detected in the negative controls.
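Both master-mix recipes in this chapter (350 µL for the Coriell validation runs and 10000 µL for five 384-well plates in section 10) follow the same rule: the 2X master mix supplies half of the final volume, the 40X primer/probe solution supplies 1/40, and water makes up the rest, so both stocks end at 1X. The minimal sketch below reproduces that arithmetic; the function names are hypothetical and the 5 µL-per-well figure is taken from the text.

```python
def master_mix(total_ul, mix_stock_x=2.0, assay_stock_x=40.0):
    """Component volumes (µL) that bring both stocks to 1X in total_ul of mix."""
    master = total_ul / mix_stock_x   # 2X TaqMan Universal PCR master mix
    assay = total_ul / assay_stock_x  # 40X primer/probe solution
    water = total_ul - master - assay  # water makes up the remainder
    return {"master_mix": master, "primer_probe": assay, "water": water}


def wells_covered(total_ul, ul_per_well=5.0):
    """Number of wells a batch of mix can fill at a fixed volume per well."""
    return int(total_ul // ul_per_well)


print(master_mix(350))      # the Coriell validation recipe
print(master_mix(10000))    # the batch for five 384-well plates
print(wells_covered(10000), ">=", 5 * 384)
```

The 10000 µL batch fills 2000 wells, leaving a small excess over the 1920 wells in five 384-well plates to cover pipetting losses.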
For each assay, a 5 µL aliquot of master mix was added to the appropriate wells of the 384-well plate, and the plate was sealed with a clear optical adhesive cover. The plate was centrifuged in an Allegra 6 centrifuge (Beckman Coulter) for about 5 minutes at about 2500 rpm. Real-time PCR was performed on the 7900HT Sequence Detection System (Applied Biosystems). On the linked computer, the SDS2.1 program was opened and the following settings were chosen: assay = absolute quantification (standard curve); container = 384-well clear plate; template = blank template. The file containing the layout of the Coriell samples was imported into the program, and an equivalent layout was displayed on the screen. All wells assigned to an assay were highlighted, and the appropriate marker and detector were chosen from the list. The blank and TE wells were then highlighted and set to "NTC" (no-template control). Under the instrument tab, the following settings were changed: sample volume = 5 µL; 9600 emulation unchecked; stage 3 temperature = 92 °C; repeats = 50. The modified program was connected to the 7900HT Sequence Detection System, the 384-well plate was placed in the instrument in the orientation marked on the tray, and the reaction was started. The real-time PCR finished in about 2 hours, after which an allelic discrimination analysis was started. This analysis determines which alleles are present in each DNA sample by illuminating the plate and measuring the amount of light emitted at the wavelength of each reporter dye. Because the genotypes of all Coriell samples were known, the reporter dye corresponding to each allele could be identified.
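The allelic discrimination step can be pictured with a small sketch: each well's FAM and VIC intensities are divided by its ROX passive reference, and a genotype is assigned from the share of the total signal contributed by each reporter. This is a simplified threshold rule for illustration only, not the clustering used by the SDS2.1 auto-caller; the function names and the thresholds `min_signal` and `hom_ratio` are hypothetical.

```python
def normalize(fam, vic, rox):
    """Divide each reporter signal by the ROX passive reference of the same well."""
    return fam / rox, vic / rox


def call_genotype(fam_n, vic_n, min_signal=0.5, hom_ratio=0.25):
    """Assign a genotype from ROX-normalized FAM and VIC intensities.

    Wells where neither reporter rises above min_signal stay uncalled, as for
    no-template controls or failed amplification. Otherwise the fraction of
    the total signal coming from FAM decides the cluster.
    """
    if max(fam_n, vic_n) < min_signal:
        return "undetermined"
    fam_fraction = fam_n / (fam_n + vic_n)
    if fam_fraction >= 1 - hom_ratio:
        return "FAM/FAM"  # only the FAM-labeled probe was cleaved
    if fam_fraction <= hom_ratio:
        return "VIC/VIC"  # only the VIC-labeled probe was cleaved
    return "FAM/VIC"      # both probes cleaved: heterozygote


print(call_genotype(*normalize(4.0, 0.4, 1.0)))  # strong FAM, weak VIC
print(call_genotype(*normalize(0.2, 0.1, 1.0)))  # no signal, like a TE well
```

Once the Coriell controls show which allele each dye reports, the labels "FAM" and "VIC" can be replaced by the actual bases, as was done when the detectors were renamed.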
The SDS2.1 program was opened and the following settings were selected: assay = allelic discrimination; container = 384-well clear plates; template = blank template. The layout file of the Coriell samples was imported into the program. All wells assigned to an assay were highlighted, and the appropriate marker (the name of the SNP) and detectors (FAM/VIC) were chosen from the list. The blank and TE wells were then highlighted and set to "NTC". The program was connected to the 7900HT Sequence Detection System and the analysis was started. When the analysis was complete, a plot of the fluorescence from allele Y (VIC) against that from allele X (FAM) was generated. Each point on the plot was compared with the genotype of the corresponding Coriell sample posted on the IIPGA or SeattleSNPs website. In this way, the alleles corresponding to FAM and VIC were determined, and both detectors were renamed after the appropriate alleles.

7. Sequencing
Discrepancies were detected for Factor B SNP 7202; they are listed in Table 7.

Table 7: Genotype discrepancies between the real-time PCR results in this study and those posted on the SeattleSNPs website (http://pga.mbt.washington.edu/)
Sample | Result by real-time PCR | Result posted by SeattleSNPs
D008 | GG | AG
D016 | AG | GG
E016 | AG | GG

To determine the genotypes of these three Coriell samples at SNP 7202, the samples were sequenced at the University of British Columbia sequencing facility. PCR products of around 500 nucleotides, with SNP 7202 approximately in the middle of each product, were submitted to the UBC Oligonucleotide Synthesis Facility. Both the left and right primers were used for sequencing; they are listed in Table 8.
Table 8: The sequences of the left and right primers used for sequencing BF_7202 in Coriell samples D008, D016 and E016
Name | Sequence (5' to 3')
BF_7202L | TGG GTC CCT AGT CTG ATT TTA G
BF_7202R | TCC TGG AAG CAT GGC TGT TC

The D008, D016 and E016 Coriell samples were diluted from 2 ng/µL to 1 ng/µL with 1X TE buffer. A 20 µL reaction was prepared for each Coriell sample by mixing 2 µL of 10X PCR mix, 2 µL of dNTPs (200 µM each of dGTP, dCTP, dTTP and dATP), 1 µL of primer L (900 µM), 1 µL of primer R (900 µM), 0.1 µL of TAQHOTSTART (QIAGEN Inc., Mississauga, Ontario), 1 µL of Coriell sample and 12.9 µL of distilled water. The three reactions were placed in three separate, correctly labeled 0.5 mL Eppendorf tubes. PCR was performed on a PCR Express thermal cycler (Thermo Hybrid, Ashford, Middlesex, UK) with the following cycling conditions: 95 °C for 10 minutes; 40 cycles of 92 °C for 15 seconds, 55 °C for 30 seconds and 72 °C for 30 seconds; and 72 °C for 5 minutes. The PCR products were stored at -20 °C. Gel electrophoresis was performed to confirm the presence of PCR product for each Coriell sample. A 2% agarose gel in 0.5% TBE (Tris-borate-EDTA) was prepared by adding 2 g of agarose powder to 100 mL of TBE buffer and heating the mixture in a microwave until the agarose had completely dissolved. 10 µL of ethidium bromide was added to the solution and mixed well. The gel was poured into the tray of a Horizon 20.25 GIBCO BRL Horizontal Gel Electrophoresis Apparatus (Life Technologies, St. Paul, MN) with the comb in place. After the gel had solidified, 5 µL of each PCR product was mixed with 2 µL of sample buffer and loaded into a separate well, and 8 µL of DNA ladder was loaded into another well. The apparatus was connected to a Power-PAC 300 power supply (BIO-RAD, Hercules, CA), and the gel was run for about 45 minutes at 150 V.
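The component volumes for the 20 µL sequencing PCR can be checked with simple C1V1 = C2V2 bookkeeping: summing everything except the template shows how much volume remains for the Coriell DNA, and the same rule gives the final strength of each stock. The sketch below is illustrative; the dictionary keys and function names are hypothetical labels for the components listed in the text.

```python
# Non-template components of the 20 µL sequencing PCR, in µL (from the text).
RECIPE_UL = {
    "10X PCR mix": 2.0,
    "dNTP mix": 2.0,
    "primer L": 1.0,
    "primer R": 1.0,
    "HotStart Taq": 0.1,
    "water": 12.9,
}


def template_volume(recipe, total_ul=20.0):
    """Volume left for template DNA after all other components are added."""
    return round(total_ul - sum(recipe.values()), 3)  # round away float noise


def final_concentration(stock, volume_ul, total_ul=20.0):
    """C1 * V1 = C2 * V2: strength of a stock after dilution into the reaction."""
    return stock * volume_ul / total_ul


print(template_volume(RECIPE_UL))      # µL of 1 ng/µL Coriell DNA per tube
print(final_concentration(10.0, 2.0))  # 10X PCR mix ends at 1X
```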
The presence and separation of bands in the gel were checked with a Spectroline hand-held UV lamp (Model EF-140C, Spectronics Corporation, Westbury, NY). The gel was then examined with an Eagle Eye II imager (Stratagene, La Jolla, CA) and photographed. The PCR products were purified according to the QIAquick PCR Purification Kit Protocol on page 18 of the QIAquick Spin Handbook (QIAGEN Inc., Mississauga, Ontario). The purified PCR products and the sequencing primers listed in Table 8 were sent to the Oligonucleotide Synthesis Facility of the University of British Columbia for sequencing.

8. Dilution of Samples
A total of 1605 DNA samples from 535 trios (two parents and the patient) at a concentration of 10 ng/µL were received from our collaborators at the Hospital for Sick Children in Toronto. All samples were diluted to a final concentration of 1 ng/µL with 1X TE buffer. The diluted samples were pipetted into 0.8 mL 96-well storage plates (ABgene House), sealed and stored at -20 °C.

9. Preparation for Polymerase Chain Reaction and Genotyping
5 µL of each diluted sample was plated from the 96-well plates into 384-well clear optical reaction plates (Applied Biosystems) using a Biomek FX (Beckman Coulter). In addition to the study samples, 5 µL of Coriell samples were included as positive controls and TE buffer as negative controls on each 384-well plate. All 384-well plates were dried in a fume hood overnight, wrapped in aluminum foil and stored at 4 °C.

10. Polymerase Chain Reaction
5000 µL of 2X TaqMan Universal PCR master mix - No AmpErase UNG (Applied Biosystems), 250 µL of 40X primer/probe solution (Applied Biosystems) and 4750 µL of water were mixed to form the reaction mix for PCR. This 10000 µL of mixture was sufficient for five 384-well plates. 5 µL of the mixture was added to each well of the 384-well plates, and the plates were sealed with clear optical adhesive covers.
The plates were then centrifuged in an Allegra 6 centrifuge (Beckman Coulter) for about 5 minutes at approximately 2500 rpm and placed in a GeneAmp PCR System 9700 (Applied Biosystems) for PCR. The cycling program was 10 minutes at 95 °C to denature the double-stranded DNA and activate the enzyme; 40 cycles of 15 seconds at 92 °C and 1 minute at 60 °C for primer annealing and extension of the DNA strands; and a final 10 minutes at 72 °C for extension of the DNA strands.

11. Genotyping
After PCR, the plates were placed in a 7900HT Sequence Detection System (Applied Biosystems). On the linked computer, the SDS2.1 program was opened with the following settings: assay = allelic discrimination; container = 384-well clear plates; template = blank template. The appropriate plate layout file was imported into the program, and the detectors and markers were chosen. Wells containing TE buffer were assigned as "NTC". The adjusted program was connected to the detection system, the plate was placed in the orientation marked on the tray, and the analysis was started. The genotypes were reported in both table and plot form. On the plot, the three genotype classes (two homozygous and one heterozygous) were assigned by the program's auto-caller; a sample plot is shown in Figure 7. The results were saved for later analysis. Figure 7: Sample genotyping result of Factor B SNP 2557. An allelic discrimination plot of BF_2557 A versus BF_2557 G is shown. Each point on the plot represents a study sample, a Coriell sample or TE buffer.
As can be seen in the figure, a cluster of points near the origin represents the wells containing TE buffer, since neither FAM nor VIC fluorescence could be measured. In addition, three clusters of points were observed. Points in the top left of the plot, with low emission from the reporter for BF_2557G and high emission from the reporter for BF_2557A, represent individuals with a homozygous AA genotype at SNP BF_2557. Points in the middle, with emission from both reporters, represent individuals who were heterozygous AG at SNP BF_2557. Points in the lower right of the plot, with low emission from the reporter for BF_2557A and high emission from the reporter for BF_2557G, represent individuals with a homozygous GG genotype at SNP BF_2557. Two points lay between the AG and GG clusters, and their genotypes could not be determined by the auto-caller.
[Figure 7 plot: Allelic Discrimination Plot; x-axis Allele X (BF_2557G)]

12. Quantification of DNA Samples
Unsatisfactory results were obtained in the first round of genotyping of samples from the first two 384-well plates for SNP 149 of heme oxygenase-1, as shown in Figure 8. Figure 8: Example genotyping result for SNP 149 of HMOX1. An allelic discrimination plot of HMOX1_149 A versus HMOX1_149 G is shown. Each point on the plot represents a study sample, a Coriell sample or TE buffer.
[Figure 8 plot: Allelic Discrimination Plot; x-axis Allele X (HMOX1_149G)]
As can be seen in the figure, no distinct clusters could be observed to classify the individuals into the three genotype classes, since most of the points were gathered in the lower left-hand corner of the plot.
This suggests that most of the DNA samples were not amplified during PCR, so the probes remained intact. Because the reporter and quencher were still close to each other, the energy absorbed by the reporter was transferred to the quencher and no fluorescence was emitted. Genotyping reactions were repeated for the samples on the first two 384-well plates, but the results were again unsatisfactory. One explanation was that the sample concentrations were lower than expected. The PicoGreen assay (Invitrogen, Burlington, Ontario) was therefore used to check whether the concentration of the original DNA samples from Toronto was 10 ng/µL. The PicoGreen dsDNA Quantitation Reagent is an ultra-sensitive fluorescent nucleic acid stain for quantifying double-stranded DNA. Conventional absorbance measurement at 260 nm is affected by free nucleotides, single-stranded nucleic acids and proteins, all of which contribute to absorbance at 260 nm and are common contaminants of DNA preparations. The PicoGreen reagent, in contrast, is more selective for double-stranded DNA and does not fluoresce in the presence of protein, in addition to being more sensitive. The fluorescence detected is directly proportional to the DNA concentration, so the concentration of an unknown DNA sample can be deduced once a standard curve has been constructed. A standard DNA sample (λ DNA) at 100 ng/µL was diluted to 2 ng/µL with 1X TE buffer. 50 µL of the diluted λ DNA was put into well A1 of a black polystyrene 96-well plate (Corning Incorporated, Corning, NY). 25 µL of the solution in well A1 was pipetted into well B1 and mixed with 25 µL of 1X TE buffer. This serial dilution was repeated down the column except for the last well; well H1 received only 25 µL of 1X TE buffer, with no standard DNA.
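The standard-curve logic described above can be sketched in a few lines: generate the two-fold λ DNA series (2 ng/µL down to a TE-only blank), fit fluorescence against concentration by ordinary least squares, and convert an unknown's reading back to a stock concentration by multiplying by its dilution factor. This assumes unknowns and standards are read in the same volume with the same reagent, and the readings in the example are synthetic, not thesis data; the function names are hypothetical.

```python
def serial_dilution(start_ng_per_ul, n_wells=8):
    """Two-fold serial dilution down a column; the last well is a TE-only blank."""
    concs = [start_ng_per_ul / 2 ** i for i in range(n_wells - 1)]
    return concs + [0.0]


def fit_standard_curve(conc, signal):
    """Ordinary least-squares slope, intercept and R^2 of signal on concentration."""
    n = len(conc)
    mean_x, mean_y = sum(conc) / n, sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_tot = sum((y - mean_y) ** 2 for y in signal)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(conc, signal))
    return slope, intercept, 1 - ss_res / ss_tot


def stock_concentration(reading, slope, intercept, dilution_factor):
    """Back-calculate the undiluted concentration of an unknown sample."""
    return (reading - intercept) / slope * dilution_factor


standards = serial_dilution(2.0)  # 2, 1, 0.5, ... 0.03125, 0 ng/µL
readings = [3000 * c + 50 for c in standards]  # synthetic, perfectly linear
slope, intercept, r2 = fit_standard_curve(standards, readings)
# A source sample diluted 1/20 that reads 1550 units: about 10 ng/µL undiluted.
print(stock_concentration(1550, slope, intercept, 20))
```

Reading each sample at 1/20, 1/40 and 1/80 gives three back-calculated estimates; agreement among them is a useful check that the readings fall in the linear range of the curve.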
Each DNA sample from the Toronto source plates was diluted to three concentrations (1/20, 1/40 and 1/80). For the source DNA samples, serial dilutions into other columns of the black 96-well plates were prepared as described above for the standard DNA samples. The layout of the black 96-well plates is shown in Table 9.

Table 9: Layout of a 96-well plate with standard DNA samples and DNA samples from the source plates sent by Toronto
Row | Lambda DNA (ng/µL) | Lambda DNA (ng/µL) | 1/20 | 1/20 | 1/20 | 1/40 | 1/40 | 1/40 | 1/80 | 1/80 | 1/80
A | 2 | 2 | 0003-01 | 0003-02 | 0003-03 | 0003-01 | 0003-02 | 0003-03 | 0003-01 | 0003-02 | 0003-03
B | 1 | 1 | 0019-01 | 0019-02 | 0019-03 | 0019-01 | 0019-02 | 0019-03 | 0019-01 | 0019-02 | 0019-03
C | 0.5 | 0.5 | 0052-01 | 0052-02 | 0052-03 | 0052-01 | 0052-02 | 0052-03 | 0052-01 | 0052-02 | 0052-03
D | 0.25 | 0.25 | 0063-01 | 0063-02 | 0063-03 | 0063-01 | 0063-02 | 0063-03 | 0063-01 | 0063-02 | 0063-03
E | 0.125 | 0.125 | 0069-01 | 0069-02 | 0069-03 | 0069-01 | 0069-02 | 0069-03 | 0069-01 | 0069-02 | 0069-03
F | 0.063 | 0.063 | 0083-01 | 0083-02 | 0083-03 | 0083-01 | 0083-02 | 0083-03 | 0083-01 | 0083-02 | 0083-03
G | 0.031 | 0.031 | 0101-01 | 0101-02 | 0101-03 | 0101-01 | 0101-02 | 0101-03 | 0101-01 | 0101-02 | 0101-03
H | 0 | 0 | 0150-01 | 0150-02 | 0150-03 | 0150-01 | 0150-02 | 0150-03 | 0150-01 | 0150-02 | 0150-03

200X PicoGreen reagent was diluted to 1X with TE buffer, and 25 µL of 1X PicoGreen reagent was added to each well of the black 96-well plate. The plate was covered with aluminum foil because the PicoGreen reagent is light-sensitive. The plate was then placed in a GENios fluorimeter (Tecan Group Ltd., Durham, NC) to measure the fluorescence of the PicoGreen reagent. On the linked computer, the Tecan-XFluor4 program was opened. Under the edit measurement parameter tab, the following settings were chosen: general = fluorescence; plate = GRE96fb.pdf; excitation λ = 485 nm; emission λ = 535 nm; gain (manual) = 60; integration time = 40 µs; and the fluorescein box was checked.
The measurement was started and the results were saved.

13. Re-genotyping
As a quality control measure for the genotyping results obtained from the TaqMan assays, about 30% of the samples were re-genotyped. In this project, the concentration of the DNA samples in source plates #1-8 was found to be lower than the expected 10 ng/µL, and unsatisfactory results were obtained from the TaqMan assays. While waiting for a new batch of samples from Toronto, plates #9-20 were genotyped and the data were used for quality control: the genotypes from this first round for plates #9-20 were compared with those obtained from the new DNA samples.

14. Statistical Data Analysis
Before any statistical analysis of the genotyping data, Mendelian inheritance was checked for all participating trios with the Family Based Association Test (FBAT) program (www.biostat.harvard.edu/~fbat/default.html). A Mendelian error in any family for any SNP tested would indicate a potential problem with the samples and/or the genotyping assays that could lead to misinterpretation of the data. After the families with Mendelian inconsistencies were excluded from further analysis, chi-square tests were used to test for Hardy-Weinberg equilibrium among the parents and to compare the genotype frequencies of all SNPs in the parental population with those in the online databases. These two procedures helped to identify inaccurate genotyping assays and to detect whether the study population was genetically heterogeneous. Analysis of variance (ANOVA) was used to test for a relationship between the genotypes and disease severity.
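The Hardy-Weinberg check among the parents can be sketched as a Pearson chi-square test on the three genotype counts of a biallelic SNP. With three genotype classes and one estimated allele frequency, the statistic has one degree of freedom, whose upper-tail P value equals erfc(√(χ²/2)). The text does not state the exact form of test used, so treat this as an assumption; an exact test is often preferred when a genotype class is rare. Function names are hypothetical.

```python
import math


def hwe_chi_square(n_hom_ref, n_het, n_hom_alt):
    """Pearson chi-square statistic for Hardy-Weinberg equilibrium at one biallelic SNP."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the first allele
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_hom_ref, n_het, n_hom_alt)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def chi_square_p_1df(stat):
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


stat = hwe_chi_square(50, 0, 50)  # no heterozygotes at all: a strong departure
print(stat, chi_square_p_1df(stat))
```

A strong departure in the parents, such as missing heterozygotes, is more often a sign of a failing assay than of real population structure, which is why the test was used as a quality check.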
Three phenotypic characteristics (age of diagnosis, forced expiratory volume in one second (FEV1) percent predicted, and the FEV1 standard deviation score (see below)) were selected as measures of each patient's disease status. FEV1 is defined as the volume exhaled during the first second of a forced expiratory maneuver. FEV1 values were expressed as a percentage of the values expected for age, sex and height, calculated with the formula of Knudson et al [38] for patients over 10 years of age and with the Hospital for Sick Children formula [39] for those 6 to 10 years of age. In obstructive lung diseases such as CF, FEV1 decreases, so it is a common index of pulmonary function; it was therefore chosen as one of the phenotypic traits in this study to measure disease severity. As an alternative measure of lung function, we used the difference between a patient's lung function and the average expected for a cohort of Canadian CF patients. A regression analysis of FEV1 percent predicted against age was conducted using the longitudinal data of patients homozygous for the deltaF508 mutation from the Canadian CF Patient Data Registry (2002) (see Figure 9). The FEV1 standard deviation score (FEV1_SD) was calculated by comparing each patient's FEV1 percent predicted value with the value from the regression curve for the same age and gender. A positive FEV1_SD is the number of standard deviations by which the individual's FEV1 percent predicted lies above the value predicted from this relationship, and a negative value is the number of standard deviations by which it lies below the value predicted for that age and gender. Figure 9: Normalization of the FEV1 percent predicted values by age. Longitudinal data from the Canadian CF Patient Data Registry were used to calculate the linear regression curve for patients homozygous for the deltaF508 mutation.
The standard deviation scores were calculated for all patients from their FEV1 percent predicted values and ages. Few patients over 30 years of age have low scores, owing to mortality. The ANOVA performed with JMP v5.1 (SAS Institute Inc., Cary, NC) assumes that the sample is normally distributed and that the patients are unrelated (i.e. not from the same extended pedigree, so that the observations are independent). Normality was checked with the goodness-of-fit test in JMP, and normal quantile plots were drawn with the phenotypic variable (age of diagnosis, FEV1 percent predicted or standard deviation score) on the y-axis for each SNP genotype. Although t-tests could identify significant differences between two genotype groups, ANOVA was preferred because the necessary comparisons would have required multiple t-tests, whereas ANOVA compares more than two genotype groups at once and yields a single P value. Before concluding that any of the three traits was significantly associated with the genotype of any SNP in the four candidate genes, regression analysis was performed to adjust for potential confounding factors such as sex, age and CFTR genotype. Another trait, the age of first Pseudomonas aeruginosa infection, was examined by survival (age-of-onset) analysis. Although about 80% of CF patients are infected with Pseudomonas aeruginosa, not every participant recruited in this study had been colonized with this bacterium. However, these individuals are likely to become colonized by Pseudomonas aeruginosa in the future, so it was not appropriate to exclude them from this part of the study, as an ANOVA would have required.
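The two lung-function steps described above, standardizing FEV1 percent predicted against a reference regression on age and then comparing genotype groups, can be sketched together. Two assumptions are labeled here: the residual standard deviation of the reference fit (n - 2 degrees of freedom) is used as the scale, since the text does not say how the SD was obtained, and a single curve is fitted, so sex-specific scores would need separate reference fits per gender. The ANOVA function returns only the F statistic and degrees of freedom; a P value would come from the F distribution (for example `scipy.stats.f.sf`). All names are hypothetical, and this is not the JMP procedure itself.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fev1_sd_scores(ref_age, ref_fev1, ages, fev1s):
    """Standard deviation scores of FEV1 % predicted relative to a reference fit on age."""
    slope, intercept = linear_fit(ref_age, ref_fev1)
    residuals = [f - (slope * a + intercept) for a, f in zip(ref_age, ref_fev1)]
    sd = (sum(r * r for r in residuals) / (len(residuals) - 2)) ** 0.5
    return [(f - (slope * a + intercept)) / sd for a, f in zip(ages, fev1s)]


def one_way_anova(groups):
    """F statistic and degrees of freedom for a one-way ANOVA across genotype groups."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data only: three genotype groups (e.g. AA, AG, GG) of some trait.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```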
When performing the age-of-onset analysis in JMP, these individuals were assigned their current age as their age of first Pseudomonas aeruginosa infection. Because this age does not correspond to an observed infection, a censor column was added (0 = no Pseudomonas aeruginosa infection, 1 = Pseudomonas aeruginosa infection), and the program automatically accounted for these individuals according to the value in the censor column. As mentioned above, Pseudomonas aeruginosa infection was not found in all patients recruited in this study. Each CF patient was assigned one of four categories of Pseudomonas aeruginosa infection status: no infection (code 0), grew Pseudomonas aeruginosa once (code 1), sporadic Pseudomonas aeruginosa growth (code 2) and chronic Pseudomonas aeruginosa infection (code 3). Susceptibility to Pseudomonas aeruginosa infection was hypothesized to be influenced by one or more SNPs in the four candidate modifier genes, and the chi-square test was used to test for such a relationship. Another analytical method, the Family Based Association Test (FBAT) [40], was used to examine association between the selected phenotypes and SNPs. The FBAT software implements the TDT and other family-based tests of association. Although FBAT is also a test of genetic association, it differs markedly from ANOVA: FBAT tests for association within the family trios, whereas ANOVA considered only the patients. The phenotype and genotype files had to be prepared in the format required by FBAT. Both files were prepared in Notepad, with the genotype file saved as a .ped file and the phenotype file as a .phe file. In the genotype file, the names of all 22 SNPs were placed in the first row.
The columns followed a fixed order. The first column was the pedigree ID and the second the individual ID; the father ID and mother ID were in the third and fourth columns, respectively. Gender was given in the fifth column (1 = male, 2 = female) and affection status in the sixth. The remaining columns held the alleles at each SNP. Haplotype analysis was the last part of the analysis in this study. The SNPs in each candidate gene were tested together as haplotypes. This was done to determine whether SNPs interacted in a way that produced an association with the outcome variable that had not been detected when each SNP was analyzed individually. In addition, examining haplotypes could identify causal SNPs that were not detected in resequencing programs such as SeattleSNPs. Such causal SNPs could lie outside the resequenced region but be found preferentially on a specific haplotype, so that the haplotype would act as a marker for the causal SNP. Haplotypes with a low frequency were pooled together as one group. Confounding factors, e.g. sex and CFTR genotype, were also included in the analysis for adjustment. The haplotype analysis was first done with the RGui program version 2.5.0, provided by The R Foundation for Statistical Computing. The genotype and phenotype input files had to be in a format recognized by the program and were saved as .csv files in Notepad. In the genotype file, the first row contained the column headings: family ID, individual ID, father ID, mother ID and each allele of every SNP. The data for each participant were then arranged in one row below, following these headings.
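The FBAT genotype file described above (SNP names on the first line, then pedigree ID, individual ID, father ID, mother ID, sex, affection status and two alleles per SNP) can be generated programmatically rather than typed by hand. This sketch rests on several assumptions not stated in the text: alleles are recoded as integers (A=1, C=2, G=3, T=4) with 0 for a missing genotype, founders carry parent IDs of 0, and affection status uses the common linkage-format coding (0 = unknown, 1 = unaffected, 2 = affected). The helper names are hypothetical; consult the FBAT documentation before relying on this layout.

```python
# Hypothetical numeric allele coding; 0 marks a missing genotype.
ALLELE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}


def ped_line(ped_id, ind_id, father_id, mother_id, sex, affection, genotypes):
    """One pedigree row: IDs, sex (1 = male, 2 = female), affection, allele pairs.

    genotypes holds two-letter strings such as "AG", or None when missing.
    """
    alleles = []
    for genotype in genotypes:
        if genotype is None:
            alleles += [0, 0]
        else:
            alleles += [ALLELE_CODE[genotype[0]], ALLELE_CODE[genotype[1]]]
    fields = [ped_id, ind_id, father_id, mother_id, sex, affection] + alleles
    return " ".join(str(f) for f in fields)


def ped_file_text(snp_names, rows):
    """SNP names on the first line, then one pedigree row per individual."""
    lines = [" ".join(snp_names)] + [ped_line(*row) for row in rows]
    return "\n".join(lines) + "\n"


trio = [
    ("0003", "101", "0", "0", 1, 1, ["AA", "GG"]),     # father
    ("0003", "102", "0", "0", 2, 1, ["AG", "GG"]),     # mother
    ("0003", "103", "101", "102", 1, 2, ["AG", None]),  # affected child
]
print(ped_file_text(["BF_2557", "BF_4022"], trio))
```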
In the phenotype file, the first row gave the name of each column: person ID and each of the phenotypic characteristics collected (the target phenotypic traits and confounding factors in this study: age of diagnosis, FEV1 percent predicted and standard deviation score, gender, age and CFTR mutation). The patients' data were placed below in the order of the headings in the first row. Age of diagnosis was log-transformed to approximate a normal distribution. All SNPs within a gene were grouped together as a haplotype to test for association, and the analysis was done twice, once with and once without adjustment for confounding factors such as sex and CFTR genotype, so that the results could be compared in the presence and absence of these factors. Haplotype analysis between genotype and phenotype (age of diagnosis, FEV1 percent predicted and standard deviation score) was also performed with FBAT, which generates the possible haplotypes as part of its implementation of the TDT and other family-based tests of association. The genotype and phenotype input files had exactly the same format as in the previous FBAT calculations. Only the additive model was used in this part of the study. Because the two programs have different computational bases, results from RGui and FBAT could not be compared directly; however, consistent results from both programs would suggest a stronger association. For the age of first Pseudomonas aeruginosa infection, haplotype analysis was done with the program Hapstat (http://www.bios.unc.edu/~lin/hapstat/) [41]. The input file was saved as a .txt file in Notepad.
The first row contained the headings of the data listed below it: age of first Pseudomonas aeruginosa infection, Pseudomonas aeruginosa infection status, gender, CFTR genotype and the genotype at each of the SNPs involved. The file was loaded into the program as a cohort file with the following settings: observation time = age of first Pseudomonas aeruginosa infection; event indicator = Pseudomonas aeruginosa infection status; genotype = all the SNPs in each candidate modifier gene; environment = confounding factors (gender and CFTR genotype: ΔF508/ΔF508, ΔF508/other mutation and other mutation/other mutation). Only the additive model was used in this part of the study. The options for both haplotype frequencies and haplotype effects had to be selected. The haplotype frequency threshold, calculated from the number of samples, was left at its default of 0.009, while the threshold for haplotype effects was changed to a P value of 0.05.

Chapter 3 Results

1. PicoGreen Reaction

In the initial experiments for this project, unsatisfactory results were obtained when genotyping HMOX1 SNP 149 in the original source plates #1-8. It was therefore suspected that the initial DNA concentration in each well was below the expected 10 ng/µL, so that there might not have been enough DNA for PCR amplification to reach the level needed for genotyping. PicoGreen reactions were therefore performed to determine the DNA concentration in source plates #1-8. A standard curve was plotted from serially diluted λ DNA samples, as indicated in the table layout in the "Materials and Methods" section. The standard curve derived from this serial dilution is shown in Figure 10.

[Figure 10 plot: DNA concentration (µg/mL) on the x-axis; fitted line y = 3319.8x, R² = 0.9999]
Figure 10: Standard curve of optical density (OD) versus DNA concentration of the serially diluted λ DNA samples

Using this standard curve, the DNA concentration in the original source plates #1-8 was determined, and a representative subset of the results is summarized in Table A1 in the Appendix. As Table A1 shows, the concentrations of the samples provided by Toronto ranged from -0.03 ng/µL to 9.77 ng/µL, and most were lower than expected. This was presumably the main reason for the failed genotyping assays. This inconsistency in DNA concentration was not detected in source plates #9-20. DNA samples in those plates were genotyped for most of the SNPs while a new batch of DNA samples from all the participants was awaited from Toronto. The results from plates #9-20 were to be used as a quality control measure for comparison with the genotypes from the new DNA samples (see section 12, Re-genotyping).

2. Analysis of the Genotypic Data for Mendelian Inconsistencies

Analysis of the parent-offspring trios for Mendelian inconsistencies was performed using the Family Based Association Test (FBAT) program [42]. This program checks the inheritance pattern from both parents to their child. In accordance with Mendel's law of segregation, each parent should contribute one of their two alleles to the child. A Mendelian error in any family for any of the SNPs tested identifies a potential problem with the samples and/or the genotyping assay. Mendelian errors were detected in some families for particular SNPs, and the results are summarized in Table A2 in the Appendix.
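The trio check described above, together with the exclusion rule applied in this study (families with more than one error removed entirely, families with a single error removed only from the analysis of that SNP), can be sketched as follows. This illustrates the logic only; it is not FBAT's code, the function names are hypothetical, the genotypes are invented, and missing genotypes are assumed not to occur.

```python
from collections import Counter

Genotype = tuple[str, str]


def is_mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    """True if one child allele can come from the father and the other from the mother."""
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def apply_exclusion_rule(errors: list[tuple[str, str]]):
    """errors holds one (family_id, snp) pair per Mendelian error.

    Returns the families dropped from the whole study (more than one error) and
    the (family, SNP) pairs dropped from a single SNP's analysis (exactly one error).
    """
    per_family = Counter(family for family, _ in errors)
    dropped_families = {f for f, n in per_family.items() if n > 1}
    dropped_for_snp = {(f, snp) for f, snp in errors if per_family[f] == 1}
    return dropped_families, dropped_for_snp


# Hypothetical (father, mother, child) genotypes keyed by (family, SNP).
trios = {
    ("F001", "BF_2557"): (("A", "G"), ("G", "G"), ("A", "G")),  # consistent
    ("F002", "BF_2557"): (("G", "G"), ("G", "G"), ("A", "G")),  # child has an A no parent carries
    ("F002", "C3_963"): (("G", "T"), ("G", "G"), ("T", "T")),   # mother cannot give a T
    ("F003", "C3_963"): (("T", "T"), ("G", "T"), ("G", "G")),   # father cannot give a G
}
errors = [key for key, (f, m, c) in trios.items() if not is_mendelian_consistent(f, m, c)]
dropped_families, dropped_for_snp = apply_exclusion_rule(errors)
print(dropped_families, dropped_for_snp)
```

Here family F002 has two errors and is removed entirely, while F003 has one error and is removed only from the C3_963 analysis.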
There were many potential causes of the observed Mendelian errors, including contamination of samples with other DNA during sample collection or genotyping, incorrect labeling of individuals, non-paternity and random genotyping errors. In this study, families with more than one Mendelian error were removed from the study, while families with only one error were excluded from the analysis of the particular SNP for which the error was detected.

3. Sequencing Results

Because of discrepancies in the genotypes of three control individuals when testing the assay for BF_7202, DNA samples from Coriell controls D0008, D016 and E016 were sent for sequencing at the University of British Columbia, together with the sequences of the primers used. These sequencing reactions were not successful, for unknown technical reasons. However, the genotypes of these three individuals were determined two more times with the TaqMan assay designed for BF_7202, and they matched those on the SeattleSNPs website. The genotyping result was therefore considered reliable despite the failure of the sequencing at UBC, and analysis of the BF_7202 SNP was continued.

4. Genotypes of the Participating Individuals

This was a sub-study of a large Canada-wide and international project, the Canadian Consortium for CF Modifiers. DNA samples from 535 families with a total of 1605 subjects (two parents and one patient per family) across Canada were included in this sub-study. A DNA sample from each participant was sent from Toronto at a concentration of 10 ng/µL. Four candidate genes with a total of 22 SNPs were chosen, and all samples were genotyped for these SNPs by TaqMan assays according to the procedures detailed in the "Materials and Methods" section. Five SNPs were selected for the Factor B gene. Genotyping results for this gene are summarized in Table A3 in the Appendix.
As noted in Table A3 in the Appendix, the no-call rates (listed as "undetermined") for all SNPs were about 2% or less. One possible reason for genotyping failure was a low DNA concentration (below the expected 1 ng/µL after dilution), so that not enough DNA was amplified for those individuals and their genotypes could not be determined. Two families were removed from the analysis of SNP 2557 because errors were detected in these families when the inheritance of alleles from parents to offspring was checked with the FBAT program.

Four SNPs were selected for genotyping in the Complement factor 3 gene (Table A4 in the Appendix). Genotyping failed for less than 2% of the samples for each SNP. Three families and one family were removed from the analyses of SNPs 963 and 28795, respectively, because of the non-Mendelian inheritance pattern observed at these two SNPs in those families.

Seven SNPs were selected for analysis in the Toll-like receptor 4 gene (Table A5 in the Appendix). In the TLR4 gene, only one family had an error in the inheritance pattern (for SNP 11912), and it was removed from all analyses of this SNP. The genotypes of less than 2% of the samples were undetermined.

Six SNPs were included in the analysis of the heme oxygenase-1 gene (Table A6 in the Appendix). For SNP 149, 17 families were excluded because of Mendelian errors, and a high number of individuals had an undetermined genotype. With the exception of SNP 149, only about 1% of genotypes could not be determined. Furthermore, only two families were removed from the analysis because of errors in the inheritance pattern (one each for SNPs 2790 and 9531).

5. Phenotypic Characteristics of the Study Subjects

The study consisted of a total of 535 families recruited at CF clinics across Canada.
Unfortunately, we were not able to collect phenotypic characteristics for every individual enrolled in the study. The available variables are summarized in Table A7 in the Appendix.

6. Determination of Hardy-Weinberg Equilibrium in the Parent Population

Hardy-Weinberg analysis tests whether the genotype distribution at a given SNP matches the distribution expected from its allele frequencies. The Hardy-Weinberg law states that, in a large randomly mating population, allele and genotype frequencies remain constant from generation to generation unless outside forces act on the population. A Hardy-Weinberg analysis was performed on the parent population, since any deviation from equilibrium could indicate an inaccurate genotyping assay or a genetically heterogeneous study population. A summary of the investigation of Hardy-Weinberg equilibrium in the parent population is shown in Table 10.

Table 10: Hardy-Weinberg Equilibrium among the parental population (observed genotype counts, with expected counts in parentheses)

Gene | SNP | Allele frequencies | Genotype counts, observed (expected) | Total | χ² test (P value)
Factor B | 2557 | A 0.18, G 0.82 | AA 36 (34), AG 314 (312), GG 707 (711) | 1057 | 0.94
Factor B | 4022 | A 0.32, G 0.68 | AA 107 (108), AG 458 (459), GG 489 (487) | 1054 | 0.99
Factor B | 6484 | A 0.89, G 0.11 | AA 858 (844), AG 189 (209), GG 19 (13) | 1066 | 0.08
Factor B | 7202 | A 0.43, G 0.57 | AA 197 (194), AG 508 (514), GG 344 (341) | 1049 | 0.93
Factor B | 8311 | C 0.91, T 0.09 | CC 862 (870), CT 182 (172), TT 7 (9) | 1051 | 0.58
Complement factor 3 | 963 | G 0.51, T 0.49 | GG 270 (272), GT 527 (523), TT 249 (251) | 1046 | 0.97
Complement factor 3 | 28795 | A 0.58, G 0.42 | AA 355 (354), AG 521 (513), GG 177 (186) | 1053 | 0.75
Complement factor 3 | 36735 | A 0.44, G 0.56 | AA 202 (203), AG 519 (518), GG 330 (330) | 1051 | 1.00
Complement factor 3 | 43118 | A 0.48, G 0.52 | AA 239 (244), AG 529 (529), GG 292 (287) | 1060 | 0.91
TLR4 | 851 | A 0.72, G 0.28 | AA 551 (553), AG 434 (430), GG 81 (83) | 1066 | 0.95
TLR4 | 1859 | A 0.37, G 0.63 | AA 143 (145), AG 509 (495), GG 410 (422) | 1062 | 0.68
TLR4 | 2856 | C 0.15, T 0.85 | CC 24 (24), CT 266 (270), TT 770 (766) | 1060 | 0.96
TLR4 | 9263 | A 0.11, C 0.89 | AA 16 (13), AC 207 (208), CC 841 (843) | 1064 | 0.70
TLR4 | 11912 | G 0.67, T 0.33 | GG 478 (476), GT 467 (468), TT 114 (115) | 1059 | 0.99
TLR4 | 15884 | C 0.15, G 0.85 | CC 23 (24), CG 270 (269), GG 764 (764) | 1057 | 0.98
TLR4 | 17050 | C 0.15, T 0.85 | CC 20 (24), CT 277 (271), TT 765 (767) | 1062 | 0.67
Heme oxygenase-1 | 149 | A 0.32, G 0.68 | AA 122 (103), AG 404 (437), GG 479 (465) | 1005 | 0.04
Heme oxygenase-1 | 1038 | C 0.96, T 0.04 | CC 975 (976), CT 83 (81), TT 1 (2) | 1059 | 0.76
Heme oxygenase-1 | 2790 | A 0.57, T 0.43 | AA 341 (344), AT 517 (519), TT 200 (195) | 1058 | 0.92
Heme oxygenase-1 | 3303 | C 0.05, G 0.95 | CC 2 (3), CG 106 (101), GG 954 (958) | 1062 | 0.74
Heme oxygenase-1 | 9531 | A 0.46, G 0.54 | AA 225 (223), AG 528 (524), GG 302 (308) | 1055 | 0.92
Heme oxygenase-1 | 16442 | A 0.95, T 0.05 | AA 952 (958), AT 109 (101), TT 1 (3) | 1062 | 0.37

Hardy-Weinberg equilibrium was established for all the selected SNPs in the parental population except HMOX1_149.

7. Comparison of Genotype Frequencies in the Parent Population and in Online Databases

As a further check of the genotyping assays, the allele frequencies obtained in this study were compared with those recorded for Caucasian individuals on the Innate Immunity Programs for Genomic Applications (IIPGA) and UW-FHCRC Variation Discovery Resource (Seattle SNPs) websites (Table 11). Wide disparities between the two datasets could indicate an inaccurate genotyping assay in this study.
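The checks in sections 6 and 7 can be sketched as two chi-square tests. This is a minimal illustration, not the program used in the study. The Hardy-Weinberg test below uses the conventional one degree of freedom for a biallelic SNP (three genotype classes, minus one, minus one estimated allele frequency); the P values in Table 10 may have been computed with a different convention, so the sketch is not expected to reproduce them exactly, although with one degree of freedom BF_2557 still fits equilibrium (P > 0.05) and HMOX1_149 still departs from it (P < 0.05). For the 2x2 comparison, Table 11 gives only frequencies, so the reference counts are an assumption: 46 chromosomes from the 23 European samples sequenced by Seattle SNPs, at a C frequency of about 0.89. With an expected cell count below 5, the chi-square approximation is rough and Fisher's exact test would be a reasonable alternative.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def hwe_test(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium at a biallelic SNP.

    Expected counts are n*p^2, 2n*p*q and n*q^2, with p estimated from the observed genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    observed = [n_aa, n_ab, n_bb]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return expected, chi2, chi2_sf_1df(chi2)


def allele_count_test(a1: int, b1: int, a2: int, b2: int):
    """2x2 chi-square test (no continuity correction) of allele counts a/b in two samples."""
    n = a1 + b1 + a2 + b2
    chi2 = n * (a1 * b2 - b1 * a2) ** 2 / ((a1 + b1) * (a2 + b2) * (a1 + a2) * (b1 + b2))
    return chi2, chi2_sf_1df(chi2)


# Observed genotype counts for BF_2557 and HMOX1_149 from Table 10.
_, chi2_bf, p_bf = hwe_test(36, 314, 707)
_, chi2_ho, p_ho = hwe_test(122, 404, 479)
print(f"BF_2557: chi2={chi2_bf:.2f}, P={p_bf:.3f}")
print(f"HMOX1_149: chi2={chi2_ho:.2f}, P={p_ho:.4f}")

# HMOX1_1038 in this study: 2*975 + 83 = 2033 C alleles and 83 + 2*1 = 85 T alleles.
# Reference counts 41 C / 5 T are an assumption (46 chromosomes, C frequency about 0.89).
chi2_cmp, p_cmp = allele_count_test(2033, 85, 41, 5)
print(f"HMOX1_1038 vs reference: chi2={chi2_cmp:.2f}, P={p_cmp:.3f}")
```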
Table 11: Comparison of allele frequencies between the genotyping results of this study and the values reported on the IIPGA or Seattle SNPs websites

Gene | SNP | Alleles | Frequency in this study | Reported frequency | χ² test (P value)
Factor B | 2557 | A / G | 0.18 / 0.82 | 0.15 / 0.85 | 0.60
Factor B | 4022 | A / G | 0.32 / 0.68 | 0.29 / 0.71 | 0.65
Factor B | 6484 | A / G | 0.89 / 0.11 | 0.89 / 0.11 | 0.96
Factor B | 7202 | A / G | 0.43 / 0.57 | 0.35 / 0.65 | 0.27
Factor B | 8311 | C / T | 0.91 / 0.09 | 0.89 / 0.11 | 0.72
Complement factor 3 | 963 | G / T | 0.51 / 0.49 | 0.55 / 0.45 | 0.61
Complement factor 3 | 28795 | A / G | 0.58 / 0.42 | 0.50 / 0.50 | 0.31
Complement factor 3 | 36735 | A / G | 0.44 / 0.56 | 0.45 / 0.55 | 0.86
Complement factor 3 | 43118 | A / G | 0.48 / 0.52 | 0.47 / 0.53 | 0.99
TLR4 | 851 | A / G | 0.72 / 0.28 | 0.72 / 0.28 | 0.96
TLR4 | 1859 | A / G | 0.37 / 0.63 | 0.39 / 0.61 | 0.87
TLR4 | 2856 | C / T | 0.15 / 0.85 | 0.13 / 0.87 | 0.74
TLR4 | 9263 | A / C | 0.11 / 0.89 | 0.13 / 0.87 | 0.70
TLR4 | 11912 | G / T | 0.67 / 0.33 | 0.70 / 0.30 | 0.73
TLR4 | 15884 | C / G | 0.15 / 0.85 | 0.11 / 0.89 | 0.44
TLR4 | 17050 | C / T | 0.15 / 0.85 | 0.17 / 0.83 | 0.64
HMOX1 | 149 | A / G | 0.32 / 0.68 | 0.28 / 0.72 | 0.57
HMOX1 | 1038 | C / T | 0.96 / 0.04 | 0.89 / 0.11 | 0.02
HMOX1 | 2790 | A / T | 0.57 / 0.43 | 0.61 / 0.39 | 0.57
HMOX1 | 3303 | C / G | 0.05 / 0.95 | 0.11 / 0.89 | 0.09
HMOX1 | 9531 | A / G | 0.46 / 0.54 | 0.48 / 0.52 | 0.87
HMOX1 | 16442 | A / T | 0.95 / 0.05 | 0.87 / 0.13 | 0.02

Inspection of Table 11 shows that the allele frequencies from this study and the reported allele frequencies were generally similar. However, 2x2 chi-square tests demonstrated significant differences for heme oxygenase-1 SNPs 1038 and 16442, whose allele frequencies differed by approximately 7 to 8 percentage points. This may indicate that the assays for these SNPs are unreliable. However, the small size of the discrepancies and the fact that both SNPs are in Hardy-Weinberg equilibrium argue against this. Differences in the genetic background of the samples are another possible reason for the discrepancy. The reported heme oxygenase-1 data shown in Table 11 are from the Seattle SNPs website. All the DNA samples used in this study and those used by Seattle SNPs were from the Caucasian population.
However, the Caucasian individuals genotyped by Seattle SNPs were Utah residents with ancestry from Northern and Western Europe, whereas those in this study may represent a different subset of the Caucasian population. Another possible reason for the discrepancy is the small sample size used by Seattle SNPs: only 23 European samples were sequenced.

8. ANOVA Analysis of the Influence of Genotype on the Phenotypes

Three phenotypic characteristics, age of diagnosis, FEV1 predicted value and FEV1 standard deviation value, were included in one-way ANOVA to determine whether any of the genotypes contributed to variation in these traits. Each trait provides a measure of disease severity in the patients. ANOVA assumes that the dependent variable is normally distributed within each group, so various transformations (quadratic, cubic, common logarithm and natural logarithm) were tried to normalize the distributions where needed. In this study, only the age of diagnosis data were slightly skewed, and they were transformed by the common logarithm before ANOVA was performed.

(a) Factor B

(I) Age of diagnosis

Table 12: ANOVA of age of diagnosis among different genotypes of the selected SNPs in Factor B. The age of diagnosis was logarithmically transformed for normality.
SNP | Genotype | Number | Mean | Standard error | P value
BF_2557 | AA | 18 | -0.27 | 0.20 | 0.74
BF_2557 | AG | 111 | -0.16 | 0.08
BF_2557 | GG | 307 | -0.23 | 0.05
BF_4022 | AA | 55 | -0.20 | 0.12 | 0.23
BF_4022 | AG | 167 | -0.30 | 0.07
BF_4022 | GG | 217 | -0.15 | 0.06
BF_6484 | AA | 353 | -0.19 | 0.05 | 0.51
BF_6484 | AG | 83 | -0.31 | 0.09
BF_6484 | GG | 7 | -0.13 | 0.33
BF_7202 | AA | 85 | -0.02 | 0.09 | 0.0057
BF_7202 | AG | 210 | -0.34 | 0.06
BF_7202 | GG | 146 | -0.13 | 0.07
BF_8311 | CC | 360 | -0.25 | 0.05 | 0.12
BF_8311 | CT | 73 | -0.04 | 0.10
BF_8311 | TT | 4 | 0.11 | 0.43

(II) FEV1 predicted value

Table 13: ANOVA of FEV1 predicted value among different genotypes of the selected SNPs in Factor B

SNP | Genotype | Number | Mean | Standard error | P value
BF_2557 | AA | 14 | 78.31 | 6.98 | 0.01
BF_2557 | AG | 104 | 68.90 | 2.56
BF_2557 | GG | 279 | 77.86 | 1.56
BF_4022 | AA | 48 | 69.89 | 3.79 | 0.16
BF_4022 | AG | 150 | 74.86 | 2.14
BF_4022 | GG | 205 | 77.72 | 1.83
BF_6484 | AA | 329 | 75.22 | 1.45 | 0.41
BF_6484 | AG | 70 | 79.19 | 3.14
BF_6484 | GG | 7 | 68.97 | 9.92
BF_7202 | AA | 82 | 69.43 | 2.89 | 0.04
BF_7202 | AG | 185 | 76.30 | 1.93
BF_7202 | GG | 137 | 78.47 | 2.24
BF_8311 | CC | 321 | 75.77 | 1.47 | 0.91
BF_8311 | CT | 72 | 74.58 | 3.11
BF_8311 | TT | 5 | 72.67 | 11.79

(III) FEV1 standard deviation value

Table 14: ANOVA of FEV1 sd value among different genotypes of the selected SNPs in Factor B

SNP | Genotype | Number | Mean | Standard error | P value
BF_2557 | AA | 14 | 0.70 | 0.25 | 0.01
BF_2557 | AG | 104 | 0.23 | 0.09
BF_2557 | GG | 279 | 0.53 | 0.06
BF_4022 | AA | 48 | 0.19 | 0.14 | 0.08
BF_4022 | AG | 150 | 0.46 | 0.08
BF_4022 | GG | 205 | 0.53 | 0.07
BF_6484 | AA | 329 | 0.45 | 0.05 | 0.12
BF_6484 | AG | 70 | 0.57 | 0.11
BF_6484 | GG | 7 | -0.18 | 0.36
BF_7202 | AA | 82 | 0.23 | 0.10 | 0.03
BF_7202 | AG | 185 | 0.48 | 0.07
BF_7202 | GG | 137 | 0.58 | 0.08
BF_8311 | CC | 321 | 0.46 | 0.05 | 0.66
BF_8311 | CT | 72 | 0.47 | 0.11
BF_8311 | TT | 5 | 0.08 | 0.42

As noted above, the age of diagnosis was log-transformed to obtain a normally distributed data set. Three of the five SNPs (BF_4022, BF_6484 and BF_8311) had P values greater than 0.05 in the ANOVA for all the phenotypes, so a relationship between these SNPs and CF disease severity was unlikely. For the two SNPs with P values less than 0.05 (BF_2557 and BF_7202), the analyses were repeated with adjustment for confounding factors to confirm whether there was a significant association between the tested SNPs and the phenotypic traits.
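The one-way ANOVA used in this section compares the spread of the genotype-group means with the spread of values within groups. A minimal sketch of the F statistic follows; the function name is hypothetical and the values are toy numbers chosen so the arithmetic is easy to follow, not study data. A P value would then come from the F distribution with (df_between, df_within) degrees of freedom, for example `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on lists of values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared distance of each mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared distance of each value from its group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy trait values grouped by genotype (hypothetical).
groups = {"AA": [1.0, 2.0, 3.0], "AG": [4.0, 5.0, 6.0], "GG": [7.0, 8.0, 9.0]}
f_stat, df1, df2 = one_way_anova_f(list(groups.values()))
print(f"F({df1}, {df2}) = {f_stat:.1f}")
```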
(b) Complement Factor 3

ANOVA tests were completed for the three selected phenotypic traits among the SNPs of the candidate CF modifier gene Complement factor 3, and the results are shown in Tables A8, A9 and A10 in the Appendix. All of the P values were greater than 0.05, so none of the analyses revealed a relationship with the measured traits, and no adjustment for confounding factors was required.

(c) Toll-like Receptor 4

(I) Age of diagnosis

Table 15: ANOVA of age of diagnosis among different genotypes of the selected SNPs in Toll-like receptor 4. The age of diagnosis was logarithmically transformed for normality.

SNP | Genotype | Number | Mean | Standard error | P value
TLR4_851 | AA | 242 | -0.19 | 0.06 | 0.57
TLR4_851 | AG | 168 | -0.25 | 0.07
TLR4_851 | GG | 31 | -0.08 | 0.15
TLR4_1859 | AA | 62 | -0.29 | 0.11 | 0.04
TLR4_1859 | AG | 206 | -0.10 | 0.06
TLR4_1859 | GG | 175 | -0.31 | 0.06
TLR4_2856 | CC | 9 | 0.03 | 0.29 | 0.20
TLR4_2856 | CT | 121 | -0.32 | 0.08
TLR4_2856 | TT | 314 | -0.17 | 0.05
TLR4_9263 | AA | 7 | 0.19 | 0.33 | 0.47
TLR4_9263 | AC | 86 | -0.21 | 0.09
TLR4_9263 | CC | 351 | -0.22 | 0.05
TLR4_11912 | GG | 190 | -0.19 | 0.06 | 0.47
TLR4_11912 | GT | 191 | -0.19 | 0.06
TLR4_11912 | TT | 59 | -0.34 | 0.11
TLR4_15884 | CC | 9 | -0.25 | 0.29 | 0.92
TLR4_15884 | CG | 102 | -0.24 | 0.09
TLR4_15884 | GG | 325 | -0.21 | 0.05
TLR4_17050 | CC | 12 | -0.15 | 0.25 | 0.80
TLR4_17050 | CT | 109 | -0.26 | 0.08
TLR4_17050 | TT | 322 | -0.20 | 0.05

ANOVA tests were completed for the three selected phenotypic traits among the SNPs of the candidate CF modifier gene Toll-like receptor 4. As shown in the table above and in Tables A11 and A12 in the Appendix, all P values were greater than 0.05 except for the ANOVA of SNP TLR4_1859 and age of diagnosis, so no adjustment for confounding factors was performed for the other analyses. Regression analysis was therefore performed for the TLR4_1859 association.
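Tables 12 and 15 report age of diagnosis on the common-logarithm scale. The sketch below shows how such a transform reduces right skew and how a mean of log10 values maps back to the original scale as a geometric mean. The ages are hypothetical, the unit (years) is an assumption, and ages of zero would need separate handling because log10(0) is undefined; the text does not say how such values were treated.

```python
import math


def sample_skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def log10_ages(ages):
    """Common-log transform; ages must be strictly positive (log10(0) is undefined)."""
    if any(a <= 0 for a in ages):
        raise ValueError("all ages must be > 0 for a log10 transform")
    return [math.log10(a) for a in ages]


def back_transform_mean(log_values):
    """10 ** (mean of log10 values), i.e. the geometric mean on the original scale."""
    return 10 ** (sum(log_values) / len(log_values))


ages = [0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]  # hypothetical ages of diagnosis (years)
logged = log10_ages(ages)
print(f"skewness: raw = {sample_skewness(ages):.2f}, log10 = {sample_skewness(logged):.2f}")
print(f"geometric mean = {back_transform_mean(logged):.2f}")
```

A negative group mean on the log scale, as in Tables 12 and 15, therefore corresponds to a geometric mean below one unit of the original scale.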
(d) Heme Oxygenase-1

(I) FEV1 predicted value

Table 16: ANOVA of FEV1 predicted value among different genotypes of the selected SNPs in heme oxygenase-1

SNP | Genotype | Number | Mean | Standard error | P value
HMOX1_149 | AA | 49 | 71.14 | 3.81 | 0.25
HMOX1_149 | AG | 159 | 74.08 | 2.12
HMOX1_149 | GG | 169 | 77.64 | 2.05
HMOX1_1038 | CC | 367 | 75.16 | 1.37 | 0.43
HMOX1_1038 | CT | 37 | 81.07 | 4.31
HMOX1_1038 | TT | 2 | 75.30 | 18.55
HMOX1_2790 | AA | 128 | 78.59 | 2.31 | 0.045
HMOX1_2790 | AT | 193 | 76.39 | 1.88
HMOX1_2790 | TT | 83 | 69.57 | 2.87
HMOX1_3303 | CC | 0 | n/a | n/a | 0.39
HMOX1_3303 | CG | 39 | 79.13 | 4.19
HMOX1_3303 | GG | 366 | 75.37 | 1.37
HMOX1_9531 | AA | 87 | 79.82 | 2.80 | 0.017
HMOX1_9531 | AG | 188 | 77.55 | 1.90
HMOX1_9531 | GG | 126 | 70.43 | 2.33
HMOX1_16442 | AA | 366 | 75.46 | 1.37 | 0.71
HMOX1_16442 | AT | 40 | 77.09 | 4.13
HMOX1_16442 | TT | 0 | n/a | n/a

(II) FEV1 standard deviation value

Table 17: ANOVA of FEV1 sd value among different genotypes of the selected SNPs in heme oxygenase-1

SNP | Genotype | Number | Mean | Standard error | P value
HMOX1_149 | AA | 49 | 0.27 | 0.14 | 0.43
HMOX1_149 | AG | 159 | 0.44 | 0.08
HMOX1_149 | GG | 169 | 0.48 | 0.07
HMOX1_1038 | CC | 367 | 0.45 | 0.05 | 0.61
HMOX1_1038 | CT | 37 | 0.59 | 0.16
HMOX1_1038 | TT | 2 | 0.20 | 0.67
HMOX1_2790 | AA | 128 | 0.49 | 0.08 | 0.029
HMOX1_2790 | AT | 193 | 0.54 | 0.07
HMOX1_2790 | TT | 83 | 0.22 | 0.10
HMOX1_3303 | CC | 0 | n/a | n/a | 0.96
HMOX1_3303 | CG | 39 | 0.46 | 0.15
HMOX1_3303 | GG | 366 | 0.46 | 0.05
HMOX1_9531 | AA | 87 | 0.55 | 0.10 | 0.038
HMOX1_9531 | AG | 188 | 0.59 | 0.07
HMOX1_9531 | GG | 126 | 0.24 | 0.08
HMOX1_16442 | AA | 366 | 0.46 | 0.05 | 0.79
HMOX1_16442 | AT | 40 | 0.42 | 0.15
HMOX1_16442 | TT | 0 | n/a | n/a

From the two tables above and Table A13 in the Appendix, it is unlikely that any of the six SNPs were associated with the three measured phenotypic traits, except for HMOX1_2790 and HMOX1_9531. These SNPs had P values of less than 0.05 in the ANOVA with both FEV1 predicted value and FEV1 sd value. Before a significant relationship could be concluded, adjustment for confounding factors was required.

9. Regression Analysis

As indicated by the ANOVA results, a significant association was found for the following SNPs: (1) BF_2557 and FEV1 predicted value/sd value; (2) BF_7202 and age of diagnosis; (3) BF_7202 and FEV1 predicted value/sd value; (4) TLR4_1859 and age of diagnosis; (5) HMOX1_2790 and FEV1 predicted value/sd value; and (6) HMOX1_9531 and FEV1 predicted value/sd value. The ANOVA of these five SNPs with the specified phenotypes gave P values of less than 0.05, indicating that there might be an association between each SNP and trait. However, before concluding that an association existed, each association was adjusted for confounding factors, namely sex, age and CFTR genotype. These factors were hypothesized to affect the measured traits; for example, different CFTR mutations lead to a wide variety of clinical symptoms. Regression analysis was performed with these confounding factors, and the results are shown in Tables 18-27.
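Adjusting a genotype effect for sex, age and CFTR genotype amounts to fitting one linear model that contains all of them. The thesis does not name the software or the exact model, so the following is only a generic ordinary least squares sketch with dummy-coded categorical variables. The reference levels, coefficient values and data are hypothetical, ΔF508 is written `dF508` in code, and the tables above report group means rather than the coefficients estimated here; P values would additionally require the t or F distribution.

```python
def dummy_code(value, levels):
    """Indicator columns for every level except levels[0], the reference category."""
    return [1.0 if value == level else 0.0 for level in levels[1:]]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(x_rows, y):
    """Ordinary least squares via the normal equations X'X b = X'y."""
    p = len(x_rows[0])
    xtx = [[sum(r[i] * r[j] for r in x_rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x_rows, y)) for i in range(p)]
    return solve(xtx, xty)


SNP_LEVELS = ["GG", "AA", "AG"]  # reference genotype listed first (hypothetical choice)
SEX_LEVELS = ["F", "M"]
CFTR_LEVELS = ["dF508/dF508", "dF508/other", "other/other"]


def design_row(genotype, sex, cftr, age):
    """Intercept, genotype dummies, sex dummy, CFTR dummies and age as a covariate."""
    return ([1.0] + dummy_code(genotype, SNP_LEVELS) + dummy_code(sex, SEX_LEVELS)
            + dummy_code(cftr, CFTR_LEVELS) + [age])


# Noise-free synthetic data generated from known (hypothetical) coefficients.
true_coef = [90.0, -9.0, -2.0, -1.0, 2.0, -3.0, -0.5]
rows, y = [], []
for g in SNP_LEVELS:
    for s in SEX_LEVELS:
        for c in CFTR_LEVELS:
            for age in (10.0, 20.0):
                row = design_row(g, s, c, age)
                rows.append(row)
                y.append(sum(b * x for b, x in zip(true_coef, row)))
coef = ols(rows, y)
print([round(b, 3) for b in coef])
```

Because the data contain no noise, the fitted coefficients recover the generating values, which is a quick check that the design matrix and solver are consistent.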
(a) BF_2557

Table 18: Regression analysis of the association of BF_2557 and FEV1 predicted value with confounding factors

Variable | Subgroup | Mean | Standard error | P value
BF_2557 | AA | 77.40 | 6.63 | 0.014
BF_2557 | AG | 69.25 | 2.63
BF_2557 | GG | 78.53 | 1.75
Sex | F | 76.19 | 3.01 | 0.64
Sex | M | 75.98 | 2.80
Age | n/a | n/a | n/a | <0.0001
CFTR genotype | ΔF508/ΔF508 | 75.61 | 2.51 | 0.38
CFTR genotype | ΔF508/other | 77.92 | 2.92
CFTR genotype | other/other | 72.05 | 4.68

Table 19: Regression analysis of the association of BF_2557 and the FEV1 sd value with confounding factors

Variable | Subgroup | Mean | Standard error | P value
BF_2557 | AA | 0.72 | 0.27 | 0.016
BF_2557 | AG | 0.25 | 0.11
BF_2557 | GG | 0.55 | 0.07
Sex | F | 0.43 | 0.12 | 0.55
Sex | M | 0.51 | 0.11
CFTR genotype | ΔF508/ΔF508 | 0.44 | 0.10 | 0.38
CFTR genotype | ΔF508/other | 0.57 | 0.12
CFTR genotype | other/other | 0.37 | 0.19

The two tables above show associations between Factor B SNP 2557 and both FEV1 predicted and standard deviation values. These associations were independent of the potential confounding factors sex and CFTR genotype. Age had a strong effect on the FEV1 predicted value (P < 0.0001), which could be explained by the decline in lung function, measured as percent predicted, with increasing age.
(b) BF_7202

Table 20: Regression analysis of the association of BF_7202 and the age of diagnosis with confounding factors

Variable | Subgroup | Mean | Standard error | P value
BF_7202 | AA | 2.75 | 0.63 | 0.0274
BF_7202 | AG | 2.00 | 0.45
BF_7202 | GG | 3.70 | 0.48
Sex | F | 2.31 | 0.45 | 0.13
Sex | M | 3.04 | 0.44
CFTR genotype | ΔF508/ΔF508 | 1.59 | 0.34 | <0.0001
CFTR genotype | ΔF508/other | 3.80 | 0.45
CFTR genotype | other/other | 6.11 | 0.90

Table 21: Regression analysis of the association of BF_7202 and the FEV1 predicted value with confounding factors

Variable | Subgroup | Mean | Standard error | P value
BF_7202 | AA | 70.12 | 2.92 | 0.019
BF_7202 | AG | 76.73 | 2.10
BF_7202 | GG | 79.73 | 2.23
Sex | F | 76.75 | 2.07 | 0.66
Sex | M | 76.07 | 2.05
Age | n/a | n/a | n/a | <0.0001
CFTR genotype | ΔF508/ΔF508 | 75.91 | 1.59 | 0.25
CFTR genotype | ΔF508/other | 78.31 | 2.11
CFTR genotype | other/other | 72.05 | 4.16

Table 22: Regression analysis of the association of BF_7202 and the FEV1 standard deviation value with confounding factors

Variable | Subgroup | Mean | Standard error | P value
BF_7202 | AA | 0.26 | 0.12 | 0.0215
BF_7202 | AG | 0.49 | 0.08
BF_7202 | GG | 0.61 | 0.09
Sex | F | 0.45 | 0.08 | 0.57
Sex | M | 0.52 | 0.08
Age | n/a | n/a | n/a | 0.12
CFTR genotype | ΔF508/ΔF508 | 0.45 | 0.06 | 0.26
CFTR genotype | ΔF508/other | 0.58 | 0.08
CFTR genotype | other/other | 0.37 | 0.17

The regression analysis of BF_7202 and age of diagnosis gave a P value of 0.0274, confirming a significant relationship. The associations between BF_7202 and both the FEV1 predicted and standard deviation values were also confirmed (P = 0.019 and P = 0.0215, respectively).
(c) TLR4_1859

Table 23: Regression analysis of the association of TLR4_1859 and the age of diagnosis with confounding factors

Variable | Subgroup | Mean | Standard error | P value
TLR4_1859 | AA | 2.17 | 0.68 | 0.73
TLR4_1859 | AG | 2.95 | 0.44
TLR4_1859 | GG | 2.59 | 0.47
Sex | F | 2.31 | 0.45 | 0.12
Sex | M | 3.05 | 0.46
CFTR genotype | ΔF508/ΔF508 | 1.59 | 0.36 | <0.0001
CFTR genotype | ΔF508/other | 3.80 | 0.46
CFTR genotype | other/other | 6.11 | 0.90

This analysis did not confirm an association between TLR4_1859 and age of diagnosis: after adjustment for confounding factors, the P value for TLR4_1859 was well above 0.05.

(d) HMOX1_2790

Table 24: Regression analysis of the association of HMOX1_2790 and the FEV1 predicted value with confounding factors

Variable | Subgroup | Mean | Standard error | P value
HMOX1_2790 | AA | 78.39 | 2.41 | 0.05
HMOX1_2790 | AT | 77.08 | 2.04
HMOX1_2790 | TT | 70.74 | 2.91
Sex | F | 76.47 | 2.09 | 0.71
Sex | M | 76.08 | 2.08
Age | n/a | n/a | n/a | <0.0001
CFTR genotype | ΔF508/ΔF508 | 75.87 | 1.62 | 0.34
CFTR genotype | ΔF508/other | 78.19 | 2.19
CFTR genotype | other/other | 71.14 | 4.21

Table 25: Regression analysis of the association of HMOX1_2790 and the FEV1 standard deviation value with confounding factors

Variable | Subgroup | Mean | Standard error | P value
HMOX1_2790 | AA | 0.49 | 0.097 | 0.045
HMOX1_2790 | AT | 0.57 | 0.082
HMOX1_2790 | TT | 0.24 | 0.12
Sex | F | 0.44 | 0.08 | 0.62
Sex | M | 0.51 | 0.08
CFTR genotype | ΔF508/ΔF508 | 0.44 | 0.07 | 0.33
CFTR genotype | ΔF508/other | 0.58 | 0.09
CFTR genotype | other/other | 0.36 | 0.17

For HMOX1_2790, both analyses indicated an association between the SNP and the two phenotypic traits. However, the P values were only borderline (0.05 and 0.045 for FEV1 predicted value and FEV1 sd value, respectively), so these results must be viewed with caution.
(e) HMOX1_9531

Table 26: Regression analysis of the association of HMOX1_9531 and the FEV1 predicted value with confounding factors

Variable | Subgroup | Mean | Standard error | P value
HMOX1_9531 | AA | 79.37 | 2.83 | 0.007
HMOX1_9531 | AG | 78.36 | 2.00
HMOX1_9531 | GG | 71.26 | 2.43
Sex | F | 76.68 | 2.09 | 0.53
Sex | M | 76.13 | 2.03
Age | n/a | n/a | n/a | <0.0001
CFTR genotype | ΔF508/ΔF508 | 75.92 | 1.61 | 0.39
CFTR genotype | ΔF508/other | 78.32 | 2.10
CFTR genotype | other/other | 72.05 | 4.15

Table 27: Regression analysis of the association of HMOX1_9531 and the FEV1 standard deviation value with confounding factors

Variable | Subgroup | Mean | Standard error | P value
HMOX1_9531 | AA | 0.54 | 0.11 | 0.006
HMOX1_9531 | AG | 0.61 | 0.08
HMOX1_9531 | GG | 0.26 | 0.10
Sex | F | 0.45 | 0.08 | 0.45
Sex | M | 0.52 | 0.08
CFTR genotype | ΔF508/ΔF508 | 0.45 | 0.06 | 0.39
CFTR genotype | ΔF508/other | 0.59 | 0.08
CFTR genotype | other/other | 0.37 | 0.17

For HMOX1_9531, both analyses indicated a strong association between the SNP and the two phenotypic traits (P = 0.007 and P = 0.006).

10. Age of Onset Analysis for the Age of First Pseudomonas aeruginosa Infection

Approximately 80% of CF patients have either acute or chronic Pseudomonas aeruginosa infection. This was true for most of the patients recruited in this study. However, some individuals had never been infected with Pseudomonas aeruginosa. Assuming that there were no errors in data collection (i.e. that the age of first infection was not simply missing), it was not possible to predict whether these individuals would acquire Pseudomonas aeruginosa infection in the future. Survival analysis, which can accommodate such censored observations, was therefore performed to investigate the association between particular SNPs and the age of first Pseudomonas aeruginosa infection.

(a) Factor B

In Table A14 in the Appendix, one subgroup of each variable is not shown; for example, only two (AA and AG) of the three possible genotypes are listed.
The missing subgroup was the one with the greatest proportion. It was used as the baseline against which the other subgroups were compared, to reveal the effect of the genotype of the selected SNPs on the phenotypic trait. These analyses indicated that none of the selected SNPs in the candidate gene Factor B affected the age of first Pseudomonas aeruginosa infection.

(b) Complement Factor 3

As shown in Table A15 in the Appendix, none of the tested SNPs showed a correlation with the age of first Pseudomonas aeruginosa infection.

(c) Toll-like receptor 4

Table 28: Age of onset analysis investigating the association between age of first Pseudomonas aeruginosa infection and selected SNPs in Toll-like receptor 4

SNP tested | Variable | Subgroup | Estimated value | Standard error | P value
TLR4_851 | Genotype | AA | 0.169 | 0.155 | 0.5269
TLR4_851 | Genotype | AG | 0.0614 | 0.161
TLR4_851 | Sex | Female | 0.156 | 0.086 | 0.0712
TLR4_851 | CFTR mutation | ΔF508/ΔF508 | -0.0192 | 0.133 | 0.9773
TLR4_851 | CFTR mutation | ΔF508/other | -0.0245 | 0.143
TLR4_1859 | Genotype | AA | 0.196 | 0.154 | 0.4581
TLR4_1859 | Genotype | AG | -0.069 | 0.119
TLR4_1859 | Sex | Female | 0.139 | 0.087 | 0.1109
TLR4_1859 | CFTR mutation | ΔF508/ΔF508 | -0.036 | 0.134 | 0.9603
TLR4_1859 | CFTR mutation | ΔF508/other | -0.017 | 0.143
TLR4_2856 | Genotype | CC | 0.566 | 0.393 | 0.286
TLR4_2856 | Genotype | CT | -0.198 | 0.225
TLR4_2856 | Sex | Female | 0.163 | 0.086 | 0.0598
TLR4_2856 | CFTR mutation | ΔF508/ΔF508 | -0.0293 | 0.133 | 0.975
TLR4_2856 | CFTR mutation | ΔF508/other | -0.0104 | -0.287
TLR4_9263 | Genotype | AA | -0.292 | 0.693 | 0.0425
TLR4_9263 | Genotype | AC | -0.132 | 0.376
TLR4_9263 | Sex | Female | 0.157 | 0.087 | 0.071
TLR4_9263 | CFTR mutation | ΔF508/ΔF508 | -0.066 | 0.136 | 0.8834
TLR4_9263 | CFTR mutation | ΔF508/other | -0.028 | 0.144
TLR4_11912 | Genotype | GG | 0.073 | 0.128 | 0.5244
TLR4_11912 | Genotype | GT | -0.131 | 0.131
TLR4_11912 | Sex | Female | 0.141 | 0.088 | 0.1102
TLR4_11912 | CFTR mutation | ΔF508/ΔF508 | -0.032 | 0.135 | 0.9537
TLR4_11912 | CFTR mutation | ΔF508/other | -0.031 | 0.144
TLR4_15884 | Genotype | CC | 0.444 | 0.397 | 0.1965
TLR4_15884 | Genotype | CG | -0.077 | 0.231
TLR4_15884 | Sex | Female | 0.146 | 0.088 | 0.0983
TLR4_15884 | CFTR mutation | ΔF508/ΔF508 | 0.0042 | 0.135 | 0.9515
TLR4_15884 | CFTR mutation | ΔF508/other | -0.0455 | 0.146
TLR4_17050 | Genotype | CC | -0.065 | 0.397 | 0.3172
TLR4_17050 | Genotype | CT | -0.121 | 0.233
TLR4_17050 | Sex | Female | 0.157 | 0.087 | 0.0714
TLR4_17050 | CFTR mutation | ΔF508/ΔF508 | -0.037 | 0.135 | 0.9143
TLR4_17050 | CFTR mutation | ΔF508/other | -0.051 | 0.144

Among the seven selected SNPs in the candidate gene Toll-like receptor 4, one (TLR4_9263) had a P value of 0.0425, indicating that this SNP was significantly associated with the age of first Pseudomonas aeruginosa infection.

(d) Heme oxygenase-1

The data in Table A16 in the Appendix show that none of the six selected SNPs of heme oxygenase-1 was statistically associated with the age of first Pseudomonas aeruginosa infection.

11. Pseudomonas aeruginosa Infection Status

Not all patients recruited in this study had Pseudomonas aeruginosa infection. All patients were classified into one of the following categories according to their Pseudomonas aeruginosa infection status: no Pseudomonas aeruginosa infection (code 0), grew Pseudomonas aeruginosa once (code 1), sporadic Pseudomonas aeruginosa growth (code 2) and chronic Pseudomonas aeruginosa infection (code 3). It was hypothesized that one or more SNPs of the four candidate modifier genes might contribute to the susceptibility of CF patients to Pseudomonas aeruginosa infection. Chi-square tests were performed to test for such a relationship.

(a) Factor B

Table A17 in the Appendix shows the chi-square test results for all SNPs of the Factor B gene. The P value of the chi-square test for each SNP was greater than 0.05, indicating that Pseudomonas aeruginosa infection status was not related to any of the five selected SNPs in the Factor B gene.
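The genotype-by-infection-status tests in this section are chi-square tests of independence on a genotype (rows) by status code 0-3 (columns) table of counts. A minimal sketch follows; the function names and the counts in the usage example are hypothetical. A 3 x 4 table has (3 - 1)(4 - 1) = 6 degrees of freedom, and for an even number of degrees of freedom the upper-tail probability has a closed form. As a check against Table 29, χ² = 12.60 on 6 degrees of freedom gives P ≈ 0.0498, the value reported for C3_963. Cells with small expected counts (such as rare homozygotes) make the approximation unreliable.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!  (even df only)."""
    if df % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical counts: rows = three genotypes, columns = infection status codes 0, 1, 2, 3.
counts = [
    [10, 5, 5, 10],
    [20, 10, 10, 20],
    [5, 5, 10, 10],
]
chi2, df = chi_square_independence(counts)
print(f"chi2 = {chi2:.2f}, df = {df}, P = {chi2_sf_even_df(chi2, df):.3f}")
print(f"P for chi2 = 12.60 on 6 df: {chi2_sf_even_df(12.60, 6):.4f}")
```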
(b) Complement factor 3

Table 29: Chi-square tests of the relationship between genotypes of the selected SNPs in Complement factor 3 and Pseudomonas aeruginosa infection status (percentage of patients in each status category)

SNP | Genotype | Status 0 (%) | Status 1 (%) | Status 2 (%) | Status 3 (%) | χ² | P value
C3_963 | GG | 37.31 | 11.94 | 22.39 | 28.36 | 12.60 | 0.0498
C3_963 | GT | 26.55 | 17.26 | 17.70 | 38.50
C3_963 | TT | 25.93 | 15.74 | 28.70 | 29.63
C3_28795 | AA | 28.33 | 17.78 | 21.67 | 32.22 | 6.13 | 0.49
C3_28795 | AG | 33.81 | 13.33 | 20.48 | 32.38
C3_28795 | GG | 21.69 | 14.46 | 22.89 | 40.96
C3_36735 | AA | 28.24 | 15.29 | 16.47 | 40.00 | 3.86 | 0.70
C3_36735 | AG | 31.38 | 15.48 | 21.34 | 31.80
C3_36735 | GG | 27.27 | 13.39 | 25.17 | 33.57
C3_43118 | AA | 34.55 | 14.55 | 18.18 | 32.73 | 7.79 | 0.25
C3_43118 | AG | 31.72 | 13.66 | 19.82 | 34.80
C3_43118 | GG | 22.06 | 18.38 | 26.47 | 33.09

Table 29 shows that none of the C3_28795, C3_36735 and C3_43118 SNPs was associated with Pseudomonas aeruginosa infection status in CF patients. However, a borderline association was observed for the C3_963 SNP (P = 0.0498). Genotype at the C3_963 polymorphism might therefore contribute to Pseudomonas aeruginosa infection status in CF patients.

(c) Toll-like receptor 4

Table A18 in the Appendix shows that all the SNPs in TLR4 had P values greater than 0.05, so it was unlikely that any of these SNPs influenced Pseudomonas aeruginosa infection status among the CF patients.

(d) Heme oxygenase-1

The chi-square test was carried out for each of the SNPs in the HMOX1 gene, and the results are summarized in Table A19 in the Appendix. As with the TLR4 gene, there was no relationship between genotype and Pseudomonas aeruginosa infection status. However, some genotypes had 0% of patients in some infection-status categories. For example, all patients with the TT genotype for SNP HMOX1_1038 were classified as either "grew Pseudomonas aeruginosa once" or "chronic Pseudomonas aeruginosa infection". Also, all patients with the TT genotype for SNP HMOX1_16442 were classified as "no Pseudomonas aeruginosa infection".
It might be tempting to conclude that these genotypes influenced the observed Pseudomonas aeruginosa infection status; however, this was not supported by the statistical test. Since only 2 individuals had genotype TT at HMOX1_1038 and only 1 had genotype TT at HMOX1_16442, it was not possible to draw valid conclusions regarding the influence of these SNPs on Pseudomonas aeruginosa infection status.

12. Re-genotyping

Re-genotyping was performed as a quality control measure. The selection of SNPs and individuals for re-genotyping is described in the "Materials and Methods". The comparison is shown in Table 30.

Table 30: Comparison of the results of genotyping and re-genotyping

Gene                   SNP     Number of individuals   Errors found
Factor B               2557    618                     1
                       4022    618                     1
                       6484    618                     0
Complement Factor 3    963     609                     3
Toll-like Receptor 4   851     950                     0
                       1859    618                     2
                       2856    618                     1
                       9263    950                     3
                       11912   950                     4
                       15884   661                     0
                       17050   618                     1
Heme oxygenase-1       1038    621                     0
                       2790    657                     2
                       3303    949                     0
                       9531    947                     3
                       16442   618                     1

Although not all SNPs and individuals were included in re-genotyping, the selected SNPs and individuals were considered sufficient to represent the whole study population. As Table 30 shows, the number of errors per SNP ranged from 0 to 4. Such a low error frequency was considered acceptable; possible reasons for the mismatched genotypes include inconsistent DNA concentration, contamination of the different DNA aliquots and random errors.

13. FBAT Analysis of Phenotypic Characteristics

FBAT was another analytical method for determining the association between a particular SNP and a phenotypic trait. FBAT differs from ANOVA in that it uses the inheritance pattern within each family, i.e., the transmission of alleles from parents to their child. FBAT employs a type of transmission disequilibrium test (TDT). Only two of the three genetic models (additive, dominant and recessive) were tested, the additive and the dominant, since with results reported for each allele the dominant model for one allele is equivalent to the recessive model for the other, and the two should give the same result.
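The error counts in Table 30 can be expressed as discordance rates. The sketch below transcribes Table 30 and, assuming that each re-genotyped individual contributes one genotype comparison per SNP, computes the overall and per-SNP discordance; the function name is illustrative.

```python
# (gene, SNP, individuals re-genotyped, discordant genotypes), transcribed from Table 30
TABLE_30 = [
    ("Factor B", "2557", 618, 1),
    ("Factor B", "4022", 618, 1),
    ("Factor B", "6484", 618, 0),
    ("Complement Factor 3", "963", 609, 3),
    ("Toll-like Receptor 4", "851", 950, 0),
    ("Toll-like Receptor 4", "1859", 618, 2),
    ("Toll-like Receptor 4", "2856", 618, 1),
    ("Toll-like Receptor 4", "9263", 950, 3),
    ("Toll-like Receptor 4", "11912", 950, 4),
    ("Toll-like Receptor 4", "15884", 661, 0),
    ("Toll-like Receptor 4", "17050", 618, 1),
    ("Heme oxygenase-1", "1038", 621, 0),
    ("Heme oxygenase-1", "2790", 657, 2),
    ("Heme oxygenase-1", "3303", 949, 0),
    ("Heme oxygenase-1", "9531", 947, 3),
    ("Heme oxygenase-1", "16442", 618, 1),
]


def discordance_summary(rows):
    """Return total comparisons, total errors, overall error rate and the SNP with the highest rate."""
    total_compared = sum(n for _, _, n, _ in rows)
    total_errors = sum(e for _, _, _, e in rows)
    worst = max(rows, key=lambda row: row[3] / row[2])
    return total_compared, total_errors, total_errors / total_compared, worst


compared, errors, rate, worst = discordance_summary(TABLE_30)
print(compared, errors, f"{rate:.2%}")                   # 11620 22 0.19%
print(worst[0], worst[1], f"{worst[3] / worst[2]:.2%}")  # Complement Factor 3 963 0.49%
```

Under that assumption, genotyping and re-genotyping agree for roughly 99.8% of comparisons, and no single SNP exceeds about 0.5% discordance.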
For those SNPs with a P value of less than 0.05, a detailed summary of the test is shown below. The complete FBAT analysis is shown in Tables A20-A25 of the Appendix.

(a) Age of diagnosis

Table 31: Detailed results of FBAT analysis of age of diagnosis for SNPs with a P value of less than 0.05 under the additive model

Marker     Allele  Allele freq.  # of families  S        E(S)     Var(S)  Z      P
C3_28795   1       0.583         332            -40.89   -60.10   82.74   2.11   0.0347
           2       0.417         332            -73.47   -54.26   82.74   -2.11  0.0347
C3_43118   1       0.474         342            -58.60   -77.78   94.39   1.98   0.0484
           2       0.526         342            -87.27   -68.09   94.39   -1.98  0.0484

Only two of the 22 selected SNPs (C3_28795 and C3_43118) showed a significant association with the age of diagnosis. All SNPs were then tested again under the dominant model.

Table 32: Detailed results of FBAT analysis of age of diagnosis for SNPs with a P value of less than 0.05 under the dominant model

Marker     Allele  Allele freq.  # of families  S        E(S)     Var(S)  Z      P
BF_7202    1       0.431         231            -41.85   -40.94   43.19   -0.14  0.889
           2       0.569         196            -40.07   -27.23   34.55   -2.19  0.029
C3_28795   1       0.583         189            -15.82   -20.84   30.73   0.90   0.366
           2       0.417         244            -37.96   -23.75   42.45   -2.18  0.029

As under the additive model, C3_28795 was associated with the age of diagnosis under the dominant model. However, BF_7202, rather than C3_43118, was the second SNP associated with the age of diagnosis under the dominant model.
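Each row of the FBAT tables reports the score statistic S, its expected value E(S) under the null hypothesis of no association and its variance Var(S). The Z column is the standardized difference Z = (S - E(S)) / sqrt(Var(S)), and P is the two-sided normal tail probability. The sketch below recomputes the C3_28795 allele 1 row of Table 31 from its S, E(S) and Var(S); it reproduces only this final standardization step, not FBAT's computation of S and Var(S), and the function name is illustrative.

```python
import math


def fbat_z(s, expected_s, var_s):
    """Standardize an FBAT score statistic and return (Z, two-sided P)."""
    z = (s - expected_s) / math.sqrt(var_s)
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Table 31, C3_28795 allele 1: S = -40.89, E(S) = -60.10, Var(S) = 82.74
z, p = fbat_z(-40.89, -60.10, 82.74)
print(f"Z = {z:.2f}, P = {p:.4f}")  # Z = 2.11, P = 0.0347
```

For a biallelic SNP under the additive model, the two alleles give Z values of equal magnitude and opposite sign, which is why each pair of rows in Tables 31, 33 and 35 shares one P value.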
(b) FEV1 predicted value

Table 33: Detailed results of FBAT analysis of FEV1 predicted value for SNPs with a P value of less than 0.05 under the additive model

Marker       Allele  Allele freq.  # of families  S          E(S)       Var(S)      Z      P value
BF_2557      1       0.184         201            8083.95    9512.89    371921.59   -2.34  0.019
             2       0.816         201            21452.06   20023.12   371921.59   2.34   0.019
TLR4_15884   1       0.149         170            6723.98    7927.42    327605.14   -2.10  0.036
             2       0.851         170            20027.89   18824.45   327605.14   2.10   0.036
HMOX1_1038   1       0.959         59             6037.98    6846.47    110788.10   -2.43  0.015
             2       0.041         59             3324.07    2515.58    110788.10   2.43   0.015

Among the 22 selected SNPs, three (BF_2557, TLR4_15884 and HMOX1_1038) were associated with the FEV1 predicted value when analyzed under the additive model by the FBAT program.

Table 34: Detailed results of FBAT analysis of FEV1 predicted value for SNPs with a P value of less than 0.05 under the dominant model

Marker       Allele  Allele freq.  # of families  S          E(S)       Var(S)      Z      P value
BF_2557      1       0.184         193            6329.50    7829.51    279688.25   -2.84  0.005
             2       0.816         49             2503.32    2574.39    60601.62    -0.29  0.773
BF_4022      1       0.319         232            8853.45    10359.97   341446.47   -2.58  0.010
             2       0.681         118            5761.40    5993.83    156520.15   -0.59  0.556
C3_28795     1       0.583         166            8337.34    8171.95    240014.45   0.34   0.736
             2       0.417         233            9551.50    10781.25   350400.00   -2.08  0.038
TLR4_15884   1       0.149         165            5859.52    6945.67    271528.68   -2.08  0.037
             2       0.851         27             1614.44    1497.16    36498.35    0.61   0.539
HMOX1_1038   1       0.959         4              n/a        n/a        n/a         n/a    n/a
             2       0.041         59             3173.48    2428.05    99750.38    2.36   0.018

In addition to the three SNPs (BF_2557, TLR4_15884 and HMOX1_1038) that were associated with the FEV1 predicted value under the additive model, two more SNPs (BF_4022 and C3_28795) also had a P value of less than 0.05 under the dominant model.
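The FBAT section notes that the recessive model was not run because it would duplicate the dominant results. A small sketch of the usual genotype scores makes the reason concrete, assuming the standard 0/1/2 additive and 0/1 dominant and recessive codings (the helper name is hypothetical): for a biallelic SNP, the dominant score for one allele is one minus the recessive score for the other. Replacing a score X by 1 - X only flips the sign of S - E(S) and leaves Var(S) unchanged, so |Z| and P are identical.

```python
def genotype_score(genotype: str, allele: str, model: str) -> int:
    """Score a two-allele genotype such as "AG" for the tested allele under a genetic model."""
    copies = genotype.count(allele)
    if model == "additive":
        return copies                      # 0, 1 or 2 copies of the tested allele
    if model == "dominant":
        return 1 if copies >= 1 else 0     # at least one copy
    if model == "recessive":
        return 1 if copies == 2 else 0     # two copies required
    raise ValueError(f"unknown genetic model: {model}")


for g in ("AA", "AG", "GG"):
    dominant_a = genotype_score(g, "A", "dominant")
    recessive_g = genotype_score(g, "G", "recessive")
    # The last column is True for every genotype: dominant(A) == 1 - recessive(G)
    print(g, genotype_score(g, "A", "additive"), dominant_a, recessive_g,
          dominant_a == 1 - recessive_g)
```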
(c) FEV1 standard deviation value

Table 35: Detailed results of FBAT analysis of FEV1 standard deviation value for SNPs with a P value of less than 0.05 under the additive model

Marker    Allele  Allele freq.  # of families  S        E(S)     Var(S)   Z      P
BF_4022   1       0.319         264            66.89    90.96    92.05    -2.51  0.012
          2       0.681         264            165.27   141.19   92.05    2.51   0.012
BF_7202   1       0.431         293            99.02    122.02   106.17   -2.23  0.0256
          2       0.569         293            164.93   141.92   106.17   2.23   0.0256

Under the additive model, two SNPs, BF_4022 and BF_7202, had a P value of less than 0.05.

Table 36: Detailed results of FBAT analysis of FEV1 standard deviation value for SNPs with a P value of less than 0.05 under the dominant model

Marker       Allele  Allele freq.  # of families  S       E(S)    Var(S)  Z      P
BF_2557      1       0.184         193            19.32   36.10   48.58   -2.41  0.016
             2       0.816         49             11.60   14.89   9.63    -1.06  0.290
BF_4022      1       0.319         231            39.14   56.00   54.16   -2.29  0.022
             2       0.681         118            38.09   30.88   25.43   1.43   0.153
BF_6484      1       0.893         22             8.41    4.24    4.70    1.93   0.054
             2       0.107         131            34.39   32.62   40.18   0.28   0.779
BF_7202      1       0.431         217            48.80   64.80   52.25   -2.21  0.027
             2       0.569         179            61.86   54.85   38.46   1.13   0.258
HMOX1_9531   1       0.463         215            78.88   55.91   53.01   3.16   0.0016
             2       0.537         194            56.87   52.56   54.17   0.59   0.558

In addition to BF_4022 and BF_7202, which remained significant under the dominant model, BF_2557 and HMOX1_9531 also had a P value of less than 0.05 for FEV1 sd value when analyzed by the FBAT program under the dominant model, and BF_6484 was borderline (P = 0.054).

14. Haplotype Analysis by the RGui Program

In the haplotype analysis, combinations of alleles of the selected SNPs in each candidate gene were examined as a group, in order to investigate any association between the haplotypes and the phenotypic characteristics. Single-SNP analysis had already been done by both ANOVA and FBAT, as described in the previous sections.
However, combinations of SNPs might offer additional insight into the correlation between genotype and phenotype, since the SNPs might interact with one another.

(a) Factor B

(I) Age of diagnosis

Table 37: Analysis of the correlation between haplotypes of selected SNPs in Factor B and the age of diagnosis, with no adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality.

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h11111      0.343                                0.168            2.045     0.0409
h21211      -0.019                               0.217            -0.088    0.9299
h22112      0.552                                0.230            2.405     0.0162
pooled      -0.0506                              0.2632           -0.1922   0.8476

Thirty-two haplotypes were theoretically possible with the five selected SNPs of the Factor B gene. However, only 8 haplotypes were reported by the program, as shown in Table A26 in the Appendix. Since the frequencies of five of the 8 haplotypes were too low, they were combined into a pooled haplotype for the analysis. Two haplotypes (h11111 and h22112) had a P value of less than 0.05, suggesting an association. The analysis was then repeated with adjustment for confounding factors to confirm the association.

Table 38: Analysis of the correlation between haplotypes of selected SNPs in Factor B and the age of diagnosis, with adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality.

Term         Estimate of regression coefficient   Standard error   Z-score   P
h11111       0.3386                               0.1675           2.0210    0.0433
h21211       -0.0169                              0.2158           -0.0784   0.9375
h22112       0.5353                               0.2287           2.3405    0.0193
pooled       -0.0926                              0.2666           -0.3472   0.7284
SEXM         0.0247                               0.1820           0.1359    0.8919
genotypeFO   0.4342                               0.2077           2.0910    0.0365
genotypeOO   0.4385                               0.2903           1.5106    0.1309

The same eight haplotypes were found when the confounding factors were included. Again, both h11111 and h22112 had a P value of less than 0.05, which confirmed the association between these two Factor B haplotypes and age of diagnosis.
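With n biallelic SNPs and alleles coded 1 and 2, as in the haplotype labels above (e.g. h11111), there are 2^n possible haplotypes: 32 for the five Factor B SNPs, 16 for Complement factor 3, 64 for heme oxygenase-1 and 128 for Toll-like receptor 4. Only a few are common, so rare ones are pooled. The sketch below enumerates the possible haplotypes and pools those at or below a frequency cutoff. The 0.05 cutoff is the one quoted for the Hapstat analysis and is only an assumption for the RGui analysis; the frequencies in the example are invented for illustration and are not the Table A26 values.

```python
from itertools import product


def possible_haplotypes(n_snps: int) -> list[str]:
    """All 2**n_snps allele combinations, with alleles coded "1" and "2"."""
    return ["".join(alleles) for alleles in product("12", repeat=n_snps)]


def pool_rare_haplotypes(freqs: dict[str, float], cutoff: float = 0.05) -> dict[str, float]:
    """Keep haplotypes with frequency above the cutoff; sum the others into 'pooled'."""
    kept = {h: f for h, f in freqs.items() if f > cutoff}
    pooled = sum(f for f in freqs.values() if f <= cutoff)
    if pooled > 0:
        kept["pooled"] = pooled
    return kept


print(len(possible_haplotypes(5)), len(possible_haplotypes(7)))  # 32 128

# Invented frequencies for illustration only
example = {"11111": 0.40, "21211": 0.25, "22112": 0.20, "12111": 0.08,
           "22222": 0.04, "21111": 0.03}
print(pool_rare_haplotypes(example))
```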
(II) FEV1 predicted value

Among the eight reported haplotypes shown in Table A27 in the Appendix, none demonstrated a relationship with the FEV1 predicted value. Similar results were found when the confounding factors were included: as shown in Tables A28 and A29 in the Appendix, none of the eight haplotypes in the Factor B gene was associated with the FEV1 predicted value.

(III) FEV1 standard deviation value

Table 39: Analysis of the correlation between haplotypes of selected SNPs in Factor B and FEV1 standard deviation value, with no adjustment for confounding factors

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h11111      -0.1350                              0.0914           -1.4772   0.1396
h21211      -0.1512                              0.1163           -1.3002   0.1935
h22112      -0.1050                              0.1156           -0.9084   0.3636
pooled      -0.3546                              0.1366           -2.5957   0.0094

As Table 39 indicates, the pooled haplotypes had a P value of 0.0094, which suggested an association with the phenotype. However, the pooled group included five low-frequency haplotypes (Table A30 of the Appendix), so it was not possible to determine which haplotype in the group was responsible for the observed result.

Table 40: Analysis of the correlation between haplotypes of selected SNPs in Factor B and FEV1 standard deviation value, with adjustment for the confounding factors

Term         Estimate of regression coefficient   Standard error   Z-score   P
h11111       -0.1421                              0.0915           -1.5538   0.1202
h21211       -0.1574                              0.1160           -1.3567   0.1749
h22112       -0.1225                              0.1159           -1.0569   0.2906
pooled       -0.3946                              0.1389           -2.8402   0.0045
SEXM         0.0053                               0.0981           0.0537    0.9572
genotypeFO   0.1348                               0.1128           1.1943    0.2323
genotypeOO   -0.1104                              0.1537           -0.7182   0.4726

As shown in Table 40, the pooled haplotypes again had a significant P value, confirming the result obtained without adjustment for confounding factors. However, this analysis could not attribute the association to any single one of the five haplotypes in the pooled group.
(b) Complement Factor 3

(I) Age of diagnosis

Table 41: Analysis of the correlation between haplotypes of selected SNPs in Complement factor 3 and the age of diagnosis, with no adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality.

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h1112       -0.3883                              0.3514           -1.1051   0.2691
h1122       0.0310                               0.3892           0.0798    0.9364
h1212       -0.9269                              0.4435           -2.0899   0.0366
h2121       -0.1406                              0.4514           -0.3114   0.7555
h2122       0.0846                               0.3576           0.2366    0.8130
h2212       -0.1365                              0.3439           -0.3969   0.6915
pooled      -0.1417                              0.2889           -0.4905   0.6238

Four SNPs were included in the analysis of the Complement factor 3 gene, and all 16 possible haplotypes were detected in the participants (Table A31 of the Appendix). However, only the six haplotypes with high frequencies were examined individually, while the others were grouped together. As Table 41 indicates, only h1212 had a P value of less than 0.05, suggesting a significant relationship between this haplotype and age of diagnosis. The analysis was repeated taking the confounding factors into consideration.

Table 42: Analysis of the correlation between haplotypes of selected SNPs in Complement factor 3 and the age of diagnosis, with adjustment for confounding factors. The age of diagnosis was logarithmically transformed for normality.

Term         Estimate of regression coefficient   Standard error   Z-score   P
h1112        -0.3764                              0.3484           -1.0804   0.2800
h1122        0.0112                               0.3849           0.0290    0.9769
h1212        -1.1090                              0.4397           -2.5223   0.0117
h2121        -0.1822                              0.4379           -0.4160   0.6774
h2122        0.0093                               0.3570           0.0261    0.9792
h2212        -0.1949                              0.3399           -0.5735   0.5663
pooled       -0.1463                              0.2859           -0.5117   0.6088
SEXM         0.1086                               0.1812           0.5994    0.5489
genotypeFO   0.4907                               0.2076           2.3639    0.0181
genotypeOO   0.4984                               0.2899           1.7190    0.0856

A similar result was obtained after correcting for the confounding factors, as shown in Table 42, again indicating an association of h1212 with the age of diagnosis.
(II) FEV1 predicted value

As shown in Table A32 in the Appendix, 16 haplotypes were detected when testing for a correlation between the haplotypes in Complement Factor 3 and FEV1 predicted value. As shown in Table A33 in the Appendix, neither the six individual haplotypes nor the pooled group was associated with FEV1 predicted value when no adjustments were made for confounding factors. This conclusion was confirmed when the confounding factors were included: the results in Table A34 in the Appendix matched those without adjustment, suggesting that none of the haplotypes in Complement factor 3 was related to the observed FEV1 predicted value after adjustment for confounding factors.

(III) FEV1 standard deviation value

Sixteen haplotypes were found when testing for a correlation between the haplotypes in Complement Factor 3 and FEV1 standard deviation value, as indicated in Table A35 in the Appendix. As shown in Table A36 in the Appendix, all of the haplotypes had a P value greater than 0.05 when no adjustments were made for confounding factors; that is, none of the haplotypes was related to the FEV1 standard deviation value observed in the patients. Table A37 in the Appendix shows that none of the haplotypes in Complement factor 3 had a significant relationship with the FEV1 standard deviation value, even when the confounding factors were included.

(c) Toll-like Receptor 4

(I) Age of diagnosis

Table 43: Analysis of the correlation between haplotypes of selected SNPs in Toll-like receptor 4 and the age of diagnosis, with no adjustment for confounding factors.
The age of diagnosis was logarithmically transformed for normality.

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h1122112    -0.1485                              0.2048           -0.7250   0.4684
h1122122    0.1282                               0.1835           0.6990    0.4846
h1212222    -0.3101                              0.2334           -1.3284   0.1840
h1221221    -0.0963                              0.2422           -0.3977   0.6908
h1222222    -0.3622                              0.2673           -1.3554   0.1753
pooled      -0.6202                              0.2912           -2.1296   0.0332

With the seven SNPs selected in the Toll-like receptor 4 gene, 128 haplotypes were theoretically possible. However, only 15 haplotypes were detected in the sample (Table A38 in the Appendix). Five of the 15 haplotypes were analyzed individually, whereas the rest were grouped. Only the pooled group had a P value of less than 0.05; however, it was not possible to determine which haplotype in the pooled group was driving the observed result.

Table 44: Analysis of the correlation between haplotypes of selected SNPs in Toll-like receptor 4 and the age of diagnosis, with adjustment for the confounding factors. The age of diagnosis was logarithmically transformed for normality.

Term         Estimate of regression coefficient   Standard error   Z-score   P
h1122112     -0.1676                              0.2049           -0.8178   0.4135
h1122122     0.1098                               0.1828           0.6006    0.5481
h1212222     -0.3163                              0.2325           -1.3604   0.1737
h1221221     -0.1469                              0.2428           -0.6052   0.5451
h1222222     -0.4295                              0.2667           -1.6105   0.1073
pooled       -0.6439                              0.2887           -2.2302   0.0257
SEXM         0.0433                               0.1817           0.2381    0.8118
genotypeFO   0.4723                               0.2067           2.2852    0.0223
genotypeOO   0.4487                               0.2894           1.5505    0.1210

As Table 44 indicates, the pooled group had a P value of 0.0257; that is, one or more of the haplotypes in the group might be related to the observed age of diagnosis in the patients.

(II) FEV1 predicted value

Fifteen haplotypes were detected when testing for a correlation between haplotypes in Toll-like receptor 4 and FEV1 predicted value, as shown in Table A39 in the Appendix.
Neither the five reported haplotypes nor the pooled group shown in Table A40 in the Appendix had a P value of less than 0.05, so none showed a significant relationship with the FEV1 predicted value. The results adjusted for confounding factors (Table A41 in the Appendix) led to the same conclusion as the unadjusted analysis: none of the haplotypes was related to the FEV1 predicted value.

(III) FEV1 standard deviation value

As indicated in Table A42 in the Appendix, 15 haplotypes were included when testing for a correlation between haplotypes in Toll-like receptor 4 and FEV1 standard deviation value. As with FEV1 predicted value, none of the haplotypes was related to the FEV1 standard deviation value (Table A43 in the Appendix). Table A44 in the Appendix further strengthened this conclusion, showing that none of the haplotypes had a significant association with the FEV1 standard deviation value after adjustment for confounding factors.

(d) Heme Oxygenase-1

(I) Age of diagnosis

As indicated in Table A45 in the Appendix, 64 haplotypes were theoretically possible from the six SNPs selected in the heme oxygenase-1 gene, but only 15 were observed in the sample. The four high-frequency haplotypes were considered separately, while the rest were grouped together. None of them was associated with the age of diagnosis when no adjustments were made for confounding factors (Table A46 of the Appendix). Likewise, no haplotype was related to the observed age of diagnosis after adjustment for confounding factors (Table A47 in the Appendix).
(II) FEV1 predicted value

Table 45: Analysis of the correlation between haplotypes of selected SNPs in heme oxygenase-1 and FEV1 predicted value, with no adjustment for the confounding factors

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h112221     -6.3390                              2.2781           -2.7826   0.0054
h211221     -4.2824                              3.3487           -1.2788   0.2010
h212221     -4.9749                              4.4609           -1.1152   0.2648
pooled      2.3963                               3.1362           0.7641    0.4448

Fourteen haplotypes were detected when testing for a correlation between haplotypes in Heme oxygenase-1 and FEV1 predicted value, as shown in Table A48 in the Appendix. Table 45 shows that h112221 had a P value of 0.0054, indicating that this haplotype was associated with the FEV1 predicted value.

Table 46: Analysis of the correlation between haplotypes of selected SNPs in heme oxygenase-1 and FEV1 predicted value, with adjustment for the confounding factors

Term         Estimate of regression coefficient   Standard error   Z-score   P
h112221      -6.3269                              2.2775           -2.7780   0.0055
h211221      -3.9819                              3.3483           -1.1893   0.2343
h212221      -5.0514                              4.4708           -1.1298   0.2585
pooled       2.4056                               3.1343           0.7675    0.4428
SEXM         -3.1139                              2.7183           -1.1455   0.2520
genotypeFO   2.0136                               3.1297           0.6434    0.5200
genotypeOO   0.2597                               4.2367           0.0613    0.9511

When the confounding factors were taken into account, the h112221 haplotype also had a P value of less than 0.05, confirming the association between h112221 and the FEV1 predicted value.
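In the regression-based haplotype tables (Tables 37 to 48), the Z-score column appears to be the estimated coefficient divided by its standard error, with P the corresponding two-sided normal probability (a Wald test). This is a reading of the column layout rather than something the text states; the sketch below checks it against the h112221 row of Table 45, and the function name is illustrative.

```python
import math


def wald_z_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Wald Z-score and two-sided P value for a single regression coefficient."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Table 45, h112221: estimate -6.3390, standard error 2.2781
z, p = wald_z_test(-6.3390, 2.2781)
print(f"Z = {z:.4f}, P = {p:.4f}")  # Z = -2.7826, P = 0.0054
```

Small differences from the tabulated Z-scores in other rows (e.g. 0.343 / 0.168 = 2.04 versus 2.045 for h11111 in Table 37) are expected, because the printed coefficients and standard errors are rounded.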
(III) FEV1 standard deviation value

Table 47: Analysis of the correlation between haplotypes of selected SNPs in heme oxygenase-1 and FEV1 standard deviation value, with no adjustment for the confounding factors

Haplotype   Estimate of regression coefficient   Standard error   Z-score   P
h112221     -0.2351                              0.0815           -2.8838   0.0039
h211221     -0.2813                              0.1198           -2.3488   0.0188
h212221     -0.0639                              0.1611           -0.3965   0.6917
pooled      -0.0584                              0.1124           -0.5196   0.6033

As shown in Table A49 in the Appendix, 14 haplotypes were included when testing for a correlation between haplotypes in Heme oxygenase-1 and FEV1 standard deviation value. Two of the haplotypes, h112221 and h211221, had a P value of less than 0.05, indicating a significant association with the FEV1 standard deviation value.

Table 48: Analysis of the correlation between haplotypes of selected SNPs in heme oxygenase-1 and FEV1 standard deviation value, with adjustment for the confounding factors

Term         Estimate of regression coefficient   Standard error   Z-score   P
h112221      -0.2336                              0.0816           -2.8635   0.0042
h211221      -0.2796                              0.1200           -2.3300   0.0198
h212221      -0.0537                              0.1614           -0.3325   0.7395
pooled       -0.0644                              0.1125           -0.5720   0.5673
SEXM         0.0203                               0.0973           0.2086    0.8347
genotypeFO   0.0576                               0.1120           0.5143    0.6070
genotypeOO   -0.0837                              0.1516           -0.5520   0.5810

Table 48 leads to the same conclusion as Table 47: both h112221 and h211221 had a significant relationship with the FEV1 standard deviation value.

15. Haplotype Analysis by the FBAT Program

Haplotype analysis was repeated with a second program, FBAT, which might give results different from those of the RGui program, because FBAT is a transmission disequilibrium test that considers the inheritance pattern within each family. However, consistent results from these two parts of the project would support a stronger conclusion.
(a) Factor B

With the five SNPs selected in the Factor B gene, only ten of the 32 possible haplotypes were detected in the sample (Table A50 in the Appendix).

(I) Age of diagnosis

As can be seen from Table A51 in the Appendix, none of the haplotypes in Factor B was significantly associated with the age of diagnosis.

(II) FEV1 predicted value

Table 49: Analysis of the correlation between haplotypes of selected SNPs in Factor B and FEV1 predicted value by the FBAT program

Haplotype   # of families   S           E(S)        Var(S)       Z        P value
h1          233             24199.545   23181.74    571383.658   1.346    0.178
h2          162             7280.643    8812.768    360007.335   -2.554   0.011
h3          108             5376.241    5192.011    197300.266   0.415    0.678
h4          114             5409.311    5022.368    203605.092   0.858    0.391
h5          50              2192.113    2266.254    89422.653    -0.248   0.804
h6          26              1098.942    1109.629    44874.197    -0.050   0.960
h7          19              692.77      703.602     28218.472    -0.064   0.949
h8          0               n/a         n/a         n/a          n/a      n/a
h9          1               n/a         n/a         n/a          n/a      n/a
h10         0               n/a         n/a         n/a          n/a      n/a

Among the tested haplotypes, h2 (11111) had a P value of 0.011, indicating a significant relationship with the FEV1 predicted value.

(III) FEV1 standard deviation value

Table 50: Analysis of the correlation between haplotypes of selected SNPs in Factor B and FEV1 standard deviation value by the FBAT program

Haplotype   # of families   S         E(S)      Var(S)   Z        P value
h1          233             157.872   131.288   86.494   2.858    0.004
h2          162             23.583    39.301    59.438   -2.039   0.041
h3          107             28.777    31.268    39.812   -0.395   0.693
h4          113             27.559    28.826    27.411   -0.242   0.809
h5          50              5.966     11.568    11.986   -1.618   0.106
h6          26              0.443     3.711     8.222    -1.140   0.254
h7          19              4.849     3.14      5.123    0.755    0.450
h8          0               n/a       n/a       n/a      n/a      n/a
h9          1               n/a       n/a       n/a      n/a      n/a
h10         0               n/a       n/a       n/a      n/a      n/a

Two haplotypes, h11111 and h22121, had a P value of less than 0.05, indicating that both were associated with the FEV1 standard deviation value.
(b) Complement Factor 3

All 16 haplotypes possible with the four selected SNPs in the complement factor 3 gene were observed in the sample (Table A52 in the Appendix).

(I) Age of diagnosis

Table 51: Analysis of the correlation between haplotypes of selected SNPs in Complement factor 3 and age of diagnosis by the FBAT program. Age of diagnosis was logarithmically transformed for normality.

Haplotype   # of families   S         E(S)      Var(S)   Z        P value
h1          164             -20.120   -24.035   32.335   0.689    0.491
h2          139             -13.746   -18.993   31.238   0.939    0.348
h3          129             -11.276   -11.547   20.088   0.06     0.952
h4          121             -16.908   -15.696   22.35    -0.256   0.798
h5          100             -11.562   -17.531   19.37    1.356    0.175
h6          98              -21.861   -14.178   18.548   -1.784   0.074
h7          105             -13.969   -13.781   15.621   -0.048   0.962
h8          76              -4.641    -5.385    13.185   0.205    0.838
h9          77              -9.517    -9.925    14.561   0.107    0.915
h10         71              -9.672    -8.116    12.435   -0.441   0.659
h11         72              -19.591   -12.871   15.543   -1.704   0.088
h12         69              -3.127    -4.449    12.926   0.368    0.713
h13         65              -3.385    -9.696    8.873    2.119    0.034
h14         37              -6.114    -1.435    11.447   -1.383   0.167
h15         30              -6.596    -3.471    4.687    -1.443   0.149
h16         34              -2.501    -3.474    4.957    0.437    0.662

Among the 16 observed haplotypes, h13 (1111) had a P value of 0.034, indicating a significant relationship with age of diagnosis.

(II) FEV1 predicted value

As indicated in Table A53 in the Appendix, none of the 16 haplotypes had a P value of less than 0.05 for FEV1 predicted value.
(III) FEV1 standard deviation value

Table 52: Analysis of the correlation between haplotypes of selected SNPs in Complement factor 3 and FEV1 standard deviation value by the FBAT program

Haplotype   # of families   S        E(S)     Var(S)   Z        P value
h1          146             54.271   53.023   37.776   0.203    0.839
h2          132             46.542   42.171   52.292   0.605    0.545
h3          124             35.346   30.546   27.929   0.908    0.364
h4          107             24.055   28.920   35.015   -0.822   0.411
h5          92              32.645   31.652   23.609   0.204    0.838
h6          89              12.940   24.835   24.024   -2.427   0.015
h7          97              24.051   22.509   19.450   0.350    0.727
h8          63              27.133   23.455   30.673   0.664    0.507
h9          69              15.999   14.696   17.247   0.314    0.754
h10         66              14.908   15.584   16.326   -0.167   0.867
h11         63              14.001   16.883   15.587   -0.730   0.465
h12         61              21.521   17.252   16.101   1.064    0.287
h13         61              21.365   19.758   15.987   0.402    0.688
h14         33              5.338    8.541    10.861   -0.972   0.331
h15         30              6.126    5.778    5.522    0.148    0.882
h16         29              4.254    4.891    4.121    -0.313   0.754

Haplotype h6 (1212) had a P value of 0.015, suggesting an association between haplotype 1212 and the FEV1 standard deviation value.

(c) Toll-like Receptor 4

With the seven selected SNPs in the Toll-like receptor 4 gene, 17 of the 128 possible haplotypes were reported by the FBAT program (Table A54 in the Appendix).

(I) Age of diagnosis

As shown in Table A55 in the Appendix, none of the haplotypes was associated with the age of diagnosis.
(II) FEV1 predicted value

Table 53: Analysis of the correlation between haplotypes of selected SNPs in Toll-like receptor 4 and FEV1 predicted value by the FBAT program

Haplotype   # of families   S           E(S)        Var(S)       Z        P value
h1          215             13935.028   14499.250   503036.881   -0.796   0.426
h2          198             13166.095   12221.084   434711.069   1.433    0.152
h3          149             6730.848    7850.667    302636.978   -2.036   0.042
h4          114             5983.613    5480.313    181054.614   1.183    0.237
h5          112             6168.453    5826.104    212860.982   0.742    0.458
h6          76              4058.668    3499.135    122261.301   1.600    0.110
h7          57              2226.345    2409.595    98033.212    -0.585   0.558
h8          18              406.307     727.964     33417.196    -1.760   0.078
h9          10              172.570     330.288     12434.211    -1.414   0.157
h10         2               n/a         n/a         n/a          n/a      n/a
h11         2               n/a         n/a         n/a          n/a      n/a
h12         2               n/a         n/a         n/a          n/a      n/a
h13         1               n/a         n/a         n/a          n/a      n/a
h14         1               n/a         n/a         n/a          n/a      n/a
h15         1               n/a         n/a         n/a          n/a      n/a
h16         0               n/a         n/a         n/a          n/a      n/a
h17         1               n/a         n/a         n/a          n/a      n/a
h18         1               n/a         n/a         n/a          n/a      n/a
h19         0               n/a         n/a         n/a          n/a      n/a
h20         1               n/a         n/a         n/a          n/a      n/a
h21         1               n/a         n/a         n/a          n/a      n/a
h22         0               n/a         n/a         n/a          n/a      n/a
h23         0               n/a         n/a         n/a          n/a      n/a

One of the haplotypes, h3 (1122112), had a P value of less than 0.05, indicating that this haplotype was associated with the FEV1 predicted value observed in patients.

(III) FEV1 standard deviation value

As indicated in Table A56 in the Appendix, none of the observed haplotypes in Toll-like receptor 4 had a significant relationship with the FEV1 standard deviation value.

(d) Heme Oxygenase-1

With the six selected SNPs in the heme oxygenase-1 gene, 64 haplotypes were possible, of which only 20 were observed in the participants, as indicated in Table A57 in the Appendix.

(I) Age of diagnosis

None of the haplotypes detected in the sample had a P value of less than 0.05, as shown in Table A58 in the Appendix.
(II) FEV1 predicted value

Table 54: Analysis of the correlation between haplotypes of selected SNPs in Heme oxygenase-1 and FEV1 predicted value by the FBAT program

Haplotype   # of families   S           E(S)        Var(S)       Z        P value
h1          234             17514.369   18099.875   516761.732   -0.814   0.415
h2          200             14639.825   14051.769   445759.360   0.881    0.378
h3          114             4934.857    5071.520    205314.256   -0.302   0.763
h4          69              2971.849    2932.472    114032.044   0.117    0.907
h5          44              2769.310    2171.419    78544.271    2.133    0.033
h6          47              1833.378    2138.615    73738.049    -1.124   0.261
h7          15              613.016     631.542     20264.602    -0.130   0.896
h8          8               n/a         n/a         n/a          n/a      n/a
h9          8               n/a         n/a         n/a          n/a      n/a
h10         7               n/a         n/a         n/a          n/a      n/a
h11         3               n/a         n/a         n/a          n/a      n/a
h12         2               n/a         n/a         n/a          n/a      n/a
h13         2               n/a         n/a         n/a          n/a      n/a
h14         0               n/a         n/a         n/a          n/a      n/a
h15         1               n/a         n/a         n/a          n/a      n/a
h16         1               n/a         n/a         n/a          n/a      n/a
h17         1               n/a         n/a         n/a          n/a      n/a
h18         0               n/a         n/a         n/a          n/a      n/a
h19         0               n/a         n/a         n/a          n/a      n/a
h20         1               n/a         n/a         n/a          n/a      n/a
h21         0               n/a         n/a         n/a          n/a      n/a

One haplotype, h5 (221211), had a P value of 0.033 and was therefore considered to have a significant relationship with the FEV1 predicted value.

(III) FEV1 standard deviation value

None of the observed haplotypes had a P value of less than 0.05 (Table A59 in the Appendix); consequently, no haplotype in the heme oxygenase-1 gene was significantly associated with the FEV1 standard deviation value.

16. Haplotype Analysis of the Age of First Pseudomonas aeruginosa Infection Utilizing Hapstat

In addition to examining single-locus associations between the 22 selected SNPs and the age of first Pseudomonas aeruginosa infection, the relationship between this phenotypic trait and the haplotypes formed by the SNPs within each gene was investigated. This test could not be done with either the RGui program or FBAT, because the age of first Pseudomonas aeruginosa infection was not available (i.e., was censored) for patients who had not been colonized by the organism during the data collection period.
Therefore, this analysis was performed with the Hapstat program.

(a) Factor B

With the five selected SNPs in the Factor B gene, 32 haplotypes were theoretically possible. However, only 3 haplotypes were analyzed, using a cutoff haplotype frequency of greater than 0.05. As indicated in Table A60 in the Appendix, none of the haplotypes was related to the age of first Pseudomonas aeruginosa infection.

(b) Complement Factor 3

With the four selected SNPs in the Complement factor 3 gene, 16 haplotypes were possible. However, only 8 haplotypes were analyzed, using a cutoff haplotype frequency of greater than 0.05. As indicated in Table A61 in the Appendix, none of the haplotypes was related to the age of first Pseudomonas aeruginosa infection.

(c) Toll-like Receptor 4

With the seven selected SNPs in the Toll-like receptor 4 gene, 128 haplotypes were possible. However, only 5 haplotypes were analyzed, using a cutoff haplotype frequency of greater than 0.05. As indicated in Table A62 in the Appendix, none of the haplotypes was related to the age of first Pseudomonas aeruginosa infection.

(d) Heme Oxygenase-1

With the six selected SNPs in the Heme oxygenase-1 gene, 64 haplotypes were possible. However, only 4 haplotypes were analyzed, using a cutoff haplotype frequency of greater than 0.05. As indicated in Table A63 in the Appendix, none of the haplotypes was related to the age of first Pseudomonas aeruginosa infection.

Chapter 4
Discussion

This project is a sub-study of a large Canada-wide and international endeavor: the Canadian Consortium for Cystic Fibrosis Modifiers (http://www.cfmod.ca). It is well established that Cystic Fibrosis is an autosomal recessive disease caused by mutations in the Cystic Fibrosis Transmembrane Conductance Regulator (CFTR) gene located on chromosome 7.
There is a wide range of clinical symptoms among affected individuals, such as deterioration of the lungs, pancreatic insufficiency and liver disease. This heterogeneity is seen not only between patients from different families; even siblings can have very different clinical presentations. As mentioned before, the various CFTR mutations can be classified into five categories, which help to explain some of the differences in phenotypic characteristics of CF patients. However, as the literature illustrates, other factors, both environmental and genetic, can affect the course of the disease. The main aim of this project is to identify these secondary genetic factors, the so-called modifier genes, that influence the severity of the disease.

1. Analysis of the Genotypic Data for Mendelian Inconsistencies

There were 1605 individuals from 535 trios (535 patients and both of their parents) that participated in this sub-study. According to Gregor Mendel's theory of heredity, each parent should contribute one of his or her two alleles to the child. Since the major purpose of this study was to investigate the relationship between polymorphisms of the candidate genes and disease status, this inheritance pattern had to be checked to exclude Mendelian errors that might have led to misinterpretation of the results. Such errors might be caused by contamination of samples during sample collection and/or genotyping, incorrect labeling of the identities of individuals, non-paternity and random genotyping errors. As described in the Results section, Mendelian inconsistencies were detected in some of the SNPs for some of the families, which indicated that one or more of these factors were present in the study. Genotyping results for those families were compared with the analysis done by our colleagues in Toronto, and the same conclusion was obtained.
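The Mendelian check described above amounts to asking, for each trio and SNP, whether the child's genotype can be formed from one paternal and one maternal allele. In this study the check was done with the FBAT program; the sketch below is a hypothetical stand-alone version, with genotypes written as two-letter strings and undetermined genotypes represented as None and skipped.

```python
from typing import Dict, Optional, Tuple

Genotype = Optional[str]  # e.g. "AG"; None for an undetermined call


def is_mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """True if the child carries one allele from each parent (allele order ignored)."""
    a, b = child
    return (a in father and b in mother) or (b in father and a in mother)


def count_inconsistencies(trio: Dict[str, Tuple[Genotype, Genotype, Genotype]]) -> int:
    """Count SNPs with a Mendelian error in one trio, skipping undetermined genotypes."""
    errors = 0
    for father, mother, child in trio.values():
        if father is None or mother is None or child is None:
            continue
        if not is_mendelian_consistent(father, mother, child):
            errors += 1
    return errors


# Hypothetical trio: (father, mother, child) genotypes per SNP
trio = {
    "SNP_a": ("AA", "GG", "AG"),  # consistent
    "SNP_b": ("AA", "GG", "AA"),  # child cannot have received an A from the mother
    "SNP_c": ("AG", None, "GG"),  # undetermined maternal genotype, skipped
}
print(count_inconsistencies(trio))  # 1
```

Summing such counts over SNPs gives a per-family error tally of the kind used to decide which families to exclude.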
Our colleagues in Toronto used genotyping platforms (Illumina and Luminex) that differed from the TaqMan assays used in this study, yet they reached the same conclusion. This made it unlikely that the inconsistencies were artifacts of the genotyping methodology. Of the 23 families that were excluded completely from this study, 22 were also found to have Mendelian inconsistencies by our colleagues in Toronto. For example, for pedigree #9 Toronto found 16 and 26 errors by Illumina and Luminex, respectively, while we detected 7 errors by TaqMan assays. Including those families would have reduced the accuracy of the results and weakened the statistical power, so those families were excluded from the rest of the study.
2. Genotypes of all Participating Individuals
The selected SNPs in the four candidate modifier genes were genotyped with the appropriate TaqMan assays. The genotypes of some individuals could not be determined, and these were indicated as "undetermined" in the tables in the Results section. One possible explanation for the undetermined genotypes was an inadequate concentration of DNA in the source plates for amplification by PCR. This explanation seemed likely because the same individuals often had an "undetermined" genotype for several of the selected SNPs. Fortunately, this problem occurred in only a small portion of all individuals. A total of 22 SNPs in the four candidate modifier genes were selected for testing. All three genotype groups were observed for each SNP, and a coherent pattern was seen in both parents and patients for the same SNP (as illustrated in Tables A3-A6). This was a further indication of the reliability of the genotyping results.
3. Genotyping Analysis of the Parental Population
In addition to using the Family Based Association Test (FBAT) program to check for Mendelian inheritance, two additional analyses were performed on the parental population as quality control procedures. The first approach was to test for Hardy-Weinberg Equilibrium among the parental samples. A population is said to be in Hardy-Weinberg Equilibrium when its allele and genotype frequencies remain constant through successive generations. For a biallelic locus with allele frequencies p and q, the expected genotype frequencies are then p², 2pq and q². This will be the case if no outside driving forces are acting upon the population. There are generally five assumptions for the establishment of Hardy-Weinberg Equilibrium: (1) a large population; (2) no mutation at the locus of interest; (3) no migration; (4) random mating; and (5) no natural selection at the locus of interest. Hardy-Weinberg Equilibrium in the parental population was assessed by comparing the observed and the expected genotype frequencies. Finding Hardy-Weinberg Equilibrium among parental individuals indicated that the data were consistent with the above five assumptions. It also indicated that the study population was not subject to any major bias due to an inappropriate sampling method. In addition, the presence of Hardy-Weinberg Equilibrium suggested that the genotyping assays were accurate. Hardy-Weinberg Equilibrium was found for all 22 SNPs except SNP 149 of the HMOX1 gene. In both the parental and patient populations, fewer samples were successfully genotyped for SNP 149 than for the other SNPs. This suggested that the assay was not as robust as the others, which might partly explain the departure from Hardy-Weinberg Equilibrium in the parental samples for this SNP. As a result, the genotyping results of SNP 149 should be viewed with caution.
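A common way to compare observed and expected genotype counts is a Pearson chi-square test with one degree of freedom. There are three genotype classes; one degree of freedom is lost to the total and one to the estimated allele frequency. The thesis does not state which HWE test its software used, so the Pearson version below is an assumption, and the function name and counts are hypothetical. The sketch also reports whether any departure is an excess or a deficit of heterozygotes. That distinction matters for interpreting population stratification.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> tuple[float, float, str]:
    """Pearson chi-square test of Hardy-Weinberg Equilibrium for one biallelic SNP.

    Returns (chi-square statistic, P value with 1 degree of freedom,
    direction of any departure in the heterozygote class).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("SNP is monomorphic in this sample; the HWE test is undefined")
    expected = (p * p * n, 2 * p * q * n, q * q * n)  # p^2, 2pq, q^2 times n
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    if math.isclose(n_ab, expected[1]):
        direction = "none"
    elif n_ab > expected[1]:
        direction = "heterozygote excess"
    else:
        direction = "heterozygote deficit"
    return chi2, p_value, direction


# Hypothetical parental genotype counts for one SNP.
print(hwe_chi_square(100, 800, 100))
```

For very small genotype classes an exact HWE test is often preferred to this chi-square approximation.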
The departure from Hardy-Weinberg Equilibrium was due to an excess of heterozygotes and was therefore unlikely to be due to unidentified population stratification in the study population. This conclusion follows from the Wahlund Effect, described in 1928 by the Swedish scientist Sten Wahlund. The effect states that the number of heterozygotes is lower than expected when populations with different allele frequencies are pooled together for analysis.[43] Hidden stratification would therefore produce a deficit, not an excess, of heterozygotes. The second approach was to compare the allele frequency of each SNP in the parents with the values listed on the IIPGA and Seattle SNPs websites. Both websites show genotyping results for 23 Caucasian individuals from the State of Utah. The allele frequency of each SNP on these websites was compared with the genotypes detected in the parental group of this study by a 2x2 Chi-square test, which served as another check on the reliability of the TaqMan genotyping assays. This second method led to a different conclusion from the first: SNPs 1038 and 16442 of the HMOX1 gene showed a P-value of less than 0.05, indicating that there might be unidentified errors in the genotyping results of these two SNPs. However, the magnitude of the difference was small (<10% for both SNPs). The discrepancy in allele frequency between the participating parents and the Caucasian samples listed by the Seattle SNPs website might be explained in two ways. Firstly, only 23 Caucasian samples were included on the Seattle SNPs website. This small sample size would reduce the accuracy of the allele frequencies calculated from these data. Ideally, other research centers would also have genotyped these two SNPs, so that another comparison would be possible. However, only Seattle SNPs contains data for SNP HMOX1_1038.
For the other SNP, HMOX1_16442, Seattle SNPs genotyped an additional 60 DNA samples from another European population and found allele frequencies of 0.92 and 0.08 for alleles A and T, respectively. These values are closer to those found in this study, which supports the accuracy of the genotyping data. Secondly, Caucasian is a general racial classification, and people in this group can have different genetic backgrounds. There could therefore be genuine allele frequency differences between our study group and the individuals genotyped for the Seattle SNPs project. For these reasons, the results for both SNP 1038 and SNP 16442 of the HMOX1 gene were considered acceptable.
4. ANOVA, Regression Analysis and FBAT
Two approaches were used to test for any correlation between the 22 selected SNPs and the 3 phenotypic traits (age of diagnosis, FEV1 predicted value and FEV1 standard deviation value). The results from the two methods were then compared to reinforce the conclusions. The first method was the one-way ANOVA. The dependent variables in an ANOVA, i.e. the selected phenotypic characteristics, must be normally distributed before any conclusion can be drawn. A goodness of fit test was used to check the normality of the data, and only the age of diagnosis dataset was found to be skewed. Age of diagnosis was therefore transformed by taking the logarithm to the base 10 to obtain an approximately normally distributed variable. In the ANOVA, 10 genotype-phenotype pairs had a P value of less than 0.05, which suggested the existence of an association. These 10 pairs were: BF_2557 with FEV1 predicted value and standard deviation; BF_7202 with age of diagnosis, FEV1 predicted value and standard deviation; TLR4_1859 with age of diagnosis; HMOX1_2790 with FEV1 predicted value and standard deviation; and lastly, HMOX1_9531 with FEV1 predicted value and standard deviation.
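The log transformation and the one-way ANOVA described above can be sketched with the standard library alone. The function names and example ages are hypothetical. The sketch computes only the F statistic and its degrees of freedom. Converting F to a P value needs the F distribution, which in practice comes from a statistics package (for example `scipy.stats.f_oneway`, which returns both the statistic and the P value).

```python
import math
from statistics import fmean


def log10_transform(values: list[float]) -> list[float]:
    """Base-10 log transform for a right-skewed variable such as age of diagnosis."""
    if any(v <= 0 for v in values):
        raise ValueError("log10 needs strictly positive values (ages of 0 must be recoded first)")
    return [math.log10(v) for v in values]


def one_way_anova_f(groups: list[list[float]]) -> tuple[float, int, int]:
    """Return the one-way ANOVA F statistic and its (between, within) degrees of freedom."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = fmean(values)
    # Variation of the genotype-group means around the overall mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual patients around their own genotype-group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Hypothetical ages of diagnosis (years), grouped by genotype at one SNP.
age_by_genotype = {
    "AA": [0.5, 1.0, 2.0, 4.0],
    "AG": [1.0, 3.0, 6.0, 12.0],
    "GG": [2.0, 8.0, 15.0, 30.0],
}
groups = [log10_transform(ages) for ages in age_by_genotype.values()]
f_stat, df1, df2 = one_way_anova_f(groups)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

The comparison of genotype-group means is the reason unequal group sizes and outliers, discussed later in this chapter, affect the ANOVA.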
However, because the FEV1 predicted value and standard deviation were highly correlated (r² = 0.64), associations with both variables were to be expected and could not be considered independent observations. Further investigation was done to confirm these significant associations. Regression analysis was performed for those 10 SNP-phenotype pairs, with age, sex and CFTR genotype included as potential confounding factors. These factors were chosen because they might contribute to the observed phenotypes in addition to the tested SNP. For example, it is known that FEV1 percent predicted values decline with age in CF patients. It was therefore essential to take these confounding factors into consideration before reaching a conclusion. As indicated in the tables in the Results section, sex had no confounding effect on the observed phenotypes. CFTR genotype accounted for part of the association of both BF_7202 and TLR4_1859 with age of diagnosis. CFTR genotype was related to the age of diagnosis because individuals with a milder form of CFTR mutation were diagnosed at an older age. Indeed, later diagnosis is one of the reasons a CFTR mutation is classified as mild. Age had a significant effect on the FEV1 percent predicted value, but not on the FEV1 standard deviation value. This can be explained by how the two measures are defined. FEV1 percent predicted expresses a patient's lung function relative to reference values derived from the healthy population, so the progression of CF is reflected by a decrease in the FEV1 percent predicted. Such a decrease would not be detected in the FEV1 standard deviation value. This is because the FEV1 standard deviation value collected for this study was recorded as the difference from the average of CF patients in the same age group.
Among the 10 genotype-phenotype pairs, only four remained significantly associated after adjustment for the confounding factors: (1) BF_2557 and FEV1 standard deviation value, (2) BF_7202 and FEV1 standard deviation value, (3) HMOX1_2790 and FEV1 standard deviation value and (4) HMOX1_9531 and FEV1 standard deviation value. Performing this many tests may generate significant results by chance. This is mitigated to some degree because the outcome variables in the multiple tests were related rather than independent. Nevertheless, the results of this study should be considered hypothesis generating and need to be confirmed by repeating the experiment with a larger group of samples. FBAT was the second approach used in this study to determine whether a correlation existed between the selected SNPs and the phenotypes. Two genetic models (additive and dominant) were included in the FBAT analyses. Under the additive model, 7 genotype-phenotype pairs were found to be significant: (1) BF_2557 with FEV1 predicted value, (2) BF_4022 with FEV1 standard deviation value, (3) BF_7202 with FEV1 standard deviation value, (4) C3_28795 with age of diagnosis, (5) C3_43118 with age of diagnosis, (6) TLR4_15884 with FEV1 predicted value and (7) HMOX1_1038 with FEV1 predicted value. Under the dominant model, 12 genotype-phenotype pairs were found to be significant: (1) BF_7202 with age of diagnosis and FEV1 standard deviation, (2) BF_2557 with FEV1 predicted value and FEV1 standard deviation, (3) BF_4022 with FEV1 predicted value and standard deviation, (4) BF_6484 with FEV1 standard deviation, (5) C3_28795 with age of diagnosis and FEV1 predicted value, (6) TLR4_15884 with FEV1 predicted value, (7) HMOX1_1038 with FEV1 predicted value and (8) HMOX1_9531 with FEV1 standard deviation. The conclusions therefore differed slightly between the additive and dominant models.
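The additive and dominant models differ mainly in how each patient's genotype is coded before it is tested against a phenotype. The minimal sketch below assumes a genotype is recorded as the number of copies (0, 1 or 2) of the allele under test. The function name is hypothetical. FBAT applies its chosen model internally and does not expose code like this.

```python
def code_genotype(copies: int, model: str) -> int:
    """Code a genotype (copies of the tested allele: 0, 1 or 2) under a genetic model."""
    if copies not in (0, 1, 2):
        raise ValueError("copies must be 0, 1 or 2")
    if model == "additive":
        # Heterozygotes are assumed to lie halfway between the two homozygotes.
        return copies
    if model == "dominant":
        # One copy of the tested allele is assumed to have the same effect as two.
        return 1 if copies > 0 else 0
    raise ValueError(f"unknown model: {model!r}")


for model in ("additive", "dominant"):
    print(model, [code_genotype(c, model) for c in (0, 1, 2)])
# additive [0, 1, 2]
# dominant [0, 1, 1]
```

When heterozygotes resemble one of the homozygous groups rather than falling between them, the dominant coding fits better. This is one way the two models can reach different conclusions for the same SNP.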
The difference in conclusions between the two models was likely due to their different underlying assumptions. The dominant model assumes that one of the alleles is dominant over the other. In other words, it assumes that the phenotypic trait of patients with either the dominant homozygous genotype or the heterozygous genotype differs from that of patients with the recessive homozygous genotype. The additive model assumes that the phenotypic trait of heterozygous patients is intermediate between those of the two homozygous groups. Both models were fitted to the data because there was no a priori information on the mode of inheritance for any of the selected polymorphisms. FBAT analysis involved only families in which at least one parent had a heterozygous genotype, because this test analyzes the transmission of alleles from parents to the affected child. The theory, calculations and criteria of ANOVA and FBAT differ. Even so, it was reasonable to expect the FBAT results to overlap with those of the ANOVA and lead to a consistent conclusion. However, only a few matches were observed when the results of the two methods were compared. For some SNPs, a significant association was detected by ANOVA but not by FBAT. Several possible reasons could explain this inconsistency. Firstly, a false positive might have resulted from the recruitment of patients of different ethnicity. Although all patients self-reported as Caucasian, they might have had subtly different genetic backgrounds. In other words, there might have been unrecognized subgroups in the study; for example, patients with ancestry in England might be genetically dissimilar to those from Quebec. Different ethnic subgroups could show different allele frequencies, and this population stratification might therefore lead to a false positive conclusion in the ANOVA.
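FBAT is a generalization of the transmission/disequilibrium test (TDT). The simpler TDT for case-parent trios illustrates the two properties discussed here. First, only heterozygous parents are informative. Second, each child is compared with his or her own parents, so allele frequency differences between ethnic subgroups cannot by themselves create an association. The sketch below is the classical TDT for one biallelic SNP, not the FBAT statistic for quantitative traits used in this study. The function name and trios are hypothetical.

```python
import math


def tdt(trios: list[tuple[int, int, int]]) -> tuple[int, int, float, float]:
    """Transmission/disequilibrium test for one biallelic SNP in case-parent trios.

    Each trio is (child, mother, father), coded as copies (0, 1 or 2) of allele A.
    Returns (A transmitted, A not transmitted, chi-square, P value with 1 df),
    counting only transmissions from heterozygous parents.
    """
    transmitted = not_transmitted = 0
    for child, mother, father in trios:
        het_parents = [p for p in (mother, father) if p == 1]
        if not het_parents:
            continue  # both parents homozygous: the family is uninformative
        # A homozygous parent always passes on the same allele (AA gives A, aa gives a).
        from_homozygotes = sum(p // 2 for p in (mother, father) if p != 1)
        from_heterozygotes = child - from_homozygotes
        if not 0 <= from_heterozygotes <= len(het_parents):
            raise ValueError(f"Mendelian inconsistency in trio {(child, mother, father)}")
        transmitted += from_heterozygotes
        not_transmitted += len(het_parents) - from_heterozygotes
    informative = transmitted + not_transmitted
    if informative == 0:
        raise ValueError("no transmissions from heterozygous parents")
    chi2 = (transmitted - not_transmitted) ** 2 / informative
    return transmitted, not_transmitted, chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical trios; (0, 0, 0) has two homozygous parents and is skipped.
trios = [(1, 1, 0), (2, 1, 1), (1, 1, 1), (0, 0, 0), (1, 1, 2)]
print(tdt(trios))
```

Because uninformative families drop out, the TDT (and FBAT) analyzes fewer individuals than an ANOVA on the same data.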
Secondly, it might be inappropriate to compare results from ANOVA with results from FBAT, because the two analyses included different groups of individuals. ANOVA analyzed all individuals recruited in this study, while FBAT excluded families in which both parents were homozygous. Lastly, the significant relationships seen in the ANOVA may have arisen by chance. Conversely, FBAT detected significant associations for some other SNPs that were not significant in the ANOVA. FBAT analyzes the pattern of allele transmission from parents to patients. Mixed ethnicity in the sample should not produce false positives in this analysis, because each patient is compared with his or her own parents, and the genetic background of the family does not affect which parental allele is transmitted. In addition, the ANOVA compared mean values to determine whether any association existed between the SNPs and the phenotypic characteristics. This type of analysis works best when each group contains the same number of individuals. However, this was not the case for most of the SNPs, so the ANOVA might have been underpowered to detect an association. Furthermore, some important information might have been missed by looking only at the mean values. For example, two groups might have the same mean while one group has its patients clustered around the mean and the other includes some patients with extreme values.
5. Age of Onset Analysis
Pseudomonas aeruginosa is one of the most frequent pathogens found in the lungs of CF patients. Some patients become colonized in early childhood, while others are not infected with this bacterium until adulthood. However, testing for an association between the selected SNPs and the age at which patients were first colonized with Pseudomonas aeruginosa requires a different type of analysis from those described above, because some of the recruited individuals had never been infected with Pseudomonas aeruginosa.
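Patients who have never been infected contribute only a lower bound on their age at first infection, which is known as a right-censored observation. A standard way to use such data is the Kaplan-Meier estimator. It updates the infection-free probability only at ages where an infection is observed and removes censored patients from the risk set after their last recorded age. The minimal pure-Python sketch below assumes each patient contributes an age and a flag for whether the first infection was observed. The function name and ages are hypothetical. Comparing the curves of different genotypes, for example with a log-rank test, is not shown.

```python
def kaplan_meier(ages: list[float], infected: list[bool]) -> list[tuple[float, float]]:
    """Kaplan-Meier estimate of the probability of remaining free of a first infection.

    ages: age at first infection, or age at last follow-up for censored patients.
    infected: True if the first infection was observed, False if the patient is censored.
    Returns (age, estimated infection-free probability) at each age with an observed infection.
    """
    records = sorted(zip(ages, infected))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        age = records[i][0]
        events = leaving = 0
        # Group tied ages: infections and censorings at the same age leave the risk set together.
        while i < len(records) and records[i][0] == age:
            events += records[i][1]
            leaving += 1
            i += 1
        if events:
            survival *= 1 - events / at_risk
            curve.append((age, survival))
        at_risk -= leaving
    return curve


# Hypothetical patients: False marks a patient not yet infected at the given age.
ages = [2, 3, 3, 5, 8]
infected = [True, True, False, True, False]
for age, s in kaplan_meier(ages, infected):
    print(age, round(s, 3))
```

A patient censored at a given age still counts as at risk at that age, which is the usual Kaplan-Meier convention for ties between events and censorings.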
ANOVA was inappropriate for this type of censored data, so this analysis was performed by age of onset analysis, also called survival analysis. In general, this approach has two phases: (1) the Kaplan-Meier technique is used to analyze the time to an endpoint (in this case, first infection with Pseudomonas aeruginosa); and (2) the log-rank test is then used to compare two or more Kaplan-Meier curves (in this case, the curves for each genotype). When all 22 SNPs were examined by this survival analysis, only TLR4_9263 was significantly associated with the age of first Pseudomonas aeruginosa infection. In this survival analysis the data were censored, because some individuals had not experienced their first Pseudomonas aeruginosa infection at the time of data collection. For patients recorded as having no Pseudomonas aeruginosa infection, the current age was therefore entered as a censored observation so that they would contribute to the analysis. However, including this group of patients might lead to misinterpretation of the data, because they might have had one or two colonizations many years earlier that could not be traced in the documentation. Additionally, these patients might become infected in the future. It would therefore be inappropriate to classify them as having no Pseudomonas aeruginosa infection and to conclude that their genotype was not linked to Pseudomonas aeruginosa infection at the time of recruitment.
6. Pseudomonas aeruginosa Infection Status
As described in the Introduction section, colonization with Pseudomonas aeruginosa occurs in about 80% of CF patients, and infection by this pathogen greatly worsens the health status of the patient. In this study, all the participating patients were classified into one of four groups according to their exposure to Pseudomonas aeruginosa. A chi-square test was performed to determine whether one or more SNPs contributed to susceptibility to Pseudomonas aeruginosa infection in CF patients.
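The chi-square test of genotype against the four Pseudomonas aeruginosa exposure groups can be sketched as a Pearson test of independence on a table of counts. The same statistic with one degree of freedom also underlies the 2x2 allele-count comparison with the Seattle SNPs data in section 3. The sketch below uses only the standard library. Its P value comes from an exact recurrence for the chi-square upper tail with integer degrees of freedom. The function names and counts are hypothetical. Whether a continuity correction was used in the thesis is not stated, so none is applied here.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square distribution with integer df >= 1."""
    y = x / 2
    if df % 2:
        q, nu = math.erfc(math.sqrt(y)), 1  # df = 1 starting point
    else:
        q, nu = math.exp(-y), 2  # df = 2 starting point
    # Recurrence: Q(nu + 2, x) = Q(nu, x) + (x/2)^(nu/2) * e^(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += y ** (nu / 2) * math.exp(-y) / math.gamma(nu / 2 + 1)
        nu += 2
    return q


def chi_square_test(table: list[list[int]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence for an r x c table of counts (no correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows are genotypes (AA, AG, GG), columns are the four exposure groups.
genotype_by_group = [[40, 25, 10, 45], [30, 20, 12, 38], [8, 6, 3, 10]]
print(chi_square_test(genotype_by_group))

# Hypothetical 2x2 allele counts: rows are study parents vs reference panel, columns are alleles.
print(chi_square_test([[30, 10], [10, 30]]))
```

When one of the four exposure categories is small, some expected counts can fall below about 5. The chi-square approximation then becomes unreliable, and an exact test is often preferred.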
Among the 22 selected SNPs of the four candidate modifier genes, only C3_963 showed a borderline P-value, which suggested that it might play a role in the patients' ability to eradicate Pseudomonas aeruginosa infection. Thus, no firm conclusions could be drawn in this section. Microbial testing of patients' sputum is a routine test in CF clinics, and the testing protocol for determining the Pseudomonas aeruginosa infection status of CF patients should involve minimal error. However, errors might have arisen during data collection. Finding the relevant data was time-consuming and complicated when a patient had had only one or a few Pseudomonas aeruginosa infections in the past. This is especially true for patients who had been transferred from different clinics and for those who were seriously ill and had voluminous clinic/hospital charts. For example, an adult patient could easily be misclassified as group 0 (no Pseudomonas aeruginosa infection) if no related information could be found in his/her charts. Yet the patient might have been colonized once in childhood, with no way to trace the earlier records. In addition, the number of patients in one of the four categories was small for most SNPs, and such a small sample size might reduce the power of the analysis.
7. Haplotype Analysis Utilizing the RGui and FBAT Programs
In the haplotype analysis, the selected SNPs in each candidate modifier gene were grouped together as a haplotype. There were two main reasons for this investigation. The first was to test for any interaction between SNPs that might produce the observed phenotype even though no association was found when the SNPs were considered separately. The second was to see whether another, unidentified SNP in the vicinity of the selected ones, in linkage disequilibrium with the tested SNPs, might have contributed to the phenotypic trait.
Haplotype analysis was done using two programs, RGui and FBAT, and the results were compared for consistency to strengthen any conclusion of an association between a haplotype and a phenotypic trait. The major difference between the two programs in the haplotype analysis was similar to that between ANOVA and FBAT in the single locus analysis. The RGui program included all participants in the study, while the FBAT program examined the inheritance pattern and only considered trios with at least one heterozygous parent.
(a) Haplotype analysis utilizing the RGui program
In the haplotype analysis done with the RGui program, only two haplotypes in Factor B, h11111 and h22112, were significantly associated with the age of diagnosis. Visual inspection shows that the two haplotypes share the same alleles at SNPs 6484 and 7202. The results table shows that only three haplotypes were analyzed individually, while the rest (with lower frequencies) were grouped together as a pool. The third individual haplotype (with a P-value greater than 0.05) did not have the same alleles at SNPs 6484 and 7202 as the two significant haplotypes. SNPs 6484 and 7202 might therefore be interacting and affecting the age of diagnosis. However, this result might be a false positive. Two additional haplotypes in the pooled group also had the same alleles at SNPs 6484 and 7202, but no association with age of diagnosis was apparent for them. In addition, SNP 7202 alone was the more likely cause of the significant relationship, because it was significant by both ANOVA and FBAT when the SNPs were considered independently.
An association was found for only one of the 16 haplotypes of the Complement factor 3 gene, i.e., h1212.
Comparison of this haplotype with the others suggested that the first two SNPs, SNP 963 and SNP 28795, might be interacting and thus responsible for the significant result. C3_28795 was significant in the single locus FBAT analysis, but none of the SNPs in Complement factor 3 were significant in the ANOVA. Thus, C3_28795 may have been the true driving force behind the detected haplotype result. TLR4_1859 and TLR4_15884 were significantly associated with age of diagnosis and FEV1 predicted value, respectively. Even so, haplotypes containing these two SNPs showed no significant association in the analysis. Only the pooled group had a P value of less than 0.05, so no specific haplotype could be identified as responsible. In the last candidate gene, Heme oxygenase-1, one haplotype (h112221) was associated with the FEV1 predicted value, and two haplotypes (h112221 and h211221) were associated with the FEV1 standard deviation. The combination of the last three SNPs of Heme oxygenase-1, SNPs 3303, 9531 and 16442, might therefore have been responsible for the observed associations. However, another haplotype with the same alleles at those three positions did not have a P-value of less than 0.05. The observed association might therefore be due to another, unobserved SNP in the HMOX1 gene carried on the h112221 and h211221 backgrounds.
(b) Haplotype analysis utilizing the FBAT program
More haplotypes were found to be significantly associated with the phenotypes by FBAT analysis. In Factor B, haplotype h11111 was associated with both FEV1 predicted value and standard deviation, while another haplotype, h22121, was strongly associated with the FEV1 standard deviation value. When all the Factor B haplotypes were compared, no consistent pattern was observed among those with a P value of less than 0.05.
In the single locus FBAT analysis, almost all of the Factor B SNPs were related to FEV1 predicted value, FEV1 standard deviation, or both. This might explain the lack of a consistent pattern among the Factor B haplotypes associated with these phenotypic traits: several Factor B SNPs may independently influence FEV1. In Complement factor 3, haplotypes h1111 and h1212 were significantly associated with age of diagnosis and FEV1 standard deviation value, respectively. When the results for individual SNPs and haplotypes were compared, it was difficult to discern a consistent pattern. Sixteen haplotypes were detected, and each had a sample size sufficient for individual haplotype analysis. This large number of haplotypes made it difficult to look for interactions among SNPs and their relationship to the phenotypic traits. Twenty-three Toll-like receptor 4 haplotypes were seen in the patients, and only 9 of them were used in the analysis because the rest were too infrequent. One of the haplotypes, h1122112, had a P value of less than 0.05 for its correlation with FEV1 predicted value. The only difference between this haplotype and the remaining 8 lay in the sixth SNP, SNP 15884. This was consistent with the single SNP FBAT analysis. It suggested that neither an interaction of SNPs nor an unidentified SNP contributed to the phenotypic trait. A similar observation was made in the Heme oxygenase-1 gene analysis. Of the 21 haplotypes found in this candidate gene, only 7 were frequent enough for analysis. One haplotype, h221211, was associated with the FEV1 % predicted value. The only noticeable difference between this haplotype and the rest was in the second SNP, SNP 1038.
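The visual comparisons of haplotypes used in this section can be made explicit by listing the SNP positions where haplotypes agree, or the positions where one haplotype carries an allele that no other analyzed haplotype carries. The sketch below assumes the "h" plus allele-code notation, with characters in the same SNP order for every haplotype of a gene. Both function names are hypothetical. The three comparison haplotypes in the second example are invented for illustration; they are not the eight TLR4 haplotypes from the Results tables.

```python
def shared_positions(h1: str, h2: str) -> list[int]:
    """1-based SNP positions at which two haplotypes (e.g. 'h11111', 'h22112') carry the same allele."""
    a, b = h1.removeprefix("h"), h2.removeprefix("h")
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same SNPs")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x == y]


def distinguishing_positions(target: str, others: list[str]) -> list[int]:
    """1-based SNP positions at which `target` carries an allele found in none of `others`."""
    t = target.removeprefix("h")
    rest = [o.removeprefix("h") for o in others]
    return [i + 1 for i, allele in enumerate(t) if all(o[i] != allele for o in rest)]


# Factor B haplotypes from the RGui analysis: they agree at the third and fourth SNPs.
print(shared_positions("h11111", "h22112"))  # [3, 4]

# TLR4 haplotype h1122112 against hypothetical comparison haplotypes.
print(distinguishing_positions("h1122112", ["h1122122", "h2122122", "h1112122"]))  # [6]
```

A haplotype that is distinguished at a single position, which is also significant on its own, points to that SNP rather than an interaction as the likely driver.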
This SNP was significantly associated with the FEV1 predicted value in the single locus FBAT analysis and was thus likely to be the driving factor behind the haplotype association seen here. There were some differences between the results obtained by the two programs. Firstly, the haplotypes identified in the patients differed slightly between the two programs because they used different criteria to select the participants eligible for analysis. Even when both programs found the same haplotype, its frequency differed because each analysis included a different number of patients. Secondly, the RGui program grouped the low-frequency haplotypes together, while the FBAT program ignored haplotypes observed in fewer than 10 families. Some information was inevitably lost in both cases. On the other hand, the two types of analysis showed a similar pattern of results. For most haplotypes with a significant association with an outcome variable, the association appeared to be driven by a single SNP rather than by an interaction of SNPs. This was because the haplotype results were consistent with the single SNP analyses by both ANOVA and FBAT.
8. Haplotype Analysis of the Age of First Pseudomonas aeruginosa Infection utilizing Hapstat
Haplotype analysis of the age of first Pseudomonas aeruginosa infection was also performed, using the Hapstat program. The same concern applied here as in the single SNP analysis: the age of first Pseudomonas aeruginosa infection is difficult to record accurately. Although TLR4_9263 was concluded to be associated with the age of first Pseudomonas aeruginosa infection, none of the haplotypes in the four candidate genes had a significant relationship with this outcome.
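The two ways of handling rare haplotypes described above can be contrasted in a few lines. Pooling keeps the rare haplotypes' information in one combined group, whereas dropping removes it entirely. This is a minimal sketch: the function names are hypothetical, and the 10-family threshold is taken from the text. The frequency cutoff used by the R analysis is not stated here, so the 0.05 in the example is an assumption, as are the example frequencies and family counts.

```python
def pool_rare(freqs: dict[str, float], cutoff: float) -> dict[str, float]:
    """Pooling, as described for RGui: haplotypes below `cutoff` are merged into one group."""
    kept = {h: f for h, f in freqs.items() if f >= cutoff}
    pooled = sum(f for f in freqs.values() if f < cutoff)
    if pooled > 0:
        kept["pooled"] = pooled
    return kept


def drop_sparse(family_counts: dict[str, int], min_families: int = 10) -> dict[str, int]:
    """Dropping, as described for FBAT: haplotypes seen in fewer than `min_families` families are ignored."""
    return {h: n for h, n in family_counts.items() if n >= min_families}


# Hypothetical haplotype frequencies and family counts for one gene.
freqs = {"h11111": 0.45, "h22112": 0.25, "h12112": 0.20, "h21111": 0.06, "h22222": 0.04}
family_counts = {"h11111": 120, "h22121": 35, "h12221": 7}

print(pool_rare(freqs, cutoff=0.05))
print(drop_sparse(family_counts))
```

Because the two programs treat the same rare haplotypes differently, the sets of haplotypes they report, and their estimated frequencies, are not expected to match exactly.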
As illustrated in the Results section, the Hapstat program analyzed fewer haplotypes than the RGui and FBAT haplotype analyses, because it was set to ignore haplotypes with a frequency of less than 0.05. Inevitably, this reduced the number of participants and therefore the power of this part of the study.
9. Position of SNPs with Significant Association and their Effect on Gene Function
Of the 22 selected SNPs, 12 were significantly associated with one of the tested phenotypic traits. The results are summarized in Tables 55-57.
Table 55: Summary of all significant SNP-phenotype associations by ANOVA and FBAT analyses
Analytical test | Gene | Polymorphism | Phenotypic trait | Position in the gene
ANOVA | Factor B | BF_2557 | FEV1 predicted value; FEV1 standard deviation value | Exon
ANOVA | Factor B | BF_7202 | Age of diagnosis; FEV1 predicted value; FEV1 standard deviation value | Intron
ANOVA | Heme oxygenase-1 | HMOX1_2790 | FEV1 predicted value; FEV1 standard deviation value | Intron
ANOVA | Heme oxygenase-1 | HMOX1_9531 | FEV1 predicted value; FEV1 standard deviation value | Intron
FBAT | Factor B | BF_2557 | FEV1 predicted value; FEV1 standard deviation value | Exon
FBAT | Factor B | BF_4022 | FEV1 predicted value; FEV1 standard deviation value | Intron
FBAT | Factor B | BF_6484 | FEV1 standard deviation value | Intron
FBAT | Factor B | BF_7202 | Age of diagnosis; FEV1 standard deviation value | Intron
FBAT | Complement Factor 3 | C3_28795 | Age of diagnosis; FEV1 predicted value | Intron
FBAT | Complement Factor 3 | C3_43118 | Age of diagnosis | Intron
FBAT | Toll-like Receptor 4 | TLR4_15844 | FEV1 predicted value | Exon
FBAT | Heme oxygenase-1 | HMOX1_1038 | FEV1 predicted value | Intron
FBAT | Heme oxygenase-1 | HMOX1_9531 | FEV1 standard deviation value | Intron
Table 56: Summary of all significant haplotype-phenotype associations when testing by RGui and FBAT analyses
Analytical test | Gene | Haplotype | Phenotypic trait
RGui | Factor B | h11111 | Age of diagnosis
RGui | Factor B | h22112 | Age of diagnosis
RGui | Complement Factor 3 | h1212 | Age of diagnosis
RGui | Heme oxygenase-1 | h221211 | Age of diagnosis
RGui | Heme oxygenase-1 | h112221 | FEV1 predicted value
RGui | Heme oxygenase-1 | h112221 | FEV1 standard deviation value
RGui | Heme oxygenase-1 | h211221 | FEV1 standard deviation value
FBAT | Factor B | h11111 | FEV1 predicted value
FBAT | Factor B | h11111 | FEV1 standard deviation value
FBAT | Factor B | h22121 | FEV1 standard deviation value
FBAT | Complement Factor 3 | h1111 | Age of diagnosis
FBAT | Complement Factor 3 | h1112 | FEV1 predicted value
FBAT | Complement Factor 3 | h1212 | FEV1 standard deviation value
FBAT | Toll-like Receptor 4 | h1122112 | FEV1 predicted value
FBAT | Heme oxygenase-1 | h221211 | FEV1 predicted value
Table 57: Summary of SNPs which revealed a significant association with the Pseudomonas aeruginosa infection status and the age of first Pseudomonas aeruginosa infection
Analytical test | Gene | Polymorphism / Haplotype | Phenotypic trait | Position in the gene
Survival analysis | Complement Factor 3 | C3_963 | Age of first Pseudomonas aeruginosa infection | Intron
Survival analysis | Toll-like Receptor 4 | TLR4_9263 | Age of first Pseudomonas aeruginosa infection | Intron
Chi-squared test | Complement Factor 3 | C3_963 | Pseudomonas aeruginosa infection status | Intron
As indicated in the above tables, the SNPs that demonstrated a significant relationship with one of the phenotypic traits fall into two main groups according to their position in the gene: intron or exon. Most of the SNPs are situated in intronic regions. Introns are spliced out and are therefore not translated into amino acids, but polymorphisms in these regions could still affect the protein products. Introns usually begin with the nucleotides G and T and end with the nucleotides A and G (the GT-AG rule).[44] Other sequences within the intron control the exact position of the cut and the process of splicing.[45] Because these signals lie within the intron, a polymorphism in an intron could cause splicing errors and thereby influence the function of the gene. In addition, two of the SNPs, C3_963 and HMOX1_1038, are in a promoter region, where they might affect the initiation of transcription. Two other SNPs, BF_2557 and TLR4_15844, are in exonic regions.
For example, BF_2557 is located in exon 3 of the Factor B gene. It changes the third base of its codon from G to A without altering the encoded amino acid. However, the SNP lies near the end of exon 3, in a region that may act as an exonic splicing enhancer. Such an enhancer is a sequence within the exon that promotes splicing of the adjacent intron. The SNP might therefore affect the structure and function of the protein products being formed. The C3_43118 SNP is a similar case: it served as a tagSNP for C3_44692 in this study, and C3_44692 is located at the beginning of exon 41 of the Complement factor 3 gene.
The large number of statistical tests performed in this study posed a potential problem for the analysis. A P value of less than 0.05 was reported as significant in this project. However, such a result might have arisen by chance rather than reflecting a true association. The Bonferroni correction is designed to keep the overall (family-wise) probability of a false positive at the level intended for a single test. When independent hypotheses are examined in the same dataset, the significance level is divided by the number of tests performed; equivalently, each P value is multiplied by the number of tests.[46] The Bonferroni correction might therefore address this risk. However, it was not used in this project because it would have been overly conservative here. The phenotypic outcomes were correlated with each other, and some of the SNPs were in linkage disequilibrium, albeit at low levels,[47] so the hypotheses being tested were not independent.
One way to assess the potential functional consequences of a SNP is to determine the degree of conservation of the surrounding sequence between species.
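The Bonferroni arithmetic described above has two equivalent forms: divide the significance level by the number of tests, or multiply each P value by that number, capped at 1. A minimal sketch follows. The function names are hypothetical. The count of 66 tests (22 SNPs times the 3 quantitative traits of the single-locus ANOVA) is only an illustration, since the study performed additional FBAT, survival and chi-square tests.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Equivalent form: multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustration only: 22 SNPs x 3 quantitative traits in the single-locus ANOVA.
n_tests = 22 * 3
print(f"{n_tests} tests -> per-test threshold {bonferroni_threshold(0.05, n_tests):.5f}")
# 66 tests -> per-test threshold 0.00076
```

Because the traits and some SNPs are correlated, the effective number of independent tests is smaller than the raw count. This is why the correction is conservative in this setting.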
Among the 22 SNPs in the four candidate genes, a few were conserved among 17 vertebrate species [48]; these are summarized in Table 58.

Table 58: Summary table of the conservation score of the selected SNPs in the four candidate genes
Gene | SNP | Conservation score
Factor B | 2557 | 0.961
Factor B | 6484 | 0.054
Toll-like receptor 4 | 11912 | 0.011
Toll-like receptor 4 | 17050 | 0.120
Heme oxygenase-1 | 3303 | 0.269

A conservation score from 0 (not conserved) to 1 (highly conserved) was assigned by comparing the sequence across the 17 vertebrate species. This score can be interpreted as the probability that a given base lies in a conserved element. The SNPs in the above table lie in sequence that is conserved to some degree among the 17 vertebrate species, meaning that it has remained similar through evolution; in other words, these positions might have an important function in the gene. This matters when a listed SNP is also significantly associated with one or more of the tested phenotypes, since it further supports the conclusion that the SNP affects gene function. For example, BF_2557 was highly conserved among the 17 vertebrate species (conservation score 0.961) and is therefore likely to be functionally important. This is notable because BF_2557 was one of the SNPs shown to be associated with the tested phenotypes.

10. Summary
In summary, 23 families were excluded from the study because of Mendelian errors. Among the remaining 1605 individuals (535 trios), 13 of the 22 selected SNPs were found, by either ANOVA or FBAT, to have a significant association with one or more of the four selected phenotypic traits (age of diagnosis, FEV1 predicted value, FEV1 standard deviation value and age of first Pseudomonas aeruginosa infection). Only one of the SNPs had a borderline P value when tested against Pseudomonas aeruginosa infection status among the participants.
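These counts are based on the nominal P < 0.05 level. To put the multiple-testing trade-off discussed earlier in numbers, the sketch below compares a plain Bonferroni threshold with one based on an effective number of independent tests, M_eff, in the spirit of Nyholt [47]. It is a sketch under stated assumptions, not the analysis used in this project. The correlation values are hypothetical, M_eff is simply plugged into the Bonferroni formula (an illustrative choice), and correlation between phenotypes is not handled. The eigenvalue variance is computed from the identity sum(lambda_i^2) = sum of all squared matrix entries, which holds for any symmetric matrix, so no eigendecomposition is needed.

```python
"""Sketch: Bonferroni threshold vs. an effective-number-of-tests (M_eff)
threshold for correlated SNPs. All correlation values are hypothetical."""


def bonferroni_alpha(alpha: float, n_tests: float) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    if n_tests < 1:
        raise ValueError("need at least one test")
    return alpha / n_tests


def nyholt_meff(corr: list[list[float]]) -> float:
    """Effective number of independent tests from an M x M SNP correlation matrix.

    M_eff = 1 + (M - 1) * (1 - Var(lambda) / M), with Var(lambda) the sample
    variance (denominator M - 1) of the eigenvalues. The eigenvalues of a
    correlation matrix sum to M and their squares sum to the sum of all squared
    entries, so Var(lambda) = (sum_ij r_ij^2 - M) / (M - 1).
    """
    m = len(corr)
    if m < 2 or any(len(row) != m for row in corr):
        raise ValueError("need a square matrix with at least two SNPs")
    sum_sq = sum(r * r for row in corr for r in row)
    var_lambda = (sum_sq - m) / (m - 1)
    return 1 + (m - 1) * (1 - var_lambda / m)


if __name__ == "__main__":
    # 22 SNPs against one phenotype, treated as independent tests.
    print(f"Bonferroni, 22 tests: {bonferroni_alpha(0.05, 22):.5f}")
    # Hypothetical pairwise correlations (r) among three SNPs in one gene.
    ld = [[1.0, 0.8, 0.3],
          [0.8, 1.0, 0.4],
          [0.3, 0.4, 1.0]]
    meff = nyholt_meff(ld)
    print(f"M_eff = {meff:.2f}; adjusted alpha = {bonferroni_alpha(0.05, meff):.4f}")
```

With fully independent SNPs M_eff equals the number of SNPs, and with perfectly correlated SNPs it falls to 1, so the adjusted threshold lies between the uncorrected 0.05 and the full Bonferroni value.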
In the haplotype analyses done with RGui or FBAT, some haplotypes had a P value of less than 0.05. Most of these haplotypes contained a SNP that was itself significantly associated with the phenotype by ANOVA or FBAT, and the inclusion of this SNP was probably the driving force behind the observed result.

11. Future studies
This is a sub-study of a large Canada-wide and international research project on CF modifier genes. A total of 1605 individuals from 535 trios participated in this part of the study. Patient recruitment was still ongoing when this sub-study started, so the observed trends can be investigated in additional trio members who consented later in the project and in families with more than one affected child. This is particularly important for the SNPs that gave borderline P values: if the same result is found in a larger dataset, the consistent pattern would be more convincing. Such a replication study is ongoing. Replication could also be sought in the two large US CF modifier gene consortia, whose cohorts come from the University of North Carolina/Case Western Reserve University and from Johns Hopkins University.

The SNPs found to be significantly associated with one or more of the tested phenotypes may affect gene function and thereby lead to differences in disease severity. This hypothesis could be tested by further investigation. For example, a SNP in the promoter region lies in a region that serves as a control point for regulating transcription; it could therefore alter the amount of mRNA and protein produced, which could be measured by real-time PCR and ELISA, respectively.
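For the real-time PCR measurement of a promoter SNP's effect, one common way to express mRNA levels in one genotype group relative to another is the comparative Ct (2^-ΔΔCt) method. The sketch below is a minimal illustration, not a protocol from this project. It assumes roughly 100% and equal amplification efficiency for the target and reference assays and a stably expressed reference gene, and all Ct values and group labels are hypothetical.

```python
"""Sketch of the comparative Ct (2^-ddCt) method for relative mRNA levels.
Assumes ~100% and equal amplification efficiency for target and reference
assays and a stably expressed reference gene. All Ct values are hypothetical."""
from statistics import mean


def delta_ct(target_ct: list[float], reference_ct: list[float]) -> float:
    """Mean per-sample Ct(target) - Ct(reference) for one group."""
    if len(target_ct) != len(reference_ct) or not target_ct:
        raise ValueError("need paired, non-empty Ct lists")
    return mean(t - r for t, r in zip(target_ct, reference_ct))


def fold_change(test_target, test_ref, ctrl_target, ctrl_ref) -> float:
    """Expression in the test group relative to the control group (2^-ddCt)."""
    ddct = delta_ct(test_target, test_ref) - delta_ct(ctrl_target, ctrl_ref)
    return 2.0 ** (-ddct)


if __name__ == "__main__":
    # Hypothetical: carriers of a promoter variant (test) vs. non-carriers (control).
    carriers = ([24.1, 23.9], [18.0, 18.0])  # (target Ct values, reference Ct values)
    non_carriers = ([26.0, 26.0], [18.1, 17.9])
    print(f"fold change = {fold_change(*carriers, *non_carriers):.2f}")
```

A lower ΔCt means the target crosses the detection threshold earlier relative to the reference gene, i.e. more transcript; each one-cycle difference corresponds to roughly a two-fold change under the efficiency assumption.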
If the SNP is in an intronic region, it may affect splicing; this could be tested either by sequencing the transcript or by measuring its length after amplifying the cDNA.

Although Burkholderia cepacia infects CF patients less frequently than Pseudomonas aeruginosa, it is even more difficult to eradicate once infection is established. The SNPs shown in this project to be associated with disease severity could therefore also be tested for a relationship with Burkholderia cepacia infection. Candidates include genes responsible for the binding of the pathogen in the lungs (TLR4 would be a possible choice, since Burkholderia cepacia is a Gram-negative bacterium whose LPS, like that of Pseudomonas aeruginosa, binds to TLR4) and other genes of the immune system involved in clearing the micro-organism.

Finally, a genome-wide association analysis could be performed in which 300,000-500,000 SNPs are analyzed in a single experiment for each study participant, using recently developed high-throughput genotyping technologies such as the Illumina or Affymetrix systems. With appropriate adjustment for multiple comparisons and the use of several populations to provide replication, this approach would provide a systematic assessment of the entire human genome.

References
1. Dupuis, A., et al., Cystic fibrosis birth rates in Canada: a decreasing trend since the onset of genetic testing. J Pediatr, 2005. 147(3): p. 312-5.
2. Riordan, J.R., et al., Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science, 1989. 245(4922): p. 1066-73.
3. Turcios, N.L., Cystic fibrosis: an overview. J Clin Gastroenterol, 2005. 39(4): p. 307-17.
4. Haston, C.K. and T.J. Hudson, Finding genetic modifiers of cystic fibrosis. N Engl J Med, 2005. 353(14): p. 1509-11.
5. Merlo, C.A. and M.P.
Boyle, Modifier genes in cystic fibrosis lung disease. J Lab Clin Med, 2003. 141(4): p. 237-41.
6. Zielenski, J., Genotype and phenotype in cystic fibrosis. Respiration, 2000. 67(2): p. 117-33.
7. Rowntree, R.K. and A. Harris, The phenotypic consequences of CFTR mutations. Ann Hum Genet, 2003. 67(Pt 5): p. 471-85.
8. McKone, E.F., et al., Effect of genotype on phenotype and mortality in cystic fibrosis: a retrospective cohort study. Lancet, 2003. 361(9370): p. 1671-6.
9. Harris, A., Cystic fibrosis gene. Br Med Bull, 1992. 48(4): p. 738-53.
10. Moraes, T.J., et al., Abnormalities in the Pulmonary Innate Immune System in Cystic Fibrosis. Am J Respir Cell Mol Biol, 2005.
11. Morales, M.M., M.A. Capella, and A.G. Lopes, Structure and function of the cystic fibrosis transmembrane conductance regulator. Braz J Med Biol Res, 1999. 32(8): p. 1021-8.
12. Davis, P.B., M. Drumm, and M.W. Konstan, Cystic fibrosis. Am J Respir Crit Care Med, 1996. 154(5): p. 1229-56.
13. Fanen, P., et al., Structure-function analysis of a double-mutant cystic fibrosis transmembrane conductance regulator protein occurring in disorders related to cystic fibrosis. FEBS Lett, 1999. 452(3): p. 371-4.
14. Stern, R.C., The diagnosis of cystic fibrosis. N Engl J Med, 1997. 336(7): p. 487-91.
15. Kerem, B.S., et al., DNA marker haplotype association with pancreatic sufficiency in cystic fibrosis. Am J Hum Genet, 1989. 44(6): p. 827-34.
16. Davis, P.B., Pathophysiology of cystic fibrosis with emphasis on salivary gland involvement. J Dent Res, 1987. 66 Spec No: p. 667-71.
17. Rubinstein, S., R. Moss, and N. Lewiston, Constipation and meconium ileus equivalent in patients with cystic fibrosis. Pediatrics, 1986. 78(3): p. 473-9.
18. Craig, J.M., H. Haddad, and H. Shwachman, The pathological changes in the liver in cystic fibrosis of the pancreas. AMA J Dis Child, 1957. 93(4): p. 357-69.
19. Cox, K.L., et al., Orthotopic liver transplantation in patients with cystic fibrosis. Pediatrics, 1987.
80(4): p. 571-4.
20. Taussig, L.M., et al., Fertility in males with cystic fibrosis. N Engl J Med, 1972. 287(12): p. 586-9.
21. Currie, A.J., D.P. Speert, and D.J. Davidson, Pseudomonas aeruginosa: role in the pathogenesis of the CF lung lesion. Semin Respir Crit Care Med, 2003. 24(6): p. 671-80.
22. Hoiby, N., Understanding bacterial biofilms in patients with cystic fibrosis: current and innovative approaches to potential therapies. J Cyst Fibros, 2002. 1(4): p. 249-54.
23. Kharazmi, A., Mechanisms involved in the evasion of the host defence by Pseudomonas aeruginosa. Immunol Lett, 1991. 30(2): p. 201-5.
24. Doring, G., A. Albus, and N. Hoiby, Immunologic aspects of cystic fibrosis. Chest, 1988. 94(2 Suppl): p. 109S-115S.
25. Friedl, P., B. Konig, and W. Konig, Effects of mucoid and non-mucoid Pseudomonas aeruginosa isolates from cystic fibrosis patients on inflammatory mediator release from human polymorphonuclear granulocytes and rat mast cells. Immunology, 1992. 76(1): p. 86-94.
26. Sedlak-Weinstein, E., et al., Pseudomonas aeruginosa: the potential to immunise against infection. Expert Opin Biol Ther, 2005. 5(7): p. 967-82.
27. Mueller-Ortiz, S.L., S.M. Drouin, and R.A. Wetsel, The alternative activation pathway and complement component C3 are critical for a protective immune response against Pseudomonas aeruginosa in a murine model of pneumonia. Infect Immun, 2004. 72(5): p. 2899-906.
28. Kronborg, G., et al., Antibody responses to lipid A, core, and O sugars of the Pseudomonas aeruginosa lipopolysaccharide in chronically infected cystic fibrosis patients. J Clin Microbiol, 1992. 30(7): p. 1848-55.
29. Ernst, R.K., et al., Specific lipopolysaccharide found in cystic fibrosis airway Pseudomonas aeruginosa. Science, 1999. 286(5444): p. 1561-5.
30. Ernst, R.K., et al., Pseudomonas aeruginosa lipid A diversity and its recognition by Toll-like receptor 4. J Endotoxin Res, 2003. 9(6): p. 395-400.
31.
Wilks, A., Heme oxygenase: evolution, structure, and mechanism. Antioxid Redox Signal, 2002. 4(4): p. 603-14.
32. Shibahara, S., The heme oxygenase dilemma in cellular homeostasis: new insights for the feedback regulation of heme catabolism. Tohoku J Exp Med, 2003. 200(4): p. 167-86.
33. Bach, F.H., Heme oxygenase-1 as a protective gene. Wien Klin Wochenschr, 2002. 114 Suppl 4: p. 1-3.
34. Slebos, D.J., S.W. Ryter, and A.M. Choi, Heme oxygenase-1 and carbon monoxide in pulmonary medicine. Respir Res, 2003. 4(1): p. 7.
35. Sabra, W., E.J. Kim, and A.P. Zeng, Physiological responses of Pseudomonas aeruginosa PAO1 to oxidative stress in controlled microaerobic and aerobic cultures. Microbiology, 2002. 148(Pt 10): p. 3195-202.
36. Zhou, H., et al., Heme oxygenase-1 expression in human lungs with cystic fibrosis and cytoprotective effects against Pseudomonas aeruginosa in vitro. Am J Respir Crit Care Med, 2004. 170(6): p. 633-40.
37. Carlson, C.S., et al., Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. Am J Hum Genet, 2004. 74(1): p. 106-20.
38. Knudson, R.J., et al., Changes in the normal maximal expiratory flow-volume curve with growth and aging. Am Rev Respir Dis, 1983. 127(6): p. 725-34.
39. Corey, M., H. Levison, and D. Crozier, Five- to seven-year course of pulmonary function in cystic fibrosis. Am Rev Respir Dis, 1976. 114(6): p. 1085-92.
40. Horvath, S., X. Xu, and N.M. Laird, The family based association test method: strategies for studying general genotype-phenotype associations. Eur J Hum Genet, 2001. 9(4): p. 301-6.
41. Zeng, D., et al., Efficient semiparametric estimation of haplotype-disease associations in case-cohort and nested case-control studies. Biostatistics, 2006. 7(3): p. 486-502.
42. Lake, S.L., D. Blacker, and N.M. Laird, Family-based tests of association in the presence of linkage. Am J Hum Genet, 2000. 67(6): p. 1515-25.
43.
Law, B., et al., Effects of population structure and admixture on exact tests for association between loci. Genetics, 2003. 164(1): p. 381-7.
44. Simpson, A.G., E.K. MacQuarrie, and A.J. Roger, Eukaryotic evolution: early origin of canonical introns. Nature, 2002. 419(6904): p. 270.
45. Havlioglu, N., et al., An intronic signal for alternative splicing in the human genome. PLoS ONE, 2007. 2(11): p. e1246.
46. Perneger, T.V., What's wrong with Bonferroni adjustments. BMJ, 1998. 316(7139): p. 1236-8.
47. Nyholt, D.R., A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet, 2004. 74(4): p. 765-9.
48. Siepel, A., et al., Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res, 2005. 15(8): p. 1034-50.

Appendix

Table A1: Concentration of a subset of the DNA samples in the original source plates #1-8
Individual ID | DNA concentration (ng/uL)
0003-01 | 4.43
0003-02 | 3.93
0003-03 | 1.13
0019-01 | 2.52
0019-02 | 1.16
0019-03 | 0.25
0052-01 | 1.95
0052-02 | 0.87
0052-03 | 0.72
0063-01 | 1.94
0063-02 | 0.93
0063-03 | 0.46
0069-01 | 2.54
0069-02 | 1.60
0069-03 | -0.03
0083-01 | 2.63
0083-02 | 1.47
0083-03 | 0.55
0101-01 | 3.17
0101-02 | 2.47
0101-03 | 1.29
0150-01 | 3.40
0150-02 | 9.77
0150-03 | 2.75

Table A2: Families with non-Mendelian inheritance when genotyping a particular SNP
Family ID | SNP
0167 | HMOX1_2790
0313 | HMOX1_149
0374 | HMOX1_149
0445 | BF_2557, HMOX1_149
0451 | HMOX1_149
0453 | TLR4_11912, HMOX1_149
0478 | BF_2557, C3_28795, HMOX1_9531
0482 | HMOX1_149
1251 | HMOX1_149
1300 | HMOX1_149
1324 | HMOX1_149
1350 | HMOX1_149
1383 | HMOX1_149
1431 | C3_963
1444 | HMOX1_149
1878 | HMOX1_149
1881 | HMOX1_149
2543 | C3_963
2800 | C3_963
2809 | HMOX1_149
2821 | HMOX1_149
4120 | HMOX1_149

Table A3: Genotype frequency of each of the SNPs examined in the gene of Factor B
SNP Group Total # of Individuals # of Individuals Genotype Percentage
BF_2557 Parents 1066 36 AA 3.4 314 AG 29.5 705 GG 66.1 11 undetermined 1.0
Patients 533 20 AA 3.8
135 AG 25.3 367 GG 68.9 11 undetermined 2.1
BF_4022 Parents 1070 107 AA 10.0 458 AG 42.8 489 GG 45.7 16 undetermined 1.5
Patients 535 67 AA 12.5 201 AG 37.6 259 GG 48.4 8 undetermined 1.5
BF_6484 Parents 1070 858 AA 80.2 189 AG 17.7 19 GG 1.8 4 undetermined 0.4
Patients 535 423 AA 79.1 100 AG 18.7 8 GG 1.5 4 undetermined 0.8
BF_7202 Parents 1070 197 AA 18.4 508 AG 47.5 344 GG 32.2 21 undetermined 2.0
Patients 535 105 AA 19.6 252 AG 47.1 172 GG 32.2 6 undetermined 1.1
BF_8311 Parents 1070 862 CC 80.6 182 CT 17.0 7 TT 0.7 19 undetermined 1.8
Patients 535 428 CC 80.0 90 CT 16.8 5 TT 0.9 12 undetermined 2.2

Table A4: Genotype frequency of each of the SNPs examined in the gene of Complement factor 3
SNP Group Total # of Individuals # of Individuals Genotype Percentage
C3_963 Parents 1064 270 GG 25.4 527 GT 49.5 249 TT 23.4 18 undetermined 1.7
Patients 532 152 GG 28.6 255 GT 47.9 119 TT 22.4 6 undetermined 1.1
C3_28795 Parents 1068 355 AA 33.2 521 AG 48.8 177 GG 16.6 15 undetermined 1.4
Patients 534 202 AA 37.8 237 AG 44.4 92 GG 17.2 3 undetermined 0.6
C3_36735 Parents 1070 202 AA 18.9 519 AG 48.5 330 GG 30.8 19 undetermined 1.8
Patients 535 96 AA 17.9 266 AG 49.7 163 GG 30.5 10 undetermined 1.9
C3_43118 Parents 1070 239 AA 22.3 529 AG 49.4 292 GG 27.3 10 undetermined 0.9
Patients 535 123 AA 23.0 251 AG 46.9 157 GG 29.4 4 undetermined 0.8

Table A5: Genotype frequency of each of the SNPs examined in the gene of Toll-like receptor 4
SNP Group Total # of Individuals # of Individuals Genotype Percentage
TLR4_851 Parents 1070 551 AA 51.5 434 AG 40.6 81 GG 7.6 4 undetermined 0.4
Patients 535 285 AA 53.3 203 AG 37.9 38 GG 7.1 9 undetermined 1.7
TLR4_1859 Parents 1070 143 AA 13.4 509 AG 47.6 410 GG 38.3 8 undetermined 0.8
Patients 535 77 AA 14.4 244 AG 45.6 209 GG 39.1 5 undetermined 0.9
TLR4_2856 Parents 1070 24 CC 2.2 266 CT 24.9 770 TT 72.0 10 undetermined 0.9
Patients 535 10 CC 1.9 140 CT 26.2 382 TT 71.4 3 undetermined 0.6
TLR4_9263 Parents 1070 16 AA 1.5 207 AC 19.4
841 CC 78.6 6 undetermined 0.6
Patients 535 8 AA 1.5 106 AC 19.8 418 CC 78.1 3 undetermined 0.6
TLR4_11912 Parents 1068 478 GG 44.8 467 GT 43.7 114 TT 10.7 9 undetermined 0.8
Patients 534 234 GG 43.7 229 GT 42.8 65 TT 12.2 6 undetermined 1.1
TLR4_15884 Parents 1070 23 CC 2.2 270 CG 25.2 764 GG 71.4 13 undetermined 1.2
Patients 535 12 CC 2.2 120 CG 22.4 392 GG 73.3 11 undetermined 2.1
TLR4_17050 Parents 1070 20 CC 1.9 277 CT 25.9 765 TT 71.5 8 undetermined 0.8
Patients 535 15 CC 2.8 132 CT 24.7 384 TT 71.8 4 undetermined 0.8

Table A6: Genotype and allele frequency of each of the SNPs examined in the gene of Heme oxygenase-1
SNP Group Total # of Individuals # of Individuals Genotype Percentage
HMOX1_149 Parents 1036 122 AA 11.8 404 AG 39.0 479 GG 46.2 31 undetermined 3.0
Patients 518 61 AA 11.8 208 AG 40.2 227 GG 43.8 22 undetermined 4.2
HMOX1_1038 Parents 1070 975 CC 91.1 83 CT 7.8 1 TT 0.1 11 undetermined 1.0
Patients 535 478 CC 89.4 51 CT 9.5 2 TT 0.4 4 undetermined 0.8
HMOX1_2790 Parents 1068 341 AA 31.9 517 AT 48.4 200 TT 18.7 10 undetermined 0.9
Patients 534 166 AA 31.1 253 AT 47.4 108 TT 20.2 7 undetermined 1.3
HMOX1_3303 Parents 1070 2 CC 0.2 106 CG 9.9 954 GG 89.2 8 undetermined 0.8
Patients 535 0 CC 0.0 53 CG 9.9 476 GG 89.0 6 undetermined 1.1
HMOX1_9531 Parents 1068 225 AA 21.1 528 AG 49.4 302 GG 28.3 13 undetermined 1.2
Patients 534 108 AA 20.2 255 AG 47.8 163 GG 30.5 8 undetermined 1.5
HMOX1_16442 Parents 1070 952 AA 89.0 109 AT 10.2 1 TT 0.1 8 undetermined 0.8
Patients 535 475 AA 88.8 55 AT 10.3 1 TT 0.2 4 undetermined 0.8

Table A7: Phenotypic characteristics of the CF patients
Characteristics # of Individuals Average Range # (%)
Age (years) 488 16.2 1.0 to 61.1 na
Age at Diagnosis (years) 464 2.7 0 to 59.4 na
Age of First PA Infection (years) 284 8.5 0.2 to 39.3 na
FEV1 % Predicted 409 75.7 14.2 to 140.1 na
FEV1 SD 409 0.45 -1.93 to 4.08 na
Sex 495 na na
236 (47.7%) female, 259 (52.3%) male
Pancreatic Function 478 na na 428 (89.5%) PI, 49 (10.3%) PS, 1 (0.2%) unknown
Meconium Ileus 473 na na 380 (80.3%) no, 90 (19%) yes, 3 (0.7%) unknown
Pseudomonas aeruginosa status 476 na na 141 (29.6%) none, 73 (15.4%) once, 102 (21.4%) sporadic, 160 (33.6%) chronic
Genotype 463 na na 272 (58.7%) ΔF508/ΔF508, 155 (33.4%) ΔF508/other, 36 (7.9%) other/other

Table A8: The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Complement factor 3. The age of diagnosis was logarithmically transformed for normality.
SNPs Genotype Number Mean Standard Error P value
C3_963 GG 128 -0.28 0.08 0.59 GT 208 -0.19 0.06 TT 102 -0.18 0.09
C3_28795 AA 170 -0.14 0.07 0.35 AG 194 -0.25 0.06 GG 80 -0.28 0.10
C3_36735 AA 79 -0.37 0.10 0.17 AG 225 -0.15 0.06 GG 134 -0.19 0.07
C3_43118 AA 107 -0.16 0.08 0.81 AG 208 -0.23 0.06 GG 128 -0.23 0.08

Table A9: The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Complement factor 3
SNPs Genotype Number Mean Standard Error P value
C3_963 GG 120 73.86 2.39 0.56 GT 195 76.40 1.88 TT 86 77.59 2.83
C3_28795 AA 161 77.33 2.07 0.62 AG 175 74.74 1.99 GG 70 74.72 3.14
C3_36735 AA 73 73.00 69.67 0.07 AG 202 77.89 1.84 GG 126 75.65 2.34
C3_43118 AA 96 77.37 2.68 0.34 AG 193 76.86 1.89 GG 117 72.85 2.42

Table A10: The ANOVA result of examining FEV1 sd value among different genotypes of the selected SNPs in Complement factor 3
SNPs Genotype Number Mean Standard Error P value
C3_963 GG 120 0.31 0.09 0.09 GT 195 0.53 0.07 TT 86 0.56 0.10
C3_28795 AA 161 0.54 0.07 0.35 AG 175 0.39 0.07 GG 70 0.47 0.11
C3_36735 AA 73 0.24 0.11 0.10 AG 202 0.52 0.07 GG 126 0.49 0.08
C3_43118 AA 96 0.59 0.10 0.15 AG 193 0.48 0.07 GG 117 0.34 0.09

Table A11: The ANOVA result of examining FEV1 predicted value among different genotypes of the selected SNPs in Toll-like receptor 4
SNPs Genotype Number Mean Standard Error P value
TLR4_851 AA 210 76.04 1.82 0.70
AG 163 75.96 2.07 GG 30 71.76 4.81
TLR4_1859 AA 55 79.18 3.54 0.46 AG 184 74.31 1.94 GG 166 76.19 2.04
TLR4_2856 CC 7 66.41 9.92 0.56 CT 106 77.02 2.55 TT 294 75.54 1.53
TLR4_9263 AA 6 81.71 10.73 0.86 AC 85 75.89 2.85 CC 316 75.66 1.48
TLR4_11912 GG 176 74.78 1.98 0.71 GT 177 77.02 1.98 TT 50 75.11 3.72
TLR4_15884 CC 8 71.57 9.35 0.89 CG 90 76.16 2.79 GG 301 75.69 1.52
TLR4_17050 CC 10 91.25 8.28 0.16 CT 105 74.90 2.56 TT 291 75.52 1.54

Table A12: The ANOVA result of examining FEV1 sd value among different genotypes of the selected SNPs in Toll-like receptor 4
SNPs Genotype Number Mean Standard Error P value
TLR4_851 AA 210 0.44 0.07 0.85 AG 163 0.49 0.07 GG 30 0.44 0.17
TLR4_1859 AA 55 0.57 0.13 0.66 AG 184 0.43 0.07 GG 166 0.46 0.07
TLR4_2856 CC 7 -0.02 0.36 0.39 CT 106 0.47 0.09 TT 294 0.47 0.06
TLR4_9263 AA 6 0.66 0.39 0.81 AC 85 0.50 0.10 CC 316 0.45 0.05
TLR4_11912 GG 176 0.47 0.07 0.41 GT 177 0.51 0.07 TT 50 0.31 0.13
TLR4_15884 CC 8 0.13 0.34 0.56 CG 90 0.50 0.10 GG 301 0.46 0.06
TLR4_17050 CC 10 1.02 0.30 0.16 CT 105 0.43 0.09 TT 291 0.45 0.06

Table A13: The ANOVA result of examining age of diagnosis among different genotypes of the selected SNPs in Heme oxygenase-1.
The age of diagnosis was logarithmically transformed for normality.
SNPs Genotype Number Mean Standard Error P value
HMOX1_149 AA 54 -0.20 0.12 0.17 AG 167 -0.13 0.07 GG 195 -0.30 0.06
HMOX1_1038 CC 398 -0.20 0.04 0.23 CT 43 -0.30 0.13 TT 2 -1.15 0.61
HMOX1_2790 AA 142 -0.26 0.07 0.60 AT 207 -0.17 0.06 TT 91 -0.21 0.09
HMOX1_3303 CC 0 n/a n/a 0.26 CG 41 -0.36 0.13 GG 401 -0.20 0.04
HMOX1_9531 AA 96 -0.27 0.09 0.44 AG 205 -0.15 0.06 GG 138 -0.25 0.07
HMOX1_16442 AA 403 -0.20 0.04 0.42 AT 39 -0.34 0.14 TT 1 0.57 0.86

Table A14: Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Factor B
SNPs Variable Sub-group Estimated Value Standard Error P value
BF_2557 BF_2557 AA 0.098 0.284 0.322 AG -0.204 0.189
Sex Female 0.148 0.088 0.094
CFTR mutation ΔF508/ΔF508 -0.036 0.135 0.965 ΔF508/other -0.009 0.143
BF_4022 BF_4022 AA 0.0246 0.175 0.620 AG -0.105 0.133
Sex Female 0.146 0.087 0.0965
CFTR mutation ΔF508/ΔF508 -0.029 0.135 0.974 ΔF508/other -0.015 0.143
BF_6484 BF_6484 AA -0.0445 0.253 0.983 AG -0.0465 0.273
Sex Female 0.156 0.086 0.071
CFTR mutation ΔF508/ΔF508 -0.0275 0.133 0.976 ΔF508/other -0.0148 0.143
BF_7202 BF_7202 AA -0.0684 0.146 0.859 AG 0.058 0.118
Sex Female 0.152 0.087 0.082
CFTR mutation ΔF508/ΔF508 -0.0465 0.135 0.942 ΔF508/other -0.00099 0.144
BF_8311 BF_8311 CC -0.135 0.257 0.719 CT -0.24 0.28
Sex Female 0.149 0.087 0.0887
CFTR mutation ΔF508/ΔF508 -0.0406 0.134 0.955 ΔF508/other -0.005 0.144

Table A15: Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Complement factor 3
SNPs Variable Sub-group Estimated Value Standard Error P value
C3_963 C3_963 GG 0.26 0.125 0.0786 GT -0.177 0.119
Sex Female 0.16 0.087 0.0668
CFTR mutation ΔF508/ΔF508 0.0014 0.137 0.9988 ΔF508/other 0.007 0.146
C3_28795 C3_28795 AA -0.025 0.131 0.2087 AG 0.216 0.123
Sex Female 0.15 0.0869 0.0842
CFTR mutation ΔF508/ΔF508 -0.0068 0.133
0.9981 ΔF508/other 0.0045 0.144
C3_36735 C3_36735 AA 0.0735 0.157 0.1213 AG 0.168 0.12
Sex Female 0.173 0.0876 0.0495
CFTR mutation ΔF508/ΔF508 -0.031 0.134 0.9605 ΔF508/other -0.027 0.144
C3_43118 C3_43118 AA 0.112 0.136 0.2875 AG 0.107 0.116
Sex Female 0.153 0.0866 0.0958
CFTR mutation ΔF508/ΔF508 -0.03 0.133 0.4232 ΔF508/other -0.0128 0.142

Table A16: Age of onset analysis investigating association between age of first Pseudomonas aeruginosa infection and selected SNPs in Heme oxygenase-1
SNPs Variable Sub-group Estimated Value Standard Error P value
HMOX1_149 HMOX1_149 AA 0.0189 0.18 0.9514 AG -0.0377 0.131
Sex Female 0.132 0.089 0.14
CFTR mutation ΔF508/ΔF508 -0.032 0.135 0.9654 ΔF508/other -0.02 0.146
HMOX1_1038 HMOX1_1038 CC 3.283 61.03 0.5595 CT 3.31 61.03
Sex Female 0.148 0.086 0.0861
CFTR mutation ΔF508/ΔF508 -0.0262 0.133 0.9797 ΔF508/other -0.0093 0.142
HMOX1_2790 HMOX1_2790 AA 0.077 0.131 0.7633 AT 0.031 0.118
Sex Female 0.145 0.087 0.0966
CFTR mutation ΔF508/ΔF508 0.0039 0.136 0.9996 ΔF508/other 0.00041 0.146
HMOX1_3303 HMOX1_3303 CG 0.02 0.158 0.8987
Sex Female 0.151 0.0867 0.0819
CFTR mutation ΔF508/ΔF508 -0.0346 0.133 0.9663 ΔF508/other -0.0095 0.142
HMOX1_9531 HMOX1_9531 AA 0.0048 0.143 0.9069 AG -0.047 0.118
Sex Female 0.162 0.088 0.0647
CFTR mutation ΔF508/ΔF508 -0.033 0.134 0.9628 ΔF508/other -0.021 0.143
HMOX1_16442 HMOX1_16442 AA -0.886 0.363 0.17 AT -0.938 0.396
Sex Female 0.144 0.087 0.0986
CFTR mutation ΔF508/ΔF508 -0.03 0.133 0.9688 ΔF508/other -0.0198 0.143

Table A17: Chi-squared test for investigating the relationship between different genotypes of the selected SNPs in Factor B and Pseudomonas aeruginosa infection status
SNPs Genotype PA status 0 1 2 3 (in percentage) Chi Test P value
BF_2557 AA 36.84 10.53 15.79 36.84 2.91 0.82 AG 26.09 13.91 20.87 39.13 GG 30.00 16.36 21.21 32.42
BF_4022 AA 30.51 10.71 25.42 33.90 4.79 0.57 AG 28.25 12.99 20.34 38.42 GG 29.61 17.60 21.89 30.90
BF_6484 AA 29.79 16.22 20.48 33.51 2.11 0.91 AG
30.00 11.11 24.44 34.44 GG 28.57 14.29 28.57 28.57
BF_7202 AA 30.43 11.96 22.83 34.78 5.19 0.52 AG 29.15 14.35 19.28 37.22 GG 29.03 18.71 23.87 28.39
BF_8311 CC 29.92 14.44 21.52 34.12 3.25 0.78 CT 27.50 20.00 20.00 32.50 TT 40.00 0.00 20.00 40.00

Table A18: Chi-squared test for investigating the relationship between different genotypes of the selected SNPs in Toll-like receptor 4 and Pseudomonas aeruginosa infection status
SNPs Genotype PA status 0 1 2 3 (in percentage) Chi Test P value
TLR4_851 AA 32.16 16.08 21.96 29.80 7.32 0.29 AG 28.89 13.33 20.56 37.22 GG 20.59 8.82 20.59 50.00
TLR4_1859 AA 35.29 17.56 22.06 25.00 4.72 0.58 AG 30.91 13.64 20.00 35.45 GG 26.09 16.30 22.83 34.78
TLR4_2856 CC 37.50 12.50 25.00 25.00 1.78 0.94 CT 32.80 12.80 21.60 32.80 TT 28.45 16.13 21.11 34.31
TLR4_9263 AA 14.29 14.29 28.57 42.86 4.13 0.66 AC 22.83 18.48 21.74 36.96 CC 31.47 14.40 21.33 32.80
TLR4_11912 GG 31.10 13.88 20.10 34.93 2.37 0.88 GT 26.96 16.18 22.06 34.80 TT 29.82 17.54 24.56 28.07
TLR4_15884 CC 30.00 20.00 30.00 20.00 3.62 0.73 CG 33.33 11.11 22.22 33.33 GG 27.87 16.38 21.26 34.48
TLR4_17050 CC 27.27 9.09 27.27 36.36 1.82 0.94 CT 25.64 16.24 23.08 35.04 TT 31.01 15.07 20.58 33.33

Table A19: Chi-squared test for investigating the relationship between different genotypes of the selected SNPs in Heme oxygenase-1 and Pseudomonas aeruginosa infection status
SNPs Genotype PA status 0 1 2 3 (in percentage) Chi Test P value
HMOX1_149 AA 29.63 11.11 22.22 37.04 6.03 0.42 AG 30.39 12.71 19.34 37.57 GG 30.14 18.66 22.49 28.71
HMOX1_1038 CC 30.35 15.29 21.41 32.94 4.05 0.67 CT 26.09 13.04 21.74 39.13 TT 0.00 50.00 0.00 50.00
HMOX1_2790 AA 30.07 18.95 19.61 31.37 6.86 0.33 AT 31.82 14.09 20.00 34.09 TT 23.71 11.34 27.84 37.11
HMOX1_3303 CC 0.00 0.00 0.00 0.00 6.41 0.094 CG 23.40 19.15 34.04 23.40 GG 30.35 14.82 20.00 34.82
HMOX1_9531 AA 29.13 19.42 20.39 31.07 4.84 0.56 AG 30.91 15.91 20.45 32.73 GG 27.59 11.03 22.76 38.62
HMOX1_16442 AA 30.05 14.79 20.19 34.98 7.86 0.25
AT 23.91 19.57 32.61 23.91 TT 100.00 0.00 0.00 0.00

Table A20: FBAT analysis of the age of diagnosis under the additive model. The age of diagnosis was first log-transformed before entering into the program.
Gene SNPs Allele # of Individuals P value
Factor B
BF_2557 A 224 0.377 BF_2557 G 224 0.377
BF_4022 A 298 0.755 BF_4022 G 298 0.755
BF_6484 A 148 0.288 BF_6484 G 148 0.288
BF_7202 A 317 0.207 BF_7202 G 317 0.207
BF_8311 C 139 0.136 BF_8311 T 139 0.136
Complement factor 3
C3_963 G 344 0.443 C3_963 T 344 0.443
C3_28795 A 332 0.034 C3_28795 G 332 0.034
C3_36735 A 320 0.250 C3_36735 G 320 0.250
C3_43118 A 342 0.048 C3_43118 G 342 0.048
Toll-like receptor 4
TLR4_851 A 300 0.289 TLR4_851 G 300 0.289
TLR4_1859 A 333 0.730 TLR4_1859 G 333 0.730
TLR4_2856 C 207 0.366 TLR4_2856 T 207 0.366
TLR4_9263 A 153 0.855 TLR4_9263 C 153 0.855
TLR4_11912 G 317 0.366 TLR4_11912 T 317 0.366
TLR4_15884 C 194 0.389 TLR4_15884 G 194 0.389
TLR4_17050 C 196 0.344 TLR4_17050 T 196 0.344
Heme oxygenase-1
HMOX1_149 A 250 0.289 HMOX1_149 G 250 0.289
HMOX1_1038 C 65 0.086 HMOX1_1038 T 65 0.086
HMOX1_2790 A 333 0.688 HMOX1_2790 T 333 0.688
HMOX1_3303 C 80 0.114 HMOX1_3303 G 80 0.114
HMOX1_9531 A 334 0.553 HMOX1_9531 G 334 0.553
HMOX1_16442 A 79 0.194 HMOX1_16442 T 79 0.194

Table A21: FBAT analysis of the age of diagnosis under the dominant model.
The age of diagnosis was first log-transformed before entering into the program.
Gene SNPs Allele # of Individuals P value
Factor B
BF_2557 A 214 0.370 BF_2557 G 57 0.832
BF_4022 A 253 0.818 BF_4022 G 136 0.802
BF_6484 A 22 0.545 BF_6484 G 142 0.342
BF_7202 A 231 0.889 BF_7202 G 196 0.029
BF_8311 C 8 n/a BF_8311 T 138 0.124
Complement factor 3
C3_963 G 228 0.942 C3_963 T 227 0.213
C3_28795 A 189 0.366 C3_28795 G 244 0.029
C3_36735 A 243 0.891 C3_36735 G 179 0.084
C3_43118 A 239 0.177 C3_43118 G 223 0.096
Toll-like receptor 4
TLR4_851 A 94 0.401 TLR4_851 G 275 0.413
TLR4_1859 A 277 0.533 TLR4_1859 G 165 0.144
TLR4_2856 C 201 0.131 TLR4_2856 T 34 0.231
TLR4_9263 A 149 0.607 TLR4_9263 C 20 0.332
TLR4_11912 G 136 0.133 TLR4_11912 T 261 0.930
TLR4_15884 C 187 0.457 TLR4_15884 G 31 0.577
TLR4_17050 C 192 0.318 TLR4_17050 T 37 0.884
Heme oxygenase-1
HMOX1_149 A 205 0.059 HMOX1_149 G 117 0.495
HMOX1_1038 C 4 n/a HMOX1_1038 T 65 0.198
HMOX1_2790 A 185 0.304 HMOX1_2790 T 246 0.762
HMOX1_3303 C 80 0.064 HMOX1_3303 G 7 n/a
HMOX1_9531 A 226 0.195 HMOX1_9531 G 208 0.648
HMOX1_16442 A 7 n/a HMOX1_16442 T 79 0.080

Table A22: FBAT analysis of FEV1 predicted value under the additive model
Gene SNP Allele # of Individuals P value
Factor B
BF_2557 A 201 0.019 BF_2557 G 201 0.019
BF_4022 A 265 0.090 BF_4022 G 265 0.090
BF_6484 A 138 0.965 BF_6484 G 138 0.965
BF_7202 A 293 0.598 BF_7202 G 293 0.598
BF_8311 C 132 0.573 BF_8311 T 132 0.573
Complement factor 3
C3_963 G 321 0.124 C3_963 T 321 0.124
C3_28795 A 307 0.087 C3_28795 G 307 0.087
C3_36735 A 293 0.307 C3_36735 G 293 0.307
C3_43118 A 310 0.823 C3_43118 G 310 0.823
Toll-like receptor 4
TLR4_851 A 273 0.247 TLR4_851 G 273 0.247
TLR4_1859 A 297 0.610 TLR4_1859 G 297 0.610
TLR4_2856 C 187 0.876 TLR4_2856 T 187 0.876
TLR4_9263 A 146 0.721 TLR4_9263 C 146 0.721
TLR4_11912 G 285 0.092 TLR4_11912 T 285 0.092
TLR4_15884 C 170 0.036 TLR4_15884 G 170 0.036
TLR4_17050 C 185 0.940 TLR4_17050 T 185 0.940
Heme oxygenase-1
HMOX1_149 A 228 0.830
HMOX1_149 G 228 0.830
HMOX1_1038 C 59 0.015 HMOX1_1038 T 59 0.015
HMOX1_2790 A 308 0.469 HMOX1_2790 T 308 0.469
HMOX1_3303 C 77 0.333 HMOX1_3303 G 77 0.333
HMOX1_9531 A 316 0.886 HMOX1_9531 G 316 0.886
HMOX1_16442 A 73 0.693 HMOX1_16442 T 73 0.693

Table A23: FBAT analysis of FEV1 predicted value under the dominant model
Gene SNP Allele # of Individuals P value
Factor B
BF_2557 A 193 0.005 BF_2557 G 49 0.773
BF_4022 A 232 0.010 BF_4022 G 118 0.556
BF_6484 A 22 0.847 BF_6484 G 132 0.908
BF_7202 A 217 0.132 BF_7202 G 179 0.397
BF_8311 C 9 n/a BF_8311 T 131 0.812
Complement factor 3
C3_963 G 207 0.267 C3_963 T 220 0.226
C3_28795 A 166 0.736 C3_28795 G 233 0.038
C3_36735 A 224 0.574 C3_36735 G 166 0.305
C3_43118 A 216 0.616 C3_43118 G 200 0.865
Toll-like receptor 4
TLR4_851 A 93 0.487 TLR4_851 G 250 0.312
TLR4_1859 A 247 0.448 TLR4_1859 G 150 0.919
TLR4_2856 C 180 0.575 TLR4_2856 T 31 0.298
TLR4_9263 A 142 0.748 TLR4_9263 C 21 0.849
TLR4_11912 G 124 0.294 TLR4_11912 T 236 0.151
TLR4_15884 C 165 0.037 TLR4_15884 G 27 0.539
TLR4_17050 C 181 0.838 TLR4_17050 T 36 0.762
Heme oxygenase-1
HMOX1_149 A 190 0.984 HMOX1_149 G 103 0.668
HMOX1_1038 C 4 n/a HMOX1_1038 T 59 0.018
HMOX1_2790 A 177 0.654 HMOX1_2790 T 230 0.528
HMOX1_3303 C 77 0.509 HMOX1_3303 G 6 n/a
HMOX1_9531 A 216 0.800 HMOX1_9531 G 195 0.625
HMOX1_16442 A 5 n/a HMOX1_16442 T 73 0.887

Table A24: FBAT analysis of FEV1 standard deviation value under the additive model
Gene SNP Allele # of Individuals P value
Factor B
BF_2557 A 201 0.089 BF_2557 G 201 0.089
BF_4022 A 264 0.012 BF_4022 G 264 0.012
BF_6484 A 137 0.727 BF_6484 G 137 0.727
BF_7202 A 293 0.026 BF_7202 G 293 0.026
BF_8311 C 131 0.859 BF_8311 T 131 0.859
Complement factor 3
C3_963 G 320 0.848 C3_963 T 320 0.848
C3_28795 A 306 0.316 C3_28795 G 306 0.316
C3_36735 A 293 0.128 C3_36735 G 293 0.128
C3_43118 A 309 0.224 C3_43118 G 309 0.224
Toll-like receptor 4
TLR4_851 A 272 0.999 TLR4_851 G 272 0.999
TLR4_1859 A 296 0.473 TLR4_1859 G 296 0.473
TLR4_2856 C
187 0.807 TLR4_2856 T 187 0.807 TLR4_9263 A 146 0.356 TLR4_9263 C 146 0.356 TLR4_11912 G 285 0.377 TLR4_11912 T 285 0.377 TLR4_15884 C 170 0.163 TLR4_15884 G 170 0.163 TLR4_17050 C 185 0.643 TLR4_17050 T 185 0.643 Heme oxygenase-1 HMOX1_149 A 227 0.690 HMOX1_149 G 227 0.690 HMOX1_1038 C 59 0.111 HMOX1_1038 T 59 0.111 HM0X1_2790 A 307 0.723 HMOX1_2790 T 307 0.723 HMOX1_3303 C 77 0.521 HMOX1_3303 G 77 0.521 HMOX1_9531 A 315 0.089 HMOX1_9531 G 315 0.089 HMOX1_16442 A 73 0.614 HMOX1_16442 T 73 0.614 134 Table A25: FBAT analysis of FEV1 standard deviation value under the dominant model Gene SNP Allele # of Individuals P value Factor B BF_2557 A 193 0.016BF_2557 G 49 0.290 BF_4022 A 231 0.022 BF_4022 G 118 0.153 BF_6484 A 22 0.054 BF_6484 G 131 0.779 BF_7202 A 217 0.027 BF_7202 G 179 0.258 BF_8311 C 9 n/a BF_8311 T 130 0.964 Complement factor 3 C3_963 G 207 0.602 C3_963 T 219 0.803 C3_28795 A 166 0.763 C3_28795 G 232 0.100 C3_36735 A 224 0.502 C3_36735 G 166 0.087 C3_43118 A 215 0.438 C3_43118 G 199 0.280 Toll-like receptor 4 TLR4_851 A 92 0.694 TLR4_851 G 249 0.826 TLR4_1859 A 246 0.195 TLR4_1859 G 149 0.648 TLR4_2856 C 180 0.911 TLR4_2856 T 31 0.320 TLR4_9263 A 142 0.509 TLR4_9263 C 21 0.236 TLR4_11912 G 124 0.827 TLR4_11912 T 236 0.322 TLR4_15884 C 165 0.220 TLR4_15884 G 27 0.404 TLR4_17050 C 181 0.955 TLR4_17050 T 36 0.176 Herne oxygenase-1 HMOX1_149 A 189 0.812 HM0X1_149 G 102 0.664 HMOX1_1038 C 4 n/a HMOX1_1038 T 59 0.069 HMOX1_2790 A 176 0.089 HMOX1_2790 T 229 0.404 HMOX1_3303 C 77 0.608 HMOX1_3303 G 6 n/a HMOX1_9531 A 215 0.0016 HMOX1_9531 G 194 0.558 HMOX1_16442 A 5 n/a HMOX1_16442 T 73 0.628 135 Table A26: Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and age of diagnosis. 
430 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h11111 0.1850 0.0133
f.h12112 0.0005 0.0019
f.h12121 0.0116 0.0041
f.h21111 0.0385 0.0067
f.h21211 0.1040 0.0104
f.h22111 0.0180 0.0046
f.h22112 0.0947 0.0101
f.h22121 0.5477 0.0170

Table A27: Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value. 380 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h11111 0.1804 0.0140
f.h12112 0.0004 0.0022
f.h12121 0.0134 0.0047
f.h21111 0.0411 0.0074
f.h21211 0.0979 0.0108
f.h22111 0.0195 0.0051
f.h22112 0.1030 0.0112
f.h22121 0.5445 0.0181

Table A28: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and FEV1 predicted value, with no adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h11111 -2.0478 3.7388 -0.5477 0.5839
h21211 1.4678 4.1147 0.3567 0.7213
h22121 2.8597 3.2461 0.8810 0.3783
pooled -4.6232 4.6197 -1.0008 0.3169

Table A29: Haplotype analysis for investigation of possible correlation between combinations of selected SNPs in Factor B and FEV1 predicted value, with adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h11111 -1.4185 3.7412 -0.3791 0.7046
h21211 1.9765 4.1065 0.4813 0.6303
h22121 3.4425 3.2503 1.0591 0.2895
pooled -5.2740 4.6159 -1.1426 0.2532
SEXM -2.8884 2.7585 -1.0471 0.2951
genotypeFO 4.5233 3.1719 1.4261 0.1538
genotypeOO -0.4898 4.3212 -0.1134 0.9098

Table A30: Frequencies of possible haplotypes generated for the Factor B gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value.
380 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h11111 0.1804 0.0140
f.h12112 0.0003 0.0022
f.h12121 0.0135 0.0047
f.h21111 0.0410 0.0074
f.h21211 0.0979 0.0108
f.h22111 0.0194 0.0051
f.h22112 0.1031 0.0112
f.h22121 0.5444 0.0181

Table A31: Frequencies of possible haplotypes generated for the Complement Factor 3 gene when determining the presence of any correlation between the haplotypes and age of diagnosis. 434 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h1111 0.0468 0.0112
f.h1112 0.0814 0.0122
f.h1121 0.1127 0.0143
f.h1122 0.1105 0.0142
f.h1211 0.0492 0.0116
f.h1212 0.0596 0.0115
f.h1221 0.0484 0.0117
f.h1222 0.0146 0.0076
f.h2111 0.0395 0.0099
f.h2112 0.0174 0.0077
f.h2121 0.0877 0.0141
f.h2122 0.1057 0.0141
f.h2211 0.0496 0.0135
f.h2212 0.0968 0.0134
f.h2221 0.0407 0.0125
f.h2222 0.0396 0.0107

Table A32: Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value.
384 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h1111 0.0428 0.0120
f.h1112 0.0853 0.0133
f.h1121 0.1191 0.0153
f.h1122 0.1087 0.0147
f.h1211 0.0545 0.0125
f.h1212 0.0656 0.0122
f.h1221 0.0461 0.0119
f.h1222 0.0092 0.0063
f.h2111 0.0359 0.0099
f.h2112 0.0111 0.0063
f.h2121 0.0895 0.0147
f.h2122 0.1158 0.0149
f.h2211 0.0492 0.0134
f.h2212 0.0875 0.0131
f.h2221 0.0418 0.0132
f.h2222 0.0377 0.0102

Table A33: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value, with no adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h1112 -3.1249 4.4955 -0.6951 0.4870
h1122 -5.7612 4.2983 -1.3403 0.1801
h1211 -3.2693 6.0141 -0.5436 0.5867
h1212 -7.1414 5.6725 -1.2590 0.2080
h2122 2.7698 4.1147 0.6731 0.5009
h2212 -5.0863 4.5827 -1.1099 0.2670
pooled -1.8448 3.1835 -0.5795 0.5623

Table A34: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value, with adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h1112 -3.1173 4.5129 -0.6908 0.4897
h1122 -5.7416 4.3112 -1.3318 0.1829
h1211 -3.2399 6.1242 -0.5290 0.5968
h1212 -7.5332 5.8414 -1.2896 0.1972
h2122 2.3798 4.1276 0.5765 0.5642
h2212 -5.2010 4.6088 -1.1285 0.2591
pooled -1.7596 3.1975 -0.5503 0.5821
SEXM -2.2927 2.7407 -0.8365 0.4029
genotypeFO 3.4788 3.1592 1.1012 0.2708
genotypeOO 1.8634 4.3248 0.4309 0.6666

Table A35: Frequencies of possible haplotypes generated for the Complement factor 3 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value.
384 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h1111 0.0438 0.0121
f.h1112 0.0850 0.0133
f.h1121 0.1180 0.0153
f.h1122 0.1090 0.0147
f.h1211 0.0534 0.0123
f.h1212 0.0662 0.0122
f.h1221 0.0468 0.0117
f.h1222 0.0091 0.0061
f.h2111 0.0349 0.0098
f.h2112 0.0112 0.0064
f.h2121 0.0897 0.0147
f.h2122 0.1166 0.0151
f.h2211 0.0506 0.0134
f.h2212 0.0870 0.0131
f.h2221 0.0415 0.0130
f.h2222 0.0371 0.0101

Table A36: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value, with no adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h1112 -0.1705 0.1811 -0.9411 0.3467
h1122 -0.2052 0.2000 -1.0263 0.3048
h1211 -0.1801 0.2443 -0.7372 0.4610
h1212 -0.1882 0.2235 -0.8421 0.3998
h2121 0.1773 0.2090 0.8486 0.3961
h2122 0.1110 0.1619 0.6858 0.4929
h2212 -0.1177 0.1899 -0.6196 0.5355
pooled -0.0260 0.1428 -0.1818 0.8557

Table A37: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 standard deviation value, with adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h1112 -0.1686 0.1817 -0.9278 0.3535
h1122 -0.1966 0.2001 -0.9826 0.3258
h1211 -0.1766 0.2469 -0.7152 0.4745
h1212 -0.2058 0.2274 -0.9051 0.3654
h2121 0.1790 0.2093 0.8550 0.3926
h2122 0.1032 0.1623 0.6362 0.5246
h2212 -0.1142 0.1901 -0.6009 0.5479
pooled -0.0204 0.1430 -0.1427 0.8865
SEXM 0.0384 0.0975 0.3940 0.6936
genotypeFO 0.0948 0.1120 0.8463 0.3974
genotypeOO -0.0210 0.1534 -0.1366 0.8914

Table A38: Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and age of diagnosis.
432 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h1122112 0.1447 0.0121
f.h1122122 0.2303 0.0144
f.h1212112 0.0012 0.0012
f.h1212221 0.0428 0.0070
f.h1212222 0.1042 0.0104
f.h1221212 0.0016 0.0015
f.h1221221 0.1026 0.0104
f.h1221222 0.0057 0.0027
f.h1222112 0.0013 0.0014
f.h1222122 0.0097 0.0035
f.h1222221 0.0010 0.0015
f.h1222222 0.0729 0.0089
f.h2222112 0.0015 0.0015
f.h2222121 0.0017 0.0017
f.h2222122 0.2788 0.0153

Table A39: Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value. 382 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h1122112 0.1350 0.0125
f.h1122122 0.2354 0.0154
f.h1212112 0.0013 0.0013
f.h1212221 0.0405 0.0072
f.h1212222 0.1022 0.0110
f.h1221212 0.0018 0.0017
f.h1221221 0.1043 0.0111
f.h1221222 0.0065 0.0031
f.h1222112 0.0013 0.0013
f.h1222122 0.0070 0.0031
f.h1222221 0.0011 0.0017
f.h1222222 0.0733 0.0095
f.h2222112 0.0017 0.0017
f.h2222121 0.0019 0.0019
f.h2222122 0.2867 0.0164

Table A40: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value, with no adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h1122112 -0.6825 3.1893 -0.2140 0.8306
h1122122 1.7004 2.7327 0.6222 0.5338
h1212222 -1.1586 3.6077 -0.3212 0.7481
h1221221 -0.0822 3.6590 -0.0225 0.9821
h1222222 2.0593 4.0267 0.5114 0.6091
pooled 1.7087 4.4352 0.3853 0.7001

Table A41: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 predicted value, with adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h1122112 -0.9215 3.2089 -0.2872 0.7740
h1122122 1.7599 2.7359 0.6433 0.5201
h1212222 -1.1055 3.6083 -0.3064 0.7593
h1221221 -0.1823 3.6909 -0.0494 0.9606
h1222222 2.1089 4.0556 0.5200 0.6031
pooled 1.7826 4.4310 0.4023 0.6875
SEXM -3.0651 2.7789 -1.1030 0.2700
genotypeFO 2.8110 3.1823 0.8833 0.3771
genotypeOO 0.5437 4.3542 0.1249 0.9006

Table A42: Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value. 382 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h1122112 0.1350 0.0125
f.h1122122 0.2354 0.0154
f.h1212112 0.0013 0.0013
f.h1212221 0.0405 0.0072
f.h1212222 0.1022 0.0110
f.h1221212 0.0017 0.0017
f.h1221221 0.1043 0.0111
f.h1221222 0.0065 0.0031
f.h1222112 0.0013 0.0013
f.h1222122 0.0070 0.0031
f.h1222221 0.0011 0.0017
f.h1222222 0.0733 0.0095
f.h2222112 0.0017 0.0017
f.h2222121 0.0020 0.0020
f.h2222122 0.2867 0.0164

Table A43: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value, with no adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h1122112 -0.0400 0.1136 -0.3521 0.7248
h1122122 -0.0148 0.0974 -0.1518 0.8793
h1212222 -0.1856 0.1285 -1.4440 0.1487
h1221221 -0.0675 0.1303 -0.5182 0.6043
h1222222 -0.1347 0.1437 -0.9371 0.3487
pooled -0.1072 0.1580 -0.6782 0.4976

Table A44: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value, with adjustment for the confounding factors
Estimate of Regression Coefficient Standard Error Z-score P
h1122112 -0.0315 0.1144 -0.2756 0.7828
h1122122 -0.0125 0.0977 -0.1280 0.8981
h1212222 -0.1776 0.1287 -1.3798 0.1676
h1221221 -0.0629 0.1316 -0.4782 0.6325
h1222222 -0.1501 0.1449 -1.0354 0.3005
pooled -0.0991 0.1581 -0.6268 0.5308
SEXM 0.0269 0.0991 0.2714 0.7861
genotypeFO 0.0901 0.1135 0.7938 0.4273
genotypeOO -0.0533 0.1553 -0.3433 0.7314

Table A45: Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and age of diagnosis. 433 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h111211 0.0074 0.0031
f.h111221 0.0015 0.0015
f.h112122 0.0013 0.0013
f.h112221 0.3273 0.0160
f.h211211 0.3730 0.0166
f.h211212 0.0100 0.0040
f.h211221 0.1085 0.0107
f.h211222 0.0036 0.0022
f.h212121 0.0135 0.0040
f.h212122 0.0363 0.0064
f.h212221 0.0612 0.0084
f.h212222 0.0019 0.0016
f.h221211 0.0522 0.0077
f.h221212 0.0010 0.0018
f.h221221 0.0013 0.0013

Table A46: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and the age of diagnosis, with no adjustment for the confounding factors. The age of diagnosis was logarithmically transformed for normality.
Estimate of Regression Coefficient Standard Error Z-score P
h112221 -0.1633 0.1525 -1.0710 0.2842
h211221 -0.1412 0.2224 -0.6348 0.5255
h212221 -0.2404 0.3017 -0.7967 0.4256
h221211 -0.5300 0.3007 -1.7626 0.0780
pooled -0.0541 0.2641 -0.2049 0.8377

Table A47: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and the age of diagnosis, with adjustment for the confounding factors. The age of diagnosis was logarithmically transformed for normality.
Estimate of Regression Coefficient Standard Error Z-score P
h112221 -0.1535 0.1517 -1.0116 0.3117
h211221 -0.1162 0.2216 -0.5245 0.6000
h212221 -0.2301 0.3008 -0.7647 0.4445
h221211 -0.5149 0.3017 -1.7068 0.0879
pooled -0.0804 0.2632 -0.3053 0.7602
SEXM 0.0633 0.1820 0.3477 0.7281
genotypeFO 0.4132 0.2078 1.9883 0.0468
genotypeOO 0.3573 0.2910 1.2278 0.2195

Table A48: Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and FEV1 predicted value.
383 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h111211 0.0085 0.0035
f.h111221 0.0018 0.0018
f.h112122 0.0014 0.0014
f.h112221 0.3308 0.0171
f.h211211 0.3744 0.0176
f.h211212 0.0100 0.0039
f.h211221 0.1072 0.0113
f.h211222 0.0026 0.0022
f.h212121 0.0097 0.0036
f.h212122 0.0413 0.0072
f.h212221 0.0592 0.0088
f.h212222 0.0020 0.0018
f.h221211 0.0499 0.0079
f.h221221 0.0013 0.0013

Table A49: Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene when determining the presence of any correlation between the haplotypes and FEV1 standard deviation value. 383 participants were included.
Estimate of Haplotype Frequency Standard Error
f.h111211 0.0085 0.0035
f.h111221 0.0018 0.0018
f.h112122 0.0014 0.0014
f.h112221 0.3307 0.0171
f.h211211 0.3745 0.0176
f.h211212 0.0098 0.0039
f.h211221 0.1073 0.0113
f.h211222 0.0026 0.0022
f.h212121 0.0097 0.0036
f.h212122 0.0413 0.0072
f.h212221 0.0591 0.0088
f.h212222 0.0021 0.0018
f.h221211 0.0499 0.0079
f.h221221 0.0013 0.0013

Table A50: Frequencies of possible haplotypes generated for the Factor B gene by the FBAT program
Haplotype Estimate of frequency
h1 22121 0.556
h2 11111 0.177
h3 21211 0.104
h4 22112 0.089
h5 21111 0.039
h6 22111 0.021
h7 12121 0.011
h8 22122 0.001
h9 22221 0.000
h10 21221 0.000

Table A51: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Factor B and age of diagnosis by the FBAT program.
Age of diagnosis was logarithmically transformed for normality.
Haplotype # of families S E(S) Var(S) Z P value
h1 251 -104.948 -92.713 76.79 -1.396 0.163
h2 182 -27.32 -33.303 54.699 0.809 0.418
h3 117 -14.405 -17.557 27.811 0.598 0.550
h4 120 0.241 -7.648 21.898 1.686 0.092
h5 55 -15.496 -10.599 11.493 -1.444 0.149
h6 31 -1.428 -2.042 6.609 0.239 0.811
h7 19 -0.349 -0.24 2.519 -0.068 0.946
h8 1 n/a n/a n/a n/a n/a
h9 1 n/a n/a n/a n/a n/a
h10 0 n/a n/a n/a n/a n/a

Table A52: Frequencies of possible haplotypes generated for the Complement factor 3 gene by the FBAT program
Haplotype Estimate of frequency
h1 2122 0.123
h2 1121 0.105
h3 1122 0.102
h4 2212 0.097
h5 2121 0.072
h6 1212 0.071
h7 1112 0.070
h8 2221 0.055
h9 1221 0.054
h10 2111 0.048
h11 1211 0.046
h12 2211 0.046
h13 1111 0.044
h14 2222 0.025
h15 1222 0.021
h16 2112 0.020

Table A53: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Complement factor 3 and FEV1 predicted value by the FBAT program
Haplotype # of families S E(S) Var(S) Z P value
h1 146 7487.676 7443.953 235348.436 0.09 0.928
h2 133 6542.610 6142.430 223082.326 0.847 0.397
h3 124 6316.309 5745.721 180879.428 1.34 0.180
h4 108 4553.519 5070.933 176068.694 -1.233 0.218
h5 93 4105.946 4166.365 145694.779 -0.158 0.874
h6 89 3466.416 3999.288 141407.77 -1.417 0.156
h7 97 4850.073 4148.990 137093.53 1.893 0.058
h8 63 2794.117 3115.907 113221.903 -0.956 0.339
h9 70 2983.738 2909.471 114481.785 0.219 0.826
h10 66 2559.862 2640.701 95908.587 -0.261 0.794
h11 64 2678.309 2723.217 92187.825 -0.148 0.882
h12 61 2630.163 2708.422 100397.066 -0.247 0.805
h13 62 2878.470 2702.198 98142.93 0.563 0.574
h14 33 1396.275 1337.111 51932.961 0.26 0.795
h15 30 998.844 1197.097 37983.311 -1.017 0.309
h16 29 807.060 997.585 28156.775 -1.135 0.256

Table A54: Frequencies of possible haplotypes generated for the Toll-like receptor 4 gene by the FBAT program
Haplotype Estimate of frequency
h1 2222122 0.277
h2 1122122 0.225
h3 1122112 0.145
h4 1212222 0.104
h5 1221221 0.104
h6 1222222 0.068
h7 1212221 0.042
h8 1222122 0.016
h9 1221222 0.008
h10 2222112 0.002
h11 1222221 0.002
h12 1122121 0.001
h13 1212122 0.001
h14 1212121 0.001
h15 1212212 0.001
h16 1222112 0.001
h17 1112122 0.001
h18 1222212 0.000
h19 1122222 0.000
h20 1221212 0.000
h21 2222121 0.000
h22 1221111 0.000
h23 1221121 0.000

Table A55: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and age of diagnosis by the FBAT program. Age of diagnosis was logarithmically transformed for normality.
Haplotype # of families S E(S) Var(S) Z P value
h1 238 -43.878 -53.555 61.458 1.234 0.217
h2 221 -31.153 -22.668 51.439 -1.183 0.237
h3 170 -24.737 -30.509 34.862 0.978 0.328
h4 133 -20.116 -16.067 26.067 -0.793 0.428
h5 117 -17.313 -12.126 25.422 -1.029 0.304
h6 90 -20.642 -19.783 21.797 -0.184 0.854
h7 61 -15.974 -14.514 15.556 -0.370 0.711
h8 21 -3.244 -4.737 4.760 0.684 0.494
h9 13 -2.272 -5.136 4.983 1.283 0.200
h10 2 n/a n/a n/a n/a n/a
h11 3 n/a n/a n/a n/a n/a
h12 2 n/a n/a n/a n/a n/a
h13 1 n/a n/a n/a n/a n/a
h14 1 n/a n/a n/a n/a n/a
h15 1 n/a n/a n/a n/a n/a
h16 1 n/a n/a n/a n/a n/a
h17 1 n/a n/a n/a n/a n/a
h18 0 n/a n/a n/a n/a n/a
h19 0 n/a n/a n/a n/a n/a
h20 1 n/a n/a n/a n/a n/a
h21 1 n/a n/a n/a n/a n/a
h22 0 n/a n/a n/a n/a n/a
h23 0 n/a n/a n/a n/a n/a

Table A56: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Toll-like receptor 4 and FEV1 standard deviation value by the FBAT program
Haplotype # of families S E(S) Var(S) Z P value
h1 215 89.665 87.686 83.367 0.217 0.828
h2 198 86.617 78.920 81.316 0.854 0.393
h3 149 38.735 50.966 56.578 -1.626 0.104
h4 114 28.836 28.579 33.093 0.045 0.964
h5 112 44.000 41.269 37.156 0.448 0.654
h6 76 20.103 16.635 22.980 0.724 0.469
h7 57 13.377 13.234 20.769 0.031 0.975
h8 18 0.364 5.476 7.912 -1.817 0.069
h9 10 -0.178 0.898 1.261 -0.959 0.338
h10 2 n/a n/a n/a n/a n/a
h11 2 n/a n/a n/a n/a n/a
h12 2 n/a n/a n/a n/a n/a
h13 1 n/a n/a n/a n/a n/a
h14 1 n/a n/a n/a n/a n/a
h15 1 n/a n/a n/a n/a n/a
h16 0 n/a n/a n/a n/a n/a
h17 1 n/a n/a n/a n/a n/a
h18 1 n/a n/a n/a n/a n/a
h19 0 n/a n/a n/a n/a n/a
h20 1 n/a n/a n/a n/a n/a
h21 1 n/a n/a n/a n/a n/a
h22 0 n/a n/a n/a n/a n/a
h23 0 n/a n/a n/a n/a n/a

Table A57: Frequencies of possible haplotypes generated for the Heme oxygenase-1 gene by the FBAT program
Haplotype Estimate of frequency
h1 211211 0.404
h2 112221 0.308
h3 211221 0.102
h4 212221 0.068
h5 221211 0.039
h6 212122 0.037
h7 212121 0.013
h8 211212 0.009
h9 111211 0.005
h10 111221 0.004
h11 112211 0.002
h12 211222 0.001
h13 221221 0.001
h14 112122 0.001
h15 212222 0.001
h16 112222 0.001
h17 122221 0.001
h18 212211 0.001
h19 221212 0.001
h20 121211 0.001
h21 111222 0.000

Table A58: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and age of diagnosis by the FBAT program. Age of diagnosis was logarithmically transformed for normality.
Haplotype # of families S E(S) Var(S) Z P value
h1 255 -41.328 -49.460 71.522 0.962 0.336
h2 221 -22.466 -30.480 58.029 1.052 0.293
h3 124 -13.264 -14.073 23.826 0.166 0.868
h4 83 -11.206 -8.905 17.249 -0.554 0.580
h5 52 -16.957 -12.072 12.119 -1.403 0.161
h6 49 -5.699 -2.119 8.887 -1.201 0.230
h7 18 -8.772 -5.713 3.132 -1.728 0.084
h8 10 -1.623 -2.611 2.088 0.684 0.494
h9 9 n/a n/a n/a n/a n/a
h10 8 n/a n/a n/a n/a n/a
h11 2 n/a n/a n/a n/a n/a
h12 2 n/a n/a n/a n/a n/a
h13 1 n/a n/a n/a n/a n/a
h14 1 n/a n/a n/a n/a n/a
h15 2 n/a n/a n/a n/a n/a
h16 1 n/a n/a n/a n/a n/a
h17 1 n/a n/a n/a n/a n/a
h18 0 n/a n/a n/a n/a n/a
h19 1 n/a n/a n/a n/a n/a
h20 1 n/a n/a n/a n/a n/a
h21 0 n/a n/a n/a n/a n/a

Table A59: Haplotype analysis for investigation of correlation between combinations of selected SNPs in Heme oxygenase-1 and FEV1 standard deviation value by the FBAT program
Haplotype # of families S E(S) Var(S) Z P value
h1 233 115.283 105.434 89.373 1.042 0.298
h2 198 75.282 76.587 69.100 -0.157 0.875
h3 114 17.461 26.854 33.189 -1.630 0.103
h4 69 14.362 14.004 14.980 0.092 0.926
h5 44 19.581 15.284 12.130 1.234 0.217
h6 47 7.874 10.227 12.511 -0.665 0.506
h7 15 1.953 2.011 1.592 -0.046 0.963
h8 8 n/a n/a n/a n/a n/a
h9 8 n/a n/a n/a n/a n/a
h10 7 n/a n/a n/a n/a n/a
h11 3 n/a n/a n/a n/a n/a
h12 2 n/a n/a n/a n/a n/a
h13 2 n/a n/a n/a n/a n/a
h14 0 n/a n/a n/a n/a n/a
h15 1 n/a n/a n/a n/a n/a
h16 1 n/a n/a n/a n/a n/a
h17 1 n/a n/a n/a n/a n/a
h18 0 n/a n/a n/a n/a n/a
h19 0 n/a n/a n/a n/a n/a
h20 1 n/a n/a n/a n/a n/a
h21 0 n/a n/a n/a n/a n/a

Table A60: Haplotype analysis between the haplotypes formed by the five selected SNPs in the Factor B gene and age of Pseudomonas aeruginosa infection by Hapstat, in order to determine any association between the haplotypes and the phenotype
Estimated value Standard Error Z score P value
00000 -0.306 0.196 -1.549 0.121
11001 0.014 0.179 0.076 0.939
11001 -0.174 0.229 -0.759 0.448
Sex -0.117 0.171 -0.687 0.492
CFTR genotype 0.089 0.120 0.739 0.460

Table A61: Haplotype analysis between the haplotypes formed by the four selected SNPs in the Complement factor 3 gene and age of Pseudomonas aeruginosa infection by Hapstat, in order to determine any association between the haplotypes and the phenotype
Estimated value Standard Error Z score P value
0001 -0.341 0.317 -1.077 0.282
0010 -0.401 0.238 -1.687 0.092
0011 -0.264 0.268 -0.983 0.326
0100 -0.601 0.383 -1.571 0.116
0101 -0.506 0.385 -1.315 0.188
1010 -0.526 0.487 -1.078 0.281
1101 -0.070 0.323 -0.216 0.829
1110 -0.474 0.360 -1.317 0.188
Sex -0.090 0.184 -0.489 0.625
CFTR genotype 0.0589 0.1305 0.4512 0.6518

Table A62: Haplotype analysis between the haplotypes formed by the seven selected SNPs in the Toll-like receptor 4 gene and age of Pseudomonas aeruginosa infection by Hapstat, in order to determine any association between the haplotypes and the phenotype
Estimated value Standard Error Z score P value
0011001 0.133 0.198 0.670 0.503
0011011 -0.070 0.160 -0.435 0.664
0101111 -0.080 0.236 -0.340 0.734
0110110 0.089 0.249 0.358 0.720
0111111 0.096 0.209 0.457 0.648
Sex -0.188 0.178 -1.060 0.289
CFTR genotype 0.169 0.134 1.261 0.207

Table A63: Haplotype analysis between the haplotypes formed by the six selected SNPs in the Heme oxygenase-1 gene and age of Pseudomonas aeruginosa infection by Hapstat, in order to determine any association between the haplotypes and the phenotype
Estimated value Standard Error Z score P value
001110 0.072 0.156 0.464 0.643
100110 0.034 0.220 0.156 0.876
101011 0.363 0.271 1.338 0.181
110100 0.173 0.265 0.653 0.514
Sex 0.171 0.182 0.943 0.346
CFTR genotype 0.107 0.129 0.827 0.409
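The Z and P columns of the FBAT haplotype tables (for example Tables A51 and A53) are consistent with the usual FBAT score statistic, Z = (S - E(S)) / sqrt(Var(S)), referred to a standard normal distribution, while the haplotype regression tables (for example Table A28) report a Wald Z-score, estimate / standard error. The short Python sketch below recomputes both from the tabulated values so that individual rows can be checked. It assumes two-sided p-values from the normal distribution; the function names are illustrative only and are not part of FBAT or of any other program used in this study.

```python
import math


def fbat_z(s: float, expected_s: float, var_s: float) -> float:
    """Score Z from an FBAT row, assuming Z = (S - E(S)) / sqrt(Var(S))."""
    if var_s <= 0:
        raise ValueError("Var(S) must be positive")
    return (s - expected_s) / math.sqrt(var_s)


def wald_z(estimate: float, std_error: float) -> float:
    """Wald Z-score for a regression coefficient: estimate / standard error."""
    if std_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / std_error


def two_sided_p(z: float) -> float:
    """Two-sided p-value against a standard normal: P(|N(0,1)| >= |z|)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Table A51, haplotype h1 (Factor B, log-transformed age of diagnosis)
    z = fbat_z(-104.948, -92.713, 76.79)
    print(f"A51 h1: Z = {z:.3f}, P = {two_sided_p(z):.3f}")

    # Table A28, haplotype h11111 (Factor B, FEV1 predicted value, unadjusted)
    z = wald_z(-2.0478, 3.7388)
    print(f"A28 h11111: Z = {z:.4f}, P = {two_sided_p(z):.4f}")
```

Run as a script, it prints Z = -1.396, P = 0.163 for Table A51 h1 and Z = -0.5477, P = 0.5839 for Table A28 h11111, matching the tabulated values. Small discrepancies on other rows are expected where the tables report rounded inputs.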
