Open Collections

UBC Theses and Dissertations

An automated multicolour fluorescence in situ hybridization workstation for the identification of clonally… Dubrowski, Piotr 2008

AN AUTOMATED MULTICOLOUR FLUORESCENT IN-SITU HYBRIDIZATION (FISH) WORKSTATION FOR THE IDENTIFICATION OF CLONALLY RELATED CELLS

by

PIOTR DUBROWSKI

BSc., The University of Guelph, 2005

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (Physics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

April 2008

© Piotr Dubrowski, 2008

Abstract

The methods presented in this study are aimed at the identification of subpopulations (clones) of genetically similar cells within tissue samples. This is done by measuring loci-specific Fluorescence in-situ Hybridization (FISH) spot signals for each nucleus and analyzing cell spatial distributions by way of Voronoi tessellation and Delaunay triangulation to robustly define cell neighbourhoods. The motivation for the system is to examine lung cancer patient samples for subpopulations of Non-Small Cell Lung Cancer (NSCLC) cells with biologically meaningful gene copy-number profiles: patterns of genetic alterations statistically associated with resistance to cis-platinum/vinorelbine doublet chemotherapy treatment. Current technologies for gene copy-number profiling rely on large amounts of cellular material, which are not always available, and their sensitivity is limited to the most dominant clone in often heterogeneous samples. Through the use of FISH, the detection of gene copy numbers is possible in unprocessed tissues, allowing identification of specific tumour clones with biologically relevant patterns of genetic aberrations. The tissue-wide characterization of multiplexed loci-specific FISH signals described herein is achieved through a fully automated, multicolour fluorescence imaging microscope and object segmentation algorithms that identify cell nuclei and the FISH spots within them.
Related tumour clones are identified through analysis of robustly defined cell neighbourhoods and cell-to-cell connections for regions of cells with homogeneous and highly interconnected FISH spot signal characteristics. This study presents experiments which demonstrate the system’s ability to accurately quantify FISH spot signals in various tumour tissues and in up to 5 colours simultaneously, or more through multiple rounds of FISH staining. Furthermore, the system’s FISH-based cell classification performance is evaluated at a sensitivity of 84% and a specificity of 81%, and the results of the clonal identification algorithm are determined to be comparable to clone delineation by a human observer. Additionally, guidelines and procedures to perform anticipated, routine analysis experiments are established.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements

Chapter 1 – Project Motivation and Aims
  1.1 Clonal Hypothesis for Cancer
  1.2 Lung Cancer
  1.3 Genome-wide Analysis
  1.4 The Role of Fluorescence in-situ Hybridization
  1.5 Main Motivation
  1.6 Project Hypothesis, Goals and Aims
    1.6.1 Formal hypothesis
    1.6.2 Hypothesis testing
    1.6.3 Other goals and aims

Chapter 2 – Background Information
  2.1 Cancer Genetics
  2.2 Fluorescent in-situ Hybridization (FISH) Basics
    2.2.1 DNA probes
    2.2.2 The hybridization reaction
    2.2.3 Background reduction
    2.2.4 Counterstaining and antifade
    2.2.5 Relevant limitations and drawbacks
  2.3 Bacterial Artificial Chromosomes (BACs)
  2.4 DNA labeling reactions
  2.5 Comparative Genomic Hybridization (CGH)
    2.5.1 The CGH technique
    2.5.2 The SMRT-array spotting library
  2.6 Automated, Quantitative FISH Analysis
    2.6.1 Relevant image analysis techniques
    2.6.2 Previous work on FISH spot enumeration automation
  2.7 Mathematical Constructs Used to Define Neighbourhoods

Chapter 3 – Materials and Methods
  3.1 FISH Protocols
    3.1.1 Bacterial Artificial Chromosome (BAC) extraction
    3.1.2 FISH probe synthesis
    3.1.3 Optimized tissue FISH protocol
    3.1.4 ReFISH protocol
  3.2 Imaging and System Hardware
  3.3 In-house Developed Software
    3.3.1 Image pre-processing
    3.3.2 The ‘Enhanced-Edge Detection’ segmentation algorithm
    3.3.3 The spot counting algorithm
    3.3.4 The architecture analysis metrics
      3.3.4.1 Neighbourhood Homogeneity (NH) score
      3.3.4.2 Clone Connectivity (CC) score
      3.3.4.3 Heat Map Visualization

Chapter 4 – Results
  4.1 Metasystem’s Metafer-4 Evaluation
  4.2 Segmentation Evaluation Experiment
  4.3 Spot Counting Evaluation Experiment
  4.4 Cell Classification and Clonal Identification Validation
    4.4.1 Experiment introduction
    4.4.2 Experimental results
      4.4.2.1 Intra-observer and inter-observer variability characterization
      4.4.2.2 Classification sensitivity and specificity
      4.4.2.3 Neighbourhood homogeneity analysis
        4.4.2.3.1 Neighbourhood Homogeneity map noise and spot counting
      4.4.2.4 Clone Connectivity analysis
        4.4.2.4.1 Clone Connectivity score results and spot counting
        4.4.2.4.2 Thresholding the CC score to identify clones
    4.4.3 Conclusions
  4.5 Proof-of-Concept Experiment
    4.5.1 Experiment introduction
    4.5.2 Materials and methods
      4.5.2.1 Sample selection
      4.5.2.2 Determining gene loci of interest
      4.5.2.3 FISH probe synthesis and tissue staining
    4.5.3 Experiment results
      4.5.3.1 Nuclei segmentation and spot counting
      4.5.3.1 Neighbourhood Homogeneity and Clone Connectivity analyses
    4.5.4 Conclusions
  4.6 ReFISH Experiment
  4.7 Use in Other Tissues: Oral Carcinoma in-situ (CIS)
    4.7.1 Experiment introduction
    4.5.2 Experiment results
      4.5.2.1 Spot enumeration
      4.5.2.2 Neighbourhood Homogeneity and Clone Connectivity analyses
    4.5.3 Conclusions
  4.8 FISH Synthesis and Staining Protocol Optimization
    4.8.1 FISH probe synthesis
    4.8.2 Tissue FISH staining protocol optimization

Chapter 5 – Discussion and Conclusions
  5.1 General Conclusions
  5.2 Essential Future Work
  5.3 Future Work Towards the Ultimate Goals for the System

Bibliography
Appendices
  Appendix A
  Appendix B
  Appendix C

List of Tables

TABLE 3-1 – Filter specifications
TABLE 3-2 – Nomenclature of colour filters used
TABLE 4-1 – Segmentation results by human classification
TABLE 4-2 – Comparison of human vs. system spot counting
TABLE 4-3 – Comparison of intra and inter observer mouse-cell classification
TABLE 4-4 – Results of human vs. system mouse-cell classification
TABLE 4-5 – Specificity and sensitivity analysis
TABLE 4-6 – Means and standard deviations of NH and CC maps showing systematic differences between the human and system classified data
TABLE 4-7 – Single thresholding of CC maps to delineate independent clones
TABLE 4-8 – Two thresholding of CC maps to delineate independent clones
TABLE 4-9 – POC experiment FISH probe overview
TABLE 4-10 – Summary of expected and measured spot counts in sample 82050465
TABLE 4-11 – Cell classification results for oral CIS tissue
TABLE 4-12 – Spot counting results for oral CIS tissue
TABLE 4-13 – Summary of FISH probe synthesis experiments
TABLE 4-14 – Compositions of hybridization buffers used
TABLE 4-15 – Three post-hybridization washes evaluated

List of Figures

FIGURE 1-1 – Schematic representation of clonal evolution of tumours
FIGURE 2-1 – Schematic representation of FISH technique
FIGURE 2-2 – The emission and excitation spectra of a fluorophore
FIGURE 2-3 – Illustrative representation of aCGH technique
FIGURE 2-4 – Delaunay triangulation and Voronoi Tessellation
FIGURE 3-1 – Example of HIND III digestion fingerprinting
FIGURE 3-2 – FOV nomenclature
FIGURE 3-3 – Hot/noisy CCD pixels and correction
FIGURE 3-4 – Directory structure after file_sort.m
FIGURE 3-5 – The .img file structure
FIGURE 3-6 – Example of top-hat filtering
FIGURE 3-7 – Example of stitching together datasets from the original and subsampled image segmentation
FIGURE 3-8 – Neighbourhood Homogeneity and Clone Connectivity score maps
FIGURE 3-9 – Comparison of CC maps for unlimited vs. truncated algorithm
FIGURE 4-1 – Metafer-4 vs. in-house developed classification analyses
FIGURE 4-2 – Examples of over and under segmented nuclei
FIGURE 4-3 – Example of bad and good quality staining/nuclei densities
FIGURE 4-4 – Typical spot counting results
FIGURE 4-5 – Automated spot counting evaluation against manual spot counting
FIGURE 4-6 – Merged colour image of infiltrating mouse cells into human tumour and NH and CC maps for the area
FIGURE 4-7 – NH and CC maps of sample 1 classified by observer 1 (twice) and observer 2, to illustrate inter and intra observer variability
FIGURE 4-8 – NH Maps for samples 1 through 4 showing both human and system classified data
FIGURE 4-9 – CC Maps for samples 1 through 4 showing both human and system classified data
FIGURE 4-10 – Example of varying threshold level of CC maps to identify clones
FIGURE 4-11 – Clones identified in CC maps of samples 1 through 4 using single thresholds
FIGURE 4-12 – Clones identified in CC maps of samples 1 through 4 using multiple (two) thresholds
FIGURE 4-13 – CGH data of sample 82050465 showing regions of interest
FIGURE 4-14 – 0.7% agarose gel showing probe lengths of 100-500 bp
FIGURE 4-15 – Counterstain and segmentation object coordinate image of squamous cell carcinoma sample (sample 82050465)
FIGURE 4-16 – NH and CC maps for probe amplifications in sample 82050465
FIGURE 4-17 – Example of 3 sequential rounds of ReFISH
FIGURE 4-18 – Image of ReFISHed region which failed to hybridize
FIGURE 4-19 – Counterstain and segmentation object coordinate images of the human excised oral CIS sample
FIGURE 4-20 – An example of FISH spot signal of the oral CIS tissue
FIGURE 4-21 – NH and CC maps scoring cells with abnormal FISH signatures as defined in the LAVysion guidelines
FIGURE 4-22 – NH and CC maps examining for normal cells
FIGURE 4-23 – NH and CC maps scoring cells with amplifications or single deletion of both signals simultaneously

Acknowledgements

I would like to extend my deepest gratitude to my supervisor, Dr. Calum MacAulay, and the researchers and staff at the Cancer Imaging Department, with special thanks to Mehrnoush Khojasteh, Dr. Martial Guillaud and Rassim Kamalov.
I would also like to offer thanks to the various summer students, co-op students and volunteers whose contributions to this project were invaluable. In addition, I would like to thank Dr. Wan Lam and the rest of his Cancer Genetics Lab for their continued patience and the shared use of their facilities and resources.

I thank my parents, Ewa and Edward, for their irreplaceable place in my life and their strength and passion. I thank my brother, Adam, for kindling my love of science and critical thinking from a young age. I would like to thank the rest of my ever growing family: Heather, Lili, Viktor, Ewa, Jacek, Ania and Maya, who bring so much happiness to us all. I would also like to thank all the family, friends and coworkers whom I’ve missed but whose roles were boundless.

Lastly, I would like to dedicate this work to my love, Stef. You are the essence of joy in my life. Thank you eternally.

Chapter 1 – Project Motivation and Aims

1.1 Clonal Hypothesis for Cancer

Cancer is a disease involving genetic as well as epigenetic changes that are transferable from cancer cells to subsequent generations of progeny cells. Thus, cancer progression is thought to be an evolutionary process that involves the expansion of genetically altered clones under selective pressures [1]. Evolving cells gain a growth advantage through the accumulation of genetic alterations in two main groups of genes: the proto-oncogenes and the tumour suppressor genes [2,3]. For a schematic diagram of clonal evolution of tumours, see figure 1-1 below. It has been well established that this neoplastic process, in many cancers such as colorectal carcinomas [4,5], prostate cancer [6], renal cell tumors [7], head and neck squamous cell carcinomas [8], gliomas [9,10] and lung tumours [11], indeed follows such a genetic pathway, in which an increasing number of genetic aberrations accumulate in the altered tissues.
Furthermore, specific patterns of genetic aberrations are often candidates for drug and targeted therapies and have been associated with more advanced, more aggressive stages of cancers [12-17] and even with resistance to therapies [18].

1.2 Lung Cancer

It is becoming evident that this is also the case in lung cancer, which has the highest mortality rate of all malignancies and is responsible for approximately 29% of all cancer-related deaths in the United States [19]. In 2006 alone, it was estimated that 174,000 new lung cancer cases would be diagnosed and 163,000 people would die of this disease [19]. Furthermore, the 5-year survival rate for patients with lung malignancies is only approximately 15% and has not significantly improved in the last 30 years.

[Figure 1-1 diagram; labels: Genetic Change (x3), Genetic Instability, Cancer]

FIGURE 1-1 – Schematic representation of clonal evolution of tumours. The green balls represent cells that have developed a genetic abnormality and are expanding or growing into a clone of cells. One of these cells develops a second genetic abnormality, illustrated by a blue ball, and expands into its own clone of cells, a subclone of the green population. A third genetic change is made, illustrated by a brown ball, with clonal expansion of this cell population. Eventually, another genetic mistake is made in one of the cells of the brown population that allows that cell to become cancerous. This red, cancerous clone is characterized by high genetic instability and a high level of proliferation. The final state of the tumour is a highly diverse, multiclonal entity with a complex interaction of competition and cooperation between its individual clones [20]. NOTE: Changes that result in new clones can be either genetic or epigenetic in origin; this is left out of the diagram for simplicity.
Surgery is the mainstay of treatment for patients with early (stage 0 to II) non–small-cell lung cancer (NSCLC), which includes three major histological subtypes: adenocarcinoma, squamous cell carcinoma, and large cell carcinoma [18]. Unfortunately, most patients are diagnosed with inoperable advanced stage III and IV disease [19], placing chemotherapy at the forefront of NSCLC treatment. However, NSCLC response to chemotherapy is very low and survival times continue to be poor. This lack of effectiveness is often attributed to the rampant chemoresistant phenotype of many NSCLCs [18,21].

1.3 Genome-wide Analysis

With the emergence of genomic scanning techniques, the information now available opens the possibility of pharmacogenomic-guided chemotherapy studies, which could be used to predict drug resistance [18] based on associated patterns of genetic abnormalities. Genome-wide technologies such as comparative genomic hybridization (CGH) have uncovered potential genomic aberrations relevant to multidrug resistance phenotypes in a variety of tumors and for a variety of chemotherapy agents [22,23]. This offers the possibility of identifying not only a tumour’s resistance to specific therapies but also its possible sensitivity to others, opening the door to personalized treatment and improved clinical management. However, these genome-wide tests usually require a large quantity of DNA, often isolated from up to millions of cells [24]. With such a large population of cells needed, intratumour heterogeneity becomes a very important factor [16]. Consequently, CGH results may only reflect the dominant genotype present within the material analyzed, which may not be the only one of biological importance. Likewise, information about tumor heterogeneity is masked. This is a major drawback, as intratumour genetic heterogeneity has been shown to have important clinical implications [25,26] and thus should be preserved and, furthermore, quantified and studied.
Moreover, invasive tumours almost always harbour various amounts of genetically normal cells intermixed to varying degrees with the tumour cells, further masking any true genotypes present in the CGH data.

1.4 The Role of Fluorescence in-situ Hybridization

Current methods to overcome the shortcomings of CGH are often based on histological or cytological validation techniques such as fluorescence in-situ hybridization (FISH). FISH is often used to validate CGH data a few genetic regions at a time by detecting gene copy or chromosome copy number alterations (amplifications and deletions) within formalin-fixed, paraffin-embedded tissues, fresh frozen tissues, or cell suspensions of either matched immortalized cell lines or disaggregated tissue cells. FISH has proven to be as accurate as Southern blot analysis for detecting genetic aberrations, while also allowing the measurement of the fraction of altered cells and the heterogeneity within a given cell population [27,28]. Although the FISH technique is used in numerous variations and for varied purposes, for the purposes of this thesis only its ability to quantify or validate information mirroring CGH data has so far been explored, i.e. the enumeration of genetic regions in terms of copy numbers, which manifest as different coloured ‘FISH spots’ within each nucleus. Currently, most such FISH spot enumeration or scoring is performed in a semi-qualitative fashion by human observers, which is a difficult, time-consuming and user-dependent task [29]. Furthermore, quantitative analysis is usually done at the field-of-view level, limiting scoring to small discrete areas rather than the entire tumour specimen.

1.5 Main Motivation

Thus, we propose an automated FISH enumeration system to overcome the shortcomings of both CGH and manual FISH analysis and fill the gap between these two platforms.
The advantage of automating FISH enumeration is evident: it will enable efficient and standardized tissue-wide characterization of loci-specific FISH signals. Furthermore, if these loci-specific FISH probes are based on statistically significant, biologically important genetic loci derived through the aforementioned CGH profiling, their spatial distributions within the tissues scanned can potentially identify regions of biologically relevant tumour clones that may have been missed in CGH profiling due to high tumour heterogeneity or due to the presence of other, more numerous but less important clones. By identifying such small regions of cells with specific, biologically important genetic aberrations, the system can serve an important role in both basic research and, more importantly, clinical cancer management. The ultimate motivation for the system will be to examine lung cancer patient samples in order to validate and search for subpopulations of NSCLC cells with recently established, biologically meaningful CGH profiles: 5-50 significant genetic alterations statistically associated with resistance to cis-platinum/vinorelbine doublet chemotherapy treatment, based on [30-33]. This, however, will not be within the scope of this proof-of-concept study.

1.6 Project Hypothesis, Goals and Aims

1.6.1 Formal hypothesis

The hypothesis for this study proposes that identification of subpopulations (clones) of genetically similar cells within paraffin-embedded, formalin-fixed tissue samples is possible by measuring loci-specific FISH spot signals for each cell present and then analyzing the spatial distributions and cell neighbourhoods by way of Voronoi tessellation and Delaunay triangulation to robustly define cell neighbours. Secondly, we hypothesize that CGH ratios can also be validated in this fashion, if both the FISH probes used and the tissue scanned are matched to a specific CGH profile.
Furthermore, it is thought that this FISH enumeration and clonal analysis approach can be applied to a variety of tissues and in repeated rounds of staining on the same tissue section, to increase the number of genetic loci investigated using a limited number of fluorophores.

1.6.2 Hypothesis testing

To test the aforementioned hypothesis, an automatic FISH enumeration system was established, capable of accurate nuclei identification and loci-specific FISH signal enumeration. Furthermore, two spatial analysis metrics were developed to identify regions of cells with similar FISH characteristics. Apart from accuracy and validation tests for the object segmentation and FISH spot enumeration processes, the hypothesis will be tested using squamous cell lung carcinoma mouse xenograft tissue samples, which present both human tumour cells and small clumps of normal mouse cells, as a model for clonal identification. FISH probes were chosen not to cross-hybridize between mouse and human species; thus, clonal identification was tested by finding regions of cells lacking human FISH loci signals. This task was then performed manually to evaluate the accuracy of the automatic classification and region identification.

To determine if CGH profiles can be validated in-situ, a previously CGH-profiled, excised squamous cell lung carcinoma tissue sample will be scanned with custom-made FISH probes identifying significantly amplified loci. Furthermore, spatial analysis will be performed to identify clones with CGH-derived amplification profiles and visualize their arrangements. Finally, the entire process will be investigated in oral tumour tissues to show that it is fully applicable in tissues other than lung tumours. Likewise, the feasibility of repeating FISH staining on previously FISH-stained tissue sections will be investigated as a way to increase the number of genetic loci investigated using a limited number of fluorophores.
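The idea of using Delaunay triangulation to define cell neighbours can be illustrated with a minimal sketch. The code below is an illustration only and is not the software developed for this thesis. It assumes each nucleus is reduced to a 2-D centroid, finds Delaunay triangles with a brute-force empty-circumcircle test (adequate only for small point sets in general position), and reports for each cell the fraction of its Delaunay neighbours that share its class label. The function names, the example coordinates and the 'human'/'mouse' labels are all hypothetical, and the reported fraction is a simplified stand-in, not the Neighbourhood Homogeneity or Clone Connectivity score.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (centre_x, centre_y, radius_squared), or None if collinear."""
    (ax, ay), (bx, by), (cx, cy) = a, b, c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_neighbours(points):
    """Map each point index to the set of indices it shares a Delaunay edge with.

    A triangle is kept when no other point lies strictly inside its
    circumcircle. Co-circular points are not handled specially.
    """
    neighbours = {i: set() for i in range(len(points))}
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        is_empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if is_empty:
            for p, q in ((i, j), (j, k), (i, k)):
                neighbours[p].add(q)
                neighbours[q].add(p)
    return neighbours


def same_class_fraction(neighbours, labels):
    """Fraction of each cell's neighbours that carry the same label."""
    return {
        i: (sum(labels[j] == labels[i] for j in nb) / len(nb)) if nb else 0.0
        for i, nb in neighbours.items()
    }


if __name__ == "__main__":
    # Hypothetical nucleus centroids (a flat rhombus) and class labels.
    centroids = [(0, 0), (2, 1), (4, 0), (2, -1)]
    labels = ["mouse", "mouse", "human", "mouse"]
    nbrs = delaunay_neighbours(centroids)
    for cell, frac in same_class_fraction(nbrs, labels).items():
        print(cell, sorted(nbrs[cell]), round(frac, 2))
```

In this example the two far corners of the rhombus (cells 0 and 2) are not neighbours, because the triangulation uses the short diagonal between cells 1 and 3. For realistic numbers of nuclei, an established computational geometry library would be a more practical choice than this cubic-time enumeration of triangles.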
1.6.3 Other goals and aims

Specifically, the proposed system should be highly automated and capable of high-throughput scanning. Furthermore, the clone identification processes should be able to identify minimal areas of cells exhibiting specific genetic profiles of interest, consisting of far fewer than the 0.1–1 million cells generally needed for identification through current array CGH technologies. The extent and size required to define a clone, which likely have an important impact on the characteristics of the tumour under investigation, will also be investigated. The automatic FISH enumeration system will also be useful for validation of CGH results by enumeration of FISH spots on a cell-by-cell level across previously CGH-profiled tissues. In this fashion, not only can CGH profiles be confirmed to exist ‘on average’ across the entire tissue region, but information about the extent of their presence in individual cells can also be ascertained. To fulfill all these roles, the FISH workstation needs to do the following: image an extended selected tissue area, accurately identify the cells within it, enumerate their FISH spots, and quantify both the homogeneity and the connectivity of cells that exhibit the specific genetic profiles being investigated.

Chapter 2 – Background Information

2.1 Cancer Genetics

Cancer is the result of changes and alterations of a cell’s DNA, particularly in oncogenes and tumor-suppressor genes. A single genetic change is rarely sufficient for the development of a malignant tumor; instead, evidence suggests a complex process of accumulating genetic alterations [34]. Moreover, non-genetic or epigenetic changes can further alter the behavior of not only malignant cells but also host cells that interact with the tumour, such as immune, vascular, and stromal cells [35]. The transformation into malignancy requires these accumulated genetic and epigenetic changes to confer some form of growth and/or cellular survival advantage.
This is often the case when genetic or epigenetic changes result in the loss of tumour suppressor genes and/or activation of (proto)oncogenes, which are responsible for critical functions such as cell proliferation control, DNA repair mechanisms, regulation of apoptosis, telomerase control, angiogenesis and tissue invasion.36

As alluded to, oncogenes are genes that, once activated, can cause normal cells to proliferate out of control and become cancerous. These are altered forms of normal genes responsible for controlling growth and differentiation, called proto-oncogenes. Tumour suppressor genes, conversely, are genes whose loss of function results in the promotion of malignancy. Tumour suppressor genes are usually regulators or inhibitors of growth functions.37

The loss of tumour suppressor genes is hypothesized to be a two-step process involving both alleles rather than a single allele as in the case of oncogene activation. First, loss of heterozygosity (LOH) by chromosomal deletion or translocation disrupts one functional allele, while the second is altered by point mutation or epigenetic changes, leading to a complete loss of gene function.38 Conversely, the activation of oncogenes usually requires changes in only one allele. This can be achieved by chromosomal rearrangements, random or stimulated mutations, gene amplification and epigenetic changes.36 Chromosome inversions and translocations cause genetic rearrangements that can bring a gene carrying an active promoter next to another that carries the oncogenic activity. Oncogene activation by mutation must confer a specific structural change to the encoded protein such that it enhances its activity. Oncogene activation can also be brought about by gene amplification and epigenetic changes.
Recently, epigenetic changes have come to be viewed as more important for their roles in modulating gene transcription in the initiation and progression of human cancers.39 Epigenetic changes are changes in gene expression that are stable over cell division and between generations but do not involve direct changes to the DNA sequence. The molecular basis of epigenetics involves modifications to DNA and the chromatin proteins that associate with it, specifically DNA methylation, RNA-associated silencing and histone modification.40 The disruption of these systems leads to inappropriate expression or silencing of genes, which plays an important role in human cancer and genetic instability, much like the direct genetic changes of tumour suppressor genes and oncogenes mentioned previously.

Consequently, whether related to genetic and/or epigenetic changes, cancer is ubiquitously characterized by a high level of genetic instability. This instability usually results in significant and rampant changes in the native genome of the cells, often as large regions of amplification and/or deletion.34 That said, gene amplifications and deletions are not exclusive to late-stage, progressing tumours but have also been shown to play important roles in tumour initiation and early-stage tumours, such as in the early stages of colorectal tumorigenesis41, ovarian tumours42 and various others43. Accordingly, gene amplifications and deletions play an important role in determining the characteristics of both progressing and early-stage tumours. These are often identified as prospects for drug development and targeted therapies; however, due to the high genetic instability, tumours often possess several genetically distinct clones.
These clones, varying in genetic alterations and states of differentiation, can each have unique sensitivities to chemotherapy, radiotherapy, and other treatments as well as unique associated risk factors, making clinical management difficult and often unpredictable.34 However, with advances in genome-wide scanning technologies like array-based Comparative Genomic Hybridization (CGH), important cancer genotypes are being identified at a rapid rate. Thus, the development of an analysis approach, as proposed in this study, which can identify such clinically and biologically relevant clones within the highly heterogeneous background of a tumour may play a very important role in the clinical management of cancer on a more personalized basis.

2.2 Fluorescent in-situ Hybridization (FISH) Basics

Fluorescent in-situ hybridization (FISH) is used for many purposes including analysis of chromosomal damage, gene mapping, clinical diagnostics and molecular toxicology. FISH allows an investigator to identify the presence and location, or absence, of a region of cellular DNA within morphologically preserved chromosome preparations, fixed cells or tissue sections. Briefly, FISH staining is accomplished by exposing a DNA probe labeled with a reporter molecule to the native DNA of a sample under investigation. By varying the reaction conditions, the probe and native DNA can combine or hybridize in a stable and specific fashion. Finally, the hybridized DNA probe is detected, usually but not exclusively through the use of fluorescence microscopy.44 Please see figure 2-1 for a schematic representation of the FISH procedure.

2.2.1 DNA probes

The production of high-quality DNA probes is essential to good FISH results and can be achieved by labeling cloned or otherwise acquired DNA using various means, most often nick translation or polymerase chain reaction (PCR) using random primers. Labeling itself comes in two distinct varieties: direct and indirect.
Direct probes are labeled using nucleotide bases conjugated directly with a fluorophore. In most cases these are fluorophore-dUTP (deoxyuridine triphosphate) conjugates that can be directly incorporated into the DNA probe, assuming the fluorophores are not too bulky.

FIGURE 2-1 – Schematic representation of important steps in the Fluorescent in-situ hybridization technique. Usually, FISH starts with an isolated DNA segment of interest called the probe DNA (A), which can be a single genetic locus, centromere, entire chromosome, etc. Next, the DNA is conjugated (directly or indirectly) with a fluorescent molecule (B). The probe DNA and target DNA (cell) can be denatured when subjected to alkaline conditions or low salt concentrations while in the presence of heat and/or formamide. By mixing denatured probe and target DNA and returning conditions to normal, the probe and target DNA sequences will hybridize if they contain enough common sequences. Lastly, the hybridized signal is observed (D) either directly through fluorescent imaging or by other detection means. Credit: Thomas Reid

Indirect labeling involves incorporating one part of a conjugate pair directly into the DNA, while the other part of the conjugate carries the bulky reporter molecule. In this fashion, incorporation is made more efficient and detection can be amplified by using conjugates with multiple fluorophores, etc.45

2.2.2 The hybridization reaction

DNA in its relaxed state forms continuous double helices with minor imperfections. However, DNA may become denatured under alkaline conditions or low salt concentrations while in the presence of heat and/or formamide. When conditions are returned to normal, the denatured DNA will again form complementary base pairs and return to its original conformation. When a denatured DNA probe and denatured sample DNA are mixed and allowed to return to cellular conditions, they may form complementary bonds with each other rather than with themselves.
During hybridization, both stable and transient interactions occur, specifically: target-target, probe-probe, and specific and nonspecific target-probe hybrids. The first three are stable and can withstand stringent hybridization and wash conditions due to their high degree of base pairing. Thus, nonspecific interactions are usually eliminated during stringent wash steps, while probe-probe or target-target duplexes are either not bound and easily removed during the wash steps or not labeled at all and thus not detected.46

The hybridization reaction usually takes place at 37°C in the presence of 2X SSC (saline sodium citrate), formamide, and blocking, carrier and probe DNA. Of these conditions, the two major factors influencing the rate, extent and quality of hybridization are the formamide concentration and the hybridization temperature. Hybridization temperature generally depends on base composition, length of the DNA segment and salt concentration of the reaction. Formamide is used during hybridization because it lowers the hybridization temperature by destabilizing hydrogen bonding across base pairs.47 For most applications, overnight hybridization (12 hr) is the minimum time necessary for observing an adequate hybridization signal; however, this time can be dramatically decreased through techniques such as pulse-microwave irradiation.48 Longer times (up to 3 days) may help to increase signal intensity but may result in increased noise.44

Finally, the last steps in any hybridization reaction are the post-hybridization washes. Usually two washes are applied: the first removes excess probe and hybridization buffer from the slide, and the second, more stringent wash primarily removes non-specific and/or repetitive DNA hybridizations. The stringency of the wash is increased by decreasing the concentration of the wash salt solution (0.1-2X SSC), increasing the duration (2-10 min) and increasing the temperature of the wash (25-75°C).
That said, it is impossible to present a standard washing procedure and most parameters need to be worked out experimentally. Lastly, if secondary detection is needed, as is the case with indirect probe labeling, additional steps and washes will also be needed.47

2.2.3 Background reduction

In addition to 2X SSC, formamide and probe DNA, the hybridization reaction also includes blocking and carrier DNA. Blocking DNA is used to reduce nonspecific hybridization between probes and non-target DNA. Blocking DNA consists of repetitive sequences or whole-genomic DNA extracts that competitively bind to the repetitive elements of the target DNA, decreasing undesirable hybridization artifacts. Generally, blocking DNA concentrations are higher than the probe DNA concentration, but the labeled probe concentration for a given locus still greatly exceeds that of the blocking DNA. Consequently, the hybridization signal is not diminished significantly.47 Carrier DNA, consisting of DNA segments from distantly related species (e.g. herring or salmon sperm), is included to further reduce background hybridization signal by associating with non-biological sites (e.g. the microscope slide) that may otherwise bind the DNA probe. Again, carrier DNA concentrations can be up to 30-fold the concentration of probe during hybridization.44

Probe DNA segment size can also have an effect on the extent of background hybridization signal. The optimal probe size is between 200 and 800 bp, and probes exceeding 1 kb usually increase nonspecific background signal, thus decreasing the signal-to-noise ratio. Probe segments shorter than 200 bp have insufficient hydrogen bonding and require very low stringency hybridization and washes for adequate signal detection.44 Probe concentration may also affect the hybridization signal quality.
High probe concentrations will increase non-specific background hybridization; low probe concentrations will result in very low hybridization rates. Both factors reduce the signal-to-noise ratio of the sample. Usually 1.5 to 50 ng/mL of probe is used in the hybridization mixture, with repetitive targets or whole chromosome painting experiments requiring lower probe concentrations, while small-target or simultaneous multiple-probe hybridizations usually exceed concentrations of 40 ng/mL.44 Finally, the overall size of the probe will have a direct impact on the hybridization signal. This is most important in FISH experiments involving spot counting or gene fusion detection. In such cases, probe lengths of 300-600 kb are optimal for bright but still specific spot signals. This amounts to the pooling of two to three sequential BACs (see below) to investigate a single, specific genomic region.49

2.2.4 Counterstaining and antifade

Most FISH experiments employ a counterstain to detect the nuclei or chromosomes in which one expects FISH signals to appear. Two common counterstains are 4',6-diamidino-2-phenylindole (DAPI), which binds strongly to the minor groove of DNA sequences containing three to four adjacent AT base pairs, and propidium iodide, which binds nonspecifically to DNA and RNA. In addition, most fluorophores fade rapidly upon illumination with excitation light. Antifade solutions are compounds that prolong the intensity of the fluorescence by scavenging reactive oxygen species, which are thought to be responsible for photobleaching.50

2.2.5 Relevant limitations and drawbacks

Specimen thickness has been a limiting factor in FISH experiments, both in terms of automated or manual interpretation and imaging itself.
Recent imaging methods such as confocal microscopy, optical projection tomography and two-photon microscopy have allowed FISH imaging in thick tissue sections.51 However, the ability to image and identify nuclei in thick tissue slices does not overcome the effect of truncation. In a normal, healthy, intact interphase cell probed for a single locus, two FISH signal spots would be observed, one for each homologous chromosome. The same cell in metaphase would exhibit four spots due to the duplication of DNA in the S-phase of the cell cycle. A healthy but truncated cell, as occurs in sectioned tissue samples, might demonstrate fewer than two FISH spots because one or both FISH spots may have been located in the missing section of the truncated nucleus. The amount of signal loss depends on the size of the nuclei involved and the thickness of the sections, but on average one can expect between 1.4 and 1.7 spot signals in normal cells for sections ranging from 5 – 15 µm thick.52

Another limitation of FISH is the number of fluorophores that can be imaged simultaneously. Since conventional fluorescent spectra are often quite broad (figure 2-2), both in excitation and emission, only a limited number of spectrally unique fluorophore-filter pairs can be used for simple imaging. The upper limit for such simple fluorescent detection is about 7-8 different fluorophores. Numerous methods have been employed to increase this colour-space. Some employ creative labeling schemes, such as combinatorial or ratio labeling. Combinatorial labeling effectively uses a binary labeling system such that, for example, using only two fluorophores, three different regions can be identified: one with each fluorophore and a third labeled with both.

FIGURE 2-2 – The emission and excitation spectra of the TexasRed© fluorophore. The left curve is the excitation or absorption curve. Note the broadness of both curves.
This is a limiting characteristic of simultaneous imaging of multiple fluorophores, which is facilitated by carefully chosen filters. Credit: PerkinElmer

Ratio labeling is similar and can further increase the number of gene loci investigated, not only by co-labeling with two or more fluorophores but also by identifying probes by their unique ratio of fluorophore intensities. Other strategies involve image or signal processing steps to de-mix heavily overlapping fluorescence spectra by techniques such as hyperspectral imaging, or the use of quantum dots, which fluoresce with less-broad spectra.53 A simpler approach to the problem is to reuse the limited number of fluorophores in several rounds of repeated FISH (reFISH) staining on the same tissue slide. The only difficulties are the subsequent registration of images and some potential degradation of tissues.54

Lastly, there has been considerable criticism concerning the reproducibility and irregularity of the signal and background autofluorescence. Not only have sample-to-sample measurements been identified as highly variable, but material from the same slide and even the same cells has also been shown to be unevenly fluorescent.55 Likewise, autofluorescence is very tissue dependent and highly variable and can significantly impact the interpretation of FISH experiments. Many schemes have been introduced to compensate for the autofluorescence inherent in certain tissues, such as simple pretreatment by irradiation with light, but these are not always possible and are seldom completely effective.56

2.3 Bacterial Artificial Chromosomes (BACs)

FISH probe production and, even more so, the CGH technique both rely heavily on the use of bacterial artificial chromosomes (BACs) to clone specific regions of the human genome, whether for use as template DNA for labeling or as target DNA for CGH.
Briefly, to clone a gene or significant DNA segment, the DNA sequence is linked to a kind of carrier (a vector), also made of DNA, that can be taken into a host cell, usually Escherichia coli. A vector contains other elements recognized by the host cell, most importantly an origin of replication and a selective marker. The origin of replication is a DNA sequence that stimulates the host cell to make more copies of the vector and the attached clone DNA. The selective marker is often a gene that confers antibiotic resistance on the host cell, in order to ensure that only successfully transformed cells are present. This is achieved by adding antibiotics to the growth media to kill any host cells that do not possess the antibiotic-resistance selective marker gene and thus the vector.57,58 A bacterial artificial chromosome is such a DNA construct, specifically based on the naturally occurring fertility plasmid found in E. coli. BACs usually have an insert size between 70 and 150 kbp, which makes it possible to study larger genes, several genes at once, or entire viral or pathogen genomes.57,58

2.4 DNA labeling reactions

FISH probe synthesis is an integral part of custom, research-based FISH experiments. The need for custom-made FISH probes arises from the general lack of loci-specific FISH probes available commercially, coupled with the great breadth of loci identified as biologically relevant through CGH profiling. Consequently, custom FISH probe synthesis will play an integral part in the detection of unique, biologically important tumour subpopulations. There are numerous techniques for synthesizing FISH probes, with nick translation and random-primed polymerase chain reaction being the most popular. Nick translation uses a nuclease, deoxyribonuclease (also known as DNase), to cut one strand of the template DNA sequence at random sites, producing single-stranded ‘nicks’.
Next, DNA polymerase I repairs the cleaved sequence using the four nucleotides provided to the reaction, one of which is a fluorescently-labeled analogue. If the DNase cleaves both strands in the same area, a complete break will occur in the double-stranded DNA, and thus the template DNA becomes randomly partitioned into segments as needed. Nick translation is an efficient labeling technique whose strength is its ability to label large quantities of DNA, such as BACs.59

A second very common DNA labeling technique is a variant of PCR. It is almost identical to an ordinary PCR reaction except that, instead of specific primers designed to amplify a certain region of template DNA, degenerate or random primers (DOP-PCR) are used, resulting in randomly amplified DNA segments representative of the entire template.60 The randomly amplified segments are labeled by simply substituting one of the four nucleotides with a labeled conjugate. The polymerase is thus forced to incorporate the labeled nucleotide into the amplified products. The advantage of this technique is that, due to the amplification properties of PCR, significant quantities of probe may be derived from a limited amount of original template.

2.5 Comparative Genomic Hybridization (CGH)

2.5.1 The CGH technique

Comparative genomic hybridization (CGH) is a variant of the previously described FISH technique. Essentially, whereas FISH can be used to investigate the genetic content of a sample, CGH is used to examine the segmental DNA copy number differences between two samples. This is achieved by labeling a reference DNA and a test DNA with two spectrally unique fluorophores. The labeled DNA can be a single chromosome or an even smaller region, but usually it is the entire extracted genome that is labeled. Next, the two labeled probes are allowed to hybridize to the same normal metaphase spread, i.e. a cell that is arrested in its metaphase stage.
Here, all 23 pairs of individual chromosomes are easily identifiable by common karyotyping techniques and accessible to both labeled probes. Thus, the two labeled probes will competitively bind to the available chromosomes. By determining the ratio of the fluorescence intensities for each chromosome, accounting for concentration and fluorophore intensity differences, one can determine whether either the reference or the test DNA sample genome contains extra copies of any of the chromosomes. This process can be taken one step further: instead of examining entire chromosomes, analyzing different sections of each chromosome in the metaphase spread can determine gains and losses of chromosome regions characteristic of the test sample compared to the reference sample.61

CGH technologies made a drastic jump in both resolution and use shortly after the entire human genome was sequenced and cloned into individual BACs covering its entirety. The resolution increase was achieved by employing an array of said BACs spotted into distinct elements on a slide (each one representing a 150 kb section of the genome) instead of the metaphase chromosomes. Upon competitive hybridization, post-processing analysis can map each array element to its genomic location, i.e. the chromosomal location of the sequence of each specific BAC.62 Please see figure 2-3 for an illustration of array-CGH (aCGH). Thus, the resolution of the technique was now essentially limited by the size of each BAC element in the array instead of by the chromosome, as in the conventional CGH approach. Numerous versions and varieties of such array-CGH technologies exist; one employed in connection with this project is the Sub-Megabase Resolution Tiling set array, or SMRT-array CGH.

FIGURE 2-3 – Illustrative representation of the array-CGH technique. First, test and reference DNA are differentially labeled.
They are then hybridized to an arrayed target consisting of individual spots of sequenced DNA, usually covering the entire human genome. The slides are imaged under fluorescence and fluorophore ratios are calculated and mapped to their specific genomic locations. From said fluorescence ratios, DNA copy numbers are calculated. Credit: Mehrnoush Khojasteh

The SMRT-array CGH is composed of about 32.5 thousand individual BACs covering the entire sequenced human genome and can achieve a detection resolution as small as 40–80 kb. This is realized by using overlapping tiled clones, which can then be used to extrapolate a resolution beyond the size of any single BAC.63 As mentioned previously, genomic amplification and deletion events are prevalent in progressing tumours. Thus, the use of CGH technologies to generate high-resolution amplification and deletion profiles can elucidate and characterize important tumour phenotypes relevant to clinical management. Consequently, aCGH technologies and platforms have great potential as a tool for clinical classification and may come to play a large and important role in personalized medicine.64

2.5.2 The SMRT-array spotting library

As mentioned above, the SMRT-array is composed of 32.5 thousand individual BACs arrayed onto a slide and covers the entire sequenced human genome. These array slides are spotted in-house at the Cancer Genetics Department, and a complete library of these BACs is kept frozen and replenished by degenerate oligonucleotide-primed PCR amplification. Thus, this library represents an ideal, easily accessible and renewable starting material for high-throughput FISH probe synthesis.

2.6 Automated, Quantitative FISH Analysis

The power and versatility of the FISH technique are rivaled by few, and although it can be traced back to the 1950s, the popularity of the assay has increased dramatically since the 1990s.
This has resulted in the routine use of FISH experiments in research as well as in clinical and diagnostic fields. However, manual interpretation of FISH experiments, particularly FISH spot enumeration, is a difficult task when performed over a large number of nuclei and over different tissue samples, as it is both very time consuming and highly user-dependent.51 Automation would not only overcome the aforementioned drawbacks, it would also improve the throughput, speed and statistical output of FISH experiments. Such automation can be broken down into two steps: imaging and quantification. Imaging is simply the acquisition of raw data, most often images of hybridized FISH probes, while quantification involves various image analysis techniques that can be used to measure certain features of FISH experiments. Although important, the imaging hardware is largely a choice of accessibility and convenience, and most automated quantification hinges heavily on the choice and approach of image analysis. Also, the choice of imaging and quantification approach will be determined by the application of the FISH technique to be automated. For the purposes of this study, automated analysis of simple spot-counting FISH experiments will be examined.

Image analysis specific but not limited to spot-counting FISH experiments can be broken down into two distinct goals: nuclei segmentation and signal analysis. Segmentation is basically the partitioning of an image or dataset into distinct groups; in the case of FISH, this is the classification of image pixels as belonging to objects (nuclei or spot signal) or background (anything else).65 Segmentation is by no means limited to microscopy image analysis and has been explored for various applications such as object identification from satellite imagery, face recognition and machine vision.66 However, the importance of image segmentation to the automated analysis of cytological and histological images cannot be overstated.

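As a concrete illustration of segmentation as pixel classification, the following minimal Python sketch labels each pixel as object or background using a fixed intensity cut-off. The function name, the image values and the cut-off of 50 are hypothetical and chosen only for illustration; this is not the segmentation procedure used in this work.

```python
def threshold_segment(image, threshold):
    """Classify each pixel as object (1) if its intensity exceeds
    the threshold, otherwise as background (0)."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]


# Hypothetical 4x5 counterstain image: one bright, nucleus-like blob
# on a dim background (intensity values are made up).
image = [
    [10, 12, 11, 13, 10],
    [11, 90, 95, 12, 11],
    [12, 92, 97, 14, 10],
    [10, 11, 13, 12, 11],
]

mask = threshold_segment(image, threshold=50)

# Number of pixels classified as belonging to an object.
object_pixels = sum(sum(row) for row in mask)
```

In practice the cut-off would be selected per image, either by the user or automatically from the intensity histogram, rather than fixed in advance.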
2.6.1 Relevant image analysis techniques

Countless segmentation algorithms have been proposed for the purposes of medical imaging, particularly for cell nucleus recognition. Furthermore, segmentation can be achieved in two or three dimensions67, as commonly done with Magnetic Resonance68 or Computed Tomography66 datasets. Common and widely used segmentation approaches are based on a variety of algorithms, most prevalently simple thresholding, edge detection, region growing, watershed transformation and model-based methods.

Segmentation based on thresholding is the least complicated and probably the most widely implemented.66 Thresholding, in its simplest variation, works by classifying the pixels of an image into two groups, based on whether each is above or below a predefined intensity level (threshold).65 This is desirable because nuclei or objects of interest are usually stained differently from their background. The complexity lies in the method of choosing the threshold level, which can range from fully automated histogram-based selection to a simple, user-defined value.69 Finally, thresholding can be performed globally or for every image individually, and multiple thresholds have also been utilized.66

Edge detection segmentation methods are also very popular and comprise an entire field within image processing.70 Here, object boundaries and edges are defined not by their absolute intensity levels but by the presence of a sharp change in intensity values, as occurs at the edge of a stained cell. In other words, edge detection traces and connects maxima in the gradient image (the first derivative of the intensity image).71 The edges identified are most often disconnected and do not form closed objects, so edge detection algorithms are often coupled with other processes that close these region boundaries.
These are often based on some predefined shape, distance, or curvature restrictions, which are modeled after the known properties of the objects being segmented, or on the use of specialized combinations of filters.72

The watershed transformation is also widely used.66 As was the case with edge detection based segmentation, watershed based segmentation is usually, but not exclusively, performed on the gradient image.73 The watershed transform treats the image under analysis as a topographic surface. Water is then allowed to flood the topography starting from local minima. The result is the partitioning of the image into two different sets: the catchment basins (independent lowlands of the topography) and the watershed lines (areas where two floods meet).74 If the image under analysis is a gradient image, the watershed lines theoretically correspond to the edges of the objects imaged. The watershed transform is also often used in conjunction with other transforms, such as the distance transform, to perform shape-based segmentation directly on intensity images.75

Although each technique, and the countless ones omitted, independently achieves some form of image segmentation, their real strength lies in their combination. Established segmentation algorithms commonly combine different segmentation techniques with features and constraints modeled after properties of the objects sought. Consequently, there is a great variety of segmentation techniques currently in use, and this diversity will continue to grow as computing advances.

For automated FISH spot quantification, once nuclei are accurately identified, FISH spot signals must be enumerated. This is usually achieved by segmenting the image again, this time within each nucleus mask. Although the particulars and details differ, most hybridization spot segmentation algorithms utilize the top-hat transform to adjust the spot signal image.
The top-hat transform76 is a powerful operator which permits the detection of contrasted objects on a non-uniform background. It can be applied to an image in order to suppress slowly varying intensity trends and, consequently, enhance the contrast of specific image features. Essentially, the top-hat transform is equivalent to subtracting the result of a morphological opening of the input image from the input image itself. In other words, it preserves the objects eliminated by the opening, i.e. those smaller than the structuring element, and removes objects larger than the structuring element.76 The choice of structuring element is important and should fit the size and shape of the image feature to be preserved.76 In spot counting applications, a circular disk of a certain size is usually used as the structuring element in order to contrast-enhance and preserve the round FISH spots.77 Since the background variation of the FISH signal image usually does not consist of bright, circular spots, it is effectively removed from the image. In FISH spot identification, the top-hat transform is typically followed by a thresholding operation in order to create a binary image of the extracted image features, although numerous other FISH spot counting approaches have been published.78

2.6.2 Previous work on FISH spot enumeration automation

Several methods have been proposed for the automated evaluation of FISH signals. The first such attempt was by Netten et al.77 in 1997 and focused on quantification of FISH-stained lymphocytes. The method consisted of manual region selection followed by automatically selected thresholding to segment cell nuclei. For each nucleus, hybridized spots were detected using the top-hat and Laplacian filters, and various spot features were used to ensure true signals. Solorzano et al.79 developed a method to study leukocytes in blood samples.
Nuclei were segmented using the ISODATA thresholding algorithm (as utilized by Netten) followed by a watershed algorithm on the distance transform to isolate touching nuclei. FISH spot signals were detected using the top-hat transform, and a statistical error correction step was used to improve the detection performance. The next obstacle for FISH automation was to establish methods capable of tissue and 3-D quantification. Kozubek et al.80-82 developed systems capable of both two- and three-dimensional FISH imaging of HL-60 cell suspensions. For the 2D analysis, segmentation was achieved using bimodal histogram thresholding followed by various adjustments based on morphological features of the thresholded objects. Spot detection was based on a watershed algorithm using sequential thresholding. 3D analysis was achieved by analyzing 2D slices of the 3D volume. Lerner et al.83-85 designed a FISH image classification system based on image stacks taken at different focal planes to deal with tissue samples. A Bayesian classifier was used to analyze multispectral FISH images in the RGB and HSI colour spaces for both nuclei segmentation and hybridization spot detection. Chawla et al.86 established an automated system for the detection of FISH signals in three-dimensional, multi-spectral confocal image stacks of brain tissue sections. Nuclei and FISH spots were identified using a 3-D watershed algorithm based on a gradient-weighted distance transform, and artifacts were eliminated through model-based region merging and clustering. Finally, Raimondo et al.29 presented work very similar to that of Lerner,83-85 focusing not only on the detection of FISH dots but also on overall case classification for HER2/neu diagnosis in breast tissue. As seen above, about 10 years of significant work has already gone into automation of FISH signal enumeration.
Although research into automated FISH quantification will most likely continue, probably toward thick-tissue and 3-D topology,87 numerous commercial FISH enumeration workstations are now available, although their accuracy, degree of autonomy and reliability are still unclear. Most of these commercial FISH workstations are aimed at semi-automated FISH spot analysis and at facilitating complicated multicolour karyotyping FISH experiments.

2.7 Mathematical Constructs Used to Define Neighbourhoods

Aside from implementing automated FISH enumeration processes similar to the ones described above, the system proposed in this study is aimed at identifying regions of similar cells. This is achieved through the use of Delaunay triangulation and Voronoi tessellation in order to robustly define cell neighbours and cell-to-cell associations, thereby quantifying and investigating neighbourhood regions. Delaunay triangulation and Voronoi tessellation (figure 2-4) are two very common mathematical constructs widely used in characterizing the spatial distributions of various systems. Applications of these two concepts range across numerous fields of research including pure mathematics, chemistry, sociology and biology. In computational biology, they are often employed in the modeling of tissue growth, tumour clone development, tumour metastasis research, cell spatial arrangements and many more.88-91

FIGURE 2-4 – Example of (B) Delaunay triangulation and (C) Voronoi tessellation on a sample data set (A). The circumcircle of a particular Delaunay triangle is shown. Delaunay triangulation is defined such that no other data point may fall within the circumcircle generated by any triangle.

Specifically, a Delaunay triangulation, DT(S), for a set of coplanar points S is a triangular connection of all points in S such that the circumcircle of every triangle does not contain any other data points.
The Delaunay triangulation tends to avoid "sliver" triangles, which is why it is used to quantify neighbourhood relationships. The Voronoi tessellation on the same set results in the partitioning of the plane into individual regions, V(p), such that all points in V(p) are closer to p than to any other point in S. This can be thought of as the area belonging to p and is often used to simulate the cytoplasm and cell boundary of a nucleus. Lastly, Delaunay triangulation and Voronoi tessellation on the same data set can be translated into one another geometrically.92,93 The advantage of these two constructs is their ability to define the neighbours and boundaries of the individuals of a data set. This can allow researchers to simulate and study biological processes which evolve and change, and has been demonstrated in both the two-dimensional and three-dimensional cases. Basically, cells can be considered neighbours if they share a common Voronoi polygon edge or, equivalently, if they are vertices of the same Delaunay triangle.91 Thus, each cell's surrounding neighbourhood can be defined and countless measurements and properties can be calculated. The use of Voronoi polygons can also provide estimates of cell boundaries.89 Delaunay triangulation and Voronoi tessellation are therefore directly applicable to computational biology. For example, by introducing a heritable marker to individual cells in simulations of tumour development, Meineke et al.88 were able to visualize the development and resulting structure of a single clone in a tumour. This is the exact rationale for using both constructs in this project. However, instead of characterizing simulations, the aim here is to characterize clones using specific FISH patterns and eventually compare results with simulations.
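The empty-circumcircle definition and the resulting neighbour relation can be illustrated with a minimal brute-force sketch. This is not the implementation used in this work (which relies on Matlab built-ins); the function names `circumcircle`, `delaunay_triangles` and `neighbours` are hypothetical, and the exhaustive search over all point triples is only practical for a handful of points.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (centre_x, centre_y, radius_squared), or None if the points are collinear."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_triangles(points):
    """Keep every triple whose circumcircle contains no other data point."""
    triangles = []
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        if all((px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
               for n, (px, py) in enumerate(points) if n not in (i, j, k)):
            triangles.append((i, j, k))
    return triangles


def neighbours(triangles, n_points):
    """Two points are neighbours if they are vertices of the same triangle."""
    adjacency = {i: set() for i in range(n_points)}
    for triangle in triangles:
        for a, b in combinations(triangle, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Four corners of a rectangle plus one interior point (illustrative data only).
points = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
triangles = delaunay_triangles(points)
adjacency = neighbours(triangles, len(points))
```

In this example the interior point is connected to all four corners, while diagonally opposite corners are not neighbours of each other.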
Chapter 3 – Materials and Methods

This chapter describes the commonly used protocols involved in the production of FISH probes and in FISH staining, the microscope hardware, and the software algorithms developed or used throughout. It is presented in three general sections: FISH protocols, imaging hardware and analysis software.

3.1 FISH Protocols

3.1.1 Bacterial Artificial Chromosome (BAC) extraction

The primary step in custom FISH probe synthesis is the choice of template DNA. As mentioned in the upcoming results section (section 4.8.1), directly isolated BAC DNA is the most desirable starting material for probe synthesis. Protocol 1: Appendix A was used to pick and grow E. coli hosts and to extract and verify their unique BAC DNA sequences. This protocol was fairly straightforward, but care should be taken to ensure the proper clones are selected from the frozen plates. Likewise, the use of chloramphenicol in the growth media is paramount to ensure that only hosts carrying properly transfected vectors survive. Care should also be taken in step 18 not to lose the concentrated DNA pellet, as it is extremely small and any heating will dissolve it back into the solution. Lastly, BAC HindIII fingerprinting (figure 3-1) should be used to ensure that the correct BAC DNA has indeed been extracted.

3.1.2 FISH probe synthesis

After the template DNA is purified, it is ready for conversion into FISH probes. As also mentioned in section 4.8.1, the preferred method of achieving this is nick translation. This nick translation protocol (Protocol 2: Appendix A) is a direct reproduction of the Vysis nick translation protocol that is provided with the Nick Translation Kit from Vysis. Please see section 4.8.1, table 4-13 for a list of fluorophores that were successfully incorporated into FISH DNA probes.
Random Primer PCR (Protocol 8: Appendix A) was also used, with marginal success, to generate FISH probes from a library of previously extracted and amplified BAC DNA. The nick translation protocol was fairly simple. Care should be taken to keep solutions containing fluorophores away from light, and the only critical step is the concentration and duration of the enzyme reaction, which must yield probes of appropriate length (~200 bp).

FIGURE 3-1 – Example of HindIII digestion fingerprint database output (left) and the 0.7% agarose gel HindIII digestion of actual BAC DNA with matched fingerprint data (right), validating the six BAC clones chosen.

3.1.3 Optimized tissue FISH protocol

The optimized protocol (Protocol 3: Appendix A) is based largely on the original tissue FISH protocol used clinically at the BCCA Cytogenetics Lab for routine diagnostic FISH staining. Some amendments have been added as a result of experiments described in section 4.8.2. The protocol presented is the final version that resulted in adequate and reproducible FISH staining for a variety of tissues. Also see Protocol 9: Appendix A for the staining protocol used for cell suspensions and metaphase spreads, as commonly done during probe synthesis. As elaborated in section 4.8.2, the proper digestion of tissue is paramount to good quality FISH staining. Other important steps in this protocol include the proper use of xylene, which dissolves complex organics such as paraffin and must be removed before proceeding. The sodium thiocyanate (NaSCN) step is also very important, as it removes the formalin fixation complexes, disrupts protein:DNA complexes and denatures cell wall proteins and crosslinks, allowing the protease digestion to work more effectively and generally improving DNA:DNA hybridization. However, NaSCN is highly toxic and proper personal protection should be used.
Finally, the HCl step is also important as it solubilizes basic nuclear proteins and removes components of the extra-cellular matrix. This results in increased access to cells and reduced autofluorescence.

3.1.4 ReFISH protocol

The ReFISH protocol, used to increase the number of FISH probes applied to a single tissue slice, is described in Protocol 4: Appendix A. The first round of FISH staining should follow the Lung-Tissue FISH staining protocol (Protocol 3: Appendix A). Upon successful imaging, the steps in Protocol 4 were taken to remove previously hybridized probes and re-stain the tissue with new ones. For important observations regarding this protocol please see section 4.6.

3.2 Imaging and System Hardware

The microscope used in this study was purchased as part of the commercial FISH quantification package from MetaSystems (MetaSystems GmbH, Altlussheim, Germany). Specifically, it is a Zeiss AxioImager Z1: a fully motorized upright microscope with motorized and computer-controlled filter-cube and objective turrets, z-focus control and shutter control. The microscope uses an HBO 100 illumination source for fluorescence and a HAL 100 for transmitted-light illumination. Fluorescence image acquisition is achieved with a monochrome 10-bit JAI M4+ camera providing images 1280 x 1024 pixels in dimensions (JAI A/S, Copenhagen, Denmark) and 7 spectrally well-separated filters (Chroma Technologies, Rockingham, VT) listed in table 3-1. The CCD sensor pixels measure 6.45 µm per side and can achieve images with a maximum of 1380 x 1030 pixel dimensions.
TABLE 3-1 – Filter specifications used for multi-colour fluorescence imaging

#  Filter Name       Chroma Product Number  Excitation (nm)  Beamsplitter (nm)  Emission (nm)
1  DAPI              Sp100                  D360/40          400DCLP            D470/40
2  Spectrum Aqua     31036v2                D436/20          455DCLP            D480/30
3  Spectrum Green    MF-101                 HQ493/16         Q507LP             HQ527/30
4  Spectrum Orange   31003                  D546/10          560DCLP            D580/30
5  Cy3.5             Sp-103                 HQ581/10         Q593LP             HQ617/40
6  Spectrum FarRed   Sp-104v2               HQ630/20         Q649LP             HQ667/30
7  Cy5.5             Sp-105                 HQ682/12         Q697LP             HQ721/42

Tissue scanning in a high-throughput manner is achieved with an 8-slide, high-precision (<1 µm repeatability and ±4 µm accuracy) xy-scanning stage (Märzhäuser GmbH, Wetzlar-Steindorf, Germany). Scanning control is provided by MetaSystems' software, Metafer-4. However, to expand the flexibility of the system, in-house C++ software has been written for hardware integration and control as well as automatic exposure and focus, and will replace reliance on the Metafer-4 software in the near future. Colour imaging is achieved through a combination of filters and the monochrome camera, and images are usually acquired sequentially for each colour channel specified. For example, the imaging of a sample stained with three fluorophores (red, blue and green) would consist of taking 3 images in succession: a grayscale image through the red filter to visualize the red probe, followed by a grayscale image through the blue filter, and so forth. This is a common method of acquiring fluorescent images, and colour images can be reproduced after image acquisition by simply merging different colour channels together. Although colour cameras have been employed in fluorescent imaging in the past, the necessity of using filters to separate fluorophore excitation and emission makes grayscale cameras completely adequate. Moreover, imaging also usually involves taking a series of images at incremental focal depths (focal stacks).
Often, this consists of 5-7 individual images covering a range of 4-6 microns in depth through the tissue, for each colour being imaged. Focal stacks are taken to preserve the 3D nature of the tissue, but currently the analysis is performed on compressed, maximum-contrast, enhanced depth-of-focus images. These enhanced depth-of-focus images are generated through an undisclosed algorithm (MetaSystems), which likely selects the highest-intensity pixel across the focal stack at each pixel location. The compressed images are then exported from the MetaSystems proprietary format into simple non-compressed TIFF files for each colour imaged. These exported TIFF images are then analyzed by the algorithms described below. Tissue scanning is performed by translating the stage in a tiling fashion so as to cover the entire tissue area, with or without field-of-view (FOV) overlap.

3.3 In-house Developed Software

Upon evaluating the performance of the Metafer-4 software package against a human observer, as done in section 4.1, it was determined that in-house software must be developed to achieve the goals outlined in the introduction. The following sections examine the different in-house developed software processes in the order in which they are used to quantify the FISH spot signals of the scanned tissues. All of the software is written in Matlab with the exception of the nuclei segmentation algorithm, which is coded in Java and was developed previously for other purposes. To help follow the process, see Appendix B for schematic representations of the in-house developed software. Furthermore, see Appendix C for the utilized and default parameters of the Matlab functions and scripts described below.
Finally, all software developed and described below, with the exception of the Obelics segmentation and the architectural analyses, was implemented in a GUI (FISH.m) to provide the user with a convenient and simple method of processing FISH scans (see Appendix D for an illustration of and simple guide to the FISH.m GUI).

3.3.1 Image pre-processing

To begin, images are exported from the proprietary MetaSystems format into more usable TIFF files. The name of each exported image consists of the name of the scan, the FOV number, the colour channel which the grayscale image represents and the .TIF file extension, all separated by periods. SlideName.0001.G.Tif is an example of such nomenclature. The channel colour is a single-letter code as defined in table 3-2. The Metafer-4 scanning control software only supports a maximum of 6 imaged channels, thus the 'Y' code is reserved for either filter 6 or 7. The FOVs are numbered in a snake-like pattern starting from the lower right corner of the scan area, as depicted in figure 3-2. Both the nomenclature and the scanning pattern were used to reconstruct the global scan from individual FOV images.

FIGURE 3-2 – FOV nomenclature schematic depicting the snake-like scanning of tissue areas, from Start to Finish, and the associated FOV naming, which is used extensively to determine the location of each FOV within the scan.

TABLE 3-2 – Look-up table of nomenclature used to identify colour filters used during acquisition

Filter #  Colour Channel  Standard Single Letter Code
1         Blue            B
2         Aqua            C
3         Green           G
4         Orange          R
5         Red             M
6         FarRed          Y
7         InfraRed        Y

Once images are exported, each image is corrected for the presence of 'hot' or noisy pixels in the camera CCD. The location of the 282 hot pixels was determined by exposing the CCD for its maximum of 15 seconds in complete darkness.
Figure 3-3 gives an example of a hot pixel and its correction, which is achieved by simply averaging the pixels directly above and below the hot pixel. This is done as part of the file_sort.m function which, once the hot pixels of each picture are averaged out, sorts all images into a specific file structure to aid in file location and organization, as depicted in figure 3-4. The segmentation algorithm described in the upcoming section 3.3.2 was developed for segmentation of absorbance-based images taken at 20x magnification and, because its source code could not be changed, exported counterstain images must be adjusted to match its input criteria. This includes resizing the images by ½ to mimic images taken at 20x instead of 40x, removing noise by applying a median filter and a Gaussian filter, and finally inverting the counterstain images to resemble absorbance-stained images, which consist of dark objects on a lighter background. Furthermore, these inverted counterstain images are contrast adjusted by dropping off 1% of the darkest and lightest pixels in the image and linearly remapping the resulting pixel values to the entire 256 grayscale levels. This is performed because the contrast of absorbance-stained nuclei is much higher than that of fluorescence-counterstained nuclei. Prior to this step, however, the user can specify that all images be corrected for non-uniform illumination. This should only be done when a scan consists of a large number of images (>400), since the illumination correction simply averages all the counterstain images together, fits a plane to the resulting average image and then subtracts this plane from all the images. If there are insufficient images to produce a homogeneous average image, the fitted plane will be incorrect and all images will be erroneously adjusted. These adjusted and corrected images are now ready for processing through the 'enhanced edge-detection' segmentation algorithm.
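The hot-pixel correction and the inversion with contrast adjustment described above can be sketched as follows. This is an illustrative sketch, not the file_sort.m code: the function names are hypothetical, images are plain lists of rows, the 1% clip is assumed to apply to each end of the histogram, and the handling of hot pixels on the first or last row is an assumption.

```python
def correct_hot_pixels(image, hot_pixels):
    """Replace each (row, col) hot pixel by the mean of the pixels directly above and below it."""
    fixed = [row[:] for row in image]
    last_row = len(image) - 1
    for r, c in hot_pixels:
        # Assumption: at the top or bottom border, the single available neighbour is used twice.
        above = image[r - 1][c] if r > 0 else image[r + 1][c]
        below = image[r + 1][c] if r < last_row else image[r - 1][c]
        fixed[r][c] = (above + below) // 2
    return fixed


def invert_and_stretch(image, clip_fraction=0.01, levels=256):
    """Invert an image, clip the darkest and lightest pixels, and remap linearly to all gray levels."""
    top = levels - 1
    inverted = [[top - value for value in row] for row in image]
    ordered = sorted(value for row in inverted for value in row)
    k = int(len(ordered) * clip_fraction)       # number of pixels dropped at each end
    low, high = ordered[k], ordered[len(ordered) - 1 - k]
    if high == low:                             # flat image: nothing to stretch
        return inverted
    scale = top / (high - low)
    return [[min(top, max(0, round((value - low) * scale))) for value in row]
            for row in inverted]


# Illustrative data: a 3 x 3 image with one saturated hot pixel in the centre.
raw = [[10, 20, 30],
       [10, 255, 30],
       [10, 40, 30]]
cleaned = correct_hot_pixels(raw, [(1, 1)])
stretched = invert_and_stretch([[0, 50], [100, 200]])
```

After the stretch, the brightest fluorescence pixel becomes the darkest gray level and vice versa, mimicking dark nuclei on a light background.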
However, this segmentation process often omits objects touching image peripheries. Thus, once all the found objects are examined, large areas of missing information would occupy the stitch areas where two FOVs come together. To overcome this drawback, counterstain FOV images are merged together two columns or two rows of the scan at a time, and images 120 pixels wide (about 10% of each image) are subsampled along the common boundary. Please see figure 3-7 for an image of the subsampled area. This is done in the horizontal and vertical directions. These subsampled images are then saved into a separate 'Cutters' directory under the main slide folder. Effectively, this creates a complete dataset consisting only of subsampled images, which can be segmented and spot counted independently.

FIGURE 3-3 – An example of a hot/noisy pixel and its correction. The anisotropic nature of the 2 pixels involved along the line of read-out is evident. Correction is achieved by averaging the pixels above and below.

FIGURE 3-4 – Resultant directory structure after file_sort.m. The Slide Folder contains an Images Folder; the Images Folder contains the Individual FOV Folders (original images of each FOV, including the counterstain) and a Counterstain Folder (copies of the counterstain images, adjusted and inverted, denoted with the I_S_ prefix). This file structure is used extensively to locate image files as needed.

3.3.2 The 'Enhanced-Edge Detection' segmentation algorithm

Object segmentation has been a fundamental issue in automated histology for over 30 years. It is a rate-limiting problem which has had countless proposed solutions and is an area of constantly evolving research, fueled by the increasing capabilities of computer processing. The segmentation algorithm used for this workstation is a variant of the 'Enhanced-edge finding' approach,94 implemented at the British Columbia Cancer Research Centre (BCCRC) within a larger Java software package95 which is currently used in numerous absorbance-based cytology and histology studies.
The algorithm makes use of intensity information, edge magnitude information and both object and edge connectivity information. It is used in conjunction with simple manual and automatic threshold selection algorithms and artifact removal routines. This automated procedure generates a closed contour precisely along the edge of the nucleus, successfully splitting the larger clusters of cells that often occur in tissue sections into individual cells. The result of the segmentation process is a series of smaller binary masks, one for every object found in the image, along with each object's address within the original image. The binary object masks along with the original image are saved as a string of images in a binary .img file. Please see figure 3-5 for an example of the .img output. Some 125 features are also calculated automatically for each object, and these are saved as a binary feature file with the extension .fb5. These formats, although cumbersome for our immediate purposes, can be analyzed through numerous software modules currently in use for analysis of cytology specimens, and thus effort was made to continue their use.

Header: c#00000000000 0 100.000 A000077000003 24 28
Image:  {24*28 = 672 bytes of 8-bit data representing the actual image within the mask}
Header: c#00000000000 0 100.000 A000077000003 24 28
Mask:   {24*28 = 672 bytes of 8-bit image of the actual mask}
…

FIGURE 3-5 – The .img file structure showing the 68-byte header and the resulting image and mask of an identified object. Bytes 41-46 and 47-52 are the x-coordinates and y-coordinates, and the two subsequent fields are the x and y dimensions. These are used to read out the appropriate number of bytes for the mask.

Although the segmentation results were evaluated as valid, as seen in section 4.2, some issues do arise in areas of high nuclei density, as well as artifacts from the pre-processing steps described above.
Thus, work is currently being done to modify the segmentation algorithm to process raw fluorescence images without requiring any adjustments and to improve the segmentation results in regions of higher nuclei density. The 'enhanced-edge finding' algorithm is embedded within a Java program named Obelics, which accesses a project database that stores the user-adjustable parameters that define the segmentation algorithm. Of these, the most significant and most frequently adjusted parameter was the threshold level. The threshold level ranged between 165 and 210 depending on the images processed. Unfortunately, the threshold level had to be set globally for the entire batch of images processed and, although automatic threshold-selection algorithms are available, they were found not to provide reproducible and consistent results. Future work will focus on implementing a reproducible local threshold selection technique to ensure all images are optimally segmented. Lastly, images exported from Metafer-4 cannot be fed directly into Obelics because of the repeated use of the period within the exported filenames. Likewise, Obelics cannot handle an unlimited number of images per run; as a result, images waiting for processing need to be separated into folders of up to 70 images (512x640), with the file names adjusted by replacing all but the last period with an equal sign. This is done automatically with the functions Obelics_in.m and Obelics_out.m.

3.3.3 The spot counting algorithm

As just mentioned, the effective end result of image segmentation is the identification of which pixels of the image belong to which individual nucleus. With this information, we can examine whether any FISH spots in the 'signal' images fall within a segmented nucleus. The spot counting process selects the .img file along with the matched signal image for the FOV being processed.
Next, following a similar process established by Raimondo,29 Netten77 and others, a top-hat filter is applied to the signal channel image in order to suppress slowly varying intensity changes, thus enhancing the local contrast of structures in the image that match the structuring element used in the filter: a disk with a user-defined radius (usually 3-7 pixels). See figure 3-6 for an example of top-hat filtering. Thresholding of this transformed image creates a mask that is then used to process the raw signal image by a local maxima search.96 This identifies maxima that fit a Gaussian shape with a specific width and that lie above a certain intensity threshold. In this way, the pixels of each FISH spot are segmented within the signal image. The maxima search has the benefit of resolving fairly close FISH spot signals, each having its own local maximum.

FIGURE 3-6 – Example of top-hat filtering. Profiles through 3 prominent spots are displayed before (A) and after (B) top-hat filtering with a 3-pixel-radius disk. This preserves spots and removes the underlying, slowly varying background. Panels C and D show a real image exemplifying the reduction of background.

Next, the mask for each object segmented by the Obelics batch segmentation process is extracted from the single .img file for the FOV being analyzed. This is achieved with the aid of a header which precedes each object in the file and which states the size of the mask image and its pixel address within the 512x640 counterstain image that was processed. See figure 3-5 for a description and illustration of the .img file. Knowing the size of the mask, the appropriate number of bytes following the header is read and converted into a binary mask.
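The top-hat filtering, thresholding and local maxima search described earlier in this section can be illustrated in one dimension, in the spirit of the intensity profiles of figure 3-6. This is a simplified sketch under stated assumptions, not the spot counting code itself: the function names are hypothetical, a flat line segment stands in for the disk-shaped structuring element, and the Gaussian shape test of the maxima search is omitted.

```python
def erode(signal, radius):
    """Grayscale erosion: minimum over a flat window of half-width `radius`."""
    n = len(signal)
    return [min(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def dilate(signal, radius):
    """Grayscale dilation: maximum over a flat window of half-width `radius`."""
    n = len(signal)
    return [max(signal[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def top_hat(signal, radius):
    """White top-hat: the signal minus its morphological opening (erosion then dilation)."""
    opened = dilate(erode(signal, radius), radius)
    return [s - o for s, o in zip(signal, opened)]


def find_spots(signal, radius, threshold):
    """Threshold the top-hat result, then keep local maxima of the raw signal inside the mask."""
    filtered = top_hat(signal, radius)
    mask = [value > threshold for value in filtered]
    spots = []
    for i, inside in enumerate(mask):
        if not inside:
            continue
        left = signal[i - 1] if i > 0 else signal[i]
        right = signal[i + 1] if i < len(signal) - 1 else signal[i]
        if signal[i] >= left and signal[i] >= right:
            spots.append(i)
    return filtered, spots


# Illustrative profile: a slowly rising background with one narrow bright spot at index 4.
profile = [10, 11, 12, 13, 30, 15, 16, 17, 18]
filtered, spots = find_spots(profile, radius=1, threshold=5)
```

The slowly rising background is flattened to values near zero, while the narrow spot, being smaller than the structuring element, survives the subtraction and is the only position passing the threshold.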
The mask is then resized by a factor of 2 to match the 1024x1280 dimensions of the signal channel images and dilated by 2 pixels to account for spots which fall on top of the mask boundary. Then, again using the x and y coordinate information in the header, the mask is registered atop the signal channel image. Next, if the majority of a segmented FISH spot's pixels fall within an object, the spot is counted. A list is generated tracking the number of FISH spots in each object found. Upon completion of the spot counting process for all masks in the .img file, the number of FISH spots in each object is saved as its 126th feature within the .fb5 file. Ultimately, spot counting is performed for every 'signal' image acquired, and each time a new feature is added to the .fb5 feature list for each object. This is then done for each FOV acquired in the scan. This process is applied not only to every FOV but also, independently, to the subsampled images in order to compensate for the edge effects of each FOV. The end result of the spot counting procedure is a set of .fb5 files with additional features added. Since the aim of this project is to examine the spatial distribution and cell-to-cell interactions over the entire tissue scanned, the individual .fb5 datasets must be fused together into one large dataset. This was achieved simply by merging the .fb5 files into one large file and offsetting the x and y coordinates (which are two of the 125+ features) by the appropriate location of the FOV within the global scan. This produced a global .fb5 file which comprised all the segmented cells found in the scan and their individual features. Furthermore, the .fb5 files of the subsampled images were also offset by their appropriate locations within the global tissue scan and subsequently compared to the global dataset. The aim of this comparison was to find and replace any objects only partly segmented because of their location at the edge of a FOV.
These partly segmented objects are replaced with the complete objects from the corresponding subsampled images if they are within ±15 pixels of each other and within the central 60-pixel region. Completely new objects identified inside this 60-pixel region can only be added to the global .fb5 list if they are sufficiently far away from preexisting objects. Please see figure 3-7 for an example to clarify this process. This methodology essentially replaces cells lying right on the boundary between FOVs or adds objects missed previously. Upon completion of this merging and stitching of data, the final .fb5 file is a complete representation of the tissue scanned and can be analyzed by the Voronoi and Delaunay scores.

FIGURE 3-7 – Example of stitching together datasets from the original and subsampled image segmentation. The green line represents the boundary between adjacent FOVs. The outer box delineates the subsampled area and the inner green box depicts the 60-pixel-wide region in which cells can be replaced if they are within 15 pixels of previously detected cells (green arrow) or added if they are completely new objects far away from other objects (blue arrow).

3.3.4 The architecture analysis metrics

It is convenient at this time to mention that, for further architectural analysis, cells enumerated up to this point must be divided into two separate groups: positive (cells that fit a certain criterion) and negative (cells which do not). In most cases, the criterion of interest would be the presence of a certain loci-specific FISH trend, often related to a biologically relevant CGH profile under investigation. However, the specific meaning of a positive cell will change between experiments and can be defined appropriately in order to elucidate specific information from the tissue scanned.
For example, when examining for clonal subpopulations defined as amplified with the LAVysion probe set, a positive cell would be defined as one with more than 2 FISH spots in at least 1 of the 4 colour channels imaged. This criterion could be adjusted to examine for high-level amplifications by simply defining positive cells as ones with more than 5 FISH spots in at least one of the channels stained. Further refinements would be made as the investigator sees fit. After the entire scan area is processed, the spatial distribution of nuclei with certain FISH spot signal characteristics of interest is analyzed. Specifically, two separate spatial distribution quantities were proposed: one to measure the percentage of positive cells present within a cell's immediate and extended neighbourhood, and a second to measure the extent of connectedness of the positive cells. Essentially, the first is a measure of positive cell homogeneity expressed in terms of neighbours, as opposed to area or distance, and the second is a way to identify large regions of interconnected positive cells. To achieve this, two related mathematical constructs are employed: the Voronoi tessellation and the Delaunay triangulation. Employing both of these constructs provides a robust and reliable way of defining and identifying the cell-to-cell connectivity and thus each cell's immediate and extended neighbourhoods. For a brief description of the Delaunay and Voronoi constructs and their use in spatial analysis, please see section 2.7.

3.3.4.1 Neighbourhood Homogeneity (NH) score

Upon applying the Voronoi tessellation algorithm to a set of discrete data points, in this case representing the spatial coordinates of all the cells within a scan, the area constrained by the data set is partitioned into individual polygons such that no area is left out, as seen in figure 2-4c.
In such a tessellation, data points are defined as adjacent when their Voronoi polygons share a common side.91 Thus, for each Voronoi polygon it is possible to define its immediate neighbours: all other data points sharing one side of its Voronoi polygon, assuming they are not too far away (a limit is placed on the distance between cell coordinates). Furthermore, it is possible to identify each neighbour's direct neighbours, and so forth. By such a process one can identify not only the direct neighbours of each data point within a data set but also more distant, extended neighbourhoods. This is exactly the approach used to define a particular cell's neighbours and then calculate the percentage of positive neighbours. In more detail, the Matlab voronoin.m function applied to a set of data points generates two lists. One is the complete list of vertices of all the Voronoi polygons found, and the other is an array of indices for each data point which points to the specific vertices of that data point's Voronoi polygon. Essentially, if two data points share two vertices, they must share a polygonal side and thus must be neighbours. In this way, a list of primary neighbours is generated for each cell.

FIGURE 3-8 – Examples of Neighbourhood Homogeneity and Clone Connectivity scores, where green and red data markers indicate the centres of positive and negative nuclei, respectively. (A) Voronoi tessellation of a single FOV, with the darkest blue indicating the cell being scored, while the 3 neighbourhoods, ranging from primary through tertiary, fade from darkest to lightest. The score for this cell is indicated. (B) Delaunay triangulation of the same data with highlighted connections between directly connected nuclei and scores greater than 1 indicated.

Next, the second-layer neighbours are generated for each cell by simply looking at the neighbours of each cell's primary neighbours.
Thus, all the second- and likewise third-layer neighbourhood lists are generated. See figure 3-8a for a pictorial example of a cell with its three Voronoi layers. Lastly, a Neighbourhood Homogeneity (NH) score is produced, which is simply the total number of positive cells within a cell’s 3-layer neighbourhood (including itself) divided by the number of cells present within said 3-layer neighbourhood. This represents the homogeneity, or positive cell density, of that cell’s robustly defined region. The NH score can range from 0 to 1, indicating that none or all of the cells within the defined neighbourhood of a cell are positive.

Although this approach worked very well, the voronoin.m function and the subsequent processing of vertices to identify neighbours turned out to be computationally very expensive. Given the considerable size of the data sets often seen in these tissue scans (often upwards of hundreds of thousands of individual cells), this is a critical rate-limiting process. To remedy this issue, the code could be translated into C++, Java or another language, which would speed up the process, as Matlab is comparatively inefficient in terms of time and computational resources, especially when performing numerous loops. However, because the shape and size of the Voronoi polygons generated through the aforementioned process are not used explicitly, there is no real need for the Voronoi polygons themselves. Thus, a more time- and resource-efficient method was to use the related Delaunay triangulation to determine, in an identical way, the different neighbourhoods of each cell and its positive cell density score. The major difference between the two approaches is that the Delaunay triangulation simply provides each cell’s direct connections to its closest neighbours, not vertices and index lists that form polygons.
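As an illustration of the NH score defined above, the following Python sketch computes the 3-layer neighbourhood of a cell from a list of direct connections and returns the fraction of positive cells in it. It is a sketch under stated assumptions rather than the code of arch_pseudovoronoi.m: the adjacency dictionary, the function names and the toy chain of six cells are all hypothetical.

```python
def neighbourhood(adjacency, cell, layers=3):
    """Return the cell plus every cell reachable within `layers` neighbour steps."""
    seen = {cell}
    frontier = {cell}
    for _ in range(layers):
        # Next layer: neighbours of the current layer that were not seen before.
        frontier = {n for c in frontier for n in adjacency[c]} - seen
        if not frontier:
            break
        seen |= frontier
    return seen


def nh_score(adjacency, positive, cell, layers=3):
    """Fraction of positive cells in the layered neighbourhood (cell included)."""
    region = neighbourhood(adjacency, cell, layers)
    return sum(1 for c in region if positive[c]) / len(region)


# Toy data (assumed): six cells connected in a chain 0-1-2-3-4-5.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
positive = [True, True, False, False, True, True]

score_end = nh_score(adjacency, positive, 0)     # region {0, 1, 2, 3}
score_middle = nh_score(adjacency, positive, 2)  # region covers all six cells
```

For the end cell the region holds four cells of which two are positive, giving 0.5; for cell 2 the region covers the whole chain, giving 4/6.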
The Delaunay approach considerably reduced the amount of data to process, reducing the computational time and resources while still delivering identical results. As an example, for a small dataset of 500 points, the Voronoi-based NH analysis took 23.4 seconds while the Delaunay-based analysis (arch_pseudovoronoi.m) took only 0.5106 seconds: an almost 46-fold improvement. Furthermore, this improvement actually increases by about 10-fold for every 100 additional points added. So for a typical scan dataset of about 10 000 cells the difference between the two approaches would be a factor of 1000. Consequently, the pure-Voronoi approach was abandoned, and if the shapes and areas of the actual Voronoi polygons are ever needed for applications such as estimating cytoplasmic markers, the pure Voronoi approach will have to be recoded in a more efficient way and in a more efficient language. See appendix B for arch_voronoi.m and arch_pseudovoronoi.m.

3.3.4.2  Clone Connectivity (CC) score

As alluded to in the previous section, and described in more detail in section 2.7, the application of Delaunay triangulation to a set of coplanar data points results in a unique triangulation connecting all the points in such a fashion as to avoid small-angle triangles. The triangulation is unique to the data set except in trivial cases such as a set of collinear points or four points lying exactly on the vertices of a rectangle.92 See figure 2-4b for an example of Delaunay triangulation. The results of the triangulation are then used to determine the extent of direct connectedness of positive cells, which theoretically comprise a clone of similar cells. Just as with the previous Voronoi analysis, there is a built-in Matlab function for Delaunay triangulation, delaunay.m. The result of this function is a three-column matrix whose rows provide the indices into the coordinate dataset of the points comprising each triangle of the triangulation. Simply put, each row denotes the three members of one triangle.
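To make the use of the three-column triangle matrix concrete, the Python sketch below turns a list of triangles into per-cell lists of direct neighbours and drops connections longer than a chosen distance. It is only an illustration: the triangle list is typed in by hand to stand in for the output of a Delaunay routine, indices are 0-based rather than Matlab’s 1-based, and the function name and the distance limit of 2 units are assumptions.

```python
from math import dist


def direct_neighbours(points, triangles, max_distance):
    """Build neighbour sets from triangle index triples.

    Every triangle contributes its three edges; an edge is kept only if
    its two end points are no further apart than max_distance.
    """
    adjacency = {i: set() for i in range(len(points))}
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (a, c)):
            if dist(points[i], points[j]) <= max_distance:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


# Toy data (assumed): three nearby cells and one distant cell.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
triangles = [(0, 1, 2), (1, 2, 3)]  # stand-in for a Delaunay triangle matrix

adjacency = direct_neighbours(points, triangles, max_distance=2.0)
```

The distant cell appears in a triangle but ends up with no direct neighbours, because both of its edges exceed the distance limit.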
In order to determine the connectedness, or connectivity, of cells, the first task is to produce two lists of direct neighbours: one listing all direct neighbours of each cell, and a second listing only those which are positive according to some criterion (a specific pattern of FISH spot signals). Again, direct neighbours are restricted to be closer than a predetermined distance, or else they are not considered neighbours. Moreover, only positive cells have their positive neighbours tabulated, since non-positive cells connected to positive ones provide no additional information. Next, much like defining second- and third-layer neighbours in the Voronoi analysis above, each cell’s positive neighbours have their own positive neighbours identified and tabulated. This process is repeated until no new cells are added to the list, meaning that none of the latest neighbours have any new positive-cell neighbours to add. This indicates that there is no more direct connectivity to any new positive neighbours and that the entire clone has been identified. The Clone Connectivity score is then assigned to every cell connected into a clone as defined above; its value is the number of members comprising that clone. For example, figure 3-8b shows one outlined clone in the upper-right corner of the dataset that has 7 interconnected positive (green) cells. Each one of these cells would be given a score of 7, based on the number of individuals comprising that clone.

Much as with the pure Voronoi NH score described in the previous section, the unlimited CC score is massively resource consuming and some shortcuts were employed. Preferably, this connectivity algorithm will be rewritten in C++ or another more efficient and optimized language. However, for the purposes of this proof-of-concept work the algorithm was simply adjusted to iterate up to a maximum of 20 times.
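The iterative clone growth and its truncation can be sketched in Python as follows. This is a minimal illustration and not the code of arch_delaunay_20.m or arch_delaunay_unlimited.m: it assumes that each positive cell grows its own clone outwards one neighbour layer per iteration, that non-positive cells receive a score of 0, and that the names and the toy chain of cells are hypothetical.

```python
def clone_connectivity(adjacency, positive, max_iterations=None):
    """Score each positive cell by the number of positive cells it connects to.

    Starting from the cell itself, positive neighbours are added layer by
    layer until no new positive cell is found or, if max_iterations is
    given, until that many layers have been added.
    """
    scores = {}
    for cell in adjacency:
        if not positive[cell]:
            scores[cell] = 0  # assumption: non-positive cells are not scored
            continue
        clone = {cell}
        frontier = {cell}
        iterations = 0
        while frontier and (max_iterations is None or iterations < max_iterations):
            frontier = {n for c in frontier for n in adjacency[c] if positive[n]} - clone
            clone |= frontier
            iterations += 1
        scores[cell] = len(clone)
    return scores


# Toy data (assumed): five positive cells in a chain, plus one negative cell.
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
positive = [True, True, True, True, True, False]

unlimited = clone_connectivity(adjacency, positive)
truncated = clone_connectivity(adjacency, positive, max_iterations=1)
```

With no limit, every positive cell in the chain scores 5. With a limit of one iteration the end cells score 2 and the interior cells 3, a small-scale analogue of the reduced scores reported for the peripheries of large clones under truncation.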
Instead of connecting every member of the clone by looking for all neighbours of neighbours of neighbours indefinitely, the process was truncated at 20 iterations. The effect of this truncation was a reduction of the overall connectivity score and a reduction of the score at the peripheries and thinner regions of very large clones, as seen in figure 3-9. No effect of this truncation was seen on smaller clones, since all of their neighbours are found within the first 20 iterations. Likewise, the qualitative visualization of the clonal regions is comparable between the two versions, as displayed in figure 3-9. In this example (figure 3-9) of only 5000 data points, of which about 40% were positive, the truncated algorithm took 10.6 seconds while the unlimited version took upwards of 2 minutes. This difference is highly dependent on the size of the dataset used and its content of positive cells. When the dataset size was increased to 100 000 cells with the same percentage of positive members, the truncated algorithm finished in about 5 minutes while the unlimited version crashed due to a lack of memory and required a reboot. Thus, despite the use of the truncated version of the connectivity algorithm, all results are deemed valid on a semi-quantitative basis and clones are visually identifiable to the same extent. However, for any analytical work the unlimited version of the Clone Connectivity score should be used. The algorithm can be found in the arch_delaunay_20.m and arch_delaunay_unlimited.m Matlab files in Appendix B.

FIGURE 3-9 – Clone Connectivity score maps for 5000 randomly generated points of which about 40% are positive. The map in A) was generated with the truncated version of the connectivity score algorithm and took only about 11 seconds to process, while B) was generated with the unlimited CC algorithm and took over 2 minutes to process. Clearly seen is the ability to visually distinguish the same clones despite differing absolute scores. Also evident is the reduction of score at the peripheries or thin sections of clones.

3.3.4.3  Heat Map Visualization

For visualization purposes, both scores generated above are converted to a 2-D heat map through the use of two native Matlab functions: meshgrid.m and griddata.m. This essentially generates an interpolated, continuous surface, or heat map, covering the entire area spanned by the data of the scan, with a resolution of 1000 x 1000 pixels or as defined by the user. Higher resolutions were found to be too resource intensive.

Chapter 4 – Results

The following chapter is split into eight sections describing the experiments and observations regarding the validation and performance characteristics of the in-house automated tissue FISH scanning system, along with its unique analysis methodologies as well as more general FISH staining procedures.

4.1 Metasystem’s Metafer-4 Evaluation

As stated, the underlying task of this thesis, as described in section 1.6, is to develop a system and methodology to investigate the presence of genetically related tumour or normal cell subpopulations within human tissues, specifically through the quantification and spatial analysis of FISH spot signals within cell nuclei. Originally, a commercially available FISH cytology/histology analysis system was purchased to accomplish this task. The purchased product, Metasystem’s Metafer-4, is a state-of-the-art complete fluorescence scanning workstation, as described in section 3.2. Furthermore, through its Metafer-4 software package, it was described as able to analyze, enumerate and otherwise quantify tissue FISH spot signals. Although the Metafer-4 control of the Zeiss scanning microscope was sufficient for our purposes, the quantification of the FISH spot signals within each nucleus was insufficient, particularly due to its fairly crude and oversimplified segmentation algorithm.
More specifically, the Metafer-4 software accomplishes object segmentation by placing a small square atop the DAPI image so as to maximize the amount of DAPI within that square. Then, if more than a certain percentage (50% by default) of the square’s area is covered by DAPI, the square is defined to be an object. As one can easily imagine, this approach is very inaccurate and only appropriate in situations where cells are well spaced apart, a reference FISH probe is used along with the probe of interest, and results are interpreted only as a cell average over very large areas and hence large numbers of cells.

[Figure 4-1, panel A plot: Normalized Spot Count Comparison between Manual and Metafer-4 (manual n=448, Metafer-4 n=58); x-axis: # of spots / cell (0 to 6); y-axis: % of cells; series: Metafer-4 classification, manual classification]

[Figure 4-1, panel B plot: Normalized Spot Count Comparison between Manual and In-house Spot Counting and Segmentation (manual n=448, in-house n=439); x-axis: # of spots / cell (0 to 6); y-axis: % of cells; series: in-house classification, manual classification]

FIGURE 4-1 – Comparison of human classification of cells from 4 FOVs from a xenograft human squamous cell carcinoma tumour against A) Metafer-4 automated classification and B) in-house classification. The in-house segmentation/spot counting process compares much more closely to the human observations than does the commercial Metafer-4 system classification.

Specifically, when the results of the Metafer-4 scan were compared to those of a human observer, as seen in figure 4-1a, some major errors were noted. Firstly, of the approximately 450 cells classified by a human observer, Metafer-4 only recognized about 60. This indicates that numerous cells are omitted or clumped together into single objects. Secondly, there are large differences in the measured spot count distributions of the few images analyzed by both human and Metafer-4.
Both of these observations indicate that this platform would not be adequate for our purposes, since the ability to find all cells and accurately quantify their FISH spot signal content is paramount to investigating the spatial relationships of cells exhibiting specific FISH signal signatures. This finding required us to develop the in-house segmentation and spot counting algorithms described in section 3.3, to enable what the Metafer-4 was intended to accomplish. The following sections of Chapter 4 describe the performance and characteristics of these in-house algorithms as well as proof-of-concept experiments to test our hypotheses presented in section 1.6.1. For comparison, however, the finalized segmentation and spot counting processes were also applied to the same data set that was used to evaluate the Metafer-4 above. The results, in figure 4-1b, are much closer to those of a human interpreter in both the number of cells identified and their spot count distributions, indicating that the in-house approach is much more suitable for the intended analysis than the currently available state-of-the-art commercial Metafer-4 software.

4.2 Segmentation Evaluation Experiment

Although the above evaluation of the custom-developed segmentation and spot-enumeration algorithms is encouraging, further experiments were undertaken to characterize the performance of the nuclei-segmentation and spot-enumeration processes independently. The segmentation algorithm (described fully in section 3.3.2) used for the identification of DAPI-stained nuclei in tissue sections is based on a previously developed ‘enhanced-edge detection algorithm’.94 This process was originally developed for absorption microscopy of thionin-SO2 and orange II-stained cervical cells obtained from normal and dysplastic samples.
Of the 3680 images analyzed, this edge relocation algorithm resulted in the correct segmentation of over 98% of the nuclei, with only 1.7% of all nuclei being incorrectly segmented.94 To evaluate the enhanced-edge detection segmentation algorithm’s ability to identify DAPI counterstained nuclei in fluorescence images, a human observer classified the results of the algorithm on over 3500 nuclei taken from four different samples, all selected for low nuclei densities and proper staining conditions. The four samples consisted of two different slides from xenograft human squamous cell lung carcinoma tumours and two from excised human squamous cell carcinoma samples. See sections 4.4.1 and 4.5.2.1 for more information on the xenograft and human resected tumour samples, respectively. The human observer was asked to classify each of the over 3500 identified objects as a correctly segmented, under-segmented or over-segmented nucleus. Correctly segmented nuclei were those deemed to fit qualitatively well within the cell boundaries as interpreted by the human observer. Over-segmented objects were nuclei that were erroneously cut into two or more smaller objects, while under-segmented objects were objects that consisted of two or more nuclei or pieces of nuclei. Please see figure 4-2 for examples of under- and over-segmented nuclei. Segmentation results on the fluorescence-based images were found to be similar in accuracy to segmentation of absorbance-based images, provided that the samples being analyzed had low to moderate nuclei densities and proper staining (see figure 4-3a for examples) and that adequate segmentation parameters were used.

FIGURE 4-2 – Sample grayscale images of DAPI stained nuclei with automatically generated segmentation boundaries drawn in white. Human observer classification of these nuclei was either as over-segmentation (A, B) or under-segmentation (C, D).
This approach was used to assess the accuracy of identifying nuclei within DAPI stained sections.

FIGURE 4-3 – Sample grayscale images of DAPI stained nuclei with automatically generated segmentation boundaries drawn in white. These images show examples of segmentation results in (A) good staining conditions and moderate nuclei density and (B) poor staining conditions and high nuclei density. Note the obvious and drastic decrease in segmentation accuracy in B.

As described in section 3.3.2, the only adjusted parameter for segmentation was the user-selected threshold level, which ranged from 160 to 210 grey levels depending on the intensity of the stain; all other settings were left at their default levels. Specifically, the enhanced-edge detection algorithm was found to segment approximately 88% of nuclei accurately, with under-segmentation and over-segmentation rates of about 8% and 5% respectively; see table 4-1 for a synopsis of the raw evaluation results.

TABLE 4-1 – Human classification of segmentation results across the 4 samples examined. For examples of over- and under-segmentation please see figure 4-2. All samples were chosen for low nuclei density and good staining conditions as seen in figure 4-3a. An average correct classification of ~88% was observed. The integer numbers represent the number of nuclei.

                     Raw Results of Segmentation         As Percentage
Sample               Total   Correct   Under   Over      Correct   Under   Over
Xenograft 1            525       439      32     54        83.6%    6.1%   10.3%
Xenograft 2            709       643      28     38        90.7%    3.9%    5.4%
Patient sample-a      1259      1106     129     24        87.8%   10.2%    1.9%
Patient sample-b      1055       924     102     29        87.6%    9.7%    2.7%
Total                 2493      2188     189    116        87.8%    7.6%    4.7%

However, in areas of high nuclei density (figure 4-3b) or in poor staining/imaging conditions this accuracy begins to suffer greatly.
In such cases, however, even a human interpreter has difficulty distinguishing individual nuclei, and thus the algorithm is still deemed valid. That said, future work is needed to improve the routine and add adjustments in order to increase the accuracy in these situations of high nuclei density or poor staining.

4.3 Spot Counting Evaluation Experiment

Likewise, to evaluate the accuracy of the spot counting process itself, a human observer first created a validation data set consisting of 573 segmentation objects for which FISH signal spots were manually enumerated. This classification was performed on 2 different tissue samples (one xenograft human squamous cell lung carcinoma tumour sample and one excised human squamous cell carcinoma patient sample) and for 3 FISH-probe colours stained with Spectrum-Orange (red), Spectrum-Green (green) and Texas-Red (magenta). See figure 4-4 for an example of the spot counting results. One important aspect of this evaluation was that the human observer was requested to enumerate FISH spots within segmented objects, regardless of segmentation accuracy. In this manner, segmentation inaccuracies do not confound the spot counting evaluation. As seen in figure 4-5 and table 4-2, the spot counting algorithm (described in detail in section 3.3.3) shows very high agreement with manual counting, as indicated by a slope close to unity and a high R² value; refer to figure 4-5 for the linear regression line and equation. The spot counting evaluation was essentially identical for each colour channel (FISH probe colour) used and similar across both samples examined, as long as staining quality was comparable.

FIGURE 4-4 – Typical spot counting results. Upper panels: merged original, unaltered images showing the segmentation result. Bottom panels: results of spot counting, including the position of accepted spots and the number of spots for each signal channel.
For example, the third cell has one deletion in the green channel and 1 amplification in the red channel, while the magenta channel has the normal number of hybridization spots.

FIGURE 4-5 – Spot counting evaluation by comparison of automated spot counting with manual spot counting based on about 600 segmentation results. The size of each data marker represents the frequency of data at that discrete data point, omitting (0,0). For actual numbers at each discrete spot count, please see table 4-2. The linear regression fit is indicated with the equation and R² value (y = 0.9682x + 0.0222, R² = 0.9487).

TABLE 4-2 – Raw results of human enumeration of spot counts for each nucleus as compared with the automated spot counting algorithm enumeration. The matrix gives the number of cell nuclei with the denoted discrete FISH signal results for the observer vs. the automated algorithm. Numbers along the diagonal of the matrix represent nuclei for which both methods agree, while numbers above and below the diagonal indicate either a loss or gain of signal as compared to the human observer. Most nuclei were correctly classified, with nearly all misclassifications amounting to an error of only one FISH spot.

                              Human Observer Spot Counting
Automated Spot Counting       0      1      2      3      4      5      6
            0               280      1      1      0      0      0      0
            1                 3    123      8      1      0      0      0
            2                 0      2    100      2      0      0      0
            3                 0      0      2     37      0      0      0
            4                 0      0      0      1     10      0      0
            5                 0      0      0      0      0      1      0
            6                 0      0      0      0      0      0      1
Along diagonal: 96.3%

The algorithm agreed with the human observer in 96% of the cases evaluated, with less than 1% off by more than a single spot count. The slope of the linear regression being slightly under unity suggests that the spot counting algorithm has a small tendency to miss FISH spots rather than find extra ones when compared against a human observer. This trend could be reversed or augmented if the spot counting parameters of the algorithm (described below and in more detail in Appendix C) were further adjusted.
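The agreement figures quoted above follow directly from the counts in table 4-2, as the short Python check below illustrates. The function name is hypothetical, and the assignment of the automated counts to rows and the human counts to columns is an assumption about the table layout; it does not affect either summary value.

```python
# Counts from table 4-2; rows assumed to be automated spot counts (0-6),
# columns assumed to be human observer spot counts (0-6).
counts = [
    [280, 1, 1, 0, 0, 0, 0],
    [3, 123, 8, 1, 0, 0, 0],
    [0, 2, 100, 2, 0, 0, 0],
    [0, 0, 2, 37, 0, 0, 0],
    [0, 0, 0, 1, 10, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
]


def agreement_summary(matrix):
    """Return (total, exact agreement fraction, fraction off by more than one spot)."""
    total = sum(sum(row) for row in matrix)
    exact = sum(matrix[i][i] for i in range(len(matrix)))
    far_off = sum(
        n
        for i, row in enumerate(matrix)
        for j, n in enumerate(row)
        if abs(i - j) > 1
    )
    return total, exact / total, far_off / total


total, exact_fraction, far_off_fraction = agreement_summary(counts)
```

The total of 573 matches the size of the validation set, 552 of the 573 objects lie on the diagonal (96.3%), and only 2 objects differ by more than one spot.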
The spot counting algorithm was also noted to be quite robust: the agreement between the human and automated observers was good even in areas with relatively high background staining, as long as the spot signal was higher in intensity than the background staining. However, when the signal and background intensities were comparable, background correction often removed the spot signal or interpreted variations in background as a signal. Consequently, clean staining with limited background was found to be paramount in order to obtain accurate and reliable results.

As just mentioned, the sensitivity and accuracy of the automated spot counting algorithm was heavily dependent on the choice of associated parameters (described in Appendix C), particularly the radius of the top-hat transform filter and the threshold used for the maxima detection. To optimize these parameters for each section analyzed, a few randomly chosen FOVs were processed in which the user adjusted the parameters prior to applying them to the entire scan as a whole. The default parameters of a 5 pixel radius for the top-hat filter (Tophat_diameter variable) and a maxima threshold level of about 50 grey levels (Maxima_thresh variable) seem to work well in most cleanly stained tissues. These two parameters were usually the only ones of spot_count.m that were adjusted to optimize spot enumeration. Increasing the top-hat radius was observed to increase the size of spots not removed from the image, while reducing the maxima detection threshold enabled detection of weaker intensity spots but increased the possibility of false positives.

As just presented, the in-house developed algorithms used for the segmentation of nuclei and the enumeration of FISH probe spot signals within them can be deemed valid and comparable to a human observer.
The validity of each algorithm was determined independently and in combination, and consequently this process of automated FISH spot enumeration was subsequently carried out on tissue samples to generate data sets for clonal identification as described in section 3.3.4.

4.4 Cell Classification and Clonal Identification Validation

4.4.1 Experiment introduction

As previously mentioned, sections of a squamous cell lung carcinoma xenograft tumour, AB117-16-8156 (LU16-SDCC), cut at 7 µm thickness from a formalin-fixed, paraffin-embedded (FFPE) block, were used to optimize tissue FISH protocols and as test samples for the imaging and clonal algorithm development described herein. For these experiments, a commercial probe set known as LAVysion (Abbott Laboratories, Abbott Park, Illinois, U.S.A., formerly Vysis) was purchased; it consists of four directly labeled DNA FISH probes. The locus-specific (LSI) EGFR (epidermal growth factor receptor) gene probe is labeled in Spectrum-Red and covers a 300 kb region that contains the entire EGFR gene (7p12). The approximately 750 kb Spectrum-Gold labeled LSI C-MYC probe contains the entire C-MYC gene (8q24.12-q24.13). The LAVysion Multi-color Probe set also contains an approximately 450 kb Spectrum-Green labeled LSI D5S23, D5S271 (5p15.2) probe and a centromere enumeration probe (CEP) for chromosome 6 labeled in Spectrum-Aqua. The CEP 6 probe hybridizes to the alpha satellite DNA region located at the centromere of chromosome 6.

During the tissue-FISH protocol optimization and clonal algorithm development, xenograft tissue samples were imaged and analyzed with the NH and CC algorithms. Initial analysis results indicated large, often connected clusters of segmented cells lacking signal in all four probes simultaneously; essentially double deletions of the entire DNA specific to the probes being investigated.
Further visual investigation of the stained tissue was performed by a trained professional, and it was verified that not only did large clusters of cells with double deletions in all probes exist, but that these cells also showed none of the diffuse staining from unspecific hybridization signal often exhibited by the human tumour cells. This phenomenon is exemplified in figure 4-6 along with the system-classified Neighbourhood Homogeneity and Clone Connectivity scores for the matched areas. The lack of hybridization signal and of significant diffuse staining seen in figure 4-6 can be attributed to the generally low cross-hybridization of specific human DNA probe sequences to mouse DNA.97-99 Thus these were deemed to be infiltrating mouse cells.

FIGURE 4-6 – A) Small 12 FOV sub-image of sample 1 showing definite regions of normal mouse cells among human tumour cells. Colour images are displayed as a result of DAPI (blue nuclear stain) and only 2 of the 4 probes (green and red) for clarity. Matched Neighbourhood Homogeneity map (B) and Clone Connectivity map (C) showing clone definition and delineation of these areas of mouse cells. White arrows indicate regions of infiltrating mouse cells.

Thus, this infiltration of mouse cells into human xenograft tumours presents a useful model for evaluating the Neighbourhood Homogeneity and Clone Connectivity algorithms in terms of finding and delineating clones of mouse cells among human tumour cells. This model also partly establishes the ability to include double deletions across multiple channels as a characteristic which can be used to distinguish clonal populations. To create a validation data set, two observers (I1 and I2) were asked to classify cells in the 4 samples stained with the aforementioned probes as either mouse or human. The observers were asked to distinguish cells based on the presence or absence of FISH signal in all 4 probes and a general lack of diffuse staining.
Furthermore, cell selection was performed based only on segmentation results in order to exclude effects of the segmentation algorithm on the classification and validation of the spot counting per nucleus and of the recognition of areas of mouse cell clones by the Neighbourhood Homogeneity and Clone Connectivity algorithms. In total, both observers classified close to 110 000 individual cells across the four samples.

4.4.2 Experimental results

Four xenograft tissue samples were stained with the aforementioned LAVysion probe set, imaged, segmented and spot counted with parameters optimized for each slide, and finally analyzed with our Neighbourhood Homogeneity (NH) and Clone Connectivity (CC) score algorithms. For these analyses, the cells of interest were defined as ones lacking FISH signal in all four colour channels. Each sample is a composite of 100 FOVs, except sample 1, which comprises 150 FOVs. To assess the accuracy of the automated identification of mouse infiltrate clones, the four stained slides were evaluated by two human interpreters. This was achieved by allowing a human observer to indicate which segmentation results correspond to mouse cells (i.e. cells with no signal in any channel). By having a human observer score the segmentation results directly, results for both the human and automated clonal scoring are based on identical data and thus reflect only the differences in clonal detection sensitivity and not segmentation variability.

4.4.2.1  Intra-observer and inter-observer variability characterization

Prior to using the results of human observer classifications to evaluate the automated classification/clonal identification accuracy of the system, a comparison was made to determine the inter- and intra-observer variability. Sample 1 was assessed by observer I1 twice (I1a and I1b) to gain a measure of intra-observer variability, while a further assessment by a second observer (I2) provided a sense of inter-observer variability.
The results of these comparisons are shown in table 4-3.

TABLE 4-3 – The comparison of intra- and inter-observer classification results on sample 1. Results, with their percentages of total cells scored in brackets, are given for each classification. Results were quite good, with relatively low intra- and inter-observer variability: at most ~7% intra-observer variability and ~6% inter-observer variability in classifying mouse cells.

Intra-observer (I1a vs. I1b), 23966 cells scored
  Mouse cells found:          I1a 8342 (34.8)    I1b 9455 (39.5)    Common 8036 (33.5)
  Human tumour cells found:   I1a 15624 (65.2)   I1b 14511 (60.5)   Common 14205 (59.3)

Inter-observer (I2 vs. I1a), 23966 cells scored
  Mouse cells found:          I2 9423 (39.3)     I1a 8342 (34.8)    Common 8051 (33.6)
  Human tumour cells found:   I2 14543 (60.7)    I1a 15624 (65.2)   Common 14247 (59.4)

Inter-observer (I2 vs. I1b), 23966 cells scored
  Mouse cells found:          I2 9423 (39.3)     I1b 9455 (39.5)    Common 8936 (37.3)
  Human tumour cells found:   I2 14543 (60.7)    I1b 14511 (60.5)   Common 14022 (58.5)

As shown above, the majority of cells classified as mouse or human are common to both observers or to both classification sessions of the same observer. From table 4-3, about 93% of cells classified as mouse are common between the two classification sessions of observer I1. Likewise, about 94% of the cells are common to both I2 and I1. Thus, intra-observer variability in classifying mouse cells was assessed as about 7%, and inter-observer variability was slightly lower at around 6%.

The Neighbourhood Homogeneity and Clone Connectivity scores were then calculated for each of these data sets to determine the effect of the aforementioned variability on the actual maps. As seen in figure 4-7, the NH and CC maps do differ to some extent between the human observer data sets. This is encouraging, as the extent of agreement between the maps of the human-classified data sets is similar to the agreement with the automatically classified data, as shown below.
4.4.2.2  Classification sensitivity and specificity

The human classified data sets were first used to determine the quality of the automated classification of mouse cells. Across all samples examined, about half of the cells segmented were scored as mouse cells by both the human interpreters and the automatic methods (table 4-4). Furthermore, in each case, 75-90% of the cells characterized as mouse infiltrates by a human are common to the automatic methods. Extending this analysis to a measure of specificity and sensitivity across the 4 samples and data set comparisons yields a sensitivity of 83.8% and a specificity of 80.5% (table 4-5). This summarizes the system’s ability to accurately distinguish mouse cells (positive) from human tumour cells (negative) compared to a human observer. Comparing only the human classified datasets resulted in an average human-to-human sensitivity of about 90% and specificity of 96%, which is relatively close to the system’s capability. Although encouraging, these results only account for the correct classification of cells as human or mouse cells and do not assess the quality or performance of the clonal identification algorithms. To evaluate the clonal identification algorithms, the Neighbourhood Homogeneity and Clone Connectivity scores were applied to both the human observer data and the automated spot counting results.

FIGURE 4-7 – The Neighbourhood Homogeneity maps (A,B,C) and Clone Connectivity maps (D,E,F) for the data sets classified by observer I1 (A,D), by observer I1 a second time (B,E) and finally by observer I2 (C,F). The NH maps were calculated using a 3-layer neighbourhood and positive cells were defined as ones without FISH probe staining. Although the maps show high agreement between each round of classification, minor differences are apparent, and such inconsistencies are similar to the ones observed when comparing against the automated system classification.
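For reference, sensitivity and specificity follow the usual definitions: the fraction of human-identified mouse cells (positives) that the system also calls mouse, and the fraction of human-identified tumour cells (negatives) that the system also calls tumour. The brief Python sketch below, with a hypothetical function name, applies these definitions to the TOTAL row and the sample 2 row of table 4-5.

```python
def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positives among all actual positives
    specificity = tn / (tn + fp)  # true negatives among all actual negatives
    return sensitivity, specificity


# Counts taken from table 4-5 (mouse cell = positive, human tumour cell = negative).
total_sens, total_spec = sensitivity_specificity(tp=37914, fp=11423, fn=7308, tn=47156)
s2_sens, s2_spec = sensitivity_specificity(tp=4201, fp=304, fn=1419, tn=2936)
```

Rounded to one decimal place, these calls reproduce the tabulated percentages for the two rows.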
TABLE 4-4 – Raw results of human observer and system classification for each sample. Sample 1 was scored twice by interpreter 1 and once by interpreter 2 to gain a sense of inter- and intra-interpreter variability. All results are given as total cells detected, with the percentage of all cells found in that sample in brackets.

                          Number of Mouse Cells Found                       Number of Human Tumour Cells Found
Sample     Cells found    Human          Spot Counting    Common            Human           Spot Counting    Common
1a (I1)    23966          8342 (34.8)    11540 (48.2)     7540 (31.5)       15624 (65.2)    12426 (51.8)     11610 (48.4)
1b (I1)    23966          9455 (39.5)    11540 (48.2)     8388 (35.0)       14511 (60.5)    12426 (51.8)     11346 (47.3)
1c (I2)    23966          9423 (39.3)    11540 (48.2)     8426 (35.2)       14543 (60.7)    12426 (51.8)     11417 (47.6)
2          8860           5620 (63.4)    4505 (50.8)      4201 (47.4)       3240 (36.6)     4355 (49.2)      2936 (33.1)
3          11661          6143 (52.7)    5226 (44.8)      4748 (40.7)       5518 (47.3)     6435 (55.2)      5041 (43.2)
4          11382          6200 (54.5)    4986 (43.8)      4611 (40.5)       5182 (45.5)     6396 (56.2)      4806 (42.2)

TABLE 4-5 – Specificity and sensitivity analysis of raw data in table 4-4. Note the reversed trend between sample 1 and samples 2-4.

Sample     True Positive    False Positive    False Negative    True Negative    Sensitivity (%)    Specificity (%)
1a (I1)    7540             4000              816               11610            90.2               74.4
1b (I1)    8388             3152              1080              11346            88.6               78.3
1c (I2)    8426             3114              1009              11417            89.3               78.6
2          4201             304               1419              2936             74.8               90.6
3          4748             478               1394              5041             77.3               91.3
4          4611             375               1590              4806             74.4               92.8
TOTAL      37914            11423             7308              47156            83.8               80.5

4.4.2.3  Neighbourhood homogeneity analysis

Calculating a Neighbourhood Homogeneity (NH) score on the human and automatically classified data sets produced a robust map of tissue homogeneity with respect to a specific clonal population, in this case the infiltrating mouse cells, identified by the lack of signal in all four colour channels.
In these examples the NH score simply represents the total percentage of mouse cells in each particular cell's three-layer neighbourhood. The NH maps in figure 4-8 each display unique areas of highly homogeneous mouse cells as well as areas in which no mouse cells are present. Examining these results more closely, in figure 4-8, sample 1 has larger areas of human tumour cells while samples 2 through 4 show a majority of mouse cells. Even though the areas imaged and analyzed are not matched to each other, this trend of increased mouse cell content was confirmed by visual inspection of the 4 slides under the microscope. The increase is explained by the location from which each sample was sectioned within the xenograft tumour block. Specifically, sample 1 was the 10th slice while samples 2 through 4 were slices 31-33 respectively. Because the tumour block in question was already sectioned past its midpoint, these latter slices were getting closer to the edge of the human tumour, resulting in increasing heterogeneity and penetration by infiltrating mouse cells, i.e. increased human tumour and mouse cell mixing.

Comparing the NH maps of the human-interpreted and system-interpreted cells (left vs. right columns of figure 4-8) reveals that there is considerable qualitative agreement between the two. General characteristics of the NH maps for each sample are comparable with human validation results, with locations of high and low clonal densities largely preserved. Although these general trends are conserved, some significant deviations are noticeable between the system and human classified data sets.

FIGURE 4-8 – Neighbourhood Homogeneity Maps for samples 1 through 4 (rows 1 through 4 respectively). The left most column shows the human interpreter classified data while the right column shows system classified data; both used a 3-layer neighbourhood for their scoring. Note the good agreement in high NH score regions but higher variability in low score areas. The reverse trend is noticeable in samples 2-4, with lower NH score areas having less deviation from the human classified data than high-valued regions. In general the automated results show more variability, which translates as the appearance of a noisier image.

4.4.2.3.1  Neighbourhood Homogeneity map noise and spot counting

Visually, the NH maps for data classified by the system seem to have more noise and variability than those for data classified by the human observers. This increased noise is present in all the samples, but varies in type between them. In sample 1, areas of high NH score seem to be comparable to the human validation map and have less noise than the low value regions, which have more variability and deviation from the human data set. The opposite effect is noticed in samples 2 through 4. Here, areas of low NH scores seem to agree more closely than the high-density areas. This observation suggests that the spot counting algorithm, which was optimized individually for each sample analyzed, underperformed in sample 1 and overperformed in samples 2-4. This could cause the differing but systematic characteristics in the noise. For example, if the spot counting parameters for sample 1 were too weak, valid spots as indicated by a human observer would be missed and cells would be incorrectly classified as mouse. However, mouse cells, which have no FISH spot signal, would not be incorrectly classified to the same extent. These observations follow the trends noticed in the previous specificity and sensitivity analysis (table 4-5), with sample 1 having more false positive cells (incorrectly classified as mouse) compared to samples 2 through 4, which had higher false negative rates. These observations suggest that optimizing spot counting parameters by visual assessment over a few randomly selected FOVs may not provide enough consistency.
Thus, future work should examine a standardized way to optimize these parameters. This conclusion is further supported by the means of each data set as presented in table 4-6, where the average NH scores of the automatic classifications in sample 1 are larger than the NH scores of the human interpreters. This trend reverses in samples 2, 3 and 4 and supports the over- or underperformance of the spot counting algorithms.

Although the above results show that the NH calculation gives accurate cell homogeneity results as compared to a human observer, the NH score itself does not provide a robust way to determine clonal clusters within the sample imaged. One simple way to determine which areas of the NH map can be considered as part of a clone with the desired FISH characteristics is to threshold the score maps at a certain neighbourhood homogeneity density level. This would partition the images into regions of high mouse cell homogeneity which can be considered clonally related. These segmented regions, however, could not be differentiated from each other; in other words, each clone region would not be uniquely identified. Thus, for more robust and meaningful definitions of clonal clusters within these samples, use of the Clone Connectivity score would be preferred. As mentioned previously, the CC score is simply the total number of positive (mouse) cells connected to each other to form a continuous string or region of positive cells.

TABLE 4-6 – Mean and standard deviation of Neighbourhood Homogeneity Maps (3-layer neighbourhood) and Clone Connectivity Maps, showing systematic differences between the human and system derived data.

             Neighbourhood Homogeneity Score          Clone Connectivity Score
             System             Human                 System             Human
Sample       mean     std       mean     std          mean     std       mean     std
1 (I1a)      0.517    0.340     0.402    0.415        250.0    414.3     47.7     95.6
1 (I1b)      0.517    0.340     0.442    0.404        250.0    414.3     54.3     103.5
1 (I2)       0.517    0.340     0.435    0.404        250.0    414.3     50.2     96.1
2            0.562    0.343     0.699    0.389        15.1     31.7      38.8     63.3
3            0.510    0.352     0.604    0.404        15.8     29.0      35.9     56.5
4            0.468    0.343     0.577    0.409        12.7     23.4      42.0     68.1

4.4.2.4  Clone Connectivity analysis

Applying the Clone Connectivity Score to the human and automatically classified data sets imaged (figure 4-9) provides a robust map of mouse cell connectivity, therefore defining mouse cell clones. As evidenced in figure 4-9, there is again considerable qualitative agreement between the human classified data sets and the ones classified automatically. General characteristics of the Clone Connectivity maps for each sample are comparable, with locations of high and low clonal connectivity regions being preserved.

FIGURE 4-9 – Clone Connectivity Score maps for samples 1 through 4 (rows 1 through 4 respectively). The left most column shows the human interpreter classified data (for observer I1a) while the right column shows system classified data, both analyzed with the Clone Connectivity Score. Visually, most general trends of each sample are mirrored between the human and system classified data. However, there are some scoring differences, as can be seen from each figure's colour bar scale. Also, the human observer image sets have more smoothness to the shapes of clones represented, while the automated system data (column 2) exhibit more irregular contours, suggesting some intermixing of mouse and human cells caused by improper spot counting and hence cell misclassification.
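The two scores described above can be illustrated on a small cell graph. The following is a minimal sketch under stated assumptions, not the implementation used in this work: the neighbour graph is passed in as an adjacency dictionary (in this work the neighbours come from the Delaunay triangulation of nuclei centroids), the NH score excludes the centre cell from its own neighbourhood, and negative cells are given a CC score of 0. The function names are hypothetical.

```python
from collections import deque


def layered_neighbours(adj, start, layers):
    """Cells reachable from `start` within `layers` edges, excluding `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        cell, depth = frontier.popleft()
        if depth == layers:
            continue  # do not expand beyond the outermost layer
        for nb in adj[cell]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    seen.discard(start)
    return seen


def nh_score(adj, positive, cell, layers=3):
    """Fraction of positive cells in the `layers`-layer neighbourhood of `cell`."""
    nbrs = layered_neighbours(adj, cell, layers)
    if not nbrs:
        return 0.0
    return sum(1 for n in nbrs if n in positive) / len(nbrs)


def cc_scores(adj, positive):
    """Size of the connected group of positive cells each cell belongs to."""
    scores = {cell: 0 for cell in adj}  # assumption: negative cells score 0
    visited = set()
    for cell in adj:
        if cell not in positive or cell in visited:
            continue
        component = [cell]
        visited.add(cell)
        queue = deque([cell])
        while queue:
            current = queue.popleft()
            for nb in adj[current]:
                if nb in positive and nb not in visited:
                    visited.add(nb)
                    component.append(nb)
                    queue.append(nb)
        for member in component:
            scores[member] = len(component)
    return scores


# Toy example: six cells in a chain 0-1-2-3-4-5; cells 0, 1, 2 and 5 are positive.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
positive = {0, 1, 2, 5}

cc = cc_scores(adj, positive)
nh_cell0 = nh_score(adj, positive, 0, layers=3)
print(cc, nh_cell0)
```

In the toy chain, cells 0 to 2 form one connected positive group of three cells, while cell 5 is an isolated positive cell; the NH score of cell 0 counts two positives among its three neighbours within three layers.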
4.4.2.4.1  Clone Connectivity score results and spot counting

These regions, however, differed slightly in their quantitative Clone Connectivity scores. In sample 1, the CC scores for similar areas are about 20% lower for human classified data. This trend reverses in samples 2-4, where human classification led to Clone Connectivity scores about 20% higher than comparable clones of automatically classified cells. This is further supported by the changes in mean CC scores for each region as seen in table 4-6.

This quantitative disagreement can be explained by systematically erroneous spot counting parameter optimization, as alluded to previously. Lower CC scores for automated classification of samples 2-4 further support the view that the spot counting algorithm was too sensitive for these samples. Misclassified cells would disrupt the clonal connectivity calculated by the algorithm and thus would drastically reduce the scores. In sample 1, the opposite effect was observed, which indicates that here the spot counting algorithm was not sensitive enough. In such a case, the failure to detect true FISH spot signals could result in erroneously classified mouse cells being incorporated into clonal clusters, greatly enhancing the Clone Connectivity scores. These trends mirror the observations made from figure 4-9, table 4-6 and even the specificity and sensitivity analysis.

4.4.2.4.2  Thresholding the CC score to identify clones

The previous observations were based on mostly qualitative analysis of the Clone Connectivity maps. To address our goal of identifying and delineating significant regions of mouse cell clones, we performed the following analysis. A threshold needed to be applied to the clone connectivity maps in order to separate the continuous score data into discrete regions or clones. As seen in table 4-7 and specifically figure 4-10, the choice of threshold level, i.e.
the number of connected individuals needed to comprise a clone, can have a great effect on which areas are considered separate clones and which clones are common to both human and automatic classification.

FIGURE 4-10 – Clone Connectivity Maps of sample 1 with thresholds at 5, 15 and 20 neighbour strings (columns 1 through 3 respectively). Row 1) The comparison of human and system classified clonal areas. Areas of agreement are indicated in red, areas unique to human classification are in yellow and those unique to system classification are in light blue. Row 2) Image representing automated scoring of clonal regions and row 3) image representing scoring by human interpreter; in these two rows red areas indicate clones found by both techniques while green represents clones only found by that technique. As shown, the selection of an appropriate threshold can have a great influence on the validity of the results. Here the threshold with the largest agreement seems to be about 15 neighbours, resulting in the fewest (zero) missed clones and maximizing common areas.

TABLE 4-7 – Analysis of single threshold selection of CC maps to delineate independent clones. The statistics for the three thresholds of sample 1 (5, 15, 20) as displayed in figure 4-10 are given. In this case, the 15 neighbourhood threshold level seems to agree best with the human interpreter. For samples 2-4 and the human vs. human data (sample 1), only the optimized thresholds (chosen by calculation of the threshold score and qualitative visual inspection) are shown.

                                                     Sample 1                         Sample 2    Sample 3    Sample 4    I1a (Human) vs.    I2 (Human) vs.
Feature                                                                                                                   I1 (System)        I1b (System)
   Threshold Level                                   5          15         20         8           8           9           15                 16
1  Positive by Human (pxls)                          158175     119239     106987     507601      503367      508913      106502             368145
2  Positive by System (pxls)                         253855     210311     199216     309668      349888      313079      113954             351801
3  Over classified pxls (System +, Human -)          108204     101798     102968     3227        4043        2171        16778              12460
4  Under classified pxls (System -, Human +)         12524      10726      10739      201160      157522      198005      9326               28804
5  Correctly classified pxls (System +, Human +)     145651     108513     96248      306441      345845      310908      97176              339341
6  Number of clones by Human                         52         37         35         13          26          19          28                 26
7  Number of clones by System                        86         35         36         37          38          54          26                 32
8  Number of clones missed (Human -, System +)       50         12         14         1           0           0           1                  1
9  Number of clones missed (Human +, System -)       3          0          2          4           6           1           0                  1
   Threshold Score                                   1.816      2.720      1.899      2.830       3.009       3.301       3.150              2.987

The optimal threshold level was investigated by examining the set of features summarized in table 4-7 for a range of thresholds from 5 to 50 neighbours (data not shown). Ideally, an optimal threshold level would result in nearly identical clonal regions common to the automated and human data sets, indicated by minimal differences in areas classified as positive and by similar sets of clones found between the two methods. Thus a score (equation 4-1) was produced to assess the threshold level in terms of minimizing the number of under and over classified pixels, maximizing common pixels, and maximizing the number of clones in common between the two data sets. With the help of this score and by qualitative appraisal, a threshold of 8 to 15 neighbours was determined to provide the best agreement with the human validation data sets, as seen in table 4-7 and figure 4-11.

EQUATION 4-1 – Threshold score used to assist in defining optimal thresholding levels. The 9 features indicated in table 4-7 were calculated at thresholds ranging from 1 to 50.
Features 2, 3, 5, 8 and 9 were then normalized and the score was calculated as below. The score would be a maximum (max = 5) when correctly classified pixels were at their maximum while under and over classified pixels and missed clones were all at their minima.

Threshold Score = Feat.5 + (1 - Feat.2) + (1 - Feat.3) + (1 - Feat.8) + (1 - Feat.9)

Since there are differences between the Clone Connectivity scores of the human and automated classification sets, the use of two unique thresholds could improve the clonal agreement. Thus, thresholds were varied up to 50 neighbours independently for the human validation CC map and the automated CC map. Again, using the 'threshold score' and some visual evaluation, threshold ranges of 10 to 20 for the human data and 5 to 15 for the automated data provided the best results. The higher scores (table 4-8) of this multi-threshold analysis suggest that by using two separate definitions of clones, agreement of clonal regions between the human and automated sets can be improved even further. This was also qualitatively true when visually examining the images in figure 4-12.

FIGURE 4-11 – CC maps thresholded using single thresholds. Rows: A) sample 1 with a threshold of 15 connected neighbours, B) sample 2 with a threshold of at least 8 connected neighbours, C) sample 3 with a threshold of 8 connected neighbours and D) sample 4 with a threshold of 9 connected neighbours. Each row has 3 images as follows: Left) Comparison of human and system classified clonal areas. Areas of agreement are indicated in red, areas unique to human classification in yellow and areas unique to system classification in light blue. Middle) Image representing automated scoring of clonal regions and Right) image representing scoring by human interpreter, where red areas indicate clones found by both techniques while green represents clones only found by that technique.
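The threshold score of equation 4-1 can be sketched as follows. This is an illustration only: the text does not state how the features were normalized, so min-max normalization across the thresholds tested is assumed here, and the function names are hypothetical. The feature values are those of sample 1 in table 4-7 at thresholds 5, 15 and 20; because the normalization here spans only these three thresholds rather than the full range examined in the text, the resulting scores are not expected to reproduce the values in the table.

```python
def min_max(values):
    """Scale values to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def threshold_scores(features):
    """Equation 4-1 evaluated at every threshold.

    `features` maps a feature number from table 4-7 to a list of values,
    one value per threshold tested.
    """
    norm = {k: min_max(features[k]) for k in (2, 3, 5, 8, 9)}
    n = len(features[5])
    return [
        norm[5][i] + sum(1.0 - norm[k][i] for k in (2, 3, 8, 9))
        for i in range(n)
    ]


thresholds = [5, 15, 20]
features = {
    2: [253855, 210311, 199216],  # positive by system (pxls)
    3: [108204, 101798, 102968],  # over classified pxls
    5: [145651, 108513, 96248],   # correctly classified pxls
    8: [50, 12, 14],              # clones missed (Human -, System +)
    9: [3, 0, 2],                 # clones missed (Human +, System -)
}

scores = threshold_scores(features)
best = thresholds[max(range(len(scores)), key=lambda i: scores[i])]
print(scores, best)
```

Under these assumptions the threshold of 15 neighbours receives the highest score of the three, which agrees with the qualitative choice made for sample 1.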
TABLE 4-8 – Analysis of two threshold selection of Delaunay Connectivity maps to delineate independent clones. Only optimized thresholds are shown, and results seem in better agreement with the human defined data than those obtained by applying a single threshold to both data sets.

Feature                                              Sample 1    Sample 2    Sample 3    Sample 4
   Threshold Level 1 (System)                        16          10          10          7
   Threshold Level 2 (Human)                         7           22          13          10
1  Positive by Human (pxls)                          147424      354938      447154      496724
2  Positive by System (pxls)                         208012      275500      320274      348513
3  Over classified pxls (System +, Human -)          79233       36360       10689       12840
4  Under classified pxls (System -, Human +)         18645       115798      137569      161051
5  Correctly classified pxls (System +, Human +)     128779      239140      309585      335673
6  Number of clones by Human                         49          15          37          20
7  Number of clones by System                        39          35          39          55
8  Number of clones missed (Human -, System +)       8           8           1           3
9  Number of clones missed (Human +, System -)       5           2           11          1
   Threshold Score                                   3.975       3.908       3.971       4.204

FIGURE 4-12 – CC maps with two thresholds, one for the system classified data and the other for the human classified data. Rows: A) sample 1 with thresholds of at least 16 and 7 connected neighbours, B) sample 2 with thresholds of 10 and 22 connected neighbours, C) sample 3 with thresholds of 10 and 13 connected neighbours and D) sample 4 with thresholds of 7 and 10 connected neighbours. Each row has 3 images as follows: Left) Comparison of human and system classified clonal areas. Areas of agreement are indicated in red, areas unique to human classification in yellow and areas unique to system classification in light blue. Middle) Image representing automated scoring of clonal regions and Right) image representing scoring by human interpreter, where red areas indicate clones found by both techniques while green represents clones only found by that technique.
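The pixel-level features 1 to 5 of tables 4-7 and 4-8 can be obtained by thresholding the two CC maps and comparing the resulting masks. The sketch below is a hypothetical illustration rather than the code used in this work: the CC maps are represented as small nested lists, a pixel is treated as belonging to a clone when its CC score is at least the threshold, and each map receives its own threshold as in the two-threshold analysis.

```python
def threshold_map(cc_map, threshold):
    """Binary clone mask: True where the CC score reaches the threshold."""
    return [[value >= threshold for value in row] for row in cc_map]


def pixel_agreement(human_mask, system_mask):
    """Count over, under and correctly classified pixels (features 1 to 5)."""
    over = under = correct = 0
    for human_row, system_row in zip(human_mask, system_mask):
        for human, system in zip(human_row, system_row):
            if system and human:
                correct += 1   # System +, Human +
            elif system:
                over += 1      # System +, Human -
            elif human:
                under += 1     # System -, Human +
    return {
        "positive_by_human": correct + under,
        "positive_by_system": correct + over,
        "over": over,
        "under": under,
        "correct": correct,
    }


# Toy 2 x 3 CC maps (values are clone connectivity scores per pixel).
cc_human = [[0, 12, 12], [0, 12, 3]]
cc_system = [[8, 8, 0], [0, 8, 2]]

human_mask = threshold_map(cc_human, threshold=10)   # human-specific threshold
system_mask = threshold_map(cc_system, threshold=5)  # system-specific threshold
result = pixel_agreement(human_mask, system_mask)
print(result)
```

By construction, the pixels positive by the system are the sum of the over and correctly classified pixels, and the pixels positive by the human are the sum of the under and correctly classified pixels; the values reported in tables 4-7 and 4-8 satisfy the same identities.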
4.4.3 Conclusions

The six sets of validation data just described provide an excellent test set to evaluate the ability of the system to classify cells as mouse (double deletion of FISH probe in all channels) or human tumour. Furthermore, this validation set also enables some investigation into clonal identification through the use of Clone Connectivity and Neighbourhood Homogeneity analyses.

Although dependent on many parameters, the average sensitivity and specificity of the system to correctly characterize mouse cells against human tumour cells are 82.4% and 84.3% respectively. This is comparable to the human observers, quite encouraging, and fits well with the independent assessments of spot counting and segmentation mentioned previously (sections 4.2 and 4.3). However, the high variability of the specificity and sensitivity between samples suggests special attention must be paid to refining and optimizing image analysis parameters for unique tissue samples, probes and staining quality.

Analyzing the human and system classified data sets with the Neighbourhood Homogeneity and Clone Connectivity Scores provided qualitative and semi-quantitative validation. The NH score maps for each sample set exemplified this general agreement between the human and system classification, albeit with some systematic differences. These differences are hypothesized to be the result of improperly optimized spot counting parameters for the respective tissues.

Although the NH maps provide a useful way to investigate heterogeneity of the tissue, the Clone Connectivity Score seems better suited to delineating independent clones. As with the NH maps, comparing the CC maps constructed from the automatically and human classified samples shows the same qualitative features in each analysis, with small scoring and delineation inaccuracies between the two sets. This can again be a result of improper spot counting optimization for each particular sample.
Thresholding the CC score maps at differing numbers of neighbours segments the CC map into regions of 'clones'. It is evident that the selection of a proper threshold level is paramount to matching the results of the system to those of human observers. Furthermore, because of the slight disagreement between human and automatic classification maps, better agreement is obtained with the use of individual threshold levels for each. For most samples examined, clones defined at a minimum of 5-20 neighbours resulted in optimal agreement with human observers as determined semi-quantitatively.

Lastly, comparisons of NH and CC maps for each sample exhibit common regions of high density and interconnectivity, although not always of identical shapes. The high NH regions indicate low heterogeneity of the cells present there. Thus, regions common to both the NH and CC maps indicate clusters of interconnected cells with low heterogeneity, which should prove to be a robust method of identifying genetically related cell subpopulations. The general agreement between the human and system determined data not only validates spot counting classification but also validates the proof-of-principle of this approach for clonal analysis through the use of Neighbourhood Homogeneity and Clone Connectivity mapping.

4.5  Proof-of-Concept Experiment

4.5.1 Experiment introduction

In the previous sections, validation and accuracy experiments indicated that automated spot counting and segmentation processes were comparable to human results, provided proper quality staining conditions and moderate nuclear density tissues. Furthermore, xenograft human tumour tissues infiltrated with normal mouse cells were used as a model to validate clonal population detection through the use of Neighbourhood Homogeneity (NH) and Clone Connectivity (CC) Scores.
The following experiment was designed as a proof-of-concept example incorporating the complete methodology likely to be followed in the early uses of the FISH workstation to investigate the presence and distribution of particular cells with specific CGH-derived gene signatures. The aim here is to confirm, in-situ, the gene copy numbers of a few significant loci as determined by previous CGH profiling of the tissue chosen for investigation. This will require synthesizing FISH probes for gene loci identified as highly significant, staining the tissue with these probes and performing the automated FISH spot enumeration process. Additionally, we wish to identify cell subpopulations within the tissue sharing similar FISH spot signal patterns by employing the Neighbourhood Homogeneity (NH) and Clone Connectivity (CC) Scores. Although this experiment will focus only on significant amplifications of the genome for the sample chosen, future plans will see an identical experiment performed on high-significance deletion loci, and lastly on a genetic signature featuring a combination of both amplifications and deletions as well as a negative control.

4.5.2 Materials and methods

4.5.2.1  Sample selection

One human patient, resected, squamous cell carcinoma sample (bar code: 82050465) was selected from a batch of 64 available paraffin embedded, formalin fixed tissue blocks for which CGH data was available (data not shown). Selection from this batch of 64 was based primarily on the quality of the associated CGH data; this step eliminated 42 samples. The remaining 22 samples were evaluated based on the quality of their CGH 'signal', i.e. regions of obvious amplification and deletion of the genome. Four samples in total fit all the descriptions, and the final choice of sample was made based on the amount of tissue available and the absence of high nuclei-density regions (as evaluated on H&E stained sections). Sample 82050465 was thus eventually chosen for this experiment.
For this experiment, slides of sample 82050465 were cut to 7 µm.

4.5.2.2  Determining gene loci of interest

Figure 4-13 shows the CGH results, as performed in [63], of sample 82050465. Five genomic regions of significant amplification were selected for investigation, as determined through a statistical algorithm, aCGH-Smooth [100], and confirmed by manual inspection. However, each significant region was hundreds of BACs in length, thus the final choice of the two to three individual BACs used for FISH probe synthesis was left largely arbitrary. Effort was made to pick clones that covered regions without previously documented inherited copy number variations (CNV) [101] and that covered known oncogenes (table 4-9).

TABLE 4-9 – FISH probe synthesis overview, indicating the fluorophore used, the chromosome region and BAC clone names, the length of region covered and any known genes that were covered therein. * indicates known oncogene.

Fluorophore Used    Chromosome Region    BAC Clone Names                 Total Length (base pairs)    Genes Covered                         Successfully Synthesized
DEAC                4q12-q13.1           N0284L03, N0640J20              403123                       Rest, MGC3232, POLR2B, SPINK2         No
Cy3                 11p15.2              N0793O12, N0511J01              299038                       SPON1, RRAS2*, COPB, PSMA1            Yes
Cy5                 19q13.2              N0036B02, N0808D17, N0587F12    439532                       PLD3, MAP3K10, FLJ13265, AKT2*        Inconsistent
Spectrum Red        6q13                 N0077O17, N0374I18              302479                       KCNQ5                                 Yes
Spectrum Green      2p25                 N0451A14                        176189                       MYCN*                                 Yes

4.5.2.3  FISH probe synthesis and tissue staining

The BACs selected from the CGH profile were pulled from the frozen E. coli BAC library at the Genome Sciences Centre (Vancouver, BC), and BAC DNA was isolated and confirmed by following Protocol 1 (Appendix A) using the BACs from table 4-9. Next, the isolated BAC DNA was nick translated with the appropriate fluorophore-dUTP as indicated in table 4-9, using Protocol 2 (Appendix A).
Briefly, 1 µg of total BAC DNA was used as the starting material for each probe, and upon completion of the overnight reaction each probe was checked on a 0.7% agarose gel to ensure fragment sizes were on average between 200 and 400 bp (figure 4-14).

The results of FISH probe synthesis are also summarized in table 4-9. Of the 5 fluorophores used, four were successful in labeling the desired DNA. Problems were observed with the DEAC fluorophore, which failed to significantly incorporate into the DNA (determined through a spectrometer) and was not included in the experiment. The Cy5 fluorophore seemed to incorporate well into the DNA, but its signal was inconsistent and weak. Since ozone is known to degrade Cy5 fluorescence, all reactions were performed in an ozone-free environment, with no improvement in signal.

FIGURE 4-13 – CGH data of sample 82050465 for the entire genome. Data is presented in three parts for each chromosome: 1) the chromosome, to indicate genomic location; 2) blue spots representing each individual BAC of the array, showing its genomic position and its relative amount, with the spots located to the right of the midline indicating higher content of the BAC in the tumour sample; 3) the results of aCGH-Smooth, with red sections indicating genomic regions of significantly higher content in the tumour sample than in a normal reference genome. The orange boxes indicate regions which contain the loci selected for FISH probe synthesis.

FIGURE 4-14 – 0.7% agarose gel with DNA ladder (gel labels as listed in the figure: SpGreen, SpRed, Cy5, Cy3, DEAC). The smallest DNA ladder segment is 100 bp, with the 8th being 1000 bp. As seen in the image, most of the DNA streaks reside in the 100-500 bp lengths, which is optimal for FISH staining. For probes 4 and 5, some unincorporated dye is present and manifests as blotches at high bp lengths.

Nonetheless, a multicolour probe mix was made by mixing different quantities of the 4 probes, determined by the intensity and the hybridization efficiency of each probe.
This procedure requires staining a few control slides with the probes and determining their respective intensities. In this case 120 ng (6 µl) of Cy3, 50 ng (3 µl) of Spectrum Red, 125 ng (7 µl) of Spectrum Green and 180 ng (10 µl) of Cy5 were mixed with 25 µg of Cot-1 and 37.5 µg of SHS, ethanol precipitated and resuspended into 10 µl of hybridization buffer (details in 4.8.2). Finally, 7 µm thick sections of sample 82050465 tissue were hybridized with the probe mix following Protocol 3 (Appendix A), with digestion for approximately 13 minutes, 10 µl of probe mix per slide and hybridization for 30 hours.

4.5.3 Experiment results

4.5.3.1  Nuclei segmentation and spot counting

A set of images was generated by scanning the section stained with the described probe mix. The scan area analyzed was approximately 3.5 x 4.5 mm in dimensions and consisted of 18 x 18 individual FOVs stitched together. The counterstain image of the scan area, figure 4-15a, exemplifies a few issues with the image acquisition step, namely inaccuracies in auto-exposure control when a FOV is empty or contains brightly fluorescing dust or other particulates. Due to these inaccuracies, the segmentation and spot counting algorithms also suffer in these regions, as seen in figure 4-15b, where the segmentation process finds false objects with no staining in any channel, indicated by the blue markers.

These segmentation and spot counting results are summarized in table 4-10. Segmentation identified 52813 individual cells within the scan area, and almost 47% of the cells detected did not have any FISH spot signal present. As seen in figure 4-15b, most of these unlabelled cells (blue markers) seem to congregate in the upper right corner, which was a region of high nuclei density. As previously discussed, this is a domain in which the segmentation algorithm and even FISH staining behave erratically. Upon rejecting these non-valid cells, the spot counts presented in table 4-10 were measured.
The spot counts for each of the 4 probes were then compared to expected spot counts derived from the CGH ratios of the CGH profile for this sample. The CGH ratios were first converted to expected spot counts following equation 4-2, where SCnormal is the observed spot count for normal tissues with truncated nuclei, usually between 1.4 and 1.7 [52,102,103], and CGHratio is the log2 ratio of the normal to tumour measures on the array for these loci. CGHratio here is also the average CGH ratio of all the BACs used to synthesize each FISH probe.

EQUATION 4-2 – Used to calculate expected spot counts from known CGH ratios of loci used for FISH probe synthesis. SCnormal is the observed spot count for normal tissues with truncated nuclei, usually between 1.4 and 1.7 [52,102,103], and CGHratio is the log2 ratio of the normal to tumour measures on the array for these loci. CGHratio here is the average CGH ratio of all the BACs used to synthesize the FISH probe.

$$\mathrm{SpotCount}_{\mathrm{expected}} = SC_{\mathrm{normal}} \cdot 2^{\mathrm{CGHratio}}$$

As seen in table 4-10, the average measured spot counts for each probe, except the Cy5 labeled probe, show correlation to the expected CGH ratio data. The poor correlation of the Cy5 channel was likely due to its weak signal strength and the high unspecific staining observed, thus it is likely that this genetic locus was not quantified accurately. Consequently, the Cy5 probe was omitted from further analysis using the NH or CC metrics. The correlation between observed average spot counts and expected CGH ratios, although preliminary, suggests validation of CGH data on an in-situ, tissue-wide level. The streamlining of this 'CGH loci to FISH probe' protocol could potentially be used to cross-validate CGH data in a high-throughput fashion.

FIGURE 4-15 – (A) Counterstain image composite of 324 individual FOVs measuring about 3.4 x 4.5 mm of a 7 µm thick squamous cell carcinoma sample.
Evident are areas of improper exposure caused by dust or lack of fluorescent objects during acquisition. (B) Image of only nuclei centroid coordinates, with blue markers indicating nuclei without any staining and red markers indicating nuclei with some staining. The non-uniformity of FISH staining is evident, with upper-right and lower central regions failing to stain. Also, areas of improper auto-exposure during image acquisition show false objects, most without staining.

TABLE 4-10 – Summary of expected and measured spot counts of each channel. Expected spot counts were calculated using eq. 4-2. Measured counts fall within or close to the expected range for all channels except Cy5, which showed low intensity.

Probe   Average CGH ratio     Expected Spot Count    Expected Spot Count    Measured
        of all BACs used      (SCnormal = 1.4)       (SCnormal = 1.7)       Spot Count
SpG     0.223                 1.63                   1.98                   1.87
Cy3     0.369                 1.81                   2.20                   2.13
SpR     0.521                 2.01                   2.44                   2.53
Cy5     0.782                 2.41                   2.92                   1.64

4.5.3.2 Neighbourhood Homogeneity and Clone Connectivity analyses
Next, Neighbourhood Homogeneity and Clone Connectivity analyses were carried out on the segmentation and spot counting results. The NH maps were calculated using a neighbourhood three cell layers in radius, and both NH and CC analyses interpreted positive cells as ones with any degree of locus amplification in at least 2 of the 3 probes imaged. The results of the metrics are shown in figure 4-16 for a few different scenarios. The Neighbourhood Homogeneity results reveal that most regions of the malignant tissue are positive for the amplified probes, as expected. Furthermore, they show several large regions or clumps of cells with high neighbourhood homogeneity scores (light blue through yellow areas), often upwards of 50% positive neighbours. Between these high positive-density areas lie regions of cells with few or no positive neighbours, which could potentially support the theory of clonal expansion within tumour tissues.
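The two scores used in this analysis can be sketched as follows. This is a minimal illustration and not the workstation's implementation: it assumes the cell neighbour graph (for example, one derived from a Delaunay triangulation of nuclei centroids) is already available as an adjacency dictionary, that a "layer" is one step in that graph, that a cell is positive when it has 3 or more spots in at least 2 of the 3 probes, and that the connectivity score of a clone is the number of positive cells in its connected group. All function names and the toy data are hypothetical.

```python
from collections import deque

def is_positive(spots, min_spots=3, min_channels=2):
    """Assumed criterion: >= min_spots in at least min_channels probes."""
    return sum(1 for n in spots.values() if n >= min_spots) >= min_channels

def neighbourhood(adj, start, layers=3):
    """Cells reachable from `start` within `layers` graph steps (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        cell, depth = frontier.popleft()
        if depth == layers:
            continue  # do not expand beyond the outermost layer
        for nb in adj[cell]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    seen.discard(start)
    return seen

def nh_score(adj, positive, cell, layers=3):
    """Fraction of positive cells among the neighbours of `cell`."""
    nbrs = neighbourhood(adj, cell, layers)
    return sum(positive[n] for n in nbrs) / len(nbrs) if nbrs else 0.0

def clone_sizes(adj, positive):
    """Map each positive cell to the size of its directly connected positive group."""
    sizes, seen = {}, set()
    for cell in adj:
        if not positive[cell] or cell in seen:
            continue
        group, stack = [], [cell]
        seen.add(cell)
        while stack:
            cur = stack.pop()
            group.append(cur)
            for nb in adj[cur]:
                if positive[nb] and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        for member in group:
            sizes[member] = len(group)
    return sizes

# Toy example: six cells in a chain 0-1-2-3-4-5 (invented data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
spots = {
    0: {"SpG": 3, "Cy3": 3, "SpR": 2},
    1: {"SpG": 4, "Cy3": 3, "SpR": 3},
    2: {"SpG": 3, "Cy3": 2, "SpR": 3},
    3: {"SpG": 2, "Cy3": 2, "SpR": 3},
    4: {"SpG": 3, "Cy3": 3, "SpR": 3},
    5: {"SpG": 2, "Cy3": 2, "SpR": 2},
}
positive = {cell: is_positive(s) for cell, s in spots.items()}
nh_cell0 = nh_score(adj, positive, 0)
nh_cell3 = nh_score(adj, positive, 3)
sizes = clone_sizes(adj, positive)
```

Keeping the positivity rule in its own function means the criterion can be changed (for instance, to a higher spot threshold) without touching either score, which mirrors the way different scenarios are compared with the same two metrics.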
The Clone Connectivity score, figure 4-16b, suggests a similar trend. Here, the score of each clump of cells is defined by the number of direct connections between positive cells. The high NH score regions just discussed also show very high connectivity of positive cells, with up to 120 directly connected positive cells. These regions are again separated from each other by regions of little or no connectivity. Lastly, figures 4-16c and 4-16d show NH and CC maps for cells with higher-level amplifications of the probes imaged, i.e. 4 or more FISH signal spots for 2 of the 3 probes. Once again, a few distinct clusters of these high-level amplification cells can be readily identified and could be viewed as an important clone with advanced genetic rearrangements.

FIGURE 4-16 – (A, C) Neighbourhood Homogeneity score results for a 3 layer neighbourhood and (B, D) Clone Connectivity score results. For maps A and B, a cell was defined as positive if it contained 3 or more FISH spots in 2 of the 3 probes imaged: SpG, SpR and Cy3. Panels C and D show NH and CC maps for high-level amplification clones; in this case, positive cells were ones with 5 or more spots in 2 of 3 channels.

4.5.4 Conclusions
From these preliminary results, a few interesting conclusions can be drawn. Our approach may yield an effective and high-throughput platform to reproduce and cross-validate CGH ratio data for a few genetic loci at a time in an in-situ fashion. Furthermore, through the Neighbourhood Homogeneity and Clone Connectivity metrics, we can visually illustrate the genetic heterogeneity of tissue and, within it, potentially find regions of cells with similar FISH characteristics by adjusting user-defined criteria. This experiment also provided a complete demonstration of all the steps of the type of routine analysis the FISH workstation will be required to perform.
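As a worked check of the cross-validation step summarized here, Equation 4-2 can be evaluated for the CGH ratios of Table 4-10 and compared with the measured averages. The sketch below uses only values stated in the table; the function names and the strict "inside the expected range" test are choices made for this illustration rather than part of the original analysis.

```python
def expected_spot_count(sc_normal, cgh_log2_ratio):
    """Equation 4-2: SpotCount_expected = SC_normal * 2 ** CGHratio."""
    return sc_normal * 2 ** cgh_log2_ratio

def expected_range(cgh_log2_ratio, low=1.4, high=1.7):
    """Expected counts for the two SC_normal bounds quoted in the text."""
    return (expected_spot_count(low, cgh_log2_ratio),
            expected_spot_count(high, cgh_log2_ratio))

# probe: (average CGH ratio, measured spot count), copied from Table 4-10
TABLE_4_10 = {
    "SpG": (0.223, 1.87),
    "Cy3": (0.369, 2.13),
    "SpR": (0.521, 2.53),
    "Cy5": (0.782, 1.64),
}

results = {}
for probe, (ratio, measured) in TABLE_4_10.items():
    lo, hi = expected_range(ratio)
    # Store rounded bounds plus a strict in-range flag for the measured value.
    results[probe] = (round(lo, 2), round(hi, 2), lo <= measured <= hi)
```

Under this strict test SpG and Cy3 fall inside their expected ranges, SpR lies slightly above its upper bound (2.53 against 2.44), and Cy5 lies well below its lower bound (1.64 against 2.41), which is in line with the decision to drop the Cy5 probe from further analysis.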
Moreover, this experiment established FISH staining and FISH probe synthesis protocols that will also be needed for such routine investigations, both to validate CGH results and to act as a research tool for investigating specific genetically related cancer cell clones with interesting and biologically meaningful phenotypes.

4.6 ReFISH Experiment
A major advantage of the FISH technique is the ability to reuse the same tissue section for multiple rounds of staining. This was first introduced in section 3.1.4 and provides a scheme to overcome the limitation of only 7-8 discrete excitation/emission probes available for sequential imaging (see section 3.2). This limits one to a maximum of 7-8 genetic loci per scan, often far fewer than the number of significant loci comprising the genetic signatures uncovered from CGH data. Although there are a few different approaches to overcoming this limitation, as mentioned in section 2.2.5, ReFISH provides a robust and straightforward way to expand the number of genetic loci investigated. To demonstrate this ability, we performed 3 rounds of FISH on the same tissue sections with the same probes to investigate the effects of subsequent ReFISH cycles on the hybridization of probes to the same site. The tissue used in this experiment was a 7 µm thick section of a squamous cell lung carcinoma xenograft tumour, AB117-16-8156 (LU16-SDCC), cut from an FFPE block. The probes used for this experiment were FITC labeled centromere enumeration probes for chromosomes 7, 8 and X (ID Labs, London, Ontario) mixed in a solution of 1:9 concentrated-dye:hybridization-buffer as specified in the manufacturer's protocols. The initial FISH staining was performed as described in the materials/methods section, Protocol 3 (Appendix A), using 6 µl of probe:hyb-buffer mixture for each slide. Subsequent rounds of FISH relabelling were performed by following Protocol 4 (Appendix A).
For the purposes of this experiment only, the ReFISH protocol was paused following the probe removal wash, and slides were rinsed in water and imaged to confirm loss of all hybridization signals. Immediately after imaging, Protocol 4 (Appendix A) was resumed as normal, with hybridization for ~20 hours in each round of ReFISH. After every round of FISH staining the identical area of each tissue sample was rescanned, usually consisting of 600-700 individual FOVs. Following the removal of hybridization signal, a smaller area of 25 FOVs was imaged to ensure no signal was present. Results were identical between the three slides, and thus only the images and data for the slide stained with CEP 7 are presented.

FIGURE 4-17 – Example of 3 sequential rounds of ReFISH (A, B, C). Shown is one of ~600 FOVs scanned for each round of ReFISH, showing the identical locus (CEP 7) stained and the reproducibility of each round. The red circle highlights a particular cell across the three rounds. Panels D and E are images of the tissue after the first and second stain-removal wash (unfortunately not matched to the areas shown in A, B and C). As shown, neither the FISH signal nor the DAPI is present after the signal is stripped away.

As seen in figure 4-17, each round of FISH appears to label identical loci in each cell. This is evident in the circled cell, where 5 spots in a ‘C’ pattern are reproduced in each round of staining. To show that this is a new hybridization signal and not a previous signal that failed to be removed, figures 4-17d and 4-17e show the tissue after the removal step with no perceptible signal present.

Examining the three ReFISH rounds in greater detail reveals some interesting phenomena. Firstly, the quality of signal is generally constant, with a small reduction in the ‘crispness’ of the hybridization signals. Minor differences in DAPI brightness and background fluorescence are also present in each round of ReFISH.
These small changes most likely occur as a result of the natural variability in the FISH protocols [55] and imaging conditions. Secondly, the morphology of the cell nuclei does not seem greatly affected by the three ReFISH procedures. This was the case in all three slides; however, the CEP 8 stained tissue section was damaged during removal and replacement of the coverslip in an attempt to rid the slide of air bubbles for the last ReFISH staining. This suggests that the majority of damage to the tissue will come from physical manipulation of the slide rather than from the chemical treatments it is subjected to. Thirdly, an interesting and unexpected observation was made in the small 25 FOV area that was imaged to confirm the removal of signal following each round of ReFISH. As partly seen in figure 4-18, the area scanned after each signal-stripping wash failed to hybridize any new probes in subsequent rounds of ReFISH. This suggests that imaging the tissue immediately after the denaturing step of the ReFISH protocol damages the DNA in a way that prevents further hybridization. The light and heat of imaging most likely supply energy that accelerates chemical reactions in the unprotected, denatured DNA. This is evidenced by the areas highlighted in figure 4-18, where one can see a rather sharp boundary of failed ReFISH signal in the exact area that was imaged during the signal-removal step. These three observations show that the ReFISH protocol is repeatable and sustainable for at least 3 rounds, provided the imaging step to confirm signal removal is omitted and tissues are handled with greater physical care. Consequently, the number of gene loci that can be quantified using only the 7 excitation/emission filters of our system can be increased from a maximum of 6 (one filter channel is reserved for the counterstain) to at least 18 in three rounds of FISH, with little additional work besides the staining itself and simple image registration.
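The channel arithmetic behind this estimate is simple enough to state as a small function. The sketch below assumes one probe per remaining filter channel per round; the function name and its parameters are hypothetical and serve only to make the bookkeeping explicit.

```python
def max_loci(filter_channels, rounds, counterstain_channels=1):
    """Loci quantifiable when each round uses every non-counterstain channel once."""
    if filter_channels <= counterstain_channels:
        raise ValueError("no filter channels left for FISH probes")
    return (filter_channels - counterstain_channels) * rounds

single_round = max_loci(7, 1)   # 6 loci with one channel reserved for the counterstain
three_rounds = max_loci(7, 3)   # 18 loci over three rounds of ReFISH
```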
Furthermore, other studies have been successful in using the ReFISH approach to hybridize four rounds, expanding the detectable number of gene loci to 24. This increase will allow the investigation of cell subpopulations with more specific and complex gene copy-number profiles and thus expand the usefulness of the system.

FIGURE 4-18 – A 2nd round ReFISH merged green and blue image matched with the boundary of the signal-removal scan (red boundary). Inside the boundary of the previous signal-removal scan, no FISH spots are evident. Just a small distance from the edge of the boundary, FISH spots begin to hybridize successfully, although staining quality this close was still very poor. Further away from the area imaged after each signal-removal wash, hybridization and image quality in general increase dramatically.

4.7 Use in Other Tissues: Oral Carcinoma in-situ (CIS)
4.7.1 Experiment introduction
A final aspect of this study was to examine the applicability of automated FISH classification and clonal analysis to tissues other than lung. Thus, an experiment was performed on FFPE tissue sections of oral carcinoma in-situ (CIS) excised from a patient. This experiment used the same commercial LAVysion (Vysis) probe set as described in earlier sections, consisting of four directly labeled DNA FISH probes for EGFR (Spectrum-Red), C-MYC (Spectrum-Gold), 5p15.2 (Spectrum-Green) and CEP 6 (Spectrum-Aqua). Product guidelines for the probe set indicate that an amplification or deletion in any of the 4 probes indicates an important genetic abnormality common to solid tumours. An excised oral CIS tissue section, cut at 5 µm thickness from FFPE blocks, was stained according to Protocol 3 (Appendix A). The section was imaged in two separate areas to reduce scanning and analysis time; the two areas consisted of 234 and 180 individual FOVs respectively.
Following spot enumeration, Neighbourhood Homogeneity and Clone Connectivity maps were produced from these data sets to examine the presence of cell subpopulations sharing amplifications of the probes.

4.7.2 Experiment results
Firstly, although care was taken to determine the appropriate digestion times for the oral CIS tissues, examination of the slides under fluorescence revealed strong autofluorescence and unspecific background staining. Specifically, the Spectrum-Gold and Spectrum-Green channels were too noisy to enumerate any FISH spots, either by eye or with the spot counting procedure. Thus, the rest of the experiment examined the status of only the Spectrum-Red (EGFR) and Spectrum-Aqua (CEP 6) probes. Furthermore, due to the low quality of staining, the choice of optimal segmentation and spot counting parameters was difficult. Despite this, the FISH signal enumeration process was carried out as previously described, and raw results are tabulated in table 4-11.

TABLE 4-11 – Raw classification data for the oral CIS tissue scanned. Due to very poor staining quality, nearly half of the cells were rejected. The table shows the breakdown of rejected, valid, normal and abnormal cell classifications. Abnormal cells are either single deletions or amplifications.

Classification                                       Area 1   Area 2   Total   Percentage   Notes
Total Cells Found                                    30732    24964    55696
Rejected (0 Cyan + 0 Red)                            14041    12297    26338   47.3%        as % of all
Valid                                                16691    12667    29358   52.7%        as % of all
Normal (2 Cyan + 2 Red)                              457      319      776     2.6%         as % of valid
Abnormal (single channel staining required)          16234    12348    28582   97.4%        as % of valid
Abnormal Subclass (both channel staining required)   3124     2341     5465    18.6%        as % of valid

4.7.2.1 Spot enumeration
Here, the effect of poor digestion/staining conditions is immediately evident, with nearly 50% of cells rejected because no staining was detected. This is also illustrated in figure 4-19b, where the non-stained cells are seen to lie fairly randomly throughout the tissue.
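The classification summarized in Table 4-11 can be sketched as below. This is an illustration only: it assumes each cell is reduced to a pair of spot counts (cyan, red), treats any valid cell that is not 2 + 2 as abnormal, and uses invented toy data. The function names are hypothetical. The final lines recompute the table's percentages from its stated totals.

```python
from collections import Counter

def classify(cyan, red):
    """Assign a cell to one of the Table 4-11 classes from its two spot counts."""
    if cyan == 0 and red == 0:
        return "rejected"      # no staining detected in either channel
    if cyan == 2 and red == 2:
        return "normal"
    return "abnormal"          # any other valid combination

def tally(cells):
    """Count classes and express them as percentages, as in Table 4-11."""
    counts = Counter(classify(c, r) for c, r in cells)
    total = sum(counts.values())
    valid = total - counts["rejected"]
    return {
        "rejected_pct_of_all": 100 * counts["rejected"] / total if total else 0.0,
        "normal_pct_of_valid": 100 * counts["normal"] / valid if valid else 0.0,
        "abnormal_pct_of_valid": 100 * counts["abnormal"] / valid if valid else 0.0,
    }

# Toy cells as (cyan, red) spot counts, invented for illustration.
toy = [(0, 0), (2, 2), (3, 1), (1, 0), (0, 0), (2, 1)]
toy_summary = tally(toy)

# Percentages recomputed from the totals stated in Table 4-11.
rejected_pct = round(100 * 26338 / 55696, 1)
normal_pct = round(100 * 776 / 29358, 1)
abnormal_pct = round(100 * 28582 / 29358, 1)
```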
Both of these effects suggest that the segmentation and spot counting operations were fairly clumsy and coarse due to the poor quality staining conditions. Upon rejecting non-stained cells, the vast majority (97.4%) of the nearly 30 000 valid cells proved to be abnormal by LAVysion guidelines, with only 776 cells, or 2.6% of the valid cells, classified as normal (i.e. 2 FISH spot signals in both channels). Furthermore, the average FISH signal spot counts for the EGFR gene probe (red) and the CEP 6 probe (aqua) were about 0.80 and 1.75 respectively (see table 4-12). This suggests that, on average, the CEP 6 probe is somewhat amplified while the EGFR gene locus is actually deleted. Upon restricting the analysis to cells in which both probes are present, these spot counts increase to 1.79 and 2.47, again suggesting that the CEP 6 probe is amplified while the EGFR probe is not. As seen in figure 4-20, this is indeed the case by visual inspection, as the green spots (representing the cyan/CEP 6 colour channel) often outnumber the red EGFR spots. This simple analysis shows the application of the FISH workstation to ordinary automated FISH enumeration and scoring, as often performed by a trained professional.

FIGURE 4-19 – The two areas imaged of a human excised oral CIS sample. Panels A and C depict a composite of 238 and 180 individual inverted DAPI FOVs, respectively. Panels B and D display the coordinates of objects found through segmentation, with red markers denoting objects which have some FISH staining and blue markers indicating objects without any FISH spots found. Note the lack of FISH signal in areas of high cell density, where the segmentation algorithm suffers greatly and FISH staining is often weak.

TABLE 4-12 – Spot counting averages over different cell populations; visual inspection seemed to confirm that the cyan CEP 6 probe was amplified while the red EGFR probe was unaffected or even deleted.

Population                                    Channel 1        Channel 2       Ratio
                                              (Cyan - CEP6)    (Red - EGFR)    (Red/Cyan)
All Cells                                     0.95             0.43            0.46
All Valid Cells                               1.75             0.80            0.46
Valid Cells with staining in both channels    2.47             1.79            0.72

FIGURE 4-20 – Examples of FISH spot signals in the oral CIS tissue, shown in blue (DAPI), red (SpRed) and green (SpAqua). These two images were chosen to display the abundance of cyan (shown as green) spots compared to the red probe.

4.7.2.2 Neighbourhood Homogeneity and Clone Connectivity analyses
Next, NH and CC maps were created and are presented in figure 4-21. As expected from the large majority of abnormal cells, the identified clones cover nearly the entire tissue area. The NH maps show the tissue to be quite homogeneous, and the clone connectivity map essentially connects all the positive cells to each other without forming individual clones. This is not strictly the case in area 1, but the splitting of the clones there seems to be a result of the lack of cells due to an indentation of the tissue, which breaks the clonal connectivity into a few smaller clumps. All this suggests that this oral CIS sample is fairly uniform and contains mostly abnormal cells. Since these results could have been generally predicted by simply looking at the percentage of abnormal cells within the scan, a more interesting investigation was carried out using the NH and CC scoring operations to search for clones of normal cells; the results are displayed in figure 4-22. In both the NH and CC maps, there exist a few extremely small regions of homogeneous and connected normal cells, although at most they are connections of only 2 or 3 cells. This further supports the conclusion that this oral CIS sample is composed almost entirely of uniformly distributed abnormal cells.

FIGURE 4-21 – NH (A, C) and CC (B, D) score maps scoring cells with abnormal FISH signatures as defined in the LAVysion guidelines. Row 1 and Row 2 correspond to area 1 and area 2 from figure 4-19. Both areas imaged are uniform regions consisting mostly of abnormal cells. The low NH scores in the centre of A indicate a lack of viable cells there (as also denoted in figure 4-19b), which in turn breaks up the larger clone into three separate regions, as seen in B. A similar phenomenon is displayed in C and D, where cells in the upper half of the tissue were found to have no signal (see figure 4-19d), and thus the clone in D terminates there. NOTE: Images were masked to zero outside of the tissue region to aid in visualization.

FIGURE 4-22 – NH (A and C) and CC (B and D) maps of the two areas imaged, examining for normal cells. Very few normal cells are present, as indicated by the near zero NH score for most of the tissue. The CC maps do identify a few clones of cells but, as seen from the scale, these clones consist of 2-3 cells.

Lastly, a search for clones of cells with amplifications and/or single deletions in both channels simultaneously was undertaken. Essentially, these cells are a subclass of abnormal cells which must have some signal other than the normal dual spots in both channels, and they do not reflect any significant phenotype. Examining the NH and CC maps in figure 4-23, one can see that a few significant clones were identified, usually consisting of 10-20 interconnected cells. The existence of these cells was confirmed visually. The fact that only a few small clusters of these cells were identified suggests that most abnormal cells were the result of a double deletion in one channel (usually the red EGFR probe) as opposed to amplification/single deletion events in both probes simultaneously.

FIGURE 4-23 – NH (A, C) and CC (B, D) score maps scoring cells with amplifications or single deletions of both signals simultaneously. Row 1 and Row 2 correspond to area 1 and area 2 from figure 4-19.
In both rows, there exist regions of higher homogeneity (50% positive) which also correspond to areas of 10-20 neighbours connectivity, as indicated by the red circles. These two areas were visually confirmed. NOTE: Images were masked to zero outside of the tissue region to aid in visualization.

4.7.3 Conclusions
In short, the applicability of the automated FISH workstation to tissues other than lung is confirmed by the successful analysis of excised oral CIS patient tissue. It is clear from the above results that the FISH staining of tissues from different origins must be carefully optimized to yield quality staining and, by extension, quality results. The application of Neighbourhood Homogeneity and Clone Connectivity scores to this tissue was successful and served to show that the majority of the tissue imaged is uniformly and homogeneously composed of abnormal cells. This further supported the simple analysis of FISH spot classification, where over 97% of valid cells were found to be genetically abnormal. Changing the criteria examined through the NH and CC scores showed no significant regions or clones of normal cells in the excised tissue but did identify a few significant clonal clusters of cells with abnormal staining in both channels, i.e. not just in a single channel. Although this is just a subclass of abnormal cells that does not reflect a particular phenotype, it illustrates that the criteria used to define cells of interest play an extremely important role in the quality and type of results observed. The identification of these subclass clones also illustrates the FISH workstation’s ability to uncover related regions as small as 20 neighbours.

4.8 FISH Synthesis and Staining Protocol Optimization
The overall quality of results generated by the segmentation and spot counting, and consequently by the Neighbourhood Homogeneity and Clone Connectivity analyses, hinges on consistent quality FISH staining.
Poor FISH staining can manifest itself as erroneous results in many different aspects of each analysis. Consequently, it was important to optimize the original tissue FISH staining protocol and probe synthesis protocols to yield the best results possible. The following is a quick synopsis of these two aspects of the FISH methodology used in this work. Most results described are anecdotal and based on empirical, qualitative interpretation of FISH staining quality, predominantly on metaphase spreads of human tumour cells. The observations are broken into two sections: probe synthesis and tissue FISH staining. Since the quality of FISH staining is so important to downstream results, future work should be aimed at a more quantitative assessment of FISH staining optimization.

4.8.1 FISH probe synthesis
Although some FISH probes are available from commercial vendors such as Vysis/Abbott Molecular and other smaller companies, more often than not these probes are very limited in terms of the genetic loci they probe for. In many instances in this thesis, commercial probes were preferred for their quality and consistency, thus preventing poorly made home-made probes from confounding further analysis. That said, the proposed methodology of investigating cell clusters exhibiting significant CGH ratios necessitates the conversion of selected BACs identified from CGH profiles into suitable FISH probes. See sections 2.2.1, 2.4, 3.1.2 and particularly 4.8.1 for a thorough synopsis of probe synthesis techniques. Originally, probes were synthesized from the frozen library of RP-PCR amplified BAC DNA that is used as the spotting solution for the production of CGH array slides (see section 2.5.2). This library is very convenient for choosing specific BACs covering the gene loci of interest and converting them into FISH probes.
The RP-PCR protocol (Protocol 8, Appendix A), also used by the Wan Lam Laboratory of the Cancer Genetics and Developmental Biology department of the BCCA to label full genomic DNA with either Cy5 or Cy3 for CGH studies, was used to label BACs from the array spotting solution instead of human genomic DNA. Unfortunately, numerous attempts failed to produce consistent or even reproducible FISH probes, and only marginal success was seen with the SpectrumGreen, SpectrumOrange, SpectrumRed or TexasRed fluorophores. The lack of consistency and reproducibility suggested that this technique was not reliable (see table 4-13). Next, nick translation was used to attempt to label the RP-PCR BAC DNA templates with the fluorophores listed in table 4-13. Unfortunately, this labeling reaction is limited to longer DNA chains, and the numerous short segments of the RP-PCR BAC DNA prevented any perceivable fluorophore incorporation from taking place. After these attempts, the use of the RP-PCR BAC DNA library was abandoned and, instead, BACs identified for FISH probe synthesis were directly isolated from the E. coli hosts as described in section 3.1.1. This involved growing the selected E. coli host bacteria overnight in individual media and then isolating the BAC DNA from each host. Nick translation of the isolated DNA resulted in successful incorporation of most fluorophores investigated, as seen in table 4-13. Although labeling was mostly reproducible and gave consistent quality probes, this methodology may not be appropriate for the high-throughput process likely needed to efficiently investigate numerous gene profiles, due to its time-consuming nature. This is further compounded since it is often necessary to choose a few (2-3) neighbouring BACs covering a genetic locus of interest to yield sufficient FISH signal strength. Consequently, future work will need to focus on troubleshooting the RP-PCR labeling procedure to take advantage of the RP-PCR BAC DNA library.
However, for simple low-throughput experiments, nick translation using directly isolated BAC DNA yielded good quality and reproducible FISH probes.

TABLE 4-13 – Summary of FISH probe synthesis experiments. As seen, nick translation on directly isolated BAC DNA resulted in the best and most reproducible FISH probes for most fluorophores. Cy5 and DEAC fluorophores failed to be incorporated into FISH probes. Each entry gives the score followed by the observations.

Fluorophore                        RP-PCR labeling,                Nick translation,    Nick translation,
                                   RP-PCR BAC DNA                  RP-PCR BAC DNA       directly isolated BAC DNA
dUTP-DEAC                          0: no labeling                  0: no labeling       0-1: no labeling; extremely weak
dUTP-SpGreen                       2: very noisy; signal           0: no labeling       4: generally very bright signal; often noisy
dUTP-SpOrange                      3: good quality; inconsistent   0: no labeling       5: reproducibly great signal; very bright
dUTP-Cy3                           2.5: good; constantly weak      0: no labeling       5: reproducibly great signal; very bright
dUTP-SpRed                         1: very weak if present at all  0: no labeling       4.5: reproducibly great signal; very bright
dUTP-TxRed                         1: very weak if present at all  0: no labeling       4: less reproducible; good signal
dUTP-Cy5                           0: no labeling                  0: no labeling       0: no labeling
dUTP-biotin (indirect labelling)   0: no labeling                  0: no labeling       2-3: variable labeling; difficult to evaluate because highly dependent on antibody detection

4.8.2 Tissue FISH staining protocol optimization
Whether using commercial or in-house synthesized FISH probes, it is important to have a consistent and effective FISH protocol for staining tissues or cell suspensions. The original tissue FISH protocol as acquired from the Clinical Cytogenetics lab at the BCCA was found to be very effective. However, deviations from the protocol were often needed to improve probe intensity or reduce excessive background.
The following is a short synopsis of the adjustments made to the standard FISH protocol and some other important observations regarding FISH staining. Six elements of the acquired tissue FISH staining protocol were investigated, specifically: hybridization temperature, hybridization duration, hybridization buffer concentration and composition, enzymatic digestion of tissue, post-hybridization washes and, finally, probe and tissue denaturing. Hybridization temperature and hybridization duration were hypothesized to have similar effects on staining quality. Raising the hybridization temperature from 36°C to 45°C and extending the hybridization duration from ~20 hours to ~30 hours were both hypothesized to improve the signal to noise ratio of the probes by effectively increasing the kinetics of hybridization. The extended hybridization consistently yielded brighter, crisper and generally better quality FISH spots. The increased hybridization temperature did not perceptibly alter FISH results. Next, two different hybridization buffers were tried [104]. Their compositions are described in detail in table 4-14. Results did not vary significantly between them, except that the concentrated hybridization buffer dissolved the pelleted FISH probes more readily than did the non-concentrated buffer.

TABLE 4-14 – Compositions of the normal strength and concentrated strength hybridization buffers used. The concentrated strength buffer makes resuspension of high-DNA content probes easier.
Component               Normal Hybridization     Concentrated Hybridization
                        Buffer (50 ml)           Buffer (50 ml)
Formamide               25 ml                    37.5 ml
20xSSC                  5 ml                     5 ml
10x Phosphate Buffer    5 ml                     5 ml
H2O                     15 ml                    2.5 ml
Dextran                 5 g                      7.5 g

10x Phosphate Buffer = 5 : 1 dibasic : monobasic NaPO4 (19.78 g into 100 ml : 3.07 g into 40 ml for 240 ml total)

A few deviations from the original second-day wash were also investigated with the aim of reducing unspecific signal and improving staining quality (see table 4-15). Of these, the intermediate wash was observed to provide reproducibly better quality results, with the least amount of background signal. The denaturing step of the original FISH protocol was also optimized. Commonly, probes and tissue are either denatured separately (each heated to 73-80°C for 5 min) or co-denatured (at ~80°C for 10 min). Although no differences in staining were noticed, co-denaturing the tissue and probes simultaneously was found to be more convenient, since it does not subject the stock probe mix to repeated freeze-thaw-denature cycles, which shorten the lifetime of fluorescent probes.

TABLE 4-15 – Three post-hybridization washes evaluated. The intermediate wash seemed to provide the best results in terms of FISH signal and the lowest amount of nonspecific staining.

Wash Type              Steps
Short/Original Wash    1  2xSSC/0.3% NP-40 for 2 min @ 73°C
Intermediate Wash      1  0.4xSSC/0.3% NP-40 for 2 min @ 73°C
                       2  2xSSC/0.3% NP-40 for 1 min @ RT
Extensive Wash         1  4xSSC/0.2% Tween-20 for 3x5 min @ 42°C
                       2  1xSSC 3x5 min @ 60°C
                       3  4xSSC/0.2% Tween-20 for 3 min @ 42°C

Finally, probably the most important aspect of the entire tissue FISH protocol is the digestion of the target tissues. The choice of digestion enzymes, duration, etc. must be determined for each individual tissue type, fixation method and even section thickness. Although this step is widely accepted as the most crucial to proper FISH staining, it was found that there is a general lack of guidelines.
Thus, some work was done to create a set of procedures to assist in tissue digestion. Firstly, two digestion enzymes were used for enzymatic tissue digestion: pepsin and protease K. Through largely empirical trials, it was found that the pepsin digestion produced more desirable results when digesting the xenograft tumour tissues and human resected squamous cell carcinoma tissues. The protease K digestion was seen to cause too much damage to tissues, often to the point of detachment from the glass slides. This may be offset by lowering its concentration from 25 ng/ml in 2xSSC. At the concentration of pepsin solution noted in Protocol 3 (Appendix A), digestion times for adequate digestion ranged from 5 to 20 minutes. It was observed that large tissue sections made mostly of tumour cells, particularly the xenograft tumours, required about 10 minutes of digestion, whereas even small normal biopsy tissues were often under-digested at this time point, probably due to the increased amount of connective tissue and the associated general increase in organization and compactness.

This emphasizes the need for a way to evaluate the digestion for each tissue type. This can be achieved simply by running a series of experiments at different digestion time points and evaluating which results in the best signal, lowest autofluorescence and lowest unspecific staining. However, this is a wasteful approach in terms of probes, tissues and time. Thus, the following techniques were tried. One way to do this is by staining a tissue section with propidium iodide directly following digestion. Propidium iodide (PI) is a fluorescent molecule that is often used to stain DNA. PI is also membrane impermeant and generally excluded from viable cells, and is thus a good measure of cell membrane intactness.
Consequently, one can gauge the extent of digestion by viewing PI-stained tissues under fluorescence and establishing whether the stain is reacting with the nuclei, as well as the extent of any background or nonspecific binding. Although this is a good approach for evaluating the extent of tissue digestion, it requires extra time for both staining and imaging and the sacrifice of a section of tissue, and PI is quite a hazardous chemical to handle. Thus, a third method of assessing tissue digestion was simply to view the tissues being digested under the phase contrast microscope. This is the method of digestion optimization used at the BCCA Cytogenetics Lab and, although quick and simple, it requires a level of experience to differentiate properly treated tissue from under- or over-digested samples. This technique was demonstrated by Shahira Clemens at the BCCA Cytogenetics Lab, who kindly provided the following guidelines for determining optimal tissue digestion. Specifically, properly digested tissues should have evident nuclei, which often look slightly raised above the surrounding material and should show a dark ring around their outer edges. The surrounding intra- and extracellular space should become lighter and connective tissues should be slightly degraded. Finally, the morphology of the tissue should not change too much; over-digestion is evidenced by large areas of tissue or large clusters of cells breaking up and floating away. Please refer to image 4-24 for an illustration of proper vs. under-digested cells. It was this technique that was used for the optimization of tissue digestion from then on.

Chapter 5 – Discussion and Conclusions

5.1 General Conclusions

The methods presented in this study are aimed at the identification of clonally related cells within tissue samples. Clonal relatedness is based on specific FISH spot signal trends determined a priori through analysis of array CGH results.
This study successfully proposed a framework for the complete experimental procedure, ranging from FISH probe synthesis and FISH staining optimization through to the development of specific software for quantifying clonal relationships and homogeneity measures likely to be involved in future clonal analyses. The motivation for this work stemmed from the identification of genetic loci related to characteristics of cancers that have a significant impact on cancer management. Such is the case for genetic markers identified in non-small cell lung cancer cases and theorized to confer resistance to cis-platinum/vinorelbine doublet chemotherapy treatments [30-33]. The need for clonal identification stems from the high degree of tumour heterogeneity ubiquitously present in progressing tumours [34]; thus, the ability to identify significant cell populations with important genetic characteristics within genetically heterogeneous tumours could play an important role in clinical management. Furthermore, tumour heterogeneity can mask the presence of dangerous clones within CGH data, and the proposed FISH system can uncover these masked clones in suspected cases. In addition, interpretation of CGH results often needs secondary validation techniques to confirm the actual presence of the CGH trends identified. The proposed FISH scanning system could perform these validations in a high-throughput manner so as to match the data generation rates of current aCGH systems.

The results of the experiments described in the previous sections are only preliminary but quite encouraging, and the major conclusions of each are summarized briefly below.

The ‘enhanced edge-detection’ segmentation previously developed for absorbance-based image analysis is applicable to fluorescence images generated by the Metafer-4 scanning system, assuming some adjustments are made beforehand. The segmentation algorithm was evaluated to be accurate at a level of about 90% under good staining conditions.
However, major errors and inaccuracies were observed when staining was poor or in high nuclei-density areas of tissues. These drawbacks may not be completely surmountable, as such conditions also prove difficult for human observers to interpret. The specific implementation of the spot counting algorithm was also found to be quite accurate when evaluated by a human observer. The use of the top-hat transform gave the process significant robustness, so that spot counting could be performed successfully even in relatively poor staining conditions.

As stated, the accuracy and consequent reliability of the aforementioned segmentation and spot enumeration process is highly dependent on FISH staining conditions. Thus, guidelines were established and optimized for FISH staining procedures and FISH probe synthesis for use on lung tissues. The guidelines resulted in fairly reproducible staining that can be used for quantification, although the inherent variability of the FISH technique is mostly unavoidable [55] and has only been somewhat minimized.

The use of a xenograft squamous cell lung carcinoma provided a useful model for evaluating FISH spot classification as well as the clone identification algorithms: the Neighbourhood Homogeneity Score and the Clone Connectivity Score. First, however, a simple comparison was made between the automated and human-interpreter classification of cells. The classification was based on the presence or absence of FISH spots, which indicated human tumour or mouse cells respectively, and resulted in a sensitivity of 84% and a specificity of 81%. Once the two clone identification scores were applied to the data, it was encouraging that both scores agreed quite well with the visually delineated mouse cell clones determined by the human interpreter.
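For reference, sensitivity and specificity figures of the kind quoted above follow from a two-by-two comparison of the automated classification against the human interpreter. The sketch below is a minimal illustration only: the function name and the cell counts are hypothetical and are not the counts from this study, and treating human tumour cells (FISH spots present) as the positive class is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: cells called positive by both the system and the human interpreter
    fn: cells the human called positive but the system called negative
    tn: cells called negative by both
    fp: cells the human called negative but the system called positive
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative reference cell")
    sensitivity = tp / (tp + fn)  # fraction of reference positives recovered
    specificity = tn / (tn + fp)  # fraction of reference negatives recovered
    return sensitivity, specificity


# Hypothetical counts, for illustration only (not data from this study).
sens, spec = sensitivity_specificity(tp=90, fn=10, tn=80, fp=20)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Both measures are computed against the human-interpreter labels as the reference, so they describe agreement with the observer rather than absolute correctness.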
Furthermore, comparison of the two clone identification score results between the human-classified and system-classified data sets showed the system to be comparable to a human observer, with some systematic errors resulting from improper spot counting or segmentation parameters. This agreement was greatest when comparing clones consisting of 15-20 neighbours.

Next, a complete proof-of-concept was performed to establish the process likely to be followed in future studies. Specifically, significant genetic loci were selected from a patient's CGH profile based on high-level amplifications. These regions were turned into usable FISH probes and the patient tissue was scanned. Results showed that the CGH ratio of the selected regions was successfully reproduced by FISH spot counting, and illustrated the ability to find regions of potential interest by analyzing the spatial distribution of certain FISH spot signatures. Lastly, by performing a similar experiment on oral carcinoma in-situ tissues, we showed the applicability of the system to different tissue types and its potential use as a triage technique able to identify regions with valuable genotypes for further inspection.

5.2 Essential Future Work

Although these successful pilot results suggest that this technique holds much potential, some more work is crucial to improving its reliability and precision. Firstly, although large efforts were made to optimize and establish reliable guidelines for high-quality FISH protocols, a few troublesome steps remain. FISH probe production was only achieved reproducibly with a limited number of fluorophores and only from directly isolated BAC DNA, the isolation of which is an inherently time-consuming step. This will not be sufficient to perform the high-throughput custom experiments planned. However, by using more fluorophores and using the DOP-PCR product library as the template DNA, the required throughput can be achieved.
Furthermore, more guidelines, experience and expertise are needed to determine optimal digestion of tissues subject to FISH staining. Likewise, pre-staining light irradiation was briefly explored as a way to reduce tissue autofluorescence, but the results were not significantly better. However, since autofluorescence is such a large issue for imaging, further use of this technique with more spectrally broad light sources should be explored.

Further work also needs to be focused on the software algorithms described in this study. Bypassing the proprietary Metasystem control is essential to improving and streamlining the entire scanning process towards higher throughput. Control has essentially been completely transferred to a custom-built system, which will be put into use shortly. A more fundamental and important aspect that requires further development is the segmentation algorithm for use in native fluorescence images. This is a major bottleneck in terms of the desired streamlining of the process and is also vital in terms of the analysis. Thus, adjustments to the segmentation algorithm should focus on its usability in fluorescent images, increasing its accuracy in dense tissues, automating local threshold selection, and streamlining it for speed and flexibility. This work is likewise already in progress. Although the spot counting algorithm was evaluated to be very reliable and robust, it is still fairly simple, and further work could focus on improving its reliability by performing feature checks, such as examining the intensity, area, etc. of each candidate spot and then determining whether it is truly a FISH spot. Lastly, it may be beneficial for both the segmentation and spot counting functions to be implemented on focal stacks as a whole, rather than on extended depth-of-focus images. This would preserve more 3D information and potentially allow for imaging of thicker tissue sections.
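The feature checks suggested earlier in this section for the spot counting algorithm could take roughly the following form. This is a minimal sketch under stated assumptions, not the implementation used in this study: the `Spot` record, the `is_plausible_spot` and `count_spots` functions and all threshold values are hypothetical placeholders that would need to be tuned on real images.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    area_px: int           # number of pixels in the candidate spot
    mean_intensity: float  # mean intensity in arbitrary camera units


def is_plausible_spot(spot, min_area=4, max_area=60, min_intensity=50.0):
    """Accept a candidate only if its area and intensity fall within limits.

    The default limits are illustrative placeholders, not measured values.
    """
    area_ok = min_area <= spot.area_px <= max_area
    bright_enough = spot.mean_intensity >= min_intensity
    return area_ok and bright_enough


def count_spots(candidates, **limits):
    """Count the candidates that pass the feature checks."""
    return sum(1 for s in candidates if is_plausible_spot(s, **limits))


# Hypothetical candidates: the second is too small, the third too dim.
candidates = [Spot(12, 180.0), Spot(2, 200.0), Spot(30, 20.0), Spot(25, 90.0)]
n_spots = count_spots(candidates)
print(n_spots)
```

Keeping the checks in a single predicate makes it straightforward to add further features, such as shape or contrast, without changing the counting loop.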
Finally, all the software developed for this project should be recoded in a more efficient manner and in a more efficient programming language. This would increase productivity and speed and avoid some of the shortcuts used. For example, the architecture scores could be implemented to their full potential, i.e. using Voronoi shapes and unlimited connectivity searches.

Lastly, several experiments should be performed prior to routine use of the system, particularly more control experiments to further characterize the system’s behavior. Although this study provided evidence that the FISH workstation can successfully quantify and validate amplification (proof-of-concept experiment) and double deletion (mouse xenograft experiment) profiles, an examination of its performance on single deletions and a negative control experiment are still needed. Furthermore, the mock signal experiments, which were attempted without success, need to be repeated. Briefly, the mock signal experiments were performed on normal male lung samples stained with centromere probes as follows to mimic different known spot numbers: chX (single deletion), ch7 (normal 2 spots), chX + ch7 (single amplification), ch7 + ch8 (double amplification), etc. This experimental design will provide controlled models for each of the possible FISH spot cases as well as determine the true extent of nuclear truncation artifacts.

5.3 Future Work Towards the Ultimate Goals for the System

This automated FISH tissue-scanning workstation can potentially be used to identify tumour cell sub-populations within highly heterogeneous tumour samples. In particular, it could be used to identify cells with the recently elucidated resistance-conferring gene profile [30-33]. A pool of samples with known outcomes will be scanned for the presence of significant clones with these gene profiles.
The hope is to identify these clones in samples that failed to show said gene profiles through CGH because of genetic heterogeneity but did show resistance to the chemotherapy agent in question. In such a manner, the FISH scanning system could come to play a cooperative role with CGH in personalized medicine and tumour management, by identifying potentially relevant clones within patient biopsies or excised tumour samples. However, these clones need not be limited to chemotherapy resistance: markers for progression, sensitivities to various therapies and countless others could be used to vastly improve tumour management and treatment outcomes. Likewise, this analysis is not limited to FISH spot enumeration and can include immunohistochemistry markers, via additional image analysis and the use of Voronoi polygons to estimate cytoplasmic marker content. Finally, the workstation can also be used for the simpler automated FISH enumeration diagnostics currently in use, such as Her-2/neu amplification for breast cancer diagnosis [29].

Bibliography

[1] Nowell,P. C. "The clonal evolution of tumor cell populations", Science, 194(4260), 23-28, (1976)
[2] Yamazaki,K., Abe,S., Takekawa,H., et al. "Tumor angiogenesis in human lung adenocarcinoma", Cancer, 74(8), 2245-2250, (1994)
[3] Fontanini,G., Bigini,D., Vignati,S., et al. "Microvessel count predicts metastatic disease and survival in non-small cell lung cancer", J.Pathol., 177(1), 57-63, (1995)
[4] Giaretti,W. "A model of DNA aneuploidization and evolution in colorectal cancer", Lab.Invest., 71(6), 904-910, (1994)
[5] Fearon,E. R.,Vogelstein,B. "A genetic model for colorectal tumorigenesis", Cell, 61(5), 759-767, (1990)
[6] Visakorpi,T., Hyytinen,E., Koivisto,P., et al. "In vivo amplification of the androgen receptor gene and progression of human prostate cancer", Nat.Genet., 9(4), 401-406, (1995)
[7] Kovacs,G. "Molecular cytogenetics of renal cell tumors", Adv.Cancer Res., 62, 89-124, (1993)
[8] Califano,J., van der Riet,P., Westra,W., et al. "Genetic progression model for head and neck cancer: implications for field cancerization", Cancer Res., 56(11), 2488-2492, (1996)
[9] Collins,V. P.,James,C. D. "Gene and chromosomal alterations associated with the development of human gliomas", FASEB J., 7(10), 926-930, (1993)
[10] Louis,D. N.,Gusella,J. F. "A tiger behind many doors: multiple genetic pathways to malignant glioma", Trends Genet., 11(10), 412-415, (1995)
[11] Chung,G. T., Sundaresan,V., Hasleton,P., et al. "Clonal evolution of lung tumors", Cancer Res., 56(7), 1609-1614, (1996)
[12] Jin,C., Jin,Y., Wennerberg,J., et al. "Karyotypic heterogeneity and clonal evolution in squamous cell carcinomas of the head and neck", Cancer Genet.Cytogenet., 132(2), 85-96, (2002)
[13] Garcia,S. B., Novelli,M. and Wright,N. A. "The clonal origin and clonal evolution of epithelial tumours", Int.J.Exp.Pathol., 81(2), 89-116, (2000)
[14] Fulci,G., Ishii,N., Maurici,D., et al. "Initiation of human astrocytoma by clonal evolution of cells with progressive loss of p53 functions in a patient with a 283H TP53 germ-line mutation: evidence for a precursor lesion", Cancer Res., 62(10), 2897-2905, (2002)
[15] Kattar,M. M., Kupsky,W. J., Shimoyama,R. K., et al. "Clonal analysis of gliomas", Hum.Pathol., 28(10), 1166-1179, (1997)
[16] Coons,S. W.,Johnson,P. C. "Regional heterogeneity in the DNA content of human gliomas", Cancer, 72(10), 3052-3060, (1993)
[17] Sayagues,J. M., Tabernero,M. D., Maillo,A., et al. "Intratumoral Patterns of Clonal Evolution in Meningiomas as Defined by Multicolor Interphase Fluorescence in Situ Hybridization (FISH) Is There a Relationship between Histopathologically Benign and Atypical/Anaplastic Lesions?", J Mol Diagn, 6(4), 316-325, (2004)
[18] Seve,P.,Dumontet,C. "Chemoresistance in non-small cell lung cancer", Curr.Med.Chem.Anticancer Agents, 5(1), 73-88, (2005)
[19] Jemal,A., Siegel,R., Ward,E., et al. "Cancer statistics, 2006", CA Cancer.J.Clin., 56(2), 106-130, (2006)
[20] Axelrod,R., Axelrod,D. E. and Pienta,K. J. "From the Cover: Evolution of cooperation among tumor cells", Proceedings of the National Academy of Sciences, 103(36), 13474, (2006)
[21] Spira,A.,Ettinger,D. S. "Multidisciplinary management of lung cancer", N.Engl.J.Med., 350(4), 379-392, (2004)
[22] Buys,T. P., Chari,R., Lee,E. H., et al. "Genetic changes in the evolution of multidrug resistance for cultured human ovarian cancer cells.", Gene Chromosome Canc, 46(12), 1069-1079, (2007)
[23] Struski,S., Doco-Fenzy,M., Trussardi,A., et al. "Identification of chromosomal loci associated with non-P-glycoprotein-mediated multidrug resistance to topoisomerase II inhibitor in lung adenocarcinoma cell line by comparative genomic hybridization", Genes Chromosomes Cancer, 30(2), 136-142, (2001)
[24] Coe,B. P., Ylstra,B., Carvalho,B., et al. "Resolving the resolution of array CGH", Genomics, 89(5), 647-653, (2007)
[25] Morikawa,K.,Fidler,I. J. "Heterogeneous response of human colon cancer cells to the cytostatic and cytotoxic effects of recombinant human cytokines: interferon-alpha, interferon-gamma, tumor necrosis factor, and interleukin-1", J.Biol.Response Mod., 8(2), 206-218, (1989)
[26] Freedland,S. J., Pantuck,A. J., Paik,S. H., et al. "Heterogeneity of molecular targets on clonal cancer lines derived from a novel hormone-refractory prostate cancer tumor system", Prostate, 55(4), 299-307, (2003)
[27] Kallioniemi,O. P., Kallioniemi,A., Kurisu,W., et al. "ERBB2 amplification in breast cancer analyzed by fluorescence in situ hybridization", Proc Natl Acad Sci US A, 89(12), 5321-5325, (1992)
[28] Shapiro,D. N., Valentine,M. B., Rowe,S. T., et al. "Detection of N-myc gene amplification by fluorescence in situ hybridization.
Diagnostic utility for neuroblastoma", Am.J.Pathol., 142(5), 1339-1346, (1993)
[29] Raimondo,F., Gavrielides,M. A., Karayannopoulou,G., et al. "Automated evaluation of Her-2/neu status in breast tissue from fluorescent in situ hybridization images", IEEE Trans.Image Process., 14(9), 1288-1299, (2005)
[30] Buys,T., Chae,J. M., Chari,R., et al. "Genome features predict response to chemotherapy for non-small cell lung cancer.", J Thorac Oncol, 2(8), S437, (2007)
[31] Buys,T., Wang,Y. and Yee,J. "Identification of genomic changes associated with resistance to standard chemotherapy in a xenograft model of human NSCLC", Lung Cancer, 49(Suppl 2), S118, (2005)
[32] Buys,T., Zhang,M., Galac,L., et al. "Uncovering genomic alterations associated with drug resistance in a xenograft model of human lung cancer", (2006)
[33] Buys,T. P. H., Galac,L. L., Yee,J., et al. "Genome-wide breakpoint identification delineates clonal origin for multiple primary lung tumors obtained from the same patient", AACRMTG, 1(1), 614-614, (2006)
[34] Croce,C. M. "Oncogenes and Cancer", N.Engl.J.Med., 358(5), 502, (2008)
[35] Osborne,C., Wilson,P. and Tripathy,D. "Oncogenes and Tumor Suppressor Genes in Breast Cancer: Potential Diagnostic and Therapeutic Applications", Oncologist, 9(4), 361, (2004)
[36] Breuer,R. H., Postmus,P. E. and Smit,E. F. "Molecular pathology of non-small-cell lung cancer", Respiration, 72(3), 313-330, (2005)
[37] Aaronson,S. A. "Oncogenes and Cancer", (1987)
[38] Knudson,A. G.,Jr. "The ninth Gordon Hamilton-Fairley memorial lecture. Hereditary cancers: clues to mechanisms of carcinogenesis", Br.J.Cancer, 59(5), 661-666, (1989)
[39] Jones,P. A.,Baylin,S. B. "The fundamental role of epigenetic events in cancer", Nat.Rev.Genet., 3(6), 415-428, (2002)
[40] Egger,G., Liang,G., Aparicio,A., et al. "Epigenetics in human disease and prospects for epigenetic therapy", Nature, 429(6990), 457-463, (2004)
[41] Shih,I. M., Zhou,W., Goodman,S. N., et al. "Evidence that genetic instability occurs at an early stage of colorectal tumorigenesis", Cancer Res., 61(3), 818-822, (2001)
[42] Wang,V. W., Bell,D. A., Berkowitz,R. S., et al. "Whole genome amplification and high-throughput allelotyping identified five distinct deletion regions on chromosomes 5 and 6 in microdissected early-stage ovarian tumors", Cancer Res., 61(10), 4169-4174, (2001)
[43] Osada,H.,Takahashi,T. "Genetic alterations of multiple tumor suppressors and oncogenes in the carcinogenesis and progression of lung cancer", Oncogene, 21(48), 7421-7434, (2002)
[44] Swiger,R. R.,Tucker,J. D. "Fluorescence in situ hybridization: a brief review", Environ.Mol.Mutagen., 27(4), 245-254, (1996)
[45] Henegariu,O., Bray-Ward,P. and Ward,D. C. "Custom fluorescent-nucleotide synthesis as an alternative method for nucleic acid labeling", Nat.Biotechnol., 18(3), 345-348, (2000)
[46] Darnell,J. E., Lodish,H. F., Baltimore,D., et al. "Molecular cell biology", (1997)
[47] Wilkinson,D. G. "In Situ Hybridization: A Practical Approach", (1998)
[48] Kobayashi,K., Kitayama,Y., Igarashi,H., et al. "Intratumor Heterogeneity of Centromere Numerical Abnormality in Multiple Primary Gastric Cancers: Application of Fluorescence in situ Hybridization with Intermittent Microwave Irradiation on Paraffin-embedded Tissue", Cancer Science, 91(11), 1134-1141, (2000)
[49] Morrison,C. "Fluorescent in situ hybridization and array comparative genomic hybridization: complementary techniques for genomic evaluation", Arch.Pathol.Lab.Med., 130(7), 967-974, (2006)
[50] Florijn,R. J., Slats,J., Tanke,H. J., et al. "Analysis of antifading reagents for fluorescence microscopy", Cytometry, 19(2), 177-182, (1995)
[51] Levsky,J. M.,Singer,R. H. "Fluorescence in situ hybridization: past, present and future", J.Cell.Sci., 116(Pt 14), 2833-2838, (2003)
[52] Aubele,M., Zitzelsberger,H., Szucs,S., et al. "Comparative FISH analysis of numerical chromosome 7 abnormalities in 5-micron and 15-micron paraffin-embedded tissue sections from prostatic carcinoma", Histochem.Cell Biol., 107(2), 121-126, (1997)
[53] Muller,S., Neusser,M. and Wienberg,J. "Towards unlimited colors for fluorescence in-situ hybridization (FISH)", Chromosome Res., 10(3), 223-232, (2002)
[54] Epstein,L., DeVries,S. and Waldman,F. M. "Reutilization of previously hybridized slides for fluorescence in situ hybridization", Cytometry, 21(4), 378-381, (1995)
[55] Nederlof,P. M., van der Flier,S., Raap,A. K., et al. "Quantification of inter- and intra-nuclear variation of fluorescence in situ hybridization signals", Cytometry, 13(8), 831-838, (1992)
[56] Neumann,M.,Gabel,D. "Simple method for reduction of autofluorescence in fluorescence microscopy", J.Histochem.Cytochem., 50(3), 437-439, (2002)
[57] O'Connor,M., Peifer,M. and Bender,W. "Construction of large DNA segments in Escherichia coli", Science, 244(4910), 1307-1312, (1989)
[58] Shizuya,H., Birren,B., Kim,U. J., et al. "Cloning and Stable Maintenance of 300-Kilobase-Pair Fragments of Human DNA in Escherichia coli Using an F-Factor-Based Vector", Proc.Natl.Acad.Sci.U.S.A., 89(18), 8794-8797, (1992)
[59] Strachan,T.,Read,A. P. "Human Molecular Genetics 2", (1999)
[60] Feinberg,A. P.,Vogelstein,B. "A technique for radiolabeling DNA restriction fragments to high specific activity", Anal.Biochem., 132(1), 6-13, (1983)
[61] Kallioniemi,A., Kallioniemi,O. P., Sudar,D., et al. "Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors", Science, 258(5083), 818-821, (1992)
[62] Hughes,S., Beheshti,B., Marrano,P., et al. "Comparative Genomic Hybridisation Analysis Using Metaphase or Microarray Slides", Immunohistochemistry and In Situ Hybridization of Human Carcinomas, 2008(4/2)
[63] Ishkanian,A. S., Malloff,C. A., Watson,S. K., et al. "A tiling resolution DNA microarray with complete coverage of the human genome", Nat.Genet., 36(3), 299-303, (2004)
[64] Schwaenen,C., Nessling,M., Wessendorf,S., et al. "Automated array-based genomic profiling in chronic lymphocytic leukemia: Development of a clinical tool and discovery of recurrent genomic alterations", Proceedings of the National Academy of Sciences, 101(4), 1039, (2004)
[65] Zhou,Z., Pons,M. N., Raskin,L., et al. "Automated image analysis for quantitative fluorescence in situ hybridization with environmental samples", Appl.Environ.Microbiol., 73(9), 2956-2962, (2007)
[66] Pham,D. L., Xu,C. and Prince,J. L. "Current methods in medical image segmentation", Annu.Rev.Biomed.Eng., 2(1), 315-337, (2000)
[67] Hohne,K. H.,Hanson,W. A. "Interactive 3D segmentation of MRI and CT volumes using morphological operations", J.Comput.Assist.Tomogr., 16(2), 285-294, (1992)
[68] Clarke,L. P., Velthuizen,R. P., Camacho,M. A., et al. "MRI segmentation: methods and applications", Magn.Reson.Imaging, 13(3), 343-368, (1995)
[69] Lee,S. U., Chung,S. Y. and Park,R. H. "Performance study of several global thresholding techniques for segmentation", Computer Vision, Graphics, and Image Processing, 52(2), 171-190, (1990)
[70] Torre,V.,Poggio,T. "On edge detection", IEEE Trans.Pattern Anal.Mach.Intell., 8(2), 147-163, (1986)
[71] Lindeberg,T. "Edge detection and ridge detection with automatic scale selection", International Journal of Computer Vision, 30(2), 117-154, (1998)
[72] Marr,D.,Hildreth,E. "Theory of Edge Detection", Proceedings of the Royal Society of London.Series B, Biological Sciences, 207(1167), 187-217, (1980)
[73] Hill,P. R., Canagarajah,C. and Bull,D. R. "Texture gradient based watershed segmentation", Acoustics, Speech, and Signal Processing, 2002.Proceedings.(ICASSP'02).IEEE International Conference on, 4, (2002)
[74] Huang,Y. L.,Chen,D. R. "Watershed segmentation for breast tumor in 2-D sonography", Ultrasound Med.Biol., 30(5), 625-632, (2004)
[75] Beucher,S. "The watershed transformation applied to image segmentation", Scanning Microscopy International, 6, 299-314, (1992)
[76] Meyer,F. "Iterative image transformations for an automatic screening of cervical smears", J.Histochem.Cytochem., 27(1), 128-135, (1979)
[77] Netten,H., Young,I. T., van Vliet,L. J., et al. "FISH and chips: automation of fluorescent dot counting in interphase cell nuclei", Cytometry, 28(1), 1-10, (1997)
[78] Navulur,K. "Multispectral Image Analysis Using the Object-Oriented Paradigm", (2006)
[79] Ortiz de Solorzano,C., Santos,A., Vallcorba,I., et al. "Automated FISH spot counting in interphase nuclei: statistical validation and data correction", Cytometry, 31(2), 93-99, (1998)
[80] Kozubek,M., Kozubek,S., Lukasova,E., et al. "High-resolution cytometry of FISH dots in interphase cell nuclei", Cytometry, 36(4), 279-293, (1999)
[81] Kozubek,M., Kozubek,S., Lukasova,E., et al. "Combined confocal and wide-field high-resolution cytometry of fluorescent in situ hybridization-stained cells", Cytometry, 45(1), 1-12, (2001)
[82] Kozubek,M., Matula,P., Matula,P., et al. "Automated acquisition and processing of multidimensional image data in confocal in vivo microscopy", Microsc.Res.Tech., 64(2), 164-175, (2004)
[83] Lerner,B., Clocksin,W. F., Dhanjal,S., et al. "Automatic signal classification in fluorescence in situ hybridization images", Cytometry, 43(2), 87-93, (2001)
[84] Lerner,B. "Bayesian fluorescence in situ hybridisation signal classification", Artif.Intell.Med., 30(3), 301-316, (2004)
[85] Lerner,B., Koushnir,L. and Yeshaya,J. "Segmentation and classification of dot and non-dot-like fluorescence in situ hybridization signals for automated detection of cytogenetic abnormalities", IEEE Trans.Inf.Technol.Biomed., 11(4), 443-449, (2007)
[86] Chawla,M. K., Lin,G., Olson,K., et al. "3D-catFISH: a system for automated quantitative three-dimensional compartmental analysis of temporal gene transcription activity imaged by fluorescence in situ hybridization", J.Neurosci.Methods, 139(1), 13-24, (2004)
[87] Walter,J., Joffe,B., Bolzer,A., et al. "Towards many colors in FISH on 3D-preserved interphase nuclei", Cytogenet.Genome Res., 114(3-4), 367-378, (2006)
[88] Meineke,F. A., Potten,C. S. and Loeffler,M. "Cell migration and organization in the intestinal crypt using a lattice-free model", Cell Prolif., 34(4), 253-266, (2001)
[89] Okabe,A., Boots,B. and Sugihara,K. "Spatial tessellations: concepts and applications of Voronoi diagrams", (1992)
[90] Schaller,G.,Meyer-Hermann,M. "Kinetic and dynamic Delaunay tetrahedralizations in three dimensions", Comput.Phys.Commun., 162(1), 9-23, (2004)
[91] Schaller,G.,Meyer-Hermann,M. "Multicellular tumor spheroid in an off-lattice Voronoi-Delaunay cell model", Physical Review E, 71(5), 51910, (2005)
[92] Lee,D. T.,Schachter,B. J. "Two algorithms for constructing a Delaunay triangulation", International Journal of Parallel Programming, 9(3), 219-242, (1980)
[93] Tanemura,M., Ogawa,T. and Ogita,N. "A New Algorithm for Three-Dimensional Voronoi Tessellation", Journal of Computational Physics, 51, 191, (1983)
[94] MacAulay,C.,Palcic,B. "An edge relocation segmentation algorithm", Anal.Quant.Cytol.Histol., 12(3), 165-171, (1990)
[95] Kamalov,R., Guillaud,M., Haskins,D., et al. "A Java application for tissue section image analysis", Comput.Methods Programs Biomed., 77(2), 99-113, (2005)
[96] Kovesi,P. "Non-maximal Suppression Maxima Detection", (2005)
[97] Choi,S. S., Park,S. H., Kim,U. J., et al. "Bfl-1, a Bcl-2-related gene, is the human homolog of the murine A1, and maps to chromosome 15q24.3", Mamm.Genome, 8(10), 781-782, (1997)
[98] Katona,R., Szeles,A. and Hadlaczky,G. "Mouse euchromatin specific "genome-painting" with a LINE probe: a rapid method for identification and mapping of human chromosomes in mouse-human microcell hybrids by two-color FISH", Hereditas, 124(2), 131-135, (1996)
[99] Gianfrancesco,F., Sanges,R., Esposito,T., et al. "Differential divergence of three human pseudoautosomal genes and their mouse homologs: implications for sex chromosome evolution", Genome Res., 11(12), 2095-2100, (2001)
[100] Jong,K., Marchiori,E., Meijer,G., et al. "Breakpoint identification and smoothing of array comparative genomic hybridization data", Bioinformatics, 20(18), 3636-3637, (2004)
[101] Wong,K. K., deLeeuw,R. J., Dosanjh,N. S., et al. "A comprehensive analysis of common copy-number variations in the human genome", Am.J.Hum.Genet., 80(1), 91-104, (2007)
[102] Bartlett,J. M. S., Watters,A. D., Ballantyne,S. A., et al. "Is Chromosome 9 Loss a Marker of Disease Recurrence in Transitional Cell Carcinoma of the Urinary Bladder?", J.Urol., 161(2), 716-716, (1999)
[103] Watters,A. D., Ballantyne,S. A., Going,J. J., et al. "Aneusomy of chromosomes 7 and 17 predicts the recurrence of transitional cell carcinoma of the urinary bladder", BJU Int., 85(1), 42-47, (2000)
[104] Henegariu,O. "Tavi's Multicolor FISH Page (fluorescence in situ hybridization)", 2008, (2001)

Appendices

Appendix A: Fluorescent in-situ Hybridization Protocols

Protocol 1: Complete BAC DNA isolation protocol

Source: Adapted from Nucleobond BAC 100 kit protocol (Macherey-Nagel GmbH) with amendments, additions and adjustments by Lindsey Kimm and Ron de Leeuw (BCCRC, Wan Lam Lab)

E. coli Clone Selection from Frozen BAC Library

Day 1 (steps 1-3), Day 2 (step 4):
1. Select correct plate from -80°C freezer. BAC clone name is actually the plate location and well number of the corresponding E. coli host, i.e. N0668N18 would be in the N18 well on plate #668.
2. Pick clone from correct well with fresh sterile pipette tip by gently scraping the well.
3. Streak pipette tip onto agar plate made of 2x LB broth with chloramphenicol (12.5 µg/ml); label plates carefully. Incubate plates overnight at 37°C.
4. Pick single colony with sterile pipette tip and drop into ~150 mL 2x LB broth with 12.5 µg/ml chloramphenicol; grow on shaker overnight.

BAC DNA extraction using Nucleobond BAC 100 kit (Macherey-Nagel GmbH)

Day 3 (steps 5-24), Day 4 (steps 25-26):
5. Transfer volume of grown culture into 50 mL Falcon tubes.
6. Spin down cultures in 50 mL Falcon tubes @ 3600 rpm, 4°C for 20 min.
7. Remove effluent, add 6 mL S1 + RNase (premixed and stored at 4°C) to each pellet and vortex.
8. Add 6 mL S2, mix by inverting 6-8 times and leave @ room temp for 2-3 minutes (no more than 5 min). Note: time only the first tube and process the others at the same pace while the first one is running.
9. Add 6 mL of ice cold S3, mix by inverting 6-8 times and put on ice for 5 min.
10. Spin down for 15 minutes @ 3600 rpm and 4°C.
11. Filter into 1 (or more depending on volume) 50 mL Falcon tubes using filter paper.
12. Prep Nucleobond column with 6 mL N2 into new 50 ml Falcon tube and dump off N2.
13. Load filtered lysate into column (twice).
14. Wash column 3 times with 12 ml N3 (put N5 into 65°C hot bath).
15. Elute DNA into fresh 50 ml Falcon tube with 14 ml of prewarmed N5.
16. Add 10 ml isopropanol and split into two 15 ml Falcon tubes.
17. Centrifuge @ 9000 RCF for 30 min at 4°C.
18. Place directly on ice and remove isopropanol by dumping into new tube. Note: pellets will be very small, so store isopropanol in a new tube in case pellets get lost.
19. Add 1 mL ice cold 70% ethanol and remove each pellet into separate 1.7 ml tube.
20. Spin @ max speed in cold room for 10 minutes. Do not let centrifuge stop or warm up because pellets will dissolve.
21. Pipette out ethanol and let rest of pellet dry by placing tubes on hotplate momentarily.
22. Resuspend in 20 µl of dH2O.
Place in 65oC for 10 minutes Incubate at 37oC overnight Remove from incubator, vortex and spin down Check concentration of DNA with ND-1000 spectrophotometer (NanoDrop Technologies Inc., DE, USA) a. Clean with water. b. Tare spectrometer with suspension media (MilliQ water) c. Load 1.5 µl of DNA sample - 260/280nm ratio ~1.8 indicates pure DNA. (if >2= RNA, <2= Proteins) - ~700ng.µl is good concentration  Verifying BAC clone via Hind III digestion and clone database comparison 27. 28. 29. 30. 31. 32. 33. 34. 35. 36.  Add 7.5 µl dH2O to 1.7 ml tube. Add 1 µl DNA (@ ~600ng/µl) Add 1ul 10x React2 Buffer (freezer 9) 0.5 Hind III enzyme (freezer 9) Keep enzyme on ice prior to use With 10 µl total volume, shake slightly, spin down momentarily and incubate for 2 hours at 37oC Do not vortex enzymes but agitate slightly every hour Check digestion on 0.7% agarose gel, take image. Access Clone database by opening X-Win32 Load FPC file from /fpcdb/ Load humanmap_020517/fpc/master/master_HumanMap.fpc Search for clone names selected and compare their HIND III digestion fingerprint. See figure 3-1 for example.  Protocol 2: DNA probe labeling via Nick Translation Source: Abbott Laboratories. Abbott Park, Illinois, U.S.A, formerly Vysis Nick Translation Protocol for directly isolated BAC DNA 1. 2.  Place a microcentrifuge tube on ice and allow the tube to cool. Add these components to the tube in the order listed. Briefly centrifuge and vortex the tube before adding the enzyme (last component):  3. 4. 5. 6. 7.  Briefly centrifuge and vortex the tube. Incubate 8 - 16 hours at 15°C. Stop the reaction by heating in a 70°C water bath for 10 minutes. Chill on ice. Determining the probe size on 0.7% agarose gel as seen in figure 3-1 and Protocol 5 (below) Note: Optimum size fragments for tissue FISH should average about 200 bp. 
To produce smaller probe fragments use the following conditions, which are listed in order of decreasing fragment size: 5 µl enzyme mix/8 hour incubation, 5 µl enzyme mix/16 hour incubation, 10 µl enzyme mix/8 hour incubation and 10 µl enzyme mix/16 hour incubation. Adjust the amount of nuclease-free water added to keep the total reaction volume at 50 µl.
8. Pipet 5 µl (~100 ng of probe) of the nick translation reaction mixture into a microcentrifuge tube.
9. Add 1 µg COT-1 DNA, 2 µg human placental DNA (Cot-1 DNA) and 4 µl purified water to the tube.
10. Add 1.2 µl (0.1 volume) 3 M sodium acetate, then add 30 µl (2.5 volumes) of 100% EtOH to precipitate the DNA. Vortex briefly and place on dry ice for 15 minutes.
11. Centrifuge at 12,000 rpm for 30 minutes at 4 °C to pellet the DNA.
12. Remove the supernatant and dry the pellet for 10 - 15 minutes under a vacuum at ambient temperature.
13. Resuspend the pellet in 3 µl purified water and 7 µl Hybridization Buffer.

Protocol 3: Optimized FISH staining protocol for formalin fixed, paraffin embedded lung tissues
Source: Based on the original protocol acquired from the BCCA Cytogenetics Lab, with amendments determined by this study. The original protocol is adapted from the PathVysion FISH probe protocol (Abbott Laboratories, Abbott Park, Illinois, U.S.A., formerly Vysis)

Lung Tissue (Formalin Fixed, Paraffin Embedded) FISH Staining Protocol

Important Reagents:
50 ml of Pepsin Solution (75000 units):
- 2.5 ml of 0.2 N HCl + 0.5 ml of Pepsin Aliquot (1 mg/ml) + 47 ml dH2O

Prior to Day 1:
1. Incubate sections overnight at 56 °C (minimum 4 hrs). Only necessary if tissues are fresh, less than 2-3 days old.
2. If applicable, select the area of interest with a diamond tipped pen on the reverse of the slide.

Day 1:
1. De-wax the paraffin section with fresh xylene (3x), 10 minutes each, under the fume hood. Allow to air dry before the next step.
2. Re-hydrate the section in 100% EtOH for 5 minutes. Allow slide to air dry.
3. Immerse slide in 0.2 N HCl for 20 min at room temperature.
4. Rinse slide in dH2O for 10 min at room temperature.
5. Rinse slide in 2xSSC for 3 min at room temperature.
6. Incubate slide in 1 M NaSCN for 30 min at 80 °C (set water bath to ~92 °C).
7. Rinse slide in dH2O for 3 min at room temperature.
8. Incubate slide in Pepsin solution for ~10-15 min at 37 °C. NOTE: Actual digestion time is determined by optimization and is highly dependent on tissue, tissue thickness, pepsin concentration and temperature.
9. Rinse slide in dH2O for 5 min at room temperature.
10. Dehydrate in 100% EtOH (x2), 4 min at room temperature. Allow slide to dry.
11. Add ~6-8 µl of probe mix to a coverslip (22x22 mm), wick up with the slide (to avoid air bubbles) and co-denature tissue and probe by heating in a Slide Moat or on a slide warmer to 80 °C for 10 minutes.
12. Place in hyb-chamber with 10 µl H2O in the humidity well.
13. Incubate for ~30 hours at 37 °C.

Day 2:
14. Gently remove cover slips and wash with 0.4xSSC/0.3% NP-40 for 2 min at 73 °C. **Make sure solutions are at proper temp and always wash exactly, 4 slides
15. Wash with 2xSSC/0.1% NP-40 for 1 min at room temperature.
16. Rinse slides in 1xPBS.
17. Air dry slides in upright position for 0.5-1 hr (optional; do not air dry if using Cy5 fluorophore due to reaction with ozone).
18. Apply 8 µl of (50% DAPI1/50% ANTIFADE) counter-stain.
19. Slides are ready to image or store at 4 °C.

Protocol 4: ReFISH staining protocol for sequential hybridizations using previously hybridized tissue
Source: Adapted from the Sequential FISH Procedure (Abbott Laboratories) and [54].

ReFISH Protocol

Signal Removal Steps
Day 1:
1. Stain the first round of FISH as described in the Tissue-FISH staining protocol above.
2. Upon successful imaging, remove the coverslip and wash slides in 2xSSC for 5 min at 73 °C.
3. Remove previously hybridized probes with 70% formamide/30% 2xSSC for 5 min at 73 °C. NOTE: For subsequent rounds of ReFISH, denaturing time should be reduced by one minute with each new round, to a minimum of 2 minutes.

Variant of common FISH staining protocol
4. Dehydrate slides through an ethanol series (70%, 80%, 90%, 100%) for 1 min each at room temperature. Air dry after the last wash.
5. Denature slides by incubation in 70% formamide/30% 2xSSC in coplin jars at 72 °C for 1 minute.
6. Fix slides through an ethanol series (70%, 80%, 90%, 100%) for 1 minute each at -20 °C in the freezer. Air dry after the last wash.
7. Apply probe mix to a cover slip, wick up with the slide (to avoid air bubbles) and co-denature in a SlideMoat or slide warmer for 10 min at 80 °C.
8. Place slides into the hyb chamber with 10 µl of water in the small reservoirs to provide humidity.
9. Incubate at 37 °C overnight (~20-30 hours).

Day 2:
10. Wash slides in coplin jars with 0.4xSSC + 0.3% NP-40 for 2 minutes at 73 °C.
11. Wash slides with 2xSSC + 0.1% NP-40 for 1 min at room temperature.
12. Rinse in 1xPBS and air dry.
13. Add ~10 µl total volume, made of 50% DAPI/antifade solution and 50% antifade only.
14. Overlay cover slip.
15. Ready for imaging, storage or subsequent rounds of ReFISH (start back at step 1).

Protocol 5 (Auxiliary): Agarose gel preparation, run and visualization
Source: Adapted from previous work by Spencer Watson (Wan Lam Lab, BCCRC)

0.7% Agarose gel preparation, run and visualization
- For a 0.7% agarose gel: 0.35 g per 50 ml; scale to the volume of the gel tray (200 ml in our case).
- Concentration can be as high as 2% to distinguish even smaller DNA fragments, but DNA will migrate more slowly.
1. Weigh out 1.4 grams of agarose into a 500 ml flask. Use a flask about 2 times larger than the gel volume to avoid spilling during boiling.
2. Add 200 ml of 1x TAE buffer.
3. Microwave for 3-5 min until the agarose is dissolved (no floating particles), stopping and stirring frequently. Use a glove because the flask will be hot, and stir softly as the liquid can be superheated.
4. Tape the sides of the tray VERY well, place the comb and let the flask cool a little before pouring.
5. Pour the gel such that the comb is dipped into the gel (a few mm).
6. Rinse out the flask with lots of water and send it to be autoclaved.
7. After the gel is hardened and opaque, fill the gel box with 1x TAE buffer.
8. Remove the tape and comb from the tray and place the tray in the gel box, making sure the buffer is above the gel and in the wells.
9. Add loading dye to samples:
   i. 1.1 µl of dye for 10 µl of DNA, for a total of 11.1 µl of sample.
10. Load 5 µl of sample into each well. Do not puncture wells with the pipette.
11. Place the gel box top and connect the electrodes (negative at the side of the DNA wells).
12. Run below 10 volts/cm of gel until the loading dyes reach 1-3 cm from the end of the gel.

Visualization
13. Let the gel cool off.
14. Immerse in ethidium bromide for 15 min. Caution with EtBr.
15. Use the dustpan to remove the gel and rinse it off under running water for 2 min.
16. Put on the transilluminator and close the Eagle-Eye doors.
17. Adjust zoom and focus under external lights.
18. Turn off external lights, turn on UV light and image with "Dynamic Integration".
19. View, print and/or save.
20. Rinse and wipe off the illuminator and throw away the gel.

Protocol 6 (Auxiliary): Procedure for picking previously amplified BAC clones from frozen PCR Product Library
Source: Adapted from previous work by Spencer Watson (Wan Lam Lab, BCCRC)

Making FISH Probes from BAC DNA PCR Product Library
1. Locate the Plate # and Well # of the clone of interest using the spreadsheet: "Piotr, platemap FISH file.xls"
2. Pull plates from the freezer, allow to defrost fully, place the well plate into the black holders and spin down for 1 minute at 7000 rcf. This ensures that all solution and condensation falls back into the wells.
3. Carefully peel open the well plate, discard the used aluminum cover, and pipette 2.5 µl of DNA product. Avoid breathing directly onto the plate and other sources of contamination.
4. Once finished picking clones, reseal the plate with adhesive aluminum tape, ensuring that all wells are covered, and use the paddle to create a pressed seal. Leave a small tab at the front for the next user.
5. Amplify the pulled clone product using the Normal PCR protocol (Protocol 7).
6. Clean up PCR products with Microcon. This gets rid of small PCR products.
7. Use the Random Primer/Klenow Protocol (Protocol 8) to label the amplified products with appropriate fluorophores.

Protocol 7 (Auxiliary): Normal PCR procedure for amplification of BAC DNA PCR product
Source: Adapted from previous work by Spencer Watson (Wan Lam Lab, BCCRC)

Normal PCR to Amplify

Reagents Prep:
1. 10x Buffer (Promega PCR buffer)
2. 8 mM MgCl2
3. 100 µM MseI long Primer (LM Primer)
4. 5 U/µl Taq polymerase
5. dNTPs (5 mM each)
6. Template DNA (~directly from reservoir)

Procedure:
1. Mix together the following reagents:
   10x PCR Buffer     5 µl
   MgCl2              12 µl
   dNTP mix           0.4 µl
   Primer             2 µl
   Template DNA       2.5 µl
   Taq Enzyme         1 µl
   ddH2O              27.1 µl
   Total Volume       50 µl
2. Use LM-PCR2 conditions:
   95 °C for 1 minute
   35 cycles of:
      94 °C for 1 minute
      63 °C for 1 minute
      72 °C for 1 minute
   72 °C for 10 minutes final extension

Protocol 8 (Auxiliary): Random Primer PCR procedure for labeling of BAC DNA PCR product with fluorophores
Source: Adapted from previous work by Spencer Watson (Wan Lam Lab, BCCRC) and [63].

Random Primer PCR Protocol

Reagents Prep:
1. 5x Primers Buffer (PB Buffer)
   - Klenow Buffer + Promega 5x + 250 mM Random Hexamers
2. 10x dNTPs
   - 8 mM dCTP, dATP, dGTP, 4 mM dTTP
3. Fluorophore Dyes (~1 mM)
4. Hyb Buffer (Spencer's Mix)
5. PCR Amplified LM-PCR BAC DNA products
6. Klenow Fragment Enzyme (5 U/µl)

Procedure:
1. Mix together the following reagents:
   PB Buffer          10 µl
   500 ng DNA         x µl
   10x dNTPs          5 µl
   ddH2O              x µl
   Klenow Enzyme      1 µl
   Dye (1 mM)         2 µl
   Total Volume       50 µl
2. Incubate overnight in the 37 °C oven.

Precipitate Probe:
3. Add 10 µl Cot-1 DNA.
4. Add 3.3 µl NaOAc (3 M).
5. Add 82.5 µl EtOH (100%).
6. Incubate 30 minutes at -20 °C.
7. Spin for 15 minutes at 4 °C at max speed (13 000 rpm).
8. Remove and save the supernatant (in case the pellet is lost).
9. Re-suspend the pellet in 40 µl of Hyb Buffer.
10. Run probes on a gel to estimate probe size.

Alternate Precipitation Method:
1. Add probe to a Microcon tube.
2. Add 25 µl Cot-1, 2 µl yeast tRNA.
3. Spin down at 13G for 5 min.
4. Dump effluent.
5. Wash with 500 µl ddH2O, spin again for 5 min.
6. Invert tube, add hyb buffer (40 µl).
7. Spin inverted (unclosed) into a fresh tube at 3G for 5 minutes.

Protocol 9 (Auxiliary): FISH protocol for staining simple cell suspensions or metaphase spreads
Source: Adapted from previous work by Spencer Watson (Wan Lam Lab, BCCRC) and the CEP FISH staining protocol (Abbott Laboratories).

Hybridization Protocol with cell suspension slides

Day 1
1. Dehydrate slides through an ethanol series (70%, 80%, 90%, 100%) for 1 minute each at room temperature in coplin jars under the fume hood. Air dry after the last wash.
2. Denature slides by incubation in 70% formamide/30% 2xSSC in coplin jars at 70 °C for 2 minutes.
3. Fix slides through an ethanol series (70%, 80%, 90%, 100%) for 1 minute each at -20 °C in the freezer. Air dry after the last wash.
4. Denature probes at 73 °C for 5 min (or 70 °C for 10 min).
5. Apply probe mix to a cover slip and place on the slide (avoid air bubbles, use only 90% of the probe mix). Alternative: co-denature probes and tissue together at 80 °C for 10 minutes.
6. Place slides into the plastic vice holder with 10 µl of water in the small reservoirs to provide humidity.
7. Incubate at 37 °C overnight (~20 hours).

Day 2
8. Wash slides in coplin jars with 0.4xSSC + 0.3% NP-40 for 2 minutes at 73.5 °C, warmed up previously by microwave (10 sec max).
   - Coverslips will fall off, or can be removed afterwards.
9. Wash slides with 2xSSC + 0.1% NP-40 for 1 minute at room temperature.
10. Rinse in 1xPBS.
11. Air dry slightly.
12. Add ~10 µl total volume, made of 50% DAPI + antifade solution and 50% antifade only.
13. Overlay cover slip and store in the dark at 4 °C (short term) or -20 °C (long term).

Appendix B: In-house Developed Software: Algorithm Schematics

Schematic 1: Image Pre-Processing Steps
Schematic 2: Image Segmentation Steps
Schematic 3: Spot Counting Process Steps
Schematic 4: Global Results Generation Step
Schematic 5: Neighbourhood Homogeneity Score (NHS)
Schematic 6: Clone Connectivity Score (CCS)

Appendix C: In-house Developed Software: Important Parameters and Notes

Function: File_sort.m
   slidelist=file_sort(directory,illum)
INPUT:
   - directory = file directory where export files reside
   - illum = logical parameter to indicate the use of illumination correction (if illum = 1)
OUTPUT:
   - Slide list is the list of independent slides sorted
NOTES:
   - Maps out Hot Pixel Map from files/Hotpixel_map.m
   - If desired, performs illumination correction (via illum_correction.m)
   - Sorts files into folder structure
   - Inverts, shrinks and filters Counterstain Files (via CS_adjust.m)

Function: CS_adjust.m
   function CS_adjust(root)
INPUT:
   - root = the C:/.../Slide ~Filename~/ directory
OUTPUT:
   - Invert each image
   - Contrast adjust if result still more than 25 levels between 0-256
   - Median Filter of [3,3]
   - Gaussian filter of [3,3]
   - Write as new files with prefix I_S_~Filename~
NOTES:
   Work will only be done on the Counterstain_~Filename~/ directory

Function: Illum_correction.m
   function illum_correction(directory)
INPUT:
   - directory = the C:/.../Slide ~Filename~/
OUTPUT:
   - illumination corrected B.TIF files only
NOTES:
   Works well enough, except that discretization causes stripes on some images, due to subtraction and conversion to uint8.

Function: Image_slicer.m
   function image_slicer(in_directory,x_fov,y_fov)
INPUT:
   - in_directory = the C:/.../Slide ~Filename~/
   - x-fov and y-fov = the # of FOVs in either direction
   - width = the width of slice of each pic. 120 pxl wide subsample
OUTPUT:
   - new directory of subsampled images C:/.../Cutters ~Filename~/
NOTES:
   Works by joining two columns (or rows) and subsampling the image along the border, max 2048 or (2560) for obelics input. This is done first for coloured images, then for Counterstain images (only 1/2 of original size). Done in both directions and files saved as mergies.number.colour.TIF; if images are not present, a blank is inserted.

Function: Obelics_in.m
   function Obelics_in
INPUT:
   User interactivity required to select Counterstain_~Filename~/ directory
OUTPUT:
   - change "." delimiter in ~Filename~ into "=" delimiter
   - Separate contents of directory into folders of 70 files max, named sequentially.

Function: nonmaxsuppts.m
   function [r,c, rsubp, csubp] = nonmaxsuppts(cim, radius, thresh, im)
INPUT:
   - cim = corner strength image.
   - radius = radius of region considered in nonmaximal suppression. Typical values to use might be 1-3 pixels.
   - thresh = threshold.
   - im = optional image data. If this is supplied the thresholded corners are overlayed on this image. This can be useful for parameter tuning.
OUTPUT:
   - r = row coordinates of corner points (integer valued).
   - c = column coordinates of corner points.
   - rsubp, csubp = If four return values are requested, sub-pixel localization of feature points is attempted and returned as an additional set of floating point coords. Note that you may still want to use the integer valued coords to specify centres of correlation windows for feature matching.
NOTES:
   Copyright (c) 2003-2005 Peter Kovesi, School of Computer Science & Software Engineering, The University of Western Australia

Function: spot_count.m
   function [SpotData]=spot_count(Img_file, Spotfiles, Tophat_diameter, Maxima_thresh, Maxima_diameter, Mask_dilate_size, Mask_close_size)
INPUT:
   - Img_file = location of the .img file corresponding to the FOV being analyzed
   - Spotfiles = single image of FOV for a colour
   - Tophat_diameter = radius to be used for the tophat disk structuring element
   - Maxima_thresh = intensity level at which a maximum may be considered valid
   - Maxima_diameter = radius used for nonmaxima suppression
   - Mask_dilate_size = radius of pixels used for the structuring element disk for dilation
   - Mask_close_size = radius of pixels used for the structuring element disk for closing
OUTPUT:
   - SpotData is an nx2 matrix with the first column being cell index. 2nd is # of spots found. n is the number of objects in the .img file
NOTES:
   Default Values: Tophat_diameter=5, Maxima_thresh=50, Maxima_diameter=2, Mask_dilate_size=5, Mask_close_size=5

Function: addfeature.m
   function status=addfeature(name,val,infile,outfile)
INPUT:
   - name = name of new feature
   - val = value of feature for each object
   - infile = file to add feature to
   - outfile = new file to save to

Function: merge_fb5.m
   function merge_fb5(outfile,directory,x_fov,y_fov)
INPUT:
   - outfile = file to save merged file
   - directory = Slide ~Filename~/Results/Added Features/
   - x-fov and y-fov = number of FOVs in either direction
OUTPUT:
   - merged .fb5 file with name as output (default: merged.fb5 into directory)

Function: stiching_fb5s.m
   function stiching_fb5s(in_dir, merge_file, x_fovs, y_fovs, width, region, girth)
INPUT:
   - in_dir = the Slide ~Filename~/Results/Added Features/
   - merge_file = merged fb5 file to stitch
   - x-fov and y-fov = number of FOVs in either direction
   - width = width used for image subsampling
   - region = % of width allowed to include new coordinates from fb5 file
   - girth = +/- of coordinates used to replace existing values
OUTPUT:
   - MERGED AND STICHED.fb5 file saved into in_dir
NOTES:
   % defaults: width=60, region=0.5, girth=10

Function: read_fb6.m
   function [feature_names, feature_values, header]=read_fb6(fb6filename)
INPUT:
   - fb6filename = location of .fb5 file to read
OUTPUT:
   - feature_names in 8bit
   - Feature_values = features for each object (one per column)
   - header is [0,max # of features]

Scripts:

Arch_Voronoi.m
INPUT:
   - fb6filename = location of the .fb5 file
   - colours = matrix of single letter colour codes to be analyzed (eg = ['R';'G';'M'])
   - maxdist = maximum separation between two cells to be considered neighbours (default 135 pxls)

Arch_Pseudovoronoi.m
INPUT: Same as above, with addition:
   - validonly = logical switch if only valid data (ie not zero in every channel) should be analyzed

Arch_Delaunay_Unlimited.m
INPUT: Same as above

Arch_Delaunay_20.m
INPUT: Same as above

NOTES:
- For each of the Scripts above, it is necessary to convert Spot Counts to a Logical Framework of Positive and Negative Cells
FOR EXAMPLE:
- Positive cells defined as 3 or more spots in at least 2 colour channels

   Spotstatus=SpotData>=3;
   Spotstatus=(sum(Spotstatus'))';
   %at least 2 channels
   Spotstatus=Spotstatus>=2;
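The conversion of spot counts into positive and negative cells described in the note above can also be sketched outside MATLAB. The following is a minimal Python rendering, not the thesis code: the function name spot_status and its parameters min_spots and min_channels are hypothetical, and it assumes the spot counts are arranged with one row per cell and one column per colour channel (with any cell-index column already removed).

```python
from typing import List, Sequence


def spot_status(spot_counts: Sequence[Sequence[int]],
                min_spots: int = 3,
                min_channels: int = 2) -> List[bool]:
    """Label each cell positive (True) or negative (False).

    Hypothetical helper: a cell is positive when at least `min_channels`
    colour channels each contain `min_spots` or more spots.
    """
    status = []
    for counts in spot_counts:
        # Number of channels in this cell that reach the spot threshold
        channels_over = sum(1 for n in counts if n >= min_spots)
        status.append(channels_over >= min_channels)
    return status


if __name__ == "__main__":
    # Illustrative counts only (rows = cells, columns = colour channels)
    example = [
        [4, 3, 0],  # two channels reach 3 spots
        [2, 2, 2],  # no channel reaches 3 spots
        [5, 1, 2],  # only one channel reaches 3 spots
        [3, 3, 3],  # all three channels reach 3 spots
    ]
    print(spot_status(example))
```

Keeping both thresholds as parameters makes it easy to try other definitions of a positive cell, for example requiring only one altered channel, without touching the neighbourhood analysis that consumes the result.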
