Open Collections

UBC Theses and Dissertations

Development, implementation and evaluation of segmentation algorithms for the automatic classification… MacAulay, Calum Eric 1989

DEVELOPMENT, IMPLEMENTATION AND EVALUATION OF SEGMENTATION ALGORITHMS FOR THE AUTOMATIC CLASSIFICATION OF CERVICAL CELLS

by

Calum Eric MacAulay
B.Sc., Dalhousie University, 1982
M.Sc., Dalhousie University, 1984

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
in
THE FACULTY OF GRADUATE STUDIES
Department of Physics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1989
© Calum Eric MacAulay, 1989

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

ABSTRACT

Cancer of the uterine cervix is one of the most common cancers in women. An effective screening program for pre-cancerous and cancerous lesions can dramatically reduce the mortality rate for this disease. In British Columbia, where such a screening program has been in place for some time, 2500 to 3000 slides of cervical smears need to be examined daily. More than 35 years ago, it was recognized that an automated pre-screening system could greatly assist people in this task. Such a system would need to find and recognize stained cells, segment the images of these cells into nucleus and cytoplasm, numerically describe the characteristics of the cells, and use these features to discriminate between normal and abnormal cells.
The thrust of this work was 1) to research and develop new segmentation methods and compare their performance to those in the literature, 2) to determine the dependence of the numerical cell descriptors on the segmentation method used, 3) to determine the dependence of cell classification accuracy on the segmentation used, and 4) to test the hypothesis that using numerical cell descriptors one can correctly classify the cells.

The segmentation accuracies of 32 different segmentation procedures were examined. It was found that the best nuclear segmentation procedure was able to correctly segment 98% of the nuclei of a 1000 and a 3680 image database. Similarly, the best cytoplasmic segmentation procedure was found to correctly segment 98.5% of the cytoplasm of the same 1000 image database. Sixty-seven different numerical cell descriptors (features) were calculated for every segmented cell. On a database of 800 classified cervical cells these features, when used in a linear discriminant function analysis, could correctly classify 98.7% of the normal cells and 97.0% of the abnormal cells. While some features were found to vary a great deal between segmentation procedures, the classification accuracy of groups of features was found to be independent of the segmentation procedure used. The cellular classification accuracy was found to be very dependent on the number and types of features used to form the discriminant functions.

The thesis that a computerized system can classify cervical cells at least as well as an experienced cytologist has been demonstrated. This result requires that the system can segment cervical cells and reliably recognize incorrectly segmented cells.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
1. INTRODUCTION
1.1 Status of Automatic Cervical Cell Analysis Using Imaging Systems
2. MATERIALS AND METHODS
2.1 Sample Preparation
2.2 Image Acquisition
2.3 Segmentation Methods
2.3.1 Simple 2D Histogram Analysis
2.3.2 Three Histogram Analysis
2.3.3 Threshold Selection Based on a Simple Image Statistic
2.3.4 Local Histogram Threshold Selection
2.3.5 Three Dimensional Thresholding
2.3.6 A Split and Merge Procedure
2.3.7 Nuclear Radial Contouring
2.3.8 A Relaxation Process
2.3.9 An Edge Relocation Algorithm
2.4 Cellular Features
2.4.1 Markovian Texture Analysis
2.4.2 Discrete Texture Analysis
2.4.3 Post-Processing
2.5 Cell Classification
3. RESULTS
3.1 Segmentation Results
3.2 Variation of Features
3.3 Variation of Classification
4. DISCUSSION and CONCLUSION
4.1 Segmentation Accuracy
4.2 Feature Variation
4.3 Cell Classification and Discriminating Power of Features
4.4 Conclusion
REFERENCES

LIST OF TABLES

Table 1: Discriminant Function Classification Table
Table 2: Comparison of Segmentation Procedures on a Database of 150 Images
Table 3: Comparison of Selected Segmentation Procedures on a Database of 1000 Images
Table 4: Segmentation Performance of a Simple 2D Histogram Analysis Followed by Two Iterations of the Edge Relocation Algorithm Plus Postprocessing
Table 5: Variation of the Shape Features and Some Texture Features Among the 15 Segmentation Procedures
Table 6: Variation of Discrete Texture Features Among the 15 Segmentation Procedures
Table 7: Variation of Continuous Texture Features Among the 15 Segmentation Procedures
Table 8: Classification Results for Various Segmentation Procedures and Combination of Features
Table 9: Classification Results for Various Segmentation Procedures and Combinations of Features
Table 10: Feature Importance in Discriminant Function Analysis
Table 11: Combined Normal and Abnormal Jackknife Classification Results for Various Segmentation Procedures and Combinations of Features
Table 12: Combined Normal and Abnormal Jackknife Classification Results for Various Segmentation Procedures and Combinations of Features

LIST OF FIGURES

Figure 1: Spectra of Cellular Stains
Figure 2: The Digitized RGB Images of a Stained Cervical Cell
Figure 3: Interactive Segmentation of a Stained Cervical Cell Using the Two Dimensional Histogram of Red and Blue Images
Figure 4: Diagram of Major Components of the Modified Cell Analyzer Imaging System
Figure 5: Schematic Diagram of System
Figure 6: Sony DXC-3000 3 Chip CCD Camera Light Intensity Response Curve
Figure 7: Gradient-Weighted and Average Gradient Histograms
Figure 8: Individual Pixel Threshold Assignment Using a Four Point Lagrangian Interpolation
Figure 9: Three Different Methods of Determining a Threshold
Figure 10: An Example of the Nuclear Radial Threshold Selection Process
Figure 11: An Example of the Effects of a Relaxation Process on the Intensity Distribution of an Image
Figure 12: Generation of Possible Edge Mask
Figure 13: Erosion of Possible Edge Mask
Figure 14: Results of the Edge Relocation Algorithm
Figure 15: Determination of Cytoplasm to be Used to Correct Nuclear OD Value
Figure 16: Determination of Areas FA1 and FA2 for the Fractal Dimension Calculation
Figure 17: Division of Nucleus into Areas of Different Chromatin Condensation States
Figure 18: A Two Dimensional Example of Group Separation Using Linear Discriminant Function Analysis
Figure 19: Optimal Separation Boundaries for Groups with Different Covariance Matrices
Figure 20: Four Correctly Segmented Images
Figure 21: Four Mildly Incorrectly Segmented Images
Figure 22: Distribution of the Nuclear Area of Normal and Abnormal Cells
Figure 23: Distribution of the Cytoplasmic Area of Normal and Abnormal Cells
Figure 24: Distribution of the NA/CA Ratio for Normal and Abnormal Cells
Figure 25: Distribution of the Nuclear IOD of Normal and Abnormal Cells
Figure 26: Distribution of the Compactness Ratio of Normal and Abnormal Cells
Figure 27: Distribution of the DNUM for Normal and Abnormal Cells
Figure 28: Distribution of the Markov Texture Feature Correlation for Normal and Abnormal Cells
Figure 29: Distribution of the Discrete Texture Feature TARH for Normal and Abnormal Cells
Figure 30: Distribution of the Fractal Dimension Feature for Normal and Abnormal Cells

ACKNOWLEDGEMENT

I would like to acknowledge Dr. Haluk Tezcan's assistance in the staining and deposition of the cervical cells. Our discussions and his suggestions were always most helpful. Also thanks to Dr. B. Palcic, whose guidance and supervision made all this work possible. Alan Harrison and Steven Poon's programming and hardware assistance was greatly appreciated. Finally, I would like to thank Vel Kinnie, Susan Grose and Paddi Tieszen for their inestimable assistance in the typing and presentation of this thesis.

1. INTRODUCTION

Cancer of the uterine cervix is one of the most common cancers in women. The incidence rate of this cancer is approximately 28 in 100,000 women and the mortality rate can be as high as 15 per 100,000 women.[1] The incidence of pre-cancerous lesions, which are believed to eventually transform into cancerous lesions, is increasing.[2] If pre-cancerous and cancerous lesions are undetected or left untreated, the mortality rate by age 65 ranges from 0.1% to 1%.[3]

Where an effective screening programme has been in place, the mortality rate can drop to as low as 3 per 100,000 women per year.[2] Such a screening programme has been in place in British Columbia for many years. In British Columbia, a cervical smear is taken from every woman on average once every 2 years from the onset of sexual activity until age 35, and every five years thereafter.[2]
For a cervical smear, a tissue sample is scraped off the cervix and smeared onto a slide. This material is then fixed and stained, and the cervical cells on the slide are examined. In examining the slide, the pathologist or experienced cytotechnician looks for abnormal cells which do not exhibit the usual features of cervical cells.

Changes in the nucleus are the most important criteria used for the cytological diagnosis of cervical cancer.[4] The visible nuclear organization reflects the cell's biological status. No single structural change in the nucleus is considered diagnostic in itself. A combination of several nuclear abnormalities is necessary for a diagnosis.[4] The following are some of the cellular changes used in visual examinations:[4]

1) Nuclear hypertrophy or size: Usually compared with the size of the cytoplasm. A large nucleus with little cytoplasm can be indicative of an abnormal cell.
2) Nuclear shape variation: When the nucleus is no longer oval or elliptical but becomes irregular in shape. Single or multiple nuclear protrusions can be important.
3) Hyperchromatin: Usually indicative of increased amounts of DNA in the nucleus.
4) Chromatin irregularity: Changed, non-uniform distribution of DNA in the nucleus, measured by nuclear texture parameters. These give one of the most important criteria for malignancy.
5) Multinucleation: Multinucleation of the cell can result in a convoluted nuclear shape.
6) Nuclear membrane changes: Indentation, lobulation, protrusion, and extensive wrinkling are important indicators in the diagnosis of atypia.

In British Columbia (population approx. 2.4 million), on average 2500 to 3000 of these slides are examined every working day. This is a very labour intensive task, which only highly skilled cytotechnicians can perform. Therefore, only a few countries in the world have succeeded in monitoring the population. Even some developed countries, e.g. U.S.A., Great Britain and others, cannot manage this task due to the tedious nature of this work.

More than 35 years ago, it was recognized that an automated pre-screening system could greatly assist people in this task.[6] During this period excellent ideas and algorithms for the recognition of cellular components and the numerical description of the cellular features have been developed.[6-10] Numerous systems have been designed and tested for this purpose.[8, 11-14] It now appears likely that a cost effective system which can perform the required tasks in an automated way will be made possible by the high speed computational devices and solid state sensors developed in this decade.

An automated device which is able to perform quantitative measurements on stained cells and discriminate between normal and atypical cells must perform the following tasks:

1) find and recognize stained cells on the slide;
2) segment the images of the cells into nuclear and cytoplasmic areas;
3) numerically describe the characteristics (features) of the cells; and
4) use these features to discriminate between normal and malignant cells.

The most difficult of these tasks to automate are those that humans find the easiest: the recognition of the cells and their constituents. The tasks which are simpler to automate are the ones humans find the most difficult to perform reliably: the feature extraction and the cell discrimination.

The thrust of this thesis is to research and develop several new segmentation methods, compare their performance to those from the literature, and, after finding the best, test the hypothesis that quantitative cellular descriptors (features) can be used to match the performance of skilled cytology technicians.
In doing this, we want to determine the dependence of the numerical descriptors of the cellular images upon segmentation accuracy, and the dependence of cell classification upon the type and accuracy of the numerical descriptors.

1.1 Status of Automatic Cervical Cell Analysis Using Imaging Systems

Since the introduction of the PAP smear to detect the early stages of cervical cancer in the 1950's, various researchers have undertaken to develop a quantitative system to perform or aid in the analysis of cervical smears or cells. In the late 1950's a group at the Airborne Instruments Lab developed the Cytoanalyzer,[15] the first system developed to analyze cervical smears. Its screening performance was judged to be inadequate for the instrument to be practical.[16] An outgrowth of Cytoanalyzer was Cydac,[16] but this system was never used to automatically screen cervical smears. The TICAS project started in 1967, depended heavily on operator interaction, and was never intended as an automatic screening device.[8] However, it was intended to be used as an aid in the final diagnosis in hard cases.[17] It was also used as a research tool to investigate the parameters and decision criteria that should be used in an automated screening system. This project has evolved into a system for rapid high-resolution cytometry.[18] Following TICAS, a number of interactive image analysis systems were developed.[19-21]

In the early 1980s, several groups were beginning to build systems (prototypes) to automatically screen cervical smears. Some of these systems are the DIASCANNER (Swedish group),[11] the CYBEST (Biomedical Laboratories, Japan),[22] the BioPEPR (Nijmegen University, Netherlands),[23] CERVIFIP (Edinburgh, Scotland),[24] and the LEYTAS (Netherlands).[13]

CYBEST is one of the most advanced systems that has yet been developed for prescreening cervical smears, but it has a false positive rate of 30.7% and a false negative rate of 2% on a slide by slide basis.[22]

The DIASCANNER system is the culmination of eleven years of image analysis algorithm development, software implementation and statistical evaluation.[11] As such, it is one of the devices closest to fulfilling the requirements of an automated prescreening device. Most of the algorithms developed for this work will be compared with those defined in the body of work describing DIASCANNER.[6]

2. MATERIALS AND METHODS

2.1. Sample Preparation

The samples were collected from the transformation zone of the uterine cervix of 37 different subjects.[25] The standard Papanicolaou staining generally results in clumps of cells, with many of the cells overlapping and with poor colour separation between the nucleus and the cytoplasm. In addition, the nuclear stain is not stoichiometric for DNA. While this method is the one routinely used by cytological laboratories for the human interpretation of cervical smears, it is a non-optimal procedure for automatic, quantitative assessment of cervical cells.

To extract meaningful features from stained cells one needs slides optimized for quantitative measurements. These slides should meet the following requirements:

1) that the slide contains a relatively constant number of cells in a monolayer with a low overlap rate among the cells;
2) that there be a large detectable difference in the spectral characteristics (colour) between the cytoplasm and the nuclei; and
3) that the nuclear stain should be stoichiometric for DNA so that quantitative measurements of the DNA content of the nuclei are possible.

Dr. Haluk Tezcan has developed a sample preparation method which meets the above requirements.[26]
In this method the sample is collected with a wooden spatula and suspended in a cell fix solution of 15% PBS ethanol and dithiothreitol. The samples are disaggregated by using a two stage syringing approach. One syringing takes place before cell enrichment (which increases the cell concentration in suspension) and one just before the cell deposition on a microscope slide. Cell enrichment is achieved by centrifuging the cells at high speeds and then dissolving them to the desired cell concentration. The cells are then deposited using a simple sedimentation and smearing deposition method.[27] Finally, the cells are stained using the Feulgen-Thionin (SO₂) nuclear stain combined with the Orange (II) cytoplasmic stain.[28] These stains gave superior spectral separation when using a colour camera, which we employed in this work. The absorption spectra of the two stains are shown in figure 1 and a three colour image of a stained cell is shown in figure 2. The Orange (II) absorption is very weak in the red region of the spectrum, where the absorption by the Feulgen-Thionin (SO₂) is very strong. This made it possible to perform the nuclear segmentation (recognition of the extent and location of the nucleus) using the red image and the cytoplasmic segmentation using the blue image. Figure 3 shows how this can be done interactively. The intra slide variation of these stains is very low and the inter slide variation, while somewhat larger, is still manageable.[20]

2.2. Image Acquisition

All of the images used in this work were acquired, stored and analyzed on a modified Cell Analyzer Imaging System.[30]
The system consists of four major modules: 1) image acquisition module consisting of a microscope and camera, 2) microscope control module (not used in this work), 3) image processing module consisting of a frame grabber and imaging board, and 4) host computer, which includes storage devices. The microscope used in this work was a Nikon Optiphot with a PlanApo 40x (40/0.95) objective, a 100W halogen light with a stabilized power supply, a dispersion filter, a neutral color balance filter, and a 1X video projection lens. When used with the 3-chip CCD video camera (Sony, DXC-3000) the corresponding pixel size was 0.34 µm x 0.34 µm. This camera provides a simultaneous acquisition of red (600nm), green (540nm), and blue (460nm) images.[31] The frame grabber was an MVP-AT Matrox image processing board. Figures 4 and 5 show the layout of the system.

FIGURE 1: SPECTRA OF CELLULAR STAINS
[Plot: absorption spectra of the two stains; horizontal axis: Wavelength (nm), 400 to 700]
In this work Feulgen-Thionin SO₂ nuclear stain and Orange II cytoplasmic stains were used. Their visible absorption spectra differ as shown above. The Feulgen-Thionin SO₂ stain is stoichiometric for DNA, thus optical density is proportional to DNA amount.

FIGURE 2: THE DIGITIZED RGB IMAGES OF A STAINED CERVICAL CELL
These are images of a stained cervical cell taken from three different parts of the visible spectrum. Image A) is from the red part of the spectrum. Image B) is from the green part of the spectrum. Image C) is from the blue part of the spectrum.

FIGURE 3: INTERACTIVE SEGMENTATION OF A STAINED CERVICAL CELL USING THE TWO DIMENSIONAL HISTOGRAM OF RED AND BLUE IMAGES
The two dimensional (2D) histograms shown in A), B), C) and D) were generated by finding the frequency of occurrence of a pixel in the same position in the red and blue images having a red intensity value, r, and a blue intensity, b. In the two dimensional histogram the frequency of occurrence of an r, b pair is represented by how bright the spot is, and the position of the spot is determined by the values of r and b. Since the background is uniformly bright (high intensity value) in both the red and blue images, it appears as a bright blob in the lower right hand corner of the 2D histogram in A). In B) the background has been circled interactively (by human intervention, not automatically) and the corresponding pixels in an image of the cell shown in figure 2 have been removed. The cytoplasm, which absorbs blue light but not red light, appears in the 2D histogram as a vertical line of points and has been interactively circled in C). The corresponding pixels have been removed in an image of the cell and the result is displayed in the lower left corner of C). The nucleus absorbs both red and blue light and appears as a cluster of points in the upper middle area of the 2D histogram. This area has been interactively circled in D), the corresponding pixels in an image of the cell have again been removed, and the result is displayed in the lower left corner of D).

The quality of images as reflected by the spatial and photometric resolution is very important.
High quality images make the segmentation process simpler and more robust, and are also required for meaningful measurements of chromatin distribution.

Figure 4: DIAGRAM OF MAJOR COMPONENTS OF THE MODIFIED CELL ANALYZER IMAGING SYSTEM
The major components of the system are: a Nikon Optiphot microscope, a Sony DXC-3000 3-chip CCD camera, a PC AT microcomputer and add-in boards, an IBM monochrome monitor, and an RGB analogue monitor.

Figure 5: SCHEMATIC DIAGRAM OF SYSTEM
[Block diagram with four sections: Image Acquisition, Image Processing, Microscope Control, Host Computer]

To achieve this image quality, the following steps were performed:

1) at the beginning of each image acquisition session the camera was calibrated to ensure the proper color balance of the images and that the full photometric range of the digitizer (256 gray levels) was utilized;
2) 30-50 images for each color were collected and averaged to reduce the random noise in the images;[32]
3) each image was decalibrated, i.e. the image of the field of view without any cells or other objects present was subtracted from the image of the cells and an offset added to return the image background level to its pre-subtraction value, removing the effects of uneven illumination caused by the microscope optics and the fixed pattern noise of the camera.

Using the above procedures, the background variation was less than or equal to ±1 gray level.
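The frame averaging and decalibration steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used on the Cell Analyzer system: the function names `average_frames` and `decalibrate` are hypothetical, and the offset is assumed here to be the mean gray level of the empty-field image, whereas the text only says that an offset restores the background to its pre-subtraction level.

```python
import numpy as np


def average_frames(frames):
    """Average a stack of frames of the same field to reduce random noise."""
    return np.mean(np.asarray(frames, dtype=np.float64), axis=0)


def decalibrate(cell_image, blank_image):
    """Subtract the empty-field image and restore the background level.

    Assumption: the offset is the mean gray level of the empty field.
    The result is rounded and clipped to the 8-bit range (256 gray levels).
    """
    cell = np.asarray(cell_image, dtype=np.float64)
    blank = np.asarray(blank_image, dtype=np.float64)
    offset = blank.mean()  # assumed definition of the offset
    corrected = cell - blank + offset
    return np.clip(np.rint(corrected), 0, 255).astype(np.uint8)
```

In use, each colour channel would be averaged over its 30-50 frames first and then passed to `decalibrate` together with the averaged empty-field image for the same channel. Illumination shading present in both images cancels in the subtraction, which is why the background of the result becomes flat.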
A fundamental requirement for the determination of the integrated optical density of nuclei is an accurate measurement of the optical density of the individual pixels of the image.[33] While charge coupled devices (CCD) are notable for their response linearity,[32] the amplification/translation circuits in the camera electronics are such that they exhibit a markedly non-linear response to light intensity. Therefore, a Kodak step tablet No. 3 was used to determine the camera's photometric response. The response curve, shown in figure 6, was then translated into a look-up table (LUT). This LUT was subsequently used to correct individual pixel density measurements.

FIGURE 6: SONY DXC-3000 3 CHIP CCD CAMERA LIGHT INTENSITY RESPONSE CURVE
[Plot: Camera Calibration Curve; horizontal axis: Measured Intensity (gray levels)]
Due to the non-linear response of this camera (and most other video cameras), the output of the camera must be measured as a function of known illumination. A Kodak step tablet #3 was used to generate known light intensities and enable the construction of the above graph. This response curve was transformed into a look-up table (LUT) so that all subsequent intensity measurements were corrected using this LUT.

Approximately 4700 RGB images of stained cervical cells were collected from 18 cytologically normal (also not infected with the Human Papilloma Virus) samples and 19 dysplastic samples. From the overall database two subset databases were also formed. One subset contained 150 cells which were more difficult to segment. They contained one or more of the following traits: overlapping and/or folded cytoplasm, staining artifacts in the cells, or debris in and around the cells. These images were used to perform a preliminary evaluation of the various segmentation procedures. A second database of 1000 images was used to investigate the segmentation accuracy of several of the procedures used on the 150 image database. Also examined were the effects of the segmentation procedure on the calculated features, and on the classification of the individual cells. Due to a failure of the back up media, 200 images of normal cells were lost near the end of the analysis of the 1000 image database. Therefore, some of the results quoted for the 1000 image database may be calculated from only 800 images; this has been estimated to have a negligible effect on the final results and conclusions. The full 4700 image database was used to establish the accuracy of the most appropriate (accurate and rapid) segmentation procedure on a large set of images.

The 150 image database consisted of approximately 120 images of normal cells and 30 images of abnormal cells (CIN II or worse). The 1000 image database consisted of images of 500 normal cervical cells and 500 images of abnormal cells (CIN I or worse). A cell by cell classification for the full 4700 image database was not performed.

2.3. Segmentation Methods

An adequate segmentation of the areas of interest in images of stained cells is an important prerequisite for the extraction of meaningful cellular features. The areas of interest in this work are background, cytoplasm, nuclei, and artifacts. Artifacts are considered to be non-nuclear objects which display one or more of the characteristics of nuclei, i.e. absorbing light in a similar manner to nuclei and therefore being significantly darker than the surrounding background or cytoplasm.

To perform quantitative measurements of morphological features of stained cells in an automated way, adequate segmentation is extremely important. For automated systems, it is the precise and correct segmentation of the nuclei which is the most difficult, but critical, step in this process.
For this reason the majority of the segmentation methods described in this work delineate nuclei from the cytoplasm.

Generally, the more a priori knowledge that is incorporated into a segmentation algorithm, the more robust, reliable, and accurate the automated algorithm becomes.[34] A priori knowledge in this context refers to information and conditions assumed to exist in the image prior to its analysis. The following chapter discusses several segmentation methods as well as the type and extent of the a priori knowledge used implicitly or explicitly by these methods.

To detect and delineate objects in an image, some form of localized photometric non-uniformity must exist in either the intensity values, the perceived color values, or some localized texture measure. This non-uniformity information can be utilized in one of two ways. One method is to assume that the areas of interest are "moderately" uniform in some measured property. The other, complementary method is to search for local regions of "significant" non-uniformity which are assumed to separate the areas of interest. In the latter case, separation boundaries are usually referred to as "edges" and are generally characterized by a "large" change in some "local" photometric property. How the terms "moderately", "significantly", "large" and "local" are defined by the algorithms greatly determines their segmentation performance.

Using the above principles, several segmentation algorithms have been developed. Those related to this thesis are described below.

2.3.1. Simple 2D Histogram Analysis

Critical to this algorithm is the collection of two images at different wavelengths, λ₁ and λ₂, e.g. red and blue images.
This algorithm assumes that the background is moderately uniform and bright in both images. Here, "moderately" indicates that the background is an area of uniform intensity with a narrow gaussian distribution. The standard deviation of the gaussian is assumed to be finite and small relative to the differences in intensity between the background, cytoplasm and nucleus. Consequently this method finds the first trough below the high transmission peak in the histogram of the λ₂ image and uses it as a global threshold to separate the background from the cytoplasm. The background's contribution to the histogram of the λ₁ image is then removed. Finally, the first trough below the high transmission peak in the modified histogram of the λ₁ image is found and used as a global threshold to delineate the nucleus in the λ₁ image. This step uses knowledge of the staining properties described earlier and assumes that in the λ₁ image (e.g. red) the cytoplasm transmits more light than does the nucleus and that the cytoplasm has a relatively uniform absorption over its entire area, thus generating a peak in the histogram. The use of troughs in histograms as thresholds for segmentation is quite general and can be found in almost any text on image processing.[36] However, a multitude of methods exists for generating a variety of different types of histograms, most of which incorporate some form of a priori information.

2.3.2. Three Histogram Analysis

This algorithm may be sequentially applied to the blue then the red image when used as a color segmentation algorithm, or applied to only a single image when used as a gray-scale segmentation method. The three histogram analysis segmentation method calculates three different histograms.
Two histograms, H1(i) and H2(i), are gradient-weighted histograms [36], while the third histogram, H3(i), is an average gradient histogram [37]. Examples of these histograms are shown in figure 7. The a priori knowledge or assumptions about the images used in this algorithm are that the areas of interest (background, cytoplasm and nucleus) are approximately uniform in intensity. Therefore they appear as peaks in the gradient-weighted histograms and are separated by valleys in these histograms.

In a conventional histogram the valleys represent the intensity values which occur infrequently in the image. It has been suggested that the pixels in areas of large gradients are preferentially located on the edges of objects. If they are eliminated from the histogram, the valleys of the histogram will become more pronounced [38]. The histograms H1(i) and H2(i) are conventional histograms except that the pixels with gradient magnitudes larger than a specified cutoff value, "c", are not included in the histograms. The difference between histograms H1(i) and H2(i) is that the value of "c" is twice as large for histogram H2(i) as for H1(i), hence H2(i) includes more of the edge pixels than does H1(i).

The values of the average gradient histogram H3(i) are calculated as follows.
For each pixel of intensity i, the gradient magnitude value is calculated, and the average gradient for all pixels of intensity i is then determined. The histogram H3(i) thus represents the average gradient value of the pixels of intensity i [37]. The peaks in H3(i) represent a range of intensity values for which the photometric property of intensity varies "significantly", and consequently these intensity values should be in the vicinity of a "local" edge. Assuming that the edges separate areas which are moderately uniform in intensity, the locations of the peaks in H3(i) should represent acceptable thresholds.

[Figure 7: three panels, Histogram 1 and Histogram 2 (frequency versus gray level) and the Average Gradient Histogram (average gradient versus gray level), with the b/c and c/n thresholds marked on the gray-level axes.]

FIGURE 7: GRADIENT-WEIGHTED AND AVERAGE GRADIENT HISTOGRAMS
Gradient-weighted histograms are essentially conventional histograms (depicting the number of pixels in the image with the same intensity value as a function of intensity value) except that pixels with a gradient magnitude larger than a pre-determined value are removed from the histogram. The difference between histogram 1 (H1(i)) and histogram 2 (H2(i)) is that the pre-determined gradient magnitude value in H2(i) is twice as large as that of H1(i). The average gradient histogram, H3(i), is calculated as follows: for each pixel of intensity i, the gradient magnitude value is calculated and the average gradient for all pixels of intensity i is then determined.

The gradient operator involved in the formation of all three histograms is a modified Sobel gradient operator, which gives the gradient magnitude of the central pixel according to equation (1). All histograms are smoothed twice by a one-dimensional (1-D) median filter (1x3 median filter) and a 1-D mean filter (1x3 mean filter) to remove small irregularities.
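The construction of the three histograms can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the implementation used in this work: the function names, the 256-level default, and the choice to leave border pixels at zero gradient are assumptions, and the two 3x3 masks are taken from the modified Sobel operator of equation (1).

```python
import numpy as np

# 3x3 masks of the modified Sobel operator (coefficients as in equation 1).
KX = np.array([[-5, 0, 5], [-7, 0, 7], [-5, 0, 5]], dtype=float)
KY = np.array([[5, 7, 5], [0, 0, 0], [-5, -7, -5]], dtype=float)

def gradient_magnitude(image):
    """Gradient magnitude of every interior pixel; border pixels stay at zero
    (an assumption of this sketch)."""
    img = np.asarray(image, dtype=float)
    rows, cols = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for dr in range(3):
        for dc in range(3):
            # Neighbour at offset (dr - 1, dc - 1) of every interior pixel.
            patch = img[dr:rows - 2 + dr, dc:cols - 2 + dc]
            gx[1:-1, 1:-1] += KX[dr, dc] * patch
            gy[1:-1, 1:-1] += KY[dr, dc] * patch
    return np.sqrt(gx ** 2 + gy ** 2)

def three_histograms(image, cutoff, levels=256):
    """Return H1, H2 (gradient-weighted) and H3 (average gradient)."""
    img = np.asarray(image, dtype=np.intp)  # integer gray levels in [0, levels)
    grad = gradient_magnitude(img)
    # H1 and H2: conventional histograms without pixels above the cutoff;
    # the cutoff for H2 is twice that of H1.
    h1 = np.bincount(img[grad <= cutoff], minlength=levels)
    h2 = np.bincount(img[grad <= 2 * cutoff], minlength=levels)
    # H3: mean gradient magnitude of all pixels sharing a gray level.
    counts = np.bincount(img.ravel(), minlength=levels)
    grad_sum = np.bincount(img.ravel(), weights=grad.ravel(), minlength=levels)
    h3 = np.divide(grad_sum, counts, out=np.zeros(levels), where=counts > 0)
    return h1, h2, h3
```

The smoothing and valley-finding steps are not included; the sketch only shows how the cutoff "c" shapes H1(i) and H2(i) and how H3(i) is averaged per gray level.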
The gradient magnitude of the central pixel given by the modified Sobel operator is

$$ G = \left\{ \left( \begin{bmatrix} -5 & 0 & 5 \\ -7 & 0 & 7 \\ -5 & 0 & 5 \end{bmatrix} \right)^{2} + \left( \begin{bmatrix} 5 & 7 & 5 \\ 0 & 0 & 0 \\ -5 & -7 & -5 \end{bmatrix} \right)^{2} \right\}^{1/2} \qquad (1) $$

where each 3x3 mask denotes the weighted sum of the intensities in the neighborhood of the central pixel.

A simple valley-finding procedure is used on each of H1(i) and H2(i) to find the thresholds:
i) T1b/c, the threshold from histogram H1(i) which subdivides the image into background and cytoplasm,
ii) T2b/c, the threshold from H2(i) which subdivides the image into background and cytoplasm,
iii) T1n/c, the threshold from H1(i) which subdivides the image into cytoplasm and nucleus, and
iv) T2n/c, the threshold from H2(i) which subdivides the image into cytoplasm and nucleus.

The thresholds T3b/c and T3n/c are determined by finding the largest peaks in histogram H3(i) in the vicinity of T1b/c and T2b/c for T3b/c, and in the vicinity of T1n/c and T2n/c for T3n/c. From the three background/cytoplasm thresholds T1b/c, T2b/c, and T3b/c, the median threshold is the one used to perform the actual segmentation. Similarly, the median threshold is selected from the cytoplasm/nucleus threshold group.

In the color segmentation implementation, first the blue image is analyzed and the background/cytoplasm threshold is used to remove the background from both the red and blue images. What is left of the red image is then analyzed and the background/cytoplasm threshold found by the three histogram analysis is used to separate the cytoplasm and nucleus in the red image.

2.3.3. Threshold Selection Based on a Simple Image Statistic

This algorithm may be used to perform monochrome image segmentation, the subdivision of a single image into background, cytoplasm and nucleus, or color segmentation, the subdivision of two images of different colors into three classes: background, cytoplasm and nucleus. It is based on Kittler and Illingworth's [39] formulation of an image-statistic threshold selection procedure which is simple and widely applicable.
While this method was originally developed to segment monochrome images into only two classes, background and the object, it is possible to extend it to segment monochrome images into three classes, or to segment three classes from blue image and red image data for color segmentation.

The following is a short description of this threshold selection algorithm. If an image is made up of n x m pixels, and the value of the pixel in row x and column y is defined as s(x,y), then the gradient magnitude e(x,y) for pixel s(x,y) can be defined as the greater of e_x or e_y, where in the original work [39]:

$$ e_x = \lvert s(x-1,y) - s(x+1,y) \rvert \qquad (2) $$

and

$$ e_y = \lvert s(x,y-1) - s(x,y+1) \rvert \qquad (3) $$

However, for a more robust implementation one can instead use [32]:

$$ e_x = \lvert s(x+1,y-1) + 2s(x+1,y) + s(x+1,y+1) - s(x-1,y-1) - 2s(x-1,y) - s(x-1,y+1) \rvert \qquad (4) $$

and

$$ e_y = \lvert s(x-1,y+1) + 2s(x,y+1) + s(x+1,y+1) - s(x-1,y-1) - 2s(x,y-1) - s(x+1,y-1) \rvert \qquad (5) $$

The threshold selection algorithm defines the threshold, T, as:

$$ T = \frac{\sum_{x=2}^{n-1} \sum_{y=2}^{m-1} e(x,y)\, s(x,y)}{\sum_{x=2}^{n-1} \sum_{y=2}^{m-1} e(x,y)} \qquad (6) $$

Equation (6) can be described as the quotient of two sums taken over the entire image: the sum of the products of each pixel's gradient and its intensity value, and the sum of all gradients in the image. Thus, the pixels that contribute significantly to the numerator of the equation are only those that have large gradients and hence are likely to be part of the edge of the object. This algorithm assumes that the pixels which are part of the background or the inner part of the objects are more uniform in intensity than those at the edges of the objects. Division by the total gradient of the image is a means of averaging all of the edge pixel values in order to arrive at a pixel value which best represents the average edge pixel intensity and hence would be the best choice to divide the image into the background and the objects.
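As a concrete illustration of equation (6), the following sketch computes the gradient-weighted threshold with the simple differences of equations (2) and (3). It is a minimal sketch under stated assumptions: the function name is hypothetical, the first array axis is taken as the row index x, and returning `None` for an image without any gradient is a choice made here, not something specified in the text.

```python
import numpy as np

def edge_weighted_threshold(image):
    """Threshold T of equation (6), using the gradients of equations (2)-(3)."""
    s = np.asarray(image, dtype=float)
    # Differences across each interior pixel (x = row, y = column).
    ex = np.abs(s[:-2, 1:-1] - s[2:, 1:-1])   # |s(x-1,y) - s(x+1,y)|
    ey = np.abs(s[1:-1, :-2] - s[1:-1, 2:])   # |s(x,y-1) - s(x,y+1)|
    e = np.maximum(ex, ey)                    # e(x,y): the greater of the two
    total = e.sum()                           # denominator of equation (6)
    if total == 0:
        return None                           # no edges: threshold undefined
    return float((e * s[1:-1, 1:-1]).sum() / total)
```

For a vertical step edge between regions of intensity 50 and 150, the only pixels with non-zero gradient are the two columns adjacent to the step, so the threshold falls midway between the two intensities, at 100.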
As this algorithm in the above form assumes that there is only background and the object in the image, it will not perform properly if the object has two or more areas of different intensities, i.e. a dark nucleus and lighter cytoplasm. This problem can be circumvented by not calculating a global threshold, but by computing a spatially variable threshold. In that case, the image is first partitioned into smaller windows for which relevant thresholds are independently calculated. The assumption here is that for stained cells the nucleus is usually surrounded by cytoplasm, and one only needs to partition the image such that it is unlikely that any one of the smaller windows encompasses edges from both the background/cytoplasm and cytoplasm/nucleus regions.

For images that exhibit a contrast between the background and cytoplasm and between the cytoplasm and the nucleus, as for example in a stained cervical cell, the thresholds cluster into two distinct groups: one group for the background/cytoplasm boundary and the other for the cytoplasm/nucleus boundary. A threshold derived from the global histogram of the image is used to determine if the window background/cytoplasm thresholds are reasonable. Bright spots in the image can generate high local thresholds which are easily detected and ignored when compared against the global threshold. Similarly, low thresholds generated by the edges of overlapping cytoplasm (causing dark zones in the cytoplasm) can also be detected and ignored.

Some care must be taken since some windows will not encompass any true edges. The algorithm recognizes windows with meaningful edges by discarding any window threshold for which the denominator of equation (6) [30] is not large enough.
This can be done by using the simple image statistic algorithm to threshold the denominators of all windows covering the image. This assumes that the edge magnitude of the background/cytoplasm boundary is approximately the same size as the edge magnitude of the cytoplasm/nucleus boundary. While this is generally true, it is sometimes not the case. For example, in some images the edge of the nucleus is very intense while the background/cytoplasm boundary is weak. In these images the simple image statistic algorithm chooses a denominator threshold which is too high, and the windows containing the cytoplasm boundary are discarded (not thresholded).

The window thresholds can also be used to calculate appropriate thresholds for individual pixels. In this algorithm the method defines the threshold on a pixel by pixel basis using a four-point Lagrangian interpolation [39] among window thresholds (figure 8). The threshold at each pixel is determined by a linear weighted sum of the thresholds of the windows nearest the pixel. Each window threshold is assumed to be located in the window center, and the weight assigned to each threshold is determined by its distance from the pixel location under consideration, as indicated by the formula in figure 8.

For the color segmentation application of this algorithm, the λ2 (blue) image is segmented into only background and object of interest, cytoplasm including the nucleus. The background is then removed from the λ1 (red) image, and the algorithm segments the remainder of the λ1 image into two classes: cytoplasm and nucleus. In this way the algorithm must determine only two classes for each processed image, the task for which it was originally designed. Consequently, the color implementation should be more robust and accurate than the monochrome implementation.
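The pixel-by-pixel interpolation among window thresholds described above can be written as a single weighted sum. The sketch below follows the formula of figure 8; the function name is hypothetical, and the reading of the distances is an assumption of this sketch: `a` and `b` are taken as the distances from the pixel to the column of window centers A/B and C/D respectively, and `c` and `d` as the distances to the row of window centers A/C and B/D.

```python
def interpolate_threshold(t_a, t_b, t_c, t_d, a, b, c, d):
    """Four-point interpolation of the window thresholds at one pixel.

    Each window threshold is weighted by the distances to the opposite
    windows, so the nearest window center contributes the most.
    """
    weighted = b * d * t_a + b * c * t_b + d * a * t_c + c * a * t_d
    return weighted / ((a + b) * (c + d))
```

With this reading, a pixel sitting exactly on the center of window A (a = 0 and c = 0) receives the threshold of window A, and a pixel equidistant from all four centers receives the mean of the four thresholds.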
[Figure 8: four adjacent windows, A and C in the upper row and B and D in the lower row, with the pixel threshold given by

$$ T_P = \frac{b\,d\,T_A + b\,c\,T_B + d\,a\,T_C + c\,a\,T_D}{(a+b)(c+d)} $$ ]

FIGURE 8: INDIVIDUAL PIXEL THRESHOLD ASSIGNMENT USING A FOUR-POINT LAGRANGIAN INTERPOLATION
Individual pixel threshold (T_P) assignment is based on a four-point interpolation (the actual formula is given in the figure) between the window thresholds (T_A, T_B, T_C and T_D) of the nearest four windows.

2.3.4. Local Histogram Threshold Selection

This algorithm is similar to that of 2.3.3 in that it subdivides the images into smaller windows and calculates a threshold only for those windows which cover significant edges. However, the actual window threshold selection is performed by a valley-finding routine on a conventional histogram of the window. This algorithm assumes that the areas of interest are relatively uniform only locally and that these are separated by sparse, non-uniform areas. Thus the algorithm does not depend on the areas of interest being uniform over the entire image. The color segmentation application of this algorithm is also similar to that of 2.3.3; however, the local thresholds are calculated by the local histogram threshold selection algorithm.

When algorithms 2.3.3 and 2.3.4 were used to perform monochrome segmentations, it was found that the calculated cytoplasm/nucleus thresholds were neither consistently nor reliably the best possible thresholds. However, these thresholds were usually in the immediate neighborhood of the best thresholds and could thus be used as the starting location for a restricted search employing the global histogram. The smoothed (filtered) global histogram of the image can be searched for the deepest valley in the immediate vicinity of the previously determined cytoplasm/nucleus thresholds rather than throughout the histogram, saving time while also increasing accuracy.
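A restricted valley search of this kind might look as follows. This is a sketch under assumptions rather than the routine used in this work: the smoothing borrows the 1x3 median and 1x3 mean filters described for the histograms of 2.3.2, the "deepest valley" is taken to be the lowest smoothed histogram value inside the search window, and the function names and the `radius` parameter are hypothetical.

```python
import numpy as np

def smooth_histogram(hist):
    """1x3 median filter followed by a 1x3 mean filter (end values replicated)."""
    h = np.asarray(hist, dtype=float)
    p = np.pad(h, 1, mode="edge")
    med = np.median(np.stack([p[:-2], p[1:-1], p[2:]]), axis=0)
    p = np.pad(med, 1, mode="edge")
    return (p[:-2] + p[1:-1] + p[2:]) / 3.0

def refine_threshold(hist, start, radius):
    """Lowest point of the smoothed histogram within `radius` levels of `start`."""
    h = smooth_histogram(hist)
    lo = max(0, start - radius)
    hi = min(len(h), start + radius + 1)
    return lo + int(np.argmin(h[lo:hi]))
```

Because only `2 * radius + 1` gray levels are examined, the search cannot jump to an unrelated valley elsewhere in the histogram, which is the point of starting from the window-derived threshold.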
When used to perform color segmentation, algorithms 2.3.3 and 2.3.4 usually segment the nuclei correctly, but occasionally miss most or part of the cytoplasm. For this reason a separate cytoplasm color segmentation method was implemented which could be applied independently of the nuclear segmentation method. This is described below (2.3.5).

2.3.5. Three-Dimensional Thresholding

This method defines the cytoplasm using a three-dimensional thresholding of a trivariate histogram. The algorithm is very similar to the two-dimensional thresholding used by E. Bengtsson et al. [40], but assumes that three images at different wavelengths, λ1, λ2 and λ3, are available. It is assumed that the highest transmission area in the image is the background, which also has a narrow, symmetric, sharply peaked intensity distribution. These are reasonable assumptions for calibrated images of cells deposited as monolayers. The intensity location of the high transmission peak in each of the one-dimensional histograms corresponding to the λ1, λ2 and λ3 images is found and denoted as P1, P2 and P3.

A threshold for each image is calculated in the following manner. Each histogram is smoothed (filtered) and examined. If a trough exists in the histogram between the high transmission peak P and P minus 15 gray levels, P-15, then this trough location is used as a threshold. If a trough cannot be found, then the second thresholding method is tried. This method finds the total number of pixels, N, which have intensities higher than the high transmission peak P. Also calculated for this method is the number of pixels N_L(I) whose intensities fall between the high transmission peak and the intensity value I, including the number of pixels with intensity I.
The largest value of I for which N_L(I) > 1.5 N is used as the threshold for the image, provided P > I > P-15. If the condition N_L(I) > 1.5 N is not satisfied for this range of I, then the third thresholding method is used. In this method P-5 is arbitrarily used as the threshold.

The three thresholds found, one in each of the λ1, λ2 and λ3 images, and the highest intensity values of the 3D histogram define a box around the high transmission peak which contains only background pixels. All pixels which have intensities larger than all three thresholds are assumed to belong to the background, and the remaining pixels to belong to the cytoplasm and nuclei. Figure 9 depicts this procedure.

2.3.6. Split and Merge Procedure

In this procedure the areas of interest are assumed to be regions of connected image points that are moderately homogeneous in some local photometric property, e.g. intensity values, texture, etc. The homogeneity is measured by a uniformity predicate. This predicate determines the interpretation of "moderately homogeneous".

The detection of uniform areas in images can be divided into three different approaches [41]:
1) Region merging: the image is initially constructed of many small regions (pixels) which are merged so that the image is constructed of a few larger regions;
2) Region splitting: a large region, i.e. the entire image, is split into smaller and smaller regions until the individual regions satisfy a uniformity criterion;
3) Region splitting and merging: a combination of the previous two approaches.

This last approach was originally used by Horowitz and Pavlidis [42] and incorporates the procedural modification suggested by Cheevasuvit et al. [43]. A brief summary of this procedure is as follows.
[Figure 9: three histogram panels, A), B) and C), plotted against intensity (gray levels), each marked with the positions P, T and P-15.]

FIGURE 9: THREE DIFFERENT METHODS OF DETERMINING A THRESHOLD
In segmentation method 2.3.5, the thresholds for the individual colors may be determined by three different procedures shown in A), B) and C). All three procedures first need to find the position of two features in the histograms: the background peak location, P, and the background peak location minus 15 gray levels, P-15. In A) the existence and location of a valley between P and P-15 determines the threshold value. In B) the background peak is not symmetric and the threshold is positioned where the peak asymmetry becomes larger than some pre-determined amount (see text). In C) no valley exists between P and P-15, and the peak is symmetric, thus segmentation method 2.3.5 decides that the threshold should be located at P-5, a value which was heuristically determined.

Let X be an image and s(i,j) the intensity value of the pixel located in position (i,j). Let B_n be the n-th connected subset of X, i.e. B_n is a group of connected pixels in X which share some common property. The uniformity predicate P can be defined for any subset B_n of the image X as:

P(B_n) = true if and only if the image subset B_n fulfills the uniformity criterion,
P(B_n) = false otherwise.

The uniformity criterion used is that of Horowitz and Pavlidis [42], which requires that

$$ \lvert s(i_1,j_1) - s(i_2,j_2) \rvert < \varepsilon \qquad (7) $$

for all pixels (i_1,j_1) and (i_2,j_2) in B_n. This simply states that all subsets, B_n, of the image X be uniform, such that the intensity variation within each subset is less than ε over the entire subset.
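Since the largest pairwise difference within a subset is simply its maximum minus its minimum, the uniformity predicate of equation (7) reduces to a one-line test. The sketch below is illustrative only; the function name and the representation of a subset as an array of its pixel intensities are assumptions.

```python
import numpy as np

def is_uniform(pixels, eps):
    """Uniformity predicate P(B_n) of equation (7).

    `pixels` holds the intensities of one connected subset B_n. Every pair of
    pixels differs by less than eps exactly when max - min is less than eps.
    """
    values = np.asarray(pixels, dtype=float)
    return bool(values.max() - values.min() < eps)
```

The same test can be applied to the pooled pixels of two neighboring regions to decide whether their union would still satisfy the predicate.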
The split and merge procedure provides a segmentation for which the following conditions are met:
1. The regions of connected pixels, B_n, when taken together completely make up the image X, i.e. there are no pixels in X which are not in one of the subsets B_n.
2. None of the regions of connected pixels, B_n, overlap, or more formally, the intersection of subset B_x with B_y is the empty set.
3. For all the pixels in each subset, B_n, equation (7) is true.
4. If B_x and B_y are two regions of connected pixels which are next to each other, equation (7) is false for some of the pixels in the region made up of the pixels of B_x plus the pixels of B_y.

The actual procedure followed by the algorithm is:
1. Partition the image into a regular array of large square blocks, e.g. 8 x 8 pixels.
2. Evaluate the uniformity predicate for each block.
3. Merge adjacent blocks which individually and collectively satisfy the uniformity predicate (the actual merging sequence used is not straightforward and is described below).
4. Split all blocks for which the uniformity predicate is false into four smaller blocks.
5. If blocks are now the size of single pixels, do step 3 and stop; otherwise go to step 2 and continue.

Each square block for which the uniformity predicate has just been evaluated has four adjacent neighbor regions. The standard merging procedure [42] does not check all the neighbors of a region (block) to determine the optimal merging of regions. The union of each neighboring region (for which the uniformity predicate is true) with the block under consideration should be evaluated, and the grouping of regions for which the value of the uniformity criterion is smallest is merged. This modification to the previously described merge procedure improves the quality of the merging [43].
Even with the above modification, the split and merge method is optimized only locally and not globally. Thus, due to the competition between adjacent subsets, a small variation in the segmentation process, e.g. a slight variation in the starting point or the size of ε, can produce very large differences in the result [43]. Another modification to the split and merge procedure, suggested by Hassman and Liedtke [41], is to bias the merging procedure such that blocks are merged preferentially with larger, older neighboring regions as opposed to smaller, younger neighboring regions.

The determination of ε greatly affects the performance of the split and merge algorithm. A straightforward determination of ε, as suggested by Hassman and Liedtke [41], is based on the number of significant peaks found in the intensity histogram of the image. This algorithm requires that the spacing between peaks be constant. Thus a piecewise linear gray level transform must be applied to the image such that the spacing between the peaks becomes constant and the gray levels are adjusted so as to cover the full eight-bit dynamic range. The gray levels in between the newly shifted peaks are linearly interpolated. The value of ε is then calculated to be

$$ \varepsilon = \frac{256 - c}{\mathrm{Numpeaks}} \qquad (8) $$

where Numpeaks is the number of significant peaks, and c is a corrective term, the value of which is determined experimentally.

This split and merge routine usually results in the background becoming one connected region and the cytoplasm and nucleus forming a collection of connected regions which cannot be merged without violating the uniformity predicate. Also, the region boundaries have a tendency to be unduly coarse.
The latter can be reduced by replacing each region in the image by the average intensity for that region and then applying a relaxation process [44,45] to force the collection of connected regions to belong to either the nucleus or cytoplasm. The relaxation process will also smooth out the coarse boundaries of the various regions. The actual relaxation process is described in 2.3.8.

Without using an involved labelling scheme and a much more complicated relaxation process, it is difficult to segment a single gray scale image into background, cytoplasmic and nuclear areas using the split and merge process. When using color images, the split and merge method uses the red image to find the nuclear areas and the blue image to identify the background and cytoplasm.

2.3.7. Nuclear Radial Contouring

This algorithm, as well as all subsequent methods, requires a segmentation result from one of the previously described methods. They are used to improve the delineation of the nuclear area. All require some general information as to the size and location of the nuclei for which the segmentation is to be improved.

A radial contouring method as described by Bengtsson et al. [46] may be used to refine the segmentation of the nuclei. In this procedure the area of interest has already been roughly defined using a nuclear segmentation. This particular application uses previous threshold level information and topological knowledge about the nuclei, such as the location of the center of the individual nuclei and the shape of nuclei. This is accomplished by first finding the center, C, of the roughly defined nucleus.
This point C is then used as the origin of a polar coordinate system, and the image is transformed from a cartesian coordinate array to an array of a prescribed number of evenly spaced radial vectors. The radial image is then filtered using a 1x3 average filter. The filter orientation is such that it averages pixels which are the same distance from the nucleus center. Thus this filter is a circular averaging filter.

From this new image, isodensity contours are generated for a range of intensity levels centered around the threshold used by the original nuclear segmentation. For each intensity level examined, an isodensity contour is generated by connecting the first (closest to the nuclear center) location along each radius that is lighter than the intensity level being analyzed. The relative smoothness of each isodensity contour is determined. Relative smoothness is calculated as the sum of the absolute differences in radial length between each pair of successive locations along the isodensity contour, divided by the average radial length of the contour. Thus, this smoothness measure will be small for contours which have constant radial distance, i.e. are circular in the cartesian coordinate image. The normalization by the average radial length is to make the smoothness measure less dependent on the size of the nucleus. The intensity value, from the previously specified range of intensity values, which has the smoothest isodensity contour is chosen as the new threshold which is used to segment the nuclei in the cartesian coordinate image. An example of some of the steps in this algorithm is shown in figure 10.

FIGURE 10: AN EXAMPLE OF THE NUCLEAR RADIAL CONTOURING THRESHOLD SELECTION PROCESS
The nuclear radial contouring algorithm is used to refine the segmentation of the nucleus, thus a rough nuclear mask is assumed to have been generated previously. This rough mask is used to find the approximate center of the nucleus in image A), and a radial transform image centered on this point is generated and shown in B). In the radial transform image the nucleus is seen as a dark strip along the left side of B). The roughness (see text) of isodensity contours at various intensities is calculated and can be displayed as histogram C), where height is proportional to roughness. The intensity level with the lowest roughness value, the deepest trough in the histogram, is used as a threshold to segment the nucleus of image A). The resulting nuclear segmentation is shown as the dark area in D).

2.3.8. Relaxation Process

This method uses a probabilistic relaxation process to modify the nuclear segmentation of one of the previous methods, and it also requires a selected threshold. The a priori information used by this algorithm is derived from the results of the previous segmentation and is expressed in the form of a probability that a pixel belongs to the cytoplasm and/or background (thereafter called background), or belongs to the nucleus, or belongs to the background but is also adjacent to the nucleus, or to the nucleus but is also adjacent to the background. This algorithm assumes that a pixel which has bright neighboring pixels should itself be bright, and similarly a pixel which has dark neighbors should itself be dark.

The relaxation process used in this work is very similar to that described by Rosenfeld et al. [46,47,48], with some of the modifications suggested by Peleg et al. [44]. Probabilistic relaxation can be used to classify a set of n objects A_1, ..., A_n into m classes C_1, ..., C_m. In our case, we wish to classify n pixels (where n is the number of pixels in the image) into only 2 classes: background and nucleus. Thus the process described below is for a subset of two classes of a more general case.

Each object (pixel) A_x has a local measurement of intensity i_x which can be used to estimate the probability, P_x(C_1) or P_x(C_2), of the object A_x belonging to the class C_1 or the class C_2. The probability of an object belonging to one of the two possible classes C_1 or C_2 must equal unity. Thus

$$ P_x(C_1) = 1 - P_x(C_2) \ \text{for all } A_x, \quad \text{and} \quad 0 \le P_x(C_y) \le 1. \qquad (9) $$

The relaxation process is an algorithm which iteratively updates the probabilities of an object belonging to class C_y (where y can be either 1 or 2). These probabilities are updated using a set of calculated compatibility coefficients r(A_x,C_y;A_b,C_d), which vary from 1 to -1. The compatibility coefficient r(A_x,C_y;A_b,C_d) reflects the compatibility of pixel A_x being assigned to the class C_y and pixel A_b belonging to class C_d. In this case we are interested only in nearest neighbor interactions, thus r() = 0 for all non-neighboring pairs of pixels, i.e. r() = 0 if pixel A_b is not next to pixel A_x. In the image only the 8 pixels touching a pixel are considered to be neighboring pixels. The compatibility coefficient should be close to one for compatible class assignments (e.g. A_x and A_b are both background pixels) and close to minus one for incompatible class assignments.
In the probability updating procedure, the average value of r() for each of the possible assignments of A_x to class C_y is calculated:

$$ q_{xy} = \frac{1}{8} \sum_{b=1}^{8} \sum_{d=1}^{2} r(A_x,C_y;A_b,C_d)\, P_b(C_d) \qquad (10) $$

Thus, for each pixel A_x to be updated, two q_xy's are calculated, q_x1 and q_x2, corresponding to assigning A_x to class 1 (background) and assigning A_x to class 2 (nucleus). In the general case, the updating process is defined as [48]

$$ P_x^{z+1}(C_y) = \frac{P_x^{z}(C_y)\,(1 + q_{xy}^{z})}{\sum_{j} P_x^{z}(C_j)\,(1 + q_{xj}^{z})} \qquad (11) $$

thus for our case of 2 classes, this is reduced to:

$$ P_x^{z+1}(C_y) = \frac{P_x^{z}(C_y)\,(1 + q_{xy}^{z})}{1 + P_x^{z}(C_1)\,(q_{x1}^{z} - q_{x2}^{z}) + q_{x2}^{z}} \qquad (12) $$

In these two equations P^z() represents the classification probability for the z-th iteration of the updating process and P^{z+1}() the classification probability for the (z+1)-th iteration.

One can estimate the initial probabilities P_x^0(C_y) in various ways. One way, suggested by Rosenfeld [48] and used in this work, is to approximate the image's histogram by a linear combination of two Gaussian probability distributions (one for each class) and then define P_x^0(C_1) and P_x^0(C_2) as the probability of a gray level belonging to one or the other of the Gaussian distributions.

The value of r() can also be estimated in a number of ways. In this instance the method used was that suggested by Rosenfeld et al. [45]:

$$ r(A_x,C_y;A_b,C_d) = \log_{10} \left[ \frac{\mathrm{prob}(\text{object } x \text{ in class } y \text{ and object } b \text{ in class } d)}{\mathrm{prob}(\text{object } x \text{ in class } y)\, \mathrm{prob}(\text{object } b \text{ in class } d)} \right] \qquad (13) $$

If one class of objects is prominent (makes up most of the image), using the above value of r() can lead to a non-informative segmentation (the entire image becomes one class). Peleg et al. [44] suggested a modification which corrects for this problem.
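Setting aside the choice of compatibility coefficients, one iteration of the two-class update of equations (10) and (12) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the implementation used in this work: the coefficients are assumed to depend only on the pair of classes, so that `r` is a 2x2 table with `r[y][d]` standing in for r(A_x,C_y;A_b,C_d) for any pair of neighboring pixels; border pixels take their missing neighbors by edge replication; and the function name is hypothetical.

```python
import numpy as np

def relax_step(p1, r):
    """One relaxation update of P(C1) for every pixel (two classes).

    p1 : 2-D array of current probabilities P_x(C1); P_x(C2) is 1 - p1.
    r  : 2x2 table of compatibility coefficients in [-1, 1] (assumed to be
         the same for every pair of neighbouring pixels).
    """
    p1 = np.asarray(p1, dtype=float)
    r = np.asarray(r, dtype=float)
    p = np.stack([p1, 1.0 - p1])                       # p[d] = P(C_d)
    padded = np.pad(p, ((0, 0), (1, 1), (1, 1)), mode="edge")
    rows, cols = p1.shape
    nsum = np.zeros_like(p)      # per class: sum of P_b(C_d) over 8 neighbours
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            nsum += padded[:, 1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    q = np.einsum("yd,dij->yij", r, nsum) / 8.0        # equation (10)
    numerator = p[0] * (1.0 + q[0])
    denominator = 1.0 + p[0] * (q[0] - q[1]) + q[1]    # equation (12)
    return numerator / denominator
```

With `r = [[1, -1], [-1, 1]]`, a region in which every pixel has P(C1) = 0.8 moves to 1.28 / 1.36, roughly 0.94, after one step, while a region at exactly 0.5 is left unchanged; repeated application therefore drives probabilities toward 0 or 1, as the text describes.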
The modified coefficient of Peleg et al. is

$$ r^{*}(A_x,C_y;A_b,C_d) = r(A_x,C_y;A_b,C_d)\,[1 - P_x^{0}(C_y)]\,[1 - P_b^{0}(C_d)] \qquad (14) $$

Unfortunately, this has a tendency to make the self-support term (y = d) for rare classes very large, which is usually an undesirable effect. We have found that the following equation produces reasonable results for the majority of the images used in this study:

$$ r^{*}(A_x,C_y;A_b,C_d) = r(A_x,C_y;A_b,C_d) \left[ 1 - \tfrac{1}{2}\, \mathrm{prob}(\text{object } x \text{ in class } y \text{ and object } b \text{ in class } d) \right] \qquad (15) $$

Further, the values of r*() are normalized such that the largest value of r*() has an absolute value of unity. The exact definition of r() does not seem to matter greatly in practice according to Rosenfeld et al. [45,48].

To summarize, this method assumes that if a pixel's neighbors are predominantly of one class, the probability that the central pixel also belongs to that class should be increased. The amount of the increase is dependent upon the compatibility coefficients, and the number and certainty (probability of belonging to one class or another) of its neighbors. The iterative probability modification continues until 90% of the pixels are unambiguously defined (probability of belonging to one class or another ≥ 90%). Figure 11 demonstrates the effects of this procedure on the intensity distribution of the image.

2.3.9. Edge Relocation Algorithm

This segmentation method generates a closed contour precisely along the edge of an area of interest, e.g. nucleus, which has been roughly defined previously. The a priori information utilized by the edge relocation algorithm is the intensity difference between the nucleus and the cytoplasm/background, connectivity of the nucleus, boundary gradient magnitude information along the edge of the nucleus, size of the nucleus, edge connectivity information, and the approximate location of the nucleus in the image.
The edge relocation algorithm requires as input: 1) the image for which the segmentation of the nucleus is to be refined; 2) the nuclear segmentation which is to be refined (to be known as the roughly segmented nuclear mask); 3) the gradient transform of the image to be segmented. Gradient operators tested were the Sobel operator [40], a 3x3 Range filter [50], and the Kirsch operator [51].

Figure 11: AN EXAMPLE OF THE EFFECTS OF A RELAXATION PROCESS ON THE INTENSITY DISTRIBUTION OF AN IMAGE
(Panels A to C: histograms over gray levels, each with the C/N threshold marked.)
The relaxation process is used to increase the contrast between the cytoplasm and the nucleus. This enhances the ability of the original threshold generated by one of the primary segmentation procedures to correctly segment the nucleus. Figure A) is the original histogram of an image along with the threshold selected by one of the primary segmentation procedures. Figure B) shows the histogram of the image after 6 iterations of the relaxation process. Figure C) shows the histogram of the image after 20 iterations of the relaxation process. Note how narrow the peaks in the histogram are and how large the valley between the peaks is. Thus any threshold within the large valley will now correctly segment the nucleus, whereas in A) the range for a correct threshold is much smaller.

The algorithm uses nuclear boundary information from the input nuclear segmentation in the following fashion. The input nuclear segmentation boundary (the pixels in the nucleus which have non-nuclear neighbors) is dilated several times, i.e. any pixel touching a boundary pixel becomes part of the dilated boundary. The actual number of dilations used should be matched to the magnification and size of the objects being segmented.
It was found that for the 40X objective two dilations were adequate. The dilated boundary is the first step in the formation of a mask which represents the image locations which must be analyzed to find the refined nuclear boundary, and thus is denoted as the possible edge mask. The second step is to also include in this mask all pixels in the roughly segmented nucleus which have intensity values larger than a given threshold. The actual threshold may be determined in a number of ways. For example, one can use a set value relative to the range of intensity values in the nucleus. A more complex procedure would include analysis of the histogram of the nuclear intensities, calculating the number of pixels below a given threshold as a function of the threshold value and selecting the threshold which includes a set proportion of the nuclear pixels. The third step is to remove all pixels in the roughly segmented nuclear mask below a given threshold such that their removal does not alter the connectivity (Euclidean topology) of the possible edge mask, i.e. the pixels which connect different parts of the possible edge mask together cannot be removed. The threshold used in this step is independent of that described in the second step. Those areas which are included in the roughly segmented nuclear mask but do not belong to the edge mask (they are part of the dark interior of the nucleus) are saved, as they will later be used as seed areas to refill the nucleus once the exact nuclear boundary is determined. Figure 12 shows an example of the determination of a possible edge mask.

FIGURE 12: GENERATION OF POSSIBLE EDGE MASK
A) Roughly segmented and labelled nuclear mask of the image shown in Figure 14A.
B) Dilation of the boundaries of image A.
C) Inclusion of the light areas of the nucleus to the possible edge mask in B. This is the result of the second step as described in the text.
D) Exclusion of the dark areas of the nucleus from the possible edge mask C. This is the result of step three, which includes the requirement that a pixel removal does not alter the topology of the possible edge mask. The dark area in image D is the possible edge mask.

Once the possible edge mask has been determined it is conditionally eroded. The criteria for the erosion of a pixel from the possible edge mask are:
1) The gradient magnitude of the corresponding pixel in the gradient magnitude image must be below a given threshold.
2) When processing the image from left to right, top to bottom, the pixels immediately to the left and above the pixel under consideration must not have been just removed. This ensures that in an area of similar gradient values, the edge will be located in the middle of the area and not along one of the extremities.
3) The removal of the pixel must not change the connectivity/topology of the possible edge mask.

In the beginning of the erosion process, a low gradient threshold is selected and the possible edge mask is eroded until no more pixels can be eroded for that threshold. Then the threshold is raised slightly and the erosion process is repeated. The amount it is raised each time can be constant or variable; in this work we used the latter approach. The amount the threshold was raised each time was chosen so that a constant number of pixels in the possible edge mask would have gradient values less than the new threshold. This process continues until the gradient threshold reaches the maximum gradient value present in the gradient image. See Figure 13 for an example of the erosion process on a possible edge mask.
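The variable threshold schedule described above can be illustrated with a short sketch. It is not the thesis implementation: the function name `gradient_threshold_schedule` and the parameter `pixels_per_step` are hypothetical, and the handling of tied gradient values is an assumption. The input is the list of gradient magnitudes of the pixels currently in the possible edge mask.

```python
def gradient_threshold_schedule(mask_gradients, pixels_per_step):
    """Return increasing gradient thresholds for the conditional erosion.

    Each threshold is chosen so that about `pixels_per_step` more mask pixels
    have gradient values below it than below the previous threshold.
    (Hypothetical helper; tie handling is an assumption.)
    """
    if pixels_per_step < 1:
        raise ValueError("pixels_per_step must be at least 1")
    ordered = sorted(mask_gradients)
    if not ordered:
        return []
    thresholds = []
    for k in range(pixels_per_step, len(ordered), pixels_per_step):
        # With no ties, exactly k pixels lie strictly below ordered[k].
        candidate = ordered[k]
        if not thresholds or candidate > thresholds[-1]:
            thresholds.append(candidate)
    # The process ends when the threshold reaches the maximum gradient value.
    if not thresholds or thresholds[-1] < ordered[-1]:
        thresholds.append(ordered[-1])
    return thresholds
```

For ten mask pixels with gradient values 1 to 10 and three pixels per step, the thresholds are 4, 7 and 10, so that 3, 6 and 9 pixels respectively fall below the successive thresholds.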
The result of this conditional erosion is a closed contour which surrounds the nucleus and follows the path which has the largest gradient values around the circumference of the nucleus (see Figure 14).

FIGURE 13:
A) Possible edge mask (same as Figure 12D).
B) Erosion of the possible edge mask in image A.
C) Erosion of image B.
D) Erosion of image C.

FIGURE 14: RESULTS OF THE EDGE RELOCATION ALGORITHM
A) Red image of a stained cervical cell. This is the cell processed in Figures 12 and 13.
B) Final edge mask from the erosion process as described in Figure 13.
C) Gradient transformation of the red image with the superposition of the final edge mask.
D) The final nuclear mask of the nucleus of the cell in A.

This closed contour represents the edge of the nucleus. The seed areas previously saved are used to fill the appropriate closed contours. In this manner the closed contours generated by the interaction of concave indentations in the nuclei during the dilation step are not filled in as nuclei. In the final step, the algorithm must determine whether the individual edge pixels belong to the nucleus or to the background. Two methods were examined: 1) based on the pixel's intensity, assign the pixel to the nucleus if its intensity value is closer to the nuclear mean intensity than to the outside mean intensity; 2) for a 3x3 pixel square neighborhood centered on the pixel in question, calculate the means of those pixels belonging to the nucleus and those belonging to the outside, and then assign the central pixel to the class which has the closer mean.

A feature of this algorithm is that the nuclear mask so generated can be used as an input for the roughly segmented nuclear mask defined previously. The iterative use of the edge relocation algorithm very quickly, in 2 to 3 iterations, results in a steady state nuclear segmentation. That is, any further iterations of this algorithm do not change the nuclear segmentation.
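The second edge-pixel assignment method (local 3x3 means) can be sketched as follows. This is an illustration only: the label values, the function name `assign_edge_pixel`, the tie-breaking rule, and the fallback used when one class is absent from the neighborhood are all assumptions, not details taken from the thesis.

```python
NUCLEUS, OUTSIDE, EDGE = 1, 0, 2  # hypothetical label values

def assign_edge_pixel(image, labels, row, col):
    """Assign the edge pixel at (row, col) to NUCLEUS or OUTSIDE.

    The means of the nuclear and outside pixels inside the 3x3 neighborhood
    are computed, and the pixel is given to the class whose mean is closer
    to its own intensity. Other edge pixels in the window are ignored.
    """
    sums = {NUCLEUS: 0.0, OUTSIDE: 0.0}
    counts = {NUCLEUS: 0, OUTSIDE: 0}
    for r in range(max(row - 1, 0), min(row + 2, len(image))):
        for c in range(max(col - 1, 0), min(col + 2, len(image[0]))):
            label = labels[r][c]
            if label in sums:
                sums[label] += image[r][c]
                counts[label] += 1
    # Assumed fallback: if one class is missing locally, use the other.
    if counts[NUCLEUS] == 0:
        return OUTSIDE
    if counts[OUTSIDE] == 0:
        return NUCLEUS
    nuclear_mean = sums[NUCLEUS] / counts[NUCLEUS]
    outside_mean = sums[OUTSIDE] / counts[OUTSIDE]
    value = image[row][col]
    # Assumed tie-break: equal distances go to the nucleus.
    if abs(value - nuclear_mean) <= abs(value - outside_mean):
        return NUCLEUS
    return OUTSIDE
```

With a dark nucleus (intensity 60) below a bright background (intensity 200), an edge pixel of intensity 90 is assigned to the nucleus, while one of intensity 180 is assigned to the outside.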
Once the image(s) have been segmented by one method or another, each object in the image needs to be uniquely labelled so that the appropriate features may be calculated for the correct objects. The type of connectivity used to label the objects can make a difference in cellular measurements [52]. Eight-connectedness, where edge-adjacent and corner-adjacent pixels are considered neighbors, assuming a rectangular tessellation, is used in this work for all connectivity analysis unless stated otherwise.

2.4. Cellular Features

Once the cervical cell images have been segmented and labelled, one can numerically describe the cytoplasm-nucleus pairs which constitute individual cells. The numerical description should be such that the differences between the various cell types (normal mature squamous epithelial, mildly dysplastic, etc.) are numerically detectable. Almost all of the features described in this section are either directly from the literature [19,53,61] or are modified features from the literature. Some were developed for this work, but may also exist elsewhere. Since the cervical cells are represented by collections of connected pixels in rectangular arrays, some of the more straightforward features are defined below.

1) CA Cytoplasm area: the number of connected pixels forming an object which has the spectral properties of the cytoplasm.
2) NA Nuclear area: the number of connected pixels forming an object which has the spectral and shape properties of nuclear material.
3) NComp Nuclear Compactness [62]: is the (circumference)^2 of the nucleus-like object divided by 4πNA. The definition of circumference used is very similar to that of Freeman [63], with a corrective term similar to that of Vossepoel and Smeulders [64].
The actual algorithm used for the circumference was:

$$\text{Circumference} = N_1 + \sqrt{2}\,N_2 + 2.0\,N_3 \tag{16}$$

where $N_1$ is the number of edge pixels in the object with only 1 non-object neighbor, $N_2$ is the number of edge pixels with 2 non-object neighbors, and $N_3$ is the number of edge pixels in the object with only 1 neighbor which belongs to the object. Four-connectedness, where only edge-adjacent pixels are considered neighbors, was used in this algorithm. Nuclear Compactness is thus defined as:

$$\text{NComp} = \frac{(\text{circumference})^2}{4\pi\,\text{NA}} \tag{17}$$

4) NInert Nuclear Inertia [65]: is 2π times the moment of inertia of the nuclear mask, J, divided by the nuclear area squared. In this instance $J = \sum r^2$, where the sum is over the pixels of the object and r is the distance of each pixel from the object's centre. Thus

$$\text{NInert} = \frac{2\pi J}{(\text{NA})^2} \tag{18}$$

5) NMeanR Nuclear Mean Radius: is the average distance from the center of the nucleus to the edge pixels of the nucleus.
6) NMaxR Nuclear Maximum Radius: is the largest distance from the center of the nucleus to the edge pixels of the nucleus.
7) NRVar Nuclear radial variance: is the normalized variance of the distribution of the distance from the center of the nucleus to the edge pixels of the nucleus.

$$\text{NRVar} = \frac{\text{Radial variance}}{\text{NMeanR}} \tag{19}$$

The other shape features depend upon interpreting the nuclear boundary as a radial function of orientation. The closed contour of the nuclear boundary can be transformed into a periodic radial function of theta, r(θ). In this implementation the boundary is represented by 128 radial vectors which represent the distance of the boundary pixels from the center of the nucleus. A Fast Fourier Transform (FFT) is then performed on the radial data so that the radial function r(θ) can be represented by a truncated series of sinusoidal waveforms [40]:

$$r(\theta) = \frac{a_0}{2} + \sum_{n=1}^{m} a_n \cos n\theta + \sum_{n=1}^{m} b_n \sin n\theta \tag{20}$$

In this series $a_0/2$ represents the average radial length.
The first two terms $a_1$ and $b_1$ represent the offset required by a circle which best fits (in the least square sense) the original contour. The next two terms $a_2$ and $b_2$ determine an ellipse which is fitted in the least square sense to the contour. The major axis of this ellipse is $a_0 + 2(a_2^2 + b_2^2)^{1/2}$ and the minor axis of this ellipse is $a_0 - 2(a_2^2 + b_2^2)^{1/2}$.

8) NEllong Nuclear elongation: is the ratio of the major axis to the minor axis of the above ellipse.

A useful shape descriptor should be independent of position, size, and orientation of the object to be described. Shape descriptors which include $a_n$ and $b_n$ in a symmetric way will have the above properties [40]. The power spectrum of the Fourier components has this property.

9) NBdycrc Nuclear boundary variation coarse: measures the energy in the frequency spectrum from the third to the tenth harmonic.

$$\text{NBdycrc} = \sum_{n=3}^{10} (a_n^2 + b_n^2) \tag{21}$$

10) NBdyfin Nuclear boundary variation fine: measures the energy in the frequency spectrum from the 11th to the 31st harmonic of the Fourier transform description of the nuclear contour.

$$\text{NBdyfin} = \sum_{n=11}^{31} (a_n^2 + b_n^2) \tag{22}$$

The frequency, intensity, and spatial organization of the density distributions in the objects are properties which are very useful in the discrimination between cell types. The following features describe the intensity distribution in the objects.

11) RMeanI Red mean intensity: is the mean intensity in the red image of the object after the individual pixel intensity values have been corrected for the non-uniform behaviour of the camera.
12) GMeanI Green mean intensity.
13) BMeanI Blue mean intensity.
14) IODR Integrated Optical Density of Red: is the integrated optical density of the object in the red image after the individual pixel intensity values have been corrected for the non-uniform behaviour of the camera [66]. The IOD is defined as

$$\text{IOD} = \sum_{A_n} \mathrm{OD}(x,y), \quad \text{where} \quad \mathrm{OD}(x,y) = \log_{10} I_B - \log_{10} I(x,y) \tag{23}$$

and $\sum_{A_n}$ is the sum over the n pixels in the object. I(x,y) is the corrected intensity at pixel (x,y) and $I_B$ is the average intensity of the background after camera correction. From an investigation of the IOD measurement process [37] it has been demonstrated that the background value $I_B$ should be estimated with the highest accuracy. For the cytoplasmic IOD calculations this requires the calculation of the average background intensity value. For the nuclear IOD calculations this requires the accurate calculation of the average intensity of the cytoplasm in which the nucleus is located. The average cytoplasmic intensity value is used as the background value in the OD calculation. By excluding all those areas of the cytoplasm in the vicinity of the artifacts and nuclei from the calculation, an accurate estimation of the average cytoplasmic intensity can be generated. Figure 15 demonstrates how the area used to estimate the average cytoplasmic intensity is determined.

15) IODG Integrated Optical Density of Green.
16) IODB Integrated Optical Density of Blue.
17) ODMax Optical Density Maximum: is the largest optical density value detected inside the object in the red image.
18) ODVar Optical Density Variation [55]: is the normalized variation of the optical density values found in the object in the red image.
19) ODSkew Optical Density Skewness [56]: is the third moment of the optical density distribution found in the red image representation of the object, normalized by the second moment of the optical density distribution in the object.

$$\text{ODSkew} = \frac{\sum_{A_n} (\mathrm{OD}(x,y) - \mathrm{ODmean})^3}{(n-1)\,(\mathrm{ODV})^3} \tag{24}$$

20) ODKurt Optical Density Kurtosis [55]: is the fourth moment of the optical density distribution found in the red image representation of the object, normalized by ODV squared.

$$\text{ODKurt} = \frac{\sum_{A_n} (\mathrm{OD}(x,y) - \mathrm{ODmean})^4}{(n-1)\,(\mathrm{ODV})^4} \tag{25}$$

FIGURE 15: DETERMINATION OF CYTOPLASM TO BE USED TO CORRECT NUCLEAR OD VALUES
The accurate calculation of nuclear OD values requires the accurate calculation of the average intensity of the cytoplasm in which the nucleus is found. A) is an image of a stained cervical cell. B) is the nuclear mask generated by the segmentation of image A). The post-processing routine recognizes the artifacts in B) and removes them. The resulting nuclear and cytoplasmic mask is shown in C). If this cytoplasmic mask were used to calculate the average cytoplasmic intensity, the result would be affected by the dark artifacts located in the cytoplasm. Instead the nuclear mask in B) is dilated several times and the pixels which are left in the cytoplasm are used to calculate the average cytoplasmic intensity, shown as the medium gray area in D).

2.4.1. Markovian Texture Analysis

The following density distribution measures describe not only the frequency of occurrence but also the spatial organization of the density distributions, and are usually referred to as texture features. These texture features can be further broken down into continuous and discrete categories. The continuous texture features describe the spatial density distribution statistically.
Markovian analysis of the spatial density distribution of the nuclei was chosen since it is utilized by many other researchers in this field and has demonstrated encouraging results [38]. The Markovian analysis of texture in images treats images as stochastic processes. This analysis yields matrices of gray level transition probabilities which usually require substantial computation time and memory space [68]. The various texture parameters are usually calculated from such matrices; however, Unser [68] has demonstrated that it is possible to calculate equivalent parameters using similar methods which require less memory and computational time. It is this analysis which is used in this work. For the sake of understanding, both definitions will be given for each feature.

The co-occurrence matrix [67] required for the Markovian analysis is defined as the matrix P(i,j), where i and j range over the various gray levels present in the image. P(i,j) is the conditional probability of a pixel of gray level i occurring next to (8-connectedness) a pixel of gray level j. In the sum and difference (SD) method [68], $P_s(i)$ is the probability of neighboring pixels having gray levels which sum to i and $P_d(j)$ is the probability of neighboring pixels having a gray level difference of j.

The intensity range of the images is 0 to 255, which would result in the calculation of ≈ 65,000 conditional probabilities for the Markovian co-occurrence matrix or ≈ 1000 conditional probabilities for the sum and difference textural analysis. Most of the objects contain ≈ 200 pixels (≈ 800 different neighbor interactions), which can be a very sparse distribution from which to calculate the 1000 (SD) or 65,000 (Markovian) conditional probability values. For this reason, and to save memory, the image intensity levels for these texture calculations were compressed into 20 gray levels.

21) REntropy Red Entropy: is defined as

$$\begin{aligned} \text{Entropy} &= -\sum_i \sum_j P(i,j)\,\log_{10} P(i,j) && \text{Markovian [67]} \\ &= -\sum_i P_s(i)\,\log_{10} P_s(i) - \sum_j P_d(j)\,\log_{10} P_d(j) && \text{SD [68]} \end{aligned} \tag{26}$$

where the conditional probabilities are defined by the data in the red image. A large value for the entropy represents a nucleus for which there is little spatial or gray scale distribution organization.

22) GEntropy Entropy of the nucleus in the green image.
23) BEntropy Entropy of the nucleus in the blue image.
24) Energy Energy: is defined as

$$\begin{aligned} \text{Energy} &= \sum_i \sum_j P(i,j)^2 && \text{Markovian [67]} \\ &= \sum_i P_s(i)^2 + \sum_j P_d(j)^2 && \text{SD [68]} \end{aligned} \tag{27}$$

where the conditional probabilities are defined by the red image data. A large value of the energy parameter represents a nucleus which has a spatially organized gray scale distribution. It is almost the direct opposite of feature 21 (entropy).

25) Contrast Contrast in the red image is defined by

$$\begin{aligned} \text{Contrast} &= \sum_i \sum_j (i-j)^2\,P(i,j) && \text{Markovian [67]} \\ &= \sum_j j^2\,P_d(j) && \text{SD [68]} \end{aligned} \tag{28}$$

A nucleus with a large contrast has large gray scale variations at high spatial frequencies.

26) Correlation Correlation for the red image is defined by

$$\begin{aligned} \text{Correlation} &= \sum_i \sum_j (i-\mu)(j-\mu)\,P(i,j) && \text{Markovian [67]} \\ &= \tfrac{1}{2}\Big[\sum_i (i-2\mu)^2\,P_s(i) - \sum_j j^2\,P_d(j)\Big] && \text{SD [68]} \end{aligned} \tag{29}$$

where μ is the mean intensity value of the nucleus under consideration. A large value for the correlation parameter indicates a nucleus in which there are large connected areas with the same gray level and in which the gray level difference between adjacent areas is large as well.

27) Homogeneity Homogeneity of the red image is defined by

$$\begin{aligned} \text{Homogeneity} &= \sum_i \sum_j \frac{1}{1+(i-j)^2}\,P(i,j) && \text{Markovian [67]} \\ &= \sum_j \frac{1}{1+j^2}\,P_d(j) && \text{SD [68]} \end{aligned} \tag{30}$$

A large value for the homogeneity parameter indicates that in the nucleus the spatial gray level variation is slight and spatially smooth (low spatial frequency).
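As an illustration of the sum and difference approach, the sketch below builds the two histograms for a masked object and evaluates the SD forms of contrast and homogeneity. It is a minimal sketch under stated assumptions, not the thesis code: the function names are hypothetical, each unordered pair of 8-connected neighbors is counted once, the difference is taken as an absolute value, and the gray levels are assumed to be already compressed to a small number of levels.

```python
from collections import Counter

# Half of the 8-neighborhood: visiting these four offsets from every pixel
# counts each unordered pair of 8-connected neighbors exactly once.
OFFSETS = [(0, 1), (1, -1), (1, 0), (1, 1)]

def sum_difference_histograms(levels, mask):
    """Return (P_s, P_d) as dicts mapping sum / difference to probability."""
    sums, diffs = Counter(), Counter()
    rows, cols = len(levels), len(levels[0])
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and mask[rr][cc]:
                    sums[levels[r][c] + levels[rr][cc]] += 1
                    # Assumption: absolute gray level difference.
                    diffs[abs(levels[r][c] - levels[rr][cc])] += 1
    total = sum(sums.values())
    if total == 0:
        raise ValueError("object has no neighboring pixel pairs")
    p_s = {k: v / total for k, v in sums.items()}
    p_d = {k: v / total for k, v in diffs.items()}
    return p_s, p_d

def sd_contrast(p_d):
    """Contrast in its SD form: sum over j of j^2 * P_d(j)."""
    return sum(j * j * p for j, p in p_d.items())

def sd_homogeneity(p_d):
    """Homogeneity in its SD form: sum over j of P_d(j) / (1 + j^2)."""
    return sum(p / (1 + j * j) for j, p in p_d.items())
```

For a 2x2 object with gray levels 0, 0 on the top row and 3, 3 on the bottom row there are six neighbor pairs, two with difference 0 and four with difference 3, so the contrast is 9 × 4/6 = 6 and the homogeneity is 2/6 + (4/6)/10 = 0.4.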
28) Cluster-Shade Cluster shade of the red image is defined by

$$\begin{aligned} \text{Cluster-Shade} &= \sum_i \sum_j (i+j-2\mu)^3\,P(i,j) && \text{Markovian [67]} \\ &= \sum_i (i-2\mu)^3\,P_s(i) && \text{SD [68]} \end{aligned} \tag{31}$$

Since the cluster shade can be thought of as the 3rd moment or skewness of the sum probability distribution, in this work it is standardized by dividing the above cluster shade by the second moment, M2, of the sum distribution to the 3/2 power:

$$M2 = \sum_i (i-2\mu)^2\,P_s(i) \tag{32}$$

Thus the actual formula used to find cluster shade is

$$\text{Cluster Shade} = \frac{\sum_i (i-2\mu)^3\,P_s(i)}{(M2)^{3/2}} \tag{33}$$

A large absolute value for the cluster shade parameter indicates that in the nucleus there are a few distinct clumps (of uniform intensity within) with a large contrast between the clumps and the rest of the nucleus. A negative value indicates dark clumps on a bright background and a positive value indicates bright clumps on a dark background.

29) Cluster Prominence Cluster prominence in the red image is defined by

$$\begin{aligned} \text{Cluster Prominence} &= \sum_i \sum_j (i+j-2\mu)^4\,P(i,j) && \text{Markovian [67]} \\ &= \sum_i (i-2\mu)^4\,P_s(i) && \text{SD} \end{aligned} \tag{34}$$

Since the cluster prominence is equivalent to the 4th moment of the $P_s(i)$ distribution, it is standardized by dividing it by $M2^2$. Thus the actual formula used to calculate cluster prominence is

$$\text{Cluster Prominence} = \frac{\sum_i (i-2\mu)^4\,P_s(i)}{(M2)^2} \tag{35}$$

Attaching a simple non-statistical interpretation to this measure would probably be more deceiving than useful.

Another set of features which measure the spatial distribution of the intensity variations within the nucleus is based upon there existing some form of self similarity in the spatial density distribution when it is examined at different scales. For this feature the distribution of the optical density values (hence the DNA) is assumed to be similar to that of a fractal. For a good definition of fractals see Mandelbrot [69].
In measuring an object which is assumed to have the properties of a fractal, one is looking for a predictable increase in the measured property as one examines the object at finer and finer scales. Similar to the work done by C. Caldwell on mammograms [70], we interpret the optical density values of the individual pixels to represent the height of the pixel in a three-dimensional space. The property measured is the surface area of the thus defined three-dimensional surface.

30) FA1 Fractal Area Scale 1: is the surface area of the three-dimensional surface generated from the nuclear optical density data in the red image. Each pixel is assumed to have a unit area and the rectangle joining adjacent pixels in the three-dimensional space is assumed to have an area proportional to its height.

$$\text{FA1} = \sum \text{pixel areas} + \sum \text{joining rectangle areas} \tag{36}$$

31) FA2 Fractal Area Scale 2: is the surface area of the generated nuclear three-dimensional surface where squares of four pixels in the original image are averaged to produce a single new pixel, thus reducing the scale and the size of the image by a factor of 2.

$$\text{FA2} = \sum \text{pixel areas} + \sum \text{joining rectangle areas} \tag{37}$$

Figure 16 depicts how FA1 and FA2 are determined.

32) FD Fractal dimension: assuming that the intensity distribution exhibits fractal behaviour, the fractal dimension for the image of the nucleus can be defined as (Mandelbrot [69])

$$\text{FD} = \frac{\log_{10}\text{FA1} - \log_{10}\text{FA2}}{\log_{10} 2} \tag{38}$$

FIGURE 16: DETERMINATION OF AREAS FA1 AND FA2 FOR THE FRACTAL DIMENSION CALCULATION
(Panels: Grid 1, in which each square is one pixel; Grid 2, in which each square is the average of 4 pixels; and the nuclear OD values drawn as a 3-dimensional surface.)
Grid 1 depicts an array of pixels in which a nucleus is located. In grid 1 the nuclear pixels are shaded. The shaded squares in grid 2 are those squares of four pixels which are entirely contained within the nucleus. The corresponding pixels in grid 1 are also darkly shaded. For the fractal dimension calculation only the darkly shaded pixels (or squares in grid 2) are used. One can imagine that these pixels (or squares) form a 3-dimensional surface, where the height of each pixel (or square) corresponds to its optical density value (or average optical density value for grid 2). The overall area of the 3-dimensional surface is calculated by summing the area of the tops and the exposed sides of the pixels (or squares) together. The area so calculated for the dark pixels in grid 1 is FA1 and the area so calculated for the dark pixels in grid 2 is FA2.

Assigning a mass to each pixel depending upon its optical density, it is possible to calculate the center of mass of the nucleus. This gives rise to another measure which describes in a general fashion the distribution of the DNA in the nucleus.

33) DCM Distance to center of mass: is the distance in pixels that the center of mass of the nucleus is from the geometric center of the nucleus.
34) DenMax Density of Maxima: is the number of local maxima detected in the red image of the nucleus, divided by the area of the nucleus. The red image is spatially averaged to reduce the amount of speckle noise. Local maxima are detected when the four side-adjacent pixels are less than the central pixel.
35) DenMin Density of minima: similar to DenMax except that local minima are detected when the four side-adjacent pixels are greater than the central pixel.
36) ExtrRange Extrema Range: is the absolute intensity difference between the smallest minimum detected by the DenMin feature and the largest maximum detected by the DenMax feature.
37) AveRange Average Range: is the absolute intensity difference between the average minimum value detected by the DenMin feature and the average maximum value detected by the DenMax feature.

2.4.2.
Discrete Texture Features

Interphase chromatin is thought to exist in a variety of condensation states. There is evidence to suggest nuclear morphological change is associated with fundamental alterations in chromatin structure and function, and these changes can be observed in the chromatin structure or DNA condensation state [71]. Thus it is possible and reasonable to divide the nucleus into different sections corresponding to the different states of the chromatin in the nucleus. It is assumed that the non-condensed chromatin is located in the light areas of the nucleus and that the condensed chromatin is located in the dark areas of the nucleus. Further, the condensed chromatin area can be subdivided into medium and highly condensed sections.

Since some benign cells such as leukocytes are normally hyperchromatic, i.e. their DNA exists in a highly condensed chromatin state, their optical density distribution characteristics generate good reference values for determining the three classes of chromatin condensation. The average mean optical density and average variance of optical density of several leukocytes from the same slide as the cells to be analyzed can be used to determine the optical density threshold, ODMID, separating the high and medium chromatin states. This threshold and the average variance of the optical density distributions of the leukocytes may be used to determine the threshold IBND, separating the medium and low density chromatin. In this work 8 to 15 leukocytes from each slide were analyzed and the mean integrated optical density of the leukocytes found. This value was used to standardize the ODMID and IBND thresholds. For a 15 slide sample set, ODMID and IBND were experimentally determined for each slide. The relationship between the mean IOD value of the leukocytes and the experimentally determined ODMID and IBND thresholds was calculated. This relationship was used to determine ODMID and IBND thresholds for the rest of the slides.
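Given the two thresholds, every nuclear pixel falls into one of three condensation classes. The sketch below illustrates that partition and the resulting area fractions; the function name `chromatin_area_fractions`, the class names, and the treatment of values exactly equal to a threshold are assumptions made for illustration, not details from the thesis.

```python
def chromatin_area_fractions(od_values, ibnd, odmid):
    """Fraction of nuclear pixels in the low, medium and high density classes.

    `od_values` holds the optical density of every pixel in one nucleus.
    IBND separates low from medium density chromatin and ODMID separates
    medium from high density chromatin.
    """
    if not od_values:
        raise ValueError("nucleus contains no pixels")
    if ibnd >= odmid:
        raise ValueError("IBND must be smaller than ODMID")
    counts = {"low": 0, "medium": 0, "high": 0}
    for od in od_values:
        # Assumption: a value equal to a threshold joins the lower class.
        if od <= ibnd:
            counts["low"] += 1
        elif od <= odmid:
            counts["medium"] += 1
        else:
            counts["high"] += 1
    n = len(od_values)
    return {name: count / n for name, count in counts.items()}
```

For example, four pixels with optical densities 0.1, 0.2, 0.5 and 0.9 and illustrative thresholds IBND = 0.3 and ODMID = 0.7 give fractions of 0.5 (low), 0.25 (medium) and 0.25 (high).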
Thus low density chromatin is defined to exist in the areas where the optical density ranges from 0 to IBND, medium density chromatin in the areas where the optical density ranges from IBND to ODMID, and high density chromatin in the areas where the optical density is above ODMID. Once the boundaries have been determined, one can extract several features which describe the shape, intensity and spatial distribution of the various nuclear sections. Figure 17 shows a nucleus divided into sections as described above.

38) TARL Total area ratio for low density chromatin [72]: is the area occupied by the low density sections in the nucleus divided by the area of the nucleus.
39) TARM Total area ratio for medium density chromatin.
40) TARH Total area ratio for high density chromatin.
41) TERL Total extinction ratio for the low density chromatin [72]: is the integrated optical density of the low density sections divided by the integrated optical density of the nucleus.
42) TERM Total extinction ratio for medium density chromatin.
43) TERH Total extinction ratio for high density chromatin.
44) NL Number of low density chromatin clusters [73]: this is the number of distinct groups of low density chromatin pixels in the nucleus.
45) NM Number of medium density chromatin clusters.
46) NH Number of high density chromatin clusters.
47) NLS Number of low density chromatin single pixel clusters [73].
48) NMS Number of medium density chromatin single pixel clusters.

FIGURE 17: DIVISION OF NUCLEUS INTO AREAS OF DIFFERENT CHROMATIN CONDENSATION STATES
Chromatin in nuclei is believed to exist in various condensation states (low, medium and high).
From leukocytes present on the same slide as the nuclei being investigated it is possible to determine optical density thresholds which will separate the nuclei into low density chromatin, medium density chromatin and high density chromatin. A) is a gray scale image of a cervical cell in which the cytoplasmic and nuclear boundaries have been delineated. B) is a representation of the chromatin states in the nucleus of A). C) is a magnified version of image B), in which the chromatin in the high condensation state has been labelled black, the medium condensation state chromatin has been labelled dark gray, and the low condensation state chromatin has been labelled light gray.

49) NHS Number of high density chromatin single pixel clusters.

While the thresholds IBND and ODMID divide the nucleus into various sections, it may be useful to know the average contrast between the various sections.

50) MAER Medium average extinction ratio [72]: is the ratio of the mean optical density of the medium density chromatin divided by the mean optical density of the low density chromatin.
51) HAER High average extinction ratio: same as above but for the high density chromatin.
52) MHAER Medium and high average extinction ratio: is the ratio of the mean optical density of the medium and high density chromatin sections grouped into one section divided by the mean optical density of the low density chromatin.
53) CL Compactness of the light density chromatin areas [72]: is similar to the compactness measure for the entire nucleus, NComp.
In this case

$$ CL = \frac{(\text{number of edge pixels for the light density areas})^{2}}{4\pi \, (\text{total area of the light density pixels})} \tag{39} $$

54) CM Compactness of the medium density chromatin areas.
55) CH Compactness of the high density chromatin areas.
56) CMH Compactness of the medium and high density chromatin areas, where the medium and high density areas are considered to be of the same type.
57) ADL Average distance of the light density chromatin from the nuclear center [72]: the average separation between the individual light pixels and the center of the nucleus, normalized by the mean radius, NMeanR, of the nucleus.
58) ADM Average distance of the medium density chromatin from the nuclear center.
59) ADH Average distance of the high density chromatin from the nuclear center.
60) ADMH Average distance of the medium and high density chromatin from the nuclear center.
61) SCLCN Separation between the center of the light density chromatin and the center of the nucleus, where the separation has been normalized by the average radius of the nucleus.
62) SCMCN Separation between the center of the medium density chromatin and the center of the nucleus.
63) SCHCN Separation between the center of the high density chromatin and the center of the nucleus.

2.4.3. Postprocessing

Once a few of the more obvious features have been calculated, the various nuclear-like objects in the images are interpreted by an artifact removal routine before the more involved features are determined. This routine considers holes in objects to be artifacts and fills them in. Objects which are irregularly shaped (determined heuristically from the NRVar/NMeanR, NComp, and NInert features), too small (determined heuristically from the NA and NMeanR features), or do not contain enough DNA material (determined heuristically from the IODR feature) are discarded as artifacts. Thus artifacts are considered to be objects which exhibit some but not all of the characteristics of nuclei.
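As an illustration of the total area ratios and total extinction ratios defined above (features 38 to 43), the following minimal sketch computes them from a grid of optical density values. It is not the implementation used in this work: the function name, the use of `None` to mark pixels outside the nucleus, and the treatment of a pixel lying exactly on IBND as low density are assumptions made only for this example.

```python
from typing import Dict, List, Optional


def chromatin_ratios(
    od: List[List[Optional[float]]], ibnd: float, odmid: float
) -> Dict[str, float]:
    """Return TARL/TARM/TARH and TERL/TERM/TERH for one nucleus.

    od    : 2D grid of optical density values; None marks pixels that lie
            outside the nucleus (an assumption of this sketch).
    ibnd  : threshold between low and medium density chromatin.
    odmid : threshold between medium and high density chromatin.
    """
    area = {"L": 0, "M": 0, "H": 0}        # pixel counts per section
    iod = {"L": 0.0, "M": 0.0, "H": 0.0}   # integrated optical density per section

    for row in od:
        for value in row:
            if value is None:
                continue  # pixel is not part of the nucleus
            if value <= ibnd:      # boundary handling is an assumption
                key = "L"
            elif value <= odmid:
                key = "M"
            else:
                key = "H"
            area[key] += 1
            iod[key] += value

    total_area = sum(area.values())
    total_iod = sum(iod.values())
    if total_area == 0 or total_iod == 0:
        raise ValueError("nucleus is empty or has zero integrated optical density")

    ratios: Dict[str, float] = {}
    for key in ("L", "M", "H"):
        ratios["TAR" + key] = area[key] / total_area   # area ratio
        ratios["TER" + key] = iod[key] / total_iod     # extinction ratio
    return ratios


if __name__ == "__main__":
    nucleus = [
        [None, 0.1, 0.2],
        [0.5, 0.6, 1.0],
        [None, 1.2, None],
    ]
    print(chromatin_ratios(nucleus, ibnd=0.3, odmid=0.8))
```

By construction the three area ratios sum to one, as do the three extinction ratios, which gives a quick consistency check on any implementation.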
2.5. Cell Classification

The selection of cells and their manual classification as either normal or abnormal was performed by the author, and a second manual classification was performed by Dr. H. Tezcan, the M.D. responsible for the development of the staining and deposition procedure utilized in this work. Any cells for which the two manual classifications differed were discarded. A cell by cell classification into the various grades of abnormality was not made, keeping the number of groups to be discriminated to two.

The method used to automatically classify the cells based upon their features was linear stepwise discriminant function analysis. The analysis was performed by a commercially available program, BMDP (BioMedical Data Processing). The routine used is referred to as 7M, stepwise discriminant function analysis. This routine performs a discriminant analysis between two or more groups. The features used for the computation of the linear classification functions were chosen in a stepwise fashion. For more information on linear discriminant function analysis and on the BMDP implementation, one should see the BMDP Statistical Software manual [74] and other references [75, 76]. A brief summary of discriminant function analysis and the assumptions this analysis makes about the distribution, size, and form of data follows.

Discriminant function analysis finds the hyperplane(s), in the N-dimensional space defined by the feature data used, X, which best separates the two or more groups. See figure 18 for a two-dimensional example. The discriminant function analysis actually finds a discriminant function for each group. These discriminant functions are linear weighted combinations of the feature data.
$$ f_1(X) = \sum_{i=1}^{n} W_{i1} X_i + C_{01} \tag{40} $$

where $f_1(X)$ is the discriminant function for the first group, $n$ is the number of features in the discriminant function, $W_{i1}$ is the weight assigned to feature $i$ for the first group, and $C_{01}$ is a constant. For the two group case, two discriminant functions are found, i.e. $f_1(X)$ and $f_2(X)$. In our case $f_1(X)$ is the discriminant function for the normal cells and $f_2(X)$ is the discriminant function for the abnormal cells. Using the N features of a cell, the two discriminant functions can classify the cell. If $f_1(X) - f_2(X) > P$, where $P$ is a constant, then the cell is classified as a normal cell; otherwise it is classified as an abnormal cell. In this fashion the two discriminant functions may be used to automatically classify all the cells.

FIGURE 18: A TWO DIMENSIONAL EXAMPLE OF GROUP SEPARATION USING LINEAR DISCRIMINANT FUNCTION ANALYSIS
Each ellipse in this figure bounds a region in which a certain proportion of all the members of one of the groups will fall, e.g. 50% or 90% of group one. In this example, the two groups are assumed to have identical covariance matrices. Thus the set of points between the two groups for which the probability of belonging to one group or the other is equal is a straight line. This line is shown as the linear decision boundary in the figure and would be the one dimensional hyperplane determined by linear discriminant function analysis.

TABLE 1
Discriminant Function Classification Table

                          Automatic Classification
Manual Classification     Normals     Abnormals
Normals                   a           b
Abnormals                 c           d

In the previous table, b represents the number of false positives (normal cells which have incorrectly been automatically classified as abnormals) and c represents the number of false negatives (abnormal cells which have incorrectly been automatically classified as normals). Thus the false positive rate for the above table is b/(b+a) and the false negative rate is c/(c+d). The total error of classification (TEC) is (b+c)/(a+b+c+d). The value of P is usually chosen to minimize TEC. However, for an automated cervical cell screening system the consequences of a large false negative rate are more serious than those of a large false positive rate. Consequently one would adjust the value of P to reduce the false negative rate in such a system.

The question that stepwise discriminant function analysis is most commonly used to answer is: which set of N features will most clearly distinguish between the groups? There are several methods of finding the best N features to use.

1) Complete subset method: Find and evaluate all of the possible subsets of N features. This is a very time consuming method for large numbers of variables (this method was not used).
2) Forward stepping method: First find the single feature with the maximum goodness measure (to be defined later). Next find the feature which, when paired with the first feature, maximizes the goodness measure. Continue in this fashion, adding features which when grouped with the already selected features maximize the goodness measure, until N features have been selected.
3) Backward stepping method: Begin with all features included. In a stepwise fashion, successively remove the feature that results in the least decrease in the goodness measure until only N features are left.
4) Full stepwise method: At each step, test the decrease in the goodness measure if a feature were to be removed; if the decrease is below a specified level, remove the feature.
If no feature meets this criterion, then add a feature by the forward stepping method.

Two goodness measures are available in the BMDP 7M routine. One is Wilks' lambda, L:

$$ L = \frac{|V|}{|V + B|} \tag{41} $$

where V is the within-groups sums of squares and cross products matrix, and B is the between-groups sums of squares and cross products matrix. This measure can be converted to an approximate F ratio to test group differences. The other is the conditional F ratio. This is the univariate F ratio associated with a particular feature, once the variation in the feature data due to the already entered features has been removed using a multivariate linear regression.

Some of the assumptions and requirements of linear discriminant function analysis are:

1) A large data set, with approximately 10 - 20 data points (cells) per feature used in the discriminant function analysis [76].
2) That the group data for each feature is normally distributed, i.e. the feature distribution for each group is multivariate normal [76].
3) That the variance and covariance of the features in each group be the same [76]. See figure 19 for a graphic example.
4) That the numbers of data points (cells) in each group in the data set used to calculate the discriminant functions are approximately the same, i.e. the ratio group a/group b is less than 1.5 and greater than 2/3 [76].

The last requirement is not as strong if the first three criteria are met. If the first criterion is not met then the discriminant functions generated may not obtain the same performance on other sets of data as they do on the learning set used to generate them. If the second criterion is not met then the features selected by the stepwise discriminant function analysis may not be the optimal ones and the classification power assigned to each feature may be inaccurate.
If the third criterion is not met then the hyperplane is no longer the optimal surface to divide the two groups, and a quadratic or higher order surface would decrease TEC. The effects of violating one or more of the first three criteria can be mitigated somewhat if the sample sizes of the groups used are large and close to being the same.

FIGURE 19: OPTIMAL SEPARATION BOUNDARIES FOR GROUPS WITH DIFFERENT COVARIANCE MATRICES
In these two examples, the covariance A) and the variance B) of the features in each group are not the same. Consequently the optimal separation boundary is no longer a hyperplane (straight line in the 2D case) but a quadratic surface, represented as the quadratic decision boundaries in figures A) and B).

3. RESULTS

3.1 Segmentation Results

Segmentation is one of the most important tasks an automated analysis system must perform. The accuracy of the segmentation directly affects the results of the shape descriptors and many of the optical density distribution features, and indirectly affects the various texture features. Thus the objectives are:

1) to determine the accuracy of the various segmentation procedures against a visual standard (human interpretation of the cellular boundaries);
2) to determine the dependence of the various features on segmentation accuracy;
3) to determine the dependence of cellular classification upon feature accuracy; and
4) to test the hypothesis that by using quantitative cellular descriptors to classify the cells one can match the performance of a skilled cytology technician.

The determination of the accuracy of the various segmentation procedures was done in three steps. The first step evaluated the segmentation accuracy of several of the possible segmentation procedure combinations on the small database of 150 images. The second step evaluated the segmentation accuracy and effect on feature calculations using the larger 1000 image database.
The third step evaluated the best procedure from steps 1 and 2 on the 3680 image database.

Of the multitude of possible segmentation procedures (e.g. algorithm 1 followed by algorithm 8 followed by algorithm 9 with postprocessing), 32 different combinations were evaluated for segmentation accuracy on the database of 150 images. The results of this evaluation are shown in Table 2. A nucleus was considered correctly segmented only if no area of the nucleus could be visually found incorrectly segmented. See figure 20 for examples of correctly segmented nuclei and figure 21 for examples of incorrectly segmented nuclei.

TABLE 2
Comparison of Segmentation Procedures on a Database of 150 Images

Secondary                          Primary Segmentation Algorithm        Additional Procedures
Algorithm             Object       1       2       3       4       6     Tested
None                  Cytoplasm    97.4%   96.0%   90.0%   89.3%   74.7% Algorithm 5: 93.3%
None                  Nucleus      24.7%   24.7%   40.7%   15.3%   19.3%
7                     Nucleus      28.7%   34.7%   46.7%   33.3%   NA
8                     Nucleus      30.0%   40.7%   48.7%   19.0%   NA
9                     Nucleus      78.7%   81.3%   78.0%   66.7%   68.7%
Post Processing       Nucleus      75.3%   82.7%   82.7%   80.0%   33.3% Algorithms 3 and 7: 86.7%; 3 and 8: 83.3%
9 + Post Processing   Nucleus      89.3%   92.0%   92.0%   92.0%   78.7% Algorithms 1+9: 94.0%

All results represent the % of the images which were correctly segmented. Secondary segmentation algorithm indicates a segmentation method which refines a previously generated nuclear segmentation. NA: not applicable; the secondary algorithm requires a threshold. The accuracy of all measurements is ± 7%.

While the criterion for nuclear segmentation was very stringent, the cytoplasm had to be relatively poorly segmented in order to be considered incorrectly segmented. A segmentation error which involved less than 10% of the area of the cytoplasm was still considered correctly segmented. The definition of the extent and shape of the cytoplasm is much less critical than the definition of the nucleus to the classification of the cell.
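The two acceptance criteria described above can be illustrated in code. In this work correctness was judged visually, so the pixel-wise comparison below is only a hypothetical stand-in: the existence of a reference mask, the function names, and the choice of the reference area as the denominator are all assumptions of this sketch.

```python
from typing import List

Mask = List[List[bool]]  # True where a pixel belongs to the object


def error_fraction(segmented: Mask, reference: Mask) -> float:
    """Fraction of the reference area that is segmented differently.

    Counts every pixel on which the two masks disagree and divides by the
    number of pixels in the reference object (an assumed normalization).
    """
    if len(segmented) != len(reference) or any(
        len(s) != len(r) for s, r in zip(segmented, reference)
    ):
        raise ValueError("masks must have identical dimensions")

    mismatched = 0
    reference_area = 0
    for seg_row, ref_row in zip(segmented, reference):
        for seg, ref in zip(seg_row, ref_row):
            reference_area += int(ref)
            mismatched += int(seg != ref)

    if reference_area == 0:
        raise ValueError("reference mask is empty")
    return mismatched / reference_area


def cytoplasm_correct(segmented: Mask, reference: Mask, tolerance: float = 0.10) -> bool:
    """Lenient criterion: errors below 10% of the cytoplasm area are accepted."""
    return error_fraction(segmented, reference) < tolerance


def nucleus_correct(segmented: Mask, reference: Mask) -> bool:
    """Stringent criterion: any disagreement counts as an incorrect segmentation."""
    return error_fraction(segmented, reference) == 0.0
```

A mask that differs from the reference on one pixel in twenty would pass the cytoplasm criterion but fail the nuclear one, which mirrors the asymmetry between the two criteria described in the text.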
Many of the cervical cells examined had cytoplasm larger than the 128 x 120 pixel array used to store the cervical cell images. Thus the value of the cytoplasmic area was determined by using the three-dimensional thresholding algorithm to segment the cytoplasm over a much larger 384 x 360 pixel area centered on the original 128 x 120 area. The cytoplasmic area so determined which belonged to the cervical cell in the 128 x 120 area was saved along with the RGB 128 x 120 pixel images. It is this area which is used as the CA feature.

To reduce the number of artifacts for which features would need to be calculated, and to reduce the effect of artifacts on the determination of the discriminant function to be used for the classification of the individual cells, only those segmentation procedures which included postprocessing were used on the second database of 1000 images. Also tested on the database of 1000 images were the segmentation procedures of algorithm 3 followed by algorithm 7 followed by algorithm 9 and postprocessing, and algorithm 3 followed by algorithm 8 followed by algorithm 9 followed by postprocessing. When one includes the results of algorithm 6, 16 different algorithms were tested on the database of 1000 images. A summary of the segmentation accuracy for these 16 procedures on the 1000 image database can be seen in Table 3. One should note that, due to a loss of image data from the back-up media, four of the procedures tested in Table 3 were evaluated using only 800 images.

(Figure note: The cytoplasm intensity values shown are from the blue images. The nuclear intensity values are from the red images.)

To better define the accuracy of the best of the nuclear segmentation procedures used in Tables 2 and 3, the procedure of primary segmentation method 1 followed by two iterations of the secondary segmentation algorithm 9 plus postprocessing was used to segment the large 3680 image database.
The results are shown in Table 4.

TABLE 3
Comparison of Selected Segmentation Procedures on a Database of 1000 Images
Segmentation Algorithms (all include Post Processing)

Secondary                Primary Algorithm                                           Additional Procedures
Algorithm   Object       1            2            3        4            6            Tested
None        Nucleus      88.6%        85.4%        89.4%    84.6%        52.5%
None        Cytoplasm    98.5%        94.3%        97.1%    97.1%        95.0%        Algorithm 5: 97.2%
9           Nucleus      97.6%        97.6%        97.7%    98.0%        91.5%        Algorithms 1 & 9: 98.0%
7           Nucleus      Not tested   Not tested   91.5%*   Not tested   Not tested
8           Nucleus      Not tested   Not tested   67.0%*   Not tested   Not tested
7 and 9     Nucleus      Not tested   Not tested   94.1%*   Not tested   Not tested
8 and 9     Nucleus      Not tested   Not tested   93.3%*   Not tested   Not tested

All results represent the numerical percentage of the images which were correctly segmented.
*These results were derived using an 800 image subset of the 1000 image database.
The accuracy of all measurements is ± 1%.

TABLE 4
Segmentation Performance of Simple 2D Histogram Analysis Followed by Two Iterations of the Edge Relocation Algorithm Plus Postprocessing

                                    Images from Normal     Images from Dysplastic   Total Image Database
                                    Samples (Total 1671)   Samples (Total 2009)     of 3680 Images
# of nuclei correctly segmented     1640                   1977                     3617
% of nuclei correctly segmented     98.1%                  98.4%                    98.3%

3.2 Variation of Features

To examine the variation of features due to the different segmentation procedures, the coefficient of variation for each feature was calculated. The coefficient of variation is defined as the standard deviation divided by the mean. For each nucleus in each image the mean and the standard deviation for each feature was calculated across all the procedures which segmented the nucleus. Thus for each nucleus the coefficient of variation (CV) was calculated for every feature. This was done for all of the images in the 1000 image database. The CV results were collected into six different groups based upon cell type and segmentation success information. An average CV for each feature in each group was calculated from the group data. The six groups are:

1) All cell types, regardless of segmentation outcome.
2) Normal cells only, regardless of segmentation outcome.
3) Abnormal cells only, regardless of segmentation outcome.
4) All cell types, but only those nuclei which were correctly segmented by all 15 segmentation procedures.
5) Normal cells only, consisting of only those nuclei which were correctly segmented by all 15 segmentation procedures.
6) Abnormal cells only, consisting of only those nuclei which were correctly segmented by all 15 segmentation procedures.

The last three groups were included to demonstrate how much features can vary even when the segmentation appears to be correct. They may also be used as an estimate of how much the features could vary if different individuals were to segment the images, or if the same individual were to segment the same image multiple times. Since there were 63 features examined in this way, the results have been spread over three tables: Table 5, Table 6 and Table 7.

From a visual examination of some of the more logical combinations of the previously described features, four more features were added to the feature space. These features are:

64) Ratio: The ratio of the nuclear area divided by the cytoplasmic area of the cell (NA/CA). (42)
65) HArea: High density chromatin area (NA*TARH). (43)
66) DNum: Density number: the difference between the number of local minima and the number of local maxima detected in the red image of the nucleus (NA*[DenMin-DenMax]). (44)
67) ODMean: Optical density mean value of the nucleus (IODR/NA). (45)

Most of the calculated features discriminate to some extent between the two groups.
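The coefficient of variation defined in Section 3.2, and the group averages reported in Tables 5 to 7, can be sketched as follows. This is an illustrative reconstruction, not the code used in this work: the function names are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev
from typing import List


def coefficient_of_variation(values: List[float]) -> float:
    """CV in percent: standard deviation divided by the mean.

    `values` holds one feature of one nucleus as measured after each
    segmentation procedure that segmented that nucleus.
    """
    if len(values) < 2:
        raise ValueError("at least two measurements are needed")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m  # sample standard deviation (assumed)


def average_cv(per_nucleus_values: List[List[float]]) -> float:
    """Average the per-nucleus CVs of one feature over a group of nuclei."""
    return mean(coefficient_of_variation(v) for v in per_nucleus_values)


if __name__ == "__main__":
    # Hypothetical nuclear areas of two nuclei under three procedures each.
    group = [[90.0, 100.0, 110.0], [200.0, 200.0, 200.0]]
    print(average_cv(group))
```

Averaging per-nucleus CVs, rather than pooling all measurements, keeps the natural size differences between nuclei from inflating the reported variation.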
TABLE 5
Variation of the Shape Features and Some Texture Features among the 15 Segmentation Procedures
Coefficients of Variation for Features (in %)

Group                  NA     NComp  NInert  NMeanR  NMaxR  NRVar  NEllong
Both Groups (1)        9.0    5.1    1.2     4.5     5.7    28.1   3.2
Normal Nuclei (1)      7.2    4.4    1.0     3.7     4.8    24.9   2.9
Abnormal Nuclei (1)    10.0   5.6    1.3     5.0     6.3    30.0   3.3
Both Groups (2)        5.9    2.7    0.6     3.0     3.0    13.6   2.2
Normal Nuclei (2)      5.9    2.8    0.6     3.0     3.2    14.3   2.1
Abnormal Nuclei (2)    5.8    2.6    0.7     2.9     2.9    13.0   2.2

Group                  NBdycrc  NBdyfin  RMeanI  GMeanI  BMeanI  FD    FA1
Both Groups (1)        48.6     43.4     6.3     6.7     6.0     1.0   16.8
Normal Nuclei (1)      53.4     35.9     4.9     6.4     7.6     1.0   13.6
Abnormal Nuclei (1)    45.8     47.7     7.2     6.8     5.1     1.1   18.6
Both Groups (2)        30.9     30.7     4.3     4.7     4.6     0.8   12.2
Normal Nuclei (2)      34.7     27.7     4.2     5.6     6.8     0.9   11.4
Abnormal Nuclei (2)    27.0     33.8     4.3     3.7     2.3     0.7   12.9

Group                  FA2    DCM
Both Groups (1)        17.8   29.1
Normal Nuclei (1)      13.4   23.8
Abnormal Nuclei (1)    20.3   32.2
Both Groups (2)        12.2   22.5
Normal Nuclei (2)      10.9   18.6
Abnormal Nuclei (2)    13.5   26.4

(1) Results for images segmented.
(2) Results for only those images which were correctly segmented by all of the 15 procedures.

TABLE 6
Variation of Discrete Texture Features among 15 Segmentation Procedures
Coefficients of Variation for Features (in %)

Group                  TARL   TARM   TARH   TERL   TERM   TERH   CL
Both Groups (1)        21.2   7.3    7.3    14.1   2.7    2.5    16.5
Normal Nuclei (1)      17.8   6.7    7.2    12.5   2.8    2.9    17.1
Abnormal Nuclei (1)    23.3   7.7    7.3    15.0   2.6    2.2    16.2
Both Groups (2)        17.8   5.7    5.6    11.3   1.9    1.7    14.6
Normal Nuclei (2)      15.8   5.9    6.1    10.2   2.1    2.0    16.1
Abnormal Nuclei (2)    19.8   5.5    5.1    12.5   1.8    1.4    13.1

Group                  CM     CH     CMH    ADL    ADM    ADH    ADMH
Both Groups (1)        1.5    1.1    1.2    4.4    4.6    4.6    4.6
Normal Nuclei (1)      1.7    1.1    1.5    3.8    4.2    4.5    4.2
Abnormal Nuclei (1)    1.4    1.0    1.0    4.7    4.9    4.6    4.9
Both Groups (2)        0.7    0.4    0.5    3.1    3.4    3.3    3.4
Normal Nuclei (2)      0.8    0.6    0.6    3.5    3.8    3.9    3.8
Abnormal Nuclei (2)    0.7    0.3    0.4    2.8    3.0    2.8    3.0

Group                  MAER   HAER   MHAER  NL     NM     NH     NLS
Both Groups (1)        14.4   14.1   14.5   41.2   2.1    1.3    74.0
Normal Nuclei (1)      11.0   11.1   11.1   37.7   2.9    1.4    64.5
Abnormal Nuclei (1)    16.4   15.9   16.4   43.2   1.7    1.2    79.5
Both Groups (2)        11.2   11.0   11.2   40.0   1.7    0.6    65.5
Normal Nuclei (2)      10.5   10.5   10.5   38.9   1.9    0.8    57.0
Abnormal Nuclei (2)    11.9   11.6   11.9   41.0   1.4    0.5    74.1

Group                  NMS    NHS    SCLCN  SCMCN  SCHCN
Both Groups (1)        6.5    1.4    36.5   17.1   18.5
Normal Nuclei (1)      7.3    2.1    36.8   18.0   18.1
Abnormal Nuclei (1)    6.0    1.0    36.4   16.6   18.7
Both Groups (2)        3.1    1.4    35.2   15.2   13.3
Normal Nuclei (2)      5.1    1.1    35.7   17.3   13.7
Abnormal Nuclei (2)    1.1    1.8    34.6   13.0   12.9

(1) Results for images segmented.
(2) Results for only those images which are correctly segmented by all of the 15 procedures.

TABLE 7
Variation of Continuous Texture Features among the 15 Segmentation Procedures
Coefficients of Variation for Features (in %)

Group                  IODR   IODG    IODB    ODMax  ODVar  ODSkew  ODKurt
Both Groups (1)        2.9    79.1    121.0   0.4    18.5   61.0    8.4
Normal Nuclei (1)      2.2    26.0    57.0    0.4    15.3   62.9    7.2
Abnormal Nuclei (1)    3.2    110.8   158.3   0.4    20.3   59.8    9.1
Both Groups (2)        1.7    26.1    60.6    0.2    14.8   39.2    6.4
Normal Nuclei (2)      1.5    27.3    41.3    0.2    14.2   47.7    6.4
Abnormal Nuclei (2)    1.9    24.8    80.3    0.2    15.5   30.5    6.3

Group                  REntro  GEntro  BEntro  Energy  Correlation  Contrast  Homogeneity
Both Groups (1)        2.1     2.0     2.5     12.1    19.6         9.5       2.4
Normal Nuclei (1)      1.0     1.9     2.4     10.3    16.7         9.2       2.5
Abnormal Nuclei (1)    2.2     2.0     2.6     13.1    21.2         9.6       2.3
Both Groups (2)        1.8     1.6     2.1     8.7     15.5         8.2       1.9
Normal Nuclei (2)      1.8     1.6     2.1     8.9     14.6         8.3       2.1
Abnormal Nuclei (2)    1.8     1.6     2.0     8.6     16.4         8.0       1.7

Group                  Cluster Shade  Cluster Prominence  DenMax  DenMin  ExtrRange  AveRange
Both Groups (1)        8.1            258.9               123.2   9.5     27.7       22.3
Normal Nuclei (1)      7.1            100.1               74.7    8.4     17.1       14.2
Abnormal Nuclei (1)    8.6            351.4               151.4   10.2    34.0       27.0
Both Groups (2)        6.1            243.0               79.3    7.6     16.9       12.9
Normal Nuclei (2)      6.2            106.2               59.9    6.7     12.7       9.7
Abnormal Nuclei (2)    5.9            382.6               102.2   8.5     21.1       16.0

(1) Results for images segmented.
(2) Results for only those images which are correctly segmented by all of the 15 procedures.

The group means for each feature were compared using two-sample t tests, with and without assuming equality of variances between the two groups. The equality of variance of each group for each feature was also tested using the Levene W test [77]. When the feature distributions in each group are not normal (which is in fact the usual condition for the abnormal group) and if the normal and abnormal group variances are significantly different, then the accuracy of the t test is questionable. For this reason a nonparametric statistic, the Mann-Whitney rank-sum test [78], was used. This test is the nonparametric version of the two-sample t test for independent groups. Only the following features had group variances which were somewhat similar (p values ≥ 0.05): CA, REntropy, BEntropy, TARL, TERL, TERH, CM, CMH, MAER, NH, NHS, ODKurt, Energy, DCM, SCMCN. Of these features, only the group means of NH, NHS, and ODKurt were not statistically different at the p < 0.005 level.
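For readers unfamiliar with the rank-sum test mentioned above, the following is a minimal sketch of the Mann-Whitney U statistic with a normal approximation. It is not the program used in this work; the function name is hypothetical, tied values receive average ranks, and the variance is left without a tie correction, which is a simplification assumed here.

```python
import math
from typing import List, Tuple


def mann_whitney_u(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return (U statistic of sample a, z score from the normal approximation).

    U counts the (a, b) pairs in which the value from `a` exceeds the value
    from `b`, with ties counted as one half.
    """
    n1, n2 = len(a), len(b)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")

    # Pool the samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    rank_sum_a = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2

    # Normal approximation under the null hypothesis (no tie correction).
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_a, (u_a - mu) / sigma
```

Because the statistic depends only on ranks, it is unaffected by the skewed, non-normal feature distributions noted for the abnormal group.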
For the Mann-Whitney rank-sum test, the means of the following features were not statistically different (p > 0.005): CM, CH, CMH, MHAER, NLS, NHS, ODKurt, Correlation, Cluster Prominence, FD, SCLCN, and SCMCN.

While it is not feasible to show histograms of the two groups for each feature, figures 22, 23, 24, 25, 26, 27, 28, 29 and 30 show the histograms for some of the more commonly used and interesting features.

FIGURE 22: DISTRIBUTION OF THE NUCLEAR AREA OF NORMAL AND ABNORMAL CELLS
(Histogram "Nuclear Area" for normal and abnormal cells; horizontal axis: Area (pixels) x 10^3.) From this graph it is clear that nuclei of normal cells are generally smaller than abnormal cells; however, there is a significant amount of overlap between the two groups.

FIGURE 23: DISTRIBUTION OF CYTOPLASMIC AREA OF NORMAL AND ABNORMAL CELLS
(Histogram "Cytoplasm Area" for normal and abnormal cells; horizontal axis: Area (pixels) x 10^4.) Normal cells generally have more cytoplasm than do abnormal cells. However, the two histograms significantly overlap.

FIGURE 24: DISTRIBUTION OF NA/CA RATIO FOR NORMAL AND ABNORMAL CELLS
(Histogram "NA/CA Ratio" for normal and abnormal cells; horizontal axis: Ratio (no units).) The NA/CA Ratio feature discriminates quite well between normal and abnormal cells, as demonstrated by the histograms in this graph. However, there is still a significant area of overlap between the two histograms.

FIGURE 25: DISTRIBUTION OF THE NUCLEAR IOD OF NORMAL AND ABNORMAL CELLS
(Histogram "Nuclear IOD" for normal and abnormal cells; horizontal axis: IOD x 10^3.) In this graph the distribution of the normal cells is sharply peaked around the diploid peak at 120, with a few tetraploid cells indicated by the much smaller normal cell peak at 260. The abnormal cell IOD distribution also has characteristic diploid and tetraploid peaks. The small amount of overlap between these two distributions indicates that nuclear IOD is a very discriminating feature.

FIGURE 26: DISTRIBUTION OF COMPACTNESS RATIO OF NORMAL AND ABNORMAL CELLS
(Histogram for normal and abnormal cells; horizontal axis: Compactness (unitless).) There is only a very slight difference between the compactness ratio distributions of normal cells and abnormal cells, indicating that this feature by itself is not that discriminating but might be if used along with other features.

FIGURE 27: DISTRIBUTION OF DNum FOR NORMAL AND ABNORMAL CELLS
(Histogram "DNum" for normal and abnormal cells; horizontal axis: DNum (number of points).) DNum is a non-linear combination of features found in the literature which appears in this graph to be fairly discriminating between normal and abnormal cells. Also of note from this graph are the shapes of the two histograms. Of all of the reasonably discriminating features, DNum is the only one for which the feature distribution for each group tends to be that of a normal distribution.

FIGURE 28: DISTRIBUTION OF THE MARKOV TEXTURE FEATURE CORRELATION FOR NORMAL AND ABNORMAL CELLS
(Histogram "Intensity Correlation in Nucleus" for normal and abnormal cells; horizontal axis: Correlation (no units).) In this graph correlation does not appear to be very discriminating between normal and abnormal cervical cells. When used in discriminant function analysis with other features, however, correlation is one of the most discriminating features.

FIGURE 29: DISTRIBUTION OF THE DISCRETE TEXTURE FEATURE TARH FOR NORMAL AND ABNORMAL CELLS
(Histogram "Total Area Ratio High Density Chromatin" for normal and abnormal cells; horizontal axis: TARH (no units).) In this graph TARH does not appear to discriminate between normal and abnormal cells. Also of note is the non-normal distribution of this feature. This is the case for most of the discrete texture features.

FIGURE 30: DISTRIBUTION OF THE FRACTAL DIMENSION FEATURE FOR NORMAL AND ABNORMAL CELLS
(Histogram "Fractal Dimension" for normal and abnormal cells; horizontal axis: FD (dimension number).) The fractal dimension feature does not appear to be able to discriminate between normal and abnormal cervical cells. The distribution of this feature for both normal and abnormal cells seems to be the same and close to that of a normal distribution. When used in discriminant function analysis it was found to be moderately discriminating. It is also worth noting that all the values fall between 2.0 and 3.0, as required for a 2 dimensional Euclidean surface.

3.3 Variation of Classification

For each segmentation procedure used in Table 8, all the cell feature data and the human classification of the individual cells were grouped together. This data was used to generate, for each segmentation procedure, a Discriminant Function (DF) which predicted the human classification based upon the cellular feature data. The program used to generate and test the DF was the commercial stepwise discriminant analysis package, 7M, which is part of the BioMedical Data Processing (BMDP) package available as part of the University of British Columbia MTS-G computing network.
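The decision rule and the error rates defined in Section 2.5 underlie the classification accuracies reported in Tables 8 and 9, and can be sketched as follows. This is an illustration only: the function names and the default threshold of zero are assumptions, and BMDP's own computations are not reproduced here. The counts in the usage example are taken from Table 9 (segmentation method 1, Simple Nuclear features, regular classification: 301 normal cells with 49 misclassified, 505 abnormal cells with 70 misclassified).

```python
from typing import Dict


def classify(f1: float, f2: float, p: float = 0.0) -> str:
    """Apply the rule f1(X) - f2(X) > P: normal if true, abnormal otherwise.

    f1 and f2 are the discriminant function values for the normal and
    abnormal groups; the default P = 0 is an assumption of this sketch.
    """
    return "normal" if f1 - f2 > p else "abnormal"


def error_rates(a: int, b: int, c: int, d: int) -> Dict[str, float]:
    """Error rates for the 2 x 2 table of Table 1.

    a: normals classified as normal      b: normals classified as abnormal
    c: abnormals classified as normal    d: abnormals classified as abnormal
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    if a + b == 0 or c + d == 0:
        raise ValueError("each manual class needs at least one cell")
    return {
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
        "tec": (b + c) / (a + b + c + d),
    }


if __name__ == "__main__":
    rates = error_rates(a=301 - 49, b=49, c=70, d=505 - 70)
    for name, value in rates.items():
        print(f"{name}: {100 * value:.1f}%")
```

One minus the false positive rate and one minus the false negative rate are the per-class accuracies shown in the tables. Raising P moves borderline cells into the abnormal class, trading a higher false positive rate for a lower false negative rate.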
The use and application of this package is described in the BMDP Statistical Software manual [74]. This package allows the user to force the program to use all of the features in the DF, to let the program select the most discriminating features to be used in the DF, or to select a subset of the features to be used to form the DF. Table 8 has examples of all three methods. The All Selected columns present the classification accuracy of the DF generated for the various segmentation procedures using all of the available features. The BMDP Selected columns present the classification accuracy of the DF composed of features selected by BMDP. In this case, BMDP was programmed such that during forward stepping all the features were entered into the DF. The backward stepping was programmed so that only features which significantly contributed to the DF performance were allowed to remain in the DF. The F statistic was used to determine if a feature contributed significantly to the DF performance. The significance level was set to 0.5% (i.e. α = 0.005). The Author Selected columns present the classification accuracy of the DF composed of features which, upon visual examination of one-, two- and three-dimensional graphs, appeared to separate the two classes. The features used were: CA, NA, NMaxR, NComp, REntropy, IODR, ODMax, ODVar, ODSkew, ODKurt, NElong, NBdycrc, DMax, DMin, Contrast, Homogeneity, Cluster Shade, DCM, TARH, Ratio, AreaH, DNum. The Regular Classification heading in Tables 8 and 9 indicates the classification accuracy of the DF on the learning set of cells and data.

TABLE 8
Classification Results for Various Segmentation Procedures and Combinations of Features

                                      Features used in Classification
Segmentation  Cell      Number     All Selected              BMDP Selected             Author Selected
Method        Type      of cells   Regular     Jackknife     Regular     Jackknife     Regular     Jackknife
                                   Class.      Class.        Class.      Class.        Class.      Class.
1             normal    301        99.0% (3)   98.3% (5)     99.0% (3)   98.7% (4)     98.7% (4)   98.3% (5)
              abnormal  505        98.2 (9)    96.4 (18)     97.4 (13)   97.0 (15)     94.1 (30)   93.9 (31)
2             normal    310        98.7 (4)    98.4 (5)      99.0 (3)    99.0 (3)      98.1 (6)    98.1 (6)
              abnormal  491        96.9 (15)   95.3 (23)     96.5 (17)   96.3 (18)     93.4 (32)   92.5 (37)
3             normal    307        98.7 (4)    98.7 (4)      98.7 (4)    98.7 (4)      98.8 (7)    97.4 (8)
              abnormal  502        96.6 (17)   95.8 (21)     96.0 (20)   95.2 (24)     92.0 (40)   91.8 (41)
4             normal    306        99.0 (3)    98.7 (4)      99.0 (3)    99.0 (3)      98.0 (6)    98.0 (6)
              abnormal  505        97.6 (12)   95.8 (21)     96.6 (17)   96.4 (18)     92.5 (38)   92.3 (39)
6             normal    300        98.0 (6)    97.3 (8)      98.3 (4)    98.0 (6)      97.0 (8)    97.3 (9)
              abnormal  492        97.6 (12)   95.5 (22)     97.0 (15)   95.9 (20)     92.7 (36)   92.7 (36)
3-7           normal    294        98.6 (4)    98.3 (5)      99.3 (2)    99.3 (2)      98.0 (6)    98.6 (7)
              abnormal  515        96.7 (17)   95.3 (24)     96.5 (18)   95.7 (22)     92.0 (41)   91.7 (42)
3-8           normal    279        98.2 (5)    97.8 (6)      98.6 (4)    98.2 (5)      97.5 (7)    97.1 (8)
              abnormal  474        97.0 (14)   95.6 (21)     97.7 (11)   97.3 (13)     92.8 (34)   92.4 (36)

( ) denotes number of cells misclassified.
Class. - Classification

TABLE 8 (Continued)
Classification Results for Various Segmentation Procedures and Combinations of Features

                                      Features used in Classification
Segmentation  Cell      Number     All Selected              BMDP Selected             Author Selected
Method        Type      of cells   Regular     Jackknife     Regular     Jackknife     Regular     Jackknife
                                   Class.      Class.        Class.      Class.        Class.      Class.
1-9           norm      315        99.0% (3)   97.8% (7)     98.4% (5)   98.4% (5)     99.4% (2)   99.4% (2)
              abnorm    519        97.3 (14)   95.8 (22)     96.0 (21)   95.4 (24)     92.7 (38)   91.9 (42)
2-9           norm      315        98.7 (4)    97.8 (7)      99.0 (3)    99.0 (3)      99.4 (2)    99.4 (2)
              abnorm    518        96.7 (17)   95.2 (25)     95.9 (21)   95.6 (23)     91.5 (44)   91.1 (46)
3-9           norm      314        98.1 (6)    97.8 (7)      99.0 (3)    98.7 (4)      98.4 (5)    98.1 (6)
              abnorm    519        97.1 (15)   95.6 (23)     96.3 (19)   95.4 (24)     91.9 (42)   91.3 (45)
4-9           norm      314        98.1 (6)    97.8 (7)      99.0 (3)    98.7 (4)      98.4 (5)    98.1 (6)
              abnorm    522        97.7 (12)   96.7 (17)     96.4 (19)   96.2 (20)     92.3 (40)   92.3 (40)
6-9           norm      310        98.7 (4)    98.4 (5)      99.0 (3)    99.0 (3)      98.7 (4)    98.4 (5)
              abnorm    518        96.7 (17)   95.8 (22)     95.8 (21)   95.9 (22)     90.9 (47)   90.7 (48)
3-7-9         norm      297        98.7 (4)    97.3 (8)      99.3 (2)    99.0 (3)      99.0 (3)    98.7 (4)
              abnorm    521        96.9 (16)   96.0 (21)     96.2 (20)   95.4 (24)     91.6 (44)   91.2 (46)
3-8-9         norm      294        98.3 (5)    97.6 (7)      99.0 (3)    98.6 (4)      98.6 (4)    98.6 (4)
              abnorm    520        96.5 (18)   95.2 (25)     96.3 (19)   95.8 (22)     91.2 (46)   90.8 (48)
1-9-9         norm      315        98.7 (4)    97.8 (7)      98.7 (4)    97.8 (7)      99.0 (3)    98.7 (4)
              abnorm    520        97.5 (13)   96.4 (16)     97.1 (13)   96.7 (17)     92.9 (37)   92.1 (41)

( ) indicates number of cells misclassified.
Class. - Classification

TABLE 9
Classification Results for Various Segmentation Procedures and Combinations of Features

                                      Features used in Classification
Segmentation  Cell      Number     Simple Nuclear (2)        Simple Nuclear &          BMDP Min. Sel. (4)
Method        Type      of cells                             Cytoplasm (3)
                                   Regular     Jackknife     Regular     Jackknife     Regular     Jackknife
                                   Class.      Class.        Class.      Class.        Class.      Class.
1             normal    301        83.7% (49)  83.4% (50)    97.7% (7)   97.3% (8)     98.3% (5)   98.0% (6)
              abnorm    505        86.1 (70)   86.1 (70)     92.3 (39)   92.1 (40)     97.4 (13)   96.8 (16)
2             norm      310        94.2 (18)   93.5 (20)     97.4 (8)    97.4 (8)      98.1 (6)    97.7 (7)
              abnorm    491        80.4 (96)   80.2 (97)     91.2 (43)   91.0 (44)     96.7 (16)   95.9 (20)
3             norm      307        84.4 (48)   84.0 (49)     98.0 (6)    97.7 (7)      98.0 (6)    98.0 (6)
              abnorm    502        81.9 (91)   81.7 (92)     90.2 (49)   90.0 (50)     96.2 (19)   96.0 (20)
4             norm      306        81.7 (56)   81.4 (57)     98.0 (6)    98.0 (6)      98.4 (5)    98.4 (5)
              abnorm    505        83.4 (84)   83.2 (85)     89.9 (51)   89.7 (52)     96.8 (16)   96.2 (19)
6             norm      300        88.7 (34)   88.7 (34)     97.7 (7)    97.7 (7)      97.7 (7)    97.3 (8)
              abnorm    492        82.9 (84)   82.3 (87)     90.7 (46)   90.4 (47)     96.3 (18)   95.9 (20)
3-7           norm      294        76.5 (69)   76.2 (70)     96.9 (9)    96.9 (9)      98.6 (4)    98.6 (4)
              abnorm    515        81.9 (93)   81.7 (94)     89.1 (56)   88.9 (57)     95.5 (23)   95.3 (24)
3-8           norm      279        89.6 (29)   89.2 (30)     95.7 (12)   95.3 (13)     97.1 (8)    96.4 (10)
              abnorm    474        83.1 (80)   82.9 (81)     91.4 (41)   91.4 (41)     95.8 (20)   95.4 (22)

(1) ( ) denotes number of cells misclassified.
(2) Features used were: NA, NComp, IODR, ODVar and ODMean.
(3) Features used were: CA, Ratio, NA, NComp, IODR, ODVar and ODMean.
(4) Minimum number of features selected by BMDP for any of the segmentation methods in Table 6. These features were: CA, Ratio, NA, NMeanR, RMeanI, GMeanI, BMeanI, IODR, ODMax, ODVar, Correlation, Homogeneity, FA2, FD, DNum and HAER.
Class.
= Classification VO TABLE 9 (Continued) C l a s s i f i c a t i o n Results f o r Various Segmentation Procedures and Combinations of Features Features used i n C l a s s i f i c a t i o n 1 2 Simple Nuclear Simple Nuclear & Cytoplasm BMDP Minimum Selected Segmentation C e l l Number Normal Jackknife Normal Jackknife Normal Jackknife Method Type of C e l l s Class. Class. Class. Class. Class. Class. 1-9 norm 315 72. 1% (86) 72. 7% (86) 97. 1% (9) 96. 8% (10) 98. 4% (5) 98. ,4% (5) abnorm 519 86. 9 (68) 86. 9 (68) 91. .7 (43) 91. 1 (46) 96. 0 (21) 95. ,4 (24) 2-9 norm 315 77. .4 (87) 71. 7 (89) 97. 5 (8) 97. .1 (9) 98. ,4 (5) 98. .4 (5) abnorm 518 85. 1 (77) 84. 7 (79) 90. 3 (50) 90. 2 (51) 95. ,2 (25) 94. 8 (27) 3-9 norm 314 72. 6 (86) 72. 6 (86) 97. 5 (8) 97. 5 (8) 99. ,0 (3) 98. .7 (4) abnorm 519 86. ,1 (72) 86. 1 (72) 91. ,1 (46) 90. 9 (47) 95. ,4 (24) 94. ,6 (28) 4-9 norm 314 70. .7 (92) 69. ,7 (95) 96. .5 (ID 96. .5 (11) 98, ,1 (6) 98, ,1 (6) abnorm 522 86. .6 (70) 86. .2 (72) 90. ,4 (50) 90, ,0 (52) 96, .0 (21) 95, .6 (23) 6-9 norm 310 70, .0 (93) 69, ,7 (94) 97, ,4 (8) 96, 8 (10) 98, .7 (4) 98. .3 (5) abnorm 518 84, .7 (79) 84, .2 (82) 90, .0 (52) 90. .0 (52) 95, .9 (21) 95, .8 (22) 3-7-9 norm 297 69 .0 (92) 68, .7 (93) 96, .9 (10) 96, .6 (10) 98 .7 (4) 98, .3 (5) abnorm 521 86 .0 (73) 85, .6 (75) 90, .6 (49) 90, .4 (50) 95 .4 (24) 94 .8 (27) 3-8-9 norm 294 70 .4 (87) 70 .1 (88) 98 .0 (6) 98 .0 (6) 98 .3 (5) 98 .3 (5) abnorm 520 82 .9 (89) 82 .5 (91) 89 .8 (53) 89 .4 (55) 95 .2 (25) 95 .2 (25) 1-9-9 norm 315 72 .1 (88) 71 .4 (90) 96 .8 (10) 96 .5 (11) 98 .7 (4) 98 .7 (4) abnorm 520 87 .9 (63) 87 .7 (64) 90 .4 (50) 90 .4 (50) 96 .0 (21) 95 .6 (23) £ ) denotes number of c e l l s m i s c l a s s i f i e d . Features used were: NA, NComp, IODR, ODVar and ODMean. 3Features used were: CA, Ratio, NA, NComp, IODR, ODVar and ODMean. Minimum number of features selected by BMDP for any of the segmentation methods i n Table 6. 
These features were: CA, Ratio, NA, NMeanR, RMeanI, GMeanI, BMEanI, IODR, ODMax, ODVar, Co r r e l a t i o n , Homogeneity, FA2, FD, DNum and HAER. Class. = C l a s s i f i c a t i o n The jackknife c l a s s i f i c a t i o n heading i n Tables 8 and 9 indicate the c l a s s i f i c a t i o n accuracy of the average DF generated when the data are subdivided into learning and te s t sets i n the fashion suggested by 7 9 Lachenbrach et al. 7 9 The jackknife procedure or Lachenbrach "holdout" procedure can 80 be described as follows : 1) Start with a group of N x observations. Leave out one observation from t h i s group and c a l c u l a t e a c l a s s i f i c a t i o n function based on the remaining N ^ - l , and N 2 observations (two group case). 2) C l a s s i f y the observation l e f t out i n step 1. 3) Repeat steps 1 and 2 u n t i l a l l of the N x observations have been c l a s s i f i e d while being l e f t out. Let n ^ be the number of l e f t out observations m i s c l a s s i f i e d i n group Nx. u 4) Repeat steps 1 through 3 f o r the N 2 group. Let n2 M be the number of l e f t out observations m i s c l a s s i f i e d i n the N 2 group. Let xii be the number of observations i n group N x and n 2 be the number of observations i n group N 2. The jackknife H H m i s c l a s s i f i c a t i o n rate i s j u s t n ^ /nx f o r group Ni and n 2 M /n 2 for group N 2. The jackknife c l a s s i f i c a t i o n accuracy i s a much more true representation of how the DF would perform on other t e s t sets. While there i s no theory which can estimate the error of the c l a s s i f i c a t i o n 7 9 accuracy of the DF on other data, empirical studies indicate that the difference between the claimed jackknife c l a s s i f i c a t i o n accuracy and the actual performance on a new data set should be within ± 20% of the jackknife classification error. 
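The holdout steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the discriminant used here is a plain two-group linear rule with a pooled covariance matrix and equal priors, standing in for the stepwise discriminant functions generated by BMDP, and the function names are hypothetical.

```python
import numpy as np


def fit_two_group_discriminant(group1, group2):
    """Fit a two-group linear discriminant with a pooled covariance matrix.

    Assumption: equal priors and a plain (non-stepwise) linear rule. This is
    a stand-in for the discriminant function, not the BMDP procedure.
    """
    mean1, mean2 = group1.mean(axis=0), group2.mean(axis=0)
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * np.cov(group1, rowvar=False)
              + (n2 - 1) * np.cov(group2, rowvar=False)) / (n1 + n2 - 2)
    weights = np.linalg.solve(np.atleast_2d(pooled), mean1 - mean2)
    cutoff = weights @ (mean1 + mean2) / 2.0
    return weights, cutoff


def jackknife_misclassified(group1, group2):
    """Return (n1M, n2M): held-out observations misclassified in each group."""
    missed = []
    for target, is_group1 in ((group1, True), (group2, False)):
        count = 0
        for i in range(len(target)):
            # Step 1: leave one observation out and refit on the rest.
            reduced = np.delete(target, i, axis=0)
            if is_group1:
                weights, cutoff = fit_two_group_discriminant(reduced, group2)
            else:
                weights, cutoff = fit_two_group_discriminant(group1, reduced)
            # Step 2: classify the observation that was left out.
            assigned_to_group1 = bool(weights @ target[i] > cutoff)
            if assigned_to_group1 != is_group1:
                count += 1
        # Steps 3 and 4: one count per group after every member was held out.
        missed.append(count)
    return tuple(missed)


def jackknife_rates(group1, group2):
    """Jackknife misclassification rates n1M/n1 and n2M/n2."""
    n1_missed, n2_missed = jackknife_misclassified(group1, group2)
    return n1_missed / len(group1), n2_missed / len(group2)
```

Each row of `group1` and `group2` is one cell and each column one feature. The classification function is refitted once per held-out observation, so every cell is classified by a function that never saw it.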
As an example of this ± 20% rule, if the jackknife classification accuracy is 95%, and thus the classification error is 5%, then the actual performance of the DF on a new data set should range between 94% and 96%.

The actual feature and cell classification data used to generate and test the DFs for Tables 8 and 9 were the data from the 800 image subset of the 1000 image database used to examine the segmentation performance of the various algorithms, so that the same cells were used to generate all DFs.

In Table 9 all the features used were user selected. The first set of features, the Simple Nuclear features, were chosen to match those most commonly used in the literature excluding cytoplasmic features. The next set, Simple Nuclear and Cytoplasmic, includes these features and the cytoplasmic ones. In Table 8 BMDP selected the features used in one of the sets of DFs. The number of features selected by BMDP for each segmentation method differed. The average number of features used was 23, ranging from a low of 16 for procedure 1 + 9 + P to a high of 29 for procedure 3 + 8 + P. To look for significant differences in classification accuracy between the various segmentation methods, the minimum set of 16 features selected by BMDP was used to form the DFs for all the segmentation procedures. The results are presented in the last two columns of Table 9.

As stated before, there is no easy way to determine which combination of features is most discriminating, short of testing all possible combinations of features. Since the features used to form the DF can be ranked according to their calculated F-to-remove statistic as generated by the BMDP 7M program, all the BMDP selected features for each segmentation procedure examined in Table 8 were ranked. The average rank of each feature across all 15 segmentation procedures was calculated and used to rank the BMDP selected features. These results are presented in Table 10.
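The averaging of ranks across segmentation procedures can be sketched as follows. This is a minimal sketch, not the thesis code: the function names are hypothetical, and the handling of a feature that BMDP did not select for a given procedure (it is given the rank just past the last selected feature) is an assumption, since the text does not say how such features were ranked.

```python
from statistics import mean, stdev


def rank_features(f_to_remove):
    """Rank the features of one procedure; rank 1 has the largest F-to-remove."""
    ordered = sorted(f_to_remove, key=f_to_remove.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def average_ranks(f_by_procedure, all_features):
    """Return [(feature, (mean rank, S.D.))] sorted by mean rank."""
    per_procedure = [rank_features(f) for f in f_by_procedure.values()]
    summary = {}
    for name in all_features:
        # Assumption: an unselected feature is ranked just after the
        # last feature that was selected for that procedure.
        ranks = [ranking.get(name, len(ranking) + 1) for ranking in per_procedure]
        summary[name] = (mean(ranks), stdev(ranks))
    return sorted(summary.items(), key=lambda item: item[1][0])


# Illustrative F-to-remove values only; these are not results from the thesis.
example = {
    "1": {"RMeanI": 50.0, "Ratio": 20.0, "FD": 5.0},
    "2": {"RMeanI": 40.0, "Ratio": 45.0},
    "3": {"RMeanI": 60.0, "Ratio": 30.0, "FD": 10.0},
}
ranking = average_ranks(example, ["RMeanI", "Ratio", "FD"])
```

A large standard deviation on a mean rank signals that the feature's importance depends strongly on the segmentation procedure, which is why the ranking should be read as relative rather than absolute.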
While the ranking of the features and the average ranks calculated for each feature should not be taken as an absolute ranking of a feature's discriminating power, they do roughly indicate the relative discriminating power of each feature. For example, while the feature Ratio is ranked 5th, this does not mean that Ratio is always more discriminating than the feature GMeanI; however, Ratio is very likely more discriminating than the feature FD.

TABLE 10
Feature Importance in Discriminant Function Analysis

Feature               Mean Rank ± S.D.
RMeanI                 2 ± 1
Correlation            3 ± 2
ODVar                  4 ± 4
NMeanR                 5 ± 3
Ratio (NA/CA)          5 ± 2
GMeanI                 6 ± 2
ODMax                  8 ± 5
IODR                  10 ± 4
BMeanI                10 ± 3
NA                    15 ± 6
DNum                  16 ± 5
HAER                  16 ± 4
FD                    16 ± 6
Cluster Shade         16 ± 6
ODSkew                18 ± 9
Homogeneity           18 ± 7
Contrast              18 ± 6
CRMH                  19 ± 5
TARM                  19 ± 5
ODKurt                19 ± 5
CRL                   19 ± 5
BEntropy              20 ± 5
TARH                  20 ± 5
CRM                   21 ± 4
SCMCN                 22 ± 4
CA                    21 ± 4
Energy                21 ± 5
TERL                  21 ± 4
FA2                   22 ± 4
Cluster Prominence    22 ± 4
NL                    22 ± 3
DMin                  22 ± 1
NM                    22 ± 1
NComp                 23 ± 2
NIert                 23 ± 2
DMax                  23 ± 2
NMaxR                 23 ± 2
NH                    23 ± 1
DCM                   23 ± 1
NMS                   23 ± 1
NBdycrc               23 ± 1
NElong                23 ± 1
NRVar                 23 ± 1
NBdyfin               23 ± 1

4. DISCUSSION

4.1. Segmentation Accuracy

It was found that the best procedure for segmenting images of stained cervical cells is algorithm 1 to segment the cytoplasm and obtain an initial nuclear segmentation, followed by 2 iterations of algorithm 9 and then postprocessing.

In Table 2 six different cytoplasmic segmentation procedures were examined. The best cytoplasmic segmentation was achieved by the primary segmentation algorithm number 1, which correctly segmented 97.4% of the images, followed closely by primary segmentation algorithm 2, which correctly segmented 96.0% of the images. Both of these methods depend upon the global thresholding of the blue image to segment the cytoplasm. Algorithm 1 uses an unmodified histogram while algorithm 2 uses modified histograms. The difference between the performance of these two algorithms is very slight.
They both work so well because they assume that the background intensity distribution is very narrow and spatially uniform. This is almost always the case once the images have been decalibrated. Thus, it is usually very easy for the algorithms to find the correct threshold.

Primary segmentation algorithm 5 segments the cytoplasm correctly for 93.3% of the cells. Its performance is slightly worse than that of the other two because in some of the red images the cytoplasm is very faint and was occasionally incorrectly segmented as background. Algorithm 5 assumes that there will be more than a 1 or 2 gray level contrast between the background and the cytoplasm in all three color images, and this is not always true of the red images.

Primary segmentation algorithms 3 and 4 gave very similar results (90% and 89.3% respectively). These results are slightly worse than those of the other algorithms because algorithms 3 and 4 assume that significant edges exist at the boundaries of the cytoplasm, which is not always the case. Both of these algorithms use spatially localized information to determine a threshold which varies across the image. These algorithms work as well as or better than algorithms 1, 2 and 5 when the background intensity varies from area to area in the image. However, in the image database tested, the image background varies very little and these algorithms (3 and 4) do not perform as well.

Primary segmentation algorithm 6 produced the worst results (74.7%) on cytoplasmic segmentation. This algorithm assumes there is significant contrast between the background and the cytoplasm and that the cytoplasm is relatively uniform in intensity. In a fair number of the images, one or both of these conditions are not true and hence the algorithm does not correctly segment the cytoplasm of many of the cervical cell images.

The best primary segmentation of the nucleus was performed by algorithm 3, which correctly segmented the nuclei in 61 of the 150 images.
In general all of the secondary segmentation algorithms increased the number of correctly segmented nuclei. Of the various combinations of primary segmentation algorithms with a single secondary segmentation algorithm, algorithm 9 on average provided the greatest improvement in the nuclear segmentation, yielding approximately 75 more nuclei per primary segmentation algorithm.

While the postprocessing routine does not actually change the segmentation of the cell, it does remove some of the artifacts which have been mistakenly segmented as nuclear objects. Thus, the routine increases the number of images for which the nuclear material is correctly identified. The postprocessing routine increases the accuracy of the nuclear segmentation almost as much, on average, as does algorithm 9. For some of the primary segmentation algorithms (2, 3, 4) the postprocessing routine increases the nuclear segmentation accuracy more than does algorithm 9. Not surprisingly, the best results were obtained when both algorithm 9 and postprocessing were employed. The best nuclear segmentation results were obtained when 2 iterations of algorithm 9 were employed.

The worst nuclear segmentation was obtained by algorithm 4 (15.3%). For primary segmentation algorithms, the nuclear segmentation accuracy generally reflects their ability to ignore artifacts in the red image, i.e. dark areas due to overlapping or folded cytoplasm, staining artifacts, or dirt. This can be seen in the results of the primary segmentation methods plus postprocessing. The differences in the nuclear segmentation accuracy between algorithms 1, 2, 3 and 4 plus postprocessing were only ~6%, whereas for the primary segmentation algorithms alone the difference in the nuclear segmentation accuracy ranged up to 25% for algorithms 3 and 4.
Since the post processing only removes non-nuclear objects from the segmentation, these results indicate that algorithm 4 incorrectly segments many more artifacts as nuclei than does algorithm 3.

The primary segmentation algorithm 6 does a poor job of delineating the nuclei (19.3%). It does not segment many artifacts as nuclei, as demonstrated by the only moderate increase in the correct nuclear segmentation rate (33.3%) when post processing is also applied.

Using the secondary segmentation algorithm 7 improves the segmentation accuracy for all the applicable primary algorithms. This algorithm, however, generates a threshold to segment the images and this again results in a large number of artifacts being segmented as nuclei, as demonstrated by the increase of 40% in segmentation accuracy for procedure 3 and 7 (46.7%) versus procedure 3 and 7 plus post processing (86.7%). The results are similar for the secondary segmentation algorithm 8.

The secondary segmentation algorithm 9 generates the largest increase in nuclear segmentation accuracy of all the secondary segmentation algorithms. This algorithm is not as susceptible to artifacts as the other algorithms, as indicated by only a slight (~12%) increase in the nuclear segmentation accuracy when post processing is performed.

Comparing Table 3 results with Table 2 results, one finds a 6.0% increase in the correct nuclei segmentation rate, reflecting the fact that the large database was not preselected to contain images which are very difficult to segment. In Table 3 it is the 1 + 9 + 9 + Post Processing procedure which gives the highest combined (nucleus and cytoplasm) correct segmentation rate. For the nuclei in the 1000 image database the correct segmentation rate varied from 52.5% to 98.0% across the various segmentation procedures presented in Table 3. There is not much difference between the five best procedures (ranging from 97.6% to 98.0%) in nuclear segmentation accuracy.
All of these five procedures use one or more iterations of algorithm 9.

Only the 3 + 8 segmentation procedure produced poorer results on the 800 image database than on the 150 image database. Also, the 3 + 8 procedure was worse than procedure 3 alone for the images used for Table 3. On closer examination it was apparent that the accuracy of algorithm 8 is more influenced by large variations in the intensity of the nucleus than are the other secondary segmentation procedures. Abnormal nuclei usually exhibit more variation in intensity than do normal nuclei. The images used to determine the results in Table 3 were predominantly abnormal images, whereas only 1/5 of the images in the 150 image database were of abnormal cells. Thus, the 3 + 8 procedure's performance was worse on the larger database.

In Table 4, the 63 (3680 - 3617) images which gave incorrectly segmented nuclei can be subdivided into 3 groups. In the first group of 31 images, no nucleus was found because the postprocessing step recognized the presence of an incorrectly segmented nucleus and thus removed it. These images can be readily recognized by the system as incorrectly segmented cells. The second group of 28 images contained nuclei whose segmentation was termed "mildly incorrect", to indicate cases in which the segmentation errors should not cause the cell in the image to be misclassified by an automated classification procedure, as features such as IOD would be only slightly affected. Many features, particularly most of the texture features, would not be affected at all by these slight errors in segmentation. The last group of 4 images contained nuclei so poorly segmented that they could be misclassified by an automated classification procedure. The results thus indicate that if this procedure is employed in fully automated cervical cell screening, only 4 out of approximately 4000 nuclei (0.1%) analyzed could be misclassified due to the segmentation algorithm used.
The accuracy of the better segmentation procedures tested in this work compares favorably with the results stated by other authors. Borst et al.⁸¹ report a nuclear segmentation accuracy of 87% when using their algorithm on 322 monochromatic images of Pap-stained cervical cells. For a set of 148 two color images of Pap-stained cervical cells, Nordin¹¹ reports correct segmentation of the nuclei in 82% of the images and correct segmentation of the cytoplasm in 68% of the images. Of the 26 nuclei incorrectly segmented, the system failed to recognize that it had incorrectly segmented 3. Thus, 2% of the nuclei segmented by this system could be incorrectly segmented and not recognized as such by the system, and possibly be incorrectly classified. These are the results for Pap-stained cervical cells. The same segmentation method (the nuclear radial contouring algorithm) correctly segmented 91.5% of the nuclei in the 1000 image database used in this work. The improvement is due to the better images on which it worked, i.e. images stained with a quantitative nuclear stain and with large colour separation between the nuclear and cytoplasmic stains. However, this is still much less than the performance of the best nuclear segmentation procedure used in this work. Also, only 0.1% of the nuclei segmented by the most accurate procedure in this thesis could be expected to be misclassified due to incorrect segmentation, while 2% of the cells could be misclassified due to the segmentation in the work reported by Nordin.

4.2. Feature Variation

It was found that features are affected by the segmentation method used. The coefficient of variation (CV) of each nuclear feature due to the segmentation procedures was calculated to determine the sensitivity of the features to the exactness of the delineation of the nucleus.
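As a small illustration of the quantity involved, the sketch below computes the coefficient of variation of one feature, for one cell, across several segmentation procedures. It is a sketch under assumptions: the sample standard deviation is used, and how the per-cell values were pooled over cells is not specified here. The numbers are made up for illustration and are not taken from the thesis.

```python
import numpy as np


def coefficient_of_variation(values):
    """CV in percent: standard deviation divided by the magnitude of the mean."""
    values = np.asarray(values, dtype=float)
    # Assumption: sample standard deviation (ddof=1).
    return 100.0 * values.std(ddof=1) / abs(values.mean())


# Illustrative values of one feature for one cell under four segmentations.
area_like = [500.0, 505.0, 495.0, 510.0]   # bulk property, mean far from zero
near_zero = [0.02, -0.01, 0.03, 0.0]       # feature whose mean is close to zero

cv_area = coefficient_of_variation(area_like)
cv_near_zero = coefficient_of_variation(near_zero)
```

The second feature varies very little in absolute terms, yet its CV is far larger than that of the first because the mean in the denominator is close to zero. A large CV therefore does not by itself show that a feature is sensitive to segmentation.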
The results are shown in Tables 5, 6 and 7. The coefficient of variation results for those images which were correctly segmented by all 15 procedures indicate how much a feature could vary given that the nucleus had been correctly segmented, where "correct segmentation" is determined by a human observer. Another way of interpreting these results is as how much the features would vary if different individuals were to have manually segmented the images, or if the same individual manually segmented the images a number of times. The coefficients of variation ranged from 0.4% for the ODMax feature to a high of 258.9% for the Cluster Prominence feature. The average CV value was 21.9%.

As one would expect, those features which measure the shape of the nuclear boundary (NRVar, NBdycrc, and NBdyfin) vary more than the features which measure bulk properties of the nucleus (NA, RMeanI, IODR, etc.). In Table 5, some features which one would expect to be sensitive to variations in the nuclear segmentation, such as the shape features NComp, NIert, and NMaxR, appear not to be. Some features in Table 5 which are surprisingly sensitive to the segmentation method are FA1, FA2 and DCM. On closer examination, the large variation of DCM is due to the fact that the mean DCM value for most nuclei is close to zero and hence, as the denominator in the coefficient of variation calculation, it causes DCM to appear artificially sensitive to segmentation. The large CVs of FA1 and FA2 are due to the large changes in optical density at the edges of the nucleus. A small change (one or two pixels) in the position of the boundary of the nuclear mask can result in the inclusion of several pixels whose optical density values differ significantly from their neighbors.
This difference is expressed as large height differences between neighboring pixels in the three-dimensional surface representation (see definition of FA1 and FA2 in section 4), resulting in sizable changes in the area of FA1 and FA2. Since one cannot include new pixels into FA1 without including the corresponding area into FA2, the changes in FA1 and FA2 due to segmentation method have a tendency to cancel. This is seen in the low variance of the FD feature, which is calculated from FA1 and FA2.

It is the low density chromatin features, TARL, TERL, CL, NL, NLS and SCLCN (Table 6), which have the largest CVs. Since most of the low density chromatin is usually located near the edges of the nucleus, the amount and intensity distribution of the low density chromatin pixels varies greatly with small changes in the location of the nuclear boundaries. MAER, HAER and MHAER have large variances because they are features which are normalized by the mean optical density of the low density chromatin. The features SCMCN and SCHCM have large coefficients of variation for the same reason that the feature DCM does, i.e. their mean values are close to zero.

In Table 7 one can see that the important feature IODR is not very dependent upon the segmentation method, while IODG and IODB are. This is probably due to two factors: 1) as one gets further from the red part of the spectrum, the nuclear stain absorbs less light, and thus the overall absorption of the nucleus becomes smaller; 2) as one gets closer to the blue part of the spectrum, the cytoplasm absorbs more light and the correction for cytoplasmic absorption becomes larger. For IODG most of the variance is due to the difficulty in making the correction for the light absorption of the cytoplasm.
For the IODB feature this is further complicated by the small IODB values of the nucleus (it can actually become negative after correcting for the absorption of the cytoplasm).

The ODVar and ODSkew features are sensitive to the determination of the boundary of the nucleus because they both describe the distribution of the OD values in the nucleus. Slight variations in the boundary of the nucleus can add or remove pixels with OD values which are in the tail regions of the nuclear OD distribution. Both the variance measure and the skewness measure are sensitive to small changes in the tail regions of the distributions which they describe.

Some of the Markovian texture features have large variation coefficients and others do not. There is no simple explanation why some are more sensitive to segmentation differences than others. The Cluster Prominence feature has such a large variation coefficient because its mean value is frequently very close to zero.

The DenMax, ExtrRange and AveRange features all have large variation coefficients. These features measure the number and values of intensity maxima within the nucleus. Frequently, most of the maxima (bright spots) are located along the edges of the nucleus. Thus, small changes in the segmentation of the nucleus can change the number of maxima in the nucleus and the intensity distribution of the maxima believed to be present in the nucleus.

Generally the features of the abnormal cells had larger coefficients of variation than did the features of normal cells, indicating that the features of abnormal cells are more sensitive to segmentation differences.

4.3. Cell Classification and Discriminating Power of Features

It has been found that the correct classification of cells is not dependent on the segmentation procedure.

The results in Table 8 and Table 9 indicate that the correct classification of cervical cells is not dependent on the segmentation procedure. However, the classification of cervical cells is strongly dependent on the features used. For comparison purposes it is not the separate normal and abnormal classification accuracies which should be compared, but the total classification accuracy (which combines the normal and abnormal classification results). Table 11 presents the total jackknife classification results for the various segmentation procedures and feature combinations shown in Table 8. Table 11 does not represent new data since it is calculated from the results in Table 8.

TABLE 11
Combined Normal and Abnormal Jackknife Classification Results for Various Segmentation Procedures and Combinations of Features

Segmentation   Features used in Classification
Method         All Selected   BMDP Selected   Author Selected
1              97.1%          97.6%           95.5%
2              96.5           97.4            94.6
3              96.9           96.5            93.6
4              96.9           97.4            94.5
6              96.2           96.7            94.3
3-7            96.4           97.0            93.9
3-8            96.4           97.6            94.2
1-9            96.5           96.5            94.7
2-9            96.2           96.9            94.2
3-9            96.4           96.6            93.9
4-9            97.1           97.1            94.5
6-9            96.7           97.0            93.6
3-7-9          96.5           96.7            93.9
3-8-9          96.1           96.8            93.6
1-9-9          97.2           97.1            94.6

In Table 11 the jackknife classification results when all the features were used to classify the cells ranged from 97.2% correct cell classification to 96.1% correct cell classification. The jackknife classification results estimate how well the discriminant function would perform on a different set of data. The accuracy of this estimation is generally accepted to be approximately 20% of the misclassification rate. The "error" on the 97.2% result is thus 0.6% and on the 96.1% result is 0.8%. The results 97.2 ± 0.6% and 96.1 ± 0.8% are not statistically different. The pattern is similar for the BMDP selected features (column 2), where the jackknife classification accuracy ranges from 97.6 ± 0.5% to 96.5 ± 0.7%, and for the author selected features (column 3), where the jackknife classification accuracy ranged from 95.5 ± 0.9% to 93.6 ± 1.3%.

A small independent database of 160 images was used to verify the classification accuracy of several (12) of the discriminant functions on a database which was not part of the learning set used to generate the discriminant functions. The classification accuracy of the discriminant functions on the 160 image database was that predicted by the jackknife classification results in Tables 8 and 9, within the experimental error associated with the jackknife classification results.

While none of the segmentation procedures gave classification results which were statistically different from those of the other segmentation procedures, some segmentation procedures had consistently higher or lower classification results; these remained statistically identical because of the errors associated with the results. Segmentation methods 1 and 1-9-9 consistently had the best classification results, while methods 6 and 6-9 had the worst classification results.

The difference in classification accuracy between using all the features to form the discriminant function and letting BMDP select the features used in the discriminant function is smaller than the error attached to the classification results. The classification results when BMDP selects the features versus when the author selected the features show a significant difference (a difference larger than the associated error) between the two.
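The arithmetic behind the combined accuracies and their quoted errors can be reproduced with a short sketch. The function names are hypothetical. The 20% factor is the empirical rule quoted in the text, and the comparison rule (a difference counts as significant only when it exceeds the sum of the two errors) is inferred from the comparisons the text itself makes. The cell counts are the "All Selected" jackknife entries of Table 8 for procedures 1-9-9 and 3-8-9.

```python
def combined_accuracy(groups):
    """Total percent correct from (number of cells, number misclassified) pairs."""
    total = sum(cells for cells, _ in groups)
    missed = sum(wrong for _, wrong in groups)
    return 100.0 * (1.0 - missed / total)


def jackknife_error(accuracy_percent, fraction=0.20):
    """Quoted error: about 20% of the misclassification rate."""
    return fraction * (100.0 - accuracy_percent)


def significantly_different(accuracy_a, accuracy_b):
    """True when the difference exceeds the sum of the two quoted errors."""
    gap = abs(accuracy_a - accuracy_b)
    return gap > jackknife_error(accuracy_a) + jackknife_error(accuracy_b)


# (normal cells, misclassified), (abnormal cells, misclassified) from Table 8.
acc_199 = combined_accuracy([(315, 7), (520, 16)])   # procedure 1-9-9
acc_389 = combined_accuracy([(294, 7), (520, 25)])   # procedure 3-8-9

print(round(acc_199, 1), round(jackknife_error(acc_199), 1))
print(round(acc_389, 1), round(jackknife_error(acc_389), 1))
print(significantly_different(acc_199, acc_389))
```

With these counts the sketch gives 97.2 ± 0.6% and 96.1 ± 0.8%, matching the values quoted for Table 11, and reports that the two results do not differ by more than their combined error.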
Allowing BMDP to statistically select the features to be used in the discriminant function provided the best results.

The results in Table 9 display characteristics similar to those of Table 8. When the same features are used in the discriminant function there was no experimental difference between the classification accuracy of the various segmentation procedures. There is a definite difference in the classification results in Table 9 depending on which features are used in the discriminant function. Similar to Table 11, Table 12 presents the total jackknife classification results for the segmentation procedures and features shown in Table 9. For the simple nuclear features the classification accuracy ranged from 85.4 ± 2.9% for method 2 to 78.0 ± 4.4% for method 3-8-9. While the difference between these classification results is experimentally significant, i.e. larger than the experimental error, it is only 0.1% larger, which makes attaching any significance to this result difficult. For the simple nuclear and cytoplasm features the classification accuracy ranged from 94.0 ± 1.2% for method 1 to 91.8 ± 1.6% for method 3-7.

Several other sets of features were used to form the discriminant functions while searching for evidence that would indicate that the segmentation method can influence the classification results. A set of features made up of those features with the largest variation coefficients in Tables 5, 6 and 7 was tried. No significant difference was found. Letting BMDP select only from the nuclear features again did not result in a significant difference. However, it was found that the BMDP selected nuclear features alone were able to classify the cervical cells with an overall accuracy of 97.0 ± 0.6%.

As previously noted in Tables 8 and 11, BMDP selected the features for one set of discriminant functions. The number of features BMDP selected depended upon the segmentation method used.
The number of features used ranged from 16 to 29 and averaged around 23. Using the minimum of 16 features selected by BMDP as a feature test set, the last two columns in Table 9 were generated. In Table 12, segmentation method 3-8 was found to give a classification accuracy of 95.8 ± 0.8%, which was significantly lower than the classification accuracy of 97.3 ± 0.5% of segmentation method 1. While the difference between these classification results is experimentally significant (larger than the experimental error), it represents a very small difference when compared with the difference in classification accuracy due to the number and type of the features used in the discriminant function analysis.

Although the segmentation procedure applied to the images does not have a significant effect on cell classification performance, some segmentation procedures appear to consistently produce slightly worse classification results than others. The segmentation procedure (1-9-9) which correctly segments the most nuclei does not appear to produce the most accurate classification results. This is an unexpected result. It could be due to the fact that, while all procedures are applied to the same image database, the postprocessing routine rejects the poorly segmented images, which are more frequent in the worse segmentation procedures; thus the number of cells to be classified varies from segmentation procedure to segmentation procedure. It is also possible that the cells which are difficult to segment may also be difficult to classify. This would cause the better segmentation procedures to produce worse classification results.
To test this hypothesis, the cells which were eliminated from the results of segmentation procedure 1 were also removed from the results of the 1-9-9 routine. A discriminant function analysis was performed on the reduced data of the 1-9-9 procedure. The analysis used only the selected nuclear features used in Tables 9 and 12. The total jackknife classification accuracy for the reduced 1-9-9 data set was 82.9%. This is an improvement over the old value of 81.6% for the 1-9-9 procedure in Table 12, indicating that it is possible that the cells which are difficult to segment may also be difficult to classify. One should note that the difference between 82.9% and 81.6% is slight and less than the error associated with the measurements.

TABLE 12
Combined Normal and Abnormal Jackknife Classification Results for Various Segmentation Procedures and Combinations of Features

                    Features used in Classification
Segmentation   Simple        Cytoplasm and        BMDP Minimum
Method         Nuclear (1)   Simple Nuclear (2)   Selected (3)
1              85.1%         94.0%                97.3%
2              85.4          93.5                 96.6
3              82.6          93.0                 96.8
4              82            92.8                 97.0
6              84            93.2                 96.5
3-7            79            91.8                 96.5
3-8            85            92.8                 95.8
1-9            81            93.3                 96.5
2-9            79.8          92.8                 96.2
3-9            81.0          93.4                 96.2
4-9            80.0          92.5                 96.5
6-9            78.7          92.5                 96.7
3-7-9          79.5          92.7                 96.1
3-8-9          78.0          92.5                 96.3
1-9-9          81.6          92.7                 96.8

(1) Features used were: NA, NComp, IODR, ODVar and ODMean.
(2) Features used were: CA, Ratio, NA, NComp, IODR, ODVar and ODMean.
(3) Minimum number of features selected by BMDP for any of the segmentation methods in Table 6. These features were: CA, Ratio, NA, NMeanR, RMeanI, GMeanI, BMeanI, IODR, ODMax, ODVar, Correlation, Homogeneity, FA2, FD, DNum and HAER.

All of the segmentation methods used in Tables 8, 9, 11 and 12 used post-processing. One interpretation of the results in Tables 8, 9, 11 and 12 is as follows. Once a cell has not been rejected by the post-processing routine, it is sufficiently well segmented that it can be accurately classified.
Or, in order for the segmentation of the cell to affect the classification results, its segmentation must be so bad that the object no longer looks like a cell (hence it is removed by the post-processing routine). One can also interpret this to indicate that, for the features which do most of the classifying, the variation due to segmentation method is much less than the variation in the features due to the cell type.

The better classification results in Tables 11 and 12 compare well with the results stated in the literature. Zahniser et al. (12) found that they can correctly classify 99.5% of the normal cells and 92.3% of the abnormal cells in a 1354 image database. Holmquist et al. (40) found that they could correctly classify 96.8% of the normal cells and 97.6% of the abnormal cells of a 244 image set. They also found that the reproducibility of manual classification of stained cervical cells by an experienced cytologist is such that the false positive and false negative errors when manually classifying cells as normal or abnormal are 4.5% and 4.4%, respectively. Therefore, this indicates that if the automatic classification procedure can correctly classify 96.6% of the cells it has equaled the classification performance of a trained cytologist.

The ordering of the features in Table 10 based upon their mean rank, as calculated in the results section of this thesis, should be taken with a grain of salt. The F statistic generated by BMDP and used to calculate the rank of these features is a parametric statistic and depends upon certain conditions existing in the distribution of the feature among the various cervical cells. Most of these conditions do not exist for most of these features.
Thus, the order of the features should be considered as only a rough approximation of their importance in the classification of cervical cells. While it is not surprising that RMeanI is considered to be one of the most discriminating features, it is surprising that GMeanI and BMeanI are also indicated as being quite discriminating. One would expect that the mean intensity information found in the green and blue images would be just constant fractions of the red mean intensity for all cell types. However, going back to the original images, in a large number of the abnormal cell images the nuclei appear to be of a slightly different shade/color than most of the nuclei in the images of normal cervical cells. This may be due to the interaction of the intensity of the nuclear stain with the amount of and stain intensity of the overlying cytoplasm. Normal nuclei have a dark brown color whereas some of the abnormal nuclei had more of a blue hue to them. This would indicate that perhaps there is less cytoplasm overlapping the nuclei of abnormal cells, or that the cytoplasm of abnormal cells might not absorb as much Orange II stain as does the cytoplasm of normal cells.

The tasks which an automated prescreening device for cervical slides needs to perform are 1) find and recognize cells on a slide, 2) segment the cells, 3) numerically describe the cells, 4) classify the cells, and 5) classify the slide. This work has dealt with, to some degree, steps 2-4. Work still needs to be done on steps 1 and 5. This would require determining: the number of cells needed to classify a slide; the best or a satisfactory method of locating and recognizing individual cells on a slide; and the best method of classifying a slide, i.e.
using cell by cell classification information or cell population feature data or both.

4.4. Conclusion

This work was performed to test two hypotheses: i) can the accuracy of cervical cell segmentation procedures be significantly improved and ii) does segmentation performance affect cervical cell classification accuracy. Therefore, the thrust of this thesis was to determine: 1) the segmentation performance of several (32) segmentation procedures, developed for this work and from the literature, to segment stained cervical cells; 2) the effect of segmentation performance on numerical descriptions of the cervical cells; 3) the effect of the method used to segment the cervical cells on the automatic classification of these cells; and 4) whether quantitative cellular descriptors can be used to match the performance of skilled cytology technicians.

Segmentation involves identifying the cytoplasm and the nucleus of the cervical cell. It was found that the most accurate nuclear segmentation out of the 32 procedures tested could be achieved by using a simple two-dimensional histogram analysis, two iterations of the edge relocation algorithm and the postprocessing routine, all of which were created by the author. This segmentation procedure correctly segmented 98.3% of the nuclei of a 3680 image database (only 61 nuclei incorrectly segmented). Only four of the incorrectly segmented nuclei were so poorly delineated that they might be misclassified by an automated classification procedure. The best cytoplasmic segmentation was achieved by the simple two-dimensional histogram analysis segmentation procedure which was developed for this work. It managed to correctly segment the cytoplasm in 98.5% of the images in a 1000 image database.

Sixty-seven different numerical cell descriptors (features) were calculated for every cell segmented and for each segmentation of the cell by a different segmentation procedure.
The data so generated was used to calculate the coefficient of variation (CV) for each feature across the segmentation procedures. It was found that the feature which was least sensitive to the segmentation procedure used was ODMax, which had a CV of 0.4%. The feature most sensitive to the segmentation procedure used was Cluster Prominence, which had a CV of 260%. The average CV for the 67 different features was 22%. Thus, segmentation method affects the feature values.

The sizeable variations in feature value as a function of the segmentation procedure used did not affect the classification of the cervical cells. One can also interpret this to indicate that, for the features which do most of the classifying, the variation due to segmentation method is much less than the variation in the features due to the cell type.

It was found that, using all of the cellular features or selected subsets of the features, one could automatically classify the cells with as high an accuracy as an experienced cytologist. The best automatic classification achieved was the correct classification of 98.7% of the normal cervical cells and 97.0% of the abnormal cells. A subset of 16 features was the smallest subset which could achieve this classification accuracy. It was also found that a subset of 20 nuclear features alone could achieve this classification accuracy. Thus the thesis that a computerized system can classify cervical cells at least as well as a trained cytologist has been demonstrated. This result requires that the system can segment cervical cells and recognize when it makes errors.

From the results of this work, it appears that if images of Thionin SO2 and Orange II stained cervical cells were collected by a device, it would be possible to automatically classify individual cells as well as this can be done by an experienced cytologist.
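The coefficient of variation used above to compare features can be sketched as follows. This is a minimal illustration, assuming the CV of a feature is the standard deviation of its values across segmentation procedures divided by their mean and expressed in percent. The text does not state whether the sample or population standard deviation was used, so the sample form here is an assumption. The feature names and values are invented for illustration and are not data from this work.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean.

    Assumption: sample (n - 1) standard deviation; the original analysis
    may have used a different convention.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(m)


# Hypothetical values of two features for one cell, measured after
# segmentation by four different procedures.
feature_values = {
    "stable_feature": [1.20, 1.21, 1.20, 1.19],
    "sensitive_feature": [0.5, 4.0, 1.5, 9.0],
}

cv_by_feature = {
    name: coefficient_of_variation(vals) for name, vals in feature_values.items()
}

# List features from least to most sensitive to the segmentation procedure.
for name, cv in sorted(cv_by_feature.items(), key=lambda item: item[1]):
    print(f"{name}: CV = {cv:.1f}%")
```

A feature with a small CV changes little from one segmentation procedure to another, whereas a large CV indicates that the feature value depends strongly on how the cell was segmented.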
The results also suggest that comparable classification performance can be achieved using a nuclear stain alone, and thus employing only nuclear features, greatly simplifying cell recognition and segmentation tasks.

REFERENCES

1. Silverberg E, Lubera JA: Cancer Statistics, 1988. Ca-A Cancer J for Clinicians, Vol 38, No 1, pp 5-22, 1988.
2. Anderson GH, Boyes DA, Benedet JL, LeRiche JC, Matistic JP, Suen KC, Worth AJ, Millner A, Bennett OM: The organization and results of the cervical cytology screening program in British Columbia from 1955 to 1985. Lancet, 1988.
3. Atlas of Cancer Mortality in the Peoples Republic of China. China Map Press, Shanghai, 1981.
4. Naib ZN: Morphology of malignant cells and their precursors in exfoliative cytopathology, 3rd Edition. Little Brown & Company, Toronto, 1985, pp 152-153.
5. Mellors RC, Glassman A, Papanicolaou GN: A microfluorometric scanning method for the detection of cancer cells in smears of exfoliated cells. Cancer, Vol 5, pp 458-468, 1954.
6. Caspersson TO, Santesson L: Studies on protein metabolism in the cells of epithelial tumors. Acta Radiol (suppl), Vol 46, pp 1-105, 1942.
7. Caspersson TO, Lomakka G: Recent progress in cytochemistry: Instrumentation and results, In Introduction of Quantitative Cytochemistry - II. Wied GL, Bahr GF, Eds, Academic Press, New York, 1970, p 27.
8. Wied GL, Bahr GF, Bartels PH: Automatic analysis of cell images by TICAS, In Automated Cell Identification and Cell Sorting. Wied GL, Bahr GF, Eds, Academic Press, New York, 1970, pp 195-360.
9. Caspersson TO: Cell Growth and Cell Function: A cytochemical study. WW Norton & Co, New York, 1950.
10. Caspersson TO: Quantitative tumor cytochemistry. Cancer Res, Vol 38, pp 2341-2355, 1979.
11. Nordin B: The development of an automatic prescreener for the early detection of cervical cancer: Algorithms and implementation. Ph.D. Thesis, Uppsala University, 1989.
12. Zahniser DJ, Oud PS, Raaijmakers MCT, Vooys GP, Van de Walle RT: BioPEPR: A system for the automatic prescreening of cervical smears. J Histochem Cytochem, Vol 27, No 1, pp 635-641, 1979.
13. Ploem JS, van Driel-Kulker AMJ, Goyarts-Veldstra L, Ploem-Zaaijer JJ, Verwoerd NP, van der Zwan M: Image analysis combined with quantitative cytochemistry: Results and instrumental developments for cancer diagnosis. Histochem, Vol 84, pp 549-555, 1986.
14. Bartels PH, Bibbo M, Bahr GF, Taylor J, Wied GL: Cervical cytology: Descriptive statistics for nuclei of normal and atypical cell types. Acta Cytol, Vol 17, pp 449-453, 1973.
15. Spencer CC, Bostrom RC: Performance of the cytoanalyzer in recent clinical trials. J Natl Cancer Inst, Vol 29, pp 267-276, 1962.
16. Nadel EM: Computer analysis of cytophotometric fields by CYDAC and its historical evolution from the cytoanalyzer. Acta Cytol, Vol 9, pp 203-206, 1965.
17. Wied GL, Bartels PH, Bahr GF, Oldfield DB: Taxonomic Intra-cellular Analytic System (TICAS) for cell identification. Acta Cytol, Vol 12, pp 180-204, 1968.
18. Wied GL, Bartels PH, Dytch HE, Pishotto FT, Bibbo M: Rapid high-resolution cytometry. Analyt Quant Cytol, Vol 4, No 4, pp 257-262, 1982.
19. Smeulders AWM, Leyte-Veldstra L, Ploem JS, Cornelisse CJ: Texture analysis of cervical cell nuclei by segmentation of chromatin patterns. J Histochem Cytochem, Vol 27, No 1, pp 199-203, 1979.
20. Brugal G, Garbay C, Giroud F, Adelh D: A double scanning microphotometer for image analysis: Hardware, software and biomedical applications. J Histochem Cytochem, Vol 27, pp 144-153, 1979.
21. Pycock D, Taylor CJ: Use of Magiscan image analyzer in automated uterine cancer cytology. Analyt Quant Cytol, Vol 2, pp 195-202, 1980.
22. Mukawa A, Kamitsuma Y, Tsunekawa S, Tanaka N: Report on a long-term trial of Cybest Model 2 for prescreening for squamous cell carcinoma of the uterine cervix. Analyt Cellular Pathol, Vol 1, pp 225-233, 1989.
23. Zahniser DJ, Oud PS, Raaijmakers MCT, Vooys GP, Van de Walle RT: Field test results using the BioPEPR cervical smear prescreening system. Cytometry, Vol 1, No 3, pp 200-203, 1980.
24. Tucker JH, Shippey G: Basic performance tests on the CERVIFIP linear array prescreener. Analyt Quant Cytol, Vol 5, No 2, pp 129-137, 1983.
25. Papanicolaou GN: A new procedure for staining vaginal smears. Science, Vol 95, pp 438-439, 1942.
26. Tezcan H: Personal communication.
27. Husain OAN, Page-Roberts BA, Millet JA: A sample preparation for automated cervical cancer screening. Acta Cytologica, Vol 22, No 1, pp 15-21, 1978.
28. Oud PS, Henderik JBJ, Huysmans ACLM, Pahlplatz MMM, Hermkens HG, Tas J, James J, Vooijs GP: The use of light green and orange II as quantitative protein stains and their combination with the Feulgen method for the simultaneous determination of protein and DNA. Histochem, Vol 80, pp 49-57, 1984.
29. MacAulay C, Tezcan H, Palcic B: Adaptive colour basis transformation: A segmentation aid. Analyt Quant Cytol, Vol 11, No 1, pp 53-58, 1989.
30. Jaggi B, Poon SSS, MacAulay C, Palcic B: Imaging system for morphometric assessment of conventionally and fluorescently stained cells. Cytometry, Vol 9, pp 566-572, 1988.
31. Harms H, Aus HM, Haucke M, Gunzer U: Segmentation of stained blood cell images measured at high scanning density with high magnification and high numerical aperture optics. Cytometry, Vol 7, pp 522-531, 1986.
32. MacAulay C, Palcic B: A comparison of some quick and simple threshold selection methods for stained cells. Analyt Quant Cytol, Vol 10, pp 134-138, 1988.
33. Jarvis LR: A microcomputer system for video image analysis and diagnostic microdensitometry. Analyt Quant Cytol, Vol 8, No 3, pp 201-209, 1986.
34. Liedtke CE, Gahm T, Kappei F, Aeikens B: Segmentation of microscopic cell scenes. Analyt Quant Cytol, Vol 9, No 3, pp 197-211, 1987.
35. Nevatia R: Image segmentation, Chapter 9, In Handbook of Pattern Recognition and Image Processing. Young TY, Fu KS, Eds, Academic Press Inc, Toronto, 1986.
36. Kohler R: A segmentation system based on thresholding. Comput Graph Image Proc, Vol 15, pp 319-338, 1981.
37. Weszka JS, Verson JA, Rosenfeld A: Threshold selection techniques - 2, TR-260. Computer Science Center, University of Maryland, 1973.
38. Weszka JS: Threshold selection 4, TR-3376. Computer Science Center, University of Maryland, 1974.
39. Kittler J, Illingworth J, Foglein J: Threshold selection based on a simple image statistic. Comp Vision Graph Image Proc, Vol 30, pp 125-147, 1985.
40. Holmquist J, Bengtsson E, Eriksson O, Nordin B, Stenkvist B: Computer analysis of cervical cells: Automatic feature extraction and classification. J Histochem Cytochem, Vol 26, No 11, pp 1000-1017, 1978.
41. Haussman G, Liedtke CE: A region extraction approach to blood smear segmentation. Comp Graph Image Proc, Vol 25, pp 133-150, 1984.
42. Horowitz SL, Pavlidis T: Picture segmentation by a directed split and merge procedure, In Proceedings of the 2nd Int'l Joint Conf on Pattern Recognition. Copenhagen, 1974, pp 424-433.
43. Cheevasuvit F, Maitre H, Vidal-Madjar D: Robust method for picture segmentation based on a split and merge procedure. Comp Vision Graph Image Proc, Vol 34, pp 268-281, 1986.
44. Peleg S, Rosenfeld A: Determining compatibility coefficients for curve enhancement relaxation processes. IEEE Trans Syst Man Cybern, MC-8, pp 548-555, 1978.
45. Rosenfeld A, Smith RC: Thresholding using relaxation. IEEE Trans Pattern Anal Machine Intell, PAMI-3, No 5, pp 598-606, 1981.
46. Bengtsson E, Eriksson O, Holmquist J, Nordin B, Stenkvist B: High resolution segmentation of cervical cells. J Histochem Cytochem, Vol 27, pp 621-628, 1979.
47. Rosenfeld A: Relaxation: Pixel-based methods, In Fundamentals in Computer Vision. Faugeras OD, Ed, Cambridge University Press, New York, 1983, pp 373-383.
48. Rosenfeld A, Kak AC: Iterative segmentation: "Relaxation", In Digital Picture Processing, Vol 2, 2nd Edition. Academic Press Inc, Toronto, 1982, pp 152-190.
49. Gonzalez RC: Image enhancement and restoration, In Handbook of Pattern Recognition and Image Processing. Young TY, Fu KS, Eds, Academic Press, New York, 1986, pp 191-213.
50. Bailey DG, Hodgson RM: Range filters: local-intensity subrange filters and their properties. Image Vision Comput, Vol 3, pp 99-109, 1985.
51. Rosenfeld A, Kak AC: Digital picture processing, Vol 2, Second Volume. Academic Press, New York, 1982, pp 100-101.
52. Horn BKP: Robot Vision. The MIT Press, Cambridge Mass, 1986, p 67.
53. Peet FG, Sahota TS: A computer-assisted cell identification system. Analyt Quant Cytol, Vol 6, No 1, pp 59-70, 1984.
54. Komitowski D, Zinser G: Quantitative description of chromatin structure during neoplasia by the method of image processing. Analyt Quant Cytol Histol, Vol 7, No 3, pp 178-182, 1985.
55. Brugal G, Quirion C, Vassilakos P: Detection of bladder cancers using a SAMBA 200 Cell Image Processor. Analyt Quant Cytol Histol, Vol 8, No 3, pp 187-194, 1986.
56. Young IT, Vanderlain M, Kramhout L, Jensen R, Grover A, King E: Morphologic changes in rat urothelial cells during carcinogenesis: II Image Cytometry. Cytometry, Vol 5, pp 454-462, 1984.
57. Holmquist J, Bengtsson E, Eriksson O, Stenkvist B: A program system for interactive measurements on digitized cell images. J Histochem Cytochem, Vol 25, No 7, pp 641-654, 1977.
58. Strojny P, Traczyk Z, Rozycka M, Bern W, Sawicki W: Fourier analysis of nuclear and cytoplasmic shape of blood lymphoid cells from healthy donors and chronic lymphocytic leukemia patients. Analyt Quant Cytol Histol, Vol 9, No 6, pp 475-479, 1987.
59. Katzko MW, Pahlplatz MMM, Oud PS, Vooijs GP: Carcinoma in situ specimen classification based on intermediate cell measurements. Cytometry, Vol 8, pp 9-13, 1987.
60. Bibbo M, Bartels PH, Sychra JJ, Wied GL: Chromatin appearance in intermediate cells from patients with uterine cancer. Acta Cytol, Vol 25, pp 23-28, 1981.
61. Lockart RZ, Pezzella KM, Kelley MM, Toy ST: Features independent of stain intensity for evaluating Feulgen-stained cells. Analyt Quant Cytol, Vol 6, No 2, pp 105-111, 1984.
62. Ballard DH, Brown CM: Computer Vision. Prentice-Hall Inc, Toronto, 1982, p 256.
63. Freeman H: Boundary encoding and processing, In Picture Processing and Psychopictorics. Lipkin BS, Rosenfeld A, Eds, Academic Press, New York, 1970, pp 241-266.
64. Vossepoel AM, Smeulders AWM: Vector code probability and metrication error in the representation of straight lines of finite length. Comp Graph Image Proc, Vol 20, pp 347-364, 1982.
65. Danielsson PE: A new shape factor. Comp Graph Image Proc, Vol 7, pp 292-299, 1978.
66. Smeulders AWM, Dorst L: Measurement issues in morphometry. Analyt Quant Cytol Histol, Vol 7, No 4, pp 242-249, 1985.
67. Pressman NJ: Markovian analysis of cervical cell images. J Histochem Cytochem, Vol 24, No 1, pp 138-144, 1976.
68. Unser M: Sum and difference histograms for texture classification. IEEE Trans Pattern Anal Mach Intell, Vol PAMI-8, pp 118-125, 1986.
69. Mandelbrot BB: The fractal geometry of nature. WH Freeman and Company, San Francisco, 1983, pp 33-39.
70. Caldwell C, Stapleton SJ, Holdsworth, Yaffe MJ: Characterization of mammary parenchymal pattern by fractal dimension. Digital Imaging Technology for Oncology, Terry Fox Workshop, Vancouver, B.C., Oct 19-22, 1988.
71. Panno JP, Nair KK: Age-related changes in cell nuclei, In Insect Aging. Collaty KG, Sohal RS, Eds, Springer-Verlag, Berlin, 1987, pp 155-167.
72. Vidal DCB, Schluter G, Moore GW: Cell nucleus pattern recognition: Influence of staining. Acta Cytol, Vol 17, pp 510-515, 1973.
73. Panno JP: Computer analysis of age related chromatin condensation in the somatic cells of the housefly Musca domestica. MSc Thesis, Simon Fraser University, 1984.
74. Dixon WJ (Ed): BMDP Statistical Software, 1983 Printing with Additions, 1983 Edition. University of California Press, Berkeley, 1983, pp 519-537.
75. Tatsuoka MM: Multivariate analysis: Techniques for education and psychological research. John Wiley & Sons Inc, Toronto, 1971, pp 157-242.
76. Hirschberg N, Humphreys LG: Multivariate analysis in the social sciences. Lawrence Erlbaum Associates, 1982.
77. Brown MB, Forsythe AB: Robust tests for the equality of variances. J Amer Stat Assoc, Vol 69, pp 364-367, 1974.
78. Kruskal WH, Wallis WA: Use of ranks in one-criterion variance analysis. J Amer Stat Assoc, Vol 47, pp 583-621, 1952.
79. Lachenbruch PA, Mickey MR: Estimation of error rates in discriminant analysis. Technometrics, Vol 10, No 1, pp 1-11, 1968.
80. Johnson RA, Wichern DW: Evaluating classification functions, In Applied Multivariate Statistical Analysis. Prentice-Hall Inc, Englewood Cliffs, NJ, 1982, pp 485-493.
81. Borst H, Abmayr W, Gais P: A thresholding method for automatic cell image segmentation. J Histochem Cytochem, Vol 27, No 1, pp 180-187, 1979.
