Development, implementation and evaluation of segmentation algorithms for the automatic classification… MacAulay, Calum Eric 1989

DEVELOPMENT, IMPLEMENTATION AND EVALUATION OF SEGMENTATION ALGORITHMS FOR THE AUTOMATIC CLASSIFICATION OF CERVICAL CELLS

by

Calum Eric MacAulay
B.Sc., Dalhousie University, 1982
M.Sc., Dalhousie University, 1984

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Physics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
August 1989
© Calum Eric MacAulay, 1989

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Physics
The University of British Columbia
Vancouver, Canada

ABSTRACT

Cancer of the uterine cervix is one of the most common cancers in women. An effective screening program for pre-cancerous and cancerous lesions can dramatically reduce the mortality rate for this disease. In British Columbia, where such a screening program has been in place for some time, 2500 to 3000 slides of cervical smears need to be examined daily. More than 35 years ago, it was recognized that an automated pre-screening system could greatly assist people in this task. Such a system would need to find and recognize stained cells, segment the images of these cells into nucleus and cytoplasm, numerically describe the characteristics of the cells, and use these features to discriminate between normal and abnormal cells. The thrust of this work was 1) to research and develop new segmentation methods and compare their performance to those in the literature, 2) to determine the dependence of the numerical cell descriptors on the segmentation method used, 3) to determine the dependence of cell classification accuracy on the segmentation used, and 4) to test the hypothesis that using numerical cell descriptors one can correctly classify the cells.

The segmentation accuracies of 32 different segmentation procedures were examined. It was found that the best nuclear segmentation procedure was able to correctly segment 98% of the nuclei of a 1000 and a 3680 image database. Similarly, the best cytoplasmic segmentation procedure was found to correctly segment 98.5% of the cytoplasm of the same 1000 image database. Sixty-seven different numerical cell descriptors (features) were calculated for every segmented cell. On a database of 800 classified cervical cells these features, when used in a linear discriminant function analysis, could correctly classify 98.7% of the normal cells and 97.0% of the abnormal cells. While some features were found to vary a great deal between segmentation procedures, the classification accuracy of groups of features was found to be independent of the segmentation procedure used. The cellular classification accuracy was found to be very dependent on the number and types of features used to form the discriminant functions.
The thesis that a computerized system can classify cervical cells at least as well as an experienced cytologist has been demonstrated. This result requires that the system can segment cervical cells and reliably recognize incorrectly segmented cells.

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENT
1. INTRODUCTION
   1.1 Status of Automatic Cervical Cell Analysis Using Imaging Systems
2. MATERIALS AND METHODS
   2.1 Sample Preparation
   2.2 Image Acquisition
   2.3 Segmentation Methods
       2.3.1 Simple 2D Histogram Analysis
       2.3.2 Three Histogram Analysis
       2.3.3 Threshold Selection Based on a Simple Image Statistic
       2.3.4 Local Histogram Threshold Selection
       2.3.5 Three Dimensional Thresholding
       2.3.6 A Split and Merge Procedure
       2.3.7 Nuclear Radial Contouring
       2.3.8 A Relaxation Process
       2.3.9 An Edge Relocation Algorithm
   2.4 Cellular Features
       2.4.1 Markovian Texture Analysis
       2.4.2 Discrete Texture Analysis
       2.4.3 Post-Processing
   2.5 Cell Classification
3. RESULTS
   3.1 Segmentation Results
   3.2 Variation of Features
   3.3 Variation of Classification
4. DISCUSSION and CONCLUSION
   4.1 Segmentation Accuracy
   4.2 Feature Variation
   4.3 Cell Classification and Discriminating Power of Features
   4.4 Conclusion
REFERENCES

LIST OF TABLES

Table 1: Discriminant Function Classification
Table 2: Comparison of Segmentation Procedures on a Database of 150 Images
Table 3: Comparison of Selected Segmentation Procedures on a Database of 1000 Images
Table 4: Segmentation Performance of a Simple 2D Histogram Analysis Followed by Two Iterations of the Edge Relocation Algorithm Plus Postprocessing
Table 5: Variation of the Shape Features and Some Texture Features Among the 15 Segmentation Procedures
Table 6: Variation of Discrete Texture Features Among the 15 Segmentation Procedures
Table 7: Variation of Continuous Texture Features Among the 15 Segmentation Procedures
Table 8: Classification Results for Various Segmentation Procedures and Combination of Features
Table 9: Classification Results for Various Segmentation Procedures and Combinations of Features
Table 10: Feature Importance in Discriminant Function Analysis
Table 11: Combined Normal and Abnormal Jackknife Classification Results for Various Segmentation Procedures and Combinations of Features
Table 12: Combined Normal and Abnormal Jackknife Classification Results for Various Segmentation Procedures and Combinations of Features

LIST OF FIGURES

Figure 1: Spectra of Cellular Stains
Figure 2: The Digitized RGB Images of a Stained Cervical Cell
Figure 3: Interactive Segmentation of a Stained Cervical Cell Using the Two Dimensional Histogram of Red and Blue Images
Figure 4: Diagram of Major Components of the Modified Cell Analyzer Imaging System
Figure 5: Schematic Diagram of System
Figure 6: Sony DXC-3000 3 Chip CCD Camera Light Intensity Response Curve
Figure 7: Gradient-Weighted and Average Gradient Histograms
Figure 8: Individual Pixel Threshold Assignment Using a Four Point Lagrangian Interpolation
Figure 9: Three Different Methods of Determining a Threshold
Figure 10: An Example of the Nuclear Radial Threshold Selection Process
Figure 11: An Example of the Effects of a Relaxation Process on the Intensity Distribution of an Image
Figure 12: Generation of Possible Edge Mask
Figure 13: Erosion of Possible Edge Mask
Figure 14: Results of the Edge Relocation Algorithm
Figure 15: Determination of Cytoplasm to be used to Correct Nuclear OD Value
Figure 16: Determination of Areas FA1 and FA2 for the Fractal Dimension Calculation
Figure 17: Division of Nucleus into Areas of Different Chromatin Condensation States
Figure 18: A Two Dimensional Example of Group Separation Using Linear Discriminant Function Analysis
Figure 19: Optimal Separation Boundaries for Groups with Different Covariance Matrices
Figure 20: Four Correctly Segmented Images
Figure 21: Four Mildly Incorrectly Segmented Images
Figure 22: Distribution of the Nuclear Area of Normal and Abnormal Cells
Figure 23: Distribution of the Cytoplasmic Area of Normal and Abnormal Cells
Figure 24: Distribution of the NA/CA Ratio for Normal and Abnormal Cells
Figure 25: Distribution of the Nuclear IOD of Normal and Abnormal Cells
Figure 26: Distribution of the Compactness Ratio of Normal and Abnormal Cells
Figure 27: Distribution of the DNUM for Normal and Abnormal Cells
Figure 28: Distribution of the Markov Texture Feature Correlation for Normal and Abnormal Cells
Figure 29: Distribution of the Discrete Texture Feature TARH for Normal and Abnormal Cells
Figure 30: Distribution of the Fractal Dimension Feature for Normal and Abnormal Cells

ACKNOWLEDGEMENT

I would like to acknowledge Dr. Haluk Tezcan's assistance in the staining and deposition of the cervical cells. Our discussions and his suggestions were always most helpful. Also thanks to Dr. B. Palcic, whose guidance and supervision made all this work possible. Alan Harrison's and Steven Poon's programming and hardware assistance was greatly appreciated. Finally, I would like to thank Vel Kinnie, Susan Grose and Paddi Tieszen for their inestimable assistance in the typing and presentation of this thesis.

1. INTRODUCTION

Cancer of the uterine cervix is one of the most common cancers in women. The incidence rate of this cancer is approximately 28 in 100,000 women and the mortality rate can be as high as 15 per 100,000 women [1]. The incidence of pre-cancerous lesions, which are believed to eventually transform into cancerous lesions, is increasing [2]. If pre-cancerous and cancerous lesions are undetected or left untreated, the mortality rate upon age 65 ranges from 0.1% to 1% [3].

Where an effective screening programme has been in place the mortality rate can drop to as low as 3 per 100,000 women per year [2]. Such a screening programme has been in place in British Columbia for many years. In British Columbia, a cervical smear is taken from every woman on average once every 2 years from the onset of sexual activity until age 35, and every five years thereafter [2]. For a cervical smear a tissue sample is scraped off the cervix and smeared onto a slide. This material is then fixed and stained, and the cervical cells on the slide are examined. In examining the slide the pathologist or experienced cytotechnician is looking for abnormal cells which do not exhibit the usual features of cervical cells.

Changes in the nucleus are the most important criteria used for the cytological diagnosis of cervical cancer [4]. The visible nuclear organization reflects the cell's biological status. No single structural change in the nucleus is considered diagnostic in itself. A combination of several nuclear abnormalities is necessary for a diagnosis [4].

The following are some of the cellular changes used in visual examinations [4]:

1) Nuclear hypertrophy or size: Usually compared with the size of the cytoplasm. A large nucleus with little cytoplasm can be indicative of an abnormal cell.
2) Nuclear shape variation: When the nucleus is no longer oval or elliptical but becomes irregular in shape. Single or multiple nuclear protrusions can be important.
3) Hyperchromatin: Usually indicative of increased amounts of DNA in the nucleus.
4) Chromatin irregularity: Changed, non-uniform distribution of DNA in the nucleus measured by nuclear texture parameters. These give one of the most important criteria for malignancy.
5) Multinucleation: Multinucleation of the cell can result in a convoluted nuclear shape.
6) Nuclear membrane changes: Indentation, lobulation, protrusion, and extensive wrinkling are important indicators in the diagnosis of atypia.

In British Columbia (population approx. 2.4 million), on average 2500 to 3000 of these slides are examined every working day. This is a very labour intensive task, which only highly skilled cytotechnicians can perform. Therefore, only a few countries in the world have succeeded in monitoring the population. Even some developed countries, e.g. U.S.A., Great Britain and others, cannot manage this task due to the tedious nature of this work.

More than 35 years ago, it was recognized that an automated pre-screening system could greatly assist people in this task [6]. During this period excellent ideas and algorithms for the recognition of cellular components and the numerical description of the cellular features have been developed [6,7,8,9,10]. Numerous systems have been designed and tested for this purpose [8,11,12,13,14]. It now appears likely that a cost effective system which can perform the required tasks in an automated way will be made possible by the high speed computational devices and solid state sensors developed in this decade.

An automated device which is able to perform quantitative measurements on stained cells and discriminate between normal and atypical cells must perform the following tasks:

1) find and recognize stained cells on the slide;
2) segment the images of the cells into nuclear and cytoplasmic areas;
3) numerically describe the characteristics (features) of the cells; and
4) use these features to discriminate between normal and malignant cells.

The most difficult of these tasks to automate are those that humans find the easiest: the recognition of the cells and their constituents. The tasks which are simpler to automate are the ones humans find the most difficult to perform reliably: the feature extraction and the cell discrimination.

The thrust of this thesis is to research and develop several new segmentation methods, compare their performance to those from the literature, and, after finding the best, test the hypothesis that quantitative cellular descriptors (features) can be used to match the performance of skilled cytology technicians. In doing this we want to determine the dependence of the numerical descriptors of the cellular images upon segmentation accuracy and the dependence of cell classification upon the type and accuracy of the numerical descriptors.
1.1 Status of Automatic Cervical Cell Analysis Using Imaging Systems

Since the introduction of the PAP smear to detect the early stages of cervical cancer in the 1950's, various researchers have undertaken to develop a quantitative system to perform or aid in the analysis of cervical smears or cells. In the late 1950's a group at the Airborne Instruments Lab developed the Cytoanalyzer [15], the first system developed to analyze cervical smears. Its screening performance was judged to be inadequate for the instrument to be practical [16]. An outgrowth of Cytoanalyzer was Cydac [16], but this system was never used to automatically screen cervical smears. The TICAS project started in 1967; it depended heavily on operator interaction [8] and was never intended as an automatic screening device. However, it was intended to be used as an aid in the final diagnosis in hard cases [17]. It was also used as a research tool to investigate the parameters and decision criteria that should be used in an automated screening system. This project has evolved into a system for rapid high-resolution cytometry [18]. Following TICAS a number of interactive image analysis systems were developed [19,20,21]. In the early 1980s, several groups were beginning to build systems (prototypes) to automatically screen cervical smears. Some of these systems are the DIASCANNER (Swedish group) [11], the CYBEST (Biomedical Laboratories, Japan) [22], the BioPEPR (Nijmegen University, Netherlands) [24], CERVIFIP (Edinburgh, Scotland) [23], and the LEYTAS (Netherlands) [13]. CYBEST is one of the most advanced systems that has yet been developed for prescreening cervical smears, but it has a false positive rate of 30.7% [22] and a false negative rate of 2% on a slide by slide basis.

The DIASCANNER system is the culmination of eleven years of image analysis algorithm development, software implementation and statistical evaluation [11]. As such it is one of the devices closest to fulfilling the requirements of an automated prescreening device. Most of the algorithms developed for this work will be compared with those defined in the body of work describing DIASCANNER.

2. MATERIALS AND METHODS

2.1. Sample Preparation

The samples were collected from the transformation zone of the uterine cervix of 37 different subjects. The standard Papanicolaou staining [25] generally results in clumps of cells, with many of the cells overlapping and with poor colour separation between the nucleus and the cytoplasm. In addition, the nuclear stain is not stoichiometric for DNA. While this method is the one routinely used by cytological laboratories for the human interpretation of cervical smears, it is a non-optimal procedure for automatic, quantitative assessment of cervical cells.

To extract meaningful features from stained cells one needs slides optimized for quantitative measurements. These slides should meet the following requirements:

1) that the slide contains a relatively constant number of cells in a monolayer with a low overlap rate among the cells;
2) that there be a large detectable difference in the spectral characteristics (colour) between the cytoplasm and the nuclei; and
3) that the nuclear stain should be stoichiometric for DNA so that quantitative measurements of the DNA content of the nuclei are possible.

Dr. Haluk Tezcan [26] has developed a sample preparation method which meets the above requirements.
In this method the sample is collected with a wooden spatula and suspended in a cell fix solution of 15% PBS ethanol and dithiothreitol. The samples are disaggregated by using a two stage syringing approach. One syringing takes place before enrichment (which increases the cell concentration in suspension) and one just before the cell deposition on a microscope slide. Cell enrichment is achieved by centrifuging the cells at high speeds and then dissolving them to the desired cell concentration. The cells are then deposited using a simple sedimentation and smearing deposition method [27]. Finally, the cells are stained using the Feulgen-Thionin (SO2) nuclear stain combined with the Orange (II) cytoplasmic stain [28].

These stains gave superior spectral separation when using a colour camera, which we employed in this work. The absorption spectra of the two stains are shown in figure 1 and a three colour image of a stained cell is shown in figure 2. The Orange (II) absorption is very weak in the red region of the spectrum, where the absorption by the Feulgen-Thionin (SO2) is very strong. This enabled one to perform the nuclear segmentation (recognition of the extent and location of the nucleus) using the red image and the cytoplasmic segmentation using the blue image. Figure 3 shows how this can be done interactively. The intra slide variation of these stains is very low and the inter slide variation, while somewhat larger, is still manageable [20].

2.2. Image Acquisition

All of the images used in this work were acquired, stored and analyzed on a modified Cell Analyzer Imaging System [30]. The system consists of four major modules: 1) an image acquisition module consisting of a microscope and camera, 2) a microscope control module (not used in this work), 3) an image processing module consisting of a frame grabber and imaging board, and 4) a host computer, which includes storage devices.

FIGURE 1: SPECTRA OF CELLULAR STAINS (horizontal axis: wavelength, 400 to 700 nm)
In this work Feulgen-Thionin SO2 nuclear stain and Orange II cytoplasmic stains were used. Their visible absorption spectra differ as shown above. The Feulgen-Thionin SO2 stain is stoichiometric for DNA, thus optical density is proportional to DNA amount.

FIGURE 2: THE DIGITIZED RGB IMAGES OF A STAINED CERVICAL CELL
These are images of a stained cervical cell taken from three different parts of the visible spectrum. Image A) is from the red part of the spectrum. Image B) is from the green part of the spectrum. Image C) is from the blue part of the spectrum.

FIGURE 3: INTERACTIVE SEGMENTATION OF A STAINED CERVICAL CELL USING THE TWO DIMENSIONAL HISTOGRAM OF RED AND BLUE IMAGES
The two dimensional (2D) histograms shown in A), B), C) and D) were generated by finding the frequency of occurrence of a pixel in the same position in the red and blue images having a red intensity value, r, and a blue intensity value, b. In the two dimensional histogram the frequency of occurrence of an r, b pair is represented by how bright (high value) the 2D histogram is, and the position is determined by the values of r and b. Since the background is uniformly bright in both the red and blue images, it appears as a bright spot in one corner of the 2D histogram in A). In B) the background has been interactively circled (human intervention, not automatically) and the corresponding pixels in an image of the cell shown in figure 2 have been removed. The cytoplasm, which absorbs blue light but not red light, is displayed as a vertical line of points in the 2D histogram and has been interactively circled in C); the corresponding area in an image of the cell has been removed and the result is displayed in the lower left corner of C). The nucleus absorbs both red and blue light and appears as a cluster of points in the middle of the 2D histogram. This cluster has been interactively circled in D); again the corresponding area in an image of the cell has been removed and the result is displayed in the lower left corner of D).

The microscope used in this work was a Nikon Optiphot with a PlanApo 40x (40/0.95) objective, a 100W halogen light with a stabilized power supply, a dispersion filter, a neutral color balance filter, and a 1X video projection lens. When used with the 3-chip CCD video camera (Sony, DXC-3000) the corresponding pixel size was 0.34 µm x 0.34 µm. This camera provides a simultaneous acquisition of red (600nm), green (540nm), and blue (460nm) images. The frame grabber was an MVP-AT Matrox image processing board [31]. Figures 4 and 5 show the layout of the system.

FIGURE 4: DIAGRAM OF MAJOR COMPONENTS OF THE MODIFIED CELL ANALYZER IMAGING SYSTEM
The major components of the system are: a Nikon Optiphot microscope, a Sony DXC-3000 3-chip CCD camera, a PC AT microcomputer and add-in boards, an IBM monochrome monitor, and an RGB analogue monitor.

FIGURE 5: SCHEMATIC DIAGRAM OF SYSTEM (blocks: image acquisition, image processing, microscope control, host computer)

The quality of the images, as reflected by the spatial and photometric resolution, is very important. High quality images make the segmentation process simpler and more robust, and are also required for meaningful measurements of chromatin distribution. To achieve this, the following steps were performed:

1) at the beginning of each image acquisition session the camera was calibrated to ensure the proper color balance of the images and that the full photometric range of the digitizer (256 gray levels) was utilized;
2) 30-50 images for each color were collected and averaged to reduce the random noise in the images;
3) each image was decalibrated [32], i.e. the image of the field of view without any cells or other objects present was subtracted from the image of the cells and an offset added to return the image background level to its pre-subtraction value, removing the effects of uneven illumination caused by the microscope optics and the fixed pattern noise of the camera.
Using the above procedures, the background variation was less than or equal to ±1 gray level.

A fundamental requirement for the determination of the integrated optical density of nuclei is an accurate measurement of the optical density of the individual pixels of the image [33]. While charge coupled devices (CCD) are notable for their response linearity [32], the amplification/translation circuits in the camera electronics are such that they exhibit a markedly non-linear response to light intensity. Therefore, a Kodak step tablet No. 3 was used to determine the camera's photometric response. The response curve, shown in figure 6, was then translated into a look-up table (LUT). This LUT was subsequently used to correct individual pixel density measurements.

FIGURE 6: SONY DXC-3000 3 CHIP CCD CAMERA LIGHT INTENSITY RESPONSE CURVE (plot title: Camera Calibration Curve; horizontal axis: measured intensity in gray levels)
Due to the non-linear response of this camera (and most other video cameras) the output of the camera must be measured as a function of known illumination. A Kodak step tablet #3 was used to generate known light intensities and enable the construction of the above graph. This response curve was transformed into a look-up table (LUT) so that all subsequent intensity measurements were corrected using this LUT.

Approximately 4700 RGB images of stained cervical cells were collected from 18 cytologically normal (also not infected with the Human Papilloma Virus) samples and 19 dysplastic samples. From the overall database two subset databases were also formed. One subset contained 150 cells which were more difficult to segment. They contained one or more of the following traits: overlapping and/or folded cytoplasm, staining artifacts in the cells, or debris in and around the cells. These images were used to perform a preliminary evaluation of the various segmentation procedures. A second database of 1000 images was used to investigate the accuracy of several of the segmentation procedures used on the 150 image database. Also examined were the effects of the segmentation procedure on the calculated features, and on the classification of the individual cells. Due to a failure of the back up media, 200 images of normal cells were lost near the end of the analysis of the 1000 image database. Therefore, some of the results quoted for the 1000 image database may be calculated from only 800 images; this has been estimated to have a negligible effect on the final results and conclusions. The full 4700 image database was used to establish the accuracy of the most appropriate (accurate and rapid) segmentation procedure on a large set of images.

The 150 image database consisted of approximately 120 images of normal cells and 30 images of abnormal cells (CIN II or worse). The 1000 image database consisted of images of 500 normal cervical cells and 500 images of abnormal cells (CIN I or worse). A cell by cell classification for the full 4700 image database was not performed.

2.3. Segmentation Methods

An adequate segmentation of the areas of interest in images of stained cells is an important prerequisite for the extraction of meaningful cellular features. The areas of interest in this work are background, cytoplasm, nuclei, and artifacts. Artifacts are considered to be non-nuclear objects which display one or more characteristics of nuclei, i.e. absorbing light in a similar manner to nuclei or being significantly darker than the surrounding cytoplasm. To perform quantitative measurements of the morphological features of stained cells in an automated way, adequate segmentation is therefore extremely important. For automated systems, it is the precise and correct segmentation of the nuclei which is the most difficult, but critical, step of this process. For this reason the majority of the segmentation methods described in this work delineate the nuclei from the cytoplasm.

Generally, the more a priori knowledge that is incorporated into a segmentation algorithm, the more robust, reliable, and accurate the automated algorithm becomes [34]. A priori knowledge in this context refers to information and conditions assumed to exist in the image prior to its analysis. The following chapter discusses several segmentation methods as well as the type and extent of the a priori knowledge used implicitly or explicitly by these methods.

To detect and delineate objects in an image, some localized photometric non-uniformity must exist in either the intensity values, color values, or the perceived texture of the image. This non-uniformity information can be utilized in one of two ways. One method is to search for local regions that are "moderately" uniform in some measured property. The other, complementary method is to assume that areas of "significant" non-uniformity form the boundaries which separate the areas of interest. In the latter case, the separation boundaries are usually referred to as "edges" and are generally characterized by a "large" change in some "local" photometric property. How the terms "moderately", "significantly", "large" and "local" are defined by the algorithms greatly determines their segmentation performance. Using the above principles, several segmentation algorithms have been developed. Those related to this thesis are described below.

2.3.1. Simple 2D Histogram Analysis

Critical to this algorithm is the collection of two images at different wavelengths, λ1 and λ2, e.g. red and blue images. This algorithm assumes that the background is moderately uniform and bright in both images. Here, "moderately" indicates that the background is an area of uniform intensity with a narrow gaussian distribution. The standard deviation of the gaussian is assumed to be finite and small relative to the intensity differences between the background and the cytoplasm and nucleus. Consequently this method finds the first trough below the high transmission peak in the histogram of the λ2 image and uses it as a global threshold to separate the background from the cytoplasm. The background's contribution to the histogram of the λ1 image is then removed. Finally, the first trough below the high transmission peak in the modified histogram of the λ1 image is found and used as a global threshold to delineate the nucleus in the λ1 image. This step uses knowledge of the staining properties described earlier and assumes that in the λ1 image (e.g. red) the cytoplasm transmits more light than does the nucleus and that the cytoplasm has a relatively uniform absorption over its entire area, thus generating a peak in the histogram. The use of troughs in histograms as thresholds for segmentation is quite general and can be found in almost any text on image processing [36]. However, a multitude of methods exists for generating a variety of different types of histograms, most of which incorporate some form of a priori information.

2.3.2. Three Histogram Analysis

This algorithm may be sequentially applied to the blue and then the red image when used as a color segmentation algorithm, or applied to only a single image when used as a gray-scale segmentation method. The three histogram analysis segmentation method [36] calculates three different histograms, H1(i), H2(i) and H3(i). Two of the histograms are gradient-weighted histograms, while the third is an average gradient histogram [37].
Examples of these histograms are shown in figure 7.

The a priori knowledge or assumptions about the images used in this algorithm are that the areas of interest (background, cytoplasm and nucleus) are approximately uniform in intensity. Therefore they appear as peaks in the gradient-weighted histograms and are separated by valleys in these histograms. In a conventional histogram the valleys represent the intensity values which occur infrequently in the image. It has been suggested [38] that the pixels in areas of large gradients are preferentially located on the edges of objects. If they are eliminated from the histogram, the valleys of the histogram will become more pronounced.

The histograms $H_1(i)$ and $H_2(i)$ are conventional histograms except that the pixels with gradient magnitudes larger than a specified cutoff value, "c", are not included in the histograms. The difference between histograms $H_1(i)$ and $H_2(i)$ is that the value of "c" is twice as large for histogram $H_2(i)$ as that of $H_1(i)$, hence $H_2(i)$ includes more of the edge pixels than does $H_1(i)$.

[Figure 7: three panels titled Histogram 1, Histogram 2 and Average Gradient Histogram; horizontal axes: gray levels; the threshold positions b/c and c/n are marked.]

FIGURE 7:  GRADIENT-WEIGHTED AND AVERAGE GRADIENT HISTOGRAMS

Gradient-weighted histograms are essentially conventional histograms (depicting the number of pixels in the image with the same intensity value as a function of intensity value) except that pixels with a larger than pre-determined gradient magnitude are removed from the histogram. The difference between histogram 1 ($H_1(i)$) and histogram 2 ($H_2(i)$) is that the pre-determined gradient magnitude value in $H_2(i)$ is twice as large as that of $H_1(i)$. The average gradient histogram, $H_3(i)$, is calculated as follows: for each pixel of intensity i, the gradient magnitude value is calculated and the average gradient for all pixels of intensity i is then determined.

The values of the average gradient histogram $H_3(i)$ are calculated as follows. For each pixel of intensity i, the gradient magnitude value is calculated and the average gradient for all pixels of intensity i is then determined. The histogram $H_3(i)$ represents the average gradient value of the pixels of intensity i [37]. The peaks in $H_3(i)$ represent a range of intensity values for which the photometric property of intensity varies "significantly", and consequently these intensity values should be in the vicinity of a "local" edge. Assuming that the edges separate areas which are moderately uniform in intensity, the location of the peaks in $H_3(i)$ should represent acceptable thresholds.
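The construction of the three histograms can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function names (`gradient_magnitude`, `three_histograms`) are hypothetical, the masks are the 5-7-5 modified Sobel masks of equation (1), the two mask responses are assumed to be combined as the root of the sum of their squares, and border pixels are simply skipped.

```python
from typing import List, Tuple

Image = List[List[int]]

# The two 3x3 masks of the modified Sobel operator (weights 5, 7, 5).
MASK_X = [[-5, 0, 5], [-7, 0, 7], [-5, 0, 5]]
MASK_Y = [[5, 7, 5], [0, 0, 0], [-5, -7, -5]]


def gradient_magnitude(img: Image, r: int, c: int) -> float:
    """Gradient magnitude at interior pixel (r, c).

    Assumption: the two mask responses are combined as the root of the
    sum of their squares.
    """
    gx = gy = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            v = img[r + dr][c + dc]
            gx += MASK_X[dr + 1][dc + 1] * v
            gy += MASK_Y[dr + 1][dc + 1] * v
    return (gx * gx + gy * gy) ** 0.5


def three_histograms(img: Image, cutoff: float, levels: int = 256
                     ) -> Tuple[List[int], List[int], List[float]]:
    """Return (H1, H2, H3), computed over interior pixels only."""
    h1 = [0] * levels
    h2 = [0] * levels
    grad_sum = [0.0] * levels
    count = [0] * levels
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            i = img[r][c]
            g = gradient_magnitude(img, r, c)
            if g <= cutoff:        # H1: pixels above cutoff c are left out
                h1[i] += 1
            if g <= 2 * cutoff:    # H2: cutoff 2c, so more edge pixels remain
                h2[i] += 1
            grad_sum[i] += g
            count[i] += 1
    # H3: average gradient magnitude of all pixels with intensity i.
    h3 = [grad_sum[i] / count[i] if count[i] else 0.0 for i in range(levels)]
    return h1, h2, h3
```

On an image with one sharp vertical edge, the edge pixels drop out of `h1` and `h2` when the cutoff is small, while `h3` is raised at exactly the two intensities that meet at the edge.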
The gradient operator involved in the formation of all three histograms is a modified Sobel gradient operator:

$$
\text{gradient magnitude of central pixel} = \left( \begin{bmatrix} -5 & 0 & 5 \\ -7 & 0 & 7 \\ -5 & 0 & 5 \end{bmatrix}^{2} + \begin{bmatrix} 5 & 7 & 5 \\ 0 & 0 & 0 \\ -5 & -7 & -5 \end{bmatrix}^{2} \right)^{1/2} \tag{1}
$$

All histograms are smoothed twice by a one-dimensional (1-D) median filter (1x3 median filter) and a 1-D mean filter (1x3 mean filter) to remove small irregularities. A simple valley-finding procedure is used on each of $H_1(i)$ and $H_2(i)$ to find the thresholds: i) $T_1$b/c, the threshold from histogram $H_1(i)$ which subdivides the image into background and cytoplasm, ii) $T_2$b/c, the threshold from $H_2(i)$ which subdivides the image into background and cytoplasm, iii) $T_1$n/c, the threshold from $H_1(i)$ which subdivides the image into cytoplasm and nucleus, and iv) $T_2$n/c, the threshold from $H_2(i)$ which subdivides the image into cytoplasm and nucleus. The thresholds $T_3$b/c and $T_3$n/c are determined by finding the largest peaks in histogram $H_3(i)$ in the vicinity of $T_1$b/c and $T_2$b/c for $T_3$b/c, and in the vicinity of $T_1$n/c and $T_2$n/c for $T_3$n/c. From the three background/cytoplasm thresholds $T_1$b/c, $T_2$b/c, and $T_3$b/c the median threshold is the one used to perform the actual segmentation. Similarly, the median threshold of the cytoplasm/nucleus threshold group is selected.

In the color segmentation implementation, the blue image is analyzed first and the background/cytoplasm threshold is used to remove the background from both the red and blue images. What is left of the red image is then analyzed, and the cytoplasm/nucleus threshold found by the three histogram analysis is used to separate the cytoplasm and nucleus in the red image.

2.3.3.  Threshold Selection Based on a Simple Image Statistic

This algorithm may be used to perform single-image segmentation, the subdivision of a monochrome image into background, cytoplasm and nucleus, or color segmentation, the subdivision of two images of different colors into three classes: background, cytoplasm and nucleus. It is based on Kittler and Illingworth's [39] formulation of an image-statistic threshold selection procedure which is simple and widely applicable. While this method was originally developed to segment monochrome images into only two classes, background and the object, it is possible to extend it to segment monochrome images into three classes or to segment three classes from blue image and red image data for color segmentation. The following is a short description of this threshold selection algorithm.

If an image is made up of n x m pixels, and the value of the pixel in row x and column y is defined as s(x,y), then the gradient magnitude e(x,y) for pixel s(x,y) can be defined as the greater of $e_x$ or $e_y$, where in the original work [39]:

$$
e_x = \left| s(x-1,y) - s(x+1,y) \right| \tag{2}
$$

and

$$
e_y = \left| s(x,y-1) - s(x,y+1) \right| \tag{3}
$$

However, for a more robust implementation [32] one can instead use:

$$
e_x = \left| s(x+1,y-1) + 2s(x+1,y) + s(x+1,y+1) - s(x-1,y-1) - 2s(x-1,y) - s(x-1,y+1) \right| \tag{4}
$$

and

$$
e_y = \left| s(x-1,y+1) + 2s(x,y+1) + s(x+1,y+1) - s(x-1,y-1) - 2s(x,y-1) - s(x+1,y-1) \right| \tag{5}
$$

The threshold selection algorithm defines the threshold, T, as:

$$
T = \frac{\sum_{x=2}^{n-1} \sum_{y=2}^{m-1} \left[ e(x,y) \times s(x,y) \right]}{\sum_{x=2}^{n-1} \sum_{y=2}^{m-1} e(x,y)} \tag{6}
$$
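Equations (4) to (6) translate almost directly into code. The sketch below is a minimal illustration, not the thesis implementation: the function names are hypothetical, indices are zero-based (so the interior runs from 1 to n-2 rather than 2 to n-1), and the function also returns the denominator of equation (6) because that total edge strength is a natural measure of whether any edge is present at all.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]


def edge_magnitude(s: Image, x: int, y: int) -> int:
    """e(x, y): the greater of e_x and e_y from equations (4) and (5)."""
    ex = abs(s[x + 1][y - 1] + 2 * s[x + 1][y] + s[x + 1][y + 1]
             - s[x - 1][y - 1] - 2 * s[x - 1][y] - s[x - 1][y + 1])
    ey = abs(s[x - 1][y + 1] + 2 * s[x][y + 1] + s[x + 1][y + 1]
             - s[x - 1][y - 1] - 2 * s[x][y - 1] - s[x + 1][y - 1])
    return max(ex, ey)


def statistic_threshold(s: Image) -> Tuple[Optional[float], int]:
    """Return (T, denominator) of equation (6) for image or window s.

    T is None when the window contains no gradient at all, in which
    case the quotient is undefined.
    """
    numerator = 0
    denominator = 0
    for x in range(1, len(s) - 1):
        for y in range(1, len(s[0]) - 1):
            e = edge_magnitude(s, x, y)
            numerator += e * s[x][y]
            denominator += e
    if denominator == 0:
        return None, 0
    return numerator / denominator, denominator
```

For an image split into a bright half (200) and a dark half (50), only the pixels on either side of the step contribute, and the threshold falls midway between the two intensities.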
Equation (6) could be described as a quotient between the sum for the entire image of the multiplication between the gradient e(x,y) of each pixel and its intensity value, and the sum of all gradients of the image. Thus, pixels that contribute significantly to the numerator of the equation are only those that have large gradients and hence are likely to be a part of the edge of the object. This algorithm assumes that the pixels which are part of the background or the inner part of the objects are more uniform in intensity than those at the edges of the objects. Division by the total gradient of the image is a means of averaging all of the edge pixel values in order to arrive at a pixel value which best represents the average edge pixel intensity and hence would be the best choice to divide the image into the background and the objects.

As this algorithm assumes that there is only the background and the object in the image, it will not perform properly if the object has two or more areas of different intensities, i.e. a dark nucleus surrounded by the lighter cytoplasm. This problem can be circumvented by not calculating a global threshold, but by computing a spatially variable threshold. In that case, the image is first partitioned into smaller windows for which relevant thresholds are independently calculated. The assumption here is that one only needs to partition the image such that it is unlikely that any one of the smaller windows encompasses edges from both the background/cytoplasm boundary and the cytoplasm/nucleus boundary.

For images that exhibit a contrast between the background and cytoplasm and between the cytoplasm and the nucleus, for example in a stained cervical cell, the derived thresholds usually cluster into two distinct groups: one group for the background/cytoplasm boundary and the other for the cytoplasm/nucleus boundary. A histogram of the window thresholds is used to determine one global threshold for each group, against which the local thresholds are compared to determine which of them are reasonable. Bright spots in the image generate high thresholds which can easily be detected and ignored when compared against the global threshold for the background/cytoplasm boundary. Similarly, low thresholds generated by the edges of overlapping cytoplasm regions (causing dark stained zones in the cytoplasm) can also be detected and ignored.

Some care must be taken since some windows will not encompass any true edges. The algorithm recognizes windows with meaningful edges by discarding any window for which the denominator of equation (6) is not large enough. This can be done by using the simple image statistic algorithm [30] to threshold the denominators of all windows covering the image. This assumes that the edge magnitude of the background/cytoplasm boundary is approximately the same size as the edge magnitude of the cytoplasm/nucleus boundary. While this is generally true, it is sometimes not the case. For example, in some images the edge of the nucleus is very intense while the background/cytoplasm boundary is weak. In these images the simple image statistic algorithm chooses a denominator threshold which is too high, and the windows containing the background/cytoplasm boundary are discarded (not thresholded).

The window thresholds can also be used to calculate appropriate thresholds for individual pixels. In this algorithm the method defines the threshold on a pixel by pixel basis using a four point linear Lagrangian interpolation [39] among window thresholds (figure 8). The threshold at each pixel is determined by a weighted sum of the thresholds of the windows nearest the pixel. Each window threshold is assumed to be located in the window center, and the weight assigned to each threshold is determined by its distance from the pixel location under consideration, as indicated by the formula in figure 8.

For the color segmentation application of this algorithm, the $\lambda_1$ (blue) image is segmented into only two classes: background and the object of interest, including the cytoplasm and nucleus. The background is then removed from the $\lambda_2$ (red) image, and the algorithm segments the remainder of the image into cytoplasm and nucleus. In this way the algorithm must determine only two classes for each processed image, the task for which it was originally designed. Consequently, the color implementation should be more robust and accurate than the monochrome implementation.

[Figure 8: four adjacent windows, labelled WINDOW A, WINDOW B, WINDOW C and WINDOW D, with the interpolation formula:]

$$
T_P = \frac{1}{(a+b)(c+d)} \left[ bd\,T_A + bc\,T_B + da\,T_C + ca\,T_D \right]
$$

FIGURE 8:  INDIVIDUAL PIXEL THRESHOLD ASSIGNMENT USING FOUR POINT LAGRANGIAN INTERPOLATION

Individual pixel threshold ($T_P$) assignment is based on a four-point interpolation (the actual formula is given in the figure) between the window thresholds ($T_A$, $T_B$, $T_C$ and $T_D$) of the nearest four windows.

2.3.4.  Local Histogram Threshold Selection

This algorithm is similar to that of 2.3.3 in that it subdivides the images into smaller windows and calculates a threshold only for those windows which cover significant edges. However, the actual window threshold selection is performed by a valley-finding routine on a conventional histogram of the window. This algorithm assumes that the areas of interest are relatively uniform only locally and that these are separated by sparse, non-uniform areas. Thus the algorithm does not depend on the areas of interest being uniform over the entire image. The color segmentation application of this algorithm is also similar to that of 2.3.3; however, the local thresholds are calculated by the local histogram threshold selection algorithm.

When algorithms 2.3.3 and 2.3.4 were used to perform monochrome segmentations, it was found that the calculated cytoplasm/nucleus thresholds were neither consistently nor reliably the best possible thresholds. However, these thresholds were usually in the immediate neighborhood of the best thresholds and could thus be used as the starting location for a restricted search employing the global histogram. The smoothed (filtered) global histogram of the image can be searched for the deepest valley in the immediate vicinity of the previously determined cytoplasm/nucleus thresholds rather than throughout the histogram, saving time while also increasing accuracy.

When used to perform color segmentation, algorithms 2.3.3 and 2.3.4 usually segment the nuclei correctly, but occasionally miss most or part of the cytoplasm. For this reason a separate cytoplasm color segmentation method was implemented which could be applied independently of the nuclear segmentation method. This is described below (2.3.5).

2.3.5.  Three-Dimensional Thresholding

This method defines the cytoplasm using a three-dimensional thresholding of a trivariate histogram. The algorithm is very similar to the two-dimensional thresholding used by E. Bengtsson et al. [40], but assumes that three images at different wavelengths, $\lambda_1$, $\lambda_2$ and $\lambda_3$, are available. It is assumed that the highest transmission area in the image is the background, which also has a narrow, symmetric, sharply peaked intensity distribution. These are reasonable assumptions for calibrated images of cells deposited as monolayers.

The intensity location of the high transmission peak in each of the one-dimensional histograms corresponding to the $\lambda_1$, $\lambda_2$ and $\lambda_3$ images is found and denoted as $P_1$, $P_2$ and $P_3$. The threshold for each image is calculated in the following manner. Each histogram is smoothed (filtered) and examined. If a trough exists between the high transmission peak P and P minus 15 gray levels, P-15, then the location of the trough is used as a threshold. If a trough cannot be found, then the second thresholding method is tried. This method finds the total number of pixels, N, which have intensities higher than the high transmission peak P. Also calculated for this method is the number of pixels $N_L(I)$ whose intensities fall between the high transmission peak and the intensity value, I, not including the number of pixels with intensity I. The largest value of I in the range P > I > P-15 for which $N_L(I)$ > 1.5N is used as the threshold for the image. If the condition $N_L(I)$ > 1.5N is not satisfied for any I in this range, then the third thresholding method is used. In this method P-5 is arbitrarily used as the threshold.

The three thresholds found, one in each of the $\lambda_1$, $\lambda_2$ and $\lambda_3$ images, define a box around the highest intensity values of the 3D histogram (the high transmission peak) which contains only background pixels. All pixels which have intensities larger than all three thresholds are assumed to belong to the background, and the remaining pixels to belong to the cytoplasm and nuclei. Figure 9 depicts this procedure.

2.3.6.  Split and Merge Procedure

In this procedure the areas of interest are assumed to be regions of connected image points that are moderately homogeneous in some local photometric property, e.g. intensity values, texture, etc. The homogeneity is measured by a uniformity predicate. This predicate determines the interpretation of "moderately homogeneous". The detection of uniform areas in images can be divided into three different approaches [41]:

1)  Region merging: the image is initially constructed of many small regions (pixels) which are merged so that the image is constructed of a few larger regions;

2)  Region splitting: a large region, i.e. the entire image, is split into smaller and smaller regions until the individual regions satisfy a uniformity criterion;

3)  Region splitting and merging: a combination of the previous two approaches.
This approach was originally used by Horowitz and Pavlidis [42] and incorporates the procedural modification suggested by Cheevasuvit et al. [43]. A quick summation of this procedure is as follows. Let X be an image and s(i,j) the intensity value of the pixel located in position (i,j).

[Figure 9: three histogram panels, A), B) and C); horizontal axes: intensity (gray levels); the positions P, P-15 and the threshold T are marked.]

FIGURE 9:  THREE DIFFERENT METHODS OF DETERMINING A THRESHOLD

In segmentation method 2.3.5, the thresholds for the individual colors may be determined by three different procedures shown in A), B) and C). All three procedures need to first find the position of two features in the histograms: one, the background peak location, P, and two, the background peak location minus 15 gray levels, P-15. In A) the existence and location of a valley between P and P-15 determines the threshold value. In B) the background peak is not symmetric and the threshold is positioned where the peak asymmetry becomes larger than some pre-determined amount (see text). In C) no valley exists between P and P-15, and the peak is symmetric, thus the segmentation method 2.3.5 decides that the threshold should be located at P-5, a value which was heuristically determined.

Let $B_n$ be the n-th connected subset of X, i.e. $B_n$ is a group of connected pixels in X which share some common property. The uniformity predicate P can be defined for any subset $B_n$ of the image X as:

$$
P(B_n) = \begin{cases} \text{true} & \text{if and only if the image subset } B_n \text{ fulfills the uniformity criteria,} \\ \text{false} & \text{otherwise.} \end{cases}
$$

The uniformity criterion used is that of Horowitz and Pavlidis [42], which requires that

$$
\left| s(i_1,j_1) - s(i_2,j_2) \right| < \epsilon \tag{7}
$$

for all pixels $(i_1,j_1)$ and $(i_2,j_2)$ in $B_n$. This simply states that the intensity variation within each subset be less than $\epsilon$ over the entire subset.

The split and merge procedure provides a segmentation of image X such that all subsets, $B_n$, are uniform, and for which the following conditions are met:

1.  The regions of connected pixels, $B_n$, when taken together completely make up the image X, i.e. there are no pixels in X which are not in one of the subsets $B_n$.

2.  None of the regions of connected pixels, $B_n$, overlap or, more formally, the intersection of subset $B_x$ with $B_y$ is the empty set.

3.  For all the pixels in each subset, $B_n$, equation (7) is true.

4.  If $B_x$ and $B_y$ are two regions of connected pixels which are next to each other, equation (7) is false for some of the pixels in the region made up of the pixels of $B_x$ plus the pixels of $B_y$.

The actual procedure followed by the algorithm is:

1.  Partition the image into a regular array of large square blocks, e.g. 8x8 pixels.

2.  Evaluate the uniformity predicate for each block.

3.  Merge adjacent blocks which individually and collectively satisfy the uniformity predicate (the actual merging sequence used is not straightforward and is described below).

4.  Split all blocks for which the uniformity predicate is false into four smaller blocks.

5.  If blocks are now the size of single pixels do step 3 and stop, otherwise go to step 2 and continue.
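The overall shape of a split and merge pass can be sketched as follows. This is a deliberately simplified illustration and not the procedure used in this work: it splits every block completely before any merging (rather than interleaving steps 2 to 5), it merges neighbouring regions greedily in scan order, and all names (`is_uniform`, `split`, `split_and_merge`) are hypothetical. The predicate tests max - min < eps, which is equivalent to requiring equation (7) for every pair of pixels in the region.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]
Block = Tuple[int, int, int]  # (row, col, size) of a square block


def is_uniform(values: List[int], eps: int) -> bool:
    """Uniformity predicate: every pair of intensities differs by < eps."""
    return max(values) - min(values) < eps


def block_values(img: Image, block: Block) -> List[int]:
    r, c, size = block
    return [img[i][j] for i in range(r, r + size) for j in range(c, c + size)]


def split(img: Image, block: Block, eps: int) -> List[Block]:
    """Split a block into four until every piece satisfies the predicate."""
    r, c, size = block
    if size == 1 or is_uniform(block_values(img, block), eps):
        return [block]
    h = size // 2
    out: List[Block] = []
    for dr, dc in ((0, 0), (0, h), (h, 0), (h, h)):
        out += split(img, (r + dr, c + dc, h), eps)
    return out


def split_and_merge(img: Image, eps: int, start: int = 8) -> List[List[int]]:
    """Return a label image. Assumes a square image whose side is a
    multiple of start, and start a power of two."""
    n = len(img)
    leaves: List[Block] = []
    for r in range(0, n, start):
        for c in range(0, n, start):
            leaves += split(img, (r, c, start), eps)

    label = [[0] * n for _ in range(n)]
    lo: Dict[int, int] = {}
    hi: Dict[int, int] = {}
    for k, b in enumerate(leaves):
        vals = block_values(img, b)
        lo[k], hi[k] = min(vals), max(vals)
        for i in range(b[0], b[0] + b[2]):
            for j in range(b[1], b[1] + b[2]):
                label[i][j] = k

    parent = list(range(len(leaves)))  # union-find forest over regions

    def find(k: int) -> int:
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                for di, dj in ((0, 1), (1, 0)):
                    if i + di < n and j + dj < n:
                        a = find(label[i][j])
                        b = find(label[i + di][j + dj])
                        # Merge only if the union still satisfies eq. (7).
                        if a != b and max(hi[a], hi[b]) - min(lo[a], lo[b]) < eps:
                            parent[b] = a
                            lo[a] = min(lo[a], lo[b])
                            hi[a] = max(hi[a], hi[b])
                            changed = True
    return [[find(label[i][j]) for j in range(n)] for i in range(n)]
```

Because the merge here is greedy, the result can depend on scan order when regions compete, which is the sensitivity the merging modifications discussed in this section are meant to reduce.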
Each square b l o c k f o r which the u n i f o r m i t y p r e d i c a t e has j u s t been evaluated  has  four  adjacent  neighbor  regions.  The  standard  merging  4 2  procedure  does  not  check  a l l the  neighbors  determine the o p t i m a l merging o f r e g i o n s .  The  of  a  region  union  o f each  (block)  to  neighboring  r e g i o n ( f o r which the u n i f o r m i t y p r e d i c a t e i s t r u e ) w i t h the b l o c k under c o n s i d e r a t i o n s h o u l d be e v a l u a t e d and  the g r o u p i n g  o f r e g i o n s , f o r which  the u n i f o r m i t y c r i t e r i o n i s the s m a l l e s t , are merged. to  the p r e v i o u s l y d e s c r i b e d merge procedure 43  merging. is  The  optimized  competition  improves the  above m o d i f i c a t i o n even w i t h only  between  segmentation p r o c e s s ,  locally  and  adjacent e.g.  not  the s p l i t  globally.  subsets,  This modification  a  a slight variation  and  Thus,  small  q u a l i t y of  the  merge method due  to  the  variation  in  the  point  or  i n the  starting  43  the s i z e o f e can produce v e r y l a r g e d i f f e r e n c e s i n the Another m o d i f i c a t i o n to by  Hassman and  Liedtke,  4 1  the  i s to  split bias  and the  result.  merge procedure, merging procedure  suggested such  that  blocks are merged p r e f e r e n t i a l l y with larger, older neighboring regions as opposed to smaller, younger neighboring regions. The split  determination  of e  and merge algorithm.  greatly  the  performance  of  the  A straightforward determination of e,  suggested by Hassman and Liedtke* peaks found  affects  1  as  i s based on the number of s i g n i f i c a n t  i n the i n t e n s i t y histogram  of the image.  requires that the spacing between peaks be constant.  
This algorithm Thus a piecewise  l i n e a r gray l e v e l transform must be applied to the image such that the spacing  between  the  peaks becomes constant  and  the  gray  levels  adjusted such as to cover the f u l l eight b i t dynamic range. levels  i n between the newly s h i f t e d peaks are  linearly  The  are gray  interpolated.  The value of e, i s then calculated to be 256 - c e -  (8) Numpeaks  where Numpeaks - number of s i g n i f i c a n t  peaks, and  c is a  corrective  term, the value of which i s determined experimentally. This  split  becoming one  and merge routine usually  connected  results  i n the background  region and the cytoplasm and nucleus forming a  c o l l e c t i o n of connected regions which cannot be merged without v i o l a t i n g the uniformity predicate. to be unduly coarse.  Also, the region boundaries have a  The l a t t e r can be reduced by replacing each region  i n the image by the average 44  a relaxation process  tendency  i n t e n s i t y for that region and then applying  4S  '  to force the c o l l e c t i o n of connected  to belong to either the nucleus or cytoplasm.  The  regions  relaxation process  w i l l also smooth out the coarse boundaries of the various regions. actual relaxation process i s described i n 2.3.8.  The  Without complicated  using  involved  r e l a x a t i o n process,  scale  image  split  and  i n t o background,  merge p r o c e s s .  method uses the to  an  cytoplasmic  When u s i n g  r e d image t o f i n d  2.3.7.  Nuclear  algorithm,  segmentation r e s u l t are  used  to  improve  some g e n e r a l  well  from one the  a  much  more  to segment a s i n g l e gray  and  nuclear  areas  c o l o r images, the  the n u c l e a r  areas  using  split  and  the  and  merge  the b l u e  image  Contouring as  a l l subsequent  o f the  to the  which the segmentation i s t o be  methods,  require  p r e v i o u s l y d e s c r i b e d methods.  
delineation of information as and cytoplasm. Radial as scheme it is difficult identify the background and The labelling They the nuclear and location of the nuclei for size area. a All require improved.

A radial contouring method as described by Bengtsson et al.⁴⁶ may be used to refine the segmentation of the nuclei. In this procedure the area of interest has already been roughly defined using a threshold level segmentation. This particular application uses previous topological knowledge about the nuclei, such as the location of the center of the individual nuclei and the shape of nuclei. This is accomplished by first finding the center, C, of the roughly defined nucleus. The image is then transformed from a cartesian coordinate system to a polar coordinate array using the point C as the origin and a prescribed number of evenly spaced radial vectors. The radial array image is then filtered using a 1x3 average filter. The filter orientation is such that it averages pixels which are the same distance from the nucleus center. Thus this filter is a circular averaging filter.

From this new image isodensity contours are generated for a range of intensity levels centered around the threshold used by the original nuclear segmentation. For each intensity level examined, an isodensity contour is generated by connecting the first (closest to the nuclear centre) location along each radius that is lighter than the intensity level being analyzed. The relative smoothness of each isodensity contour is determined. Relative smoothness is calculated as the sum of the absolute differences in radial length between each pair of successive locations along the isodensity contour divided by the average radial length of the contour. Thus, the smoothness measure will be small for contours which have constant radial distance, i.e. circular contours in the cartesian coordinate image. The normalization by the average radial length is to make the smoothness measure less dependent on the size of the nucleus. The intensity value which has the smoothest isodensity contour in the radial coordinate image is chosen as the new threshold, and is used to segment the nuclei in the cartesian coordinate image. An example of some of the steps in this algorithm is shown in figure 10.

FIGURE 10:  AN EXAMPLE OF THE NUCLEAR RADIAL CONTOURING THRESHOLD SELECTION PROCESS

The nuclear radial contouring algorithm is used to refine the segmentation of the nucleus, thus a rough nuclear mask is assumed to have been generated previously. This rough mask is used to find the approximate center of the nucleus in image A) and a radial transform image centered on this point is generated and shown in B). In the radial transform image the nucleus is seen as a dark strip along the left side of B). The roughness (see text) of isodensity contours at various intensities is calculated and can be displayed as histogram C) where height is proportional to roughness. The intensity level with the lowest roughness value, the deepest trough in the histogram, is used as a threshold to segment the nucleus of image A). The resulting nuclear segmentation is shown as the dark area in D).

2.3.8.  Relaxation Process

This method uses a probabilistic relaxation process to modify the nuclear segmentation of one of the previous methods, and thus requires the results of a previous segmentation. It also requires a specified range of intensity values, which is derived from the previously selected threshold. The a priori information used by this algorithm is expressed in the form of the probability that a pixel belongs to the cytoplasm and/or background (thereafter called background), or belongs to the nucleus, or belongs to the background but is also adjacent to the nucleus, or to the nucleus but is also adjacent to the background. This algorithm assumes that a pixel which has bright neighboring pixels should itself be bright and similarly a pixel which has dark neighbors should itself be dark.

The relaxation process used in this work is very similar to that described by Rosenfeld et al.,⁴⁶ with some of the modifications suggested by Peleg et al.⁴⁷,⁴⁸ Probabilistic relaxation can be used to classify a set of objects $A_1, \ldots, A_n$ into m classes $C_1, \ldots, C_m$. In our case, we wish to classify n pixels (where n is the number of pixels in the image) into only 2 classes: background and nucleus. Thus the process described below is for a subset of two classes of a more general case.

Each object pixel $A_x$ has a local measurement of intensity $i_x$ which can be used to estimate the probability, $P_x(C_1)$ or $P_x(C_2)$, of the object $A_x$ belonging to the class $C_1$ or the class $C_2$. The probability of an object belonging to one of the two possible classes $C_1$ or $C_2$ must equal unity. Thus

$$P_x(C_1) = 1 - P_x(C_2) \ \text{ for all } A_x \text{ and } 0 \le P_x(C_1) \le 1. \qquad (9)$$

The relaxation process is an algorithm which iteratively updates the probabilities of an object $A_x$ belonging to class $C_y$ (where y can be either 1 or 2). These probabilities are updated using a set of calculated compatibility coefficients $r(A_x, C_y; A_b, C_d)$, which vary from 1 to -1. The compatibility coefficient $r(A_x, C_y; A_b, C_d)$ reflects the compatibility of pixel $A_x$ being assigned to the class $C_y$ and pixel $A_b$ belonging to class $C_d$. In this case we are interested only in nearest neighbor interactions, thus r() = 0 for all non-neighboring pairs of pixels, i.e. r() = 0 if pixel $A_b$ is not next to pixel $A_x$. In the image only the 8 pixels touching a pixel are considered to be neighboring pixels.
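As a rough illustration of the two-class setup described so far, the sketch below applies one iteration of the update rule that the following paragraphs define in equations (10) and (12). It is only a sketch under stated assumptions: the function name `relax_step`, the position-independent compatibility table `r[y][d]`, and the treatment of image-border pixels (missing neighbours contribute nothing to the average over 8) are illustrative choices, not details taken from this work.

```python
def relax_step(p1, r):
    """One relaxation iteration for two classes.

    p1[i][j] is the current probability that pixel (i, j) belongs to
    class C1; the class C2 probability is 1 - p1[i][j] (equation 9).
    r[y][d] is an assumed, position-independent compatibility in [-1, 1]
    between a pixel of class y and an 8-connected neighbour of class d.
    """
    h, w = len(p1), len(p1[0])
    out = [row[:] for row in p1]
    for i in range(h):
        for j in range(w):
            q = [0.0, 0.0]  # neighbourhood support for C1 and for C2
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di, j + dj
                    if (di or dj) and 0 <= ni < h and 0 <= nj < w:
                        pb = (p1[ni][nj], 1.0 - p1[ni][nj])
                        for y in (0, 1):
                            # average of r() weighted by neighbour probabilities
                            q[y] += (r[y][0] * pb[0] + r[y][1] * pb[1]) / 8.0
            p = p1[i][j]
            # two-class normalisation: P1(1+q1) + (1-P1)(1+q2)
            denom = 1.0 + q[1] + p * (q[0] - q[1])
            if denom > 0.0:
                out[i][j] = p * (1.0 + q[0]) / denom
    return out


# Hypothetical example: an uncertain pixel surrounded by confident C1 pixels.
probs = [[0.9, 0.9, 0.9],
         [0.9, 0.4, 0.9],
         [0.9, 0.9, 0.9]]
compat = [[0.5, -0.5],
          [-0.5, 0.5]]
updated = relax_step(probs, compat)
```

With these illustrative values the central probability moves toward the class of its neighbours, which is the qualitative behaviour the method relies on.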
The compatibility coefficient should be close to one for compatible class assignments ($A_x$ and $A_b$ are both background pixels) and close to minus one for incompatible class assignments.

In the probability updating procedure, for each pixel $A_x$ and each of the possible assignments of $A_x$ to class $C_y$, the average value of r() for assigning $A_x$ to class $C_y$ is calculated:

$$q^{z}_{xy} = \frac{1}{8} \sum_{b=1}^{8} \sum_{d=1}^{2} r(A_x, C_y; A_b, C_d)\, P^{z}_{b}(C_d) \qquad (10)$$

Thus, $q_{x1}$ and $q_{x2}$, corresponding to assigning $A_x$ to class 1 (background) and to class 2 (nucleus), are calculated for each pixel. Once these $q_{xy}$'s are calculated, the probabilities are updated. In the general case, the updating process is defined as⁴⁸

$$P^{z+1}_{x}(C_y) = \frac{P^{z}_{x}(C_y)\,(1 + q^{z}_{xy})}{\sum_{j} P^{z}_{x}(C_j)\,(1 + q^{z}_{xj})} \qquad (11)$$

thus for our case of 2 classes, this is reduced to:

$$P^{z+1}_{x}(C_1) = \frac{P^{z}_{x}(C_1)\,(1 + q^{z}_{x1})}{1 + P^{z}_{x}(C_1)\,(q^{z}_{x1} - q^{z}_{x2}) + q^{z}_{x2}} \qquad (12)$$

In these two equations $P^{z}()$ represents the classification probability for the zth iteration of the updating process and $P^{z+1}()$ the classification probability for the z+1 iteration.

One can estimate the initial probabilities $P^{0}_{x}(C)$ in various ways. One way, suggested by Rosenfeld⁴⁸ and used in this work, is to approximate the image's histogram by a linear combination of two Gaussian probability distributions (one for each class) and then define $P^{0}_{x}(C_1)$ and $P^{0}_{x}(C_2)$ as the probability of a gray level belonging to one or the other of the Gaussian distributions.

The value of r() can also be estimated in a number of ways. In this instance the method used was that suggested by Rosenfeld et al.⁴⁵

$$r(A_x, C_y; A_b, C_d) = \log_{10}\!\left[\frac{\mathrm{prob}(\text{object } x \text{ in class } y \text{ and object } b \text{ in class } d)}{\mathrm{prob}(\text{object } x \text{ in class } y)\;\mathrm{prob}(\text{object } b \text{ in class } d)}\right] \qquad (13)$$

If one class of objects is prominent (makes up most of the image), using the above value of r() can lead to a non-informative segmentation (entire image becomes one class). Peleg et al.⁴⁴ suggested a modification which corrects for this problem.

$$r^{*}(A_x, C_y; A_b, C_d) = r(A_x, C_y; A_b, C_d)\,[1 - P^{0}_{x}(C_y)]\,[1 - P^{0}_{b}(C_d)] \qquad (14)$$

Unfortunately, this has a tendency to make the self support term (y = d) for rare classes very large, which is usually an undesirable effect.

We have found that the following equation produces reasonable results for the majority of the images used in this study.

$$r^{*}(A_x, C_y; A_b, C_d) = r(A_x, C_y; A_b, C_d)\,\left[1 - \tfrac{1}{2}\,\mathrm{prob}(\text{object } x \text{ in class } y \text{ and object } b \text{ in class } d)\right] \qquad (15)$$

Further, the values of r() are normalized such that the largest value of r() has an absolute value of unity.

The exact definition of r() does not seem to matter greatly in practice according to Rosenfeld et al.⁴⁵,⁴⁸

To summarize, this method assumes that if a pixel's neighbors are predominantly of one class, the probability that the central pixel also belongs to that class should be increased. The amount of the increase is dependent upon the compatibility coefficients, and the number and certainty (probability of belonging to one class or another) of its neighbors. The iterative probability modification continues until 90% of the pixels are unambiguously defined (probability of belonging to one class or another ≥ 90%). Figure 11 demonstrates the effects of this procedure on the intensity distribution of the image.

FIGURE 11:  AN EXAMPLE OF THE EFFECTS OF A RELAXATION PROCESS ON THE INTENSITY DISTRIBUTION OF AN IMAGE

[Panels A), B) and C): intensity histograms with gray levels on the horizontal axis and the C/N threshold marked.]

The relaxation process is used to increase the contrast between the cytoplasm and the nucleus. This enhances the ability of the original threshold generated by one of the primary segmentation procedures to correctly segment the nucleus. Figure A) is the original histogram of an image along with the threshold selected by one of the primary segmentation procedures. Figure B) shows the histogram of the image after 6 iterations of the relaxation process. Figure C) shows the histogram of the image after 20 iterations of the relaxation process. Note how narrow the peaks in the histogram are and how large the valley between the peaks is. Thus any threshold within the large valley will now correctly segment the nucleus whereas in A) the range for a correct threshold is much smaller.

2.3.9.  Edge Relocation Algorithm

This segmentation method generates a closed contour precisely along the edge of an area of interest, e.g. nucleus, which has been roughly defined previously. The a priori information utilized by the edge relocation algorithm consists of the intensity difference between the nucleus and the cytoplasm/background, connectivity of the nucleus, boundary gradient magnitude information along the edge of the nucleus, size of the nucleus, edge connectivity information, and the approximate location of the nucleus in the image.

The edge relocation algorithm requires as input:

1)  the image for which the segmentation of the nucleus is to be refined;

2)  the nuclear segmentation which is to be refined (to be known as the roughly segmented nuclear mask);

3)  the gradient transform of the image to be segmented. Gradient operators tested were the Sobel operator,⁴⁰ a 3x3 Range filter,⁵⁰ and the Kirsch operator.⁵¹

The algorithm uses nuclear boundary information from the input nuclear segmentation in the following fashion. The input nuclear segmentation boundary (the pixels in the nucleus which have non-nuclear neighbors) is dilated several times, i.e. any pixel touching a boundary pixel becomes part of the dilated boundary. The actual number of dilations used should be matched to the magnification and size of the objects being segmented. It was found that for the 40X objective two dilations were adequate. The dilated boundary is the first step in the formation of a mask which represents the image locations which must be analyzed to find the refined nuclear boundary and thus is denoted as the possible edge mask. The second step is to also include in this mask all pixels in the roughly segmented nucleus which have intensity values larger than a given threshold. The actual threshold may be determined in a number of ways. For example one can use a set value relative to the range of intensity values in the nucleus. A more complex procedure would include analysis of the histogram of the nuclear intensities, calculating the number of pixels below a given threshold as a function of the threshold value and selecting the threshold which includes a set proportion of the nuclear pixels. The third step is to remove from the possible edge mask all pixels in the roughly segmented nuclear mask with intensities below a given threshold, such that their removal does not alter the connectivity (Euclidian topology) of the possible edge mask, i.e. the pixels which connect different parts of the possible edge mask together cannot be removed. The threshold used in this step is independent of that described in the second step.

Those areas which are included in the roughly segmented nuclear mask but do not belong to the possible edge mask (they are part of the dark interior of the nucleus) are saved, as they will later be used as seed areas to refill the nucleus once the exact nuclear boundary is determined. Figure 12 shows an example of the determination of a possible edge mask.

FIGURE 12:  GENERATION OF POSSIBLE EDGE MASK

A) Roughly segmented and labelled nuclear mask of the image shown in Figure 14A. B) Dilation of the boundaries of image A. C) Inclusion of the light areas of the nucleus to the possible edge mask in B. This is the result of the second step as described in the text. D) Exclusion of the dark areas of the nucleus from the possible edge mask C. This is the result of step three which includes the requirement that a pixel removal does not alter the topology of the possible edge mask. The dark area in image D is the possible edge mask.

Once the possible edge mask has been determined it is conditionally eroded. The criteria for the erosion of a pixel from the possible edge mask are:

1)  The gradient magnitude of the corresponding pixel in the gradient magnitude image must be below a given threshold.

2)  When processing the image from left to right, top to bottom, the pixels immediately to the left and above the pixel under consideration must not have been just removed. This ensures that in an area of similar gradient values, the edge will be located in the middle of the area and not along one of the extremities.

3)  The removal of the pixel must not change the connectivity/topology of the possible edge mask.

In the beginning of the erosion process, a low gradient threshold is selected and the possible edge mask is eroded until no more pixels can be eroded for that threshold. Then the threshold is raised slightly and the erosion process is repeated. The amount it is raised each time can be constant or variable; in this work we used the latter approach. The amount the threshold was raised each time was chosen so that a constant number of pixels in the possible edge mask would have gradient values less than the new threshold. This process continues until the gradient threshold reaches the maximum gradient value present in the gradient image. See figure 13 for an example of the erosion process on a possible edge mask.

The result of this conditional erosion is a closed contour which surrounds the nucleus and follows the path which has the largest gradient values around the circumference of the nucleus (see figure 14).

FIGURE 13:  A) Possible edge mask (same as Figure 12D). B) Erosion of the possible edge mask in image A. C) Erosion of image B. D) Erosion of image C.

FIGURE 14:  RESULTS OF THE EDGE RELOCATION ALGORITHM

A) Red image of a stained cervical cell. This is the cell processed in Figure 12 and 13. B) Final edge mask from the erosion process as described in Figure 13. C) Gradient transformation of red image with the superposition of the final edge mask. D) The final nuclear mask of the nucleus of the cell in A.

This closed contour represents the edge of the nucleus. The seed areas previously saved are used to fill the appropriate closed contours. In this manner the closed contours generated by the interaction of nuclear concave indentations in the nuclei during the dilation step are not filled in as nuclei.

In the final step, the algorithm must determine whether the individual edge pixels belong to the nucleus or to the background. Two methods were examined:

1)  based on the pixel's intensity, assign the pixel to the nucleus if its intensity value is closer to the nuclear mean intensity than to the outside mean intensity;

2)  for a 3x3 pixel square neighborhood centered on the pixel in question, calculate the means of those pixels belonging to the nucleus and those belonging to the outside and then assign the central pixel to that class which has the closer mean.
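The second of these two methods can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the label codes 'N' (nucleus), 'O' (outside) and 'E' (edge pixel still to be assigned), the function name `assign_edge_pixel`, and the fallbacks used when one of the two classes is absent from the neighbourhood are all assumptions made for the example.

```python
def assign_edge_pixel(image, labels, y, x):
    """Assign edge pixel (y, x) to 'N' or 'O' using its 3x3 neighbourhood.

    image  : 2-D list of intensities
    labels : 2-D list of 'N' (nucleus), 'O' (outside) or 'E' (undecided edge)
    """
    inside, outside = [], []
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if (dy or dx) and 0 <= ny < len(image) and 0 <= nx < len(image[0]):
                if labels[ny][nx] == 'N':
                    inside.append(image[ny][nx])
                elif labels[ny][nx] == 'O':
                    outside.append(image[ny][nx])
    # Assumed fallbacks when one class has no representative nearby.
    if not inside:
        return 'O'
    if not outside:
        return 'N'
    mean_in = sum(inside) / len(inside)
    mean_out = sum(outside) / len(outside)
    value = image[y][x]
    # The class with the closer local mean wins.
    return 'N' if abs(value - mean_in) <= abs(value - mean_out) else 'O'


# Hypothetical 3x3 patch: dark nucleus at lower right, bright surround.
labels = [['O', 'O', 'O'],
          ['O', 'E', 'N'],
          ['O', 'N', 'N']]
dark_edge = [[200, 200, 200], [200, 60, 50], [200, 50, 40]]
bright_edge = [[200, 200, 200], [200, 180, 50], [200, 50, 40]]
result_dark = assign_edge_pixel(dark_edge, labels, 1, 1)
result_bright = assign_edge_pixel(bright_edge, labels, 1, 1)
```

The first method differs only in using the mean intensities of the whole nucleus and of the whole outside region instead of the local 3x3 means.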
A feature of this algorithm is that the nuclear mask so generated can be used as an input for the roughly segmented nuclear mask defined previously. The iterative use of the edge relocation algorithm very quickly, in 2 to 3 iterations, results in a steady state nuclear segmentation. That is, any further iterations of this algorithm do not change the nuclear segmentation.

Once the image(s) have been segmented by one method or another, each object in the image needs to be uniquely labelled so that the appropriate features may be calculated for the correct objects. The type of connectivity used to label the objects can make a difference in cellular measurements.⁵² Eight-connectedness, where edge-adjacent and corner-adjacent pixels are considered neighbors, assuming a rectangular tessellation, is used in this work for all connectivity analysis unless stated otherwise.

2.4.  Cellular Features

Once the cervical cell images have been segmented and labelled one can numerically describe the nucleus and cytoplasm pairs which constitute individual cells. The numerical description should be such that the differences between the various cell types (normal mature squamous epithelial, mildly dysplastic, etc.) are numerically detectable.

Almost all of the features described in this section are either directly from the literature¹⁹,⁵³,⁶¹ or are modified features from the literature. Some were developed for this work, but may also exist elsewhere.

Since the cervical cells are represented by collections of connected pixels in rectangular arrays, some of the more straightforward features are defined below.
1)  CA  Cytoplasm area: the number of connected pixels forming an object which has the spectral properties of the cytoplasm.

2)  NA  Nuclear area: the number of connected pixels forming an object which has the spectral and shape properties of nuclear material.

3)  NComp  Nuclear Compactness:⁶² is the (circumference)² of the nucleus-like object divided by 4πNA. The definition of circumference used is very similar to that of Freeman⁶³ with a corrective term similar to that of Vossepoel and Smeulders.⁶⁴ The actual algorithm used for the circumference was:

$$\text{Circumference} = N_1 + \sqrt{2}\,N_2 + 2.0\,N_3 \qquad (16)$$

where: $N_1$ is the number of edge pixels in the object with only 1 non object neighbor, $N_2$ is the number of edge pixels with 2 non object neighbors, and $N_3$ is the number of edge pixels in the object with 1 neighbor which belongs to the object. Four-connectedness, where only edge-adjacent pixels are considered neighbors, was used in this algorithm. Nuclear Compactness is thus defined as:

$$\text{NComp} = (\text{circumference})^2 / 4\pi\,\text{NA} \qquad (17)$$

4)  NInert  Nuclear Inertia:⁶⁵ is 2π times the moment of inertia of the nuclear mask, J, divided by the nuclear area squared. In this instance $J = \sum r^2$, summed over the object pixels, where r is the distance of the pixels from the object centre. Thus

$$\text{NInert} = 2\pi J / (\text{NA})^2 \qquad (18)$$

5)  NMeanR  Nuclear Mean Radius: is the average distance from the center of the nucleus to the edge pixels of the nucleus.

6)  NMaxR  Nuclear Maximum Radius: is the largest distance from the center of the nucleus to the edge pixels of the nucleus.

7)  NRVar  Nuclear radial variance: is the normalized variance of the distribution of the distance from the center of the nucleus to the edge pixels of the nucleus.

$$\text{NRVar} = \text{Radial variance} / \text{NMeanR} \qquad (19)$$
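A minimal sketch of features 2 to 7 for a binary nuclear mask is given below. Several details are assumptions made for the illustration rather than statements about this work: the centre is taken as the centroid of the mask pixels, an edge pixel is any object pixel with at least one four-connected non-object neighbour, positions outside the array count as non-object, the radial variance is the population variance, and the weights 1, √2 and 2 are those of equation (16) as read here. The function name `shape_features` is hypothetical.

```python
import math


def shape_features(mask):
    """Shape features of a binary mask (list of rows of 0/1, 1 = nucleus)."""
    h, w = len(mask), len(mask[0])
    pix = [(y, x) for y in range(h) for x in range(w) if mask[y][x]]
    area = len(pix)
    cy = sum(y for y, _ in pix) / area  # assumed centre: centroid
    cx = sum(x for _, x in pix) / area

    def outside(y, x):
        return not (0 <= y < h and 0 <= x < w) or not mask[y][x]

    counts = {1: 0, 2: 0, 3: 0}  # edge pixels by number of non-object neighbours
    radii = []
    for y, x in pix:
        k = sum(outside(y + dy, x + dx)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
        if k:  # at least one non-object neighbour: an edge pixel
            radii.append(math.hypot(y - cy, x - cx))
            if k in counts:
                counts[k] += 1
    circumference = counts[1] + math.sqrt(2) * counts[2] + 2.0 * counts[3]
    inertia = sum((y - cy) ** 2 + (x - cx) ** 2 for y, x in pix)
    mean_r = sum(radii) / len(radii)
    var_r = sum((r - mean_r) ** 2 for r in radii) / len(radii)
    return {
        "NA": area,
        "circumference": circumference,
        "NComp": circumference ** 2 / (4 * math.pi * area),
        "NInert": 2 * math.pi * inertia / area ** 2,
        "NMeanR": mean_r,
        "NMaxR": max(radii),
        "NRVar": var_r / mean_r if mean_r else 0.0,
    }


# Hypothetical example: a 3x3 square object inside a 5x5 array.
square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
features = shape_features(square)
```

For a disc, J approaches πR⁴/2 and NA approaches πR², so NInert tends to 1; values above 1 indicate mass spread further from the centre than in a disc of the same area.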
The other shape features depend upon interpreting the nuclear boundary as a radial function of orientation. The closed contour of the nuclear boundary can be transformed into a periodic radial function of theta, r(θ). In this implementation the boundary is represented by 128 radial vectors which represent the distance of the boundary pixels from the center of the nucleus. A Fast Fourier Transform (FFT) is then performed on the radial data so that the radial function r(θ) can be represented by a truncated series of sinusoidal waveforms.

$$r(\theta) = \frac{a_0}{2} + \sum_{n=1}^{m} a_n \cos n\theta + \sum_{n=1}^{m} b_n \sin n\theta \qquad (20)$$

In this series the first term $a_0/2$ represents the average radial length. The next two terms $a_1$ and $b_1$ represent the offset required by a circle which best fits (in the least square sense) the original contour. The next two terms $a_2$ and $b_2$ determine an ellipse which is fitted in the least square sense to the contour. The major axis of this ellipse is $a_0 + 2(a_2^2 + b_2^2)^{1/2}$ and the minor axis of this ellipse is $a_0 - 2(a_2^2 + b_2^2)^{1/2}$.

8)  NEllong  Nuclear elongation: is the ratio of the major axis/minor axis of the above ellipse.

A useful shape descriptor should be independent of position, size, and orientation of the object to be described. Shape descriptors which include $a_n$ and $b_n$ in a symmetric way will have the above property.⁴⁰ The power spectrum of the Fourier components has these properties.

9)  NBdycrc  Nuclear boundary variation coarse: measures the energy in the frequency spectrum from the third to the tenth harmonic.

$$\text{NBdycrc} = \sum_{n=3}^{10} (a_n^2 + b_n^2) \qquad (21)$$

10)  NBdyfin  Nuclear boundary variation fine: measures the energy in the frequency spectra from the 10th to the 31st harmonic of the Fourier transform description of the nuclear contour.

$$\text{NBdyfin} = \sum_{n=11}^{31} (a_n^2 + b_n^2) \qquad (22)$$

The frequency, intensity, and spatial organization of the density distributions in the objects are properties which are very useful in the discrimination between cell types. The following features describe the intensity distribution in the objects.

11)  RMeanI  Red mean intensity: is the mean intensity in the red image of the object after the individual pixel intensity values have been corrected for the non-uniform behaviour of the camera.

12)  GMeanI  Green mean intensity.

13)  BMeanI  Blue mean intensity.

14)  IODR  Integrated Optical Density of Red: is the integrated optical density of the object in the red image after the individual pixel intensity values have been corrected for the non-uniform behaviour of the camera. The IOD is defined as⁶⁶

$$\text{IOD} = \sum_{n} \text{OD}(x,y) \quad \text{where} \quad \text{OD}(x,y) = \log_{10} I(x,y) - \log_{10} I_B \qquad (23)$$

and $\sum_n$ is the sum over the n pixels in the object. I(x,y) is the corrected intensity at pixel (x,y) and $I_B$ is the average intensity of the background after camera correction. From an investigation of the IOD measurement process³⁷ it has been demonstrated that the background value $I_B$ should be estimated with accuracy. For the cytoplasmic IOD calculations this requires the accurate calculation of the average background intensity value. For the nuclear IOD calculations the average intensity of the cytoplasm in which the nucleus is located is used as the background value, and this requires an accurate estimation of the average cytoplasmic intensity. By excluding all those areas in the vicinity of the artifacts and nuclei from the calculation of the average cytoplasmic intensity, an accurate estimation of the average cytoplasmic intensity can be generated. Figure 15 demonstrates how the area used to estimate the average cytoplasmic intensity is determined.

FIGURE 15:  DETERMINATION OF CYTOPLASM TO BE USED TO CORRECT NUCLEAR OD VALUES

The accurate calculation of nuclear OD values requires the accurate calculation of the average intensity of the cytoplasm in which the nucleus is found. A) is an image of a stained cervical cell. B) is the nuclear mask generated by the segmentation of image A). The post-processing routine recognizes the artifacts in B and removes them. The resulting nuclear and cytoplasmic mask is shown in C). If this cytoplasmic mask was used to calculate the average cytoplasmic intensity the result would be affected by the dark artifacts located in the cytoplasm. Instead the nuclear mask in B) is dilated several times and the pixels which are left in the cytoplasm are used to calculate the average cytoplasmic intensity, shown as the medium gray area in D).

15)  IODG  Integrated Optical Density of Green.

16)  IODB  Integrated Optical Density of Blue.

17)  ODMax  Optical Density Maximum: is the largest optical density value detected inside the object in the red image.

18)  ODVar  Optical Density Variation:⁵⁵ is the normalized variation of the optical density values found in the object in the red image.

19)  ODSkew  Optical Density skewness:⁵⁶ is the third moment of the optical density distributions found in the red image representation of the object. Normalized by the second moment of the optical density distribution in the object.

$$\text{ODSkew} = \frac{\sum_{n} (\text{OD}(x,y) - \text{ODmean})^3}{(n - 1)(\text{ODV})^3} \qquad (24)$$

20)  ODKurt  Optical Density Kurtosis:⁵⁵ is the fourth moment of the optical density distribution found in the red image representation of the object. Normalized by ODV squared.

$$\text{ODKurt} = \frac{\sum_{n} (\text{OD}(x,y) - \text{ODmean})^4}{(n - 1)(\text{ODV})^4} \qquad (25)$$

2.4.1.  Markovian Texture Analysis

The following density distribution measures not only describe the frequency of occurrence but also the spatial organization of the density distributions and are usually referred to as texture features. These texture features can be further broken down into continuous and discrete categories. The continuous texture features describe the spatial density distribution statistically. Markovian analysis of the spatial density distribution of the nuclei was chosen since it is utilized by many other researchers in this field and has demonstrated encouraging results.³⁸

The Markovian analysis of texture in images treats images as stochastic processes. This analysis yields matrices of gray level transition probabilities which usually require substantial computation time and memory space.⁶⁸ The various texture parameters are usually calculated from such matrices; however, Unser⁶⁸ has demonstrated that it is possible to calculate equivalent parameters using similar methods which require less memory and computational time. It is this analysis which is used in this work. For the sake of understanding, both definitions will be given for each feature.

The co-occurrence matrix⁶⁷ required for the Markovian analysis is defined as the matrix, P(i,j), where i and j range over the various gray levels present in the image. P(i,j) is the conditional probability of a pixel of gray level i occurring next to (8-connectedness) a pixel of gray level j.

In the sum and difference (SD) method,⁶⁸ $P_s(i)$ is the probability of neighboring pixels having gray levels which sum to i and $P_d(j)$ is the probability of neighboring pixels having a gray level difference of j.

The intensity range of the images is 0 to 255, which would result in the calculation of ≈ 65,000 conditional probabilities for the Markovian co-occurrence matrix or ≈ 1000 conditional probability calculations for the sum and difference textural analysis. Most of the objects contain ≈ 200 pixels (≈ 800 different neighbor interactions), which can be a very sparse distribution from which to calculate the 1000 (SD) or 65,000 (Markovian) conditional probability values. For this reason and to save memory, the image intensity range levels for these texture calculations were compressed into 20 gray levels.

21)  REntropy  Red Entropy: is defined as

$$\text{Entropy} = \sum_{i}\sum_{j} -P(i,j)\,\log_{10} P(i,j) \quad \text{(Markovian⁶⁷)}$$

$$= \sum_{i} -P_s(i)\,\log_{10} P_s(i) + \sum_{j} -P_d(j)\,\log_{10} P_d(j) \quad \text{(SD⁶⁸)} \qquad (26)$$

where the conditional probabilities are defined by that data in the red image.
A large value for the entropy represents a nucleus for which there is little spatial or gray scale distribution organization.

22)  GEntropy  Entropy of the nucleus in the green image.

23)  BEntropy  Entropy of the nucleus in the blue image.

24)  Energy  Energy: is defined as

$$\text{Energy} = \sum_{i}\sum_{j} P(i,j)^2 \quad \text{(Markovian⁶⁷)}$$

$$= \sum_{i} P_s^2(i) + \sum_{j} P_d^2(j) \quad \text{(SD⁶⁸)} \qquad (27)$$

where the conditional probabilities are defined by the red image data. A large value of the energy parameter represents a nucleus which has a spatially organized gray scale distribution. This is almost the direct opposite of the entropy (feature 21).

25)  Contrast  Contrast in the red image is defined by

$$\text{Contrast} = \sum_{i}\sum_{j} (i-j)^2\,P(i,j) \quad \text{(Markovian⁶⁷)}$$

$$= \sum_{j} j^2\,P_d(j) \quad \text{(SD⁶⁸)} \qquad (28)$$

A nucleus with a large contrast has large gray scale variations at high spatial frequencies.

26)  Correlation  Correlation for the red image is defined by

$$\text{Correlation} = \sum_{i}\sum_{j} (i-\mu)(j-\mu)\,P(i,j) \quad \text{(Markovian⁶⁷)}$$

$$= \frac{1}{2}\left[\sum_{i} (i-2\mu)^2\,P_s(i) - \sum_{j} j^2\,P_d(j)\right] \quad \text{(SD⁶⁸)} \qquad (29)$$

where μ is the mean intensity value of the nucleus under consideration. A large value for the correlation parameter indicates a nucleus in which there are large connected areas with the same gray level and that the gray level difference between adjacent areas is large as well.

27)  Homogeneity  Homogeneity of the red image is defined by

$$\text{Homogeneity} = \sum_{i}\sum_{j} \frac{1}{1 + (i-j)^2}\,P(i,j) \quad \text{(Markovian⁶⁷)}$$

$$= \sum_{j} \frac{1}{1 + j^2}\,P_d(j) \quad \text{(SD⁶⁸)} \qquad (30)$$

A large value for the homogeneity parameter indicates that in the nucleus the spatial gray level variation is slight and spatially smooth (low spatial frequency).
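The sum and difference forms of the entropy, contrast and homogeneity above can be sketched as follows. The sketch assumes that the whole array belongs to one object, that every ordered pair of 8-connected neighbours is counted once, that differences are signed, and that the gray levels have already been compressed to a small range; the function name `sum_difference_features` is hypothetical and the array must hold at least two pixels.

```python
import math
from collections import Counter


def sum_difference_features(img):
    """Entropy, contrast and homogeneity from sum/difference histograms.

    img is a 2-D list of integer gray levels covering a single object.
    """
    h, w = len(img), len(img[0])
    sums, diffs = Counter(), Counter()
    for y in range(h):
        for x in range(w):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if (dy or dx) and 0 <= ny < h and 0 <= nx < w:
                        sums[img[y][x] + img[ny][nx]] += 1
                        diffs[img[y][x] - img[ny][nx]] += 1
    total = sum(sums.values())  # number of ordered neighbour pairs
    ps = {k: v / total for k, v in sums.items()}   # P_s(i)
    pd = {k: v / total for k, v in diffs.items()}  # P_d(j)
    entropy = (-sum(p * math.log10(p) for p in ps.values())
               - sum(p * math.log10(p) for p in pd.values()))
    contrast = sum(j * j * p for j, p in pd.items())
    homogeneity = sum(p / (1 + j * j) for j, p in pd.items())
    return {"entropy": entropy, "contrast": contrast,
            "homogeneity": homogeneity}


# Hypothetical examples: a uniform patch and a 2x2 checkerboard.
flat = sum_difference_features([[3, 3], [3, 3]])
checker = sum_difference_features([[0, 1], [1, 0]])
```

Only two histograms of at most a few dozen bins are stored here, which is the memory saving over the full co-occurrence matrix that motivates the sum and difference method.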
28)  Cluster-Shade  Cluster-Shade of the red image i s defined by 3  Cluster-Shade - Z 2 (i+j-2/i)  67  P(i,j)  Markovian (31)  - Z (i-2/i) l Since skewness  of  the cluster the sum  3  P (I)  shade  s  can be thought  probability  standardized by dividing  SD  distribution  the above cluster  68  of as the 3 moment or i n this  work  i t is  shade by the second moment,  M , of the sum d i s t r i b u t i o n to the 3/2 power 2  M  2  = Z (i-2/i) l  2  P (i) s  Thus the actual formula used to f i n d cluster shade i s  (32)  Cluster Shade = S (i-2/i) P (i)  (33)  3  0  3 12  (M ) / 2  A large absolute value f o r the cluster  shade parameter  indicates  that i n the nucleus there are a few d i s t i n c t (uniform intensity within) clumps with a large contrast between nucleus.  A negative value indicates  the clumps  and the rest  of the  dark clumps on a bright background  and a positive value indicates bright clumps on a dark background. 29)  Cluster Prominence  Cluster prominence i n the red image i s  defined by Cluster Prominence = ? ? (i+j-2/i) P ( i , j ) - S (i-2/i)  P (i)  Markovian  67  SD  s  Since the cluster prominence i s equivalent to the 4th moment of 2  the P ( i ) d i s t r i b u t i o n i t i s standardized by dividing s  i t by M  2  .  Thus  the actual formula used to calculate cluster prominence i s Cluster Prominence -  S (i-2/i) P_(i) — - — — 4  (35)  1  (M ) a  Attaching a simple n o n - s t a t i s t i c a l  interpretation  to this measure  would probably be more deceiving than useful. Another set of features which measure the s p a t i a l d i s t r i b u t i o n of the intensity variations  within the nucleus i s based upon there existing  some form of s e l f s i m i l a r i t y i n the s p a t i a l density d i s t r i b u t i o n when i t is examined at d i f f e r e n t  scales.  
For this feature the distribution of the optical density values (hence the DNA) is assumed to be similar to that of a fractal. For a good definition of fractals see Mandelbrot [69]. In measuring an object which is assumed to have the properties of a fractal, one is looking for a predictable increase in the measured property as one examines the object at finer and finer scales. Similar to the work done by C. Caldwell on mammograms [70], we interpret the optical density values of the individual pixels to represent the height of the pixel in a three-dimensional space. The property measured is thus the surface area of the defined three-dimensional surface.

30)  FA1  Fractal Area Scale 1: is the surface area of the three-dimensional nuclear surface generated from the nuclear optical density data in the red image. Each pixel is assumed to have a unit area, and the rectangle joining adjacent pixels in the three-dimensional space is assumed to have an area proportional to its height.

$$\text{FA1} = \sum \text{pixel areas} + \sum \text{joining rectangle areas} \tag{36}$$

31)  FA2  Fractal Area Scale 2: is the surface area of the generated three-dimensional nuclear surface where squares of four pixels in the original image are averaged to produce a single new pixel, thus reducing the scale and the size of the image by a factor of 2.

$$\text{FA2} = \sum \text{pixel areas} + \sum \text{joining rectangle areas} \tag{37}$$

Figure 16 depicts how FA1 and FA2 are determined.
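The two surface areas can be illustrated with a short sketch. This is not the thesis code; it is a minimal pure-Python illustration under stated assumptions: the function names are hypothetical, `None` marks a pixel outside the nucleus, each joining rectangle is taken to have unit width and an area equal to the absolute height difference between two side-adjacent pixels, and the coarse grid is measured in its own pixel units. Only pixels belonging to 2x2 squares that lie entirely inside the nucleus are used at either scale.

```python
def surface_area(heights):
    """Area of the 3-D surface formed by a grid of pixel heights.

    Assumption: each used pixel contributes a unit top area, and each pair
    of side-adjacent used pixels contributes a joining rectangle whose area
    is their absolute height difference. None marks an unused pixel.
    """
    rows = len(heights)
    cols = len(heights[0]) if rows else 0
    area = 0.0
    for r in range(rows):
        for c in range(cols):
            h = heights[r][c]
            if h is None:
                continue
            area += 1.0  # top of the pixel
            # joining rectangles to the right-hand and lower neighbours
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and heights[rr][cc] is not None:
                    area += abs(h - heights[rr][cc])
    return area


def halve_scale(heights):
    """Average each 2x2 square into one pixel of the coarser grid.

    A coarse pixel is kept only if all four source pixels are in the nucleus.
    """
    coarse = []
    for r in range(0, len(heights) - 1, 2):
        row = []
        for c in range(0, len(heights[0]) - 1, 2):
            block = [heights[r][c], heights[r][c + 1],
                     heights[r + 1][c], heights[r + 1][c + 1]]
            row.append(None if None in block else sum(block) / 4.0)
        coarse.append(row)
    return coarse


def fractal_areas(od):
    """Return (FA1, FA2) for a grid of optical density values."""
    coarse = halve_scale(od)
    # keep only the fine pixels whose 2x2 square is entirely inside the nucleus
    fine = [[None] * len(od[0]) for _ in od]
    for r, row in enumerate(coarse):
        for c, value in enumerate(row):
            if value is not None:
                for dr in (0, 1):
                    for dc in (0, 1):
                        fine[2 * r + dr][2 * c + dc] = od[2 * r + dr][2 * c + dc]
    return surface_area(fine), surface_area(coarse)
```

Under these assumptions a perfectly flat 4 x 4 nucleus gives FA1 = 16 and FA2 = 4, so the surface area falls by a factor of four when the scale is halved, as expected for an ordinary two-dimensional surface. A rougher optical density surface loses more of its joining rectangle area when it is averaged.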
32)  FD  Fractal dimension: assuming that the intensity distribution exhibits fractal behaviour, the fractal dimension for the image of the nucleus can be defined as

$$\text{FD} = \frac{\log_{10}\text{FA1} - \log_{10}\text{FA2}}{\log_{10} 2} \quad \text{(Mandelbrot [69])} \tag{38}$$

FIGURE 16: DETERMINATION OF AREAS FA1 AND FA2 FOR THE FRACTAL DIMENSION CALCULATION

Grid 1 depicts an array of pixels in which a nucleus is located. In grid 1 the nuclear pixels are shaded. The shaded squares in grid 2 are those squares of four pixels which are entirely contained within the nucleus. The corresponding pixels in grid 1 are also darkly shaded. For the fractal dimension calculation only the darkly shaded pixels (or squares in grid 2) are used. One can imagine that these pixels (or squares) form a 3 dimensional surface, where the height of each pixel (or square) corresponds to its optical density value (or average optical density value for grid 2). The overall area of the 3 dimensional surface is calculated by summing the area of the tops and the exposed sides of the pixels (or squares) together. The area so calculated for the dark pixels in grid 1 is FA1 and the area so calculated for the dark pixels in grid 2 is FA2.

Assigning a mass to each pixel depending upon its optical density, it is possible to calculate the center of mass of the nucleus. This gives rise to another measure which describes in a general fashion the distribution of the DNA in the nucleus.
33)  DCM  Distance to center of mass: is the distance in pixels that the center of mass of the nucleus is from the geometric center of the nucleus.

34)  DenMax  Density of maxima: is the number of local maxima detected in the red image of the nucleus, divided by the area of the nucleus. The red image is spatially averaged to reduce the amount of speckle noise. Local maxima are detected when the four side adjacent pixels are less than the central pixel.

35)  DenMin  Density of minima: similar to DenMax except that local minima are detected when the four side adjacent pixels are greater than the central pixel.

36)  ExtrRange  Extrema Range: is the absolute intensity difference between the smallest minimum detected by the DenMin feature and the largest maximum detected by the DenMax feature.

37)  AveRange  Average Range: is the absolute intensity difference between the average minimum value detected by the DenMin feature and the average maximum value detected by the DenMax feature.

2.4.2.  Discrete Texture Features

Interphase chromatin is thought to exist in a variety of condensation states. There is evidence to suggest nuclear morphological change is associated with fundamental alterations in chromatin structure and function [71], and these changes can be observed in the chromatin structure or DNA condensation state. Thus it is possible and reasonable to divide the nucleus into different sections corresponding to the different states of the chromatin in the nucleus. It is assumed that the non-condensed chromatin is located in the light areas of the nucleus, and that the condensed chromatin is located in the dark areas of the nucleus. Further, the condensed chromatin area can be subdivided into medium and highly condensed sections.
Since some benign cells such as leukocytes are normally hyperchromatic, i.e. their DNA exists in a highly condensed chromatin state, their optical density distribution characteristics generate good reference values for determining the three classes of chromatin condensation.

The average mean optical density and average variance of optical density of several leukocytes from the same slide as the cells to be analyzed can be used to determine the optical density threshold, ODMID, separating the high and medium chromatin states. This threshold and the average variance of the optical density distributions of the leukocytes may be used to determine the threshold IBND, separating the medium and low density chromatin. In this work 8 to 15 leukocytes from each slide were analyzed and the mean integrated optical density of the leukocytes found. This value was used to standardize the ODMID and IBND thresholds. For a 15 slide sample set, ODMID and IBND were experimentally determined for each slide. The relationship between the mean IOD value of the leukocytes and the experimentally determined ODMID and IBND thresholds was calculated. This relationship was used to determine ODMID and IBND thresholds for the rest of the slides. Thus low density chromatin is defined to exist in the areas where the optical density ranges from 0 to IBND, medium density chromatin in the areas where the optical density ranges from IBND to ODMID, and high density chromatin in the areas where the optical density is above ODMID.

Once the boundaries have been determined one can extract several features which describe the shape, intensity and spatial distribution of the various nuclear sections. Figure 17 shows a nucleus divided into sections as described above.
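The division of the nucleus by the two thresholds can be sketched as follows. This is an illustrative sketch only, not the thesis implementation: the function names are hypothetical, the thresholds are passed in rather than derived from leukocyte data, and the choice to assign a pixel lying exactly on a threshold to the lower density class is an assumption, since the text does not say how ties are handled.

```python
def chromatin_state(od, ibnd, odmid):
    """Label one nuclear pixel as 'low', 'medium' or 'high' density chromatin.

    Assumption: a value exactly equal to a threshold goes to the lower class.
    """
    if od <= ibnd:
        return "low"
    if od <= odmid:
        return "medium"
    return "high"


def state_fractions(od_values, ibnd, odmid):
    """Fraction of nuclear area and of integrated optical density per state.

    `od_values` holds the optical density of every pixel in one nucleus.
    Returns two dictionaries keyed by 'low', 'medium' and 'high'.
    """
    if not od_values:
        raise ValueError("nucleus contains no pixels")
    area = {"low": 0, "medium": 0, "high": 0}
    iod = {"low": 0.0, "medium": 0.0, "high": 0.0}
    for od in od_values:
        state = chromatin_state(od, ibnd, odmid)
        area[state] += 1      # one pixel of area in this state
        iod[state] += od      # optical density accumulated in this state
    n_pixels = len(od_values)
    total_iod = sum(od_values)
    if total_iod <= 0:
        raise ValueError("integrated optical density must be positive")
    area_fraction = {s: area[s] / n_pixels for s in area}
    iod_fraction = {s: iod[s] / total_iod for s in iod}
    return area_fraction, iod_fraction
```

For example, with the made-up values `state_fractions([0.1, 0.2, 0.5, 0.9], ibnd=0.3, odmid=0.6)`, half of the nuclear area falls in the low density state and a quarter in each of the other two, while the single high density pixel carries 0.9/1.7 of the integrated optical density.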
38)  TARL  Total area ratio for low density chromatin [72]: is the area occupied by the low density sections in the nucleus divided by the area of the nucleus.

39)  TARM  Total area ratio for medium density chromatin.

40)  TARH  Total area ratio for high density chromatin.

41)  TERL  Total extinction ratio for the low density chromatin [72]: is the integrated optical density of the low density sections divided by the integrated optical density of the nucleus.

42)  TERM  Total extinction ratio for medium density chromatin.

43)  TERH  Total extinction ratio for high density chromatin.

44)  NL  Number of low density chromatin clusters [73]: this is the number of distinct groups of low density chromatin pixels in the nucleus.

45)  NM  Number of medium density chromatin clusters.

46)  NH  Number of high density chromatin clusters.

47)  NLS  Number of low density chromatin single pixel clusters [73].

48)  NMS  Number of medium density chromatin single pixel clusters.

FIGURE 17: DIVISION OF NUCLEUS INTO AREAS OF DIFFERENT CHROMATIN CONDENSATION STATES

Chromatin in nuclei is believed to exist in various condensation states (low, medium and high). From leukocytes present on the same slide as the nuclei being investigated it is possible to determine optical density thresholds which will separate the nuclei into low density chromatin, medium density chromatin and high density chromatin. A) is a gray scale image of a cervical cell in which the cytoplasmic and nuclear boundaries have been delineated. B) is a representation of the chromatin states in the nucleus of A).
C) is a magnified version of image B), in which the chromatin in the high condensation state has been labelled black, the medium condensation state chromatin has been labelled dark gray, and the low condensation state chromatin has been labelled light gray.

49)  NHS  Number of high density chromatin single pixel clusters.

While the thresholds IBND and ODMID divide the nucleus into various sections, it may be useful to know the average contrast between the various sections.

50)  MAER  Medium average extinction ratio [72]: is the mean optical density of the medium density chromatin divided by the mean optical density of the low density chromatin.

51)  HAER  High average extinction ratio: same as above but for the high density chromatin.

52)  MHAER  Medium and high average extinction ratio: is the mean optical density of the medium and high density chromatin sections grouped into one section, divided by the mean optical density of the low density chromatin.

53)  CL  Compactness of the light density chromatin areas [72]: is similar to the compactness measure for the entire nucleus, NComp. In this case

$$\text{CL} = \frac{(\text{number of edge pixels for the light density areas})^2}{4\pi\,(\text{total area of the light density pixels})} \tag{39}$$

54)  CM  Compactness of the medium density chromatin areas.

55)  CH  Compactness of the high density chromatin areas.

56)  CMH  Compactness of the medium and high density chromatin areas, where the medium and high density areas are considered to be of the same type.
57)  ADL  Average distance of the light density chromatin from the nuclear center [72]: is the average separation between the individual light pixels and the center of the nucleus, normalized by the mean radius, NMeanR, of the nucleus.

58)  ADM  Average distance of the medium density chromatin from the nuclear center.

59)  ADH  Average distance of the high density chromatin from the nuclear center.

60)  ADMH  Average distance of the medium and high density chromatin from the nuclear center.

61)  SCLCN  Separation between the center of the light density chromatin and the center of the nucleus, where the separation has been normalized by the average radius of the nucleus.

62)  SCMCN  Separation between the center of the medium density chromatin and the center of the nucleus.

63)  SCHCN  Separation between the center of the high density chromatin and the center of the nucleus.

2.4.3.  Postprocessing

Once a few of the more obvious features have been calculated, the various nuclear-like objects in the images are interpreted by an artifact removal routine before the more involved features are determined. This routine considers holes in objects to be artifacts and fills them in. Objects which are irregularly shaped (determined heuristically from the NRVar/NMeanR, NComp, and NInert features), too small (determined heuristically from the NA and NMeanR features), or do not contain enough DNA material (determined heuristically from the IODR feature) are discarded as artifacts. Thus artifacts are considered to be objects which exhibit some but not all of the characteristics of nuclei.

2.5.  Cell Classification

The selection of cells and their manual classification as either normal or abnormal was performed by the author, and a second manual classification was performed by Dr. H. Tezcan, the M.D. responsible for the development of the staining and deposition procedure utilized in this work. Any cells for which the two manual classifications differed were discarded. A cell by cell classification into the various grades of abnormality was not made, keeping the number of groups to be discriminated to two.

The method used to automatically classify the cells based upon their features was linear stepwise discriminant function analysis. The analysis was performed by a commercially available program, BMDP (BioMedical Data Processing). The routine used is referred to as 7M, a stepwise discriminant function analysis. This routine performs discriminant analysis between two or more groups. The features used for the computation of the linear classification functions were chosen in a stepwise fashion. For more information on linear discriminant function analysis and on the BMDP implementation, one should see the BMDP Statistical Software manual [74] and other references [75, 76].

A brief summary of discriminant function analysis and the assumptions this analysis makes about the distribution, size, and form of data follows.

Discriminant function analysis finds the hyperplane(s), in the N-dimensional space defined by the feature data used, X, which best separates the two or more groups. See figure 18 for a two-dimensional example. The discriminant function analysis actually finds a discriminant function for each group. These discriminant functions are linear weighted combinations of the feature data.

$$f_1(X) = \sum_{i=1}^{n} W_{1i} X_i + C_{01} \tag{40}$$

where $f_1(X)$ is the discriminant function for the first group, $n$ is the number of features in the discriminant function, $W_{1i}$ is the weight assigned to feature $X_i$ for the first group, and $C_{01}$ is a constant. For the two group case, two discriminant functions are found, i.e. $f_1(X)$ and $f_2(X)$. In our case $f_1(X)$ is the discriminant function for the normal cells and $f_2(X)$ is the discriminant function for the abnormal cells.

Using the N features of a cell, the two discriminant functions can classify the cell. If $f_1(X) - f_2(X) > P$, where $P$ is a constant, then the cell is classified as a normal cell, else it is classified as an abnormal cell. In this fashion the two discriminant functions may be used to automatically classify all the cells.

TABLE 1
Discriminant Function Classification Table

                              Automatic Classification
Manual Classification         Normals      Abnormals
Normals                       a            b
Abnormals                     c            d

In the previous table, b represents the number of false positives (normal cells which have incorrectly been automatically classified as abnormals) and c represents the number of false negatives (abnormal cells which have incorrectly been automatically classified as normals). Thus the false positive rate for the above table is b/(b+a) and the false negative rate is c/(c+d). The total error of classification (TEC) is (b+c)/(a+b+c+d). The value of P is usually chosen to minimize the TEC. However, for an automated cervical cell screening system the consequences of a large false negative rate are more serious than those of a large false positive rate. Consequently one would adjust the value of P to reduce the false negative rate in such a system.

FIGURE 18: A TWO DIMENSIONAL EXAMPLE OF GROUP SEPARATION USING LINEAR DISCRIMINANT FUNCTION ANALYSIS

Each ellipse in this figure bounds a region in which a certain proportion of all the members of one of the groups will fall, e.g. 50% or 90% of group one. In this example, the two groups are assumed to have identical covariance matrices. Thus the set of points between the two groups for which the probability of belonging to one group or the other is equal is a straight line. This line is shown as the linear decision boundary in the figure and would be the one dimensional hyperplane determined by linear discriminant function analysis.

The question that stepwise discriminant function analysis is most commonly used to answer is: which set of N features will most clearly distinguish between the groups? There are several methods of finding the best N features to use.

1)  Complete subset method: Find and evaluate all of the possible subsets of N features. This is a very time consuming method for large numbers of variables. (This method was not used.)

2)  Forward stepping method: First find the single feature with the maximum goodness measure (to be defined later). Next find the feature which, when paired with the first feature, maximizes the goodness measure. Continue in this fashion, adding features which when grouped with the already selected features maximize the goodness measure, until N features have been selected.

3)  Backward stepping method: Begin with all features included. In a stepwise fashion, successively remove the feature that results in the least decrease in the goodness measure until only N features are left.

4)  Full stepwise method: At each step, test the decrease in the goodness measure if a feature were to be removed; if the decrease is below a specified level, remove the feature. If no feature meets this criterion then add a feature by the forward stepping method.

Two goodness measures are available in the BMDP 7M routine. One is Wilks' lambda, L.

$$L = \frac{|W|}{|W + B|} \tag{41}$$

where W is the within groups sums of squares and cross products matrix, and B is the between groups sums of squares and cross products matrix. This measure can be converted to an approximate F-ratio to test group differences.

The other is the conditional F ratio. This is the univariate F ratio associated with a particular feature, once the variation in the feature data due to the already entered features has been removed using a multivariate linear regression.

Some of the assumptions and requirements of linear discriminant function analysis are:

1)  A large data set with approximately 10 - 20 data points (cells) per feature used in the discriminant function analysis [76].

2)  That the group data for each feature is normally distributed, i.e.
the feature distribution for each group is multivariate normal [76].

3)  That the variance and covariance of the features in each group be the same [76]. See figure 19 for a graphic example.

4)  That the numbers of data points (cells) in each group in the data set used to calculate the discriminant functions are approximately the same, i.e. the ratio of group a to group b is less than 1.5 and greater than 2/3 [76].

The last requirement is not as strong if the first three criteria are met. If the first criterion is not met then the discriminant functions generated may not obtain the same performance on other sets of data as they do on the learning set used to generate them. If the second criterion is not met then the features selected by the stepwise discriminant function analysis may not be the optimal ones and the classification power assigned to each feature may be inaccurate. If the third criterion is not met then the hyperplane is no longer the optimal surface to divide the two groups and a quadratic or higher order surface would decrease TEC.

The effects of violating one or more of the first three criteria can be mitigated somewhat if the sample sizes of the groups used are large and close to being the same.

FIGURE 19: OPTIMAL SEPARATION BOUNDARIES FOR GROUPS WITH DIFFERENT COVARIANCE MATRICES

In these two examples, the covariance A) and the variance B) of the features in each group are not the same. Consequently the optimal separation boundary is no longer a hyperplane (straight line in the 2D case) but a quadratic surface, represented as the quadratic decision boundaries in figures A) and B).

3.  RESULTS

3.1  Segmentation Results

Segmentation is one of the most important tasks an automated analysis system must perform. The accuracy of the segmentation directly affects the shape descriptors and many of the optical density distribution features, and indirectly affects the various texture features. Thus the objectives are: 1) to determine the accuracy of the various segmentation procedures against a visual standard (human interpretation of cellular boundaries); 2) to determine the dependence of the various cellular features upon the segmentation accuracy; 3) to determine the dependence of cellular classification on feature accuracy; and 4) to test the hypothesis that by using quantitative cellular descriptors to classify the cells one can match the performance of a skilled cytology technician.

The determination of the accuracy of the various segmentation procedures was done in three steps. The first step evaluated the segmentation accuracy of several of the possible segmentation procedure combinations on the small database of 150 images. The second step evaluated the segmentation accuracy and effect on feature calculations using the larger 1000 image database. The third step evaluated the best procedure from steps 1 and 2 on the 3680 image database.

Of the multitude of possible segmentation procedures (e.g. algorithm 1 followed by algorithm 8 followed by algorithm 9 with postprocessing), 32 different algorithm combinations were evaluated for segmentation accuracy on the database of 150 images. The results of this evaluation are shown in Table 2. A nucleus was considered correctly segmented only if no area of the nucleus could be visually found incorrectly segmented. See figure 20 for examples of correctly segmented nuclei and figure 21 for examples of incorrectly segmented nuclei. While the criterion for nuclear segmentation was very stringent, the cytoplasm had to be relatively poorly segmented in order to be considered incorrectly segmented. A segmentation error which involved less than 10% of the area of the cytoplasm was still considered correctly segmented. The definition of the extent and shape of the cytoplasm is much less critical than the definition of the nucleus to the classification of the cell.

TABLE 2
Comparison of Segmentation Procedures on a Database of 150 Images

Secondary                             Primary Segmentation Algorithm
Segmentation
Algorithm                             1        2        3        4        6
None                   Cytoplasm      97.4%    96.0%    90.0%    89.3%    74.7%
                       Nucleus        24.7%    24.7%    40.7%    15.3%    19.3%
7                      Nucleus        28.7%    34.7%    46.7%    33.3%    NA
8                      Nucleus        30.0%    40.7%    48.7%    19.0%    NA
9                      Nucleus        78.7%    81.3%    78.0%    66.7%    68.7%
Post Processing        Nucleus        75.3%    82.7%    82.7%    80.0%    33.3%
9 + Post Processing    Nucleus        89.3%    92.0%    92.0%    92.0%    83.3%

Additional Procedures Tested
Algorithms 3 and 7                    86.7%
Algorithms 3 and 8                    78.7%
1+9 and ...                           94.0%
Algorithm 5                           93.3%

All results represent the % of the images which were correctly segmented. Secondary segmentation algorithm indicates a segmentation method which refines a previously generated nuclear segmentation. NA: not applicable, secondary algorithm requires a threshold. The accuracy of all measurements is ± 7%.

The cytoplasm of a lot of the cervical cells examined was larger than the 128 x 120 pixel array used to store the cervical cell images. Thus the value of the cytoplasmic area was determined by using the three-dimensional thresholding algorithm to segment the cytoplasm over a much larger area.
The cytoplasmic area so determined, which belonged to the cervical cell in the 384 x 360 pixel area centered on the original 128 x 120 pixel area, was saved along with the 128 x 120 RGB images. It is this area which is used as the CA feature.

To reduce the number of artifacts for which features would need to be calculated, and to reduce the effect of artifacts on the determination of the discriminant function to be used for the classification of the individual cells, only those segmentation procedures which included postprocessing were used on the second database of 1000 images. Also tested on the database of 1000 images were the segmentation procedures of: algorithm 3 followed by algorithm 7 followed by algorithm 9 and postprocessing, and algorithm 3 followed by algorithm 8 followed by algorithm 9 followed by postprocessing. When one includes the results of algorithm 6, 16 different algorithms were tested on the database of 1000 images. A summary of the segmentation accuracy for these 16 procedures on the 1000 image database can be seen in Table 3. One should note that, due to a loss of image data from the back-up media, four of the procedures in Table 3 were tested using only 800 images.

TABLE 3
Comparison of Selected Segmentation Procedures on a Database of 1000 Images
Segmentation Algorithms (all include Post Processing)

Secondary                  Primary
                           1            2            3         4            6
None        Nucleus        88.6%        85.4%        89.4%     84.6%        52.5%
            Cytoplasm      98.5%        94.3%        97.1%     97.1%        95.0%
9           Nucleus        97.6%        97.6%        97.7%     98.0%        91.5%
7           Nucleus        Not tested   Not tested   91.5%*    Not tested   Not tested
8           Nucleus        Not tested   Not tested   67.0%*    Not tested   Not tested
7 and 9     Nucleus        Not tested   Not tested   94.1%*    Not tested   Not tested
8 and 9     Nucleus        Not tested   Not tested   93.3%*    Not tested   Not tested

Additional Procedures Tested
Algorithm 5                97.2%
1 & 9                      98.0%

All results represent the numerical percentage of the images which were correctly segmented. *These results were derived using an 800 image subset of the 1000 image database. The accuracy of all measurements is ± 1%.

[Figure note: The cytoplasm intensity values shown are from the blue images. The nuclear intensity values are from the red images.]

To better define the accuracy of the best of the nuclear segmentation procedures used in Tables 2 and 3, the procedure of primary segmentation method 1 followed by two iterations of the secondary segmentation algorithm 9 plus post-processing was used to segment the large 3680 image database. The results are shown in Table 4.

3.2  Variation of Features

To examine the variation of features due to the different segmentation procedures, the coefficient of variation for each feature was calculated. The coefficient of variation is defined as the standard deviation divided by the mean. For each nucleus in each image, the mean and the standard deviation for each feature were calculated across all the procedures which segmented the nucleus. Thus for each nucleus the coefficient of variation (CV) was calculated for every feature. This was done for all of the images in the 1000 image database.

The CV results were collected into six different groups based upon cell type and segmentation success information. An average CV for each feature in each group was calculated from the group data. The six groups are:

1)  All cell types, regardless of segmentation outcome.

2)  Normal cells only, regardless of segmentation outcome.
TABLE 4  Segmentation Performance of Simple 2D Histogram Analysis Followed by Two Iterations of the Edge Relocation Algorithm Plus Postprocessing Images from Normal Samples (Total 1671)  # of n u c l e i correctly segmented % of n u c l e i correctly segmented.  Images from Dysplastic Samples (Total 2009)  Total Image Database of 3680 Images  1640  1977  3617  98.IX  98.4X  98.3X  3)  Abnormal c e l l s only, regardless of segmentation outcome.  4)  A l l c e l l types, but only those n u c l e i which were correctly segmented by a l l 15 segmentation  5)  Normal c e l l s  only,  procedures.  consisting of only  those  n u c l e i which  were c o r r e c t l y segmented by a l l 15 segmentation 6)  procedures.  Abnormal c e l l s only, consisting of only those n u c l e i which were c o r r e c t l y segmented by a l l 15 segmentation procedures.  The  last  features can They may given  three  vary  groups  were  even when the  included  to  demonstrate  segmentation appears  how  to be  much  correct.  also be used as an estimate of how much the features could vary  that  different  individual was  individuals segment  the  images  to segment the same image multiple times.  were 63 features examined i n this way  or  the  same  Since  there  the r e s u l t s have been spread over  three tables; Table 5, Table 6 and Table 7. From a v i s u a l examination of some of the more l o g i c a l combinations of the previously described features, four more features were added to the feature space. . 64)  These features are:  Ratio:  The  ratio  of  the  nuclear  area  divided  by  the  cytoplasmic area of the c e l l (NA/CA).  (42)  65)  HArea:  (43)  66)  DNum: Density number:  High density chromatin area (NA*TARH).  i s the difference between the number  of l o c a l minima and the number of l o c a l maximal detected i n the red image of the nucleus 67)  ODMean:  Optical  density  (NA*[DenMin-DenMax]). mean  value  of  the  (IODR/NA). 
Most of the calculated features discriminate to some extent between the two groups. The group means for each feature were compared

TABLE 5
Variation of the Shape Features and Some Texture Features among the 15 Segmentation Procedures
Coefficients of Variation for Features (in %)

             All images segmented (1)      Correctly segmented by all 15 (2)
Feature      Both     Normal   Abnormal    Both     Normal   Abnormal
             Groups   Nuclei   Nuclei      Groups   Nuclei   Nuclei
NA           9.0      7.2      10.0        5.9      5.9      5.8
NComp        5.1      4.4      5.6         2.7      2.8      2.6
NInert       1.2      1.0      1.3         0.6      0.6      0.7
NMeanR       4.5      3.7      5.0         3.0      3.0      2.9
NMaxR        5.7      4.8      6.3         3.0      3.2      2.9
NRVar        28.1     24.9     30.0        13.6     14.3     13.0
NElong       3.2      2.9      3.3         2.2      2.1      2.2
NBdycrc      48.6     53.4     45.8        30.9     34.7     27.0
NBdyfin      43.4     35.9     47.7        30.7     27.7     33.8
RMeanI       6.3      4.9      7.2         4.3      4.2      4.3
GMeanI       6.7      6.4      6.8         4.7      5.6      3.7
BMeanI       6.0      7.6      5.1         4.6      6.8      2.3
FD           1.0      1.0      1.1         0.8      0.9      0.7
FA1          16.8     13.6     18.6        12.2     11.4     12.9
FA2          17.8     13.4     20.3        12.2     10.9     13.5
DCM          29.1     23.8     32.2        22.5     18.6     26.4

(1) Results for images segmented.
(2) Results for only those images which were correctly segmented by all of the 15 procedures.

TABLE 6
Variation of Discrete Texture Features among the 15 Segmentation Procedures
Coefficients of Variation for Features (in %)

             All images segmented (1)      Correctly segmented by all 15 (2)
Feature      Both     Normal   Abnormal    Both     Normal   Abnormal
             Groups   Nuclei   Nuclei      Groups   Nuclei   Nuclei
TARL         21.2     17.8     23.3        17.8     15.8     19.8
TARM         7.3      6.7      7.7         5.7      5.9      5.5
TARH         7.3      7.2      7.3         5.6      6.1      5.1
TERL         14.1     12.5     15.0        11.3     10.2     12.5
TERM         2.7      2.8      2.6         1.9      2.1      1.8
TERH         2.5      2.9      2.2         1.7      2.0      1.4
CL           16.5     17.1     16.2        14.6     16.1     13.1
CM           1.5      1.7      1.4         0.7      0.8      0.7
CH           1.1      1.1      1.0         0.4      0.6      0.3
CMH          1.2      1.5      1.0         0.5      0.6      0.4
ADL          4.4      3.8      4.7         3.1      3.5      2.8
ADM          4.6      4.2      4.9         3.4      3.8      3.0
ADH          4.6      4.5      4.6         3.3      3.9      2.8
ADMH         4.6      4.2      4.9         3.4      3.8      3.0
MAER         14.4     11.0     16.4        11.2     10.5     11.9
HAER         14.1     11.1     15.9        11.0     10.5     11.6
MHAER        14.5     11.1     16.4        11.2     10.5     11.9
NL           41.2     37.7     43.2        40.0     38.9     41.0
NM           2.1      2.9      1.7         1.7      1.9      1.4
NH           1.3      1.4      1.2         0.6      0.8      0.5
NLS          74.0     64.5     79.5        65.5     57.0     74.1
NMS          6.5      7.3      6.0         3.1      5.1      1.1
NHS          1.4      2.1      1.0         1.4      1.1      1.8
SCLCN        36.5     36.8     36.4        35.2     35.7     34.6
SCMCN        17.1     18.0     16.6        15.2     17.3     13.0
SCHCN        18.5     18.1     18.7        13.3     13.7     12.9

(1) Results for images segmented.
(2) Results for only those images which are correctly segmented by all of the 15 procedures.
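The coefficients of variation tabulated in this section follow directly from the definition given earlier (standard deviation divided by mean, computed per nucleus across the procedures that segmented it, then averaged within a group). A minimal sketch of that calculation is shown here. It assumes the sample standard deviation, which the text does not specify, and the function names and example values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent for one feature of one nucleus.

    `values` holds the feature value obtained from each segmentation
    procedure that segmented this nucleus. The sample standard deviation
    is an assumption; a feature whose mean is zero would need special
    handling that is not shown here.
    """
    return 100.0 * stdev(values) / mean(values)


def average_cv(per_nucleus_values):
    """Average the per-nucleus CVs over all nuclei in a group.

    Nuclei segmented by fewer than two procedures are skipped because a
    standard deviation cannot be computed for them.
    """
    cvs = [coefficient_of_variation(v) for v in per_nucleus_values if len(v) >= 2]
    return mean(cvs)


# Made-up nuclear-area values for two nuclei, three procedures each.
nuclear_area = [
    [90.0, 100.0, 110.0],   # CV = 10%
    [190.0, 200.0, 210.0],  # CV = 5%
]
group_average = average_cv(nuclear_area)
```

Restricting `per_nucleus_values` to nuclei that every procedure segmented correctly would give the analogue of groups 4 to 6.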
TABLE 7
Variation of Continuous Texture Features among the 15 Segmentation Procedures
Coefficients of Variation for Features (in %)

                     All images segmented (1)      Correctly segmented by all 15 (2)
Feature              Both     Normal   Abnormal    Both     Normal   Abnormal
                     Groups   Nuclei   Nuclei      Groups   Nuclei   Nuclei
IODR                 2.9      2.2      3.2         1.7      1.5      1.9
IODG                 79.1     26.0     110.8       26.1     27.3     24.8
IODB                 121.0    57.0     158.3       60.6     41.3     80.3
ODMax                0.4      0.4      0.4         0.2      0.2      0.2
ODVar                18.5     15.3     20.3        14.8     14.2     15.5
ODSkew               61.0     62.9     59.8        39.2     47.7     30.5
ODKurt               8.4      7.2      9.1         6.4      6.4      6.3
REntro               2.1      1.0      2.2         1.8      1.8      1.8
GEntro               2.0      1.9      2.0         1.6      1.6      1.6
BEntro               2.5      2.4      2.6         2.1      2.1      2.0
Energy               12.1     10.3     13.1        8.7      8.9      8.6
Correlation          19.6     16.7     21.2        15.5     14.6     16.4
Contrast             9.5      9.2      9.6         8.2      8.3      8.0
Homogeneity          2.4      2.5      2.3         1.9      2.1      1.7
Cluster Shade        8.1      7.1      8.6         6.1      6.2      5.9
Cluster Prominence   258.9    100.1    351.4       243.0    106.2    382.6
DenMax               123.2    74.7     151.4       79.3     59.9     102.2
DenMin               9.5      8.4      10.2        7.6      6.7      8.5
ExtrRange            27.7     17.1     34.0        16.9     12.7     21.1
AveRange             22.3     14.2     27.0        12.9     9.7      16.0

(1) Results for images segmented.
(2) Results for only those images which are correctly segmented by all of the 15 procedures.

using two-sample t tests, with and without assuming equality of variances between the two groups. The equality of variance of each group for each feature was also tested using the Levene W test.⁷⁷ When the feature distributions in each group are not normal (which is in fact the usual condition for the abnormal group) and the normal and abnormal group variances are significantly different, the accuracy of the t test is questionable. For this reason a nonparametric statistic, the Mann-Whitney rank-sum test,⁷⁸ was used.
This test is the nonparametric version of the two-sample test for independent groups.

Only the following features had group variances which were somewhat similar (p values ≥ 0.05): CA, REntropy, BEntropy, TARL, TERL, TERH, CM, CMH, MAER, NH, NHS, ODKurt, Energy, DCM, and SCMCN. Of these features only the group means of NH, NHS, and ODKurt were not statistically (p < 0.005) different. For the Mann-Whitney rank-sum test, the means of the following features were not statistically different (p > 0.005): CM, CMH, CH, MHAER, NLS, NHS, ODKurt, Correlation, Cluster Prominence, FD, SCLCN, and SCMCN.

While it is not feasible to show histograms of the two groups for each feature, Figures 22, 23, 24, 25, 26, 27, 28, 29 and 30 show the histograms for some of the more commonly used and interesting features.

3.3 Variation of Classification

For each segmentation procedure used in Table 8, all the cell feature data and the human classification of the individual cells were grouped together. These data were used to generate, for each segmentation procedure, a Discriminant Function (DF) which predicted the human classification based upon the cellular feature data. The program used

[Figure 22: histograms, panel title "Nuclear Area"; x-axis: Area (pixels) × 10³; legend: normal cells, abnormal cells]

FIGURE 22: DISTRIBUTION OF THE NUCLEAR AREA OF NORMAL AND ABNORMAL CELLS

From this graph it is clear that nuclei of normal cells are generally smaller than those of abnormal cells; however, there is a significant amount of overlap between the two groups.
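The Mann-Whitney rank-sum test used above for the group comparisons can be illustrated with a short, self-contained sketch. This is not the statistical package used in the thesis; it is a plain implementation of the textbook U statistic with midranks for ties and a normal approximation without tie correction, and all function names and sample values are hypothetical.

```python
import math


def midranks(values):
    """Rank the values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group1, group2):
    """Return (U1, U2, z) for two independent samples.

    z uses the large-sample normal approximation without a tie
    correction, which is a simplification.
    """
    n1, n2 = len(group1), len(group2)
    ranks = midranks(list(group1) + list(group2))
    rank_sum_1 = sum(ranks[:n1])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2.0
    u2 = n1 * n2 - u1
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - n1 * n2 / 2.0) / sigma
    return u1, u2, z


# Made-up feature values for a "normal" and an "abnormal" group.
u1, u2, z = mann_whitney_u([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Because the statistic depends only on ranks, it does not require the feature distributions to be normal or the group variances to be equal, which is the property the text relies on.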
[Figure 23: histograms, panel title "Cytoplasm Area"; x-axis: Area (pixels) × 10⁴; legend: normal cells, abnormal cells]

FIGURE 23: DISTRIBUTION OF CYTOPLASMIC AREA OF NORMAL AND ABNORMAL CELLS

Normal cells generally have more cytoplasm than do abnormal cells. However, the two histograms significantly overlap.

[Figure 24: histograms, panel title "NA/CA Ratio"; x-axis: Ratio (no units); legend: normal cells, abnormal cells]

FIGURE 24: DISTRIBUTION OF NA/CA RATIO FOR NORMAL AND ABNORMAL CELLS

The NA/CA Ratio feature discriminates quite well between normal and abnormal cells, as demonstrated by the histograms in this graph. However, there is still a significant area of overlap between the two histograms.

[Figure 25: histograms, panel title "Nuclear IOD"; x-axis: IOD × 10³; legend: normal cells, abnormal cells]

FIGURE 25: DISTRIBUTION OF THE NUCLEAR IOD OF NORMAL AND ABNORMAL CELLS

In this graph the distribution of the normal cells is sharply peaked around the diploid peak at 120, with a few tetraploid cells indicated by the much smaller normal cell peak at 260. The abnormal cell IOD distribution also has characteristic diploid and tetraploid peaks. The small amount of overlap between these two distributions indicates that nuclear IOD is a very discriminating feature.
[Figure 26: histograms; x-axis: Compactness (unitless)]

FIGURE 26: DISTRIBUTION OF COMPACTNESS RATIO OF NORMAL AND ABNORMAL CELLS

There is only a very slight difference between the compactness ratio distributions of normal cells and abnormal cells, indicating that this feature by itself is not very discriminating but might be if used along with other features.

[Figure 27: histograms, panel title "DNum"; x-axis: DNum (number of points); legend: normal cells, abnormal cells]

FIGURE 27: DISTRIBUTION OF DNum FOR NORMAL AND ABNORMAL CELLS

DNum is a non-linear combination of features found in the literature which appears in this graph to be fairly discriminating between normal and abnormal cells. Also of note from this graph are the shapes of the two histograms. Of all of the reasonably discriminating features, DNum is the only one for which the feature distribution for each group tends to be that of a normal distribution.

[Figure 28: histograms, panel title "Intensity Correlation in Nucleus"; x-axis: Correlation (no units); legend: normal cells, abnormal cells]

FIGURE 28: DISTRIBUTION OF THE MARKOV TEXTURE FEATURE CORRELATION FOR NORMAL AND ABNORMAL CELLS

In this graph correlation does not appear to be very discriminating between normal and abnormal cervical cells. When used in discriminant function analysis with other features, however, correlation is one of the most discriminating features.
[Figure 29: histograms, panel title "Total Area Ratio High Density Chromatin"; x-axis: TARH (no units); legend: normal cells, abnormal cells]

FIGURE 29: DISTRIBUTION OF THE DISCRETE TEXTURE FEATURE TARH FOR NORMAL AND ABNORMAL CELLS

In this graph TARH does not appear to discriminate between normal and abnormal cells. Also of note is the non-normal distribution of this feature. This is the case for most of the discrete texture features.

[Figure 30: histograms, panel title "Fractal Dimension"; x-axis: FD (dimension number); legend: normal cells, abnormal cells]

FIGURE 30: DISTRIBUTION OF THE FRACTAL DIMENSION FEATURE FOR NORMAL AND ABNORMAL CELLS

The fractal dimension feature does not appear to be able to discriminate between normal and abnormal cervical cells. The distribution of this feature for both normal and abnormal cells seems to be the same and close to that of a normal distribution. When used in discriminant function analysis it was found to be moderately discriminating. It is also worth noting that all the values fall between 2.0 and 3.0 as required for a 2 dimensional Euclidean surface.

to generate and test the DF was the commercial stepwise discriminant analysis package, 7M, which is part of the BioMedical Data Processing (BMDP) package available as part of the University of British Columbia MTSG computing network. The use and application of this package is described in the BMDP Statistical Software manual.⁷⁴ This package allows the user to force the program to use all of the features in the DF, or let the program select the most discriminating features to be used in the DF, or let the user select a subset of the features to be used to form the DF. Table 8 has examples of all three methods.

The All Selected columns present the classification accuracy of the DF generated for the various segmentation procedures using all of the available features. The BMDP Selected columns present the classification accuracy of the DF composed of features selected by BMDP. In this case, BMDP was programmed such that during forward stepping all the features were entered into the DF. The backward stepping was programmed so that only features which significantly contributed to the DF performance were allowed to remain in the DF. The F statistic was used to determine if a feature contributed significantly to the DF performance. The significance level was set to 0.5% (i.e. α = 0.005).

The Author Selected columns present the classification accuracy of the DF composed of features which, upon visual examination of one-, two- and three-dimensional feature graphs, appeared to separate the two classes. The features used were: CA, NA, NMaxR, NComp, REntropy, IODR, ODMax, ODVar, ODSkew, ODKurt, NElong, NBdycrc, DMax, DMin, Contrast, Homogeneity, Cluster Shade, DCM, TARH, Ratio, AreaH, DNum.

The Regular Classification heading in Tables 8 and 9 indicates the classification accuracy of the DF on the learning set of cells and data.

TABLE 8
Classification Results for Various Segmentation Procedures and Combinations of Features

Features used in Classification

Segmentation Method: 1, 2, 3, 4, 6, 3-7, 3-8
Cell Type: normal, abnormal (for each method)
Number of cells 301 505 310 491 307 502 306 99.0 97.6 98.0 492 294 515 279 474 97.6 98.6 96.7 98.2 97.0 Classification Selected Jackknife Class. 99.0%(3) 98.2 (9) 98.7 (4) 96.9 (15) 98.7 (4) 96.6 (17) 505 300 ( ) denotes number of cells Class. - All Regular Class.
(3) (12) (6) (12) (4) (17) (5)  (U)  misclassified.  98 .3% 96 .4 98,.4 95, .3 9 8 , .7 95, .8 98,.7 95, .8  (5) (18)  97,.3 95, .5 98..3 95. .3 97,.8 95, .6  (8) (22)  (5) (23) (4) (21) (4) (21)  (5) (24) (6) (21)  BMDP S e l e c t e d Regular Jackknife Class. Class.  Autbor Regular Class.  9 9 , ,0X 97,.4 99,.0 96, .5 9 8 . .7 96, .0 99,.0 96, .6 98,.3  9 8 , ,7% 9 4 , .1 9 8 , .1 93. .4 9 8 . .8 92. .0  97,.0 99. .3 96. .5 98..6 97..7  (3) (13) (3) (17) (4) (20) (3) (17) (4) (15) (2) (18) (4) (11)  9 8 . .72 : ( 4 ) 9 7 . .0 ( 1 5 ) 99,.0 (3) 96. .3 (18) 9 8 . ,7 ( 4 ) 95, .2 ( 2 4 ) 99..0 (3) 96. .4 (18) 98..0 (6) 95. .9 99. .3 9 5 . .7 98..2 97,.3  (2) (22)  98..0 9 2 . .5 97..0 92, ,7 98,.0 92, ,0  (5) (13)  9 7 , ,5 9 2 , .8  (20)  (4) (30) (6) (32) (7) (40) (6) (38) (8) (36) (6) (41) (7) (34)  Selected Jackknife Class. 98.,3%(5) 9 3 , ,9 ( 3 1 ) 9 8 , ,1 ( 6 ) 9 2 . .5 ( 3 7 ) 9 7 . .4 ( 8 ) 9 1 . .8 ( 4 1 ) 98, ,0 ( 6 ) 9 2 . ,3 97.,3 9 2 , ,7 98. 6 9 1 . ,7 9 7 , ,1 92, .4  (39) (9) (36) (7) (42) (8) (36)  TABLE 8 (Continued) C l a s s i f i c a t i o n Results  f o r Various  Segmentation Procedures and Combinations o f F e a t u r e s  Features  Segmentation C e l l Method Type 1-9 2-9 3-9  4 - 9 6-9  3-7-  9  3-8- 9  1-9- 9  norm abnorm norm abnorm norm abnorm norm abnorm norm abnorm norm abnorm norm abnorm norm abnorm  A l l Selected Normal Jackknife Class. Class.  BMDP S e l e c t e d Normal Jackknife Class. Class.  Author Normal Class.  315  99..0% (3)  97, .8% (7)  9 8 , .45! : ( 5 )  98..4?:(5)  9 9 .43! : ( 2 )  519  97,.3  (14)  95,.8  (22)  96,.0  (21)  95,.4  (24)  9 2 , .7  (38)  91,.9  (42)  315  9 8 . .7  (4)  97,.8  (7)  99,.0  (3)  99,.0  (3)  99 .4  (2)  99,.4  (2)  518  9 6 , .7  (17)  95,.2  (25)  95,.9  (21)  95,.6  (23)  91,.5  (44)  9 1 , .1  (46)  314  9 8 , .1  (6)  97,  (7)  99..0  (3)  9 8 , ,7  (4)  98,.4  (5)  9 8 , .1  (6)  519  9 7 , ,1  (15)  95..6  (23)  96,.3  (19)  95,,4  (24)  91,,9  (42)  91,.3  (45)  314  98. 
,1  (6)  97. .8  (7)  99..0  (3)  9 8 . .7  (4)  98,,4  (5)  9 8 , .1  (6)  522  9 7 , .7  (12)  9 6 . .7  (17)  96,.4  (19)  96,.2  (20)  92, .3  (40)  92, .3  (40)  310  9 8 , .7  (4)  98..4  (5)  99..0  (3)  99..0  (3)  9 8 , .7  (4)  98,.4  (5)  518  9 6 , .7  (17)  95,.8  (22)  95,.8  (21)  95,.9  (22)  90..9  (47)  9 0 . .7  (48)  297  9 8 , .7  (4)  97,.3  (8)  99 .3  (2)  99..0  (3)  99..0  (3)  9 8 , .7  (4)  521  96..9  (16)  96,.0  (21)  96,.2  (20)  95,.4  (24)  91..6  (44)  91..2  (46)  294  98,.3  (5)  97,.6  (7)  99 .0  (3)  98,.6  (4)  98 .6  (4)  98,.6  (4)  520  96..5  (18)  95 .2  (25)  96 .3  (19)  95,.8  (22)  91..2  (46)  90..8  (48)  315  9 8 , .7  (4)  97,.8  (7)  9 8 .7  (4)  97,.8  (7)  99..0  (3)  9 8 . .7  (4)  520  97,.5  (13)  96 .4  (16)  97 .1  (13)  9 6 , .7  (17)  92..9  (37)  9 2 . .1  (41)  Number of Cells  ( ) i n d i c a t e s number o f c e l l s Class. -  used i n C l a s s i f i c a t i o n  Classification  misclassified.  8  Selected Jackknife Class. 99,.42:(2)  TABLE 9 Classification  Results f o r Various Segmentation Procedures and Combinations of Features F e a t u r e s used i n C l a s s i f i c a t i on  Segmentation Method 1 2 3 4 6 3-7 3-8  Cell Type  normal abnorm norm abnorm norm abnorm norm abnorm norm abnorm norm abnorm norm abnorm  Number of c e l l s  Simple Nuclear Regular Jackknife Class. Class.  Simple Nuclear & Cytolplasm BMDP Min. S e l . Regular Jackknife Regular Jackknife Class. Class. Class. Class.  
                              Simple Nuclear (1)          Simple Nuclear & Cytoplasm (2)   BMDP Min. Sel. (3)
Segm.    Cell       Number    Regular      Jackknife      Regular      Jackknife           Regular      Jackknife
Method   Type       of cells  Class.       Class.         Class.       Class.              Class.       Class.
1        normal     301       83.7% (49)   83.4% (50)     97.7% (7)    97.3% (8)           98.3% (5)    98.0% (6)
         abnormal   505       86.1 (70)    86.1 (70)      92.3 (39)    92.1 (40)           97.4 (13)    96.8 (16)
2        normal     310       94.2 (18)    93.5 (20)      97.4 (8)     97.4 (8)            98.1 (6)     97.7 (7)
         abnormal   491       80.4 (96)    80.2 (97)      91.2 (43)    91.0 (44)           96.7 (16)    95.9 (20)
3        normal     307       84.4 (48)    84.0 (49)      98.0 (6)     97.7 (7)            98.0 (6)     98.0 (6)
         abnormal   502       81.9 (91)    81.7 (92)      90.2 (49)    90.0 (50)           96.2 (19)    96.0 (20)
4        normal     306       81.7 (56)    81.4 (57)      98.0 (6)     98.0 (6)            98.4 (5)     98.4 (5)
         abnormal   505       83.4 (84)    83.2 (85)      89.9 (51)    89.7 (52)           96.8 (16)    96.2 (19)
6        normal     300       88.7 (34)    88.7 (34)      97.7 (7)     97.7 (7)            97.7 (7)     97.3 (8)
         abnormal   492       82.9 (84)    82.3 (87)      90.7 (46)    90.4 (47)           96.3 (18)    95.9 (20)
3-7      normal     294       76.5 (69)    76.2 (70)      96.9 (9)     96.9 (9)            98.6 (4)     98.6 (4)
         abnormal   515       81.9 (93)    81.7 (94)      89.1 (56)    88.9 (57)           95.5 (23)    95.3 (24)
3-8      normal     279       89.6 (29)    89.2 (30)      95.7 (12)    95.3 (13)           97.1 (8)     96.4 (10)
         abnormal   474       83.1 (80)    82.9 (81)      91.4 (41)    91.4 (41)           95.8 (20)    95.4 (22)

( ) denotes number of cells misclassified.
(1) Features used were: NA, NComp, IODR, ODVar and ODMean.
(2) Features used were: CA, Ratio, NA, NComp, IODR, ODVar and ODMean.
(3) Minimum number of features selected by BMDP for any of the segmentation methods in Table 6. These features were: CA, Ratio, NA, NMeanR, RMeanI, GMeanI, BMeanI, IODR, ODMax, ODVar, Correlation, Homogeneity, FA2, FD, DNum and HAER.
Class. = Classification

TABLE 9 (Continued)
Classification Results for Various Segmentation Procedures and Combinations of Features

Features used in Classification

Segmentation Method: 1-9, 2-9, 3-9, 4-9, 6-9, 3-7-9, 3-8-9, 1-9-9
Cell Type: norm, abnorm (for each method)
Number of Cells 315 519 315 518 314 519 314 522 310 518 297 521 294 520 315 520 Simple Nuclear Normal Jackknife Class. Class. 72.1% 86.9 77.4 85.1 72.
6 86.,1 70..7 86..6 70,.0 84,.7 69 .0 86 .0 70 .4 82 .9 72 .1 87 .9  (86) (68) (87) (77) (86) (72) (92) (70) (93) (79) (92) (73) (87) (89) (88) (63)  72.7% 86. 9 71. 7 84. 7 72. 6 86. 1 69.,7 86..2 69,,7 84,.2 68,.7 85,.6 70 .1 82 .5 71 .4 87 .7  (86) (68) (89) (79) (86) (72) (95) (72) (94) (82) (93) (75) (88) (91) (90) (64)  Simple N u c l e a r & Cytoplasm Normal Jackknife Class. Class.  97. 1% (9) 91..7 (43) 97. 5 (8) 90. 3 (50) 97. 5 (8) 91.,1 (46) 96..5 ( I D 90.,4 (50) 97,,4 (8) 90,.0 (52) 96,.9 (10) 90,.6 (49) 98 .0 (6) 89 .8 (53) 96 .8 (10) 90 .4 (50)  96. 8% 91. 1 97..1 90. 2 97. 5 90. 9 96..5 90,,0 96, 8 90..0 96,.6 90,.4 98 .0 89 .4 96 .5 90 .4  (10) (46) (9) (51) (8) (47) (11) (52) (10) (52) (10) (50) (6) (55) (11) (50)  2  BMDP Minimum S e l e c t e d Normal Jackknife Class. Class.  98.4% 96. 0 98.,4 95.,2 99.,0 95.,4 98,,1 96,.0 98,.7 95,.9 98 .7 95 .4 98 .3 95 .2 98 .7 96 .0  (5) (21) (5) (25) (3) (24) (6) (21) (4) (21) (4) (24) (5) (25) (4) (21)  98.,4% 95.,4 98..4 94. 8 98..7 94.,6 98,,1 95,.6 98..3 95,.8 98,.3 94 .8 98 .3 95 .2 98 .7 95 .6  £ ) denotes number o f c e l l s m i s c l a s s i f i e d . Features u s e d were: NA, NComp, IODR, ODVar and ODMean. F e a t u r e s used were: CA, R a t i o , NA, NComp, IODR, ODVar and ODMean. Minimum number o f f e a t u r e s s e l e c t e d by BMDP f o r any o f the segmentation methods i n Table 6. These f e a t u r e s were: CA, R a t i o , NA, NMeanR, RMeanI, GMeanI, BMEanI, IODR, ODMax, ODVar, C o r r e l a t i o n , Homogeneity, FA2, FD, DNum and HAER. Class. = C l a s s i f i c a t i o n 3  (5) (24) (5) (27) (4) (28) (6) (23) (5) (22) (5) (27) (5) (25) (4) (23)  The  jackknife classification  heading  i n Tables  8 and  c l a s s i f i c a t i o n a c c u r a c y o f the average DF generated subdivided  into  Lachenbrach et  l e a r n i n g and al.  
The Jackknife Classification heading in Tables 8 and 9 indicates the classification accuracy of the average DF generated when the data are subdivided into learning and test sets in the fashion suggested by Lachenbruch et al.⁷⁹

The jackknife procedure, or Lachenbruch "holdout" procedure, can be described as follows⁸⁰:

1) Start with a group of $N_1$ observations. Leave out one observation from this group and calculate a classification function based on the remaining $N_1 - 1$ and $N_2$ observations (two group case).
2) Classify the observation left out in step 1.
3) Repeat steps 1 and 2 until all of the $N_1$ observations have been classified while being left out. Let $n_{1M}$ be the number of left out observations misclassified in group $N_1$.
4) Repeat steps 1 through 3 for the $N_2$ group. Let $n_{2M}$ be the number of left out observations misclassified in the $N_2$ group.

Let $n_1$ be the number of observations in group $N_1$ and $n_2$ be the number of observations in group $N_2$. The jackknife misclassification rate is just $n_{1M}/n_1$ for group $N_1$ and $n_{2M}/n_2$ for group $N_2$.

The jackknife classification accuracy is a much truer representation of how the DF would perform on other test sets. While there is no theory which can estimate the error of the classification accuracy of the DF on other data, empirical studies⁷⁹ indicate that the difference between the claimed jackknife classification accuracy and the actual performance on a new data set should be within ± 20% of the jackknife classification error. For example, if the jackknife classification accuracy is 95%, and thus the classification error is 5%, then the actual performance of the DF on a new data set should range between 94% and 96%.
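The holdout steps above can be sketched as a short leave-one-out loop. This is a minimal illustration under stated assumptions, not the BMDP implementation: a nearest-group-mean classifier stands in for the stepwise discriminant function, and every function name and data value here is hypothetical.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature tuples."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_nearest_mean(group1, group2):
    """Stand-in classifier (an assumption): assign to the closer group mean."""
    m1, m2 = mean_vector(group1), mean_vector(group2)

    def classify(x):
        return 1 if squared_distance(x, m1) <= squared_distance(x, m2) else 2

    return classify


def jackknife_rates(group1, group2, fit=fit_nearest_mean):
    """Return (n1M/n1, n2M/n2): holdout misclassification rate per group."""

    def count_misclassified(own, other, own_label):
        count = 0
        for i, held_out in enumerate(own):
            remaining = own[:i] + own[i + 1:]  # step 1: leave one out
            if own_label == 1:
                classify = fit(remaining, other)
            else:
                classify = fit(other, remaining)
            if classify(held_out) != own_label:  # step 2: classify it
                count += 1
        return count  # step 3: all observations of this group visited

    n1m = count_misclassified(group1, group2, 1)
    n2m = count_misclassified(group2, group1, 2)  # step 4: other group
    return n1m / len(group1), n2m / len(group2)


def expected_accuracy_range(jackknife_accuracy, relative_band=0.20):
    """Band of +/- 20% of the jackknife error around the jackknife accuracy."""
    error = 1.0 - jackknife_accuracy
    return (jackknife_accuracy - relative_band * error,
            jackknife_accuracy + relative_band * error)


# Made-up one-feature data: group 1 contains one outlier (10.0).
normal = [(0.0,), (1.0,), (2.0,), (10.0,)]
abnormal = [(8.0,), (9.0,), (11.0,)]
rates = jackknife_rates(normal, abnormal)
low, high = expected_accuracy_range(0.95)
```

Since the classifier is refitted for every held-out observation, the cost grows with the number of cells; passing a different `fit` function is the hook for substituting a real discriminant function.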
The actual feature and cell classification data used to generate and test the DFs for Tables 8 and 9 were the data from the 800 image subset of the 1000 image database used to examine the segmentation performance of the various algorithms, so that the same cells were used to generate all DFs.

In Table 9 all the features used were user selected. The first set of features, the Simple Nuclear features, were chosen to match those most commonly used in the literature, excluding cytoplasmic features. The next set, Simple Nuclear and Cytoplasmic, includes these features and the cytoplasmic ones.

In Table 8 BMDP selected the features used in one of the sets of DFs. The number of features selected by BMDP for each segmentation method differed. The average number of features used was 23, ranging from a low of 16 for procedure 1 + 9 + P to a high of 29 for procedure 3 + 8 + P. To look for significant differences in the classification accuracy between the various segmentation methods, the minimum set of 16 features selected by BMDP was used to form the DFs for all the segmentation procedures. The results are presented in the last two columns of Table 9.

As stated before, there is no easy way to determine which combination of features is most discriminating, short of testing all possible combinations of features. Since the features used to form the DF can be ranked according to their calculated F to remove statistic as generated by the BMDP 7M program, all the BMDP selected features for each segmentation procedure examined in Table 8 were ranked. The average rank of each feature across all 15 segmentation procedures was calculated and used to rank the BMDP selected features. These results are presented in Table 10. While the ranking of the features and the average ranks calculated for each feature should not be taken as an absolute ranking of a feature's discriminating power, they do roughly indicate the relative discriminating power of each feature. For example, while the feature Ratio is ranked 5th, this does not mean that Ratio is always more discriminating than the feature GMeanI; however, Ratio is very likely more discriminating than the feature FD.

TABLE 10
Feature Importance in Discriminant Function Analysis

Feature               Mean Rank ± S.D.
RMeanI                 2 ± 1
Correlation            3 ± 2
ODVar                  4 ± 4
NMeanR                 5 ± 3
Ratio (NA/CA)          5 ± 2
GMeanI                 6 ± 2
ODMax                  8 ± 5
IODR                  10 ± 4
BMeanI                10 ± 3
NA                    15 ± 6
DNum                  16 ± 5
HAER                  16 ± 4
FD                    16 ± 6
Cluster Shade         16 ± 6
ODSkew                18 ± 9
Homogeneity           18 ± 7
Contrast              18 ± 6
CRMH                  19 ± 5
TARM                  19 ± 5
ODKurt                19 ± 5
CRL                   19 ± 5
BEntropy              20 ± 5
TARH                  20 ± 5
CRM                   21 ± 4
SCMCN                 22 ± 4
CA                    21 ± 4
Energy                21 ± 5
TERL                  21 ± 4
FA2                   22 ± 4
Cluster Prominence    22 ± 4
NL                    22 ± 3
DMin                  22 ± 1
NM                    22 ± 1
NComp                 23 ± 2
NInert                23 ± 2
DMax                  23 ± 2
NMaxR                 23 ± 2
NH                    23 ± 1
DCM                   23 ± 1
NMS                   23 ± 1
NBdycrc               23 ± 1
NElong                23 ± 1
NRVar                 23 ± 1
NBdyfin               23 ± 1

4. DISCUSSION

4.1. Segmentation Accuracy

It was found that the best procedure for segmenting images of stained cervical cells is algorithm 1 to segment the cytoplasm and obtain an initial nuclear segmentation, followed by 2 iterations of algorithm 9 and then postprocessing.

In Table 2 six different cytoplasmic segmentation procedures were examined. The best cytoplasmic segmentation was achieved by primary segmentation algorithm number 1, which correctly segmented 97.4% of the images, followed closely by primary segmentation algorithm 2, which correctly segmented 96.0% of the images. Both of these methods depend upon the global thresholding of the blue image to segment the cytoplasm.
Algorithm 1 uses an unmodified histogram while algorithm 2 uses modified histograms. The difference between the performance of these two algorithms is very slight. They both work so well because they assume that the background intensity distribution is very narrow and spatially uniform. This is almost always the case once the images have been decalibrated. Thus, it is usually very easy for the algorithms to find the correct threshold.

Primary segmentation algorithm 5 segments the cytoplasm correctly for 93.3% of the cells. Its performance is slightly lower than that of the other two because in some of the red images the cytoplasm is very faint and was occasionally incorrectly segmented as background. Algorithm 5 assumes that there will be more than a 1 or 2 gray level contrast between the background and the cytoplasm in all three color images, and this is not always true of the red images.

Primary segmentation algorithms 3 & 4 gave very similar results (90% and 89.3% respectively). These results are slightly worse than those of the other algorithms because these algorithms assume that significant edges exist at the boundaries of the cytoplasm, which is not always the case. Both of these algorithms use spatially localized information to determine a threshold which varies across the image. These algorithms work as well as or better than algorithms 1, 2 and 5 when the background intensity varies from area to area in the image. However, in the image database tested, the image background varies very little and these algorithms (3 and 4) do not perform as well.

Primary segmentation algorithm
6 produced the worst results (74.7%) on cytoplasmic segmentation. This algorithm assumes there is significant contrast between the background and the cytoplasm and that the cytoplasm is relatively uniform in intensity. In a fair number of the images, one or both of these conditions are not true and hence the algorithm does not correctly segment the cytoplasm of many of the cervical cell images.

The best primary segmentation of the nucleus was performed by algorithm 3, which correctly segmented the nuclei in 61 of the 150 images. In general all of the secondary segmentation algorithms increased the number of correctly segmented nuclei. Of the various combinations of primary segmentation algorithms with a single secondary segmentation algorithm, algorithm 9 on average provided the greatest improvement in the nuclear segmentation, yielding approximately 75 more nuclei per primary segmentation algorithm.

While the postprocessing routine does not actually change the segmentation of the cell, it does remove some of the artifacts which have been mistakenly segmented as nuclear objects. Thus, the routine increases the number of images for which the nuclear material is correctly identified. The postprocessing routine increases the accuracy of the nuclear segmentation almost as much, on average, as does algorithm 9. For some of the primary segmentation algorithms (2, 3, 4) the postprocessing routine increases the nuclear segmentation accuracy more than does algorithm 9.

Not surprisingly, the best results were obtained when both algorithm 9 and postprocessing were employed. The best nuclear segmentation results were obtained when 2 iterations of algorithm 9 were employed. The worst nuclear segmentation was obtained by algorithm 4 (15.3%).
For primary segmentation algorithms, the nuclear segmentation accuracy generally reflects their ability to ignore artifacts in the red image, i.e. dark areas due to overlapping or folded cytoplasm, staining artifacts, or dirt. This can be seen in the results of the primary segmentation methods plus post processing. The differences in the nuclear segmentation accuracy between algorithms 1, 2, 3 and 4 plus post processing were only about 6%, whereas for the primary segmentation algorithms alone the difference in the nuclear segmentation accuracy ranged up to 25% for algorithms 3 and 4. Since the post processing only removes non-nuclear objects from the segmentation, these results indicate that algorithm 4 incorrectly segments many more artifacts as nuclei than does algorithm 3.

The primary segmentation algorithm 6 does a poor job of delineating the nuclei (19.3%). It does not segment many artifacts as nuclei, as demonstrated by the only moderate increase in the correct nuclear segmentation rate (33.3%) when post processing is also applied.

Using the secondary segmentation algorithm 7 improves the segmentation accuracy for all the applicable primary algorithms. This algorithm, however, generates a threshold to segment the images, and this again results in a large number of artifacts being segmented as nuclei, as demonstrated by the increase of 40% in segmentation accuracy for procedure 3 and 7 (46.7%) versus procedure 3 and 7 plus post processing (86.7%). The results are similar for the secondary segmentation algorithm 8.

The secondary segmentation algorithm 9 generates the largest increase in nuclear segmentation accuracy of all the secondary segmentation algorithms. This algorithm is not as susceptible to artifacts as the other algorithms, as indicated by only a slight increase of about 12% in the nuclear segmentation accuracy when post processing is performed.

Comparing Table 3 results with Table 2 results, one finds a 6.0% increase in the correct nuclei segmentation rate, reflecting the fact that the large database was not preselected to contain images which are very difficult to segment. In Table 3 it is the 1 + 9 + 9 + Post Processing procedure which gives the highest combined (nucleus and cytoplasm) correct segmentation rate. For the nuclei in the 1000 image database the correct segmentation rate varied from 52.5% to 98.0% across the various segmentation procedures presented in Table 3. There is not much difference in nuclear segmentation accuracy between the five best procedures (ranging from 97.6% to 98.0%). All of these five procedures use one or more iterations of algorithm 9. Only the 3 + 8
of  31  images,  no  nucleus was  recognized the presence removed i t .  These  found because  of an incorrectly  the  gave  incorrectly  In the f i r s t  group  postprocessing step  segmented nucleus and  thus  images can be readily recognized by the system as  i n c o r r e c t l y segmented c e l l s .  The second group of 28 images contained  nuclei which were "mildly incorrect segmentation" to indicate the cases i n which the segmentation errors should not cause the c e l l i n the image to be m i s c l a s s i f i e d by an automated c l a s s i f i c a t i o n procedure as features such  as  IOD  would  be  only  slightly  affected.  Many  features,  p a r t i c u l a r l y most of the texture features would not be affected at a l l by  these s l i g h t  errors  i n segmentation.  The  last  group  of 4 images  contained n u c l e i so poorly segmented that they could be m i s c l a s s i f i e d by an automated c l a s s i f i c a t i o n procedure. The r e s u l t s thus indicate that i f this procedure i s employed i n a fully  automated c e r v i c a l  cell  screening, only 4 out of approximately  4000  nuclei  segmentation  (0.1%)  analyzed  could  be  misclassifled  due  to  the  algorithm used.  The accuracy of the better segmentation  procedures  tested i n this  work compares favorably with the results stated by other authors.  Borst  8 1  et al.  reports a nuclear segmentation accuracy of 87% when using t h e i r  algorithm on  322  monochromatic  images of Pap-stained  cervical  For a set of 148 two color Pap-stained c e r v i c a l c e l l s N o r d i h correct segmentation segmentation  of  the  incorrectly  reports  of the nuclei i n 82% of the images and of correct (  of the cytoplasm i n 68% of the images.  On Pap-stained c e l l s , Nordin 82%  11  cells.  nuclei  in a  segmented,  148  the  11  i 1  reports the correct segmentation  image  system  database  failed  i n c o r r e c t l y segmented 3 of the n u c l e i .  
and  to  of  the  recognize  26  that  of  nuclei i t had  Thus, 2% of the n u c l e i segmented  by this system could be i n c o r r e c t l y segmented and not recognized as such by  the system and possibly be  incorrectly c l a s s i f i e d .  results f o r Pap-stained c e r v i c a l c e l l s .  The  These are  same segmentation  method  (the nuclear r a d i a l contouring algorithm) c o r r e c t l y segmented 91.5% the  nuclei  algorithm's  in  the  1000  image  improvement being  due  database to  used  in  this  work.  the better images on  the  of The  which i t  worked, i . e . stained with a quantitative nuclear s t a i n and large colour separation between the nuclear and cytoplasmic stains.  However, this i s  s t i l l much less than the performance of the best nuclear  segmentation  procedure used i n this work.  Also, only 0.1% of the nuclei segmented by  the most accurate  i n this  procedure  thesis  could be  expected  to  be  m i s c l a s s i f l e d due to incorrect segmentation while 2% of the c e l l s could be m i s c l a s s i f i e d due to the segmentation  i n the work reported by Nordin.  Ill  4.2.  Feature Variation  It was found that features are affected by the segmentation method used. the of  The c o e f f i c i e n t of v a r i a t i o n (CV) of each nuclear feature, due to  segmentation procedures was calculated to determine the s e n s i t i v i t y the features to exactness of the delineation of the nucleus.  results are shown i n Tables 5, 6 and 7. results  f o r those  images  which  were  The  The c o e f f i c i e n t of v a r i a t i o n  correctly  segmented  by  a l l 15  procedures indicate how much a feature could vary given that the nucleus had been c o r r e c t l y segmented where "correct segmentation" i s determined by a human observer.  
Another way of interpreting these results is how much the features would vary if different individuals manually segmented the images, or if the same individual were to manually segment the images a number of times.

The coefficients of variation ranged from 0.4% for the ODMax feature to a high of 258.9% for the Cluster Prominence feature. The average CV value was 21.9%.

As one would expect, those features which measure the shape of the nuclear boundary (NRVar, NBdycrc, and NBdyfin) vary more than the features which measure bulk properties of the nucleus (NA, RMeanI, IODR, etc.). In Table 5, some features which one would expect to be sensitive to variations in the nuclear segmentation, such as the shape features NComp, NIert, and NMaxR, appear not to be so. Some features in Table 5 which are surprisingly sensitive to the segmentation method are FA1, FA2 and DCM. On closer examination, the large variation of DCM is due to the fact that the mean DCM value for most nuclei is close to zero; as the denominator in the coefficient of variation calculation, it causes DCM to appear artificially sensitive to segmentation. The large CVs of FA1 and FA2 are due to the large changes in optical density at the edges of the nucleus. A small change (one or two pixels) in the position of the boundary of the nuclear mask can result in the inclusion of several pixels whose optical density values differ significantly from their neighbors. This difference is expressed as large height differences between neighboring pixels in the three-dimensional surface representation (see definition of FA1 and FA2 in section 4), resulting in sizable changes in the area of FA1 and FA2. Since one cannot include new pixels into FA1 without including the corresponding area into FA2, the changes in FA1 and FA2 due to segmentation method have a tendency to cancel. This is seen in the low variance of the FD feature, which is calculated from FA1 and FA2.

It is the low density chromatin features, TARL, TERL, CL, NL, NLS and SCLCN (Table 6), which have the largest CVs. Since most of the low density chromatin is usually located near the edges of the nucleus, the amount and intensity distribution of the low density chromatin pixels varies greatly with small changes in the location of the nuclear boundaries. MAER, HAER and MHAER have large variances because they are features which are normalized by the mean optical density of the low density chromatin.

The features SCMCN and SCHCM have large coefficients of variation for the same reason that the feature DCM does, i.e. their mean values are close to zero.

In Table 7 one can see that the important feature IODR is not very dependent upon the segmentation method, while IODG and IODB are. This is probably due to two factors: 1) as one gets further from the red part of the spectrum, the nuclear stain absorbs less light, thus the overall absorption of the nucleus becomes smaller; and 2) as one gets closer to the blue part of the spectrum, the cytoplasm absorbs more light and the correction for cytoplasmic absorption becomes larger. For IODG most of the variance is due to the difficulty in making the correction for the light absorption of the cytoplasm. For the IODB feature this is further complicated by the small IODB values of the nucleus (it can actually become negative after correcting for the absorption of the cytoplasm).

The ODVar and ODSkew features are sensitive to the determination of the boundary of the nucleus because they both describe the distribution of the OD values in the nucleus. Slight variations in the nuclear boundary can add or remove pixels with OD values which are in the tail regions of the OD distribution. Both the variance measure and the skewness measure are sensitive to small changes in the tail regions of the distributions which they describe.

Some Markovian texture features have large variation coefficients and others do not. There is no simple explanation why some features are more sensitive to segmentation differences than others. The Cluster Prominence feature has such a large variation coefficient because its mean value is frequently very close to zero.

The DenMax, ExtrRange and AveRange features all have large variation coefficients. These features measure the number and values of the intensity maxima within the nucleus. Frequently, most of the maxima (bright spots) are located along the edges of the nucleus. Thus, small changes in the segmentation of the nucleus can change the number of maxima in the nucleus and the intensity distribution of the maxima believed to be present in the nucleus.

Generally the features of the abnormal cells had larger coefficients of variation than did the features of normal cells, indicating that the features of abnormal cells are more sensitive to segmentation differences.

4.3.  Cell Classification and Discriminating Power of Features

The results in Table 8 and Table 9 indicate that the correct classification of cervical cells is not dependent on the segmentation procedure. However, the classification of cervical cells is strongly dependent on the features used. For comparison purposes it is not the normal/abnormal classification accuracy which should be compared, but the total classification accuracy (which combines the normal and abnormal classification results). Table 11 presents the total jackknife classification results for the various segmentation procedures and feature combinations shown in Table 8. Table 11 does not represent new data since it is calculated from the results in Table 8. In Table 11 the jackknife classification results when all the features were used to classify the cells ranged from 97.2% correct cell classification to 96.1% correct cell classification. The jackknife classification results estimate how well the discriminant function would perform on a different set of data. The accuracy of this estimation is generally accepted to be approximately 20% of the misclassification rate. The "error" on the 97.2% result is thus 0.6% and on the 96.1% result is 0.8%.
The results 97.2 ± 0.6% and 96.1 ± 0.8% are not statistically different. The pattern is similar for the BMDP selected features (column 2), where the jackknife classification accuracy ranges from 97.6 ± 0.5% to 96.5 ± 0.7%, and for the author selected features (column 3), where the jackknife classification accuracy ranged from 95.5 ± 0.9% to 93.6 ± 1.3%.

TABLE 11

Combined Normal and Abnormal Jackknife Classification Results for Various Segmentation Procedures and Combinations of Features

                        Features used in Classification
Segmentation Method     All Selected    BMDP Selected    Author Selected
1                       97.1%           97.6%            95.5%
2                       96.5            97.4             94.6
3                       96.9            96.5             93.6
4                       96.9            97.4             94.5
6                       96.2            96.7             94.3
3-7                     96.4            97.0             93.9
3-8                     96.4            97.6             94.2
1-9                     96.5            96.5             94.7
2-9                     96.2            96.9             94.2
3-9                     96.4            96.6             93.9
4-9                     97.1            97.1             94.5
6-9                     96.7            97.0             93.6
3-7-9                   96.5            96.7             93.9
3-8-9                   96.1            96.8             93.6
1-9-9                   97.2            97.1             94.6

A small independent database of 160 images was used to verify the classification accuracy of several (12) of the discriminant functions on a database which was not part of the learning set used to generate the discriminant functions. The classification accuracy of the discriminant functions on the 160 image database was that predicted by the jackknife classification results in Tables 8 and 9, within the experimental error associated with the jackknife classification results.

While none of the segmentation procedures resulted in classification results which were statistically different from the rest of the other segmentation procedures, some segmentation procedures had consistently higher or lower classification results, but were still statistically identical because of the errors associated with the results. Segmentation methods 1 or 1-9-9 had consistently the best classification results, while methods 6 and 6-9 had the worst classification results. The difference in classification accuracy between using all the features to form the discriminant function and letting BMDP select the features used in the discriminant function is smaller than the error attached to the classification results. The classification results when BMDP selects the features versus when the author selected the features show a significant difference (difference larger than the associated error) between the two. Allowing BMDP to statistically select the features to be used in the discriminant function provided the best results.

The results in Table 9 display similar characteristics as those of Table 8. When the same features are used in the discriminant function there was no experimental difference between the classification accuracy of the various segmentation procedures. There is a definite difference in the classification results in Table 9 depending on which features are used in the discriminant function. Similar to Table 11, Table 12 represents the total jackknife classification results for the segmentation procedures and features shown in Table 9. For the simple nuclear features the classification accuracy ranged from 85.4 ± 2.9% for method 2 to 78.0 ± 4.4% for method 3-8-9. While the difference between these classification results is experimentally significant, i.e. larger than the experimental error, it is only 0.1% larger, which makes attaching any significance to this result difficult. For the simple nuclear and cytoplasm features the classification accuracy ranged from 94.0 ± 1.2% for method 1 to 91.8 ± 1.6% for method 3-7.

Several other sets of features were used to form the discriminant functions while searching for evidence that would indicate that the segmentation method can influence the classification results. A set of features made up of those features with the largest variation coefficient in Tables 5, 6 and 7 was tried. No significant difference was found. Letting BMDP select only from the nuclear features again did not result in a significant difference. However, it was found that the BMDP selected nuclear features alone were able to classify the cervical cells with an overall accuracy of 97.0 ± 0.6%.

As previously noted in Tables 8 and 11, BMDP selected the features for one set of discriminant functions. The number of features BMDP selected depended upon the segmentation method used. The number of features used ranged from 16 to 29 and averaged around 23. Using the minimum of 16 features selected by BMDP as a feature test set, the last two columns in Table 9 were generated.

In Table 12, segmentation method 3-8 was found to give a classification accuracy of 95.8 ± 0.8%, which was significantly lower than the classification accuracy of 97.3 ± 0.5% of segmentation method 1. While the difference between these classification results is experimentally significant (larger than the experimental error), it represents a very small difference when compared with the difference in classification accuracy due to the number and type of the features used in the discriminant function analysis.
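The error figures quoted in this section follow from the rule of thumb stated earlier: the uncertainty of a jackknife accuracy is taken as roughly 20% of the misclassification rate. The short Python sketch below reproduces that arithmetic for three of the comparisons made in the text. The function names are illustrative only, and the criterion "difference larger than the sum of the two errors" is an assumption inferred from the 85.4 ± 2.9% versus 78.0 ± 4.4% comparison, where the difference is described as only 0.1% larger than the experimental error.

```python
def jackknife_error(accuracy_pct, fraction=0.20):
    """Error estimate: a fixed fraction (about 20%) of the misclassification rate."""
    return fraction * (100.0 - accuracy_pct)


def differ(acc_a, acc_b):
    """Assumed criterion: the difference must exceed the sum of both errors."""
    combined = jackknife_error(acc_a) + jackknife_error(acc_b)
    return abs(acc_a - acc_b) > combined


# Pairs of total jackknife accuracies (percent) compared in the text.
for a, b in [(97.2, 96.1), (97.3, 95.8), (85.4, 78.0)]:
    print(f"{a} +/- {jackknife_error(a):.1f}  vs  {b} +/- {jackknife_error(b):.1f}"
          f"  ->  different: {differ(a, b)}")
```

With the values quoted in the text this gives errors of 0.6, 0.8, 0.5, 2.9 and 4.4, treats 97.2% versus 96.1% as not different, and treats the other two pairs as different, the last one by a margin of about 0.1%.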
Although the segmentation procedure applied to the images does not have a significant effect on cell classification performance, some segmentation procedures appear to consistently produce slightly worse classification results than others. The segmentation procedure (1-9-9) which correctly segments the most nuclei does not appear to produce the most accurate classification results. This is an unexpected result. It could be due to the fact that while all procedures are applied to the same image database, poorly segmented images which the postprocessing routine rejects are more frequent in the worse segmentation procedures, thus the number of cells to be classified varies from segmentation procedure to segmentation procedure. It is also possible that the cells which are difficult to segment may also be difficult to classify. This would cause the better segmentation procedures to produce worse classification results. To test this hypothesis, the cells which were eliminated from the results of the segmentation procedure 1 were also removed from the results of the 1-9-9 routine. A discriminant function analysis was performed on the reduced data of the 1-9-9 procedure. The analysis used only the selected nuclear features used in Tables 9 and 12. The total jackknife classification accuracy for the reduced 1-9-9 data set was 82.9%. This is an improvement over the old value of 81.6% for the 1-9-9 procedure in Table 12, indicating that it is possible that the cells which are difficult to segment may also be difficult to classify. One should note that the difference between 82.9% and 81.6% is slight and less than the error associated with the measurements.

TABLE 12

Combined Normal and Abnormal Jackknife Classification Results for Various Segmentation Procedures and Combinations of Features

                        Features used in Classification
Segmentation Method     Simple Nuclear¹    Cytoplasm and Simple Nuclear²    BMDP Minimum Selected³
1                       85.1%              94.0%                            97.3%
2                       85.4               93.5                             96.6
3                       82.6               93.0                             96.8
4                       82                 92.8                             97.0
6                       84                 93.2                             96.5
3-7                     79                 91.8                             96.5
3-8                     85                 92.8                             95.8
1-9                     81                 93.3                             96.5
2-9                     79.8               92.8                             96.2
3-9                     81.0               93.4                             96.2
4-9                     80.0               92.5                             96.5
6-9                     78.7               92.5                             96.7
3-7-9                   79.5               92.7                             96.1
3-8-9                   78.0               92.5                             96.3
1-9-9                   81.6               92.7                             96.8

¹ Features used were: NA, NComp, IODR, ODVar and ODMean.
² Features used were: CA, Ratio, NA, NComp, IODR, ODVar and ODMean.
³ Minimum number of features selected by BMDP for any of the segmentation methods in Table 6. These features were: CA, Ratio, NA, NMeanR, RMeanI, GMeanI, BMeanI, IODR, ODMax, ODVar, Correlation, Homogeneity, FA2, FD, DNum and HAER.

All of the segmentation methods used in Tables 8, 9, 11 and 12 used post-processing. One interpretation of the results in Tables 8, 9, 11 and 12 is as follows. Once a cell has not been rejected by the post-processing routine, it is sufficiently well segmented that it can be accurately classified. Or, in order for the segmentation of the cell to affect the classification results, its segmentation must be so bad that the object no longer looks like a cell (hence it is removed by the postprocessing routine). One can also interpret this to indicate that for the features which do most of the classifying, the variation due to segmentation method is much less than the variation in the features due to the cell type.

The better classification results in Tables 11 and 12 compare well with the results stated in the literature. Zahniser et al.¹² found that they can correctly classify 99.5% of the normal cells and 92.3% of the abnormal cells in a 1354 image database. Holmquist et al.⁴⁰ found that they could correctly classify 96.8% of the normal cells and 97.6% of the abnormal cells of a 244 image set. They also found that the reproducibility of manual classification of stained cervical cells by an experienced cytologist is such that the false positive and false negative errors when manually classifying cells as normal or abnormal are 4.5% and 4.4%, respectively. Therefore, this indicates that if the automatic classification procedure can correctly classify 96.6% of the cells, it has equaled the classification performance of a trained cytologist.

The ordering of the features in Table 10, based upon their mean rank as calculated in the results section of this thesis, should be taken with a grain of salt. The F statistic generated by BMDP and used to calculate the rank of these features is a parametric statistic and depends upon certain conditions existing in the distribution of the feature among the various cervical cells. Most of these conditions do not exist for most of these features. Thus, the order of the features should be considered as only a rough approximation of their importance in the classification of cervical cells.

While it is not surprising that RMeanI is considered to be one of the most discriminating features, it is surprising that GMeanI and BMeanI are also indicated as being quite discriminating. One would expect that the mean intensity information found in the green and blue images would be just constant fractions of the red mean intensity for all cell types. However, going back to the original images, in a large number of the abnormal cell images the nuclei appear to be slightly different in shade/color than most of the nuclei in the images of normal cervical cells. This may be due to the interaction of the intensity of the nuclear stain with the amount and stain intensity of the overlying cytoplasm. Normal nuclei have a dark brown color whereas some of the abnormal nuclei had more of a blue hue to them. This would indicate that perhaps there is less cytoplasm overlapping the nuclei of abnormal cells, or that the cytoplasm of abnormal cells might not absorb as much Orange II stain as does the cytoplasm of normal cells.

The tasks which an automated prescreening device for cervical slides needs to do are 1) find and recognize cells on a slide, 2) segment the cells, 3) numerically describe the cells, 4) classify the cells, and 5) classify the slide. This work has dealt with, to some degree, steps 2-4. Work still needs to be done on steps 1 and 5. This would require determining: the number of cells needed to classify a slide; the best or a satisfactory method of locating and recognizing individual cells on a slide; and the best method of classifying a slide, i.e. using cell by cell classification information or cell population feature data or both.

4.4.  Conclusion

This work was performed to test two hypotheses: i) can the accuracy of cervical cell segmentation procedures be significantly improved and ii) does segmentation performance affect cervical cell classification accuracy. Therefore, the thrust of this thesis was to determine: 1) the segmentation performance of several (32) segmentation procedures, developed for this work and from the literature, to segment stained cervical cells; 2) the effect of segmentation performance on the numerical descriptions of these cells; 3) the effect of the method used to segment the cervical cells on the automatic classification of the cervical cells; and 4) test the hypothesis that using these quantitative cellular descriptors one can match the performance of skilled cytology technicians.

Segmentation involves identifying the cytoplasm and the nucleus of the cervical cell. It was found that the most accurate nuclear segmentation out of the 32 procedures tested could be achieved by using a simple two-dimensional histogram analysis, two iterations of the edge relocation algorithm and the postprocessing routine, all of which were created by the author. This segmentation procedure correctly segmented 98.3% of the nuclei of a 3680 image database (only 61 nuclei incorrectly segmented). Only four of the incorrectly segmented nuclei were so poorly delineated that they might be misclassified by an automated classification procedure. The best cytoplasmic segmentation was achieved by the simple two-dimensional histogram analysis segmentation procedure which was developed for this work. It managed to correctly segment the cytoplasm in 98.5% of the images in a 1000 image database.

Sixty-seven different numerical cell descriptors (features) were calculated for every cell segmented and for each segmentation of the cell by a different segmentation procedure. The data so generated was used to calculate the coefficient of variation (CV) for each feature across the segmentation procedures. It was found that the feature which was least sensitive to the segmentation procedures used was ODMax, which had a CV of 0.4%. The feature most sensitive to the segmentation procedure used was Cluster Prominence, which had a CV of 260%. The average CV for the 67 different features was 22%. Thus, segmentation method affects the feature values.

The sizeable variations in feature value as a function of segmentation procedure used did not affect the classification of the cervical cells. One can also interpret this to indicate that for the features which do most of the classifying, the variation due to segmentation method is much less than the variation in the features due to the cell type.

It was found that, using all of the cellular features or selected subsets of the features, one could automatically classify the cells with as high an accuracy as an experienced cytologist. The best automatic classification achieved was the correct classification of 98.7% of the normal cervical cells and 97.0% of the abnormal cells. A subset of 16 features was the smallest subset which could achieve this classification accuracy. It was also found that a subset of 20 nuclear features alone could also achieve this classification accuracy. Thus the thesis that a computerized system can classify cervical cells at least as well as a trained cytologist has been demonstrated. This result requires that the system can segment cervical cells and recognize when it makes errors.
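As a reminder of how the sensitivity figures summarized in this conclusion are obtained, the coefficient of variation of a feature is its standard deviation across segmentation procedures divided by its mean. The Python sketch below uses made-up feature values (not data from this work) and a hypothetical helper name to illustrate why a feature whose mean lies close to zero, as was noted for DCM and Cluster Prominence, yields a very large CV even when its absolute spread is small. Whether the sample or the population standard deviation was used in this work is not stated; the sample form is assumed here.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by |mean|."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * statistics.stdev(values) / abs(mean)


# Hypothetical values of one feature for the same nucleus under five
# segmentation procedures (illustrative numbers only).
bulk_feature = [100.2, 99.8, 100.1, 99.9, 100.0]    # mean far from zero
near_zero_feature = [0.2, -0.2, 0.1, -0.1, 0.05]    # similar spread, mean near zero

print(f"bulk feature CV:      {coefficient_of_variation(bulk_feature):.1f}%")
print(f"near-zero feature CV: {coefficient_of_variation(near_zero_feature):.1f}%")
```

Both lists have nearly the same absolute spread, yet the second produces a CV that is orders of magnitude larger, purely because its mean sits in the denominator and is close to zero.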
From the results of this work, it appears that if images of Thionin-SO₂ and Orange II stained cervical cells were collected by a device, it would be possible to automatically classify individual cells as well as it can be performed by an experienced cytologist. The results also suggest that the same can be achieved using nuclear stain alone and thus employing only nuclear features, greatly simplifying cell recognition and segmentation tasks.
